ASCII by Jason Scott

Jason Scott's Weblog

Awesome! Unsurpassed! —

One of the side effects of the massive cleaning and sorting effort in my office is that tons of to-dos are now flying up to the surface, able to be addressed as time permits. I still don’t have a lot of time but now instead of glancing over a sea of scary paper and boxes, I’m splitting apart small piles and finding a lot that can be handled in minutes and then filed away in a more permanent (and traceable) fashion.

Hence the scanner is now back and in a part of the office where I can set stuff off to scanning and pull from piles of older computer material that I know very few other people have. Among this, for an example, is the above single-page flyer, which is for a chess program called SFINKS. I chose it for several reasons, not the least of which is the addition of the words “AWESOME!” and “UNSURPASSED!” in the corners of the flyer. That is one kick-ass chess program!

Anyway, this flyer, along with a lot of other stuff, is on my sub-site called digitize.textfiles.com, a scanned-historic-images site I put up in respond to Benj “Watermark” Edwards and a series of marked-up scans he was putting up of older material. I think we sort of reached something like a truce a while ago and he’s certainly been pretty productive, even moving out into the realm of nostalgia-baked column writer. And about once a week, he puts up a new ad or scan or the like, using a watermark/declaration system I find quite comfortable and compatible with the future. Meanwhile, I have not kept that up with Digitize.

Part of that, of course, is that I’m making a movie. But another part is that I’m starting to run into a nomenclature/directory issue. The catalog of scanned items is already starting to grow out of control at a mere 143 exhibits. What will it be when I add the thousands of items I still have around? Answer: utterly untenable unless I do something.

This is the problem with adding new amazing crap. You add too much, and people can’t find it as easy or they glaze over. Google added millions of images from the Life archive. MILLIONS. That’s more than any reasonable person could hope to ever search for the rest of their lives. (And still miss shots like this amazing arcade in Times Square.)

Since I end up doing a lot of scanning and adding of items myself, I’ve been kind of adding my own idea of “interesting” where “interesting” is either a funny, insightful, weird, bizarre, rare, or charming scan. This is hugely subjective. I then try to give as much info as I can, and then go to the next piece.

This does not scale.

This is a “problem”, one that I will address. But as a result the additions to digitize will be rather slow for a while, especially while the movie is going on. Then we’ll start seeing something or other happen, not sure what. Until then, go read up on how great this chess program is (according to the ad for the chess program).


Categorised as: Uncategorized

Comments are disabled on this post


6 Comments

  1. Michael Kohne says:

    OK, so you need some easy way to categorize things. Might I suggest a two-level system to start with? Very simple high-level categories (‘Advertising’, ‘Magazine’, ‘Manual’, etc), then further categorized by date or perhaps just year. If you keep the number of top level categories under control, then you can very quickly categorize something.

    Obviously, that puts some limitations on letting folks find things they are interested in (or, even better, stuff they didn’t know they were interested in). I therefore suggest adding OCR’d text to each item’s page. That way Google will be able to get at the content of all the scans and users interested in, for instance, chess, will find stuff.

    The ultimate problem is that there are an infinite number of possible categories that an item could be in. You could introduce some kind of keyword or tagging system, but it would be less overhead on your part to just OCR the things (and yes, the OCR is going to get it wrong sometimes. Don’t let that freak you out. And don’t make getting the OCR perfect a pre-condition of getting it up there.) and then let Google sort it out from there.

    You could even (if you wanted) put search boxes on your site that simply redirected to a Google search using the site keyword to restrict to your site. This would have the bad effect of the user’s seeing ads (since Google runs ads on it’s site), but would have the good effect of not having to take a lot of your time to implement.

    You might also consider recruiting a set of elves to help you. Assuming you could find people you trust, you could use some kind of CMS to allow them to fool with the categorizations and proofread the OCRs.

    It also might be interesting (like you need one more thing to do) to take a look at the conversations that Joel Spolsky and Jeff Atwood had while designing the site ‘stackoverflow.com’. They had a podcast and blog (blog.stackoverflow.com) where they went on about a lot of things, but along the way talked about their rationale for how they organized the site. I don’t remember which podcasts where involved, but each podcast episode has a link to a transcript wiki where a transcription of the podcast is posted.

    Good luck!

  2. ross says:

    HELLO

    i have much more computer-related stuff that i didn’t send because uhh, it had pretty pictures.

    if i scan this stuff, could i submit it to digitize? what are your guidelines? are you gonna put this stuff on archive.orG?

    ross

  3. What you need is an intern who’s a library student of information technologies. But I suppose you already knew that.

  4. ross says:

    i would kill for a secretary

  5. Mike says:

    Hi Jason,

    I am in a class with Colin McEnroe at Trin. College in Hartford, CT. and he suggested I get in touch with you to do an interview. Is there any way you can email me? I can’t find your contact info anywhere. I’d love to ask you some questions.

    Thanks,
    Mike

  6. Flack says:

    I also noticed that there is no link to digitize.textfiles.com from the main textfiles.com page. If this is intentional, please ignore.