By: Flack

Flack — Tue, 09 Dec 2008 02:01:01 +0000

I also noticed that there is no link to digitize.textfiles.com from the main textfiles.com page. If this is intentional, please ignore.

By: Mike

Mike — Fri, 05 Dec 2008 02:05:20 +0000

Hi Jason,

I am in a class with Colin McEnroe at Trinity College in Hartford, CT, and he suggested I get in touch with you to do an interview. Is there any way you can email me? I can’t find your contact info anywhere. I’d love to ask you some questions.

Thanks,
Mike

By: ross

ross — Wed, 03 Dec 2008 07:02:56 +0000

i would kill for a secretary

By: Rowan Lipkovits

Rowan Lipkovits — Tue, 02 Dec 2008 23:38:36 +0000

What you need is an intern who’s a library student of information technologies. But I suppose you already knew that.

By: ross

ross — Tue, 02 Dec 2008 17:47:30 +0000

HELLO

i have much more computer-related stuff that i didn’t send because uhh, it had pretty pictures.

if i scan this stuff, could i submit it to digitize? what are your guidelines? are you gonna put this stuff on archive.org?

ross

By: Michael Kohne

Michael Kohne — Tue, 02 Dec 2008 17:24:24 +0000

OK, so you need some easy way to categorize things. Might I suggest a two-level system to start with? Very simple high-level categories (‘Advertising’, ‘Magazine’, ‘Manual’, etc), then further categorized by date or perhaps just year. If you keep the number of top level categories under control, then you can very quickly categorize something.
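A minimal sketch of that two-level scheme, assuming each item carries a title, one top-level category, and a year. The function name `build_index` and the sample items are made up for illustration; only the three category names come from the comment above.

```python
from collections import defaultdict

# The three categories named in the comment; a real list would be longer.
CATEGORIES = {"Advertising", "Magazine", "Manual"}

def build_index(items):
    """Group (title, category, year) tuples into category -> year -> titles."""
    index = defaultdict(lambda: defaultdict(list))
    for title, category, year in items:
        # Rejecting unknown categories keeps the top level small and fixed.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        index[category][year].append(title)
    return index

# Made-up sample items, for illustration only.
sample = [
    ("Modem ad", "Advertising", 1984),
    ("Printer manual", "Manual", 1986),
    ("Disk drive ad", "Advertising", 1984),
]
index = build_index(sample)
```

Filing an item then takes two quick decisions (category, year), which is the point of keeping the top level short.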

Obviously, that puts some limitations on letting folks find things they are interested in (or, even better, stuff they didn’t know they were interested in). I therefore suggest adding OCR’d text to each item’s page. That way Google will be able to get at the content of all the scans and users interested in, for instance, chess, will find stuff.

The ultimate problem is that there are an infinite number of possible categories that an item could be in. You could introduce some kind of keyword or tagging system, but it would be less overhead on your part to just OCR the things and then let Google sort it out from there. (And yes, the OCR is going to get it wrong sometimes. Don’t let that freak you out, and don’t make getting the OCR perfect a pre-condition of getting it up there.)

You could even (if you wanted) put search boxes on your site that simply redirected to a Google search using the ‘site:’ keyword to restrict results to your site. This would have the bad effect of users seeing ads (since Google runs ads on its site), but would have the good effect of not taking a lot of your time to implement.
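As a rough sketch of that redirect, a search box only needs to build a Google URL whose query ends in a `site:` restriction. The helper name `site_search_url` is hypothetical, and the default host is assumed from the digitize.textfiles.com address mentioned elsewhere in this thread.

```python
from urllib.parse import urlencode

def site_search_url(query, site="digitize.textfiles.com"):
    """Return a Google search URL limited to one site via the site: operator."""
    # urlencode escapes spaces and the colon so the query survives the redirect.
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})

print(site_search_url("chess"))
# prints https://www.google.com/search?q=chess+site%3Adigitize.textfiles.com
```

The search form would simply redirect the browser to the returned URL, so nothing has to be indexed or hosted locally.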

You might also consider recruiting a set of elves to help you. Assuming you could find people you trust, you could use some kind of CMS to allow them to fool with the categorizations and proofread the OCRs.

It also might be interesting (like you need one more thing to do) to take a look at the conversations that Joel Spolsky and Jeff Atwood had while designing the site ‘stackoverflow.com’. They had a podcast and blog (blog.stackoverflow.com) where they went on about a lot of things, but along the way talked about their rationale for how they organized the site. I don’t remember which podcasts were involved, but each podcast episode has a link to a transcript wiki where a transcription of the podcast is posted.

Good luck!

Comments on: Awesome! Unsurpassed!
