Hi Jason,
I have a question for you, and I’m hoping you can help me decide what to do.
I have a lot of vintage computer magazines and books. Not Information Cube level “lots”, but still, boxes and boxes of them. A lot of this stuff is destined for AtariMagazines.com and AtariArchives.org. Some of it, I don’t have permission to post but it’s still interesting and good stuff, and maybe I’ll have permission one day.
2.5 years ago I moved all this stuff from my house to my new office. I unpacked some of it, never unpacked a lot of it. Now, my family and I are planning to move from northern California to Portland, Oregon this summer. Which means moving all of these boxes of magazines once again.
Which brings me to my question. I have a great duplex scanner. Two actually. Should I just cut the bindings off these magazines and digitize them all? Should I just decide that the content is the important part, and not fetishize the objects themselves? Right now, they’re hard to access, the information is impossible to search, etc. Or is it better to have the actual *thing*?
If you could digitize everything in the info cube, but destroy the originals in the process, would you?
And what about particularly rare mags — early issues that are hard to find, expensive on ebay?
My feeling is that for the stuff that isn’t extremely rare, I should just digitize it, bringing it one step closer to OCR and getting it online. . . or at least easily searchable on a hard drive. Then toss the original paper and move on. But I would like a sanity check from you on this.
One related thing that may interest you. I feel that OCRing these magazines is critical – something I have been doing for years at AtariMagazines.com and AtariArchives.org. But as I’m sure you know, OCR alone is not that great; you need human proofreaders to clean up the text if it’s going to be online. So I am creating a tool that will OCR pages, then send the OCR’d text and images of the corresponding pages to Amazon Mechanical Turk, to have actual human people proofread and correct the text. It will allow me to get a LOT of high-quality, human-proofread OCR quickly. Basically it will be like Project Gutenberg’s Distributed Proofreaders project, but it could be used with any text, not just PD text. (In addition to me using it for old computer magazines, it will be a web-based service that businesses could use.) Is that tool something that would help you in your preservation efforts?
Thanks for your thoughts on this stuff,
Kevin
Hey there, Kevin. Thanks for thinking of me. Sorry it took so long to respond to this.
I am sorry that this strange, weird little world of computer and technological history has to experience the same issue as so many other realms do – that of doing terrible things in the name of good. I shouldn’t be surprised this is the case. But one could always hope that just as computers seem to be the tool to end all tools, the machine that makes machines that make even better machines, there might have been a chance it wouldn’t fall prey to the same Faustian bargains extant in a thousand other situations. But there we have it.
In the case of documents and materials that are perfect bound, that is, attached by adhesive like so:


Well, with current scanning technology the best way to absolutely get the most effective scan/snapshot of the material is to destroy the binding. Just break that poor thing apart, scan it flat in a nice scanner, and then end up with a broken, used, impossible-to-keep pile of paper.
Now, don’t get me wrong – there’s been an enormous amount of effort applied out there to deal with the binding-being-broken issue. For example, some scanners of particularly rare books take a head-on photo of a flat book page and then use all sorts of mathematical trickery to calculate the curvature of the pages from the binding to flatten them out. Google does it when they scan books for their massive blorb of content. A lot of really smart people are working on that problem, and if you’ve never heard of Unpaper before now… well, you’re welcome.
But at the end of the day, in the currency of the present, the absolute best material to have is a stack of loose paper sheets that you can scan flat, at a nice, high resolution. And if you have something that you can get into that form, the resulting scans will be much better – but again, you’ll have destroyed the source material in the process. Wrecked it.
This is a huge internal debate for me. Huge. As big as it gets.
After much thought, I came up with the following rule-set for the day I destroy something to save it.
IF I have a document or paper set that requires some level of destruction to scan properly AND IF I have three copies of it AND IF there is no currently-available digital version of the document AND IF there is a call or clamor for this document set THEN AND ONLY THEN I will split the binding and scan at a very high resolution and additionally apply OCR and other modern-day miracles to the resulting document so that the resulting item is, if not greater than the original, more useful to the world.
This is, as you might imagine, an impossibly high standard. So high, I haven’t had anything pass it yet.
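For the programmers in the audience, the rule-set can be written out as a simple predicate. This is only a sketch: the names (`Item`, `may_destroy_to_scan`, and every field) are made up for illustration, "three copies" is read as "at least three", and "a call or clamor" is reduced to "at least one person asking", which are both assumptions about how to turn the prose into code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A hypothetical record for one document or paper set."""
    title: str
    needs_destruction_to_scan: bool  # e.g. perfect-bound, must be split
    copies_on_hand: int
    digital_version_exists: bool
    people_asking_for_it: int        # crude stand-in for "call or clamor"


def may_destroy_to_scan(item: Item) -> bool:
    """Return True only when every condition of the rule-set holds."""
    return (
        item.needs_destruction_to_scan
        and item.copies_on_hand >= 3          # assumption: "three" means at least three
        and not item.digital_version_exists
        and item.people_asking_for_it > 0     # assumption: any demand counts
    )


# A made-up example: one copy, nobody asking, so the binding stays intact.
lonely_newsletter = Item("Some newsletter run", True, 1, False, 0)
print(may_destroy_to_scan(lonely_newsletter))
```

Because every condition is joined with `and`, a single failing test is enough to keep the razor blade in the drawer, which is exactly why so little gets through.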
I’ve certainly embarked on large scanning projects before – for a year I scanned over 7000 pages of documents from Steve Meretzky’s collection, a scanning project that saved a lot of time for the archive that eventually took those documents over. I also scanned these items at an insane resolution, 800 dpi, meaning that you could see this level of detail in the final images:

In his case, though, I didn’t have to worry about hurting these one-of-a-kind copies of Meretzky’s notes and papers – they were all in a binder and they could be brought out, scanned, and put back. I was lucky. And, by extension, a lot of people are lucky. (There are still plans to put all these scans on archive.org – ideally in a few months.)
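To put 800 dpi in perspective, here is a rough back-of-the-envelope calculation. It assumes a US Letter page (8.5 by 11 inches) and uncompressed 24-bit color; neither figure comes from the Meretzky project itself, and the function name `scan_size` is just for illustration.

```python
def scan_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Pixel dimensions and raw (uncompressed) size of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px * bytes_per_pixel
    return width_px, height_px, raw_bytes


# Assumed page: US Letter, 24-bit color (3 bytes per pixel).
w, h, raw = scan_size(8.5, 11, 800)
print(f"{w} x {h} pixels, about {raw / 1e6:.0f} MB uncompressed")
# prints "6800 x 8800 pixels, about 180 MB uncompressed"
```

Under the same assumptions a 300 dpi scan comes out at 2550 x 3300 pixels, so the 800 dpi version carries roughly seven times as many pixels per page.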
Sitting in my cube are entire collections of magazines, entire runs of all the issues that ever came out. The “IF there is a call or clamor” part of the above rule-set usually kicks in first, and so I haven’t scanned them in. For example, if you want an entire run of a newsletter dedicated to the typesetting software TeX, well… I got a box I can show you. But it just hasn’t seemed justified to go and scan that all in, in hopes someone will find it interesting. I’ve been focusing on other things as of late.
And then every once in a while, I discover someone has embarked on a project that I would normally be doing if my rule-set had been satisfied, but since they have a smaller rule-set, they got there quicker. So it was, recently, when it turned out that someone is scanning in a bunch of issues of BYTE magazine.
Here’s the thread in question. The scanning fellow shows up regularly and points to a multi-hundred-megabyte PDF file of an issue of BYTE magazine, including a nice introduction and overview of the contents, and the resulting downloaded file is easy to read, browse, and enjoy. It is very, very hard to look this gift horse in the mouth and find faults – I mean, this guy is scanning hundreds of pages, very quickly, and providing them for free. But here we go, finding cavities anyway…
Somewhere in the middle of the love fest that is this thread, someone points out that one of the pages is scanned improperly in the PDF, and a page is missing. The response from the scanning gentleman, frankly, chills me to the bone:
“I will fish the magazine out of the garbage and get those fixed.”
So after scanning these magazines, he immediately trashes them. Whoop, right into the bin. Now, the PDFs are great, but they’re not exactly excellent. The resolution is sub-par (so you can’t easily read many of the ads or look at details) and any printing or close-up viewing of the page is blurry indeed. But that’s it, they’re in the trash and gone.
Somewhere along the line, I convinced myself of this way of thinking: Well, instead of being a guy who owns these and throws them out, here’s a guy who scans them and puts them online, and then throws them out. This is the same internal gymnastics that makes it possible for me to vaguely respect all those boring-ass condo villages in the suburbs, because at the very least putting all those people in tightly-packed shitboxes sure beats that same amount of people taking up a hundred times the space with houses sporting massive useless lawns. The upside, you see.
But this falls apart quickly when one investigates what happens next: people begin sending the scanner/destructor their own copies of BYTE. Now he’s not just destroying his collection, he’s unwittingly convinced other people to give up their collections to the cause, destroying even more copies along the way, copies that can never be scanned at a better resolution, or given a chance to be cleaned up from said higher resolutions before being turned into a standards-compliant and quick-as-lightning PDF. (With an archive of the original TIFFs around, as well.)
I have to stress – there’s no evil at work here. Scanner-destroyer is donating a lot of time for this project. People are benefiting from this effort, as they can read issues of BYTE that they never read or heard of when they were younger. BYTE was a world-class magazine in the 1970s and 1980s – as good as it gets in a technical realm. It’s a pleasure to read, and it leaves you with hours of thoughtfulness afterwards. It’s good. It’s worth saving.
But this situation, this striking-the-balance problem of destruction versus saving, of trash and triumph – it’s one I haven’t really had to address yet, and I know that that day will come, and with it will be some very sad, very intense feelings as I take a razor blade to something fate and respect entrusted to my care.
I will not enjoy that day at all.