Scan, No Scan (and a Cube in the mix) —
This week has been spent sorting through the Information Cube, that insane 40x8x8 shipping container in my back yard, and packing up magazines by the thousand to go away.
This is happening because of an arrangement I made last year with the Strong Museum of Play (which is also home to the International Center for the History of Electronic Games). In this arrangement, they are going to be the new physical caretakers of the magazines I’ve been collecting, and that people have been donating to me, for 30 years, and in return, I’m not going to die under a pile of crates of magazines that collapsed on me while I was trying to find “that issue with the ad for 9600 baud modems”. Fair enough.
I have negotiated a number of terms for this material, which would be a bit involved to go into right now. But I still have access to the magazines, and I will have first refusal if the museum decides to de-duplicate and send copies to other archives. In an ideal world, I’d inventory all the magazines before they went up to Rochester, but in an ideal world I’d not be doing this in the dead of winter.
The archivists at the Strong have given me a large list from my own rough inventory of items, and Kevin Driscoll and I have been going through the crates and piles, cleaning things up as we go, and moving the relevant magazines into crates destined to go north.
The Strong is a good place. And the important thing is, we’re over the hump with regards to materials regarding computers – after taking a bit of a nap on them, institutions now realize they want these endless journals of computer programming and products. And after talking to a lot of places, I rested on the Strong as the place to take my magazines and hold them. They’re well funded, they take care of their stuff, and believe me, it’s a hundred times better than the Cube.
(National Geographics, however, are a curse, and I’m working hard to find some place or dealer to get them to. The world does not hurt for National Geographics.)
Anyway, here’s the REAL point of this post.
Every time I mention ANY of this, EVERY single time, and from EVERY quarter, comes the exact same theme of comments and the same question framed every which way, but coming down to this:
Where are the Scans of these, and Where Can I Read Them?
The Scans, the Scans, the Scans. On one level, it’s very encouraging that people have an interest in the material, and that they now recognize the value of access to computer materials online. Without a doubt, that’s what the place I work for is dedicated to and it’s the central thesis of the Internet Archive’s existence.
But there’s a missing part in there: the actual, physical, person-intense process of scanning.
It is not an interesting job. It’s definitely not enjoyable. And it’s endless, an infinite process of gathering items, getting them into machines, and then whatever level of quality assurance is done afterwards to make sure they’re functional.
Scanning, say, a brochure from a 1978 computer user group, which is a couple pages total, is at least a functionally rewarding and quick turnaround. Scanning what is ultimately 50,000 pages of PC Magazine is not. And believe me, it’s 50,000 at least.
There have been heroic, involved projects to scan magazines over the years. One of the leaders is Bombjack, but there are many others. And when these scans come onto my radar, I get them into the Internet Archive’s stacks as fast as possible. Easy access, quick reading, and the online reading capability are important things, and I’m proud we’re able to support these projects with that.
But I think people need to realize how time-intensive (and sometimes money-intensive) scanning is. I have a scanner in my house, but I also worked very hard on documentaries and emulation in the last couple of years. Being in the dark room with the scanner and adding materials is something I’ve chosen to avoid. Because I’m not in an urban center (and therefore can afford the space to have a Cube in the first place), people are not really able to get out here. I have had a volunteer who’s done some books, and he’s been great to do so, but I can’t have someone here day in, day out right now.
Meanwhile, people clamor for the scans. The scans, the scans, the scans. Why are you holding them back, they hue and cry.
One particular edge-case fellow has complained for years, years on end, that I am “holding back” on one particular magazine (one of which there are significant copies out there, trust me) that he believes I am intentionally not scanning and withholding from the community, for reasons he can only construe as evil and self-centered. I realize he’s an edge case, but he doesn’t exactly inspire me to tramp out to the backyard.
So, rest assured, this is a situation that’s on my mind. It’s a victory – people realize this is a real and valuable thing. But it’s also the first step of a mountain – scanning magazines and materials is intense, intense work.
My proposed solution is simple.
First, get as many items as possible safe and sound up into the Strong Museum and the ICHEG Archives, so that they’re taken care of and reference-able (with actual catalog numbers, so you can ask for June of 1984 and get it). This first run-through with the Strong has been huge, and I have a theory they actually want more, but we just haven’t synchronized inventories yet.
Next, work with the Internet Archive as a fundraising situation to get both a book scanner and paid volunteers to scan material at the Strong. Lots of scans. LOTS of them. These folks up there are friends, and scanning, like I said, is terrible work, and terrible work should be compensated wherever possible.
The Archive will gladly host the resulting scans. So as items get scanned, they’ll show up in the appropriate collections. The work will be done.
I believe there are two types of scans that are possible in the contemporary world – “good enough” scans that give you all the information and insight you could want, and “artisanal” scans that are for specific one-off images or pages that have value and merit on their own. Obviously some materials deserve artisanal effort all the way down, but many, many don’t. (Conference proceedings come to mind – black and white pages, never-ending, with no illustrations.)
It’s a huge project. I think the first big step was no longer locking items away in the Cube, like a bomb shelter, from the ravages of cleaned-out basements and dying enthusiasts. I think that step is far from over and I expect to continue to be sent items.
The next step, after ensuring their proper place among the stacks of art and works in the world, is to bring them to you digitally.
When I make the call for funding, I hope people answer.
Categorised as: computer history | housecleaning | Internet Archive | jason his own self
When the time comes, broadcast word as far and wide as possible, and we’ll help how we can.
Yeah, Nat Geos are hard to get rid of. Someone local has a basement FULL of them trying to dump them. Turns out everyone collected back issues of it for some reason, plus Nat Geo themselves released the archives on DVD.
I have a handful of things that really should get scanned, but I have to find time to get to that part of NY for a day (about 2 hours away from here).
Is there some reason to keep the nat geos at all then? If the scans are available for $59.95 and approximately EVERYONE kept the blasted things, should you consider just tossing them? I know it sounds like heresy, but if there’s N copies already, and it’s already scanned, perhaps one more copy doesn’t need to be kept.
Or put an ad for “free magazines” on Craigslist, if you don’t mind some odd folks showing up. I got rid of a bunch of medical journals like this.
Scanning is tough work for sure. Something a modern Dickens might make use of. But here it really is a service to humanity – one dreary page at a time. Would it not fit the description of “community service”? It certainly would fit the bill for being both very useful and punitive at the same time. I think if judges understood how well it did those things you would have a steady stream of “volunteers”.
A couple years ago a Google employee demonstrated a project he called the Linear Book Scanner that he developed in his 20% time. At the time of demonstration there were some wrinkles with the thing that made it an admittedly imperfect solution; namely actual, literal wrinkles and page tears occasionally cropped up during the scan. Even so, it had a pretty respectable success rate back during the first demo. Two years on, I’d expect some of the issues to have been fixed, making it an even more attractive option for use in production. (Especially given its serviceability and the price point compared to the state of the art in bookscanning, which mostly consists of scanners that aren’t able to work with as little operator involvement as the Linear Book Scanner can get by with.)
Since you’ve long been an advocate for recognizing the need to sometimes adopt the “good enough” stance on scans for some materials, it seems like the Linear Book Scanner is a near-perfect tool for that approach.
Do you know if anyone at the IA ever looked into the Linear Book Scanner for the purposes of performing a serious evaluation for its use in production? If they did, do you know what the reasons may have been for not rolling it out? Have you considered using this hammer for your own use? I imagine that if you took an approach where you only used it to scan items for which either you have a backup copy of the same work for manual rescans of the problem pages, or you’re reasonably sure there are plenty of backups out there, or you aren’t otherwise overly concerned about some damage to the materials, then the output efficiency of this thing would register as a couple of orders of magnitude better than the scanners that the IA is using now.
Without taking too much time for this, I’ll tell you the linear book scanner is a disaster.
Having started to do a lot of archival scanning for a historical project I’m working on, I might advise the following for whoever does do the scanning.
Don’t try to guess what needs high quality scanning and what doesn’t. Assume that it all does, and scan at a high ppi and bit depth to a lossless, widely-used open format. (TIFF probably.) Yes, this takes up a lot of space and increases the scan time, but it will save immeasurable time and wear and tear on the original materials down the line.
You’re creating an archival master copy. From this, it’s pretty trivial to generate copies with lower res or higher compression, or do OCR work. The thing is, acceptable sizes and compressions will change with time. What used to be overwhelmingly large back when we were surfing the web on 56k modems with lower resolution monitors and 1GB hard drives is now often too small to do any practical work with (or sometimes even read easily). Far better to go back to the master digital file and re-export to whatever future standard than to have to pull the physical source material again. (Not to mention the time it will take for someone to redo all that work.) As storage space continues to get cheaper and larger, the burden of master scans, on the order of hundreds of MB each, will be markedly diminished, but the human cost is unlikely to decline at any similar scale.
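To put rough numbers on that storage burden, here is a back-of-the-envelope sketch. The page size (US Letter), the 600 ppi resolution and the 24-bit colour depth are my own assumptions for illustration, not figures anyone above has committed to, and `uncompressed_scan_bytes` is a made-up helper name. The 50,000-page count is the PC Magazine estimate from the post.

```python
def uncompressed_scan_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * ppi)    # pixels across the page
    height_px = round(height_in * ppi)  # pixels down the page
    return width_px * height_px * bits_per_pixel // 8


# Assumed for illustration: 8.5 x 11 inch page, 600 ppi, 24-bit colour.
page_bytes = uncompressed_scan_bytes(8.5, 11, 600, 24)

# 50,000 pages is the PC Magazine estimate given in the post.
total_bytes = page_bytes * 50_000

print(f"{page_bytes / 1e6:.1f} MB per page")
print(f"{total_bytes / 1e12:.2f} TB for 50,000 pages")
```

Under those assumptions a single uncompressed page comes to about 101 MB, and the full run to roughly 5 TB. Lossless compression inside the TIFF would bring that down, and a higher bit depth or resolution would push it up, so treat this as an order-of-magnitude estimate only.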