The joke, which I used in a lot of introductory speeches about the Internet Archive Software Collection, was that we are the largest collection of downloadable software on the planet, period. Find us a bigger one, and we’ll download it and add it.
The reason I can make that declaration is that the first few years I was involved were a binge of acquisition both online and off, including downloading of FTP site collections, CD-ROM sets, floppies, and all sorts of online collections. It’s very big. It’s somewhat problematic to step through it all, but it’s all there, in one form or another.
It was important to me to gather as much software as possible as quickly as possible because I was very worried we were going to fall into a “too late” situation, be it one of original media dying, or previous (excellent) attempts to gather software fading away into obscurity. There have been some pretty amazing large collections in the past; but then again, most of them ended up on CD-ROMs, so it’s been a case of just gathering CD-ROMs and getting the data off them. Which I did, by the thousand.
The Archive now has something on the order of 8,000 CD-ROMs at least, contributed by myself and dozens of other people. They range from installation disks for modems to entire collections of software from various historical sites. They are companion CD-ROMs to magazines, driver compilations, and promotional one-offs. In some rare cases, they’re CD-Rs acquired from collectors that have very rare material indeed.
They are rather difficult to search.
In the Archives “Biz”, there are various schools of thought about willy-nilly acquiring “stuff”. All acquisition comes at a cost, be it large or small, and it comes with ongoing issues of maintenance, accessibility, and resource draw. That’s all to be expected, but the approaches vary, and there are some hard-core beliefs and policies out there whose holders, when they encounter someone not going in that direction or following those policies, throw a little bit of shade.
I get a lot of shade thrown.
2015 was the year that, with a few articles about me appearing in various rags and URLs pointing to those articles floating around, I stumbled on the most scintillating subtweeting that had been going on (for some time!) about the work I do and how I approach it.
Now, I assure you, not a tear was shed on my part, mostly because I’ve been involved in tight-knit communities, and their little paper dramas and kabuki theater of outrage and nose-raising. It was mostly a surprise because my whole thing of “Archive Guy” had been, generally, a positive one: folks either liked what I was doing (I thought), or had no particular opinion and a kind of “well, it’s your shitpile” approach. Not so! The anger is palpable out there, in a small and delightful crowd of archivists and librarians who think I’m doing actual damage.
I listened thoughtfully to the arguments, engaged a few folks, got the lay of the land with regard to criticisms.
And then? Well, full steam ahead.
Related to my 2016 goals, I’ve gone ahead and started shoring up a small choice I made some time ago, one which had some interesting outcomes.
In the rush to get CD-ROMs online, I chose to rip them as fast as I could and get the ISO images up into the Archive, while not scanning any of the CD-ROM covers or discs and certainly not going into any excessive metadata work.
This is a mortal sin in some circles. I did it willfully and gleefully, knowing that materials would be harder to find, but they’d be up somewhere, ready for people who really needed to know they were there, and where to send their own collections. It was a gamble, and it paid off.
Now, I’m in the process of scanning in those CD faces, as the images in this entry can attest. They’re weird as art, and very helpful as reference points. I’ve been intrigued that for many people, the art alone is giving them visual alerts to the materials inside. That means that it’s one thing to have a CD-ROM of old computer art, another entirely to have the hand-drawn label on the CD-ROM that people remember clearly from the past.
The collection, in other words, did the important thing first, and the next-important thing is happening now. I am digitizing these sleeves very quickly, at about 500dpi, then uploading the resulting TIFF files and running a script that makes a nice JPG image of each that you can look over quickly.
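For the curious, here is roughly what that last step could look like. This is not the actual script, just a minimal sketch under some assumptions: it uses the third-party Pillow library, and the names `scaled_size`, `preview_path`, and `make_preview`, the 1024-pixel limit, and the JPEG quality of 85 are all made up for illustration.

```python
from pathlib import Path


def scaled_size(width, height, max_side=1024):
    """Shrink (width, height) so the longer side is at most max_side,
    keeping the aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def preview_path(tiff_path):
    """Put the preview next to the scan, same name with a .jpg extension."""
    return Path(tiff_path).with_suffix(".jpg")


def make_preview(tiff_path, max_side=1024, quality=85):
    """Write a browsable JPEG next to a full-resolution TIFF scan.

    Assumes Pillow is installed; it is imported here so the helpers
    above work without it.
    """
    from PIL import Image  # third-party: pip install Pillow

    out = preview_path(tiff_path)
    with Image.open(tiff_path) as img:
        rgb = img.convert("RGB")  # JPEG has no alpha or 16-bit modes
        rgb = rgb.resize(scaled_size(rgb.width, rgb.height, max_side))
        rgb.save(out, "JPEG", quality=quality)
    return out
```

A disc face scanned at about 500dpi comes out somewhere over two thousand pixels on a side, so a preview like this is a fraction of the size of the TIFF, and the original scan stays untouched for anyone who wants the full detail.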
As I finish a pile of CD-ROMs, these discs will be going to the Internet Archive’s physical storage, where they are available for reference and access in the future, or even rescanning.
Of course, they don’t have an indexing system there, yet.
That’ll happen later.