I am an extremely lucky person.
I’m lucky for a host of reasons, but in this particular case, I’ve been matched up with the Perfect Job very early in my life: my 40s. Some people get it earlier, of course, but many more get it later, if at all. Life at the Internet Archive is just what I wanted it to be. Conflicts are barely anthills. Achieved dreams loom in every direction. Triumphs have been many, failures often more hilarious than troublesome.
When I joined in 2011, I was given several broad areas to think about, and I added a few of my own. One of them was software and another was the emulation-in-a-browser thing, and both are going quite swimmingly. Another was to spiff up the donation page. While at this exact second the design’s a little cramped for the holiday matching fund drive, the flexibility of the new design and the addition of subscriptions turned out to be well worth my attention.
So, 2014 looms. What’s got my attention and why did I use a word like frightening in the title of this entry?
First of all, I’m not “done” with the JSMESS project and I’m certainly not done adding software items to the archive – those will continue and may even dwarf the rest of what I’m doing for some time to come. They’re both big, important things and I’m working on them nearly daily, as are many others.
We needed an easier way to donate money, and we needed software emulation in the browser, and now we have both, and they will get better as time goes on.
In 2014, I want to go after two other weaknesses in the Internet Archive arsenal: Metadata and Discovery. (And maybe Accessibility, if we can swing it.)
When I interact with professional librarians and archivists, or even folks who are really, really into the subjects that I’m focusing on (vintage software, crazy old crap), the conversation quickly turns to how, in fact, these items are being described and given metadata. And then to the question of how they can possibly be found at all.
So, in the very specific realm of software, bear in mind we’re making up for decades of institutional neglect. Oh, hobbyists and intense amateurs were getting shit done, let’s not diminish that work at all. But it was all being done under a cloud of “are we in trouble?”, which meant that a few random brave souls would build good collections (Home of the Underdogs for binaries, MobyGames for metadata, mame.dk for ROMs) and then things would go south for a variety of reasons and the information and data would disappear again, sometimes for good. No institution stepped in. Not really. And so here we are, with the Internet Archive now stepping in. Become the largest historical software collection in the world? Check.
To do this, we absorbed many terabytes of data covering a wide range of software. Some people were very specific about high-quality descriptions and naming. Others… were not. But again, to make up for lost time, in it went.
Same with old documents related to computers, old videos, old audio. My philosophy has been, and continues to be, get it online first. GET IT ONLINE FIRST. Deal with EVERYTHING ELSE LATER.
If it’s online, it’s not in a box in a basement or attic. If it’s online, it can be commented on. If it’s online, it can be shifted around effortlessly and included in greater and greater things. And if it’s online, it isn’t rotting on some piece of magnetic plastic or dimpled plastic or broken plastic. Granted, we’re buying a whole other range of long-term problems putting it on spinning disks and what have you, but the long-term preservation of the item is now a whole lot easier, should we be responsible. Being online is a great thing.
Once stuff is online (and, as I just implied, an awful lot of stuff is now online), we can talk about metadata, organization and discoverability.
And that time is now.
I unintentionally got quoted all over the archiving and library scenes when, in a talk I was giving at the New York Public Library, I said “Metadata is a Love Note to the Future.” This rang true with a lot of people, and it speaks to the oddness of what metadata is, whom it serves, and how.
Intense, machine-searchable information about artifacts and collections, be they digital or physical or whatever, has a value that is primarily based on faith. You can enjoy the object right now in your hands, but turning it into a photograph or a .wav file and then tacking on a whole range of information you might not even have had at the moment is preparing for a future that you have no idea about.
I assure you, there are hundreds of books contemplating the nature of objects in past, present and future, and how we as human beings interact and interface with these objects. I’m not going there. But I will say that the effort put into generating contextual data about an item provides all sorts of benefits, and they are almost entirely theoretical unless you know you have an audience waiting for them. That makes it a very tough sell for people to ‘just do’, the way they might bookmark a page, retweet something or jot a note in a weblog. It’s involved. You usually have to pay people. And if you pay people, it gets expensive quickly.
So my efforts will be to make metadata generation for items on the Internet Archive as painless, as collaborative and as rewarding as possible. I’ll likely use custom scripts, wikis, let’s-raise-the-barn events and shout-outs for folks to get involved however they want to. I will also work on automating it, so that a person is signing off on the efforts of machines instead of typing in the year when the stupid thing is telling you the year right there, in a billion obvious locations.
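To make the “person signing off on the efforts of machines” idea a little more concrete, here’s a minimal sketch in Python. It is not an actual Internet Archive tool: `propose_year` and `review` are hypothetical names, the 1950–2049 year range is an assumption, and “most frequently mentioned year wins” is just one simple heuristic. The machine scans text the item already carries (title screens, manuals, file listings) for plausible years and proposes one; a human only confirms or rejects it.

```python
import re
from collections import Counter

# Assumption: plausible years fall between 1950 and 2049.
YEAR_PATTERN = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")


def propose_year(texts):
    """Hypothetical helper: return (year, times_seen) for the most frequently
    mentioned plausible year across the given texts, or None if none found."""
    counts = Counter()
    for text in texts:
        counts.update(int(y) for y in YEAR_PATTERN.findall(text))
    if not counts:
        return None
    return counts.most_common(1)[0]


def review(item_id, texts, confirm):
    """Hypothetical sign-off step: the machine proposes, a human confirms.

    `confirm` is any callable taking (item_id, year, times_seen) and
    returning True to accept the guess or False to reject it.
    """
    proposal = propose_year(texts)
    if proposal is None:
        return None  # nothing to propose; fall back to manual entry
    year, seen = proposal
    return year if confirm(item_id, year, seen) else None
```

For example, `propose_year(["Copyright (C) 1984 Example Software", "Version 1.1, 1984", "Reissued 1986"])` returns `(1984, 2)`, and the person reviewing the item just says yes or no instead of typing anything.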
It’s a tough problem with a lot of moving parts! Hence it’s a goal, to be implemented over time and with endless refinements as I progress. I’ll let you know how that goes.
Even more fundamental is the issue of Discovery and Exploration.
There are people who have no idea the Internet Archive exists: not the Wayback Machine, not the digital media, nothing. There are people who only know it for Wayback. And then there are people who know it “pretty well”, knowing we have a whole bunch of audio and video and books and software. You are likely among this last group.
And you still have no idea, no idea, how much stuff is in the Internet Archive and its collections.
I just checked The Thing That Tells Me Stuff and it tells me that in my time at the Archive I have personally uploaded 229,000 individual “items” (some of which are grouped files) for a total of 262 terabytes of data.
I’m throwing a lot in, but I’m hardly the only one throwing a lot in. Some of my co-workers in the “collections” group I work in have shoved in millions of individual items, ranging from documents and journals through to video, audio, and so on. Let’s not even touch the Wayback Machine, which has over 368 billion (with a b) URL captures.
When I send you somewhere, say, deep into a collection of magazines or over to some Apple II documentation or up into a massive audio record… well, forget the surface, we’re not even scratching the surface of the surface.
It is a terrifying, frightening cornucopia. It is a horn of plenty so pitch-dark with content that I am not 100% convinced the problem is solvable, unless the nature of humanity changes overnight, and even then we’re talking a couple of years of hard work.
But there you go. In conjunction with other efforts by other folks at the Archive, the plan is to make strides in discoverability, usefulness and access to the vast and ever-growing stacks of the Internet Archive, which, again, I promise you, are massive.
Every organization that has a public-facing website and then terabytes of goodness down the line has this exact problem, by the way. Every museum and archive with warehouses and storage units extending into the darkness has the problem as well. It’s not a new problem, but it’s one I’m willing to tackle.
Hey, if they weren’t called ratholes, everyone would want to go down them.