ASCII by Jason Scott

Jason Scott's Weblog

Archiving Into the Infinite

Snowed in by a blizzard (except for the part where I happily did donuts in a nearby parking lot), I focused some effort on sorting the office in my home, where most of my artifacts and papers are located. When I moved into this office about a year ago, I ended up moving wholecloth my entire previous room, which this old photograph somewhat illustrates. This means that I went from a densely-packed room of information to a very-slightly-less packed room of information, in a house with 5 times the space. Silly.

I chose to sort things in some very general ways, and bought about a dozen plastic stackable bins from the local Home Depot. They’re basically clear, not as prone to crushing as cardboard, and easy to pick up and move around. Without getting bogged down in whether each piece of paper had a place or was in the right area, I made some general piles: old high school and college papers, “zines” and other such publications I’d picked up over time, professionally-printed magazines and documents, and personal creations or letters or drawings. Working this way, I found I had 6 bins of easy-to-sort information in no time, and put them into the attic. The room seems nicely emptier, with more room to put in better shelves, and maybe even decorate in something other than “the previous owners enclosed their porch”.

The attic already had many boxes of magazines; some time ago I realized I was collecting them without trying, and I cleaned out a comic book store’s stock of cardboard backing and plastic bags and started filing them. The cardboard boxes these magazines were in were, as I said, starting to show how they aren’t good for long-term storage. So I took the remaining empty plastic bins and filled them with these already-stored magazines, finding that one bin held the same as two-and-a-half boxes. Within no time, the attic had a nice little stack of bins in the corner.

I’m bringing up this Martha Stewart-like description of my blizzard-day housecleaning because I think the functionality of the process is similar to how I approach my archives, and for better or worse I think lessons can be learned from it. I encounter in my frequent searches for archives and information a lot of common mistakes and weirdness, most of which can be easily fixed with the same amount of effort it took to create the collections in the first place. What harm can there be in giving out advice, I figure.

Warning: Not Many Professionals Agree With Me

I want to make it clear that how I go about things is a very organic process, one where errors have been made and issues have been encountered along the way that fundamentally changed how I proceeded from then on. Also, I tend to speak with some sort of implied authority, when in fact I have no formal training as an archivist and certainly no sort of degree in the library sciences. To be honest, I’m not entirely fond of the library sciences, but I’m sure that’ll change once I have a few parties with them.

As (I hope) is obvious, I am someone who collects at a rate that would best be described as “furiously”. If I encounter a website that has a large number of files, information or similar product that is of interest to me, one of my small army of basement UNIX boxes starts a massive wget, pulling every image, document, and program off the site. I sometimes put the resultant files on CD-ROMs or onto tape, or even pull the relevant part into one of my sites. What this means is, I tend to stumble upon an awful lot of archives and see an awful lot of styles.

There are several sub-species of archivist on the Internet. Let’s set aside the professionals like the Library of Congress or one of the many collections now showing up at educational institutions. Let’s go for the places where it’s just a person or a small group of people who have a bunch of stuff and want it somewhere.

The two most common mistakes I encounter on these personal archives are perfectionism and possessiveness.

Perfectionism is where I see a site that has a lot of files in the wings but which has not added them because the people involved are concerned about getting it just right, with every single bit presented to you in the coolest, slickest interface and setup imaginable. The information they’ve made such an effort to assemble is undigitized, not presented, and coming soon, ever so soon. They explain on their site how it’s going to be a complete collection of this and that, but what they have to show for it is very tiny. Also, these sites tend to use an awful lot of gewgaws, like Flash menus or heavy graphics, with the intent being that you’re not just browsing for content, you’ve been sucked into some sort of virtual computer warehouse spinning and jumping around you with the impressiveness of a science fiction movie. That is, the experience of trying to get at the content is ideally even more fantastic than the content itself. And, in fact, it often is.

Possessiveness is where somebody expends effort to gather content (I’m not talking about writing or creating graphics, just the collecting of previously extant content) and then, at some point along the line, considers themselves the legal guardian and sole beneficiary of that content. Even though they’ve chosen to create a site that is on the Internet, the greatest linking of worldwide information to have ever existed, in fact they only want a small portion of the benefits of the Internet, the ones where they get lots of fans and where everyone comes to them for the “goods”, and not the other aspects, like the fact that their content will be copied a thousand times over in the first month. To this mode of thinking, the loosing of their website must be tightly controlled, the dissemination of their archive is a crime, the loss of control of the content from them must be halted at all costs. The first cost tends to be usability, where the website owner throws in strange scripting to prevent the saving of images, or cracks down on someone copying the whole site. The second tends to be corruption of the content, where the images are given horrible watermarks indicating where they came from, or where the documents are modified to provide ads for the site that compiled them. Note, please, that I am again talking about amateur and private archivists as opposed to professional sites that charge money to access their content; those folks are on a completely different vector of existence from what I do. Naturally, their way of going about things would be patently different as well.

Both of these mistakes are poison to the eventual quality of your site, and a sign that you came into the party with the entirely wrong mindset.

Perfectionism dismisses the ability of your intended audience to impose their own sets of filters or understanding on what you’ve collected. In your quest for the best quality digitization, the most complete meta-information, the ultimate graphical presentation and the adherence to every possible standard that exists in the web, you’ve sacrificed your site’s ability to be alive. You’ve also clamped onto the content an awful lot of unreproducible weight that won’t survive the next iteration of your site. This time there’s a little animated fish that tells people if the .zip file has text in it. What about next time, when you want everything to match up with a MySQL database of file attributes? Are you holding back your collection because you’re afraid of what people might think? How many hard drives contain a copy of your in-progress website? One? What if that hard drive dies? Do you have any backups? Did you keep original copies of the archives you’re ballasting with scripts and graphics?

In terms of possessiveness, the end result of your efforts will be that people depend utterly and entirely on your site for the content, and will each have small pieces of it. This means that if your site goes away, the content goes away, or at least your assembly of it. Your site going away doesn’t necessarily mean you take it down either; a hard drive crash or a loss of funds to pay for hosting will do just as well. Basically, in your quest to be the “owner” of the information, you’ve made yourself the “sole parent and guardian” as well, and your centralized site is fortified against what you see as interlopers and marauders. You know, your users.

The First Step: Coming to Terms With Your Problem

You might not recognize yourself in these paragraphs, because I’m being somewhat drastic in the descriptions. But ask yourself these questions.

Am I afraid someone is going to ‘take’ my site?
Do raw directory listings make me look ‘bad’ to my users?
Do I hate it when people just right-click and put the images on their site?
Am I using a database (MySQL, etc.) or scripting (PHP, SSI) to provide static content?
Is a lot of my content offline or inaccessible because it’s not ‘ready’ yet?

Framed this way, maybe you see what I’m getting at; you’ve got fears that are getting between you and putting the content out for people to find it. These fears will be reflected in a site that is paranoid and hard to use because you’re thinking of yourself as a business or a top-notch service and forcing your work into that template, to the detriment of the source material.

The Second Step: Make Your Content Available

I’m not one to point out problems without suggesting some ways to go about fixing them, so let me give some pointers. If the previous paragraphs haven’t turned you away from this entry, then you’ll find them just dandy.

The first key is to not let yourself get pulled into silly abstract concerns about the long-term arrangements. Taken to their logical ends, these concerns about the permanence of magnetism and the reliability of various storage media lead somewhere depressing indeed, and they will paralyze you. Ignore them; focus on, for example, the next five years. In that case, you want some basic backups, some clear copies of your content, and maybe a friend who can keep a copy as well.

One of the greatest things to watch is how quickly the new technologies can swallow the old. I used to spend hours browsing collections of files, peering through a paint program at every single image, trying to classify them, consider them. Now, I can pull up a thumbnail gallery in seconds and drag and drop images into folders with ease. The situation is even more intense with hard drives; I take the full contents of my old “it kicks ass” 100 megabyte hard drive and just drag it onto one of my USB drives, then run a program that finds all the doubled files and removes them. Then I blow the whole thing to a CD-ROM in the half-finished state as a backup and work on sorting the data when I have time. Seconds where minutes and hours once lurked…. The point is, don’t fret about having all your stuff in absolute pristine condition before it’s presented, because as long as it’s out there, people will write programs that do the sorting and evaluating faster and faster with each generation of machines, and the most important part, the part where you had these files and got them somewhere safe, is at hand and easy enough to do.
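
For the curious, a "find the doubled files" program of the kind described above can be sketched in a few lines of Python. This is only an illustration, not the actual program used: the function names (file_digest, find_duplicates, remove_duplicates) are made up for the example, and it assumes that "doubled" means byte-for-byte identical content, which it detects by comparing SHA-256 digests.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (sorted lists of paths) of files under root with identical content."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def remove_duplicates(root, dry_run=True):
    """Keep the first path of each group; delete the rest unless dry_run is set.

    Returns the list of paths that were (or would be) removed.
    """
    removed = []
    for group in find_duplicates(root):
        for extra in group[1:]:
            if not dry_run:
                os.remove(extra)
            removed.append(extra)
    return removed
```

By default the sketch only reports what it would delete (dry_run=True), which fits the advice here: burn the half-sorted backup first, then let the program loose on the working copy.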

The next thing you must come to terms with is the concept that the information owns you more than you own it. This is likely where our two philosophical paths diverge and you will go on your merry way; it’s been nice knowing you! For my own part, I simply can’t look at these thousands of textfiles and artpacks and demo programs and music and act like they’re “mine”. As for my descriptions and other meta-data I’ve added to them, those are “mine” in a way but they would be useless without the core data, and the core data isn’t “mine”. So while I appreciate it when people think of me as the “textfiles guy”, I’m just the latest in the line of people who’ve had possession of them, and the one that got them into your hands in many cases, but I would be nowhere if other folks hadn’t created textfile CD-ROMs or put up lists of them on their BBSes and AE lines. To act like my luck in being around when the Internet came forward is some sort of skill that I deserve due compensation for (and more importantly, ownership of the files) is delusional.

This is a very, very difficult thing to get across, just like any core philosophy. It’s why some people love and some people hate the GNU General Public License. It’s why some people give money to the local library and other people threaten them. It’s an inherent belief system I adhere to: not so much that information wants to be free, but that information that was free is still free. I call the syndrome of turning previously-free content unfree “tollkeeperism”. Just because you’re first, you think you can build a little toll booth and charge everyone a nickel to go by after you, or tell them how they can use the road you’re charging them to use. It’s profitable, it’s good business, but you’re not in business. You’re just a person with information to share. It’s a high calling, but it’s not a knighthood and it’s certainly not an excuse to lock everything away so you can approve every bit that goes by. If it’s actually stuff that you personally created, then huzzah! Yes, stick a price tag on it and go to the market. But it so often isn’t. You didn’t write it, you didn’t create it, it’s just your collection. Why act like others shouldn’t collect it, too?

An example of how technology came along to solve inherent problems of distribution is BitTorrent. It has its detractors, but my own take is that it provides a way to download massive files (for example, an archive of an entire website) in a clean, efficient manner. And best of all, the program simply insists that the copy it gets be an exact duplicate of the original (believe it or not, some other technologies let you get working sub-pieces, diminishing the integrity of the original file). You can then pass around this massive file as just another thing to trade. This is why textfiles.com now has a torrent site. It’s also why I don’t have much belief or patience in people who say it would be too difficult to put their entire site up for download.
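
The "exact duplicate" property comes from the way the original BitTorrent protocol cuts a file into fixed-size pieces and publishes a SHA-1 hash for each one; a downloader rejects any piece whose hash does not match. The Python below is a simplified sketch of that idea only, not a working client: it does not read .torrent files or talk to peers, and the names piece_hashes and bad_pieces, as well as the 256 KiB piece size, are assumptions made for the example.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size for this sketch


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def bad_pieces(data, expected, piece_length=PIECE_LENGTH):
    """Return the indexes of pieces whose hash differs from the published list."""
    actual = piece_hashes(data, piece_length)
    if len(actual) != len(expected):
        raise ValueError("piece count mismatch: this is not the same file")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    original = bytes(range(256)) * 4            # stand-in for a site archive
    published = piece_hashes(original, 256)     # what the publisher would list
    damaged = bytearray(original)
    damaged[300] ^= 0xFF                        # flip one byte in the second piece
    print(bad_pieces(original, published, 256))
    print(bad_pieces(bytes(damaged), published, 256))
```

A single flipped byte is enough to fail its piece, so a copy either matches the original everywhere or is visibly incomplete.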

It’s an excellent test: are you comfortable enough with what you’re putting up that you could just give it away in one big chunk to whoever comes by? If not, why not? What’s the worst they could do; put up an exact duplicate of your site? Maybe even charge money for it?

So what?

People do that with textfiles.com all the time. Sometimes they even take the color scheme. This, I hope, indicates that on a very personal, very singular level, I experience this supposed downside of my open philosophy about the content of my sites. Folks take copies, put them up, and start serving them, far out of my control, using my descriptions, to whoever they can find, taking people away from my site. I can’t stress this enough: I love that. When I first started looking for BBS textfiles in 1998, I was basically stumped; I could find small smatterings of files I’d remembered, but I couldn’t find something like textfiles.com out there. Now people go searching and they stumble onto my site through Google or the like and they pull an immediate copy. Or they find the other sites and pull their copy. Files that were once on the verge of being either lost or very hard to find are now living and breathing on thousands of machines across the country (many people have used my torrent site). That’s fantastic.

Sometimes, I do searches for “textfiles” on peer-to-peer networks, and to my surprise (or maybe lack of it) I find copies of the entire archive of textfiles.com available for download. There, sandwiched between digitized albums and months of amateur porn, is a good sizeable chunk of every major BBS-borne textfile from the 1980’s and 1990’s. Just another ware. Just another single click of the mouse and it comes onto your computer, which has more than enough space to hold it. Maybe I’m still too wide-eyed for my own good, but this is a miracle.

One day there will be no textfiles.com. It’s not an announcement that I’m shutting down, and it’s not some weird prophecy. It is just the way of the world that things come and things go. What I hope, to be sure, is that when the site has long gone away, the archive of text is out there for people to find. I hope the information outlives me. I hope the works live a very long time. That’s the most important part of the site: what it provides for after it is no longer up. Otherwise, I’ve spent an awful lot of time on a sandcastle.

Every week, I go to sites that were once “the largest” and “the best” and find an empty site, or an unpaid domain name, or a greatly reduced archive because the owner’s bandwidth costs grew too high. In some cases, they remove content because there have been too many downloads, an irony I find breathtaking. This loss of content is happening now, while you read this, all over. And it’s so avoidable, that’s what hurts the most. textfiles.com is mirrored (many states and countries with different laws, plus unannounced mirrors). textfiles.com is simple to use (every directory has a .descs file, so you can pull the information and not be married to my HTML). textfiles.com lets you download all of it (via torrent, soon via archives). textfiles.com thinks you’re the best part of the site: the person who cares enough about history to want a copy for themselves. I hope my site makes you feel at home.
