ASCII by Jason Scott

Jason Scott's Weblog

The Geocities Torrent: Patched and Posted

Obviously other things are starting to come to the forefront with regards to saving digital history, but starting projects and not seeing them to fruition/completion is no way to go about life, even if it’s a business plan for some people.

So, the Geocities Torrent, a 900gb monster compressed to 643gb and spread via the usual channels for such things, turned out to have a slight flaw – it was fucking huge. And for UNIX filesystems only – run it on Windows and you are a sad little torrenting panda. Yet another flaw was that I only occasionally create new torrents, and I almost never create torrents of any real size, and a whole lot of buffoonery occurred, meaning we are now at a point where a handful of folks are sitting at 99.95% downloaded and have been sitting there for a while.

This is all my fault. There was a lot of data, with a lot of time involved, and I should have immediately made a copy of the resulting archives and shipped drives to various locations. That’s what I’d do now. I’d also have made the filenames non-case-sensitive. So yeah, bad on me.
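For anyone curious what the case-sensitivity problem looks like in practice, here is a minimal sketch (not anything used on the actual torrent) that looks for names in the same directory that differ only by letter case. The function names are made up for illustration, and `str.casefold()` is only an approximation of how Windows or other case-insensitive filesystems compare names.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Group names that become identical once letter case is ignored.

    Returns a list of groups (each a sorted list) with more than one member.
    casefold() is an approximation of case-insensitive filesystem rules.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def find_case_collisions(root):
    """Walk a directory tree and report colliding entries as full paths."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Files and subdirectories share one namespace inside a directory.
        for group in case_collisions(dirnames + filenames):
            found.append([os.path.join(dirpath, name) for name in group])
    return found


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory.
    for group in find_case_collisions("."):
        print("would collide on a case-insensitive filesystem:", group)
```

Any group this reports is a set of files that cannot all coexist once unpacked on a case-insensitive filesystem.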

What I will stand with pride on, however, is the whole idea of the torrent in the first place. Released one year after the closure of Geocities, the actual idea of Geocities had already faded to nothing. With the announcement of the torrent, the lights came back on and a lot of press was generated about the loss of Geocities and the cavalier way Yahoo! had treated all this user data. The debate over data retention and web history was rekindled in a big way.

But, until now, it was pretty nigh impossible to get 100% of the torrent.

So, one of the most inspiring uses of the available-but-slightly-damaged material was One Terabyte of Kilobyte Age, an actual archeological study of the material in the Geocities morass we’d dumped on the world. Run by two professor-artists, Olia and Dragan, the weblog would push out greater and greater forays into the stories behind the works. In some cases, they found the actual people who had made the animated GIFs or who had created the pages, and talked to them. In some cases, they were too late, and the interviews became obituaries. It is VERY worth your time to check them out.

Like others, the happy pair started to become more and more cranky as the torrent failed to complete, as did people who wandered into the Archiveteam IRC channel. Let’s just say I was a tad undiplomatic. But, over time, we traced the problem to a handful of files that had gotten corrupted on the original sets of archives. I must point out: the original data downloaded was never corrupted, just the archives generated from them that were then used to generate the torrent. And, unfortunately, it is not entirely easy to generate drop-in replacements for files in the torrent. The solution became one of generating brand new archives for the corrupted files, and making a “patch” of sorts. And that’s what happened, although it took a good long time because THIS IS A LOT OF GEOCITIES.
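The post does not say how the corrupted archives were actually tracked down, so treat the following as a generic sketch of one common approach, not a description of what was done here: compare each archive against a known-good SHA-256 digest and list the ones that are missing or do not match. The `expected` mapping and both function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged(expected):
    """Return the sorted paths whose contents do not match their digest.

    `expected` maps a file path to the hex SHA-256 digest it should have.
    A file that is missing or unreadable counts as damaged.
    """
    damaged = []
    for path, wanted in expected.items():
        try:
            matches = sha256_of(path) == wanted.lower()
        except OSError:
            matches = False
        if not matches:
            damaged.append(path)
    return sorted(damaged)
```

Only the files this reports would need to be regenerated and shipped as a patch. Everything else can stay exactly as it was.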

As I am now retired from the over-5gb torrent business, Dragan and Olia went ahead, tested the archive, verified it all worked, and have made a new torrent.

HERE IS THE NEW TORRENT. YOU WANT THIS ONE.

If you have been downloading the original torrent, do not despair. The files WILL match up – just go from one to the other, and what you’ve already finished will be picked up by this new torrent.

Again, the resulting files are not for everyone – they’re definitely not what you want if you just wanted to browse some old Geocities sites for nostalgia – the live sites Reocities and Geocities.WS do that job for you nicely. This is a collection for historians, for researchers, for developers. For those who want to study the heritage of something so soon gone and yet so much a part of how we got here.

Enjoy.


Categorised as: computer history



2 Comments

  1. deepgeek says:

    Still, thanks for all of this. I am interested in a now obscure form of Science Fiction called “cyberpunk,” and am working on a 1/5 TB greppable thing that should yield me some fan fiction from long ago.

    I might do a podcast on what I did with the torrent, and what it is like wrestling with it. Would you consider republishing the script here if I do?

    yours,

    DeepGeek

  2. Scuzz says:

    It’s become a running joke among my friends that I will never finish this torrent. I think that I’m probably going to have issues because of the “file length too long” thing, but we’ll see.

    Thanks for the update, I’ll continue sitting here and seeding on my ~15Mbit/s up connection for at least a few months after I get the torrent (assuming that I eventually get it :P). I don’t know how long I’ll be able to keep it up with inbound caps on internet connections in the US.

    Thanks for making the original torrent! Looking forward to mucking through this once I have it.

    Also, quick question for Jason about “A Little Less Conversation, a Little More CDs”: it’s easy for me to mirror that if you have a good format by which to do it. Should I just crawl the website? Do you no longer need mirror-ers?