It has been an interesting few days.
Compressing the archive of Geocities content from the Archive Team’s collection took my machine roughly 10 days. The resultant collection of .7z files is 642 gigabytes, expanding out to 909 gigabytes. Then I began creating the actual .torrent file, which is merely a collection of pointers to the files that trackers and clients use. This took 13 hours, and I had to do it twice: it turns out the default “piece size” is 256k, which pushed the torrent past two million “pieces”, and a LOT of clients do not like getting two million entries in anything. Rejiggering to 16mb “pieces” did the trick. But it still took another 13 hours.
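For the curious, the piece-count arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not the tool I used to build the torrent: it assumes "642 gigabytes" means 642 × 10^9 bytes and that "256k" and "16mb" are binary units (256 KiB and 16 MiB), and the function names are made up for illustration. The 20 bytes per piece is the size of the SHA-1 hash that a standard (v1) .torrent file stores for each piece.

```python
import math

KIB = 1024
MIB = 1024 * KIB

# Assumption: "642 gigabytes" is read as decimal gigabytes.
ARCHIVE_BYTES = 642 * 10**9


def piece_count(total_bytes: int, piece_size: int) -> int:
    """Number of pieces needed to cover total_bytes (the last piece may be short)."""
    return math.ceil(total_bytes / piece_size)


def hash_table_bytes(total_bytes: int, piece_size: int) -> int:
    """Bytes of SHA-1 piece hashes in a v1 .torrent: 20 bytes per piece."""
    return 20 * piece_count(total_bytes, piece_size)


if __name__ == "__main__":
    for label, size in (("256 KiB", 256 * KIB), ("16 MiB", 16 * MIB)):
        pieces = piece_count(ARCHIVE_BYTES, size)
        hashes = hash_table_bytes(ARCHIVE_BYTES, size)
        print(f"{label:>8}: {pieces:>9,} pieces, ~{hashes / 10**6:.1f} MB of hashes")
```

Under those assumptions the small piece size lands well above two million pieces, while the 16 MiB setting brings it down to tens of thousands, which is a list most clients will swallow without complaint.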
A few of us in the Archive Team IRC channel did some testing, and we’re off on a roll. The swarm has been in the hundreds range since.
Over the past week I’ve been sending out e-mails about the torrent to the more than 800 people who asked to be notified. This slow rollout isn’t because I think the torrent can’t handle it; it’s just that Gmail doesn’t make it easy to run little scripts against collections of mail to extract a mailing list. So there’s a little copy-paste action going on and I am not going to do that full time. A few hundred of those folks have been notified and I’ll probably be done with the full list shortly.
And then came the press.
So, I’m going to punch the press in the whizzer for a paragraph or two.
The whole point of this exercise was to draw attention to the issues and cause that Archive Team is involved in: preserving digital heritage and lambasting entities/companies that treat user-generated content like so much trash. I think the issue transcends anything I’m mucking around with and represents a real and vital concern as more of life moves online. By boiling things into “Geocities as a Torrent”, attention was sought, attention was got. But along the way, I’ve gotten another taste of contemporary news-gathering, and the stratification of quality is getting ridiculous.
On the one hand, I’ve got reporters like Ken Gagne of Computer World and Lauren Schenkman of Science Magazine who have contacted me, spoken to me on the phone, and then gone off and gotten related individuals on the phone or e-mail to discuss the issues. They’re doing this with pretty fast turnaround. And I guarantee I’m probably a tad spoiled by reporters like Stacy Schiff, who spoke to me for hours to get background on her excellent Wikipedia article, or Kim Zetter, who shows that you can write an informative article without being fawning.
And then come the slightly-slapdash ones, who write articles using my one weblog post as their source, but then go off to find some additional illustration. Not really great, but then again these are newish organizations not really interested in a whole lot of standards when it comes to telling the stories. Pleasant surprise should occur when they get things right. (For example, a lot of places wrote that the torrent is 900gb and will expand out to terabytes, something nowhere in anything I’ve written.)
One that made me go off the rails was this article in PC Magazine, written by Sara Yin, which named an employer I had quit 10 years ago, spelled that employer’s name wrong, and ignored the original weblog post about all this. The author never contacted me once. So I made a little noise about it, got a few buttercups up in arms that I’d be so mean, and ultimately got a few additional insights into perceptions of my personality.
Oh, sure, PC Magazine made a correction, but not before the article got syndicated to hell, with the wrong information baked in. And the corrections do not follow. It was especially galling as PC Magazine was an entity that I was reading like a bible in my teens, even submitting software for their new PC Disk Magazine subsidiary because I thought it was such a point of pride to be in its pages. Well, obviously not anymore: now they have crap farmers using the first three Google links to write inaccurate stories and still calling themselves “reporters” in a land with people like Schiff and Schenkman. For shame.
There have been some amusing podcasts mentioning the situation, for example Infosec Daily has the story at the end and Dan Misener did a recorded interview with me that was so much fun and got the message across so clearly that it’s actually included in the torrent. Even This Week In Tech mentioned the event, comparing it to zombies and yelling “BRAAIIIIINS” and hey, whatever works for you.
Right now, there’s only one seed machine, but I am duping the archive over to a portable drive, and a number of individuals and organizations are mailing me hard drives to get copies to seed as well. So for anyone worried that the top peers are “merely” at 8 percent or some lower number: that torrent is about to speed up dramatically.
I’m glad the word got out about this. Even if people choose not to download the data (and come on, this is a hell of a lot of data), they remembered Geocities one last time, and remembered what Yahoo did. Maybe that’ll change something down the road.
So there we go. One last thing – another Geocities archiving project, Reocities, was done by Jacques Mattheij, who is such an awesome dude and so perfect as a counterpart to what Archive Team is doing, I hereby call out some tech conference to bring us both in for a panel. We will fucking kill the room, I guarantee it! Kick out some lame “how to distribute your blah” speech and give us 90 minutes. Trust me. Get on that.
Oh, and PS: I put all of my Geocities archive on this:
Was it really that hard to keep around, Yahoo?