A Torrent of Attention

It has been an interesting few days.

Creating the archive of Geocities content from the Archive Team’s collection took my machine roughly 10 days to compress. The resultant collection of .7z files is 642 gigabytes, expanding out to 909 gigabytes. Then I began creating the actual .torrent file, which is merely a collection of pointers to the files that trackers and clients use. This took 13 hours, and I had to do it twice: it turns out the default “piece size” is 256k, which sent the machine up into the 2 million plus “pieces”, and a LOT of clients do not like getting two million entries in anything. Rejiggering to 16MB “pieces” did the trick. But it still took another 13 hours.
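
For the curious, the piece-count arithmetic is easy to sketch. This is a rough illustration, not what the torrent software actually runs: it assumes "642 gigabytes" means 642 * 10**9 bytes, that "256k" and "16mb" mean 256 KiB and 16 MiB, and that each piece costs one 20-byte SHA-1 hash in the .torrent file, as in the original BitTorrent format. The function name is made up for the example.

```python
def piece_count(total_bytes: int, piece_size: int) -> int:
    """Pieces needed to cover total_bytes, counting a final partial piece."""
    return -(-total_bytes // piece_size)  # integer ceiling division


HASH_BYTES = 20  # one SHA-1 digest per piece (original BitTorrent format)

# Assumption: "642 gigabytes" is read as 642 * 10**9 bytes.
ARCHIVE_BYTES = 642 * 10**9

for label, size in (("256 KiB", 256 * 1024), ("16 MiB", 16 * 1024**2)):
    pieces = piece_count(ARCHIVE_BYTES, size)
    print(f"{label}: {pieces:,} pieces, {pieces * HASH_BYTES:,} bytes of piece hashes")
```

Under those assumptions the 256 KiB setting lands around 2.4 million pieces, and the 16 MiB setting around 38 thousand, which is a list most clients can swallow.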

A few of us in the Archive Team IRC channel did some testing, and we’re off on a roll. The swarm has been in the hundreds range since.

I’ve been sending out e-mails about the torrent existing over the past week to the over 800 people who requested to be notified. This slow rollout isn’t because I think the torrent can’t handle it – it’s just that Gmail does not make it easy to run little scripts against collections of mail to extract a mailing list. So there’s a little copy-paste action going on and I am not going to do that full time. A few hundred of those folks have gotten notified and I’ll probably be done with the full list shortly.

And then came the press.

So, I’m going to punch the press in the whizzer for a paragraph or two.

The whole point of this exercise was to draw attention to the issues and cause that Archive Team is involved in: preserving digital heritage and lambasting entities/companies that treat user-generated content like so much trash. I think the issue transcends anything I’m mucking around with and represents a real and vital issue as more of life moves online. By boiling things into “Geocities as a Torrent”, attention was sought, attention was got. But along the way, I’ve gotten another taste of contemporary news-gathering, and the stratification of quality is getting ridiculous.

On the one hand, I’ve got reporters like Ken Gagne of Computerworld and Lauren Schenkman of Science Magazine who have contacted me, spoken to me on the phone, and then gone off and gotten related individuals on the phone or e-mail to discuss the issues. They’re doing this with pretty fast turnaround. And I guarantee I’m probably a tad spoiled by reporters like Stacy Schiff, who spoke to me for hours to get background on her excellent Wikipedia article, or Kim Zetter, who shows that you can write an informative article without being fawning.

And then come the slightly-slapdash ones, who write articles using my one weblog post as their source, but then go off to find some additional illustration. Not really great, but then again these are newish organizations not really interested in a whole lot of standards when it comes to telling the stories. Pleasant surprise should occur when they get things right. (For example, a lot of places wrote that the torrent is 900gb and will expand out to terabytes, something nowhere in anything I’ve written.)

One that made me go off the rails was this article in PC Magazine, written by Sara Yin, which named an employer I had quit 10 years ago, spelled the name of that employer wrong, ignored the original weblog post about this, and never contacted me once. So I made a little noise about it, got a few buttercups up in arms that I’d be so mean, and ultimately got a few additional insights into perceptions of my personality.

Oh, sure, PC Magazine made a correction, but not before it got syndicated to hell, with the wrong information baked in. And the corrections do not follow. It was especially galling as PC Magazine was an entity that I was reading like a bible in my teens, even submitting software for their new PC Disk Magazine subsidiary because I thought it was such a point of pride to be in its pages. Well, obviously not anymore – now they have crap farmers using the first three Google links to write inaccurate stories and still calling themselves “reporters” in a land with people like Schiff and Schenkman. For shame.

Anyway.

There have been some amusing podcasts mentioning the situation, for example Infosec Daily has the story at the end and Dan Misener did a recorded interview with me that was so much fun and got the message across so clearly that it’s actually included in the torrent. Even This Week In Tech mentioned the event, comparing it to zombies and yelling “BRAAIIIIINS” and hey, whatever works for you.

Right now, there’s only one seed machine, but I am duping the archive over to a portable drive, and a number of individuals and organizations are mailing me hard drives to get copies to seed as well. So for anyone going on about seeing that the top seeds are “merely” at 8 percent or some lower number, that torrent is about to speed up dramatically.

I’m glad the word got out about this. Even if people choose not to download the data (and come on, this is a hell of a lot of data), they remembered Geocities one last time, and remembered what Yahoo did. Maybe that’ll change something down the road.

So there we go. One last thing – another Geocities archiving project, Reocities, was done by Jacques Mattheij, who is such an awesome dude and so perfect as a counterpart to what Archive Team is doing, I hereby call out some tech conference to bring us both in for a panel. We will fucking kill the room, I guarantee it! Kick out some lame “how to distribute your blah” speech and give us 90 minutes. Trust me. Get on that.

Oh, and PS: I put all of my Geocities archive on this:

Was it really that hard to keep around, Yahoo?

Goodbye to VK7AX BBS

I have a number of scans running in RSS feeds to let me know about stuff related to my movies, projects, and people chit-chatting about me. I also have a scan watching for the words “BBS” and “Bulletin” in them, which more often than not scoops up references to “Bulletin Board Systems”. It does a pretty good job and the false positives aren’t that annoying. What it means, however, is I get to hear a lot of reminiscing and correlation of the BBS era to events and items of the present day.
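
A keyword scan like that can be sketched in a few lines. This is a guess at the general shape, not my actual setup: the feed items are assumed to be already parsed into dicts with "title" and "summary" keys, the sample items are made up, and the optional plural in the pattern is an added assumption.

```python
import re

# Assumption: watch for the two words, allowing a plural ("BBSes", "Bulletins").
WATCH = re.compile(r"\b(?:BBS|Bulletin)(?:e?s)?\b", re.IGNORECASE)


def scan(items):
    """Return the feed items whose title or summary mentions a watched word."""
    return [
        item for item in items
        if WATCH.search(f"{item.get('title', '')} {item.get('summary', '')}")
    ]


# Hypothetical feed items, already parsed into dicts.
sample = [
    {"title": "Closure of VK7AX packet BBS network", "summary": ""},
    {"title": "Remembering the BBSes of the 1980s", "summary": ""},
    {"title": "School board posts weekly bulletin", "summary": ""},  # false positive
    {"title": "New bike lanes downtown", "summary": ""},
]

for hit in scan(sample):
    print(hit["title"])
```

The school board item shows where the false positives come from: the word matches, the subject doesn't.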

Or I get announcements like this:

Closure of VK7AX packet BBS network

After approximately 35 years of operation, Tony VK7AX has decided to close the Packet Radio BBS known by the SSID of VK7AX-6.

This decision was not taken lightly, however the time has eventually come, according to Tony, to close the BBS.

Due to the decline of RF users in recent times to ZERO and increasing running costs, he has reluctantly come to the conclusion that it is not worth continuing to maintain a system purely to act as a Bulletin Forwarding machine only.

He takes the opportunity to thank the many sysops and friends (world wide) that have helped or contributed in one way or another over many years. Without the true Amateur Spirit and encouragement shown to him by many, he feels that he would not have continued maintaining and operating the BBS for as long as he has.

The BBS will be turned off on Saturday 30 October 2010.

This follows the closure of the Tnos packet Gateway machine VK7AX-8 which was decommissioned approximately 3 months ago for the same reasons.

Once again many thanks to everyone for their support and valued friendships.

Finally, for those still interested in packet, please support the only remaining BBS in VK7 – VK7HDM
( email: ddm@lnx-vk7hdm.dnsalias.org )

73’s
Tony VK7AX
(Past SYSOP VK7AX-6 and VK7AX-8)

The language used might require some explanation, even from followers of BBS history. Please excuse me if I utilize links instead of my own quaint linguistic approach; a lot of people have done really good work on explaining it.

But first, some pretty pictures. Let’s start with Tony VK7AX:

Tony VK7AX is in fact Tony Bedelph. He lives in Tasmania, and he has been doing things with amateur radio and other signal transmissions for many, many years – longer than I’ve been alive, and I’m old. If you browse his website for the amateur radio/television station he works on, you grab a hold of a lot of years of work: Slow-Scan images, links to syndicated shows, and information about the various projects he’s been involved in.

You cannot do better for an introduction to everything about what Packet Radio and Packet Radio BBSes are than this excellent tutorial from Larry Kenney, WB9LOZ.

It is not clear to me how to reach Tony – I would love to archive the BBS he has shut down as of Saturday, to capture this long-running warhorse for all of time. Here’s hoping I figure it out.

The BBSes that are going down now will often be like this – fantastically old, filled with years of history, disappearing quietly.

I try not to get too sad about this.

Historian!!

[Photos: GDCarchiving 006, 003, 008, 005]

The stack of huge boxes on my door containing this material heralded an important milestone in my life: my first assignment as a professional historian/archivist.

From the official announcement:

To celebrate ‘GDC 25’, the conference organizers have appointed an official historian for the show in the form of noted technology archivist Jason Scott, known for his Textfiles.com digital archive and his history of preserving important digital artifacts.

Scott, who has created the BBS Documentary and the just-debuted interactive fiction documentary Get Lamp, will be in charge of receiving and synthesizing historical accounts, anecdotes and other media from GDC attendees, and digitizing extensive printed, audio and video archives.

He will be posting a twice-weekly blog post on the official GDC website news page [RSS feed] and sister outlets starting in early November, revealing exclusive videos and audio lectures, stored on the UBM TechWeb Game Network’s GDC Vault website, alongside other images and analysis from the history of the event.

Alongside this announcement, GDC organizers – part of the Game Network, as is this website – and Scott are calling for submissions from people who’ve attended the show over the past 25 versions. They’ve set up an official email address, gdc25@gdconf.com, which CGDC and GDC attendees can email with anecdotes and reminiscences of attending previous GDC shows.

In addition, if previous attendees have content from classic shows they’d like to share, please tag photos as ‘gdc25’ on Flickr or upload videos to Vimeo or YouTube, and email the official ‘gdc25’ mail address.

As it says, I am going to be posting updates of the incoming pile on their official weblog for the GDC (Game Developers’ Conference, in case you’d not picked that up) and not here. However, I am allowed to keep copies of what I scan and digitize and even host stuff that I think is relevant.

It’s a paid position, but I’m not being feedy-hand-bitey when I say it’s not enough to live on – it’s a fee for a part-time contractor to do this work. I would like to get a couple assignments or jobs that could help me bring in income, because to be honest things are on a very slow downward spiral financially. I know that sounds surprising, but GET LAMP and BBS Documentary are high-quality projects – they cost money to print (and more have to be printed) and I do actually, you know, travel and buy technology to support my work. I just mention this in case anyone wanted to talk to me about possible employment, part-time or other… I’m around, people.  Let’s do lunch.

What’s important about this role is very obviously not the money. It’s that I am truly and honestly doing work in the field I wanted to switch to, computer history. I’m doing it for something very interesting, and I am being given an opportunity to prove myself in this task. It is challenging, it is fun, and it will be a great time telling the stories coming to me. Not to mention all the artifacts that people are getting inspired to tell me about. This is fantastic.

This very simply would not have happened without the Sabbatical. Absolutely not. I never get tired of thanking the people who gave towards that funding, because I never stop benefiting from it. Thank you, people.

Now, let’s archive some GDC!

A Softwear Archive

Sometimes, when you’re an archivist of my stripe, your tools don’t just include computers – sometimes they include something like this:

And really, you better have a truck when you drive down to the local post office with your “fuck if we’re going to deliver this” yellow slip and come up against this waiting pile:

You see, not all the history is downloadable, not all of it comes on hard drives or in a small package. Sometimes, it comes in a very large pack indeed.

So, what’s in this major haul of boxes? Well, it’s going to be pretty easy to explain and perhaps somewhat difficult to hear me out.

A while ago, Randal Schwartz of Perl fame announced that he was leaving his current home of many years and setting off on a new life, and along the way he would be discarding a lot of his old material to lighten himself up. So out would go the trappings, to the dump or friends who were buying things, and then he’d be all set for a different way of living.  Along the way, he took a photo and said “look at all this stuff” in his twitter stream. Another person, Daniel Packer, suggested that I be contacted.  Things being what they are, discussing contacting me in a public space immediately contacts me, so I hopped in and offered to pay postage.

And that is how I ended up with 22 years of computer conference T-shirts.

So, I guess for a certain segment of the population, the news of this is sufficient, but to another, perhaps larger segment, the question is why the fuck do you want someone’s laundry?

Well, let me first say that the vast majority of these shirts, possibly all of them, have never been worn. They were given as prizes or gifts because he’s Randal Schwartz and Perl is Cool and so Randal got them for free.  So get your nose out of the gutter.

But what attracts me to this is that these are an easily collectable slice of computer history and cultural context.  The shirts are printed for all sorts of reasons, and provided with all sets of expectations and goals.  The collection, as it is, gives you a glimpse of the last 20 years of computing that later times might really want. It is going to be relatively trivial to photograph these, list them, package them up, and then have them available for the future.

So here’s some off the cuff shots of these shirts. I don’t have time at this moment to really catalog them all, but maybe you can see where I’m going.

[Photos: 14 shots of the shirts, 2010-10 052 through 2010-10 068]

The hardest part is done – these shirts have been rescued from recycling and disappearing. Now we’ll see what comes of this.

Thanks, Dan and Randal.

Archiveteam! The Geocities Torrent

Well, here we are on October 26th, 2010.

Can it really be a year ago that Archive Team had dozens of people assaulting Yahoo’s servers desperately trying to save disappearing history? Well, let’s be frank — not disappearing history, but in fact history being actively and quickly destroyed on purpose.  I mean, it’s not like Yahoo! had some sort of terrible server failure or something. They in fact had made the active decision to turn off the site called Geocities, an at-that-point 15 year old hosting site that contained terabytes of user-generated content.

Oh, we were having a great time one year ago – rushing around from this server to that, faking the Googlebot user agent string, bringing our full downloading power to bear. At one point we were well past 100 megabits of bandwidth yanking onto all our archives. As October 26th leaked into the 27th, we watched as site after site disappeared. Sites that were, in the vast majority of cases, less than 10 megabytes. Remember the last time 10 megabytes mattered?

Well, apparently it mattered enough to Yahoo! to decide to kill off Geocities across a couple days, after announcing somewhat quietly that all that data was going away. The usual sarcastic-hand-wringing and point-and-laugh ensued from popular press. “Remember Geocities?” and “Good Riddance” were the order of the day. So it came as a surprise to some that Archive Team thought all of this worth saving – by any means necessary.

What we were facing, you see, was the wholesale destruction of the still-rare combination of words digital heritage, the erasing and silencing of hundreds of thousands of voices, voices representing the dawn of what one might call “regular people” joining the World Wide Web. A unique moment in human history, preserved for many years and spontaneously combusting due to a few marks in a ledger, the decision of who-knows for who-knows-what.

Well, actually we do know what – it was to show that Yahoo!, after purchasing Geocities for nearly $3 Billion Dollars With a B, was cutting costs for the 2009 financials.  Faced with a lingering, saddened death, new management sought to save money where it could, and projects unshielded by internal advocates were thrown out with the bathwater. (And the bathtub, and probably a number of unused plumbing supplies filling one of the back offices). The amount saved? Probably very little – the servers ran themselves (it appears there was no actual team assigned to Geocities beyond maintenance for the last year of its life) but by saying that something that was there was no longer there, the illusion of progress could appear.  So an announcement happened, and then over the next few months, the death march continued, until October 26, 2009 fell and with it the sunset of Geocities.

Of course, Yahoo! might have tried spinning off the company, but it doesn’t appear to be the case that Yahoo! knows how.  So death appeared to be the only option, since shutting down Yahoo! properties was “in” that year.

But you see, websites and hosting services should not be “fads” any more than forests and cities should be fads – they represent countless hours of writing, of editing, of thinking, of creating. They represent their time, and they represent the thoughts and dreams of people now much older, or gone completely.  There’s history here. Real, honest, true history.  So Archive Team did what it could, as well as other independent teams around the world, and some amount of Geocities was saved.

How much? We’ll never know. One of the Archive Team members called Yahoo! to find out the size and was rebuffed. When we called later in the year to ask exactly when the site was going down on October 26th, we were told that the person who spoke to us last had been let go. It must be like spring break down at that place.

But we know we got a bunch of Geocities sites – a significant percentage, especially of earlier, pre-acquisition data. We archived it as best we could, we compared notes, we merged and double-checked and did whatever needed to be done with what we happened to have.

So now, on this one-year anniversary, Archive Team announces that we are going to torrent it.

YES THAT IS RIGHT, WE ARE RELEASING GEOCITIES ON A TORRENT.

This is going to be one hell of a torrent – the compression is happening as we speak, and it’s making a machine or two very unhappy for weeks on end. The hope had been to upload it today, but the reality is this is a lot of stuff – probably 900 gigabytes will be in the torrent itself. It’s not perfect, it’s not all – but it’s something.

Who will want this? Anyone who feels like browsing among the artifacts of yesterday, who wants some data to play with, who is doing research into history, who wants to get some mileage out of a few weblog postings of crazy glittery animated GIFs and MIDI music. It’s not for everyone. Some people will probably grab a few files out of the thousands of archives in the torrent, unhook and call it a day. Others will want all of it, every last bit, to put onto their $80 1TB hard drive they bought down at the local computer mart.

UPDATE: The compressed archive is 652 gigabytes, and you can stop down at that famous computer history site The Pirate Bay and get the torrent.

While it’s quite clear this sort of cavalier attitude to digital history will continue, the hope is that this torrent will bring some attention to both the worth of these archives and the ease with which they can be lost – and found again.

Clear your disk space – this one’s going to be a doozy.

FURTHER UPDATE: There’s an update on the status of the torrent on this entry.