ASCII by Jason Scott

Jason Scott's Weblog

Archiveteam! The Geocities Torrent

Well, here we are on October 26th, 2010.

Can it really be a year ago that Archive Team had dozens of people assaulting Yahoo’s servers desperately trying to save disappearing history? Well, let’s be frank – not disappearing history, but in fact history being actively and quickly destroyed on purpose. I mean, it’s not like Yahoo! had some sort of terrible server failure or something. They had in fact made the active decision to turn off the site called Geocities, an at-that-point 15-year-old hosting site that contained terabytes of user-generated content.

Oh, we were having a great time one year ago – rushing around from this server to that, faking the Googlebot user agent string, bringing our full downloading power to bear. At one point we were well past 100 megabits of bandwidth yanking onto all our archives. As October 26th leaked into the 27th, we watched as site after site disappeared. Sites that were, in the vast majority of cases, less than 10 megabytes. Remember the last time 10 megabytes mattered?

Well, apparently it mattered enough to Yahoo! to decide to kill off Geocities across a couple days, after announcing somewhat quietly that all that data was going away. The usual sarcastic hand-wringing and point-and-laugh ensued from the popular press. “Remember Geocities?” and “Good Riddance” were the order of the day. So it came as a surprise to some that Archive Team thought all of this worth saving – by any means necessary.

What we were facing, you see, was the wholesale destruction of the still-rare combination of words “digital heritage”, the erasing and silencing of hundreds of thousands of voices, voices that represented the dawn of what one might call “regular people” joining the World Wide Web. A unique moment in human history, preserved for many years and spontaneously combusting due to a few marks in a ledger, the decision of who-knows for who-knows-what.

Well, actually we do know what – it was to show that Yahoo!, after purchasing Geocities for nearly $3 Billion Dollars With a B, was cutting costs for the 2009 financials.  Faced with a lingering, saddened death, new management sought to save money where it could, and projects unshielded by internal advocates were thrown out with the bathwater. (And the bathtub, and probably a number of unused plumbing supplies filling one of the back offices). The amount saved? Probably very little – the servers ran themselves (it appears there was no actual team assigned to Geocities beyond maintenance for the last year of its life) but by saying that something that was there was no longer there, the illusion of progress could appear.  So an announcement happened, and then over the next few months, the death march continued, until October 26, 2009 fell and with it the sunset of Geocities.

Of course, Yahoo! might have tried spinning off the company, but it doesn’t appear to be the case that Yahoo! knows how.  So death appeared to be the only option, since shutting down Yahoo! properties was “in” that year.

But you see, websites and hosting services should not be “fads” any more than forests and cities should be fads – they represent countless hours of writing, of editing, of thinking, of creating. They represent their time, and they represent the thoughts and dreams of people now much older, or gone completely.  There’s history here. Real, honest, true history.  So Archive Team did what it could, as well as other independent teams around the world, and some amount of Geocities was saved.

How much? We’ll never know. One of the Archive Team members called Yahoo! to find out the size and was rebuffed. When we called later in the year to ask exactly when the site was going down on October 26th, we were told that the person who spoke to us last had been let go. It must be like spring break down at that place.

But we know we got a bunch of Geocities sites – a significant percentage, especially of earlier, pre-acquisition data. We archived it as best we could, we compared notes, we merged and double-checked and did whatever needed to be done with what we happened to have.

So now, on this one-year anniversary, Archive Team announces that we are going to torrent it.

YES THAT IS RIGHT, WE ARE RELEASING GEOCITIES ON A TORRENT.

This is going to be one hell of a torrent – the compression is happening as we speak, and it’s making a machine or two very unhappy for weeks on end. The hope had been to upload it today, but the reality is this is a lot of stuff – probably 900 gigabytes will be in the torrent itself. It’s not perfect, it’s not all – but it’s something.

Who will want this? Anyone who feels like browsing among the artifacts of yesterday, who wants some data to play with, who is doing research into history, who wants to get some mileage out of a few weblog postings of crazy glittery animated GIFs and MIDI music. It’s not for everyone. Some people will probably grab a few files out of the thousands of archives in the torrent, unhook and call it a day. Others will want all of it, every last bit, to put onto their $80 1TB hard drive they bought down at the local computer mart.

UPDATE: The compressed archive is 652 gigabytes, and you can stop down at that famous computer history site The Pirate Bay and get the torrent.

While it’s quite clear this sort of cavalier attitude to digital history will continue, the hope is that this torrent will bring some attention to both the worth of these archives and the ease with which they can be lost – and found again.

Clear your disk space – this one’s going to be a doozy.

FURTHER UPDATE: There’s an update on the status of the torrent on this entry.


Categorised as: computer history



94 Comments

  1. Geocities content gets better with age. Thank you for preserving an important part of web history.

  2. PatentlyAbsurd says:

    Good job guys. I find it hard to believe though that there wasn’t *somebody* at Yahoo who was like, “guys, can’t we spare a few hard-drives to archive this, you know… just in case?”.

    If not, they’re all bleeping assholes.

  3. none says:

    I would happily torrent this and seed it for years. Please get it up asap so we can all get our nostalgia on.

  4. M says:

    Wow. It is so exciting to see your commitment to archiving the earliest days of the www.

  5. Im curious – 900gb decompressed? Im guessing compressed would be substantially smaller?

  6. Chris says:

    Thanks! I’m going to load it up behind a 1TB thinly provisioned LUN, with deduplication turned on to see how big it really is. Then I’m going to resurrect it on facebook by running some perl against it to start creating new facebook pages one for one from the old sites! FACEBOOK WILL LOVE ME, I CANT WAIT!!!

  7. Steve says:

    Thanks so much for doing this Jason! I’ve been checking the news coverage of this, and articles are all very upbeat and positive about this- it’s incredibly different from the coverage last year. Maybe absence does make the heart grow fonder?

  8. Used Servers says:

    Cant wait to get the torrent… find some of my first sites. Will have to speak to my attorney about putting it up from our hosting for people to enjoy.

  9. Real History is fragile. Thank you for going through this effort.

  10. jas says:

    please split it up and allow users to pick and choose what they want in each torrent (within reason)

  11. Scott6 says:

    FUCK YEAH. YOU GUYS ARE AWESOME!

  12. bbstorrents says:

    It is ONE torrent, but you are able to pick and choose what you download/seed
    that will probably help a lot concerning how huge this torrent is.

    right now the torrent is in the red and says no connection could be made because
    the tracker is refusing

    hope it ends up working

    thanks for this,

  13. bbstorrents says:

    update: i enabled DHT and it’s sending at max speed now

  14. Acidus says:

    DHT is working even if the trackers aren’t. However seed propagation is pretty slow. Right now myself and all my peers are stuck at 3.7% availability

  15. dubstep says:

    Im curious – 900gb decompressed? Im guessing compressed would be substantially smaller?

  16. […] now, it appears that a big project to rescue the entire Geocities world has released it as a torrent weighing in at 652 gigabytes. Perhaps the whole thing can be hosted somewhere. I hope so – I never had a Geocities page (I was […]

  17. […] worth reading the blog post by the folks who did this explaining why they did it, noting how little people realized that this was basically erasing digital history and culture: […]

  18. bbstorrents says:

    “Acidus wrote:

    DHT is working even if the trackers aren’t. However seed propagation is pretty slow. Right now myself and all my peers are stuck at 3.7% availability”

    add the tracker url to the torrent

    udp://tracker.openbittorrent.com:80/announce

    don’t know if that will help much though, like you said, seeding isnt so hot.

    people say they will host it some place and seed it forever, but what folks say and what they do are two different things.

  19. Richard Wheston says:

    So I guess this means we know how big the Internet is–900 GB!

  20. Vedetta says:

    I’m downloading at a very decent speed. P2P file sharing is very useful. Thank you for archiving and sharing.

  21. It’s useless if it isn’t on web.archive.org

  22. echao says:

    It’s nice to see folks are interested in keeping a snapshot around.

  23. Quentin says:

    Great work !
    2 questions:
    -Do you know if it’s legal or not to release part of this archive on a public website, and make it browsable like it was on Geocities ?
    -Is there any explanation of the process to grab such a big amount of data ? I guess HTTrack is .. undersized for this kind of project ;)
    Best,
    Quentin

  24. mmmm says:

    Mmmmm, my deleted 1st homepage was in there!!!!!!!!!!!!! Bloody yahoo!!

    Yes I am gonna buy a 1T drive, there where many cool websites at geocities, apart from the ‘romantic’ concept of cities, areas, etc… Damn, I am crying as I write….

  25. Dan Anos says:

    The torrent only loads to 23%, but I managed to salvage 104 sites out of the partial zip files:

    Please see http://networkprogramming.wordpress.com/2010/11/17/saving-geocities/ for the list.

  26. Dragan says:

    Please, archiveteam, continue to seed the torrent. I am stuck at 44.63% for days. And i bought a 2 TB drive (so i also have space to UNPACK all of this)! Please!!!1

  27. bzodd says:

    so that’s it? stucked at 44.6 % for some days now … hope it’ll go on soon.

  28. Paul says:

    It is now up to 53.5, and no longer stuck.

  29. war8200 says:

    The torrent is still stuck, now at 351GB, will it be seeded again so people can complete the download?

  30. kevo101 says:

    I cant download a 700gig torrent my isp
    will throttle me to forever.
    You should sell this loaded on new 1Tb hd’s
    and just add a few dollars to the cost of the hd. I would buy it

  31. Joerg says:

    I got the file de.geocities.com – now it’s completed. How can i use this file? “It’s not a valid WIN32-application. “

  32. Joerg says:

    OK … tried to use 7zip … and it works :)

  33. Dragan says:

    So, i gots a bit more of the torrent, however the availability is only at 53.54% … plz seeeeeeed!

    Make this work, not just conceptual!

  34. Starfury says:

    I promise to seed a full year or more when this download completes… Stuck for weeks at 53% :-(
    Please seed; I’ll help to spread this great piece of work!

  35. Dragan says:

    SEEEEEEEEEED

    Is there anybody who managed to download the complete torrent yet?

  36. Richard says:

    Hi guys!
    I hope someone in possession of the entire file is willing to seed, because it seems like everybody is stuck at 53.5%….

  37. Dragan says:

    SEEEEEEEED … the whole Internet is stuck at 53.5 %

  38. Joel says:

    Please somebody with the full torrent seed! as the previous comments we are all stuck at 53%.

  39. bzodd says:

    Where is the problem? Such a tremendous announcement and the torrent always gets stucked for weeks. This is ridiculous. Get the seeders up and running then it’ll spread.

    Damn it … It’s disappointing.

  40. […] them before they get pulled down forever.  They received significant coverage recently when they archived a 900GB chunk of Geocities before Yahoo! gave it the axe in 2010.  Even more recently, an anonymous geek (or group?) archived […]

  41. elestirmen says:

    is this complete version?

  42. jas says:

    why are some geocities sites still up?
    http://www.geocities.com/~budallen/98reskit.html