ASCII by Jason Scott

Jason Scott's Weblog

Archiveteam! The Geocities Torrent

Well, here we are on October 26th, 2010.

Can it really be a year ago that Archive Team had dozens of people assaulting Yahoo’s servers desperately trying to save disappearing history? Well, let’s be frank — not disappearing history, but in fact history being actively and quickly destroyed on purpose.  I mean, it’s not like Yahoo! had some sort of terrible server failure or something. They in fact had made the active decision to turn off the site called Geocities, an at-that-point 15 year old hosting site that contained terabytes of user-generated content.

Oh, we were having a great time one year ago – rushing around from this server to that, faking the Googlebot user agent string, bringing our full downloading power to bear. At one point we were well past 100 megabits of bandwidth yanking onto all our archives. As October 26th leaked into the 27th, we watched as site after site disappeared. Sites that were, in the vast majority of cases, less than 10 megabytes. Remember the last time 10 megabytes mattered?

Well, apparently it mattered enough to Yahoo! to decide to kill off Geocities across a couple days, after announcing somewhat quietly that all that data was going away. The usual sarcastic-hand-wringing and point-and-laugh ensued from popular press. “Remember Geocities?” and “Good Riddance” were the order of the day. So it came as a surprise to some that Archive Team thought all of this worth saving – by any means necessary.

What we were facing, you see, was the wholesale destruction of the still-rare combination of words “digital heritage”, the erasing and silencing of hundreds of thousands of voices, voices representing the dawn of what one might call “regular people” joining the World Wide Web. A unique moment in human history, preserved for many years and spontaneously combusting due to a few marks in a ledger, the decision of who-knows for who-knows-what.

Well, actually we do know what – it was to show that Yahoo!, after purchasing Geocities for nearly $3 Billion Dollars With a B, was cutting costs for the 2009 financials.  Faced with a lingering, saddened death, new management sought to save money where it could, and projects unshielded by internal advocates were thrown out with the bathwater. (And the bathtub, and probably a number of unused plumbing supplies filling one of the back offices). The amount saved? Probably very little – the servers ran themselves (it appears there was no actual team assigned to Geocities beyond maintenance for the last year of its life) but by saying that something that was there was no longer there, the illusion of progress could appear.  So an announcement happened, and then over the next few months, the death march continued, until October 26, 2009 fell and with it the sunset of Geocities.

Of course, Yahoo! might have tried spinning off the company, but it doesn’t appear to be the case that Yahoo! knows how.  So death appeared to be the only option, since shutting down Yahoo! properties was “in” that year.

But you see, websites and hosting services should not be “fads” any more than forests and cities should be fads – they represent countless hours of writing, of editing, of thinking, of creating. They represent their time, and they represent the thoughts and dreams of people now much older, or gone completely.  There’s history here. Real, honest, true history.  So Archive Team did what it could, as well as other independent teams around the world, and some amount of Geocities was saved.

How much? We’ll never know. One of the Archive Team members called Yahoo! to find out the size and was rebuffed. When we called later in the year to ask exactly when the site was going down on October 26th, we were told that the person who spoke to us last had been let go. It must be like spring break down at that place.

But we know we got a bunch of Geocities sites – a significant percentage, especially of earlier, pre-acquisition data. We archived it as best we could, we compared notes, we merged and double-checked and did whatever needed to be done with what we happened to have.

So now, on this one-year anniversary, Archive Team announces that we are going to torrent it.

YES THAT IS RIGHT, WE ARE RELEASING GEOCITIES ON A TORRENT.

This is going to be one hell of a torrent – the compression is happening as we speak, and it’s making a machine or two very unhappy for weeks on end. The hope had been to upload it today, but the reality is this is a lot of stuff – probably 900 gigabytes will be in the torrent itself. It’s not perfect, it’s not all – but it’s something.

Who will want this? Anyone who feels like browsing among the artifacts of yesterday, who wants some data to play with, who is doing research into history, who wants to get some mileage out of a few weblog postings of crazy glittery animated GIFs and MIDI music. It’s not for everyone. Some people will probably grab a few files out of the thousands of archives in the torrent, unhook and call it a day. Others will want all of it, every last bit, to put onto their $80 1TB hard drive they bought down at the local computer mart.

UPDATE: The compressed archive is 652 gigabytes, and you can stop down at that famous computer history site The Pirate Bay and get the torrent.

While it’s quite clear this sort of cavalier attitude to digital history will continue, the hope is that this torrent will bring some attention to both the worth of these archives and the ease with which they can be lost – and found again.

Clear your disk space – this one’s going to be a doozy.

FURTHER UPDATE: There’s an update on the status of the torrent on this entry.


Categorised as: computer history


94 Comments

  1. Chris Blow says:

    Dear god man … it’s … genius.

  2. Chris Barts says:

    The sense of scale is staggering, both in terms of how many HTML files, image files, and so on that represents, how much 900 GB would have seemed to people in Geocities’ heyday, and in terms of how cheap it is to store, transmit, and generally deal with 900 GB of data these days.
    But this has been on my mind. I just got a 2 GB MicroSD card that doesn’t even cover the last part of my pinkie and is thin enough I’m justifiably afraid of snapping it in half every time I handle it. I could fit it up my nose. It cost $19.99.

  3. For the United Network of Newton Archives I preserved a few important Newton-related sites in the last few months over at http://mirrors.unna.org/www.geocities.com/. I’d love to grab the torrent and see if there’s anything I missed or if there are any gaps I could be filling, but 900GB, that’s hefty. That said, 900GB shouldn’t have cost that much for Yahoo! to keep live.

  4. Jake McGraw says:

    Great news and thanks for taking the time to make this publicly available. I hope you guys are prepping for the eventual wholesale destruction of the various also-ran social networks (MySpace, Bebo, Friendster). It’ll be a scary day when they announce their eventual closing.

  5. AA says:

    Hi,

    Will you be dividing the torrent based on sub-domains? Say, br.geocities.com, pt.geocities.com… ?

  6. Benj Edwards says:

    Please archive Flickr and eBay next. I’ll be donating hard drives to the cause.

  7. Jim Green says:

    Any possibility of getting it posted to Usenet for us oldskoolers?

  8. Cait says:

    Huge round of applause.

    Well done, all of you. This is a brilliant, brilliant thing.

  9. NeoGeo says:

    Any chance of burning it to DVDs and selling copies to those of us who cannot easily download that large a file? I would pay for the cost of the DVDs and effort involved!

  10. Griff says:

    I’d like a copy of the Internet. I brought a hard drive…

    But really, thanks for preserving that stuff. Many a problem I had with software was fixed by someone’s Geocities site. Yahoo should have just put it on static standby. If you do torrent it, it looks like I’m off to Micro Center for yet another hard drive.

  11. Jonathan says:

    For a variety of reasons (hash checks, client capabilities, download time, etc.), when using torrents for distribution it would make a lot more sense to break it up into smaller chunks, say 50-100GB each.

    Ideally, you would also make the files inside the torrent as granular as possible, so downloaders can select in the client what content they want if they do not want all 900GB.

  12. I remember setting up one of my first websites on AOL hometown back in 1998. Geocities was the first place to let me upload 400kb zip files for some mods to a game. It is unbelievable that some nitwit from Yahoo! thought deleting all of this made sense… I could pay $100 a month to have all of this hosted with plenty of bandwidth. I find it hard to believe that, with the proper advertisement, running geocities cost them enough money to make it worth closing. I just wish they would have looked into other options or worked with us to archive it rather than being sneaky about it.

    Then again I remember when Yahoo! stood a chance against google as a search engine…

  13. Stephica says:

    Thank you so much for doing this. I think that ten to twenty years from now the majority of people will finally appreciate what you have done.

  14. John Blackmon says:

    I’ll put it right next to my full Gopherspace archive from 2007.

  15. Wonderful, thanks. Any chance you can get together with archive.org and put it all online?

  16. lazykite says:

    Sell 1T hard drives with the data on there. ;)

  17. Plux says:

    NeoGeo: Sure, it will only be around 110 discs for dual layer, and around 200 for single-layer discs. :) So, say 300 USD + shipping for single layer, and about 800 USD for dual-layer + shipping. And say well about 10-16 hours burning them…

  18. Mike says:

    That’s a whole lotta Under Construction GIFs.

  19. Nocturne9 says:

    This is quite wonderful!!! Artifacts of history indeed!!!

  20. anonymous says:

    Good riddance. Fuck geocities and almost all user generated content.

  21. jsls1976 says:

    @NeoGeo, that would make almost 200 DVDs… probably cheaper and more practical to put those on an external HD and sell it.

  22. Brokengoose says:

    This would take quite a few DVDs. DVD-R DL stores a bit less than 8GB each, so you’d need more than 100. I suspect that it would be cheaper and easier to just sell pre-loaded 1TB external drives.

    If they break the torrent down by subdomains, many torrent clients will allow you to download just the files that interest you.

    It might make sense to release the collection as it’s encoded. That way, the earlier bits will be well-seeded, and bandwidth will be freed for the later bits.

  23. STiger says:

    Man… I would love to look at that… Geocities was like a whole ‘nother realm of the Internet I didn’t even really know about until late into its life.

  24. rane says:

    Talking about wasted bandwidth.

  25. Recursive says:

    Who will archive you when you go down?!

  26. […] Documentary” creator and digital historian Jason Scott and friends have done what they could to keep some of the Yahoo-shuttered Geocities from the dustbin of history. Dropbox founder Drew Houston explains how search ads and public relations ballooned his customer […]

  27. heretoo says:

    too bad Blu-ray costs so much..

  28. Marius Loots says:

    Thanks for your efforts! I got the note from Yahoo and pulled most of what I could from my Geocities account. Still need to find time to put it up somewhere else.

  29. Mark says:

    You have just time traveled to the internet as it existed 15 years ago.

    Obnoxious backgrounds, blinking fonts, animated gifs, centered text, and tables all the way across the sky. This is what much of the internet looked like before CSS became standard and accessible.

  30. Anne says:

    If I could, I’d definitely buy a preloaded 1TB HD with Geocities on it. It’s doubtful I could torrent this awesome collection.

  31. P Curry says:

    You’re going to have legal problems at some point, somebody somewhere will have copyright issues. OR other issues… I documented 2 decades of civil rights violations in the South, without a thanks, and some people are glad it is gone. But I will be watching to see what happens, unlike anand, I never made any money from it. I am taking what I started there to go commercial. It was a wonderful idea they could have built on and I was surprised they just didn’t charge fees and try more innovative marketing. It was lazy of them. I miss many of the sites. Somebody should have built a crossover platform, listing the old Geocities sites and where they could be found now.

  32. ike says:

    is it possible to request you NOT upload a particular geocities page?

    i had to contact reocities and oocities separately to get a page i created years ago deleted… (i had deleted the index file so there was no way of reaching it..)

    are you giving authors of pages a similar option before you release the torrent?????

  33. I don’t plan on downloading the file, but it’s quite nice to know it’ll be out there when and if I need it. Thank you very much!

    I’m curious about one thing: I guess a lot of files are repeated given how certain “standard” gifs (“Best viewed with Netscape!” comes to mind) were spread around back then. Have you somehow deduplicated those files by, I dunno, making a “/commonfiles” folder, putting a single copy of each such file there, and changing the original “src=” and “href=” to point there, to help reduce the torrent size? Or will it contain all their many individual copies?

  34. S1Lv3R says:

    splitting the torrent up would mean peers who download only parts, and thus in the end lack of sources who offer the whole package.. i guess i’ll be shelling out those 60 EUR for a external 1TB drive to archive it… together with a httpd this will be my ‘internet-on-the-go’ heh.. great work, guys

  35. you’ve done a wonderful thing

  36. Votre says:

    Did you guys & gals ever consider offering it for sale on an external HD?

    Much as it pains me to say this, I’m not anxious to piss off my ISP by torrenting a 1TB download/share.

  37. Eric Mesa says:

    As someone who was deep into geocities, tripod, and angelfire back in the day – thank you.

  38. Aurora says:

    Thank you.

  39. I think this is great news. We need to have a historical reference to the “early” days of the net and like it or not Geocities was a big part of that in the ’90s.

    A thing that’s always bothered me about storing data online is that you never know how long that particular service is going to be around. I mean, if someone like Yahoo is pulling the plug on one of their services, what reassures you that any other service you use will always be online?

  40. Michel says:

    Awesome, my crappy first stabs at writing HTML in Notepad as a kid must be in there somewhere :D Does anyone know what will happen to the geocities.com domain? It would be so cool if Yahoo would be willing to give it up to people who plan to restore Geocities :)

  41. Franky says:

    Actually, what I always wondered is how copyright applies to this whole matter. I mean the material you copied (and you copied the whole design, etc, not only the information) was made by, and thus belongs to, innumerable people.

  42. Rendus says:

    Remember, this isn’t all of Geocities – it isn’t even all of Geocities’ linked-to content. Just what they could grab. Who knows how many hundreds of GB of files were stored that were never linked to – warez, porn, personal pictures, eBay photos, etc.

  43. Punny Fun says:

    @Chris Blow

    “Oh my god, it’s full of stars”

    There, FTFY

  44. Andrew Kent says:

    Certainly wouldn’t mind this torrent, maybe one day I’ll figure out what the address was to my old page… I moved to xoom.net early on for the FTP access though and small frame instead of required banner ad placement, if only my back ups had survived it was a pretty good site considering what other people were doing for personal sites on the freebie services.

  45. archive.org did crawl what they could of the geocities space. it has been rolled into the wayback machine.

    http://www.archive.org/web/geocities.php

  46. Phil Howell says:

    This is awesome. Definitely let me know when it’s up – I’d be interested in hosting/mirroring the sites.

  47. cryptoknight says:

    this is what i’m talking about. i can’t wait to see the awesomeness of all these geocities pages from past times. these need to be preserved. think of all the insight that would be lost otherwise. all the communities within that community itself. all the abandonware haha. cant wait to start torrenting

  48. padruig says:

    Thank you for helping archive an often overlooked and under-appreciated aspect of computing history.

  49. Yeah, I do want this, and already shot an email off. When can we expect the torrent? I’d love to piss my ISP off :P

    After all, I’m on a 50 Mbit line and will seed for quite a long time. Hooray for giant torrents :P