The Geocities-is-going-away thing broke wide a short while ago. The “Jason is Saving Geocities” thing is breaking wider by the day, so I guess we need an update.

After my initial call-out, a nice selection of folks showed up to the Archive Team IRC channel, offering everything from bandwidth and disk space to coding or simply moral support. We’ve been downloading at an enormous rate, probably along the lines of a gigabyte of Geocities every half-hour, through all our different vectors. Because we’re talking literally millions of files with an average size of 1 to 30 kilobytes, it becomes harder and harder to get a “big picture” view of everything we’ve grabbed, but after 48 hours of work, Archive Team has saved over 200,000 Geocities sites. We’re now pulling in new sites at the rate of something like 5 a second. Is that fast enough? We’ll see, won’t we.

Stuff like this filters around pretty quickly, because the concept is short (someone is mirroring geocities!) and I have an awful lot of verbiage out there about archiving and other general opinions. In other words, I know when something I’m doing gets attention because I start hearing an awful lot about King of Kong and Goatse. But let’s keep it on-point, shall we?

For all the lazyasses who are writing “I hope they back up my website too!” I can only say back up your own site, motherfucker. We’ll hopefully get it, but we’re not a for-pay service, nor are we likely to be comprehensive. We’re targeting (or trying to target) sites where the persons behind them are dead or unseen for a decade, so just saying you know of your site and are still around puts you at a lower priority.

A side-effect of the whole process is I now know way, way, way more about Geocities than I ever expected to. We’ve had to dissect every aspect of how the site functions to understand how to mirror things, from its history through how it does crazy javascript ads. Some of it is stupid and some is hilarious, but this contextual bit is important to understanding the data we have. I’ll let you leaf off from here if that doesn’t interest you, but I want it down somewhere.

Geocities was once called Beverly Hills Internet. The company was founded in 1994 but it wasn’t until mid-1995 that they publicly offered what people now think of as a Geocities trademark: free webpages, or “homesteads”. Here’s an announcement of the program coming out of beta and being offered generally in July of 1995.

The homesteading system is very hard to get across as a good idea, looking back, but I’m sure at the time it made sense. Instead of offering things as or (which was a sign of being UNIX derived), BHI (then renaming to Geopages, later Geocities) separated people into “Neighborhoods”. You’d have a neighborhood for science fiction, for movies, for technology. Your page would join a Neighborhood and you’d stay in theme – so your page on Star Trek would go into the Science Fiction neighborhood (called “Area51”), and you’d be a number on the “block”, like 4454. I have a document written by “Blade” in which he painstakingly overviews all the neighborhoods, when they joined the fun (Area 51 joined in April of 1996) and the “suburbs”.

Suburbs? Well, the website/neighborhood/XXXX format was limited, so they added “suburb” directories, which then had their OWN block sets. So now you had two formats; the previously mentioned w/n/xxxx format, and a new one, which would yield URLs like

This is how things went for the next couple of years. There were a bunch of neighborhoods, all with a pile of suburbs, and then a bunch of numbers under that for the “blocks”. This scaled oddly, but it did in fact scale.
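For the curious, here is roughly how you could tell those first two formats apart when sorting through a pile of mirrored paths. This is only a sketch under my own assumptions, not the tooling Archive Team actually runs: the function name `classify_path` and the suburb name `SomeSuburb` are made up for illustration, and the only real example used is the Area51/4454 block mentioned above. Anything that doesn't look like neighborhood/block or neighborhood/suburb/block is simply labeled "other".

```python
def classify_path(path):
    """Guess which Geocities-style layout a path follows.

    Hypothetical helper for illustration only. Returns a (kind, parts)
    tuple where kind is "neighborhood", "suburb", or "other".
    """
    # Split on slashes and drop empty pieces (leading or doubled slashes).
    segments = [s for s in path.split("/") if s]

    # neighborhood/block, e.g. Area51/4454
    if len(segments) >= 2 and segments[1].isdecimal():
        return ("neighborhood", {
            "neighborhood": segments[0],
            "block": int(segments[1]),
        })

    # neighborhood/suburb/block, e.g. Area51/SomeSuburb/1234
    if len(segments) >= 3 and segments[2].isdecimal():
        return ("suburb", {
            "neighborhood": segments[0],
            "suburb": segments[1],
            "block": int(segments[2]),
        })

    # Anything else does not match either homestead layout.
    return ("other", {})


if __name__ == "__main__":
    for example in ("/Area51/4454/index.html",
                    "/Area51/SomeSuburb/1234/",
                    "/someuser/page.html"):
        print(example, classify_path(example))
```

The check is purely structural, so a path whose second piece happens to be all digits will be read as a block number even if it isn't one. For a first-pass sort that seems like an acceptable trade.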

Then Yahoo bought Geocities for $3.5 billion, which sounds like one of my usual dismissive throwaway numbers, but it really was that amount. Assuming this article is at all accurate, 200 of 300 Geocities employees were laid off, payment was in cash and stock (probably mostly stock), closer to 2.5 billion, and Yahoo simultaneously announced they were going to “fix” geocities to work in the Yahoo paradigm. The founders, as usual, were given new meaningless titles in the new monolith. Who drives into work happy that they get to be “senior vice president of industry relations” instead of CEO? Man, that makes a gun look tasty. Meanwhile, the remaining 100 employees appear to have been scattered to the winds, in various sales offices and several new Yahoo office buildings. Must have been awesome.

So then Yahoo started integrating Geocities into their blorb, which I’m sure was an engineering marvel and a wonder to behold; and here we have the third Geocities URL structure: This utterly broke the neighborhood/suburb model, although all indications are that it was starting to fall apart well before this acquisition, with the wrong types of people being slotted into neighborhoods it made no thematic sense to be in, like putting a biker bar in a gated community. Regardless, we now had three different settings, like strata in which to see the geologic time difference.

We’re pretty sure we have the first two completed. Again. WE THINK WE HAVE EVERY SITE FROM 1999 AND BEFORE ON GEOCITIES THAT WAS LEFT. (Update: My team is more inclined towards “most” than “all”.) We’re still running tests on this and likely some “hidden” material will still come to light, but we have enough that a historian could “get it” even if a completist or armchair archivist wouldn’t.

The number of total sites currently on Geocities is elusive. There were numbers bandied about between 1996 and 1999 of millions, with 3.5 million the largest number I could find. Bear in mind, however, that 1. Yahoo are fucking liars, 2. People who are about to be bought for billions of dollars might be inclined to be fucking liars, and 3. The press will often aid and abet fucking liars, sometimes intentionally, and sometimes not. But what is definitely clear is that Yahoo purged a lot. How much, again, unsure, but we have found one neighborhood (WallStreet, ha ha get your jokes in, comedians) that is utterly empty, as well as the holiday special NorthPole. Gone, utterly.

Others are in better shape, with hundreds or thousands of sites left in them and their suburbs. Obviously if someone jams their secret mp3 they spent 3 hours calculating in 1998 in a place nobody ever found, then we won’t find it. But generally, stuff is being found. Rsync is a huge help here; we can liberally grab crap and make it “do the right thing” against the global list and collection.

I have only the merest of time for people (some friends) going “why even try saving geocities” so let’s instead move onto the other question I’m starting to get, which is “where can I get this”.

It is more important to me to grab the data than to figure out how to serve it later. People who have been talking about copyright and stuff seem to think I’m going to sell it or take credit or some crap. I don’t see how the final collection won’t end up online, but how is elusive – maybe a torrent of a bunch of zip files, or as a curated collection, or as a bunch of hard drives. However it is, I’ll make sure people can get it, somehow.

So there we go. It’s running fine, things are happening, and I’m sure in the time it took to write this we’ve grabbed another 5 or 10 thousand memories from the soon-to-be-gone Geocities. GO ARCHIVE TEAM GO.

