ASCII by Jason Scott

Jason Scott's Weblog

Geocities: Lessons So Far

The Geocities-is-going-away thing broke wide a short while ago. The “Jason is Saving Geocities” thing is breaking wider by the day, so I guess we need an update.

After my initial call-out, a nice selection of folks showed up to the Archive Team IRC channel, offering everything from bandwidth and disk space to coding help or simply moral support. We’ve been downloading at an enormous rate, probably along the lines of a gigabyte a half-hour of Geocities, through all our different vectors. Because we’re talking literally millions of files that typically run 1 to 30 kilobytes each, it becomes harder and harder to get a “big picture” view of everything we’ve grabbed, but after 48 hours of work, Archive Team has saved over 200,000 Geocities sites. We’re now pulling in new sites at the rate of something like 5 a second. Is that fast enough? We’ll see, won’t we.

Stuff like this filters around pretty quickly, because the concept is short (someone is mirroring geocities!) and I have an awful lot of verbiage out there about archiving and other general opinions. In other words, I know when something I’m doing gets attention because I start hearing an awful lot about King of Kong and Goatse. But let’s keep it on-point, shall we?

For all the lazyasses who are writing “I hope they back up my website too!” I can only say back up your own site, motherfucker. We’ll hopefully get it but we’re not a for-pay service or likely to be comprehensive. We’re targeting (or trying to target) sites where the persons behind them are dead or unseen for a decade, so just by saying you know of your site and are still around puts you in a lower priority.

A side-effect of the whole process is I now know way, way, way more about Geocities than I ever expected to. We’ve had to dissect every aspect of how the site functions to understand how to mirror things, from its history through how it does crazy javascript ads. Some of it is stupid and some is hilarious, but this contextual bit is important to understanding the data we have. I’ll let you leaf off from here if that doesn’t interest you, but I want it down somewhere.

Geocities was once called Beverly Hills Internet. The company was founded in 1994 but it wasn’t until mid-1995 that they publicly offered what people now think of as a Geocities trademark: free webpages, or “homesteads”. Here’s an announcement of the program coming out of beta and being offered generally in July of 1995.

The homesteading system is very hard to get across as a good idea, looking back, but I’m sure at the time it made sense. Instead of offering things as www.website/user or www.website/~user (which was a sign of being UNIX derived), BHI (then renaming itself Geopages, later Geocities) separated people into “Neighborhoods”. You’d have a neighborhood for science fiction, for movies, for technology. Your page would join a Neighborhood and you’d stay in theme, so your page on Star Trek would go into the Science Fiction neighborhood (called “Area51”), and you’d be a number on the “block”, like 4454. I have a document written by “Blade” in which he painstakingly overviews all the neighborhoods, when they joined the fun (Area 51 joined in April of 1996) and the “suburbs”.

Suburbs? Well, the website/neighborhood/XXXX format was limited, so they added “suburb” directories, which then had their OWN block sets. So now you had two formats; the previously mentioned w/n/xxxx format, and a new one, which would yield URLs like www.geocities.com/Area51/Neptune/XXXX.

This is how things went for the next couple of years. There were a bunch of neighborhoods, all with a pile of suburbs, and then a bunch of numbers under that for the “blocks”. This scaled oddly, but it did in fact scale.

Then Yahoo bought Geocities for $3.5 billion, which sounds like one of my usual dismissive throwaway numbers, but it really was that amount. Assuming this article is at all accurate, 200 of 300 Geocities employees were laid off, payment was in cash and stock (probably mostly stock), closer to 2.5 billion, and Yahoo simultaneously announced they were going to “fix” geocities to work in the Yahoo paradigm. The founders, as usual, were given new meaningless titles in the new monolith. Who drives into work happy that they get to be “senior vice president of industry relations” instead of CEO? Man, that makes a gun look tasty. Meanwhile, the remaining 100 employees appear to have been scattered to the winds, in various sales offices and several new Yahoo office buildings. Must have been awesome.

So then Yahoo started integrating Geocities into their blorb, which I’m sure was an engineering marvel and a wonder to behold; and here we have the third Geocities URL structure: www.geocities.com/yahooid. This utterly broke the neighborhood/suburb model, although all indications are that it was starting to fall apart well before this acquisition, with the wrong types of people being slotted into neighborhoods it made no thematic sense to be in, like putting a biker bar in a gated community. Regardless, we now had three different settings, like strata in which to see the geologic time difference.
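For the curious, here is a rough sketch of how those three strata could be told apart just from the URL path. This is an illustration only, not the tooling Archive Team actually runs: the function name `classify_geocities_url` and the "numeric segment means block number" rule are my assumptions, and the rule will misfile any Yahoo ID page that happens to have an all-digit subdirectory.

```python
from urllib.parse import urlparse


def classify_geocities_url(url):
    """Guess which Geocities URL era a full URL (scheme included) belongs to.

    Hypothetical heuristic: an all-digit path segment is treated as a
    "block" number. Returns a (era, parts) tuple.
    """
    # Split the path into non-empty segments, e.g. ["Area51", "Neptune", "1234"].
    segments = [s for s in urlparse(url).path.split("/") if s]

    # First format: /Neighborhood/XXXX
    if len(segments) >= 2 and segments[1].isdigit():
        return ("neighborhood", {"neighborhood": segments[0],
                                 "block": segments[1]})

    # Second format: /Neighborhood/Suburb/XXXX
    if len(segments) >= 3 and segments[2].isdigit():
        return ("suburb", {"neighborhood": segments[0],
                           "suburb": segments[1],
                           "block": segments[2]})

    # Third format: /yahooid (anything else with at least one segment)
    if segments:
        return ("yahooid", {"yahooid": segments[0]})

    return ("unknown", {})


if __name__ == "__main__":
    for example in ("http://www.geocities.com/Area51/4454",
                    "http://www.geocities.com/Area51/Neptune/1234/",
                    "http://www.geocities.com/someyahooid"):
        print(example, "->", classify_geocities_url(example))
```

The block numbers and the Yahoo ID in the examples are made up; only the three path shapes come from the description above.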

We’re pretty sure we have the first two completed. Again. WE THINK WE HAVE EVERY SITE FROM 1999 AND BEFORE ON GEOCITIES THAT WAS LEFT. (Update: My team is more inclined towards “most” than “all”.) We’re still running tests on this and likely some “hidden” material will still come to light, but we have enough that a historian could “get it” even if a completist or armchair archivist wouldn’t.

The number of total sites currently on Geocities is elusive. There were numbers bandied about between 1996 and 1999 of millions, with 3.5 million the largest number I could find. Bear in mind, however, that 1. Yahoo are fucking liars, 2. People who are about to be bought for billions of dollars might be inclined to be fucking liars, and 3. The press will often aid and abet fucking liars, sometimes intentionally, and sometimes not. But what is definitely clear is that Yahoo purged a lot. How much, again, unsure, but we have found one neighborhood (WallStreet, ha ha get your jokes in, comedians) that is utterly empty, as well as the holiday special NorthPole. Gone, utterly.

Others are in better shape, with hundreds or thousands of sites left in them and their suburbs. Obviously if someone jams their secret mp3 they spent 3 hours calculating in 1998 in a place nobody ever found, then we won’t find it. But generally, stuff is being found. Rsync is a huge help here; we can liberally grab crap and make it “do the right thing” against the global list and collection.

I have only the merest of time for people (some friends) going “why even try saving geocities” so let’s instead move onto the other question I’m starting to get, which is “where can I get this”.

It is more important to me to grab the data than to figure out how to serve it later. People who have been talking about copyright and stuff seem to think I’m going to sell it or take credit or some crap. I don’t see how the final collection won’t end up online, but how is elusive – maybe a torrent of a bunch of zip files, or as a curated collection, or as a bunch of hard drives. However it is, I’ll make sure people can get it, somehow.

So there we go. It’s running fine, things are happening, and I’m sure in the time it took to write this we’ve grabbed another 5 or 10 thousand memories from the soon-to-be-gone Geocities. GO ARCHIVE TEAM GO.


Categorised as: computer history



74 Comments

  1. Cassie ST says:

    My very first site was on geocities back in ’96. Not surprised to hear it’s finally bitten the dust. Thanks for the archive work on legacy accounts.

    May your “pay forward” work be rewarded tenfold!

    Cheers
    CST

  2. I’ve had a GeoCities site since February 1999, and yes, I remember the old address formats, as well as both of my old addresses (‘…/Area51/Capsule/8211/’ at first; I moved to ‘…/Area51/Lair/7946/’ a few months later). Many of my early web friends had sites there (one had written some not-too-bad Highlander fanfiction, much of which WebArchive has thankfully saved), and I used to browse the neighborhood blocks once in a while to see what else was around.

    It’s sad (but not too surprising) to hear GeoCities is closing soon. Great to hear about your archive effort, especially since the Internet Archive’s most recent archives of some pages are quite old, when they exist at all (e.g., anywhere from 2002 to 2007 for my own site, with some pages and many images missing). Nice to know that some things like my Asperger Essay might not drop off the face of the Earth if I do. :-)

  3. [...] I learned of Peter’s tale/fate, a story came out that Yahoo will pull the plug on GeoCities, and lots of great info might be lost. At the time, I thought it a minor, but the source page on Peter may disappear. Cycling back to [...]

  4. Megan says:

    This is very cool but I’m confused. Say I had a geocities site in 1999 and still have it until it goes away today, (which I do, unfortunately)…does that mean you would find the oldest version of the site that existed in 1999? I’m so curious. My geocities site today is still crap, but I have backups of it going back to 2000. It was in 1999 that the whole idea of backup eluded me because I was 15 LOL. Oh PageBuilder, how I don’t miss thee.

    • Jason Scott says:

      We are merely duplicating all of Geocities that we can find. Previous incarnations, or anything not left on Geocities, we don’t have. Others might. We’re just making a backup of the site before it disappears.

  5. Thanks for all your work in saving Geocities. I was rather fond of the “neighborhood” concept and enjoyed strolling through those “neighborhood maps” they used to have, finding a 5th grade class’ website here, something about heraldry there, etc. I picked my street name because it sounded cool, and the number because it was similar to the house number where I grew up.

    http://www.geocities.com/athens/styx/3301

    I started several Geocities pages for different things, and some of them I still update and many are still active, attracting maybe a total of several hundred visitors daily. Their WYSIWYG tools were easy to use, once you got the hang of them, their HTML interface even easier, and their site statistics tools were wonderful!

    I am currently looking for the best way to save it all myself that does not involve me copy-pasting the HTML to work files and downloading every single image (there are thousands!). Any recommendations?

  6. [...] First Tweet Apr 27, 2009 textfiles Jason Scott Highly Influential We think we have, possibly, all remaining Geocities sites from 1999 and before view retweet [...]

  7. [...] Scott continues to back up Geocities, and, in the process of doing this, has posted page-heaps of under construction and email icons. [...]

  8. Tony says:

    Can you help to archive these two sites and let me know their links accessible from the internet?
    http://www.geocities.com/tonyboneka (1999 until now)
    http://www.geocities.com/tonyphuah (2003 until now)
    Thanks in advance.

  9. [...] there are still old school heroes on the internet. One of which, Jason Scott, is trying fervently to back up as much of Geocities as possible before it closes. I can only wish [...]

  10. [...] that hierarchy proved limiting and then confusing, as neighborhoods expanded into blocks and suburbs. After a Yahoo buyout in 1999, the new [...]

  11. [...] and the Archive Team have started to archive Geocities and the progress is described in ‘Geocities: Lessons So Far.’ There are two great applications to backup your own, long forgotten, Geocities [...]

  12. [...] Scott and the Archive Team have created The GeoCities Project and are working non-stop trying to download as much of GeoCities as possible before it shuts [...]

  13. jk says:

    I have three questions:

    Just to be sure, are you copying old hobbyist sites or emulation/rom sites or anything that may be deemed ‘not important?’

    Are there any copies of the original web-based creation software? This would be exciting, but I don’t have high hopes as it was only accessed after logging in. And the background images and sample files, scripts etc…

    And what about downloadable content linked to pages, are these being archived?

    Thank you very much for your work.

  14. Erik says:

    What, in the end, killed geocities? The rise of the big social networks like Facebook?

  15. scott says:

    Thanks for backing them up. I started building websites in 1998 at the age of 12. I now run a web design company as my full time job. My first site was built on geocities and I’m glad to see it was backed up http://web.archive.org/web/*/http://geocities.com/extremescott

  16. [...] saved Geocities! Jason Scott gives a brief status report and explains what lessons he has drawn so far from working on this mammoth project. The Archive Team-run [...]

  17. Sam says:

    Hi guys, – Found you this morning after a quick search as to “Why” my geocities site was gone – when I went looking for some old “Army Pics” I had scanned in a long time ago… – I fall into the class of gone for 10 years – Problem: when Geocities sold out, and when my ISP closed, I lost two things. – my old e-mail address allowing me to reset my password, and the available – existent support staff to believe and help me reset my password. After that, I gave up, and it rotted there until it died… I did not even know they were closing, and I even told myself – I should probably go back up those old wedding, Army, and other photos before they someday disappear… – problem two. I got a few, but they would stop me downloading my own stuff due to bandwidth restrictions…
    Problem 3, they are now gone altogether..

    Thank you – HERO’s – for attempting to save what you can – I would love to get anything back some day if you ever get that far… If not, I will deal with it just like I did the hard drive crash 10 years ago where the originals were…

    I tend to be a bit of a junk-pile historian – hanging on to relics of my family’s past – I am the one who saved my great grandpa’s framing square that he built their shack in the hills with, and the cake plate from my grandmother… That is me… – I believe that someone will care about the ramblings, rants, and publishings of those original geocitiesers… I was one of them.

    Life will go on.

    Sam

  18. [...] Geocities: Lessons So Far Geocities was once called Beverly Hills Internet. The company was founded in 1994 but it wasn’t until mid-1995 that they publically offered what people now think of as a Geocities trademark: free webpages, or “homesteads”. [An article about the Archive Team trying to save Geocities content before Yahoo takes it down.] [...]

  19. Bravo to you for doing this! I’m sure it’s one *hell* of a task, but I guarantee that someday people will talk about how this treasure trove of stuff was almost lost, but thanks to your efforts it didn’t all just go *bloop* into the ether and disappear forever. It’s a sad indictment of Yahoo that they don’t care enough to save this early web material. Thank you for being willing to do it.

  20. [...] Stanza. • Anyone worried about the disappearance of GeoCities may have a new hero in Jason Scott, who – along with the Archive Team – is trying to download as much of the information on the [...]