ASCII by Jason Scott

Jason Scott's Weblog

Geocities: Lessons So Far —

The Geocities-is-going-away thing broke wide a short while ago. The “Jason is Saving Geocities” thing is breaking wider by the day, so I guess we need an update.

After my initial call-out, a nice selection of folks showed up to the Archive Team IRC channel, offering everything from bandwidth and disk space to coding or simply moral support. We’ve been downloading at an enormous rate, probably on the order of a gigabyte every half-hour of Geocities, through all our different vectors. Because we’re talking literally millions of files, each somewhere between 1 and 30 kilobytes, it becomes harder and harder to get a “big picture” view of everything we’ve grabbed, but after 48 hours of work, Archive Team has saved over 200,000 Geocities sites. We’re now pulling in new sites at the rate of something like 5 a second. Is that fast enough? We’ll see, won’t we.
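For a sense of scale, here is the back-of-the-envelope arithmetic on those rates. It is a rough sketch only: it treats a gigabyte as 10^9 bytes and assumes the bandwidth figure and the sites-per-second figure describe the same period, which is not something the numbers above actually promise.

```python
# Back-of-the-envelope arithmetic on the rates quoted above.
# Assumptions: 1 GB = 10**9 bytes, and the byte rate and the site rate
# were measured over the same period (not stated, just assumed here).

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24

gb_per_half_hour = 1.0   # "a gigabyte every half-hour"
sites_per_second = 5     # current pull rate
sites_saved = 200_000    # total after the first 48 hours
hours_elapsed = 48

# Daily volume at the quoted bandwidth.
gb_per_day = gb_per_half_hour * 2 * HOURS_PER_DAY

# Sites per day if the current rate holds.
sites_per_day = sites_per_second * SECONDS_PER_HOUR * HOURS_PER_DAY

# Average rate over the first 48 hours, for comparison.
avg_sites_per_second = sites_saved / (hours_elapsed * SECONDS_PER_HOUR)

# Implied average size of one site if both rates held at once.
sites_per_half_hour = sites_per_second * SECONDS_PER_HOUR / 2
avg_bytes_per_site = gb_per_half_hour * 10**9 / sites_per_half_hour

print(f"{gb_per_day:.0f} GB/day")
print(f"{sites_per_day:,} sites/day at 5/s")
print(f"{avg_sites_per_second:.2f} sites/s average so far")
print(f"~{avg_bytes_per_site / 1000:.0f} KB per site")
```

So the current pace of 5 sites a second is roughly four times the average of the first two days, and it works out to a bit over 400,000 sites a day if it holds.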

Stuff like this filters around pretty quickly, because the concept is short (someone is mirroring geocities!) and I have an awful lot of verbiage out there about archiving and other general opinions. In other words, I know when something I’m doing gets attention because I start hearing an awful lot about King of Kong and Goatse. But let’s keep it on-point, shall we?

For all the lazyasses who are writing “I hope they back up my website too!” I can only say: back up your own site, motherfucker. We’ll hopefully get it, but we’re not a for-pay service or likely to be comprehensive. We’re targeting (or trying to target) sites where the people behind them are dead or haven’t been seen for a decade, so just saying you know about your site and are still around puts you at a lower priority.

A side-effect of the whole process is that I now know way, way, way more about Geocities than I ever expected to. We’ve had to dissect every aspect of how the site functions to understand how to mirror it, from its history to how it serves its crazy javascript ads. Some of it is stupid and some is hilarious, but this context is important to understanding the data we have. Feel free to leave off here if that doesn’t interest you, but I want it written down somewhere.

Geocities was once called Beverly Hills Internet. The company was founded in 1994, but it wasn’t until mid-1995 that it publicly offered what people now think of as the Geocities trademark: free webpages, or “homesteads”. Here’s an announcement of the program coming out of beta and being offered generally in July of 1995.

The homesteading system is very hard to get across as a good idea, looking back, but I’m sure at the time it made sense. Instead of offering pages as www.website/user or www.website/~user (the latter a sign of being UNIX-derived), BHI (then renamed Geopages, later Geocities) separated people into “Neighborhoods”. You’d have a neighborhood for science fiction, for movies, for technology. Your page would join a Neighborhood and stay on theme, so your page on Star Trek would go into the Science Fiction neighborhood (called “Area51”), and you’d be a number on the “block”, like 4454. I have a document written by “Blade” in which he painstakingly surveys all the neighborhoods, when they joined the fun (Area 51 joined in April of 1996), and the “suburbs”.

Suburbs? Well, the website/neighborhood/XXXX format was limited, so they added “suburb” directories, which then had their OWN block sets. So now you had two formats: the previously mentioned w/n/XXXX format, and a new one, which would yield URLs like www.geocities.com/Area51/Neptune/XXXX.

This is how things went for the next couple of years. There were a bunch of neighborhoods, all with a pile of suburbs, and then a bunch of numbers under that for the “blocks”. This scaled oddly, but it did in fact scale.

Then Yahoo bought Geocities for $3.5 billion, which sounds like one of my usual dismissive throwaway numbers, but it really was that amount. Assuming this article is at all accurate, 200 of 300 Geocities employees were laid off, payment was in cash and stock (probably mostly stock) and worked out closer to 2.5 billion, and Yahoo simultaneously announced they were going to “fix” geocities to work in the Yahoo paradigm. The founders, as usual, were given new meaningless titles in the new monolith. Who drives into work happy that they get to be “senior vice president of industry relations” instead of CEO? Man, that makes a gun look tasty. Meanwhile, the remaining 100 employees appear to have been scattered to the winds, in various sales offices and several new Yahoo office buildings. Must have been awesome.

So then Yahoo started integrating Geocities into their blorb, which I’m sure was an engineering marvel and a wonder to behold; and here we have the third Geocities URL structure: www.geocities.com/yahooid. This utterly broke the neighborhood/suburb model, although all indications are that the model was starting to fall apart well before this acquisition, with the wrong types of people being slotted into neighborhoods it made no thematic sense for them to be in, like putting a biker bar in a gated community. Regardless, we now had three different layouts, like strata showing the passage of geologic time.
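To make the three layouts concrete, here is a rough Python sketch of how a crawler might tell them apart from a URL path. It is a heuristic under stated assumptions, not Archive Team’s actual code: the function name `classify_geocities_url` is hypothetical, it assumes block numbers are always exactly four digits (as in every example here), and it treats anything else at the top level as a Yahoo ID.

```python
import re
from urllib.parse import urlparse

# Assumption: "block" numbers are exactly four digits, like 4454.
BLOCK = re.compile(r"^\d{4}$")


def classify_geocities_url(url):
    """Hypothetical heuristic: guess which of the three layouts a URL uses.

    Returns a tuple whose first element names the layout:
      ("neighborhood", Neighborhood, Block)        /Area51/4454/...
      ("suburb", Neighborhood, Suburb, Block)      /Area51/Neptune/4454/...
      ("yahooid", YahooID)                         /someyahooid/...
      ("unknown",)                                 anything else
    """
    # Split the path into non-empty segments, ignoring leading/trailing slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]

    # Original layout: a four-digit block directly under the neighborhood.
    if len(segments) >= 2 and BLOCK.match(segments[1]):
        return ("neighborhood", segments[0], segments[1])

    # Suburb layout: neighborhood, then suburb, then a four-digit block.
    if len(segments) >= 3 and BLOCK.match(segments[2]):
        return ("suburb", segments[0], segments[1], segments[2])

    # Yahoo-era layout: assume any other top-level name is a Yahoo ID.
    if segments:
        return ("yahooid", segments[0])

    return ("unknown",)


print(classify_geocities_url(
    "http://www.geocities.com/RainForest/Vines/4451/KokoLiveChat.html"))
```

The weak spot is the last rule: a Yahoo ID site that happens to contain a four-digit subdirectory would be misfiled as an older layout, so a real crawler would want a list of known neighborhood names to check against.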

We’re pretty sure we have the first two completed. Again. WE THINK WE HAVE EVERY SITE FROM 1999 AND BEFORE ON GEOCITIES THAT WAS LEFT. (Update: My team is more inclined towards “most” than “all”.) We’re still running tests on this and likely some “hidden” material will still come to light, but we have enough that a historian could “get it” even if a completist or armchair archivist wouldn’t.

The number of total sites currently on Geocities is elusive. Numbers in the millions were bandied about between 1996 and 1999, with 3.5 million the largest I could find. Bear in mind, however, that 1. Yahoo are fucking liars, 2. People who are about to be bought for billions of dollars might be inclined to be fucking liars, and 3. The press will often aid and abet fucking liars, sometimes intentionally, and sometimes not. But what is definitely clear is that Yahoo purged a lot. How much, again, is unsure, but we have found one neighborhood (WallStreet, ha ha get your jokes in, comedians) that is utterly empty, as well as the holiday special NorthPole. Gone, utterly.

Others are in better shape, with hundreds or thousands of sites left in them and their suburbs. Obviously if someone jams their secret mp3 they spent 3 hours calculating in 1998 in a place nobody ever found, then we won’t find it. But generally, stuff is being found. Rsync is a huge help here; we can liberally grab crap and make it “do the right thing” against the global list and collection.

I have only the merest of time for people (some friends) going “why even try saving geocities” so let’s instead move onto the other question I’m starting to get, which is “where can I get this”.

It is more important to me to grab the data than to figure out how to serve it later. People who have been talking about copyright and stuff seem to think I’m going to sell it or take credit or some crap. I don’t see how the final collection won’t end up online, but how is elusive – maybe a torrent of a bunch of zip files, or as a curated collection, or as a bunch of hard drives. However it is, I’ll make sure people can get it, somehow.

So there we go. It’s running fine, things are happening, and I’m sure in the time it took to write this we’ve grabbed another 5 or 10 thousand memories from the soon-to-be-gone Geocities. GO ARCHIVE TEAM GO.


Categorised as: computer history



74 Comments

  1. Flack says:

    Wow! Kudos to you guys for undertaking such a Herculean effort! I know there are people such as myself who really appreciate your efforts and get what you are doing. Is there something the “average guy” can help with by showing up in the IRC channel?

  2. Eve M. says:

    And for those with not much time (or the skills you need) but with an inclination to help, would tipjar donations be of any use?

  3. henrik says:

    Hey Jason. Long time reader, first time commenter. Here’s a morbid, but fairly relevant question: What happens when you die? Who gets all the crap you have archived over the years? I mean, clearly a bunch of the stuff is available online, but you must have a lot of offline data sitting around too. Plus all that various junk people send you etc, etc, etc. So, I was just wondering if you have given any thought to that at all.

    • Jason Scott says:

      I have a number of beneficiaries set up to take over the collections, and who will make proper decisions for the material. Ideally very little will fall between the cracks.

  4. Shii says:

    Ever wanted the log of Koko the Gorilla’s “chat” (or pounding keyboard) with AOL users? Well, the only copy of it was here:

    http://www.geocities.com/RainForest/Vines/4451/KokoLiveChat.html

    Now it’s gone. Luckily the Internet Archive took care of that one, but who knows how many other sites would have vanished into the mist if not for Jason?

  5. In case no one else gets around to saying it directly, Jason, thank you. This is amazing.

    I haven’t personally had a website on Geocities for years, but, in a certain way, I feel like I grew up there as much as I did in any physical neighborhood. When I heard that Yahoo would be closing up shop, I was devastated. Geocities is where I first learned to write HTML, then where I posted my first webpage. It’s where I read my first hacking tutorial, my first philosophy text, and my first fan fiction. It’s where I made all the penpals that got me through high school and where my first girlfriend and I would post notes for each other during the week when we couldn’t see one another.

    I probably could have gone another decade without ever even thinking about Geocities, but imagining it all disappearing without a trace was somehow heart-breaking.

    I’m sure many more people feel the same way.

    Thanks.

  6. SFR says:

    A few of these old pages of mine still get a fair number of hits (no one is more surprised than I), especially the one on the Hippocratic Oath. Thank you for saving it for all the folks who’ve posted links and references to it, and all those who may still come looking for it.

    –SFR

  7. Op3r says:

    y0 redpriest

    Nice to know you are saving geocities site. Geocities was awesome, hope you can get the god damn domain and recreate it back again? 😀

  8. Mike says:

    Bravo to you for doing this! I’m sure it’s one *hell* of a task, but I guarantee that someday people will talk about how this treasure trove of stuff was almost lost, but thanks to your efforts it didn’t all just go *bloop* into the ether and disappear forever. It’s a sad indictment of Yahoo that they don’t care enough to save this early web material. Thank you for being willing to do it.

    Mike

  9. JB says:

    Great that you’re doing this! I pulled down my old Geocities page with wget just to have a copy for myself. Awesome if you can put them all online again somewhere. Heck, I’d even update mine if that were to become an option 🙂

  10. MoonBuggy says:

    Just out of interest, did you try asking Yahoo to provide a dump of all the sites directly? I can only assume they’d say no, but it never hurts to ask – especially when it could be a fairly quick and easy job for them to ship you a hard drive with the lot on and net some cheap positive publicity.

  11. donnah says:

    Ah Konrad and group, if I weren’t already married, and old enough to be your grandmother, I’d ask you to marry me. Thank you so much for all of your hard work!

    D (former Geocities site)

  12. Lesbo Slobberhoots says:

    Yo, wazzup. I jus tried to open a new site over there for my man stick enhancification fluid but I got jack $hit. can i craete my new site in your archive dude?

  13. Yesss says:

    You are the man. All of you. Even the women.

  14. fodi says:

    You’re Mad Max! Love the commitment. Fuck the cynics.

  15. Eric Jung says:

    Can’t believe you’re getting criticism for this. It’s noble work. Keep it up!

  16. Zorin says:

    I knew you were awesome ever since I first read stories about you and your Terror Ferret Co-Sysop in the 80s, but this is just way too much.

    Preserving our history is something that I really wish more folks would do. Information is so fleeting and easily destroyed these days. It has to be maintained, copied to newer mediums, babysat, for lack of a better term.

    I’m glad you’re putting in this effort where others have failed. It would be so trivial for Yahoo to keep the old Geocities stuff around, but noooo. They don’t care. We do.

    The onus is on us.

  17. Jonathan Wilson says:

    I suspect Yahoo would say no to providing a Geocities dump, mostly because of copyright issues (for example, what happens to the content on Geocities that was posted by someone without permission from the copyright holder?) and also privacy issues (e.g. content on Geocities that wouldn’t be found by normal trawling but would be there in a data dump).

    Are there any roadblocks you guys are running into when it comes to archiving this stuff? Stuff you can’t pull because Geocities serves ad-laden HTML instead of the correct content due to its “anti-direct-linking” policies?

  18. Anne Madison says:

    Well, after I got done being an OPUS sysop, I turned around several years later and tried my hand at moderating a group of fan-fiction writers on Egroups. When that got absorbed by Yahoo, we installed our archive on GeoCities. It’s still there. It’s Pepto-Bismol pink on a black background (what was I thinking?). The fancy-in-1999 pulldown menus don’t work in Firefox anymore. I believe, but am not sure, that it was the first set of web pages I ever did that used css. I had forgotten about it entirely, but I’m going to faithfully download it and store it away somewhere where it’ll be safe. What is the digital equivalent of tucking something away in lavender?

    Kindest regards, and you’re always sure of a warm welcome if you make it back to Baltimore.
    Anne

  19. I just found out today about Yahoo’s decision to close Geocities this year and started copying the files I had uploaded to Geocities back to my computer, including a backup of my website’s homepage.

    I don’t know what I’ll do without my personal website. I hope it can be transferred to another web host and would still be able to edit it even then.

    My personal website till Yahoo shuts Geocities down is http://www.geocities.com/maneeshpangasa

  20. […] least one gallant individual agrees. Jason Scott and his band of merry geeks, whoever they are, have decided this must not happen and have […]

  21. CharlesV says:

    Jason, have you considered a distributed archiving system? Since especially with something like geocities, there is a limiting factor of the number of requests per second/minute/hour an individual machine can handle, some sort of program that would grab a chunk of nodes to process from a central list (nodes compiled from your hard work and ingenuity in parsing the geocities / whatever / etc system), grab data locally, and upload to a central archive server? While processing, those nodes are “checked out”, and once checked into the archive system, they’re removed from the to-do list.

    Probably a little late in the game for geocities, but I’m sure there’s a lot of distributed computing / manpower you could put to use for future projects.

  22. Shreela says:

    A sweet thing to do. I wish I could remember what my Geo neighborhood was. I’m surprised Google didn’t get involved, but I guess there’s a difference between newsgroups and people’s personal webpages.

    Where are the backed up sites? If they’re on archiveteam.org it wouldn’t load for me.

  23. Nerds are working to preserve your crappy Geocities site…

    ……

  24. Steve Pordon says:

    Why? BECAUSE IT’S THERE, FUCKHEAD!

  25. Flack says:

    Hope this doesn’t affect the bandwidth you have funneled toward this current project:

    http://rss.slashdot.org/~r/Slashdot/slashdot/~3/A5S1BuEtGQ4/article.pl

  26. Sarah says:

    Those asking ‘why’ only really need to Google, you’ll still find Geocities pages appearing high up the ranking depending on what you’re looking for (along with an inordinate amount of dead domains hosting spam blogs). True the site hasn’t been updated in years but it’s got what you need.

    I’m more fascinated by the ‘how’ question personally, though. After all, the javascript and ad systems were cantankerous to begin with; it would be fascinating to see the workflow and scripts used to pull this old sucker apart.

    And yes, being geeky also means I’d like to know ‘when’ as in “When can I get a huge tar ball to throw on my local webserver and muck around with” but all things in good time for that one.

    Good luck with the project and thanks for archiving yet another bit of geek and internet history before it evaporates.

  27. DC Oriole says:

    What if I don’t want my site archived because of a legal agreement?

    Perhaps I WANT it to go away.

  28. otto says:

    Someone needs to preserve all that data. Regardless of what some naysayers think, it’s a part of history. It needs to be kept somewhere. Eventually, it will be a minuscule piece of the internet pie, but it’s a great look at the early internet landscape.

  29. Rubes says:

    No matter how good or bad Geocities was, this is just an awesome project.

  30. bobby says:

    Well I for one am a bit concerned. I have a web site on GeoCities because I changed ISPs and lost my hosting in the process. I am now the proud owner of mind-numbingly slow AT&T dial-up service and have no free hosting through them, except for GeoCities through the AT&T/Yahoo partnership, that is. So now, my rinky-dink web site that gets maybe 2 hits a week, but is still MY web site, is being copied by these guys. Not sure I want someone making copies of MY site. I’ll do that myself, thanks…

  31. jonbro says:

    amazing.

    If you ever feel inclined to make an archive team t-shirt, I will buy it.

  32. test says:

    This is absolutely fascinating. Keep up the good job. Historians in the year 3000 might thank you for this.

  33. Ugly and neglected fragments…

    GeoCities is an ugly disorganised mess. But it’s still a huge shame that Yahoo! is closing the whole thing down….

  34. mdy says:

    Bobby, two posts above me: Your site is published on the internet. If you wanted to keep your material private, you shouldn’t have published it. On the internet.

    Geocities had a pretty bad reputation (“Geoshitties”) but I remember reading lots of fascinating things (to a 15 year old) about Star Wars fandom, obscure technical information for 1960s-era Mopar vehicles and homebrew gun making that I couldn’t find in the library.

    Can’t say I’m sorry to see Geocities go, but this project is fantastic. Good luck!

  35. Babsy says:

    I just wanted to extend a huge thank you to everyone working on this archiving effort! I, too, spent an inordinate amount of time on Geocities back in the day, and I have many fond memories associated with them. For instance, I still remember how ecstatic I was when they upped their allotted web space to a whole 2 MB per person! 😀
    Best of luck to you, and thanks again!

  36. Mark says:

    Hi Jason, I hope you are successful. My oldest website is also on Geocities and I still have the original files. I am looking for webspace to host the stuff. Do I need to keep looking, or will you host the webpages again somewhere?

  37. Mark says:

    Thank you for backing up the websites. Will you also host them in the future? My oldest website is still on there and I still have the original files. Can you recommend a free host I can move the site to?

  38. bobby says:

    MDY…. I realize it’s on the Internet, it’s a web site after all. But, at the same time, I don’t see a reason for someone to be archiving it for me, not that Google doesn’t do that for me already. So…. I’ll just rip it down now and be done with it. Too bad the host site is going away though, it was sort of an Internet staple. It’s kind of like what would happen if McDonald’s discontinued the Big Mac; of course I doubt anyone would be running around collecting them for posterity :-)….maybe if they were Crabby Patties…..

  39. Zorin says:

    >Perhaps I WANT it to go away.

    Well that’s too bad; you shouldn’t have put it on the Internet.

    Think of it like a book; just because you wrote a book doesn’t give you the power to recall every copy ever sold. And no one should have that power; destroying information is an *evil* practice.

  40. latecomer says:

    GENIUS!!!

  41. […] (Jason Scott is trying to make a backup.) […]

  42. […] good news, someone is attempting to archive the whole lot and is making fast progress. I have mixed feelings about […]

  43. Thank you Jason for the hope of saving the effort I have put in since 1995. Personally I have lost some of my links and some of my sites, which Geocities removed some time back for inactivity and change of email. Now that I know that Geocities is running for the hills, I shall take care not to upload anything more and also start downloading my files from the net, as my hard disk crashed some years ago and I lost everything. Thanks for the extended time, and I hope that what I have shared with the thousands or millions of visitors who have enjoyed the data will be around for a little longer. Ronnie. I am also known as the ‘Bangalorewalla’.

  44. spenser says:

    Wow! Finally, another site that remembers that green on black is wonderfully easy to read. Not the gamer “black to be cool” theme, but old green-screen tech.

    For the poster above who has no free hosting for his geocities site, perhaps the answer is to sign up for a gmail account and squirrel it away over there for a while.

  45. sjoep says:

    Isn’t most of this already archived by archive.org? Not that I don’t appreciate your effort, I do. As a historian I rely on people saving sources like this for future research.

    • Jason Scott says:

      Short form: No, it’s not already archived by archive.org, although their hard work and efforts are appreciated.

  46. Matthew says:

    I am so impressed. The moment I heard that Geocities was going down I wondered if someone was doing this – it’s fantastic that you are.

    I hope that one day we will have big international organisations to preserve internet history. Good luck until then!

  47. CharlesV says:

    Dreamhost is offering _2 years_ of domain registration and hosting to those who can prove they were geocities users (create a page saying “I’m Off to Dreamhost!” on your geocities account and submit it with your account registration). Not shabby!

    http://blog.dreamhost.com/2009/04/24/theyre-internet-history/

  48. Jason, thank you so much for what you are doing! I posted (on bambicni.stumbleupon.com and bambismusings.tumblr.com) about my concerns about the loss of the geocities sites – where folks put up some great information and/or genealogy trees and some no longer have an active account because they died, lost their password and couldn’t figure out how to get back in, or maybe didn’t create a yahooID when it was purchased by Yahoo! and couldn’t back it up, or a slew of other reasons. Sure, there’s a lot of stupid stuff too, but there are some great sites on these types of free sites. It is great that you are doing this!

  49. Mark Braun says:

    If my backing up is just a safety net until somebody like you gets the Nobel Prize for creating a Noah’s Ark for webpages, I’m OK. I’m happy to pay anybody to host this and a couple of other small sites, if only to keep my clubs alive.

    Good luck bud. A daunting task that you’re taking on but probably the feat that will become a legend!

  50. […] Here is more on the subject in the form of a brief history lesson and update from Jason Scott: […] ASCII by Jason Scott / Geocities: Lessons So Far […]