ASCII by Jason Scott

Jason Scott's Weblog

ASK THE GUY MIRRORING GEOCITIES

Welcome, one and all, to ASK THE GUY MIRRORING GEOCITIES, where you get to ask questions and get answers from the freaky guy who has been mirroring Geocities along with a lot of other people! Let’s get started!

Q: So how is all that going, anyway? You did stuff and then you disappeared when it got good.

A: Well, after we finished downloading as much of Geocities as we could, we all started living again like normal people. I went back to editing my movie, getting things going with my fundraiser, and being a human being. Meanwhile, I started to go over the basic architectural aspects of the Geocities data we downloaded.

Q: So that didn’t take long and you’re done, right?

A: Heck no! The Geocities download is as fucking arcane as the process by which they deleted Geocities in the first place! We’ve had looping directories that eat up hundreds of links, tons of duplicated files, lots of issues with capitalization (Geocities liked capitalizing words in URLs except when it didn’t) and just general ass-grabbery with the lineup. I’m going through it all to make a curated piece.
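
For the curious, here is a minimal Python sketch of how the three problems above (looping directories, duplicated files, and paths that differ only by capitalization) could be flagged in a local copy of a mirror. It is not the actual cleanup script; the function names, the three-repeat threshold for loops, and the use of SHA-256 to spot duplicates are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def walk_files(root):
    """Yield every file under root as a '/'-separated path relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield rel.replace(os.sep, "/")


def looks_like_loop(rel_path, min_repeats=3):
    """Heuristic: True if some run of path segments repeats back-to-back
    min_repeats times, e.g. 'a/b/a/b/a/b/index.html' (a crawler chasing a looping link)."""
    parts = rel_path.lower().split("/")
    n = len(parts)
    for size in range(1, n // min_repeats + 1):
        for start in range(n - size * min_repeats + 1):
            block = parts[start:start + size]
            if all(parts[start + k * size:start + (k + 1) * size] == block
                   for k in range(1, min_repeats)):
                return True
    return False


def case_collisions(rel_paths):
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in rel_paths:
        groups[path.lower()].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


def find_duplicate_files(root, chunk_size=1 << 20):
    """Group files under root whose contents have the same SHA-256 digest."""
    by_digest = defaultdict(list)
    for rel in walk_files(root):
        digest = hashlib.sha256()
        with open(os.path.join(root, rel), "rb") as handle:
            # Read in chunks so large files don't have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(rel)
    return {key: sorted(paths) for key, paths in by_digest.items() if len(paths) > 1}
```

Anything `looks_like_loop` flags would still need a human look, since a site could legitimately nest folders with the same name.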

Q: So you’re going to put everything up, right?

A: Oh, probably eventually, but right now I’m just concentrating on writing scripts that clean up the pre-Yahoo Geocities, the portions of Geocities that are in the www.geocities.com/Neighborhood/XXXX or the www.geocities.com/Neighborhood/Suburb/XXXX formats. This alone is looking like something around 300-500 gigabytes of material.
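
Here is a rough Python sketch of how URLs in those two formats might be picked out, with the neighborhood name normalized to one capitalization along the way. It assumes the XXXX stands for the four-digit address number, and the `NEIGHBORHOODS` table is a short, partial sample made up for illustration; neither the names of the helpers nor the pattern come from the actual scripts.

```python
import re

# Partial sample only (an assumption for illustration); the real roster was much longer.
# Maps lowercase name -> canonical capitalization.
NEIGHBORHOODS = {name.lower(): name for name in [
    "Area51", "Athens", "CapeCanaveral", "EnchantedForest", "Heartland",
    "Hollywood", "SiliconValley", "TimesSquare", "Tokyo",
]}

# geocities.com/Neighborhood/XXXX[/...] or geocities.com/Neighborhood/Suburb/XXXX[/...]
PATH_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?geocities\.com/([^/]+)/(?:([^/]+)/)?(\d{4})(?:/(.*))?$",
    re.IGNORECASE,
)


def classify(url):
    """Return (neighborhood, suburb, address, rest) for a pre-Yahoo-style URL, else None.
    The neighborhood is returned in canonical capitalization; the suburb is left as found."""
    match = PATH_RE.match(url)
    if not match:
        return None
    hood, suburb, address, rest = match.groups()
    canonical = NEIGHBORHOODS.get(hood.lower())
    if canonical is None:
        return None  # not a known neighborhood, e.g. a member-name URL
    return canonical, suburb, address, rest or ""
```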

Q: Why not put everything up immediately?

A: I’m a pretty hard-core incrementalist, and I’m trying to make a torrentable/give to archive.org/share with friends/stare at endlessly version of the original Geocities that will be a bunch of .tar.gz files which you can throw somewhere and go FUCK YOU, YAHOO, WE GOT IT ANYWAY. I have two personal rules in effect here: first, it’s better to burn bridges using burning pieces of previously burned bridges, and second, before something becomes canonical, make sure you did your damnedest to get it in pretty good shape. That’s what I’m doing.
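
The .tar.gz end of things is the easy part. As a minimal sketch, assuming the cleaned mirror is laid out with one top-level directory per neighborhood (both that layout and the `pack_neighborhoods` name are made up for illustration), Python's standard `tarfile` module can write one archive per neighborhood:

```python
import os
import tarfile


def pack_neighborhoods(mirror_root, out_dir):
    """Write one gzip-compressed tarball per top-level directory under mirror_root.
    Assumes (hypothetically) that each top-level directory is one neighborhood."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(mirror_root)):
        src = os.path.join(mirror_root, name)
        if not os.path.isdir(src):
            continue  # skip stray files at the top level
        dest = os.path.join(out_dir, name + ".tar.gz")
        with tarfile.open(dest, "w:gz") as tar:
            # arcname keeps member paths relative, e.g. Area51/1234/index.html
            tar.add(src, arcname=name)
        written.append(dest)
    return written
```

One archive per neighborhood would let someone unpack just the corner they care about instead of the whole thing.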

Q: But reocities.com is kicking your ass!

A: Seriously, it’s not a race to see who can “save” Geocities faster! It’s all about getting the data back. Everyone’s doing it in different ways. This is mine. I want geociti.es to be the quality choice, or at least a quality choice.

Q: What is with the goggles?

A: They let me see into the future, when you stop asking me questions about this! It’s all coming! Trust me! Go read this article at time.com, which has both me and the reocities guy and closes with my favorite pissed-off-historian quote I’ve ever had quoted in a news piece!


Categorised as: jason his own self



4 Comments

  1. jas says:

    how about working together with the other guys that were grabbing geocities?

    seems like all of the people involved in this had a lot of drive and know-how. seems pooling efforts would only be fruitful.

  2. deepgeek says:

    500 GB? Holy S***!

    What do you think that will gzip down to? Must be a big torrent, and I guess if we want to actually browse the archive, we have to unzip it suburb-by-suburb.

    Any chance of a “highlights” version?

    DG

  3. nimbus says:

    I second that (re: a “highlights reel”). I’m interested, and saving the stuff is A Good Thing, but it’s going to be hard to find anything really good even once it’s been sorted out. What would be great would be some sort of voting system so the most interesting stuff rose to the top (like on Digg).

    That said, Wired featured a few selected geocities items recently: http://www.wired.com/rawfile/2009/11/geocities/

  4. DeepGeek says:

    The great thing about this version of geocities will be that it is a ware. I think I will actually get myself a separate USB drive for this. It would be nice to figure out a way of just grepping through all the HTML files. I wonder if grep has a “recursive” flag. If not, you could always use HTTRACK to rearrange the whole thing and put the HTML files into one directory for ease of grepping.

    “The Cautious” usually appreciate having a way to search things themselves!

    DeepGeek