ASCII by Jason Scott

Jason Scott's Weblog

Saved! Sort of.

As time goes on, people begin to use phrases like “saved” when it comes to Archive Team projects. That’s not quite accurate.

If a website or webpage is simple, utilizing only images and text, chances are pretty good that we can get a reasonable copy of it. If, however, it utilizes any strange scripting, access control, or any of the modern craziness that we see on the web, things become pretty dark pretty quickly. Sometimes photo galleries have JavaScript zoom, or some of them use YouTube or some other services to feed out the video. Our scraping falls apart if you need some sort of plug-in or program to do even the simplest of maneuvers.
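As a rough illustration of that dividing line, here is a minimal Python sketch that scans a page's HTML for the kinds of tags that tend to trip up a plain crawl. This is not Archive Team's tooling; the tag list and the names `RISKY_TAGS`, `RiskScanner` and `risky_tags` are assumptions made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical list of tags hinting that a static copy may lose something:
# scripts, plug-in content, and frames pulling media from other services.
RISKY_TAGS = {"script", "embed", "object", "iframe", "applet"}


class RiskScanner(HTMLParser):
    """Collects every start tag that appears in RISKY_TAGS."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag in RISKY_TAGS:
            self.found.append(tag)


def risky_tags(html):
    """Return the sorted, de-duplicated risky tags found in an HTML string."""
    scanner = RiskScanner()
    scanner.feed(html)
    scanner.close()
    return sorted(set(scanner.found))


if __name__ == "__main__":
    simple = '<p>My cat</p><img src="cat.png">'
    fancy = '<div><script src="zoom.js"></script><iframe src="video"></iframe></div>'
    print(risky_tags(simple))
    print(risky_tags(fancy))
```

An empty result suggests the images-and-text case; anything else is a hint, not a verdict, that the saved copy may be missing behavior the original page had.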

In the case of, for example, Hyves, a whole bunch of different problems are showing themselves in the saved pages. Part of its power and character was that people could modify all sorts of different things on the page, and you would see fundamental differences from site to site. Even as we were grabbing websites at the rate of fifteen a second, we knew we were going to miss things.

[Image: hyvesgrab]

That said…

One of the most inspiring parts of the release of the GeoCities torrent was the amount of work and curation that has been done with the data by Dragan Espenschied, who not only downloaded and analyzed the resulting files carefully but also created reports and graphs and discussed the errors and mistakes made along the way. His and Olia Lialina’s ongoing Tumblr weblog gives snapshots of pages long gone, with commentary and themes. He even had direct state funding for a year to re-create a virtual machine that could provide GeoCities pages comprehensively and easily.

My hope is that some academics, researchers and other people who have an investment in the history of Hyves will go through our 25 TB of data and help re-create it in a more robust and involved fashion. I’m not sure what will come out the other end, but there is a record of what happened there, no matter how shallow it might be in some places.

I compared it to the difference between seeing a picture of your home and having artifacts from inside the home: maybe you want the second, but the picture might be enough for you to remember a lot that you might’ve forgotten.

The resulting items will be tattered in some places, perfect in others. We are not saving the domains, or the full context of what this all was. We’re just stopping pure oblivion from occurring in the name of progress and perceived liability. It’s a minor point, but an important one.

[Image: sample_damaged_file]

The Archive Team has taken many millions of snapshots of many homes. Many are long gone and we pack our little WARC files and .TAR archives and send them into the annals of history. Here’s hoping the future appreciates it.


Categorised as: Archive Team
