ASCII by Jason Scott

Jason Scott's Weblog

Datapocalypso!

Well, nothing like coming back from a holiday trip to find yourself slashdotted. Actually, there’s really nothing like finding yourself slashdotted back in 2001. Nowadays, not so much: a mere six thousand visitors over a couple of days, with a few more crawling in over the days after that. Web servers have gotten pretty good at what they do, and Slashdot has changed over the years; it’s no longer the bottleneck and hit-gun it used to be.

What hasn’t changed at all is that awesome level of Opinion Tourist that accompanies the wave of page views and commentary. Within a short time, both on the Slashdot site itself and in the comments section of my entry, came a gaggle of responses ranging from insightful to, shall we say, distracted.

This jury of my sort-of-peers arrived as a result of this weblog entry, in which I describe the absolutely shitty way that AOL Hometown was shot behind the shed, and the deeper meaning and ramifications of this act with the amount of shared data resources we now depend on, not to mention the cultural/anthropological loss.

Positive responses were enjoyable/informative to read, as anybody agreeing with you tends to be an enjoyable read. Non-positive responses can be lumped into the following general headings:

  • Allow me to explain that I keep backups. I am awesome.
  • HOLY CRAP YOU SAID LAW I HATE LAW NOT SURE WHAT ELSE YOU’RE SAYING BUT HOLY SHITFUCK YOU SAID LAW
  • If The Service Is Free, No Compassion For Thee
  • Your comparison to evictions is terrible, based on a number of criteria that I mostly made up
  • I sure do hates me some AOL Hometown and the People Who Use It
  • Watch as if by magic I say something indicating that not only did I fail to read your weblog posting or consider what was discussed, I apparently didn’t even read the slashdot summary or, possibly, my screen.

Getting hung up on the solution set is a classic problem of a left-brained person: they see the issue being discussed and then a proposed solution, and instead of acknowledging the problem, they start re-engineering and insulting the solution, pointing out its flaws. So a lot of people got waylaid by me saying “law”, some using the example of “I do free hosting, I would never be able to do this” or “How dare you consider passing a law, that’s like shoving radioactive rods up the nose of children.”

Let’s talk terms here. A guy who is giving free hosting to a couple buddies or even a business or two, especially if he’s not actually incorporated or a business, is a fucking couch. Obviously when we’re discussing the liability of hotel chains your lame RV parked over in the Wal-Mart down the street doesn’t count with regard to Innkeeper Legislation. OK? OK.

Similarly, I happen to think law is the solution in this case, because I am convinced the stakes are so high, with data being so critical to our infrastructure, not to mention our history. You may differ, and that’s why some places require almost no formal education to become a teacher while other places subject you to massive interview and evaluation cycles to get anywhere near a group of students alone. The Spectrum of Opinion, welcome to it. The fundamental discussion is whether this data retention is of sufficient importance to warrant attention.

One brain surgeon or two explained to me that real-live eviction law doesn’t allow for maintaining of space for an orderly exit. I beg to differ.

And as for “free”, I think we’re going to have a few rounds of root beers over whether a place, like Google, that browses through your e-mail via robots and uses it to generate statistically relevant advertisements on your page, or places like Flickr that do in fact have advertisements for seeing your content and charge you on top of that for additional features, or places like Ustream that have profit-sharing and used to do indirect advertisement but now overlay ads on your content, are “free”. Some people confuse “no money down” with free and that’s why they’re getting fucking kicked out of their houses, finding themselves at the mercy and procedures of actual eviction law.

But then again, people are people, and they turned a lot of this into a discussion of AOL Hometown specifically, and AOL Hometown policy, and what’s “right” and “good” and what they “had” or “didn’t” do, as if this wave of “sorta” was going to be convincing under any harsh light.

So let us be crystal clear about what the situation was with AOL Hometown’s shutdown; this was not a case of catastrophic disk failure or a revelation of bad data integrity practice. We’ve had plenty of these, most recently with a site called journalspace.com that was mirroring its data but not backing it up, and lost it permanently. This happens all the time and is really sad and we all get to stand here in the future and point fingers and giggle at the past, but it is not the situation being discussed here. AOL chose to shut down the site and pulled the data from being accessible; the webserver was disconnected and the URL redirected to a new location, where a smug little weblog posting was all that remained to mark its passing. The fact that you utilize a multi-tier self-started backup operations paradigm across geographically variant hoo-ha is not what we’re talking about, Poindexter.

This was a case where someone, or a group of someones, made a decision to take the site down, and by take down, they chose to rip it down posthaste, with a specific amount of “warning time”, accompanied by a flawed, scattershot attempt to mail everyone associated with the sites, followed by a massive, global redirect of many tens of thousands of “sites” to a single weblog posting. This procedure happened because of money issues, most certainly, and likely not out of a sense of evil or meanness, but it also happened in an environment where this approach was considered legitimate and valid. This is the heart of what I’m trying to get to: they saw absolutely nothing wrong with this.

They could have spun the data off to a separate firm. They could have contacted archive.org about a transfer of sites. They could have alerted the media in the manner that, say, some entities have posted public notices about auctions or bankruptcies. They could have made the timetable six months instead of four weeks. And, once the data was down, they could have provided a read-only, FTP-only, or otherwise non-browsable access point from which a person with the proper credentials could retrieve said data for months afterward, just like (for another real-world example) a commercial entity will tell you that the material you used to keep with them is now being kept in a new location and, with proper ID, you can retrieve it. They don’t just burn the fucking building down and then put up a sign saying “We burned the building! Thanks for visiting!”

But right now, this approach is considered so inherently A-OK that a good percentage of people writing responses or nib-nabbery paragraphs about this situation totally skipped over it. Not only was the situation acceptable, it was beyond that: it’s considered normal. One of the core points I’m trying to make is that it shouldn’t be considered normal. As an archivist, this horrifies me. As a historian, it saddens me. And as a fifteen-year user of what people now call The Web, it infuriates me.

I happen to think it specifically being AOL Hometown is beside the point, but some people have decided to focus on it being AOL Hometown and ignore the larger issues, and never let it be said I can’t drill down to the specific from the general. So let’s go enjoy a history lesson.

September, 1998. The internet is still new enough that Jon Postel will not be dead for another month. Google has just been incorporated as a for-profit company, Paypal has been founded, and MySQL has been introduced. A 25 gigabyte hard drive is about to be announced by IBM. And does America Online have a deal for you!

If you are a member of AOL, your dashboard has a new service announced, called AOL Hometown. All you need to do is tell your AOL dashboard to pull in your site and they will double your disk capacity. For the majority of people pulling over to the service, this is an offer that’s almost impossible to refuse: people crow at the expansion of their sites up to 12 megabytes of disk space. Remember, though, that you can’t just be any old schmuck to sign up; you have to already be an AOL member, and it’s provided as part of your service with AOL. As you can see in this message, it definitely costs money, and isn’t even cost competitive, in a world where you can get an extra 10 megs of hosting space for a dollar, “like you would use that much space to begin with”.

A lot of people hate AOL. AOL caused the September that never ended a mere five years earlier, and even though Netscape and Microsoft were in the middle of what was called The Browser Wars and had done their part to turn the previously-more-technical Internet into a graphical interface, AOL led the field in how many users it put online per unit of preparation. People were being shoved onto the Internet at large and given very little in the way of direction, but they all knew that one of the best values in the world was having A Web Page.

With A Web Page, you see, you could create a full-color, hyperlinked, beautiful page about any subject you wanted. At a time when color printing could cost you a dollar a sheet of paper, you could have a full-color presentation available all over the world. Perhaps a person who now carries a music machine with 80 gigabytes can’t envision this, but this technology was amazing, vast, and falling into the hands of people who wouldn’t have ever composed a newsletter, or even a diary. While it probably would have been great if everybody had been given Netcom accounts and made to learn about HTML the hard way, the trend was towards easier and easier methodology, and most importantly, some very non-technical people were given a voice.

AOL itself announced its merger with Time Warner in 2000. Its fortunes rose and declined. Through this, AOL Hometown was shifted around, customer service experienced deep and wide-ranging changes, and over time it became harder to get customer service, to make sure you were notified of changes, and to be given news of your website, one you might not have changed in half a decade but which contained information, hopes, dreams, history.

The approach by which AOL deleted AOL Hometown was haphazard, obviously ad-hoc, and internally inconsistent. If you try and see the page where they explained to people how they might export data, you will find that this page has been deleted and backdated.

While four weeks seems like more than enough warning to some folks, the question remains: why? There are some very specific rules regarding the retention of financial data for a licensed company. Yahoo search logs are currently kept for 90 days, but that’s just Yahoo making stuff up based on social pressure; there’s no law in effect regarding this. Google simply says they will maintain your deleted mail for a limited time; there’s no policy with regard to how they would bail out Gmail data to you if they went under or shut down the service.

What? Google? Pshaw you say. Google is forever! Well, look no further than their shutting down of Lively, with a goodbye message that clearly states that “thousands” of chat rooms, locations and avatars were built by users, which represented probably hundreds if not thousands of hours of work. And what was Google’s export policy? Well, please take a gander at their announcement: “We’d encourage all Lively users to capture your hard work by taking videos and screenshots of your rooms.”

FUCKING AWESOME. If a place that encouraged you to come in and place items you’d created, and was suddenly going to shut down, told you that you could use a point-and-click camera at the window to capture your “hard work”, you would be setting motherfuckers on fire.

The point is not “Google”, not this or that, but a general malaise and terrible lack of ethics with regard to this work. It’s considered a normal thing to ask if a discontinued product will be open sourced. They might say no, but it’s considered quite normal to ask. Similarly, it’s totally within the realm of reason to throw ideas of Creative Commons at any generated artwork; places like Flickr and now Wikipedia kind of force-funnel you down that road. You can choose not to, but it’s weird and a special case.

By the way…

Need further proof that AOL is just making stuff up like everyone else? Throughout November and halfway through December, all the files were still accessible. You could log in via an FTP site and download your files. It is believed they had 25 servers with this data, and they decided to delete them all at once, with no retention. AOL never explained this. Why should they? There’s no social stigma other than being told they suck, which the remaining employees are quite used to hearing. There’s certainly no law on retention or accessibility of this data. There’s just the chortling echoes of technically-savvy website owners or people sitting on equally-shaky footing wiping their brow and being glad this sort of thing will never happen to them.

Until it does.

So what do we do?

Well, let me give a personal example.

Through one of the weblogs I browse, I found out a website called podango.com (a podcast hosting site) was going down. The word had gone out to subscribers of the service that the company was going to be going through some rough times, much as a hedgehog being thrown into a blender was in for some tough times, and maybe you should get your shit off our servers immediately. In line with what I’ve been talking about, they gave everyone five days at the end of December 2008 to do it. Five days. Five days versus four weeks; what’s the goddamn difference? Technically savvy people were given less than a week, over Christmas, to figure out how their data was going to be transferred and how to get their RSS feeds moved. Some people came back from holidays and found all their shit gone. Didn’t check e-mail during Christmas? Sorry, podcaster!

So what did I do?

I fucking downloaded it.

Check this out, kids:

30506       applephoneshow.podango.com
8242        caseclosed.podango.com
4           developer.podango.com
58280632    download.podango.com
5916        gildersleeve.podango.com
4           image.podango.com
14          insidepodango.com
4           my.podango.com
20          sites.google.com
2835048     supernova.podango.com
18128       suspense.podango.com
4           www.ilifezone.podango.com
8512904     www.podango.com
# find . -print | grep '[Mm][Pp]3' | wc -l
    4080
What you’re looking at is about 70 gigabytes of data from podango.com, lock, stock and barrel. Over 4000 distinct episodes of podcasts. It took my machine five solid days to do it, but I downloaded all of that lame site. Do I have a favorite podcast on there? No. Did I know someone with a podcast on there? No.
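If you’re sitting on a mirror of your own and want the same kind of episode count, here is a minimal Python sketch of that `find . -print | grep '[Mm][Pp]3' | wc -l` pipeline, assuming the whole mirror lives under one local directory. Like the grep (or `grep -i mp3`, as a commenter points out), `count_mp3_paths` counts every path with “mp3” anywhere in it, directories included, so it can run a little high; `count_mp3_files` is a stricter alternative that only counts file names ending in `.mp3`. Both function names are made up for illustration.

```python
import os


def count_mp3_paths(root):
    """Mimic `find ROOT -print | grep -i mp3 | wc -l`.

    find -print lists the root itself plus every directory and file under
    it, and grep tests the whole path, so a file inside a directory named
    "mp3s" matches too.
    """
    count = 1 if "mp3" in root.lower() else 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if "mp3" in os.path.join(dirpath, name).lower():
                count += 1
    return count


def count_mp3_files(root):
    """Stricter count: only file names ending in .mp3, in any case."""
    return sum(
        1
        for _dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
        if name.lower().endswith(".mp3")
    )
```

Call both on the mirror directory, e.g. `count_mp3_paths("/path/to/mirror")`; if the two numbers drift far apart, directory names or non-audio files are padding the grep-style count.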

I did it because I had the means (disk space), the motive (the sense of history and the recognition that this was historically relevant work representing thousands of hours) and the opportunity (a fast connection and five days before they were to die). A back-of-envelope calculation tells me I just rescued 41 days of podcast, along with all relevantly hosted images, show descriptions and XML data.
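For the curious, the arithmetic behind “about 70 gigabytes” can be checked straight from the listing above. This sketch assumes those figures are `du`-style sizes in 1 KiB blocks (GNU `du`’s default; BSD and macOS `du` count 512-byte blocks, which would halve everything), and it treats each of the 4080 grep matches as one episode, which is only an approximation.

```python
# Sizes from the listing above, assumed to be du output in 1 KiB blocks
# (GNU du's default unit; BSD/macOS du reports 512-byte blocks instead).
DU_KIB = {
    "applephoneshow.podango.com": 30506,
    "caseclosed.podango.com": 8242,
    "developer.podango.com": 4,
    "download.podango.com": 58280632,
    "gildersleeve.podango.com": 5916,
    "image.podango.com": 4,
    "insidepodango.com": 14,
    "my.podango.com": 4,
    "sites.google.com": 20,
    "supernova.podango.com": 2835048,
    "suspense.podango.com": 18128,
    "www.ilifezone.podango.com": 4,
    "www.podango.com": 8512904,
}

total_kib = sum(DU_KIB.values())
total_bytes = total_kib * 1024
print(f"total: {total_kib} KiB = {total_bytes / 1e9:.1f} GB = {total_bytes / 2**30:.1f} GiB")

# "41 days of podcast" averaged over the 4080 grep matches,
# assuming (roughly) one match per episode.
EPISODES = 4080
minutes = 41 * 24 * 60
print(f"average: {minutes / EPISODES:.1f} minutes per episode")
```

Under those assumptions the total comes to about 71.4 GB (66.5 GiB), in line with the figure above, and 41 days over 4080 files averages out to roughly 14.5 minutes per episode.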

This one will pay back immediately; people are already contacting me, profusely thanking me.

So what am I saying here?

We need the A-Team.

archiveteam

ARCHIVE TEAM would be like CERT (the Computer Emergency Response Team) used to be, where it was a bunch of disparate people working together to solve a problem in a nimble and networked fashion. They’d find out a site was going down, and they’d get to work.

They’d go to a site, spider the living crap out of it, reverse engineer what they could, and then put it all up on archive.org or another hosting location, so people could grab things they needed. Fuck the EULAs and the clickthroughs. This is history, you bastards. We’re coming in, a team of multiples, and we will utilize Tor and scripting and all manner of chicanery and we will dupe the hell out of your dying, destroyed, losing-the-big-battle website and save it for the people who were dumb enough to think you’d last. Or the people who, finding you’d been around forever, had the utter gall to not be near their computers during your self-created, arbitrary sunset period.
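In practice, spidering a site starts with something like the following: a minimal breadth-first crawler using only Python’s standard library, offered as a sketch rather than a finished tool. It assumes plain server-rendered HTML, stays on the starting host, and stops after `max_pages`; `crawl`, `same_host_links`, `LinkCollector` and `max_pages` are illustrative names. It ignores rate limiting, robots.txt, logins, and pages built by JavaScript or Flash, and real jobs are usually better served by an established mirroring tool such as `wget --mirror` or HTTrack, which a commenter mentions below.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect raw href/src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_host_links(base_url, html):
    """Resolve links against base_url, drop #fragments, keep same-host http(s)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    found = []
    for link in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.append(absolute)
    return found


def crawl(start_url, max_pages=100):
    """Breadth-first fetch of pages on start_url's host; returns {url: bytes}."""
    seen = {start_url}
    queue = deque([start_url])
    saved = {}
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=30) as resp:
                body = resp.read()
                ctype = resp.headers.get_content_type()
        except OSError:
            continue  # dead link, HTTP error or timeout: skip it, keep going
        saved[url] = body
        if ctype == "text/html":
            # Decode as UTF-8, replacing bad bytes; a fuller crawler would
            # honor the page's declared charset.
            text = body.decode("utf-8", errors="replace")
            for link in same_host_links(url, text):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return saved
```

Something like `pages = crawl("http://example.com/", max_pages=500)` returns a dict of URL to raw bytes (images included, since `src` links are followed), which you would then write to disk under whatever directory layout suits the eventual upload.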

Archive Team would also help publicize your demise in their mailings and discussions, getting the word out to a greater audience that you were dying. If law isn’t the answer, vigilante teams of mad archivists are the answer.

I really don’t have the time to formalize this, so feel free to take up my torch and run around setting barns on fire. I’ll stay on #archiveteam on EFnet for a while, in case people want a place to hang out. Set up a bot. Set up a way to communicate things are dying or that a site needs reverse engineering to yank the crap out. Find out where a mirror ended up, or what needed said mirror. Let’s do this thing.

Anyway, so there’s my clarification.

UPDATE: Holy crap, Archive Team is now real. Check it out.


Categorised as: computer history

Comments are disabled on this post


40 Comments

  1. Oh, DAMN. I want to join the A-Team!

    Seriously, it seems like the kind of thing one could throw distributed computing at — for the spidering, at least.

  2. Fred Blasdel says:

    CERN (the Computer Emergency Response Network)

    WTF? Surely you were attempting to refer to CERT (Carnegie Mellon University’s Computer Emergency Response Team), and not CERN (Conseil Européen pour la Recherche Nucléaire).

    • Jason Scott says:

      Awesome catch, Fred. The Archive Team needs your anal approach to this daunting task.

  3. Steve S. says:

    Of relevance, Pownce had a similar shutdown recently. Pownce was bought and users were given 2 weeks to export their data before the service was shut down and the data gone forever.

    http://www.techcrunch.com/2008/12/01/pownce-deadpooled-team-moves-to-six-apart/

  4. Ryan Russell says:

    I had those floppy cases.

    How about this for a law: escrow your crap to the a-team upon epic fail?

  5. randomwalker says:

    grep has a “-i” option for case-insensitive search.

  6. ben says:

    hi,
    when I look over what’s going on at ficlets, I think there is an additional negative effect of “insensitive” shutdowns: the erosion of trust in sites like these. People will use them less enthusiastically, and that in turn runs against the purpose and the spirit of all the internet services being built.

    Maybe I could help building a lightweight solution that is not perfect, but better than a total loss:

    – setting up a site like archiveteam.org with some information (faq, …)

    – a mailing list where people can report and discuss datapocalypsos

    – a script that can be run by some few a-team-core people, that does:
    1. scrape the whole site
    2. make a split archive (max. 2GB tarballs) of all the files
    3. strip the pages down to bare html (removal of images, styles, and javascript)

    – the whole (bare html) sites are listed on a simple index and searchable by a google custom search.

    – the tarballs with the entire sources are on a restricted ftp site for the core members to download. They distribute it via p2p/torrent.

    So when a not-so-tech-savvy user comes to the site, he can look for his content. If that’s not enough, each of the bare html sites has a reference to the tarball (eg. ficlet-20080923-4.tar.gz). He can try to get the files via p2p/torrent.

    Maybe we could try this with ficlets.com.

    ben

  7. Mary says:

    What a passionate plea. Thank you. I wish there was some way I could help. This cause has needed a voice like yours for a long time.

  8. Random reader says:

    “Getting hung up on the solution set is a classic problem of a right-brained person”

    Left-brained, you mean?

    • Jason Scott says:

      The classic problem of a left-brained person is calling other left-brained persons a right brained person. Fixed, thanks.

  9. Andrew says:

    Interesting post, as was your original on AOL Hometown (Obviously if I posted in that discussion it’d get lost in the noise).

    I’ve recently been looking at grabbing sites for archiving (even just specific pages, but usually sites – say a weblog subdomain, sometimes an entire site). I use HTTrack for my initial tests (being on Windows) – however, I’m interested to know – Jason, what do you use, and do you know any decent websites with info on exactly what you propose – archiving entire sites? (I’ll keep this post open if you reply here I guess).

    While I am specialising in IGDA Game Preservation work, and related IA collections, so that’s more my interest, the general field of websites disappearing is terribly bad. Having a guide on how to get a copy of said site is always a good thing to have around for such an emergency (I do realise Flash-based sites are likely a no go with any method of downloading, but it’s mainly normal sites I’m worried about).

    There’s also cases when a site entirely changes at date X, where the IA crawlers might have a few pre-event pages, it’s also good to have a copy of the old site.

    If anything actually gets setup for this it’d be awesome to know. The IA would be a nice place to use (at least for disk space, which I don’t have a lot of ATM), but I’ve no idea what team they have for website archiving to be honest.

  10. Drew Wallner says:

    This post became increasingly relevant to me today, as Livejournal has apparently imploded and just fired most of their staff (including every single engineer). Lovely. Time to start looking at backup/download solutions for seven years of entries…

  11. Edward says:

    They saw nothing wrong with this because there was nothing wrong with this. It is considered normal because it should be considered normal. You try to make a point that it shouldn’t be considered normal, and that point comes out of your own predilections, tendencies, and preferences. I don’t see any reasonable or rational cause or reason to suggest that any alternative should be expected or obligated by law.

    As an archive fan, you seem to have an unwavering desire to lock the history of everything into some semi-permanent state of reflective accessibility. Nobody should be obligated to facilitate this. Heck, you label an entire archive of podcasts as historically relevant based on, I guess, an abstract notion that everything is historically relevant.

    Don’t conflate that issue with the fact that AOL is a terrible company and when you deal with terrible companies you deserve what you get in principle. That is an entirely separate issue. I think the real issue is that people got Service X from Company Y, and Service X was not an archiving service, a backup service, nor a web hosting service with in-perpetuity accessibility. There should be no obligations or expectations beyond whatever was spelled out in the terms of service on signup. I am assuming that this condition is satisfied in all the egregious cases you spell out, since your key point is not attacking that but instead hoping for a principle of obligatory archiving.

    Furthermore I don’t see how de facto archiving serves the good of anyone outside of the directly interested or the archive fetishist. If the directly interested parties are not making appropriate archiving decisions then their negligence is not my, nor society’s, concern. The archive fetishist is a special interest group that also is not my, nor society’s, concern.

    I adore your blog though. Don’t get me wrong, I think you are pretty rad. I think you are very misguided in your outrage, transforming justifiable lament (things suck, let’s talk about things that suck because they suck. I can agree and sympathise that this sucks) into a call for action that I think is pretty silly.

  12. Andrew says:

    I’m now reading the updated comments, and man, Edward just seems to embody almost all the things pointed out in Jason’s list that were taking his points wrongly, in this actual post. Deja vu eh? 🙂

    Edward; to be more specific to one important point, if archivists and historians can’t access the “defacto archive” data (be it this data or ANY data ever kept by anyone…) then there IS no history, and certainly Jason’s work would be highly limited! Allowing 4 weeks to get your site’s data is ridiculous for all the reasons Jason states and more, but basically since they were so terrible (not even trying to save the data themselves in any way) he points that out, but also points out better companies who do it just as badly too. Ignore any law argument and you still need to take into account the ethical, historical and social consequences of these decisions.

  13. Andrew says:

    Obviously my question got lost; but if you did have a quick and dirty secret to an easy way to mirror a page, site, or domain or whatever Jason, enquiring minds (well, mine) wouldn’t mind knowing 🙂

  14. Christina says:

    Count me in! I was about to alert you to the Livejournal thing, but it appears I’ve been beaten to it.

  15. Edward says:

    Andrew, I think I’m a combination of begrudged and confused. Even squinting I can’t seem to connect the list of misconstrued points with my response.

    Historians record history, not insist that historical things be preserved for people to observe in their original form in situ.

    It would have been polite and customer friendly to give more notification. That is the sum of what this situation has to teach us. Oh, and that you and you alone are responsible for backing up your stuff.

    Again, I really don’t feel I’m parroting the things Jason identifies as misconceptions. I’d appreciate a more detailed explanation if you insist that I am.

  16. Andrew says:

    /me shrugs

    If you see it that way, then it’s an odd opinion, but fine, whatever. I’m sure you won’t find a historian to agree to those points (in all cases, they’d prefer any digital data available to be usable in any future time – if you want an analogy, I’m sure an Archaeologist isn’t stupid for digging up old things to show us about history, even though physical things must change, digital things don’t even need to!), but fair enough, if you think that’s the way of it, I can’t exactly restate my comment, heh.

  17. Edward says:

    I never said a historian wouldn’t want the digital data usable in any future time – what I said is that it is wrong to say that a person who has data, or at one time hosts data, is obligated to satisfy or support the role of the archiver in preserving it on their network and hardware.

    I really don’t quite understand what your objection to that is. I’d be tickled pink if you’d indulge me in showing me what it is.

  18. ross says:

    enough talk archive vigilantes GO!

  19. Yoz says:

    I love the Archive Team idea. Have you spoken to Brewster Kahle and the rest of the archive.org gang about whether they’ve done this sort of thing before? Given that they have, on occasion, slurped in massive archive data from particular sources (though I’m not sure if any of those sources have been websites) they might have some good tips, and may even want to assist with storage & bandwidth.

  20. Deth Veggie says:

    “You label an entire archive of podcasts as historically relevant based on, I guess, an abstract notion that everything is historically relevant.”

    There really isn’t anything particularly abstract about that notion. It is a well-established and agreed-upon basis for much of modern archaeological and historical theory. The concept of “history” being defined as a limited series of “great events” by/around “great men” has been largely discredited for decades.

    – Your Friendly Neighbourhood Archaeologist

  21. […] ago, Eviction, or the Coming Datapocalypse, has kicked up a bit of dust. His argument (see also his follow-up post) is that services for hosting user-generated content need to take more seriously the consequences […]

  22. Andrew says:

    Edward, I have no idea what you’re on about because I never said that, and I don’t think that was Jason’s point (the point being that AOL gave limited access for a very limited time to the people who put the data on their servers in the first place).

    The secondary point about them not even attempting to get it archived anywhere is a mixture of ignorance of its importance, and even if they did know, lack of any real feeling that it “is important” – if people felt that (more should, they don’t realise how much is lost!) then they’d realise computer historians would be willing to take dark archive copies, or help preserve it. Computer Museums usually host large amounts of data like this (software with no original disks, information about things, and of course, pure data), and obviously the Archive.org might be able to help in a worldwide capacity too.

    It’s not a matter of having a law stating a host must at all times archive everything for the future, that’s a rather odd angle of thought, and most certainly wouldn’t work. Jason’s idea was to have an eviction notice period for hosts, with proper notification of events, which would be reasonable to expect for paid hosting and, as he stated, just as fair for “free” hosting (where free is made up of paid adverts). Sometimes sites don’t even allow data to be taken as a local copy, which if it is blogging software or likewise means it’s nearly impossible to get that local copy anyway, making it the fault of the host in that sense too.

    It’s even worse for things like virtual worlds, which unlike webpages won’t ever work without a server to connect to, therefore rendering things like “Lively” (and countless other shutdown MMOs) utterly inaccessible after they’ve been shut down. Access to data which should be easy (webpages) but isn’t (it’s immediately deleted, and no access allowed for the original people who put it there) is really bad for everyone.

  23. epc says:

    Way back at the dawn of time, well, 1995, IBM used to allow people to create personal web pages which were hosted at http://www.ibm.com/friends. We did this at various tradeshows, Internet World, WWW, etc. and built up maybe 300 active sites (well, pages, it was highly templated).

    Mid-1996 it occurred to us that we likely didn’t want to be hosting 3rd party content on the corporate web site and set about migrating people off. We set up an export and sent out emails for six months and had about a 40% success rate at moving people (including redirecting to their new sites). Dead silence from the remaining people. Close to two years later I got a flurry of email from people complaining that their sites had been deleted (which we did do, maybe six months after the initial notice).

    I think that anyone providing services online needs to have a winddown plan that accounts for the time it takes for people to understand what’s going on, decide whether or not to keep their data, and move it. AOL’s behavior is simply unacceptable since, in the scheme of things, it wouldn’t have “cost” them much more to keep the service up 90 days. What likely happened is a beancounter said “WTF? this service doesn’t generate enough revenue to pay for itself, it’s not funded for 1Q09” and made that decision some time in November 2008.

    At the same time, people who are using such services, anywhere, need to be present and aware. Not in a you-have-to-read-and-understand-my stupid EULA sort of way, but if you ignore repeated emails (or changed your email and not updated various services with the new address), if you don’t actively maintain the site, if you’re not engaged enough or value your own work enough to give a fuck until years later, then you do deserve some of the blame.

  24. […] he is inundated with snotty, cynical geek responses, and lashes back, in the process conceiving of a team of amateur archivists who spring into action, and mirror those […]

  25. Alex says:

    Re: backups. Simple analogy.

    Yes, you should get a smoke alarm and take out insurance.

    No, this does not absolve Crazy Arson Guy of responsibility for *setting fire to people’s stuff*.

    Yes, you should wear a seat belt.

    No, this does not mean that driving like an idiot is OK.

    Yes, you should stop smoking.

    No, this doesn’t mean spending billions of dollars lying about the fact cigarettes turn your lungs to shit over thirty years is OK.

    Basically what I’m saying is…don’t be evil. The categorical imperative. The Prime Directive (“Don’t be such a cunt about it”) applies here.

  26. Alex says:

    Further, who *erases data* anyway?

    It’s not like storage is expensive; in fact, for many values of storage, there is no measurable marginal cost (there’s a lot of underutilised HDDs out there) until you fill a really big bucket. (For example, the price of going from 250GB to 500GB is nutzoid cheap.)

    I really wonder if the person who did this can even measure how much they supposedly saved.

  27. Mike says:

    What you are doing is tremendously important. I agree that it is a disgrace when websites and website hosting companies shut down without allowing adequate time and resources for people to download their sites. Thank God you have a sense of history and the means to save some of them.

  28. Shii says:

    “Furthermore I don’t see how de facto archiving serves the good of anyone outside of the directly interested or the archive fetishist. If the directly interested parties are not making appropriate archiving decisions then their negligence is not my, nor society’s, concern. The archive fetishist is a special interest group that also is not my, nor society’s, concern.”

    Two years ago a free hosting service in Thailand went down. It took with it the sole website of the Thai Assembly of the Poor, an organization representing two million subsistence farmers and ex-farmer urban slum inhabitants, and with that went their self-description of how the World Bank’s “development” programs destroyed Thailand’s rural economy. Rather than attempting to rebuild their website, which is still visible in the Internet Archive, the Assembly started posting images of rural protests and activist work into unstable, unarchived Data Clouds such as blog hosting services and Slide.com.

    This shit is important, dude.

  29. Edward writes:

    “Historians record history, not insist that historical things be preserved for people to observe in their original form in situ.”

    Wrong, and stupidly so. How does one “record” events that took place centuries or millennia before one’s birth?

    Historians reconstruct the past from the massive and hard-to-fake wealth of documents left behind, most of which didn’t look all that earthshaking at the time: e.g., old phonebooks, diaries of people you never heard of, etc. All one needs to do to discover this is talk to an actual historian.

    But running your mouth off is ever so much more fun than checking your facts, isn’t it?

  30. Stripes says:

    All my “out there in the cloud” stuff I _also_ keep on my computer. So if my web site provider shuts down, I still have all the html files and pictures and movies and crud ready to upload somewhere else.

    If my digital picture sharing website goes away I still have them too.

    None of that makes me feel any less for the folks who lost their painstakingly created digital scrapbooks or whatnot…

    A long time ago I had all my digital photos on some pay-to-print service or other. They notified me they were shutting down and were no longer able to send me the digital files back, but could accept orders for prints for a little longer. It gave me a short lead time, and I ignored it because that service was my “backup” and all my photos were “safe” on my laptop. Then my laptop ate them. Which made me very unhappy.

    I had a backup of that laptop, but it didn’t help, because the way in which my pictures were eaten left the thumbnails intact. So I had been backing up only the thumbnails, and at some point the oldest backup was the only one that still had the real pictures, and then that went away too.

    Which is a great shame.

    It is also a decent reminder to folks that think they “have it covered”. All it takes to screw you is for a few more things to fail at the right (excuse me, wrong) moment.

    That said, if we make a set of laws about “digital evictions”, one had best hope that they explicitly define who doesn’t have to participate, because most computer laws don’t, and that has sometimes been a serious problem, either for people who didn’t think a law should apply to them, or for folks it shouldn’t apply to whose company lawyers DO think it MIGHT apply.

    You have to hope that when the law gets to the other side of the meat grinder we call Congress, it doesn’t penalize folks who take decent (but not heroic) steps to keep your data backed up if the backup system fails.

    You are also going to have to accept that it will affect what new free or low-cost services companies are willing to provide. After all, they will have to factor in the cost of providing a legally defined orderly shutdown, plus making sure that their backups are “good enough” to pass legal muster.

    Hope, too, that the actual service isn’t “close enough” to something that looks like it _should_ fall under this law that a lawyer thinks it does, but “far enough” that it either can’t meet the requirements, or meeting them becomes prohibitively costly.

    Not all internet services are run by AOL or Google. Many are run by tiny start-ups that don’t really have the money to provide what they are providing, let alone jumping through a few more hoops.

    I mean, I would have loved to have had a chance to buy CDs full of my digital photos. On the other hand, if they had been required by law to provide them to me for up to six months after shutdown, and for free, they would have had to set aside a tidy sum for that, and probably never would have started the business. Then I wouldn’t even have the relatively few bits of that era of my personal photography that I thought were worth having someone print and mail me.

    Also, more companies that are flaming out would totally flame out rather than being bought up: “Oh, if we buy them out of Chapter 11 we have to provide all their customers a way to migrate to a new service, which, having seen the fragility of the last generation of service, they will do as fast as they can rather than just stick with the ‘under new management’ version of the service… never mind then.”

    Again, I agree it is a problem. Bigger than many folks think, even. (Smaller than people starving, and many other issues, though.)

    I just don’t think all solutions to a problem are necessarily better than the original problem 🙂

  31. Vigilante teams of mad archivists are absolutely the answer, and law is absolutely not the answer. It is heartening to see your quasi-realization of this.

  32. I ought to have been more general about this initially. Additional legislation is never the answer to a problem, and teams of people who care about solving the problem working on solving the problem are always the solution to the problem.

    *philosopher waves a libertarian (notice the lowercase “l”) flag

  33. […] Scott’s Protection From Online Eviction? and his follow up post make the argument that services like AOL, MySpace, flickr, or Skype should be treated like […]

  34. Gail Gary says:

    What many people are not grasping, especially, I would say, those under 40, is that we live in an age in which more people than ever before have the ability to live their lives online and in words (rather than simply in thoughts, in private voiced conversations on the phone, or in written, mailed letters). We are seeing an age of subliteracy in the archivable sense, however, because much, if not most, of people’s publicly expressible lives happens to be conducted online or on the phone.

    If these personally-compiled-content websites are annihilated on some whim or other, financial or otherwise, we literally lose history. This entire Internet-mad era will be known as the time when humankind was highly documented and also mostly lost, as most of its output has been, is being, or will be expunged for lack of space or money or interest. When you lose personal expression, regardless of how important you find individual utterances on individual ISPs, you lose history.

    For people to proclaim “AOL is crap” or “why are people so naive as to think…” is missing the point. It’s history. Personal history. Important to someone, if not to you. Important to the ages, to get a clue as to how we were living and who we were. And it will be lost—all the stupid stuff and all the gold too, that is the essence of the human personality as it winds through its day and has opinions and points out interesting tidbits and sore spots and political weirdnesses.

    Team Archive is a great idea, and a true humanitarian contribution. Thanks for this! You are thinking for all of us on this one. (All of us still thinking about something beyond merely ourselves at this particular moment.)

  35. […] a follow-up post “Datapocalypso!” (Jan 2009), he responded to various criticisms and misdirections: This was a case where […]

  36. Greer Watson says:

    Reading all this, I am struck by a couple of things. People seem to justify archiving/saving websites (or other online data) for one of two reasons: because it is collectively of historical interest, and/or because there are people who have put online, but failed to back up, material of great sentimental significance to them personally. I have no quarrel with either justification, but I think there are other things that are being ignored.

    First: the latter focus is on the owner of the site – undoubtedly the person most concerned with saving a memorial. However, there are plenty of websites that are created at least as much to be useful to other people. The readers of websites may have no legal rights over them; but that doesn’t mean that they are happy when a site they have been enjoying suddenly vanishes.

    Of course, it is easy to say that they should save the site to their own computer. I have certainly done that on a number of occasions, just as I keep multiple back-ups of my own website. But not everyone knows how, any more than most people know how to HTML a webpage. Take GeoCities sites: a lot of them were written using “Site-Builder” software available from Yahoo. People who used that never did have a copy on their own computer: the site was created on the server; and making a back-up was distinctly less than simple.

    When GeoCities was counting down to doom, instructions were posted so that people could save their websites. These said to go to the top left of the screen, click on “File” and then on “Save as”. However, on many people’s browsers, this didn’t work: because of the side-frame that held the ads, all you got was an error message. Sure, I figured out how to get around this (“View/source”, save that; and then right-click on every graphic on the page). The point, though, isn’t that I personally worked this out: I had my site backed up anyway, though I used the technique to save a couple of hundred others. The point is that most people know even less about computers than I do. In fact, several people asked me to save their sites for them because they couldn’t figure out how to do it themselves.

    Second: this is all human-oriented, even when you talk about the significance of archives to history. What about the websites themselves? Some of them are useful. Some of them are well-designed, with attractive layout, colours, and graphics. (Some are remarkably horrible; but what of it? The finger-painting of a kid in kindergarten isn’t the work of Picasso. That doesn’t mean we close art galleries.) No one seems to point out that the destruction of something that is useful or beautiful is wrong in itself.

  37. Greer Watson says:

    Re 39: Sorry – should read more closely, I guess.

    By the way, I gather you are all still sorting out the various GeoCities sites your team managed to save. Compared with your million or so, I got only a mere handful. However, given the amorphous nature of GeoCities, there are probably a few things I caught that you could include. I do have a list I could send you.