Archive Team Yahoo Video Final Push (and a rousing speech)
What, two Archive Team posts in a row? Well, it comes down to several factors:
- I’ve been travelling for well over a week
- A lot of the posts in the hopper are essays and rants not quite out of the oven
- This whole Yahoo Video download is very important
- I am attending GDC in my capacity as Historian, and that is an all-day thing
So many times I’ve gone out to some location for a while for a conference and all I have to show are a few photographs and the stated fact that I went to the conference. This time, I’ve also been talking with people at night, and working really hard with the swelled ranks of Archive Team to download the Yahoo Video juggernaut.
To recap: Yahoo are fucks. Wait, let me try again.
Yahoo! are about to delete all user-generated content on Yahoo! Video, and that is really busting my crank, as well as the crank of a lot of people who have joined Archive Team to rescue it. We’re now at the point where the whole process is pretty smooth, and we’re getting down to the endgame amount of stuff left to do, but we need your help, UNIX-knowing person with a server that has more than 500 GB free. Oh, you know who you are.
For anyone who can’t join in the fun, let me post this speech I gave at the Personal Digital Archiving conference last week. There’s both an MP3 and a text script. Bear in mind the audio does not match the script; I can’t help but improvise. I am sure they will have a video for watching later, although there are no slides, so all you’re missing is my crazy gesturing and hat.
Why is it echo-y? Well, the Internet Archive has the most awesome speaking room ever:
They moved recently into an old Christian Science hall and the servers and offices are all scattered within this incredible building, now redesigned to use the heat from the servers to heat the building.
Anyway, here’s the pitch about the Yahoo Video final push:
We’ve been downloading like crazy. There are 9.3 million user accounts/spaces on Yahoo! Video. We’ve scraped the user information and user photos from over 9 million of those. We’re expecting the remainder to go down very quickly.
Meanwhile, we’ve downloaded roughly 7 terabytes of Yahoo Video and are downloading a bunch more, from whatever the roughly 4 million of those 9.3 million accounts uploaded. A lot is coming in, and this is due to the work of dozens of people pitching in where they can as we have a raucous time.
The generous folks at rsync.net have donated a month of storage, over 6 terabytes, as a holding spot for video, so folks can park what they’ve grabbed and get back to downloading more.
Yahoo deletes ALL these files from the site (it will likely continue as a directory or paid-content version) on March 15. That’s less than two weeks away. It’s going to be close. We need people with UNIX skills, bandwidth, and disk. If you have that, please come to #archiveteam on the EFnet IRC network and join up, or talk to us, or whatever you’d like to do.
Or, send us cash. Literally. The way we’re doing this is to put the files on pairs of 2-terabyte drives. Those drives are relatively cheap but still cost money. It costs us roughly $180 including fast shipping for every 2 terabytes stored (again, on a pair of drives; data shouldn’t go to a single drive). Anything you want to donate to this cause will help us buy more drives. PayPal to email@example.com, marking clearly that this is toward drive space.
OR you might be, or know, an institution that would like a copy of what we’re downloading and can provide us an array of disk space to send you a copy. These are going to be upwards of a few million .flv files, along with .html files describing them and user account information besides. Maybe you’re an academic institution or a research facility or whatever. Would you like 4 years of a self-driven sociology and history project run by millions of folks? Sure you would. Write me or come on the IRC channel.
There, I’ve made my pitch. Within a few days, it will become harder and harder to give folks blocks of things to do, although we expect we will be splitting up in-process tasks as we lift away the hundreds of videos an hour we’re currently bringing in. It’s a huge management headache but we seem to have it all going well. We’re learning a lot about a huge volunteer team project, too.
Enjoy the speech. Give some cash. Give some time.
Categorised as: computer history
Awesome initiative and effort by people who care about historical digital archiving. I wish the best of luck to everyone involved in this.
What will Yahoo (indeed, the fucks) be deleting next?
Just listened to the audio of your talk, and it definitely made me think about what I’ve put up online over the years. Luckily, I have all of my photos and artwork stored locally (and backed up!) and there are ways and means to get most of the stuff that I’ve blogged over the years copied for posterity. I’ve heard recently of friends having stuff yanked from Facebook with no specific reason given, so the risk is definitely there.
Unfortunately, I’m not in a position to make a meaningful contribution to Archive Team’s work right now (stoopid economy!) but I’m blogging this as widely as I can to let others know about what you’re doing.
Hear, hear, Sir!
By the way, have you ever considered doing a TextFiles/ASCII-themed iOS app?
I ask because I am writing this from my Apple iPad.
Jason, thanks for a great talk on the important work you’re doing. The story about the Archive Team giving the woman a 4 MB zip file of her late husband’s memories saved from Geocities moved me to tears! There is much humanity in our digital history.
Google Video is doing the same thing. I think they are deleting everything on May 15; I’m not sure why they need to delete it when they could easily just archive it.
UPDATE: Google Video actually has an export feature now that lets you easily export your videos to YouTube.