ASCII by Jason Scott

Jason Scott's Weblog

A Valentine from Archive Team —

Well, not so much a valentine as a status update. But we’re just trying to help the lonely, here.

It’s also another call to arms.

But first… the Geocities Torrent/Collection. You’ll be glad to know multiple copies are out in the wild. However, the main torrent has taken a little time to get to the roughly 90% seeding that it’s now at. The main reasons were a key disk drive failure and a dying but not dead router. Both problems are fixed, and we’ve been cranking away, and expect to have 100% seeded within the week. Additionally, a stack of hard drives handed to me at Shmoocon resulted in five copies on said drives going out via good ol’ mail. Some people will seed, others will work with the items, either way – saved.

People who have downloaded whatever they’ve gotten so far have been ripping apart the 900GB of files in the collection (643GB compressed) and working with them. I think my favorite, even though they’ve made fun of the choices I made in compression/setup/schedule of downloading, is One Terabyte of Kilobyte Age. Between their commentary, combining of contexts, and general curating, you can begin to see all the interesting ways people used Geocities to express themselves and live their lives. This is just what I had hoped would happen. I don’t think this torrent is for everyone, which is why the README is very clear that sites like Reocities will do a lot of the job for you.

So great, Geocities is pretty much out there, and will join the world again, with a lot of neat uses of it coming up.

As for Delicious.com, I’ve been scraping usernames and URLs from Delicious for some time – so far, I’m past 3 million accounts. We’ve not really been expanding that project out because Delicious hasn’t been declared dead or sold – yet. But when the time comes, this constantly growing list of users and URLs will be utilized to do a final grabdown of the whole mess.

So what’s next for Archive Team? Funny you should ask!

Yes, that’s right, our big project right now is Yahoo Video, which has the more-hilarious-every-fucking-day slogan of “It’s On.”

Late last year, Yahoo! snuck this banner into the top of Yahoo! Video:

Yahoo Video didn’t grace the world with Alt text for this image, which is kind of a slap to the blind and visually impaired (and if you think the blind/visually impaired don’t use a video site, well, let’s go with “you’re wrong”).  So allow me to make it slightly more accessible: “Yahoo! Video is changing. We will have new policies for user-uploaded video, and existing uploads will be removed on March 15, 2011. If you have uploaded video content to the site, click to find out more.”

If you click on said banner, you get the help page. No other guidance. What I guess they want you to do is root around like a fucking mole in panic mode trying to find out how to get your stuff. Let me save you some time; they want you to see this:

“On December 15, 2010 the functionality to upload a video to Yahoo! Video was removed and download functionality, available through March 14, 2011, was added to users’ video profiles to allow retrieval of content. The user-generated content will be removed from Yahoo! Video on Yahoo! Video on March 15, 2011. We apologize if this causes you any inconvenience.”

So as usual Yahoo! is deleting terabytes of user-generated content and as usual they are doing it in a clunky, fucked-up manner and as usual the timeframe is arbitrary and out of nowhere and as usual Archive Team is here to clean up the fucking mess.

So we’ve been downloading it. We’ve been downloading it for a month. Seriously.

It’s been a crack team of people, all donating time, bandwidth and disk space to download every single video out of Yahoo! Video. We’re at full clip, but we need more volunteers, or we’re not going to make it.

Even though it’s the second-most popular video site behind YouTube, Yahoo! Video is actually sort of small – we’re sure it’s under 20 terabytes, and it may be significantly smaller than that. We know it has 9.3 million users. We’ve scraped the full user information and video listings for 6 million of the 9.3 million users (with the rest falling very quickly, probably within the next few days). From those video listings, we’ve been downloading the user videos: we’ve grabbed the videos of over 1.4 million accounts (not every account has videos), and we’re well past two terabytes. We’ve got this whole thing down to a science – you need a unix box, python, wget, and an iron will. That’s it.
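
For the curious, here is roughly what "python plus wget" can look like. This is a minimal sketch under stated assumptions, not the scripts Archive Team actually runs: the function names, the batch size, and the example.invalid URLs are all made up for illustration. The idea is to split a list of video URLs into batches, write each batch to a text file, and hand that file to wget with resume and retry turned on.

```python
import subprocess
from pathlib import Path


def split_into_batches(urls, batch_size):
    """Split a list of URLs into consecutive chunks of at most batch_size."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


def build_wget_command(url_list_file, dest_dir, tries=5):
    """Build the argument list for one resumable wget run over a URL list file."""
    return [
        "wget",
        "--continue",                        # resume partially downloaded files
        f"--tries={tries}",                  # retry each URL a few times
        "--no-verbose",                      # one log line per file
        f"--directory-prefix={dest_dir}",    # where downloaded files land
        f"--input-file={url_list_file}",     # read URLs from this file
    ]


def run_batch(urls, work_dir, batch_number):
    """Write one batch to disk and run wget on it (hypothetical helper)."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    list_file = work_dir / f"batch-{batch_number:04d}.txt"
    list_file.write_text("\n".join(urls) + "\n")
    command = build_wget_command(str(list_file), str(work_dir / "videos"))
    # check=False: one failed file should not stop the whole run
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Placeholder URLs, NOT real Yahoo! Video addresses.
    example_urls = [f"http://example.invalid/video/{n}.flv" for n in range(5)]
    for number, batch in enumerate(split_into_batches(example_urls, 2)):
        print(number, len(batch), "urls")
    print(" ".join(build_wget_command("batch-0000.txt", "videos")))
```

Because the download step is just wget reading a file, a crashed run can be restarted on the same batch file and `--continue` picks up where it left off.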

I’M MAKING A GENERAL CALL HERE. GO TO #ARCHIVETEAM ON EFNET AND ENLIST TO ARCHIVE TEAM. We’ve got less than a month now before Yahoo pulls the plug.

We’ve got the whole thing going smooth as glass. If you’re not sure you can help, come in and ask. If you think we’re nuts, go away. And if you want to know how we’re doing, stop by.

Don’t delay. We have less than a month.

It’s On.


Categorised as: computer history



7 Comments

  1. sep332 says:

    Did Yahoo email the account owners to tell them how to get their videos back?

  2. Different Jason says:

    I saw (on Twitter?) that you need/want people with bandwidth, disk space and Linux. Anything those of us with bandwidth, disk space and Windows can do?

  3. RandomUser says:

    Jason, you could create a small partition, put a small Linux installation on it, and then help out. Or, if you don’t want to install anything, you should be able to do all this from a Linux LiveCD.

  4. Adam says:

    What will the archivers do with the videos once they have been downloaded? Will multiple people have certain chunks of the list?

  5. A 3rd Jason says:

    I have 24tb avail..anything I can do? I have a 100/20 Comcast connection too

  6. Shadyman says:

    Jason (the First) said in his post to go to #archiveteam on EFNET IRC. :)

  7. marius says:

    Still have about 100 GB to download from that geocities torrent. Since the 20th of January when uTorrent crashed the last time, I downloaded 108 GB and uploaded 420 GB, but there are no seeders again.

    I’m planning to repackage it in a more efficient manner (sort of like the old style game “rips”) while keeping it extractable in the future (using open source compression software that’s available for both linux and windows)… hopefully i’ll get it down to about 400 GB compressed.

    Joining the irc channel now.. my server has plenty of bandwidth that’s going to waste so why not keep it busy with something.