ASCII by Jason Scott

Jason Scott's Weblog

The Wikipedia Database Secret

Actually, it’s probably not entirely a secret, but here you go.

Wikipedia database dumps fail constantly.

They fail in great numbers, and are then not re-attempted for weeks. As a result, many changes go on for months with no backup. The databases sometimes scroll off, meaning you lose the older ones while not having new ones. It is a big goddamn mess.

Maybe you didn’t know about these database dumps. I’ve been downloading them pretty seriously for a few years now. Even the insane twiddling of a thousand little emperors can’t take away things like the great talk pages, the articles that survived for a while before being deleted, and the link lists. It’s worth it to have these things. It’s something else to keep around.

The administrators like to say they’re working on it, but the fact is, they’re not able to keep up. They’re breaking. Millions coming in and they can’t make this work. It’s very sad.

I save a lot of things. It never hurts and disk space is cheap. I’ve been downloading every Wikipedia image uploaded. I’ve been downloading every Flash file sent to 4chan. I’ve been grabbing many webpages and offered items, and many times in the past few years this has been rewarded, as things are lost forever… until I put them up again.

Here’s hoping someone there gets their act together. I’m waiting….



  1. Spirit says:

    May I ask what you use for saving websites?

  2. Terry McCall says:

    packrat. =p

    That’s some interesting info though… with a project as big and active as Wikipedia I wonder so hard what kind of solution there could be.