ASCII by Jason Scott

Jason Scott's Weblog

It’s Time To Keep Solving the File Format Problem

Here’s how a planned large-scale set of roving ideas and goals ends up being a case of just doing one thing right.

The problem

The original plan when I cooked up Just Solve The Problem Month was that there was a set of problems out there that just needed a few hundred people to contribute time and effort, and some otherwise seemingly insurmountable problems could be solved or really, really beaten down into a usable form.

Aaaaand what instead happened was:

  • We announced and set up a Just Solve The Problem Wiki for the first problem.
  • A lot of people worked on the Wiki.
  • I got very busy.
  • People kept working on the Wiki.
  • It’s been two years.

So, basically, we have one single output from this idea.

And that idea is the File Formats Wiki.

The File Formats Wiki has been progressively, constantly updated for two years and has now entered its third year of existence. If you check the statistics page, you can see there have been thousands of pages added.

I had previously said that November could be Just Solve the Problem Month. Well, the machine that this Wiki is on had to get some maintenance, so I moved it to December. And now, it’s just File Formats Month, I think. That’s a good enough project in itself.

That is, now.

So let’s go over it again, shall we?


 

The world has a problem. Well, the world has many problems, but this one is a big one.

File Formats.

Oh, we have been so good at file formats. We’ve happily come up with file formats whenever and however we want to, whenever information has been captured up into a bundle. We’re so good at it, we often don’t even look at what formats are out there – we just make a new one. And then we later change the file format we created to accommodate some problem we didn’t account for, and so the file format becomes “file format v2.0” or “file format NG” or even worse “entirely new name for file format”.

And then we forget about them.

Really new file formats, that is, ones created in the last five to ten years, are pretty lucky, relatively – there are webpages and standards posted into all sorts of locations, and utilities available besides, to deal with them. And really, as time has gone on, one person or another will document a file format to the best of their ability, stick it on a webpage, and there it will sit for some time before being deleted or forgotten.

All of this is a problem. This is a problem that can be solved.

We’ve set up a Wiki, called the File Formats Wiki.

Everything on it is Creative Commons 0 (Public Domain-ish) because it turns out you can’t just say something is “Public Domain” and have it stick worldwide. But the contract is basically that – everything on the FF Wiki can be used anywhere else – the knowledge can be spread to other similar projects (and there are some) and every boat can be lifted.

Every file format or container for information is eligible. We even have Piano Rolls, Tree Rings, DNA, and Spoken Languages. Obviously, our big focus is on computer file formats, though.

We’re looking for descriptions, links, clarifications, and tracking down of documents. We want this stuff assembled in one place as best we can.

We want it so that when you find something is in ‘a format’, this Wiki can be your first stop on the research, and possibly your last one as well.

You register, you can edit it, and you can get started basically immediately.

I’ll be working on it all month. I hope you will too.

Let’s solve this problem!


Categorised as: Archive Team | computer history


3 Comments

  1. DANoWAR says:

    What about other versions of file formats because of new versions of tools? Like, is the ADRIFT 4 file format the same as the ADRIFT 5 file format, even if it uses the same extension? (Just an example)

    Another suggestion: Mark formats as human readable or machine readable…or text-based/binary.

    • Jason Scott says:

      I hear a lot of suggesting, and not a lot of doing.

      • I’m always struggling, myself, with all those sticky questions about when to combine things and when to split them, and which title to use (fully spelled out, abbreviated, which of the variant spellings or capitalizations?), and especially, which category something belongs in. But I just forge ahead anyway, even though I know that by now there are a lot of inconsistent past decisions embedded in the structure of the wiki.