ASCII by Jason Scott

Jason Scott's Weblog

Just Solving the Problem: A Review —

So, in the summer I suggested the idea of a “Just Solve the Problem” month, the idea of putting together a way for people to assemble and attack an extant, “unsolvable” problem and improve the state of the universe before it burns out. I suggested November, which is a nice, generally dull month. And I suggested the “File Format Problem”, which I had opined was just the sort of “unsolvable, unless endless energy was applied to it” problem that is out there.

It’s now December 1st. We did Just Solve the Problem Month. How did it go?

The short answer is it went very well. Dozens of people contributed, and we built a wiki that references thousands of file formats, and has entries on many hundreds. All of the contributed writing is CC0/Public domain, and Archive Team will be going through the Wiki shortly and deep-downloading all the referenced websites from the Wiki, to ensure these found materials are not lost going forward.

The Just Solve the Problem Wiki is changing URLs, too. The new URL is:

fileformats.archiveteam.org.

The justsolve.archiveteam.org site will soon become a general one for “Just Solve the Problem”, and will have links over to the fileformats wiki that this first project generated. The fileformats wiki will continue to live on and be added to – the advantage of this approach is that the 30 days can be considered the “beta” or “startup” phase, with additional changes made at the now-living permanent URL as the site continues to grow.

So let’s discuss the positive aspects.

First, we now have the foundation of a file formats reference list that extends out in many aggressive directions, covering a wide spectrum of encapsulated standards for reading information. While of course we’re nowhere near “complete”, this site stands as a non-affiliated, non-censored/constricted collection of information to be used to bring a whole family of older or obscure files into the modern era. It’s a good place to be.

Next, the wiki format means it’ll be possible for people so inclined to continue to contribute to the project – we didn’t just put out some bindered report and call it a day. It’s a living site, just one that is going to have its own impetus for sticking around and not because it’s part of some “event”. I’ll be contributing to it for sure, and I hope others continue to as well.

Just as a general directory, the File Formats wiki has a lot of usefulness. Check out this entry on ATASCII, which is the Atari-specific character set used by Atari Computers for a number of years. Character set images, utilities, videos, and PDFs provide the documentation for someone encountering ATASCII (or beginning to understand what they have in their collection) and will offer all sorts of help for those folks for a long time to come. As we buffer away the linked-to resources and store them, the critical information referenced will have permanence as well. It works out very nicely.

Expanding it out past just some basic file formats was also enjoyable, as people added information about punch cards, photographic film, and Interactive Fiction. Good links going to a lot of places, and it was fun to see what churned out of the effort as time went on.

So yeah, wild success on a number of fronts. The project was announced, the project happened, people contributed to the project, and a Thing has come of it, a Thing that has a future to be added to and improved over time. That puts it past a lot of “hack event” projects and way past a lot of open source endeavors. A rousing success.

The rest of this entry is me punching people.

Before I punch anybody, though, allow me to punch myself.

The project had a lot more intra-party stress than I had expected, and a lot of weird moments and dark times. I’m going to take some credit for that. I’m a Doer. I make things Go. I am action oriented, and believe that effort expended beats effort discussed. I’m not against planning and preparation – far from it. But even the planning and preparation I do tends to be oriented towards the goal, and not necessarily talking about all the ways that the effort is not worth it or why some aspect of it is less than kosher. This means by some standards, I run roughshod over the feelings of others to get things rolling. This method works very well in Archive Team’s main contexts. It does not necessarily work in all contexts.

It especially didn’t work with a project consisting of the type of flutterbys who would be most deeply attracted to a file format enumeration exercise.

I got a lot of “whys”. Why are we here? Why are we doing this? Why include languages in a file format collection? Why are we doing the hierarchy this way, or that way? Why are we using breadcrumbs versus infoboxes? Why are we locking ourselves down in these terms when we have these other terms? Why? Why? Why?

I should have realized that the people whose daily lives are invaded and owned by rigid structure, would, devoid of that structure, start to implement even more rigid structure themselves. And they would resent being told it was something going on later. I thought the constriction of the types of institutions people work in related to digital preservation was a necessary evil, to be resented and left out until the last minute – for some people, it was a prerequisite for a sane, safe process. So that was a surprise.

Instead of calling this a one-time event, I got ahead of myself and called it a potential first round of a perpetual event. This completely confused some people, possibly terminally. They wondered why I’d chosen THIS problem, when X and Y sort-of-solutions existed for it. And they assumed (in some cases), that file format enumeration would be the annual event, not a one-time event that would be followed by other one-time events.

I say this, and yet some people shone through. If you take the time to browse through the edit history, you start seeing names of people like Dan Tobias and Halftheisland and others – people who contributed hundreds of edits, the tough boring ones that can mean completeness instead of dilettante smatterings of entries. They worked hard on this thing. Bravo, you folks.

Of the 109 people who registered accounts – and remember, you had to mail us and request an account – 50 never made a single edit after getting their account. Boo.

What’s next? I seriously don’t know. I’m going to cool down for a month or two, clean up things on the Wiki with the help of others, and reformat the justsolve page to be general. Right now, my urge to do another one of these is about the same as my urge to shoot myself in the crotch with a nailgun while standing in a tub of saltwater, but I have worked on a number of things (films, plays, loading firewood) where talking about the next round right after things are done sounds like the fall of heaven, yet hope and energy rebloom and the return to positive hopes springs eternal. So I’ll get back to you on that.

All in all, an amazing experiment. I learned a lot. The world is a better place. The Problem is a lot more solved than it was 30 days ago. Thanks for participating, everyone.

 


A Month of Entries —

I have been busy.

So very, very busy – so busy, I haven’t had the time to sit back and write the kind of quality, long-term facing entries I prefer to use this weblog to write. While I understand people using these in the manner of tumblr, with disposable one-offs choffed from the snatched seconds of a busy life, I prefer to write things that people can link to, can get ideas from, can shove into my face like an Italian Ice six months or a year down the line.  So I end up deciding that no entry at all beats a half-assed entry that doesn’t add much to The Conversation, even as I watch many aspects of The Conversation be navel-gazing horseshit.

Well, for the month of December I will be blasting out an entry a day, every day. There’ll be some backdating and some pauses, but that’s what’s going on. I’ve got things I want to talk about, not the least of which are my current projects and ideas, and this is the place for them.

So look the hell out.


JSMESS Breakthroughs —

We’ve been nibbling away at the Javascript MESS project for a long while, and the current lead guy has had a breakthrough in the last couple of days.

We’ve been partially inspired by a number of Javascript projects out there that have shown the whole thing is possible, and not just possible, but probable. Most striking is the Janus Javascript Emulator, an emulator of the Amiga family of computers that gives you an enormous amount of startup options, reads a range of floppy images, and produces a pretty good approximation of the Amiga’s functioning… all in Javascript!

I’ll reiterate the main driving reason for using MESS over just writing individual emulators – it offloads the difficult emulation issues, like accurately portraying peripherals and unusual cards and the trickiness of some memory maps, to where they should be handled. If you decide to watch either the MESS or MAME development trees, you’ll see that they’re constantly, unendingly improving the whole endeavor top to bottom. There shouldn’t be yet another set of people redoing that work at a lower level of emulation – they should be contributing up to those two projects.

Once the system for porting MESS to JSMESS is smoother, I foresee a time when you go to a page and download an “emulator pack” that consists of a bunch of Javascript files and supporting files that you then install wherever you’d like, a model not that dissimilar to the JWplayer or VideoJS players, which focus on Video and Audio. You figure this project releases quarterly updates of the collection, with improvements in speed and the wrapper that drives the players, and there we go – all of computer history is embedded like audio and video are now.

Anyway, we have now gotten JSMESS to render the following systems:

  • Odyssey²
  • Atari 2600
  • Texas Instruments TI-99/4A
  • ColecoVision
  • Fairchild Channel F

We have also gotten it to start but fail at:

  • Atari 800
  • Commodore PET

The key point about these failed ones is that the process of compiling the new Javascript has become notably easier. We’re going to step through a bunch of these platforms to check performance, and then work on a system of near-automatic compiling to be able to generate the roughly 400-500 platforms that MESS has.

I keep making the call for it, but the fact is that developers who want to hop in on this project would be very welcome, especially people familiar with Javascript or who have worked with the MAME/MESS codebase. But all are welcome – understanding how this all works is a shared experience that should really have a lot of people looking at it. Write me or visit #jsmess on EFnet in IRC.

But we’re moving along! Life is awesome! And most importantly:

Texas Instruments Household Budget Management has been saved for future generations – and who can put a price on that?


Just Solve the Problem Month 2012: Nitty Gritty —

This will be somewhat long and somewhat involved. It’s a posting meant to give my personal positions and assessments on the Just Solve the Problem 2012 Project, which is “Solve the File Formats Problem”. Ideally, it answers everything but specific choices down the line. If not, the Wiki will have answers. Here we go.

General Purpose of the Project – What is the “Problem”?

I call the File Format issue a “Problem”, and I go farther than that to say that it’s a problem deep enough that it requires hundreds of people working for a month to get through. For some that’s a very optimistic estimate and for others, well, it’s not clear there’s a “problem” at all. So let me explain that.

The “Problem” goes like this. There are people, a lot of people, who have information that is encoded in some sort of format, be it electronic or wrapped in colored ribbons or stamped with some bizarre upside-down chicken scratch. They’re faced with an issue that the thing they have within their possession is missing a final piece or set of pieces to release the information within. If you have a floppy disk, for example, you need a drive to read it. But even if you have the drive to read it, the machine to read the data might not be with you, and even if you have THAT and you are able to put the disk in the drive next to the machine, you might not know how to get the information in the format on the drive into something you can read. It’s a problem, you see.

(Now, there’s a SECOND layer of information you’re going to miss no matter what, and I’m just mentioning it to be complete – obviously if there is no record of the context of this data, that’s not going to be evident no matter what you do. If this poem is the last poem a person wrote before going missing, or if this scribbled set of English words that looks like a shopping list is in fact a calculation for committing a bank heist in code, then no amount of what you do is probably going to find that. For that, you need the efforts of lore, of interviews, and of collecting context. But that’s not part of this problem.)

So, the File Format problem comes into contact with the lives of hundreds of thousands of people. In many cases, they take the most efficient route: Fuck it. It’s old, it’s probably useless, you’ve gotten by without it for 15 years, chuck that crap, we need that room for a guest bed.

But that’s not sufficient if, say, you’re an archive or a library that has been gifted with many floppy disks created by a celebrated artist who died young and left a lot of mystery behind them.  Then you not only want the data off the floppy disk, you want to really understand the format of the files on the floppy disk, including whether or not you can recover files, find changes in the files, or a whole other manner of data that might have significance. As that article I linked to shows, it has significance indeed – critical changes in lyrics and structure, even to the point of showing possible intentions to change the work further. The simple miracle of pulling data off long-obsolete floppies becomes a bigger problem as you try to understand the formats, and even worse, understand unexpected side-benefits of the formats. There’s a lot there.

So assume, then, that what’s hidden away on that “dead media” or inside some file folders in a .zip file you found has actual significance.

For most institutions and individuals, this sort of interest/dalliance has a very specific path: you have a pile of one kind of crap, maybe two (108 floppy disks, 12 data cassettes) and you want the stuff “off” it however you can. Having looked around, you might or might not find solutions, although solutions do abound. And you are certainly never going to get the 108 floppy disks and 12 data cassettes done and then launch into a crusade to find every piece of magnetic media on your suburban block and volunteer to help everyone excise their data from the plastic coffins all of it languishes on.

My line of interest and work puts me in touch with a lot of people in the “history” biz, be it the professional archivists and librarians or the intense hobbyists of vintage and retro computing. And what really started to stick with me was the way that almost all of them had this file format problem, and had come up with some level of solution to it. Someone might go so far as to make a product, or a utility, or a code library to deal with it. For example, the ANSILOVE project does a fantastic job of taking ANSI Artwork (a hack of a DOS-based text encoding system that got used to make great pieces of art) and drilling down deep into interpretation, going as absolutely wide as possible in saving the text for future generations. Trust me, for 99.5% of all cases, it “solves” the ANSI Art translation problem. There are exceptions and there will ALWAYS be exceptions, but generally, if you want to see these files presented on a wide range of modern platforms, this utility/project will do it for you.
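
To give a sense of how small the character-set half of that problem is once the knowledge is collected: DOS-era ANSI art is stored as code page 437 bytes, which modern systems misread unless explicitly translated. Here’s a minimal Python sketch of that one step – it is emphatically not ANSILOVE, which also interprets the ANSI escape codes for color and cursor movement and renders images:

```python
# Minimal sketch: translating DOS code page 437 bytes to Unicode.
# This is only the character-set step of the ANSI art problem --
# tools like ANSILOVE also interpret escape codes and render images.

def cp437_to_unicode(raw: bytes) -> str:
    """Translate raw CP437 bytes (DOS-era text/art) to Unicode."""
    return raw.decode("cp437")  # Python ships the CP437 codec

# The classic shaded-block characters that ANSI artists leaned on:
sample = bytes([0xB0, 0xB1, 0xB2, 0xDB])
print(cp437_to_unicode(sample))  # the blocks: ░▒▓█
```

The point isn’t the three lines of code; it’s knowing that three lines are all you need, which is exactly the kind of fact the wiki exists to collect and point at.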

That said, a lot of people who remember that obscure format might have no idea something like ANSILOVE exists, might not know to go look for it, wouldn’t know where to start. Even though the solution exists, it might as well not exist for these people, because they don’t even have the faith or the thinking to consider that the solution might not consist of finding vintage hardware, dragging what they want into the old thing, and losing months trying to make the environment work again.

Now, expand this specific situation out. Keep expanding it.

No, really, really expand it out and now you’re running into The File Format Problem writ large, the wide spectrum of missed communication, lost information, and obscurity that things are suffering in. That’s what’s being addressed here.

And when I hang out with people who have an interest in some aspect of this issue, they say the same things: The problem is unsolvable, because it’s too big – too much is left to do, too many things have to be searched, there’s no funding to research this forever. We can’t make it happen, and we’ll just have to make do, keep writing grant proposals, and hope for the best.

This project is meant to put that idea to bed – to make it so that only the most obscure, customized, no-record-exists-of-the-data-or-the-format situations will linger on. To make it so that if someone gives you something on media, you can say “yeah, I think we can work with this” and be surprised on the rare occasion it doesn’t work out, instead of the other way around.

But There Are a Lot of Already Existent Projects Like This, Don’t You Know

I’ve said this a dozen times now and here it goes again: Just Solve the Problem is ABSOLUTELY NOT THE FIRST TIME THIS HAS BEEN TRIED AND IS DEFINITELY NOT THE ONLY SUCH PROJECT UNDERWAY IN THE PRESENT. There are plenty of versions of this project out there. Tons. To imply otherwise would make me a liar and blind on top of it, because many folks have come out to let me know, just in case I didn’t find out myself.

What distinguishes this project from all those similar efforts is the following:

  • We have no affiliation whatsoever. There’s no organization with its own politics and biases in place, there’s nobody to go “woah, hey, let’s not go there”, and there’s certainly not anyone who’s going to pull the plug because someone flies a little too close to the sun.
  • The white-hot effort is very directed within a 30 day period. Yes, this will stay up after 30 days, but the idea is that people are able to work on this project immediately and know that a bunch of other people are right there, working with them. There will be many eyes, many hands, involved in this effort and you’ll reload to see more and more changes go on throughout the month.
  • I will say this even more explicitly in the next section, but this project doesn’t shield its eyes from these other projects and sites; it embraces them. It depends on them. What makes the File Format Problem project even somewhat achievable is the very existence of all these other resources.

Freed of these (entirely legitimate) boundaries of budget and scope the other projects and sites have, Just Solve The Problem 2012 can go in directions heretofore unexplored or left as “frivolous” and “wasting the budget”. That’s the big deal.

And the other big deal is that the effort to enumerate all these items is absolutely a public domain effort. The basic tenet is that all the collation and combination and addition of cross-referenced information this project brings can seep back into all the linked projects. The people working away can paw through the piles of data this project brings and then pick back whatever they want – it’s like they got a team of researchers and contributors for absolutely free. The rising tide that lifts all boats.

So let’s go into the basic fact of the project:

The Project Initially is the Collation of Already Extant Information!

The quote from William Gibson is this:

The future is already here — it’s just not very evenly distributed.

The idea being that a lot of what we think of as “the future” exists, but only in limited areas, available to researchers or the rich or otherwise prevented from being universal. This same situation exists with file format information – it is very, very rare for things that were put into a format to have not had documentation generated for them. In the case of a lot of file formats, software might have been written that reads and writes in that format. The code might function as the documentation, or in someone’s head right now is the format information that would unlock the data from its obscure setup.

I expect very little original research to be necessary to solve this problem for the vast, vast, vast majority of the file formats being addressed. There are people who are well versed in the BetacamSP format. There’s machines, there’s documentation, there’s examples and there’s available tapes. It’s just not right here, just like all of that same sort of stuff for punched cards and piano rolls and Lotus 1-2-3 files is not right here – but again, it can be.

A veritable army of volunteers and I have been uploading Shareware CD-ROM images to archive.org. We’re well past 1,500 CD-ROM images. And we’ve got a couple thousand more to go up. Well, buried on those shareware CD-ROMs are tens of thousands of utilities, written in the present day of the file formats they use, that can transfer between formats, commit action on those formats, and create new files using those formats – and that’s not even counting the documentation, which often shows off the file header information or back-channel knowledge of the file formats being used. The concrete answers to thousands of file format questions are just sitting there, waiting for someone to connect them up.

Good work has been done on the other directories and sites by their staff. In many cases, they have limited resources of disk space, bandwidth, contributors or funding to go too far. We can take what they have and integrate it and link right back to them. We can make it that if someone finds they have an IFF/ILBM image from an Amiga, well hell yes we’re going to have a page with every last piece of collated information, including code and writing, that will help them make that stuff live again.
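
To make the IFF/ILBM case concrete: the outer layer of the IFF container (the EA IFF 85 chunk framing) is simple and thoroughly documented, and a sketch of it fits in a screenful of Python. This is purely an illustration of how shallow the first layer of many “obscure” formats turns out to be once the documentation is in one place – it does not decode the ILBM bitmap itself.

```python
# A minimal sketch of walking the outer chunk structure of an IFF file
# (the container used by Amiga ILBM images, among others). Illustrative
# only -- actual ILBM decoding (BMHD, CMAP, BODY chunks) is a deeper job.
import struct

def parse_iff(data: bytes):
    """Return (form_type, [(chunk_id, payload), ...]) from an IFF file.

    An IFF file opens with a FORM chunk: a 4-byte id, a big-endian
    32-bit length, then a 4-byte form type such as b"ILBM". The body
    is a series of chunks with the same id+length framing, each
    padded to an even byte boundary.
    """
    form_id, form_len = struct.unpack(">4sI", data[:8])
    if form_id != b"FORM":
        raise ValueError("not an IFF FORM file")
    form_type = data[8:12]
    chunks, pos, end = [], 12, 8 + form_len
    while pos < end:
        cid, clen = struct.unpack(">4sI", data[pos:pos + 8])
        chunks.append((cid, data[pos + 8:pos + 8 + clen]))
        pos += 8 + clen + (clen & 1)  # chunks are padded to even length
    return form_type, chunks

# A tiny synthetic container, just to show the framing:
body = b"BMHD" + struct.pack(">I", 4) + b"\x00\x01\x00\x01"
data = b"FORM" + struct.pack(">I", 4 + len(body)) + b"ILBM" + body
print(parse_iff(data))
```

Twenty lines gets you the chunk inventory; everything after that is exactly the collated documentation and code the wiki is meant to link to.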

Realize, therefore, that there will be volunteers on this project who will do nothing but shuttle between websites and add links to the Just Solve the Problem Wiki. That’s all they need do – wander into the entry for FLAC and dump in a hundred informative links, and then move on to the entry for Wax Cylinder and add those. They don’t have to knock on doors, or make phone calls, or run endless nights of coding and experimentation – they have to take someone else’s experimentation and endless nights of coding and link to it. That’ll be quite a heroic act in itself!

What kind of Person Will Be Involved?

I’ve sketched out some roles that people might play in the Wiki:

  • Explorers are on the never-ending quest to find more file formats, more obscure references to file formats, and hidden away gems and information the File Format Problem can use. They reference the Sources or even find brand new Sources to use and add them to that list.
  • Backfillers go to already-extant entries and add in greater details, including summaries, links to pages others have written in the web about the format and the subject, and acquiring some select images or items to represent the format. They pull from the Sources but also just do the basic effort to make a page be something more than blank.
  • WikiWonks look over a given page and fix the MediaWiki encoding so that the entry is more easily readable. If you create general templates that can serve out pages better, we’ll apply them to the whole of the pages that fall under the template. The more time freed up to acquire information, the better.
  • Essayists are writing or referencing sets of documents to create critical new histories or descriptions that go far beyond a technical view of a format. If there are litigation, research, or human aspects to the format that should at least get a summary, the Essayists are adding them.

All of these are in flux and you need not be one the whole time. But all are needed and all will play a vital part.

I’ve glanced at the Wiki and you do things a Certain Way and So You Are DOOOOOOOMED

So, my experience in some quarters with a project like this is that people feel they need an entire specification written out, must float it amongst committees, must run it for authorization and sign-off from authorities, and only then begin the slow process of applying for a grant to make it all happen.

Yeah, well, guess what. We’re already underway.

As of this writing, we’re dumping in general headings of file formats, building up a huge source directory (including sites, documents, books and other materials), and kind of flinging together the ontology as we go along. Give it a week, it’ll all be different.

I thought you hated Wikis.

I’m known as a major Wikipedia critic, but that’s a very different thing from the software itself, MediaWiki. I happen to like that software very, very much. And having watched it survive the white-hot testing environment of Wikipedia, I know the software holds up, and holds out. For the act of collaboration, of calling together all this data and then implementing templates and automatically generated directories, it’s a great way to go. I’m not concerned about the software suddenly hitting some upper limit as we do this. We can concentrate on getting the problem under control.

So now what?

Well, all I ask is that you try.

Write to me at justsolve@textfiles.com that you want to register an account on the Wiki. Give me the username you want. Come on and poke around.

If you think the project is worthwhile, tell your friends or colleagues or communities about what we’re doing. Rope them in – get everyone to pitch in.

I wish we had a thousand people working on this. 1000 people for 30 days would demolish this problem. The resulting directory of file formats and links would be a breathtaking version 1.0 of reference material, the go-to location for getting started on reading or saving a file that otherwise would languish and disappear. It’d be a place where you’d know who to contact and what to use. It’d just solve the problem.

Let’s do it.

 


Just Solve the Problem 2012: High-Level Stuff —

As Just Solve The File Format Problem Month bears down on us, I thought I’d get this show on the road with more high-level discussion of the idea. It makes sense to read the first posting on this, and to be aware that a posting after this one will be much more detail-oriented. Because, after all, I’m obviously attracting people who are detail-oriented.

Reaction to the announcement of this project and this general “solve the problem” idea has been rather massive, to say the least. Accolades, essays, and tweets (not to mention e-mails) have poured in from a wide spectrum of folks, taking heart and inspiration from what I was addressing. So, thank you for all of that.

A small sliver of folks thought I was proposing that one month out of every year was “Solve the File Format Problem Month”. No. I was saying that we should have some project that just needs a lot of bodies – one that a lot of people agree would be nice for the world if it got them – and then work on that for a month. That would be “Just Solve the Problem Month”, November of each year. And the first year, this one, was “The File Format Problem”.

I’d love it if later years did other projects of a general nature, like “Let’s Destroy this Idiot Software Patent Situation Month” or “Let’s Track Down 100,000 Orphaned Works to Free Them Month”. But first things first.

Also, I don’t think that on November 30th, we immediately shut down everything and issue a .txt file, going “DONE”. I’ll ensure the Wiki related to this project stays up, and people are welcome to keep adding to it, making improvements, and generally continuing the original effort, just with this perceived boost from what I would hope is a lot of people.

In terms of the number of people, I wish it were 1,000. One thousand people, spending some significant amount of time on an effort, can do a breathtaking amount of work. That’s a wish, not a plan – I can’t get the word out far enough on my own, so if people want to raise some hay, please do so.

There’s now a twitter account: @justsolve. The Wiki, running on hardware and bandwidth donated by TQHosting, is up at justsolve.archiveteam.org. If IRC is your thing, there’s a #justsolve channel on EFnet.

Finally, before I go work on a more details-rich post, let me address two corrections that come out of the many fine reactions. They are, I believe, two sides of the same coin, so I’ll link them up.

First, there was a lot of commentary from archives people and library people about this work I was doing “for” them. The implication was that this work was meant for archives people to archive things, or for library people to library things, or whatever. But it’s not that way. It’s not that way at all.

The goal, ultimately, is to produce a very large, very comprehensive file format repository and guide… one that is not hung off of some grant or living in the shadow of the flaky politics of an institution. In that shadow, a directory or catalog must, by dint of its administrators, limit itself to some specific subset of items, say, a list consisting merely of everything one building has come into contact with and nothing else. I want to shed all this political and institutional bullshit I see choke so many uplifting projects like poison blasted into a forest. I want this thing to go high and wide and free. I want people to wince and want to have nothing to do with one format and go hog wild on their personal favorite. I want the question asked to be “why not” instead of “why”. I hold, very strongly, that given enough organizational approach in a Wiki, you can sustain an insane amount of information. That is what I want to do here. So yes, libraries and archives and vintage people and historians and curious onlookers and bored developers will all find a place here, a welcoming place. It is, in fact, for none of them and for all of them. I think this is the critical difference that distinguishes this project from so many others.

And that’s the second part, the other half of the coin. I am well aware this is not the first project of this type. Oh, man, could you imagine if this was the very number one time someone tried this? That’d be weird. Trust me, I’ve been collecting these things for years.

I am not interested in forcing people to copy over information already elsewhere on the internet. Instead, I’m looking for an army of people to put together a comprehensive directory and collection of related items that will provide a foundation to enable anyone, be they pro or amateur or tourist of the world of data, to get a footing on just what they’re looking at.

And that’s what’s up. Next: Some details.


Someone’s Been Talking —

This year I’ve been doing many, many more presentations and conferences than I would normally do – I guess this whole “save the digital planet” thing has taken off.

Along with all the presentations have come travel and trying to re-adjust my life so I can do the majority of what I do from anywhere I happen to be. This has mostly worked out but it does mean some things get shortchanged. Like, oh, ASCII.TEXTFILES.COM.

So if you’re not browsing my twitter feed or otherwise are hankering for seeing me scream at people in audio and video recordings, here’s a handful of the presentations, with more to come after I fix them up.

In Brighton, England, I gave a presentation at an event called DCONSTRUCT. The talk was called “The Save Button Ruined Everything”, and here is a page with the full presentation in audio-only. A shame about no video because I was sporting a crazy-ass look.

At the wonderful OSBridge in Portland, Oregon, I gave a talk called “Open Source, Open Hostility, Open Doors”. They did a great video recording and put it up on YouTube.

Finally, just this weekend I gave a talk about the Prince of Persia source code extraction I played a part in. It was at Derbycon, which I got a nice invitation to last year but didn’t see enough of – and this time I stayed all the way through. Irongeek, who does the video side of things, had my talk up on YouTube within 48 hours! Here’s the full presentation.

There are roughly five other presentations I’ve given this year, but some need editing (the recordings are done oddly) and some have no real recordings at all (I was paid for them and left it up to the paying body if they wanted it to be exclusive or not.) But for now, here’s a few hours you’re not getting back.

Enjoy.


The Charge of the Scan Brigade —

UPDATE: Read the Bottom.

I’m writing this quickly because it’s a simple idea, the simplest of simple ideas, although it could really change things up in the world. It is a San Francisco-based thing at the moment, in case you want to know if you can throw yourself bodily at it or need to throw your San Francisco friend bodily into it.

Here’s the pitch.

The Internet Archive (where I work) has a room full of Scribe scanners. These are very nice scanners! They can take a book or item that is bound, or items that are not bound, and allow you to scan them very, very quickly. Much quicker than the classic “person with a flatbed” and in a way that is not “person with a flatbed and an x-acto knife and a very soon to be sad bound book”.

A Scribe scanner can scan a hundreds-of-pages book in less than 10 minutes. If you really have your act together and the book didn’t spend 9 years at the bottom of a swamp or have a “surprise” flaw in it, you can do it in under five. These are the books you see scanned on the Internet Archive’s book collection.

To do this, the Internet Archive has a set of paid employees (often doing contractual scans with libraries and other organizations) and volunteers. They work nearly 24 hours a day, in a couple dozen locations around the world. They scan books. A LOT OF BOOKS.

Here is a Live Statusboard of books being added to the Archive. Make it the full-screen item on a monitor in your room – it’s very exciting. And clickable, in case something catches your eye.

So this one’s been brewing, and I have the go-ahead to pursue it.

I AND ANOTHER SET OF PEOPLE HAVE INTERESTING AND VALUABLE PRINTED MATERIALS RELATED TO COMPUTER HISTORY WE’RE SITTING ON. LET’S ASSEMBLE SOME SAN-FRANCISCO-BASED VOLUNTEERS TO SCAN THEM ON SUNDAY AFTERNOONS FOR A SET PERIOD OF TIME. LET’S PUT ALL THIS STUFF ONLINE AND THEN I AND OTHERS CAN CHOOSE TO DONATE THESE MATERIALS TO COMPUTER HISTORY ARCHIVES, SAFE IN THE KNOWLEDGE THEY ARE BOTH ONLINE NOW AND STORED FOR THE FUTURE.

Pretty simple, huh.

It is October 1st, 2012. This Friday, October 5th, the Internet Archive has an open lunch where there are tours of the place, including the scanning room, and people get up and talk about what they’re up to. The Internet Archive is at 300 Funston Avenue. I’m here all week and into next.

Do you have an interest? Would you like a tour (either during the lunch on Friday or another time you arrange with me) and then you’d get schooled on how to add an item’s metadata, and then you scan in these materials?

Well guess what.

It’s time.

Contact me at scanbrigade@textfiles.com or come to this lunch at noon this Friday, let’s talk it out.

My dream is there’s this known shift, afternoons on Sundays, where the Scan Brigade mailing list agrees who takes the shift for the week, and it gets done. The more people, the more likely every slot gets filled. As a bonus, I’ve been told there’s room for Scan Brigade people to come in at times other than the established Sunday. So do you get the scanning bug and want to do it three times a week? That can happen.

Let’s do this.

Let’s demolish this pile.

It’s worth a shot, right?

CHARGE.

Update: 

So, I shot a little high with this one. What I didn’t know (and probably couldn’t, because I’m a remote employee) is that the Internet Archive hasn’t run a weekend shift in a very, very long time – and they no longer have an evening shift of any kind, having scaled back recently. Maybe this is an excellent time to consider a tax-deductible donation! But regardless, this means I can’t have people in at unusual hours, since there’s no one else there, and I don’t live in SF, which means I can’t open the place on weekends (otherwise I totally would).

So consider this one dormant. We’re looking at lending me one of the machines locally in New York, where I live, and then I’d have a different range of NY-based folks involved in the project. Details on that if/when it happens. The dream is not over!


What’s New at the Ol’ Salt Mines —

Hey there! Haven’t really talked in a while. Been pretty busy.

What with? Oh, speeches, presentations, editing and shooting some movies. You know, the usual?

How’s that archive.org thing doing for me? Oh, it’s doing fucking awesome, trust me. Best job I have ever had. Great folks, great stuff.

What have I been up to there? Oh, this and that.

Other than that, not much. Oh, but my buddies have been SUPER busy.

Everyone’s got their nose to the grindstone. You know.

Hey, gotta go. Good talkin’ with you.


Sony Vegas Media Bins: Not Ready for Prime Time —

Let me explain what’s going on with this entry. Summary: Most of you can ignore it.

Perhaps it’s a reflection of how things have gone with online discourse, but I’m mostly posting this as a way of gaining search engine attention to the fact that Sony Vegas, the video editing software, has something called “Media Bins”, and those media bins kind of suck. In this way, maybe Sony will make one of their guys at Madison Software (now called Sony Madison) fix this part of the software.

I’m doing what is perhaps a bit of an edge case with Media Bins, but one would hope in a world where Sony Vegas brags about being worthy of professional projects, something like Media Bins would be basically usable.

Media Bins are a way to declare clips, or parts of clips, in Sony Vegas as organizable entities. For the DEFCON Documentary, I decided to use them because with 1.4 terabytes of film footage, and thousands and thousands of potential shots, my old trick of rendering out clips seemed quaint and unwise.

So here I am with a big ol’ mess of folders:

…which should be fine and good. When I do the final edit (I think, I hope), I’ll be able to traverse the folders and see the inside clips, which I’ve worked hard to describe, like this:

(There’s two clips because one is the video track and the other is the separately-recorded audio track. It’s weird but it would take forever to bind them up, and sometimes I want to choose which one I use.)

And this all should be great except:

  • No export function. Once you make a bin of clips, you are owned. You can’t produce a .cbin file and put it into another Vegas project. You can’t even really cut and paste.
  • At 5,000 clips, it’s already slow to do any work in the main Media bin. Terrifyingly slow, really, considering how many cores and how much memory is available. This smacks of no optimization whatsoever.
  • Random scrambling of subfolders. I’ve had cases where I find the subfolders have MOVED TO OTHER FOLDERS. That is CRAZY. That’s grade-school.
  • No export function. Saying it again, I am now trapped in this specific .veg.
  • Can’t re-adjust the order of the folders. Why? When you create them, that’s the order they stay in, forever. WHY? Some people have come up with a hack of moving ALL of the folders to a sub folder and then, by hand, putting them back up one level in the order you need. Seriously? Seriously.
  • Drag-and-drop… well, it sometimes works.

All of this points to a rather immature feature set, which is odd considering it’s been in there for years.

Fix it.

Thanks.


The Last of the GET LAMP Batch —

I figured you’re my loyal audience so you should know first – I turned around and realized I was down to the last 130 or so copies of GET LAMP. In other words, get your orders in as soon as you can to guarantee you get a copy of the original box+coin combination.

I haven’t yet decided if I will sell more, and I’m almost certainly not going to make more coins. The coins are awesome, but that is a lot of money to spend on an item. The fans affected deeply enough by the existence of text adventure games to want an artifact have already bought them, while people of the “that’s nice” variety are the ones trickling in. Coins are expensive, friends – between 50 and 60 percent of the unit cost! (And also amazing.)

On a more forward-looking note, I do believe we’re just about at the end of the era of physical copies of media, except for very high-end premium packages provided as one-offs for really special fans. That’s the sad part (although it’s much more convenient to point at something and go “NOW SHOW ME” than to click a button that says “AND MAKE IT EVENTUALLY APPEAR WHERE I LIVE”). We’ll see how I feel in a week and after my duplicator lets me know how much another print run will cost me.

I am delighted to have sold over 3,800 copies of GET LAMP around the world and that the image of my cousin smiling is in thousands of homes, and that this isn’t the sum total of his existence on the internet.

Hurrah for selling out GET LAMP!