ASCII by Jason Scott

Jason Scott's Weblog

The Robot Army of Good Enough —

(I do not speak for my employer. I am just very loud.)

Pretty much any organization of any size has certain themes, beliefs and outlooks baked into them. Some of them might be obvious from the outside. Others are so inherent that the members might not even notice they’re completely steeped in it.


At the Internet Archive, there’s a philosophy set about access and acceptance of materials and presentation of said materials that’s pretty inherent throughout the engineering and the website. Paraphrased, in my own words, it’s this:

  • Always provide the original.
  • Never ask why a user wants something.
  • Now is better than tomorrow.
  • We can hold it.
  • Many inexpensively is better than one or none luxuriously.
  • Never send a person where a machine can go.
  • Enjoy yourself.

Some of this exhibits itself in how people use the site – they can grab anything, they can get a “library card” account but they don’t have to, and they can embed or direct-download anything they want. While the machines will derive out versions of the content, you can always find that massive .AVI, .PDF or .WAV that the content came from. They don’t keep user logs to any real degree. They don’t get in the way.

Internally, the rest shows itself in engineering and code – use commodity hardware that will break more often but can be bought in much greater amounts, instead of one “Ol’ Trusty” that’s intended to work for five years without fail, leaving “Ol’ Trusty” as all we have because we can’t afford more. The code will put an item up before it’s fully “baked”; that is, you’ll see the original .AVI file for a video item and maybe 20-60 minutes later, another derivation will show up, and then maybe another one after that. This sliding window of material population really confuses the end users in some cases, I’m sure. But it means you get it now, now, now, instead of when it’s all wrapped in a bow.

As things currently stand, and based on my now three years (!) of working for the organization and going out into the world to speak about the place and get feedback, the resulting good and bad of this approach is this:

  • Good: Nobody is doing what we’re doing in many cases; we have so much stuff that every time I wander there, I lose an evening walking the stacks.
  • Bad: The site looks like poop, and it’s pretty hard to find the stuff.

So, to get out ahead of “poop look”, efforts are underway to redesign the site, and what I’ve seen, I really like. That’s all I’ll say because it’s not my project.

Regarding finding material and there being stuff, I think the priorities of the Archive have been really firm and right-minded: get the stuff first, quibble on accessibility or presentation later. Turning things away is how tragedy happens. What’s worse – something was taken in and put into a big storehouse? Or something was offered, and because it failed to have a MARC record or a metadata post-it-note on the outside of the archival-quality container file, it was sent back out into the night?

But the real miracle, the one that is perhaps really not obvious from the outside, is how much of the Internet Archive’s work is done by machinery and code.

When an item is uploaded, the user can designate and mention all sorts of aspects of what was sent in – the title, the description, when it was made, who made it, and a bunch of other interesting data attributes. The format allows a lot of extension, so if you want to indicate which of the 300 audio files you uploaded have dog sounds and which ones are recorded using a specific type of microphone, you can do that. It might not mesh with other items all that well, but that’s not your problem – you’re adding things that a machine might not ever know.

But a set of machines at the Archive do know a lot about your item, and will do work to add it all and create other versions of your item. For example, you can upload a .zip file of .jpeg images, and if you happen to name it *, it will create a .PDF file of it, an OCR’d version of any text in it, and an animated GIF file of the pages. With movies, it will take a massive .AVI and it will create a thumbnail set, a web-ready version (if it can), and so on. And bear in mind, this collection of tests is massive – it tries to determine the average pixels per inch, the orientation of texts, the framerate of the video, the number of tracks in a collection of MP3s and if there’s any tagging built in. It does a lot. And most importantly, with zero human intervention.

And here’s where the “controversy” happens.

By “controversy”, of course, I mean “people murmuring under their breath in the area of disciplines the Archive overlaps with”. Other organizations and practitioners of the arts of archiving, you see, have their own baked-in philosophies and credos, spoken and unspoken. And they don’t exactly see eye to eye.

Some I’ve encountered and observed:

  • Machines can’t beat people.
  • Zero metadata beats inaccurate metadata.
  • Digital is a Cult. Physical is a Truth.
  • Another six months won’t hurt.
  • Who are you and why do you want this?
  • Pick a format, document it utterly, and use it forever.
  • Justify, Justify, Justify

There are many more. Some come from policy, some from personality, and some from how people are brought up into the discipline. We’ve destroyed the term “disruptive” as being meaningful in discussions, but the concept that a new outlook or idea could fundamentally change the nature of the realm it is part of is still quite valid. To some extent, the Internet Archive is an upending of century-old approaches, while still loving and promoting the shared beliefs:

  • Our history depends on our artifacts and writings.
  • Education without context is flimsy and transient.
  • Reading is fundamental.
  • What happened before is important tomorrow.
  • Humanity is worth the trouble.

In my capacity in outreach, I find myself in a lot of conferences, restaurant tables, hallways and sidewalks talking to people who believe in these shared beliefs but don’t buy 100% into what the Archive is up to. They question whether a Robot Army is the way to do this inherently human activity, that of cataloging and classifying, of summarizing and representing.

The problem, ironically, is that people think of it as binary: all machine, all people.

Where we are now, the machine takes a rough stab and occasionally a refined stab at what comes in the front door. It will try to OCR the text, it’ll figure out the orientation or how many pages or what baked-in records exist in the digital object, and it will report those. It would also appreciate your input as uploader, thank you very much, but it doesn’t stop dead waiting for you, either.

To this end, the resulting output, especially the machine-generated side, is not perfect. But most importantly, it can be overruled. Always. It can always be shoved aside as “that’s not perfect, this is perfect”, but the number of items getting that “perfect” treatment is always going to be a small percentage of the total. It just is.


So, this week, I was working on a way to make the endless piles of texts on the Archive more accessible. The solution I cooked up was to take the OCR’d text generated for all “texts” classified objects, throw them into a word frequency generator, remove the obvious stupid ones, and put that up into the Archive. That actually has worked out pretty well.

It’s not perfect, of course. Never perfect. But here’s what it returned (and put up) after 10 seconds of analysis of a 945 (!) page book on architecture:

figure; landscape; design; standards; soil; concrete; architecture; water; surface; aggregate; landscape architecture; saver standards; asphalt concrete; tor landscape; standards tor; water table; water level; standards lor; lor landscape

The “standards lor” stuff doesn’t fly – it’s an error. But the vast, vast majority of it is what a person might reasonably need to know “what the hell is this book about”. You can decide very quickly whether this is the book you want to browse through. You have more information than you had before.
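The keyword pass described above (OCR’d text in, word frequencies out, obvious junk removed) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Archive’s actual script: the function name `extract_keywords`, the tiny stop-word list, and the cutoffs are all hypothetical. It counts single words and also two-word phrases, since the real output above mixes both (“water table”, “landscape architecture”).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list actually used at the Archive is not described.
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "is", "for", "on", "that",
    "with", "as", "by", "be", "are", "or", "it", "this", "at", "from",
}


def extract_keywords(text, n_words=10, n_phrases=10, min_len=3):
    """Return frequent single words, then repeated two-word phrases, from OCR text."""
    # Lowercase and split into runs of letters; digits and punctuation are dropped.
    tokens = re.findall(r"[a-z]+", text.lower())

    def keep(token):
        # Discard very short tokens and the "obvious stupid ones".
        return len(token) >= min_len and token not in STOP_WORDS

    word_counts = Counter(t for t in tokens if keep(t))
    # Phrases are adjacent token pairs where both halves survive the filter.
    phrase_counts = Counter(
        f"{a} {b}" for a, b in zip(tokens, tokens[1:]) if keep(a) and keep(b)
    )
    top_words = [w for w, _ in word_counts.most_common(n_words)]
    # A phrase seen only once is more likely noise than a topic.
    top_phrases = [p for p, c in phrase_counts.most_common(n_phrases) if c > 1]
    return top_words + top_phrases


if __name__ == "__main__":
    sample = "Water table and water level. The water table is low."
    print("; ".join(extract_keywords(sample)))
```

OCR errors pass straight through a scheme like this: a “for” misread as “lor” is counted like any other word, which is exactly how something like “standards lor” can end up in the list.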

Similarly, you can probably guess what these books are about from the keywords:

software; ibm; computer; graphics; apple; color; disk; program; commodore; game; hard drive; hard disk; word processor; disk drive; megabyte hard; deluxe paint; sale price; retail price; public domain
moog; modular; output; arturia; modulation; input; filter; frequency; manual; sequencer; moog modular; modulation input; connection jack; key follow; low pass; keyboard follow; audio output; audio input; input connection; trigger input
iso; wedding; lovegrove; julie; bride; pictures; chapter; shoot; shot; picture; wedding day; wedding photography; light matters; healthy profits; business strategies; wedding photographers; opposite figs; finoncial mastery; exposure compensation

Again, perfect? No. But each of these was generated, automatically, and without a miserable intern or low-paid person doing a job that would probably never be funded in the first place. But those keywords tell you a lot, and they’re getting the job done, even if you have to keep an eye out for what exactly “finoncial mastery” is.

And frankly, nothing stops the addition of a second set of scripts for quality control that provides lists of all the generated tags and allows a person to go “that one doesn’t look quite right” and have it taken away. The difference is, now it’ll be one person overseeing hundreds or thousands of items at once, using the brainpower so that in one weekday they will do more resulting work than a year of the most highly-trained, perfect and precious professional dedicated to metadata entry. And in the case of the issue of “Compute!” magazine, the Moog synthesizer manual, and the professional wedding photographer book above, you’ll get what you need, now now now.
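The review pass suggested above could start out as simply as this sketch, which assumes the generated tags are available as a mapping from item identifier to keyword list. The functions `tag_report` and `apply_rejections` and the item identifiers are hypothetical, for illustration only: one report shows how many items carry each tag, and a single set of rejected tags is then stripped from every item in one pass.

```python
from collections import Counter


def tag_report(items):
    """Count how many items carry each generated tag, most common first."""
    # set() so a tag repeated within one item is only counted once for it.
    counts = Counter(tag for tags in items.values() for tag in set(tags))
    return counts.most_common()


def apply_rejections(items, rejected):
    """Drop every reviewer-rejected tag from every item, returning a new mapping."""
    return {
        item_id: [tag for tag in tags if tag not in rejected]
        for item_id, tags in items.items()
    }


if __name__ == "__main__":
    # Hypothetical items and tags, for illustration only.
    items = {
        "book-a": ["landscape", "standards lor", "concrete"],
        "book-b": ["moog", "filter"],
        "book-c": ["concrete", "standards lor"],
    }
    for tag, count in tag_report(items):
        print(count, tag)
    print(apply_rejections(items, {"standards lor"}))
```

Working from the aggregate report rather than item by item is what would let one reviewer cover hundreds or thousands of items in a sitting.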


And as a side note: I love that this is what my mind is being used for. I love that I work for a place where this sort of thinking is what is needed. And I love what the result of this effort is – a place where millions of items are flying out the front door every single day, spirited away for a thousand reasons, and making the world a better place.

I can’t imagine doing anything else. Keyword: “happiness”.





Saved! Sort of. —

As time goes on, people begin to use phrases like “saved” when it comes to Archive Team projects. That’s not quite accurate.

If a website or webpage is simple, utilizing only images and text, chances are pretty good that we can get a reasonable copy of it. If, however, it utilizes any strange scripting, access control, or any of the modern craziness that we see on the web, things become pretty dark pretty quickly. Sometimes photo galleries have JavaScript zoom, or some of them use YouTube or some other services to feed out the video. Our scraping falls apart if you need some sort of plug-in or program to do even the simplest of maneuvers.

In the case of, for example, Hyves, a whole bunch of different problems are showing themselves in the saved pages. Part of its power and character was that people could modify all sorts of different things on the page, and you would see fundamental differences from site to site. Even as we were grabbing websites at the rate of fifteen a second, we knew we were going to miss things.


That said…

One of the most inspiring parts of the release of the GeoCities torrent was the amount of work and curation that has been done with the data by Dragan Espenschied, who not only carefully downloaded and analyzed the resulting files, but also created reports and graphs and discussed the errors and mistakes made along the way. His and Olia Lialina’s ongoing Tumblr weblog gives snapshots of pages long gone, with commentary and themes. He even had direct state funding for a year to re-create a virtual machine that could provide GeoCities pages comprehensively and easily.

My hope is that some academics, researchers and other people who have an investment in the history of Hyves will go through our 25 TB of data and help re-create it in a more robust and involved fashion. I’m not sure what will come out the other end, but there is a record of what happened there, no matter how shallow it might be in some places.

I compare it to the difference between seeing a picture of your home and having artifacts from inside the home: maybe you want the second, but the picture might be enough for you to remember a lot that you might’ve forgotten.

The resulting items will be tattered in some places, perfect in others – we are not saving the domains, or the full context of what this all was. We’re just stopping pure oblivion from occurring in the name of progress and perceived liability. It’s a minor point, but an important one.


The Archive Team has taken many millions of snapshots of many homes. Many are long gone and we pack our little WARC files and .TAR archives and send them into the annals of history. Here’s hoping the future appreciates it.

Letter to David Stiles —

Mr. Stiles:

I’m sure you get a bunch of these letters all the time, but I figured I’d add myself to the pile.

I don’t know when I first got my hands on your Treehouse book, but based on the publication date, I guess it was when I was 10 or 11. This is the one with the brown paper and the comb binding, that was bound at the top, so that it was like a big notepad.


This book affected me profoundly in many different ways.

Obviously, it inspired me to think about making treehouses, and access to my parents’ workbench in the basement coupled with an awful lot of suburban trees meant that I was able to find a host of victims throughout the entire summer. I have memories of finding 2x4s at construction sites or thrown away from various other projects and turning them into the building blocks of palaces heretofore unseen. Naturally, a 10 year old working alone or with friends leads to the occasional injury, and a thankfully-post-tetanus-shot shoe puncture or two, but boy, did I have fun. Of those initial projects, I think one or two almost approached the realm of functionality, at least in allowing me to survey the forest or my parents’ yard from a height of 5 to 7 feet, which wasn’t so bad at all.


More than that, though, your book contained a number of other important lessons and inspirations.

As I’m sure you intended, the book takes a very open and free spirit with regards to both design and scope – you show how one, two, three or four trees can still result in a treehouse, and every treehouse could be an exploration of a host of ideas. I loved these different places you suggested, covered as they were with kids having fun and getting the most out of what they had. Your flourishes in terms of pets, decorations, and bystanders to the fun made it obvious these were not going to be solitary works of architecture but parts of people’s lives. It’s guided a lot of how I approach the things I build, physical and virtual, since then.


And most notably, the whole unusual (for me, at the time) design of the book, with a sense of being this strangely-bound book on off-color paper, made me feel like making “a book” wasn’t a case of always being a perfect bound piece of “literature” with just words and the occasional illustration – this book was full of life and strangeness and notes about being outdoors and part of the world, that I think I’ve internalized to a huge degree.


Anyway, I wanted to thank you for your book that is more than just a book, and was more than just something to read for me so many years ago.

Jason Scott

David and Jeanie Stiles


Three Times the Consoles, Three Times the Carts, Three Times the Library —

Please excuse the dust.


For the last couple of weeks I’ve been working with a range of volunteers on a massive expansion of what we call the Console Living Room at the Internet Archive. Previously weighing in at about 800 game cartridges from seven console systems, the new collection is roughly 2300 cartridges and a total of 21 different consoles. Through a combination of JavaScript, black magic, and unicorn tears, all of these games are playable directly in your browser without plug-ins. Not just confined to the big winners in the console game biz, this collection now spans 25 years, multiple iterations of technology, and a breathtaking range of subject matter for entertainment and education.

In other words, if you had this over at your house, you’d either be a museum or the richest kid in your neighborhood.

Making video games playable in browsers is a small but fun percentage of what the JSMESS project is attempting to achieve. Like most projects involving emulation, the urge to just make it look like we want to screw around with Pac-Man clones and our favorite platformers is pretty huge. But this is much, much more than that.

The reason that consoles make such an easy combination for something like JavaScript is that the control schemes are relatively simple, the scope of the programs is properly limited, and once you have things going, they pretty much go. The one exception is speed, where you notice any slowdowns or missing features faster than one might with, say, a spreadsheet. But consoles are a delightful example of “what you see is what you get”.


Let’s start simple. Development time for a console game has ranged from as little as a month to as long as a year, with a few multi-year efforts. Development teams range from a single person to a few dozen. Money spent is somewhere between the thousands and the hundreds of thousands, with a few millions sprinkled here and there. The marketing and advertising and paper filled with ink about these games goes into hundreds of thousands of pages. The audience for the games has progressively built up its own corpus, the overviews and newsletters and family forums, for decades now.

A few million ancillary items here and there, and suddenly we’re talking about a culture.

What I am seeking with this material, presented in such a large mass and so openly available, is to remove any time delay between a reference to this software and access to the software itself. This is what we get with websites, movies, music, and text: you reference it, and seconds later, there it is.

Now we’re going to get the same with a mass of games.

Naturally, it doesn’t take long for the glorious human mind to be faced with these thousands of programs, marvel at them for a whopping 30 seconds, and then start pointing out what’s missing. For this natural reaction, all I can do is point at various limits or choices that are in place, knowing that we have a lot of time on our side, and improvements to the emulator and the world at large likely loom. We just tripled the amount of consoles and tripled the amount of available programs. And what crazy stuff have we revealed!

Obscurity is no excuse for items not appearing here. We’ve got game consoles with as few as four released cartridges, or even a breathtaking 50. And we’ve got cartridges that would fetch thousands of dollars on the open market, playable instantly. Unfortunately, unlike jewelry or paintings, obscure consoles and cartridges do not guarantee quality or replayability. But you’ll get by.

I hinted at construction references at the beginning of this. That’s because with so many items being added at once, and the information about them scattered in so many directions, there are entries that are needlessly empty and links that are still waiting to happen. But they’re playable, a little fooling around will let you decode what’s going on, and frankly, many of these games defied the need for an instruction manual or even more than 30 seconds of study. I guarantee you’ll figure out what kills you and what gives you bonus points in a very short period of time. Meanwhile, the metadata nightmare will fade.

I reached out to many volunteers to help with metadata entry, and anybody who works in that business understands that the stickiness rate tends to be rather low. Face an eager and wide-eyed volunteer with the prospect of endless recounting of details about obscure video games, and the urge to go watch a video of somebody watching paint dry becomes irresistible.

But every once in a while, some angels make themselves known, and there are more than a few angels behind the scenes on this one. Soon you’ll know the story on tons of video games that once clogged department store shelves and which almost nobody experienced before now.

So, speaking of which, the main point of why I’m bringing this all up.

The last time I announced that such items were being added to the Internet Archive, I put out a call with a challenge. It was answered to some degree, but I want it answered even more loudly this time: now what?

With a simple link (and in the future, embed code) you will be able to direct readers, colleagues, students, or relatives towards a breathtaking variety of games. You could provide your own oral history of your interactions with the game and how you used it. You can link to a variety of games to demonstrate the approach to the Pac-Man game mechanic over the decades. You can show the initial steps taken by a developer or coder and then show that person’s progression over the years. You can instruct students to play a number of platformers and attempt to write on the political implications of resource management in platformers.

In other words, this console living room is a console library.

And this library wants patrons!

I fully intend to expand the library over time, but we’ve got a pretty sizable chunk of videogame history here. I want to see it referenced, written about, and improved based on feedback.

Is that so wrong?

Do me proud.


Some members of the art group Monochrom have gotten into the world of movie-making and I think you might just enjoy the living hell out of it. It’s called DIE GSTETTENSAGA.

GSTETTENSAGA is the best kind of low-budget filmmaking: science-fiction collides with mythology and political satire, and a landscape both gray and filled with colorful personalities. The characters each have their own agenda, their own wishes, and find themselves at odds with almost everyone else. Some want power of the political kind, while others want power to drive the simplest of electronic appliances. Everyone is searching for something, and we are dragged along in many different directions as they both find what they’re looking for and find something completely different.

There are times when I feel I’m watching an absurdist play by Beckett, if Beckett decided to work on the Mad Max franchise. Darkness falls, daylight breaks, and each turn in our journalist team’s adventure brings something new and weird to the table. While there are times I have no idea what I’m looking at, I’m definitely always enjoying looking at it.

I’m sure we have many more amazing cinematic works from the DIE GSTETTENSAGA creative team in the future.

Five Unemulated Computer Experiences —

While I and many others work to turn the experience of emulation into one as smooth and ubiquitous as possible, inevitably the corners and back alleys of discussions about this process present people claiming that there are unemulated aspects and therefore the entire project is doomed.

I thought I would stoke that sad little fire by giving you five examples of entirely unemulated but perfectly valid vintage computer experiences.

Disk Drive Spin Vibration

Some games on home computers would feature permanent player death and the requirement to start over in the event of a catastrophic loss. To ensure the death was permanent, the player status would have to be recorded on a floppy disk drive with a floppy disk in it. Therefore, a trick could be implemented: by putting your finger underneath the latch of a floppy disk drive, you could feel the vibration of the disk beginning to spin, and you could flip up the drive door, disengaging the magnetic head and ensuring that the death was not recorded.

Computer Fans

There are currently no attempts to emulate the sound of a computer fan or have it speed up and slow down slightly over time, eventually reflecting the decay of the fan and the steadily noisier experience as time goes on. In a tangential relation, there are currently no emulations of system failure due to overheating.

Chip Unseating

One common cause of machine issues in older systems would be the slow working out of seated chips on motherboards and other circuits. The resulting glitches and behavior would be noticed by experienced owners, resulting in a reseating of the chips, either by full-board pressure or by pressing down on individual chips and experiencing the clicking into place.

Damaged Floppy Noise

One of the most terrifying and disheartening sounds was the sound of a distraught grinding across a damaged or demagnetized portion of a floppy disk. The noise told you that it was going to be a crapshoot whether the data would ever be heard from again. Variations in the sound also told you how close you were to total data loss, and whether you were at the beginning of a slow decline for that sector.

Power Outage

Emulators do not have an option for sudden and dramatic loss of power. It is not possible to indicate a lightning strike, a brownout, a blackout, or a yanked-out power cord. This is a central and fundamental aspect of the Atari 2600, where careful glitching of a system, including yanking and replacing cartridges, could allow you to access game options and experiences that would otherwise never be reached.


The point of me bringing all these up is not to be particularly weird, but to point out that emulation is not a binary experience – it is a continuum, a spectrum. For some people, the mere reappearance of older computing information is a miracle. For others, it is an endless opportunity to point out flaws, complain about glitches, and otherwise drag the conversation into a Zeno’s paradox of unfulfilled promises and impossibly high hurdles.

As time goes on, I expect some experiences to fall by the wayside, and to live only in lore and stories. Unfortunately, that is the nature of history, and computers don’t get a pass, just because the material involved gets re-created with such fidelity.

So, let’s focus on what’s been done and refine that, instead of a mystical set of experiences that may never see the light of day again except in our stories.

My First Arcade —

This nondescript building, sitting on the corner of Highway 52 and Fishkill Hook Road, has the distinction of being my very first Arcade.


Naturally, it hasn’t been an arcade in a very, very long time. I doubt it stayed an arcade past 1983, in fact. In the interim 30 years, it has served a bunch of purposes (I believe it did real estate, and maybe some other lawyerly duties), and now rests in the same space as a used car lot. In the harsh daylight, it’s more of a barn than a building.

It holds a minimal amount of space, is made of the flimsiest of wood, and has very little recommending it.

But in my youth, this was the first time I encountered anything like an arcade. And it was magical.

This dates back to the era when the formula for making money off of arcade machines was a simple one, and skirted every zoning law and building requirement a town might have. Very few small towns were prepared for the idea that a person would have no restaurant functions, no bar, and certainly nothing that required a specific cash register, but instead they would fill a simple room with machines that took quarters and didn’t give anything tangible back. It enabled arcades to place or infest themselves into locations that otherwise had no useful purpose, that were low rent, and blown up bright with signs and glowing inhabitants for everyone to see. Municipalities were caught flat-footed, and the resulting protests and drive to have them removed has left battle-worn scars and legal tangles that persist to this day. People ask why there aren’t more arcades out there, beyond the obvious ‘fad’ trope everyone comes up with. It’s because, by and large, they are illegal or prohibitively expensive to license. Pop-up arcades like this one are the reason that situation exists.

My strongest memories of playing here are at night, when my parents brought me by. I’ve had trouble guessing my age, but if it wasn’t in the single digits, it was certainly 10 or 11. In the present this building appears to have an office set up, but at the time that I was there, I doubt it even had the second floor, and if it did, somebody was probably holed up in there. I remember the use of Christmas lights, some amount of neon, and the darkness, always the darkness. This place is summer to me. It’s moths in spotlights in the parking lot. It’s warm breezes and faces coming in and out of shadows. It’s opportunity and it’s the magic way that we can set up a world with just the merest waves of our hands.

Here is why I have been working on a documentary about arcades for the past couple of years: there is nothing unique or special about this building. Really, seriously, nothing special. But shoving 15 or 20 machines into it, stringing up some brightly colored lights, and staying open late hours, turned a summer night into an otherworldly, amazing experience. Just like an amazing recipe can be split apart into the components that made it up, you can do the same with an arcade like this and never get at the heart of what was going on. I want to make a film that captures this.

It was here at this arcade, for example, that I encountered a machine so weird, so unusual, that I remembered it for years afterwards without understanding what machine it was. It was Space Wars, and to my reduced height and innocent eyes, it looked as much like a scientific experiment as anything I dreamed of. What it was, actually, was a machine just a few steps away from mainframe and old-school hacker culture, pinned down with a coin slot and presented with the most direct of fanfare. You can be sure that when I saw this aesthetic and approach later in home computers and networked technology, I plugged myself right into it.

Besides this arcade, which remains unnamed and unknown to me beyond my fuzzy memories, there was also the Dream Machine in the Dutchess Mall in Fishkill, New York. I stopped by and took a picture of where it used to be. The image is much less inspiring.


The arcade would have been roughly located in front of the boxes in the center of the photo, maybe 100 feet closer than the black fence. Not exactly the best memorial. The arcade and the mall are now a long-gone endeavor replaced with a Home Depot.

On the large building to the left, you can see a scar where the mall entered one of the anchor stores. The large department store was named Jamesway, and the deteriorating front door feature still strikes terror.


(I found this Flickr set of photos of the area, where the atrium still hung off this dead department store, and a few other features remained.)

The Dream Machine, unlike my other first arcade, was a chain that extended down into New York from Massachusetts, and had many mall locations. As a result, the space and darkness aesthetic was much better implemented, and the sense of entering an interesting and engaging world was even stronger. That said, the arcade was staffed by relatively indifferent people, compared to the hungry entrepreneurs running the arcade in the barn.

One of the employees of the Dream Machine wrote an enjoyable reminiscence on this page.

I can’t overstate how much these arcades formed me as a person. It is one thing to have an interest in technology and home computers, and it is another to face what are obviously computers of incalculable power and beauty, with brightly colored sides and glowing screens that promised so much more. The years have allowed all the secrets of the creation of these machines to be picked apart, and even re-created on simple devices of today. But that moment, when you walked into a pantheon of godlike boxes, you knew you were entering something truly special.

If I can capture one small percentage, some misty shadow of the magic of what these things were, then this documentary will definitely be worth it.



One other thought: there are proposals to knock down my first arcade’s building and replace a number of the structures nearby with a gas station. I’d bemoan this destruction of my childhood memories, but the fact is that the story of Hopewell Junction and Fishkill and the carving out of their heart on the altar of “progress” is neither a unique one nor a useful one when talking about the events of decades ago. But rest assured, there have been layers upon layers of free-wheeling development in the area, ensuring that within my lifetime, I’ll barely recognize a tree, much less a structure. Mabuhay.

The Flippy Disk Thing —

This is meant to be a quick overview and primer of the flippy disk and disk hardware situation with regards to 5.25″ floppy drives. It is for reference and referral by others for an easy understanding of the issues.

Right now, there are tens of thousands of floppy disks out there that were used extensively in the 1980s and 1990s to store data, much of it one of a kind, on home computers. They looked like this:


Yes, even with the lighting and glowy strings. This is what the 1980s looked like, deal with it.

The capacity of these floppies varied, but was generally very small by today’s standards – between 88k and 3 megabytes, with a lot of them being 140k. To give perspective, that image of the floppy disk is 144k.
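To see where that common 140k figure comes from, here is the arithmetic for one widely used format, Apple II DOS 3.3, which put 16 sectors of 256 bytes on each of 35 tracks, on one side of the disk. It’s a small Python sketch; the helper name `disk_capacity` is my own invention, and other formats in that 88k-to-3-megabyte range used different geometries (some, like the Commodore 1541, varied the number of sectors from track to track, so a single multiplication won’t cover them).

```python
def disk_capacity(tracks: int, sectors_per_track: int,
                  bytes_per_sector: int, sides: int = 1) -> int:
    """Formatted capacity in bytes, for formats with a fixed sector count per track."""
    return tracks * sectors_per_track * bytes_per_sector * sides


# Apple II, DOS 3.3: 35 tracks x 16 sectors x 256 bytes, one side
apple_dos33 = disk_capacity(35, 16, 256)
print(apple_dos33)          # 143360 bytes
print(apple_dos33 // 1024)  # 140, counting 1k as 1024 bytes
```

So the floppy photo above takes up a little more room than an entire Apple II disk’s worth of data.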

Myself and many others have said that with the floppy disks now going into decades of existence, reading any data that was put on them so long ago is a bit of a crapshoot. Either it works great to the present day, or it is gone, gone, gone.

It is incumbent on the present-day world to rescue and preserve the data off these floppies, as the next generations won’t have that choice. Debates about the worthiness of the data are fine enough, but in another few years the debate will not be about whether items can be saved, but how best to work with, present, and study all that was saved. It is a dying medium and resource.

Currently, there are three methods to transfer the data off these floppy disks, one of which doesn’t exist yet.

  • Original Floppy Drive from Original System
  • Repurposed General-Purpose Floppy Drive
  • The Flop-o-suckerator 3000

The Flop-o-Suckerator 3000, while not currently extant, is likely to be a customized piece of hardware that takes in a floppy disk and “does stuff” to it 100% geared towards pulling all the data off it, and having no other real function. There are examples of this approach for audio cassettes, books, and even business cards. Chris Fenton famously made such a device for Cray Disk Packs, so this isn’t entirely theory – one just hasn’t been made for floppies yet. Anyway, back to contemporary reality.

It is still possible to acquire, use, or otherwise work with the original floppy drives from the original hardware. Here are a few for you to see how they look:





There are two sub-options with these original hardware drives – connect them to the original hardware and run programs to pull the data off (an example of this method is Apple Disk Transfer) or connect the drive to an intermediate piece of hardware designed for that kind of drive, to let it be hooked up to a modern system. (For example, the ZoomFloppy for Commodore 1541 floppies).

Let’s set that aside for the purposes of clarity. This option exists, but sourcing original drives is somewhat harder than sourcing general-purpose floppy drives made in the 1990s.

So let’s go to the remaining option: Repurposed General-Purpose Floppy Drive.


There are a lot of these out there – at least, a lot more of them were made than the other vintage hardware. They were commodity, getting shoved into a lot of generic boxes and low-cost servers and bargain-basement discount computers and electronics stores. Sometimes they were built well and sometimes they were not built well.

They pretty much all share one fatal flaw, hence this entry.

To make them run with modern systems, there are a number of pieces of hardware out there that basically connect them via USB and let you run software to pull data off floppies. As a bonus, using these commodity drives and the right software/hardware will allow you to use them for a full range of floppy disk formats, such as Apple II, TRS-80, Commodore 64, Atari 400/800, IBM DOS, and so on. In some cases, you can just pull in a raw “disk image” (which will be very large) if the format is proprietary or there are aspects we want to save for later study. It’s a generally good situation.

Among the floppy drive-to-USB hardware out there, there is the FC5025, Kryoflux, and DiscFerret.

So, this sounds almost perfect, really – just get stacks of floppy disks, put them into these different solutions, pull out the disk images, and we’re golden. And for many, many thousands of disks out there, this has been done for nearly 20 years. It’s great.


It used to be, in ye olde days, that it was possible to use floppy disks a different way. Quick explanation.

Floppy disks look like this:

The disk goes into the floppy drive with the little oval “window” on the bottom going in first. You then clamp down on the floppy by closing the drive door.

The floppy drive picks up a couple of indications about the disk from the shape of the disk itself. Specifically, there’s that notch in the upper right of the image, and there’s that tiny, tiny circle (the index hole) to the left of the big white circle in the center.

The notch on the upper right means “can the disk drive write to this floppy”. If there’s a notch, the answer was/is “yes”. If there’s no notch, the answer was “no”. It was possible to buy little stickers (yes, little stickers) that you would put over the notch to say “don’t write on this anymore”, and those stickers could be removed to say “go ahead, write over this”.

When commercial copies were made, they used drives modified to ignore the notch. So you could have a floppy with NO notch on it, meaning “never write on this disk”, with data on it put there by a modified drive. This is why if you see a commercial floppy disk from the time, like this one:


…there’s no notch, protecting the data inside from you accidentally writing over the poor thing during some late night of floppy switching.
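To make the notch logic concrete, here is a toy model in Python of the decision the drive makes. The names `Floppy`, `notch_open`, and `ignore_notch` are invented for illustration; a real drive senses the notch with a switch or light sensor, not software.

```python
from dataclasses import dataclass


@dataclass
class Floppy:
    notch_open: bool  # True when the write-enable notch is cut and uncovered


def can_write(disk: Floppy, ignore_notch: bool = False) -> bool:
    """Toy version of a 5.25-inch drive's write-protect check.

    An ordinary drive writes only when the notch is open. A duplication
    drive modified to ignore the notch writes no matter what.
    """
    return ignore_notch or disk.notch_open


blank = Floppy(notch_open=True)        # fresh disk, notch uncovered
stickered = Floppy(notch_open=False)   # notch covered by a sticker
commercial = Floppy(notch_open=False)  # jacket made with no notch at all

print(can_write(blank))                          # True
print(can_write(stickered))                      # False
print(can_write(commercial, ignore_notch=True))  # True: how the publisher wrote it
```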

Without going too much into variations, floppy disks were either single-sided or double-sided. Some floppy drives could read both sides of a disk without you taking it out. For most early-1980s systems, though, that wasn’t the case: the drive read only one side.

So if a disk was double-sided, it had notches on both sides of the disk. Here’s what that looks like:


So, if you look at this floppy closer, you’ll also notice it has not one, but two holes to either side of the big white circle in the center. That’s the issue, that’s the rub.

While these newer commodity floppy drives do a lot of things, they don’t handle a disk that has been turned upside down and reinserted very well. These two-sided floppies are known colloquially as “flippy disks”.

Therefore, these newer commodity floppy drives need to be modified.

Modification is a pain in the ass, summarized by this image:



Therefore, for most people, buying a flippy drive or using original hardware will be a good choice. Otherwise, you will only be able to save images of floppies that use a single side.

If you know the entire collection is one-sided, then that’s fine. But many aren’t. Luckily, many collections tend to be of a type, so if you find a stack of old DOS floppies in need of transfer, and one works, many might work. It’s not a perfect situation, but you’re now armed with a little more insight as to what’s going on.

I realize this entry is a mash-up of really basic knowledge and really obscure, specific knowledge, but I wanted it all in one place for people to understand what’s going on. One of the biggest tropes in the vintage community is sneeringly citing this specific flippy disk problem as the reason we can’t have Regular People doing disk imaging. We are well past the point where we can assume Regular People will venture out and root around for the right people to talk to and the right hardware to buy without being given some idea of what they’re getting into.

You’re getting into buying a drive and a USB adapter. If your disks have two sides as described above, you are buying a flippy drive or having a drive converted to flippy. That’s it. 

Your disk images have a welcome home at the Internet Archive. I will help you put them there. Your questions have a welcome home with me as well.



The Pit —

2013 was a really excellent year, by most standards.

Besides finishing and releasing the DEFCON documentary, I got to travel to many different places and a couple of new ones, get the word out about various projects I was working on, and otherwise move forward on stuff I’d been working on for years. The triumph of helping get a major emulator into browsers has long-term repercussions that I doubt I even fully understand.

Huge, major triumphs!

But there’s a downside, there always is.

In 2013, I said yes to an awful lot of things. I said yes when people wanted me to speak, I said yes when people wanted to work on projects, I said yes when people wanted to send me stuff to be saved or preserved or otherwise fixed.

During this past year, whenever somebody asked me what I was up to in the coming weeks, I would tell them, and they would be horrified.

And looking back, they had a right to be: I figured out that I spent something like 220 days of 2013 away from home.

As a result, when things finally calmed down, I realized that my living space was completely unworkable. Packages and shipments had been coming in all year, mail had only somewhat been dealt with, and all of my various interests and back projects and everything else the world provided had converged to make my home into a terrible pile.

Here’s a glimpse of what that means:



I’d be a fool to complain about the situation. All I’m saying is that it ended up being the case that I built a massive backlog. There are good people on all sides waiting for me to provide them with information, with assistance, with material goods, you name it.

So 2014 begins with me working very hard to knock some of that down. It’s a relatively deep pit, but apparently a pit I can get out of.

A second, more mundane situation has also arisen: reduced finances.

I have been steadily cutting back my lifestyle of spendthrift wonder for about four years now. Unfortunately, I continue to fail to get things completely under control. As a result, I have good savings that have been mandated by employers, but my spending money is basically at zero.

This doesn’t count budget set aside from my Kickstarter campaign for my documentaries. But it does mean that I am much less likely to be paying for dinner anytime soon.

Additionally, my taxes for the past few years are a mess. An accountant has been hired, and this particular pit is being dug out of as well.

I mostly mention all this just to give some context as to how things can be so successful on one hand (talks, travel, interactions) and yet still be a mess when looked at from another angle. The key is to know what to do in that situation.

What I’ve done is finally sit down and realize that 2014 has to be a lean year indeed.

Now, a lean year for me is not such a bad thing: after all, my job, my interests, my hobbies, and my fun are pretty much all the same thing. And I do have a couple of trips planned for the year, mostly to places flying me in to give a presentation or make an appearance at an event. So I’m hardly going to become a hermit.

Part of this effort to spend more time at home and get things done is to work on stuff worth talking about and writing about. It’s possible to end up on speaking and event circuits so intense that you don’t actually do the things you talk about. It was starting to get that way. It’s not going to be that way for the foreseeable future.

Seriously, these are some of the best problems in the world. I’m kind of happy to have them.

Back to the digging.

FTP’s Bright Sunset and Frozen Night —

FTP is kind of over.

Now, don’t get mad at me for telling you this. It’s not like I’m the one killing it, and I’m certainly one of its biggest fans. It’s a really mature technology that does exactly what it’s supposed to do. It is flexible, pumps through almost anything, and has features that do everything you probably want to do with file transfer.


And when I say over, I don’t mean obsolete. And I certainly don’t mean unused in the present day. Many things still use FTP as a method of transferring files, and providing access to all sorts of material.

But it’s quite obvious this isn’t the way of the future. Companies and individuals that utilize constantly changing data, or data that needs to be distributed, utilize a whole other variety of technologies. Many of these are web-based, while others are special protocols that blow out over the web into devices and phones. If it’s starting up, and it needs to get you some data, it is probably not using FTP. If it is using FTP, it’s probably not telling you it’s using FTP. And people who need to get things done don’t reach out for FTP.


But more markedly than FTP the protocol, it is FTP sites in particular that are really on the way out.

I came from the bulletin board system era, when a well-populated BBS might have a few minor text files, followed in the early 90s by a CD-ROM in a drive, and maybe topping out with a stack of CD-ROMs for a few hundred megabytes of accessible disk space. There were a notable handful of massive bulletin board systems that had much more data, but these were unbelievably rare and often cost a monthly fee.

Compare, then, the experience of an FTP site on the growing pre-web Internet, which would have thousands or tens of thousands of files that dwarfed anything you could get through a BBS. The names of these sites were extremely obscure, reflecting the host names of departmental systems shoved into some dark, worthless corner of a science lab.

Even though the name didn’t tell you what the contents were, these sites became so populated and so important that their names became synonymous with what they held. Trust me – you know what these things were for, and what they held. They were stunning in their power and they were the true online libraries of their time.

This summertime of the FTP site lasted through the 90s and into the early 2000s. Support files, drivers, game demos, hilarious films, browser executables, pictures, you name it. Many of them became fairly organized shambles of files, containing thousands of examples of some obscure aspect of online life. There were even websites to help people navigate these FTP sites and find what they needed. A number of FTP search engines existed, although they often required you to know a file’s name rather than what it contained or represented.

Many of these FTP sites did their best to join the World Wide Web and its unique needs, littering themselves with .files and HTML overlays. Gopher offered superior methods of browsing the information. But as Gopher fell out of favor, and the major browsers stopped supporting it, it became another doomed way to navigate.

Finally, we had the experience of various FTP sites going down, and mirrors of those sites becoming subdirectories of the remaining ones. This Russian nesting doll situation has grown ludicrous, to the point of some sites being amalgamations of dozens of previous ones. Besides causing navigational headaches, this also means that the loss of a single FTP site in the modern era could actually be the death knell for hundreds.


For the past couple of years, but really picking up intensively in the past few months, Archive Team has been aggressively downloading these FTP sites. We are not just pursuing at-risk FTP sites – we are in fact considering all FTP sites to be at risk at this point. If it’s on FTP, it’s probably doomed.

The FTP site collection is now in the hundreds of gigabytes and is growing constantly.

Naturally, when somebody puts up a block of data like this, it doesn’t take too long for the “you missed a spot” nerds to show up and start critiquing randomly.

The most valid and yet invalid argument is that currently these FTP sites exist as massive tar archives or zips. “Give us our FTP sites back like they used to be,” cry the people who cry about such things. Well, sorry. That ship has sailed. I think that ship is on fire. Oh well, that ship is actually now burning other ships.

Instead, they should be considered what they are: cryogenic capsules of masses of data, waiting for the sort of curation, extension, and data mining efforts that so many of our computers are becoming so good at. They can be split apart, refactored into new ideas, or even pulled back into some mega FTP site of the future. By making the clearest, least fiddled-with archives of these FTP sites, we give the future infinite options. Anything else would be kind of silly.
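As a sketch of why a big tar file is a capsule rather than a dead end: Python’s standard `tarfile` module can pack a mirrored directory tree with its paths intact, list what’s inside, and pull out a single file later. The function names and the directory layout here are assumptions for illustration, not a description of how Archive Team actually packages its grabs.

```python
import tarfile
from pathlib import Path


def pack_site(mirror_dir: str, archive_path: str) -> None:
    """Pack a mirrored FTP tree into one gzipped tar, keeping every path as-is."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the site's own directory name at the top of the archive
        tar.add(mirror_dir, arcname=Path(mirror_dir).name)


def list_capsule(archive_path: str) -> list[str]:
    """List the regular files stored in the capsule, without extracting them."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def pull_one(archive_path: str, member: str) -> bytes:
    """Read one file's contents straight out of the capsule."""
    with tarfile.open(archive_path, "r:gz") as tar:
        handle = tar.extractfile(member)
        if handle is None:  # directories, links, etc. have no file data
            raise ValueError(f"{member} is not a regular file")
        return handle.read()
```

A gzipped tar still has to be read from the start, so pulling one file scans the archive, but nothing else has to land on disk, and the original directory structure survives for whoever comes along to mine it.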


We are ramping up faster and faster to do this. FTP sites die quiet deaths – a letter to faculty, a dropped connection. They go quite gently into the night.

Buried on these sites are proof of versions of software that others claim never existed, unique pieces of art that lived only on bulletin boards but were pulled up as refugees in the early 1990s, and even one-of-a-kind pieces of code that might otherwise have disappeared. This is to say nothing of the drivers, support documentation, configuration programs, and other parts related to hardware and software now not just obsolete but potentially forgotten. The value is obvious.

As I and others spend this new year gathering up all of this data, I look forward to projects coming along that utilize it or reference it. That’s kind of why it’s done. Sure, I myself would love the ultimate FTP site providing me every piece of the 1990s computer world for reference and utility. But you have to have the data before you can make it pretty.

Together with the efforts to grab every piece of CD-ROM plastic that has ever existed, I hope the plan is clear. The world has lost so much of what has come before. But it won’t this time.

Not on my watch.