ASCII by Jason Scott

Jason Scott's Weblog

One. Million. Files.

A little milestone went by in the last day or so: my site cd.textfiles.com has now surpassed 1,000,000 files hosted. (Actually, more like 1,029,000 and growing, but still, a friggin’ million.)

I don’t really go out of my way to talk about that site all that much, mostly because it’s a legal threat magnet; hardly a month goes by without someone, somewhere, representing somebody, going bugfuck on me for having something on there they don’t like. As a result, I don’t publicize it and I truly intend never to monetize it.

If you haven’t been there, cd.textfiles.com is basically a massive collection of all those crappy shareware CD-ROMs that were sold in stores, at meets, online and elsewhere, which quickly came to be called “shovelware” because the “creators” of the CDs would shovel thousands of programs onto a CD-ROM and then sell it as a new product. In many cases, these guys would almost make it sound like they “created” the stuff themselves, when in fact they were building it all on the backs of other guys who were distributing these files under a shareware license.

Without getting too much into the whole ownership thing, the fact is that people were out there collecting everything they could find on BBSes to sell it, usually without checking too closely what that stuff was. Then again, copyright infringement wasn’t the new crack, either, so some of these CDs are a bit wild and wooly in terms of content.

They’re mostly DOS- and Windows-related discs, but there’s also stuff for Atari, Apple II, Amiga, and what could best be described as “other”. And like I said, there are over a million files. Programs, pictures, songs, textfiles… everything you could imagine that someone might put online probably has a few examples in this collection. To be sure, there’s a ton of doubled stuff, but in many cases I have various versions of a specific program or file, showing that all-too-important-and-easily-lost progression of a work over years. Why just see the last in a lineage that goes back a decade? So this thing is basically one huge-ass beneficial learning tool.

The primary beneficiary of all this, of course, is me. I have used this archive extensively in the past, either to check up on a fact, review the functionality of a program, or read documentation or textfiles regarding history. Facts and files from this collection show up in the BBS Documentary, and I’ve spent many a fine evening walking these directories and finding something new.

It brings up a small but important point: there are political and opinion-related issues that these files essentially solve. While there are certainly cases of lies or deceit or other untoward human aspects being reflected, there are also facts that have become muddied and lost over time, subject to people making claims because no one is around or cares enough to confront them. And people like me, historians, we end up repeating these politically-charged misrepresentations because we have to go by the word of the person if we have no evidence to the contrary.

Here’s an example off the top of my head. While he was not ultimately interviewed (scheduling issues) for the BBS Documentary, I spoke for quite some time with Bob Mahoney of the EXEC-PC BBS (his onetime employee Greg Ryan does make an appearance in the documentary), and during an extended phone call, he dropped a bit of a bomb.

He claimed he had come up with the name “ZIP” for the file extension.

See, that’s quite a claim, and he said “I chose ZIP for Phil Katz when he asked me to come up with one, because it represented ‘fast’, like zipping around, and it was kind of sexy, like ‘zipper’, and I knew people would respond to it.”

Well, you see, Phil Katz died in 2000. I’ve never spoken to him, and never will. So I have no way to verify it.

Unless I go to the files! So looking on cd.textfiles.com for early versions of ZIP, I quickly found a note written by Phil Katz, thanking Bob Mahoney for coming up with the ZIP extension. So there we go, controversy solved.

It is not a cure-all. It is another photograph, another album pulled out from the muck to allow historians to look up stuff.

It is of use to patent lawyers who need to cite earlier examples of a concept before some bastard makes 20 cents every time you buy a mug on eBay, or play a song in your browser.

It’s of use to people who want to remember how far we’ve come, looking at grotty, poor graphics (which, by the way, we loved) and simple one-level games. One of the real advantages of digital history is that, with a relatively minor amount of effort via emulators and wiring to old systems, we can experience the past in almost the same sense as it originally was intended to be experienced.

And it’s certainly of use to a bevy of names, places, and people who accomplished so much with so little, preserving their identities in the modern era, when so much information could easily be lost.

I love that little site. All one hundred and twenty gigabytes of it. It’s caused its share of headaches and I’m sure it’ll continue to, but I can list a lot of reasons why it should stay up.

In fact, I can now list a million.




6 Comments

  1. Bishoujo fan says:

    I found “A GIF PICTURE OF A WOMAN”.

    http://cd.textfiles.com/carousel/GIFA/GIRL12.GIF

  2. OricXe says:

    Why are there so many textfiles.com subdirs that aren’t mentioned? Not all of them are listed on the main page of textfiles.com

  3. Allan Clark says:

    Indexing by md5 sums or other checksums, forming filenames of multiple checksums (or file-lengths, even), would cause duplicates to be obvious… watch for collisions, though. I do this in my index. Using hard-links to generated filenames, then doing a “find” looking for >2 inodes, should show you which are dupes.

    Bah, I could be speaking out of my butt. I haven’t even looked at your files, so my “help” might just be a nuisance.

    You’re doing a mondo-cool thing, but you know that of course. Dunno if it was your choice. Few things are entirely choice…

  4. Tom Miller says:

    Rats, I only have 770,000 files online. I thought I had a pretty complete copy of the archive. I guess I will have to go looking some more.

    Tom Miller

  5. Jason Scott says:

    It’s been a year since I posted this, Tom! I’m now at 1,568,888 files. But who’s counting?

  6. […] hitting the one million file mark in December of 2005 and passing the two million mark a few years later, we’re now at […]
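
Allan Clark's suggestion in the comments (index everything by checksum and file length so the duplicates become obvious) can be sketched roughly like this. It is only a minimal sketch, not how cd.textfiles.com is actually indexed: the names `file_key` and `find_duplicates` are made up for illustration, and it groups paths in memory instead of using the hard-link trick he describes.

```python
import hashlib
import os
from collections import defaultdict


def file_key(path, chunk_size=1 << 20):
    """Return (size, md5 hex, sha1 hex) for one file, reading it in chunks.

    Combining the length with two checksums follows the comment's advice
    about watching for collisions. (Hypothetical helper, not from the site.)
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            md5.update(chunk)
            sha1.update(chunk)
    return size, md5.hexdigest(), sha1.hexdigest()


def find_duplicates(root):
    """Walk `root` and return {key: sorted paths} for keys seen more than once."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_key(path)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("/some/mirror")` (the path is a placeholder) gives back one entry per set of byte-identical files. Different versions of the same program hash differently, so they stay separate, which is the point of keeping them.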