ASCII by Jason Scott

Jason Scott's Weblog

Good PDF, Bad PDF

It might come as a surprise to some that I’m such a fan of PDF (Portable Document Format), considering I’m all into textfiles and all. But PDFs, in the abstract, are just what I like most: a way to guarantee, through changes in technology, display, and interaction, that the original intent of the author/creator is maintained.

The Portable Document Format, in its most basic form, is a way to ensure this very thing: that what goes in is what goes out. That if you send this out to the world, the world can see, with very little variation, what you intended them to see. This has always been one of the most problematic aspects of telecommunication and information transfer; PDF helps nail a lot of the main problems. Even in text communication, you’re often at the mercy of font, screen length, line breaks, and weird characters. While good information obviously doesn’t need to be a picture-perfect transfer, it certainly can’t hurt.

I will gladly acknowledge that the distributor/creator of PDF, Adobe, are bastards. Letting them loose on a standard is like letting a child molester loose in a kindergarten. They’ve added things to the PDF format that violate privacy, lock features down arbitrarily, and phone home. That’s very true. But the standard is capable of ignoring all that crap, and alternate PDF clients/viewers can do 100% of the functionality any reasonable, non-sociopathic person would want from a document.

Here’s a pleasant enough creation meant to be printed on card stock, cut, and dropped in the face of people who use their cell phones. It’s a jaunty, full-color layout, with a clear font, pleasant look and affronting demeanor. And it clocks in at 469k, resulting in easy transfer and near-momentary rendering on most machines.

This is not as well put together a PDF. It’s a manual for a Buck Rogers arcade machine. Clocking in at 8 megabytes, however, it does give you 47 pages of information, relying on effective compression of the black and white pages (most of them simply text) to shrink things down. But the ugliness of the scans (many of them at angles or not very good in the contrast department) combined with the general slap-dashedness makes for a medium effort. Naturally, to someone trying to figure out why their machine doesn’t work or what various dip-switches are, this manual is more than sufficient.

And finally, this manual, at a mere 704k, is probably one of the best examples you can have of a PDF, even if the subject matter is somewhat mundane (an instruction manual for a fax machine). It is 81 pages, while being less than 12 percent the size of the Buck Rogers manual. It has an index in the PDF client (where available) with descriptions of different sections. You can do word searching within the document. And the whole thing is clean as a whistle, obviously pulled from the original source material for the manual and converted to PDF. This is where the format shines. Less than a megabyte to produce an 81-page manual perfectly. Works for me!
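
For the curious, the size comparison can be roughed out per page. This is only a quick sketch in Python using the round numbers quoted above, and it assumes "k" means 1,024 bytes and a megabyte means 1,024k; the real files will not be exactly these sizes, and the function name is just something made up for the illustration.

```python
# Rough per-page size comparison, using the round figures quoted in the post.
# Assumption: 1 MB = 1024 KB; the actual files are not exactly these sizes.

def kb_per_page(size_kb: float, pages: int) -> float:
    """Average kilobytes per page for a document."""
    return size_kb / pages

buck_rogers_kb = 8 * 1024   # "8 megabytes", 47 pages of scans
fax_manual_kb = 704         # "704k", 81 pages converted from source material

buck_per_page = kb_per_page(buck_rogers_kb, 47)
fax_per_page = kb_per_page(fax_manual_kb, 81)
size_ratio = fax_manual_kb / buck_rogers_kb

print(f"Buck Rogers: {buck_per_page:.0f} KB per page")
print(f"Fax manual:  {fax_per_page:.1f} KB per page")
print(f"Fax manual is {size_ratio:.1%} of the Buck Rogers file")
```

On those assumed figures the scanned manual spends about twenty times as many bytes per page as the one converted from source, and the fax manual comes to roughly 9 percent of the Buck Rogers file, comfortably inside the "less than 12 percent" claim.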

So here’s an example of a good PDF run and a bad PDF run:

As you might expect, I spend spare time browsing around for historical stuff to acquire. Occasionally, I find I have to purchase items (CD-ROMs, disks, etc.) to be able to bring things into my collection. And some of those things, commercial products that they are, are archived and mirrored throughout my sites and collections but not world-accessible. Other items, like shareware CDs, have ended up on cd.textfiles.com and are being accessed by many, many people looking for something from their past (or maybe just screwing around; I don’t know and I don’t ask).

So recently I was informed that Nibble Magazine, a journal of Apple II articles and related material, had made the entire run of the magazine available. Books published under the Nibble banner, some indexes, the disk images that used to come with the magazines, and the magazines themselves have all been scanned or transferred to DVDs. You could even buy the whole shebang as a package for the low low price of $239. Wooo boy, all that saved time.

So I bought it. Here’s what $239 gets you these days:

In case you’re wondering, yes, those are DVD-Rs with a sticker label on them. This is acceptable, I suppose, although I should also point out that you’re looking at the entire contents of the package; there wasn’t anything else that came in the envelope. Two DVD-Rs. Printed label. $239.

Yes, yes, the guy selling it is the founder/editor and yes, yes, he is free to charge what the market will bear and yes, yes, I did actually buy it of my own free will (unless you debate the level of my OCD and how it affects my decisions). But just as he’s free to sell a very crappily put-together package for an exorbitant rate, I am free to say that I consider it as such and tell people generally what quality they’re in for.

But it goes a little farther than that. You see, just like that Buck Rogers scan I referenced, these Nibble scans are black and white, poorly contrasted, and off-kilter (the shadow of the magazine pages is very visible). Bear in mind, by the way, that the pages are black and white even if they were originally in color. Naturally, there’s no indexing, no OCR utilized for searches, or anything else like that. This is, in short, more what I would expect of a ghoulish get-rich bottom feeder trying to sell old magazines before he is caught than the work of someone who poured over a decade of his life into a project. The .doc file on one of the DVDs (the sole documentation) says that he might rescan some of the pages in the future back into the original color form, and he’ll get an update to the world.

Now, let us turn our attention to a work done right. I’ve raved about these before, but a fellow spent years scanning in issues of Computer Gaming World, and then in a fit of wonder, the original publishers decided to make these issues available for free. Here is the site. The PDFs, all done absolutely exquisitely, completely indexed and OCRed where feasible, all clock in at very reasonable sizes, considering the data within them. The contrast is top notch, the pages line up, and 99% of what anyone could want from one of these issues is there within easy reach.

How, then, is one for free and the other not? How is it that the free one is vastly, vastly superior?

This isn’t “wisdom of crowds” or “web 2.0” or “crowdsourcing” or a billion other buzzwords.

This is someone giving a shit versus someone not giving a shit.

And PDF is just the container for it all. Let’s hope for more of the former out there.



One Comment

  1. Compare that to the Loadstar Collection, which includes (cut/paste):

    ~ All 199 issues of LOADSTAR in .d64 and .d81 format
    ~ All 42 issues of LOADSTAR 128 in .d64 format
    ~ .TXT files of all of the text on the issues for fast searching on your PC
    ~ All 21 issues of UpTime (a rival disk magazine that LOADSTAR soundly defeated and bought)
    ~ JPGs of all of the color covers of the issues when LOADSTAR was sold in stores
    ~ PDFs of all 73 issues of The LOADSTAR Letter, Jeff Jones’ excellent newsletter companion to LOADSTAR
    ~ MP3s of selected Knees Calhoon songs
    ~ .d64 files of every LOADSTAR product published separately from the monthly issues: the Compleat Bible, the Compleat Programmer, all five LOADSTAR Extras, Barbara Schulak’s puzzles, etc.
    ~ All of Dave Marquis’ SID and MIDI music
    ~ All of Walt Harned’s artwork — Walt is the most prolific artist ever for the Commodore computer
    ~ and whatever else I could find from the historic LOADSTAR archives.

    Total price? $24. At that price you could buy this and a new (albeit lower end) Dell Computer and still come in under $239.