ASCII by Jason Scott

Jason Scott's Weblog

Hacking the Scantron

I have a page of webstats for all my sites. I occasionally check it to see if anything’s funky. Since the sites are all hosted in various places, I don’t necessarily notice if something gets hit until the fun is over.

Such was the case when I found out that over 24,000 people had read a file about hacking Scantrons in just 6 hours.


In a web statistics generator, it looks like this:

The orange shows how many unique sites visited on a given day, and obviously a lot of unique sites visited that day.

For this, we can lay the blame on the file being posted, decontextualized, to Digg, which is basically a more level-playing-field version of Slashdot. People can say "I digg this," and the story gets voted up onto the front page or falls off it; others can also comment, and so on. The link for this one is here. As of this writing, 1,215 people have said they "dugg" this file. You can't "undigg" as far as I can tell, so the voting process offers only "positive" or "abstain." Oops!

There are apparently several ways to take on a "digg" you don't like, and a couple of them were attempted against this link. Someone came along and submitted a fake Scantron article in a desperate attempt to push the original off the front page. And, of course, someone tried to submit another, similar file to collect some diggs of their own, and that failed as well.

A much more informative aspect is the running commentary from the people linking over to this file, and it brings up a pressing issue that I've been studying in a roundabout way: information pollution. Since the file is being linked to directly, I think a lot of people might get the impression that this is a living, vetted, working article. It just shows up on the main page of Digg, presented with no useful context, with this description: "Sick of filling in those stupid bubbles? Want to artificially boost your academics? Scantron hacks… Note: the chapstick was just a BS rumor:"

There's no useful commentary about the article, just an implication that it is, in some way, useful and truthful. It's not. It was written over 10 years ago, and even if it worked in some way (and the article doesn't claim to be more than a little bit successful at best), it sure as heck isn't going to work anywhere near as well these days as it once did. But since the Digg user presented it with little fanfare and hot-linked it, people who "just browse" find themselves lapping up information that is problematic indeed.

So why do I even have it up? Well, I'm a historian. Even years later, this file contains interesting information beyond its questionable methodology for messing with Scantrons: it shows how people were affected by these devices, illustrates the teenage desire to beat the system, shows the use of "how we did" verification via encrypted signature back then, and offers a host of other tangential signposts.

I would consider the happiness with which people pick up this file, without even trying to determine its context, and go off about its usefulness or relevance, to be equivalent to running into an antiques store, grabbing the rusted dental tools in the back, trying them out right now in your mouth, and then complaining that the experience was neither hygienic nor pleasurable.

Over time, this problem is going to get worse, as all manner of material is a single click away. I’m not saying something needs to be done, but it’s something I’m aware of and watching, and maybe as time goes on, I’ll comment on it more and think out more of what “information pollution” could mean.

I say this, of course, with the apparent equivalent of 500 smokestacks blasting out from the factory.



  1. Masked and Anonymous says:

    Actually, you can undigg things, but the link is hidden in an obscure part of the site. Just one of many reasons why Reddit is better than Digg.

  2. Mungojelly says:

    It's Digg's fault. The decision whether to put something on the front page of Digg is made by a whole bunch of users, based on their momentary first impression upon seeing the link. It should therefore be considered only a rumor mill. Anyone who takes things there as sovereign facts based on how many times they've been dugg is a fool and deserves their ignorance.

    I've been thinking about strategies for verifying information online. The old-school media wants us to believe that the way to get accurate information is to have a Very Smart Person sitting between you and the facts. Thank you, but no. That's obviously no recipe for infallibility. Wikipedia (or rather, the strategy "put up a sketch version of the facts and then argue about it") is proving to be just slightly closer to the mark than the old system, but it's still pretty far off.

    I have a hunch that we could learn a lot about this task from the intelligence community (spies). Basically, what they do is distill what-is-true from tremendous amounts of imperfect data, through large numbers of imperfect (sometimes even treacherous!) agents. They know how to split the job into small parts (compartmentalizing). They know how to analyze raw documents in ways that not only extract meaning, but extract it in a form from which still more meaning can easily be drawn out of the composite of those extractions. I think we need to work on systems like that for citizen journalism.