The Terrible Secret of Spam
I get a lot of spam.
I know that’s not exactly breaking news, but sometimes you really have to step back and look at a situation to realize just how entirely horrible things have become. It doesn’t help when you can remember, within your own lifetime, a time when the problem didn’t exist.
Naturally, unsolicited advertising has been around since long before I was, in the form of flyers, mass mailings, telephone calls from machines or people, and just plain guys showing up at your door. In all of these cases, it at least cost somebody something to harass you: some amount of money or time, maybe even the risk of going bankrupt or not being able to pay the bills, because they were gambling on enough people responding to the marketing effort to make back what it cost.
Still, even that barrier was actually rather tiny, because you could amortize telemarketer calls, salesman visits, junk mail, using a combination of machines and low-paid people. If you did it right, it wasn’t much risk at all.
Now, however, that’s entirely gone. There is, effectively, no cost. Yes, you have to pay to host a machine somewhere (unless you hack into other machines) and yes, you need to pay for bandwidth (unless you hack into other machines), but even if you’re “legit” enough to have a real machine and pipe you pay for, the per-person cost is insanely low. It costs about the same to mail 10,000 people, a million people, or 100 million people with your haranguing, lying, sex-promising e-mail crap bomb.
There is now a massive range of utilities and organizations dedicated to reducing spam, and as a result people get spam but they may forget how much spam is actually out there. Since I’ve been around a bit, I have my own mail server. This mail server now bounces 1,000,000 spams a week.
Again. BOUNCES. ONE MILLION SPAMS. EVERY WEEK.
That’s the stuff that gets turned away based on a number of criteria. That’s still my mailserver being sent some portion of that mail, analyzing the request, and then making a decision.
After that, there’s probably another 40,000-50,000 spams a week that get past that initial set of criteria, enough for my internal spam checkers to do something about them. For each one, a host of tests is run against the letter. Machine resources are spent checking style, the links contained within, the sending IP address, and so on. From that, the checker makes a best guess and provides a number rating. If the number rating is high enough, the mail goes into my spam folder and I don’t see it. So this means I merely see 10-20 spams a day.
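To make the "number rating" idea concrete, here is a minimal sketch of score-based filtering in Python. It is not the checker actually running on this server: the rules, the weights, the threshold, the word list, and the IP address are all made-up assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender_ip: str
    subject: str
    body: str


# Everything below is a hypothetical value chosen for the example.
BLOCKED_IPS = {"203.0.113.7"}
SPAMMY_PHRASES = ("free money", "supermodel", "act now")
SPAM_THRESHOLD = 5.0


def spam_score(msg: Message) -> float:
    """Add up the points from a few simple tests; higher means spammier."""
    score = 0.0
    if msg.sender_ip in BLOCKED_IPS:          # sending IP address check
        score += 4.0
    text = f"{msg.subject} {msg.body}".lower()
    # Style check: 1.5 points for each known phrase that appears.
    score += 1.5 * sum(phrase in text for phrase in SPAMMY_PHRASES)
    # Link check: 0.5 points per link in the message.
    score += 0.5 * text.count("http")
    if msg.subject.isupper():                 # SHOUTING subject line
        score += 1.0
    return score


def is_spam(msg: Message) -> bool:
    """A message goes to the spam folder once its rating reaches the threshold."""
    return spam_score(msg) >= SPAM_THRESHOLD
```

Real checkers run far more tests than this, but the shape is the same: every test nudges one number up, and a single threshold decides which folder the mail lands in.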
As a result, and because I can delete this stuff on sight using the same part of my brain that regulates breathing, it’s too easy for me to forget what’s going on. All these many thousands of mails, all these millions of bounced mails, all filling my T-1 line. For nothing.
But, you see, that’s not the only spam I see. Spam, once the province of posts on Usenet or e-mails to unsuspecting victims, now pervades everything. Examples:
- Referrer Spam. I have thousands (thousands!) of connections to my webserver, trying to hit everything on it, giving me fake referrers to porn or straight spam sites. They fill my log, raise my readership numbers falsely, and make it that much harder for my scripts to analyze the logs.
- Comment Spam. Until I implemented the anti-spam method you encounter when posting a message, I was getting 3,000 comment spams a day. Again: three thousand, each and every day. You realize why I added this hoop, now. As it stands, I still see the attempts which fail, scrolling my logs unnecessarily.
- Form Spam. A new trend in the past few months: any website I run that has any form whatsoever is getting bots coming in, hitting every entry field they can, then submitting. The bot has no idea what that does or where it goes, but it doesn’t care; it can’t hurt. I only get 2-3 of those a day. I’m sure it’ll grow.
- Wiki-Spam. I had a little MoinMoin Wiki way back when; it started to get postings on it from spammers, whose programs would methodically do tiny edits and then wholesale juggling of entries, putting in links to stuff and then reverting the changes so they’d be in the history but the owner/other users wouldn’t care.
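For the referrer spam case, one way to keep fake referrers out of the numbers is to drop them before the log analysis runs. The following is only a sketch under assumptions: the domain list is invented, and the log entries are assumed to be already parsed into (path, referrer) pairs by some earlier step.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would be much longer and keep changing.
SPAM_REFERRER_DOMAINS = {"spam.example", "pills.example"}


def is_spam_referrer(referrer: str) -> bool:
    """True if the referrer's host is a blocked domain or a subdomain of one."""
    host = urlparse(referrer).hostname or ""
    return any(
        host == domain or host.endswith("." + domain)
        for domain in SPAM_REFERRER_DOMAINS
    )


def drop_referrer_spam(entries):
    """Keep only the (path, referrer) pairs whose referrer is not blocked."""
    return [(path, ref) for path, ref in entries if not is_spam_referrer(ref)]
```

This cleans up the statistics after the fact. It does nothing about the connections themselves, which still arrive and still get logged.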
I have encountered fake weblogs that suck the text of my weblog down into themselves and then surround it with ads. I see fake websites that link to me randomly. I see postings show up on any place that registration isn’t required, and I have gotten well over 40 spam-related “friend me” messages on that silly MySpace account I got a while ago. I even get instant messages from supposed Russian supermodels who have decided to bang me silly, as soon as I allow them to join my friends list. (I have resisted the urge to approve these supermodels.)
What I’m getting at is that spam has pervaded all aspects of online life, anyplace where a person could possibly have input. If a person can say something, then someone has figured out how to make a matrix of machines or scripts go in and act enough like a person to get their message in.
The thing is… spam works. Nobody would go through this trouble if it didn’t pay off, and pay off big. And it will continue to pay off big, probably forever. That’s the (not so hidden) secret. I don’t think there’s really a solution to this, considering that fact.
I wish I could now knock out some pithy statement or insight that would solve the spam problem forever. I can’t. I have none. And when I remember how much of my machine resources are being burned away, how much of my electricity bill is probably my poor machine jamming away at this endless golfball-sized hail of spam mail that comes in every second of every day, it just fills me with despair.
Is it any wonder I prefer to spend my time creating content, and hating any form of advertising?
I don’t despair often, but I despair about this.
Meanwhile the spam problem gets even worse:
There are wanna-be-admin anti-spam kooks out there
who think they’ve found the Final Ultimate Solution to the Spam
Problem (FUSSP) http://www.rhyolite.com/anti-spam/you-might-be.html
by hitting the innocent people and rejecting their mail. So thanks to
these guys you have to fight spammers and anti-spammers at the same time.
So Jason is judging the situation right: there is no FUSSP and the
sooner you realize the better.
Of course you can do much against spam, but think before you do it,
and at the very least you should know what a “false positive” is. If you don’t handle
this with care you might kill internet communication.
For any solution to work you need a whole bunch of methods, both to prevent spam
and to filter spam.
I hope that Jason’s bounces are on SMTP level, because if you accept a
mail and send it back as a regular e-mail bounce it costs you double
bandwidth and you might be abused as reversed spam relay by sending
spam to faked sender addresses.
About form spam: spammers often check whether forms can be
abused for spam relaying, because the PHP mail command is vulnerable.
But they post things anyway and there must be even individual spammers
trying to abuse even handwritten scripts. If you don’t program with
care and the possibility of abuse in your mind, you’re lost.
Fortunately with forms the power is on your (server) side.
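One well-known way mail forms get abused for relaying is header injection: a bot puts line breaks into a field such as the subject or sender, and the extra lines become new mail headers. Below is a minimal sketch of the server-side check, written in Python rather than PHP. The function name is made up for the example, and this covers only this one kind of abuse.

```python
def safe_header_value(value: str) -> str:
    """Return a form field cleaned for use in a mail header.

    Raises ValueError if the field contains a line break, since a line
    break would let the sender smuggle in extra headers such as Bcc.
    """
    if "\r" in value or "\n" in value:
        raise ValueError("line break in header field")
    return value.strip()
```

A form handler would run every field that ends up in a header through a check like this and refuse the whole submission on error.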
So it’s a constant never ending fight. There are two major rules for me:
– avoid false positives, but if they can occur take care of them
– avoid end-of-pipe solutions, fight the cause, not the symptom
BTW it’s really difficult to post a comment here.
Spam sucks, but don’t hold it against legitimate advertising.
cassiel, what do you think is so difficult about posting a comment here?
Wow… that sounds horrible – and explains why I couldn’t find a contact email address the other day.
Maybe the part of your brain that regulates breathing can also filter out “fuck” in your otherwise brilliant posts — they’d be better for it, and, as another commenter pointed out, could be used in classes, etc.
Just a suggestion. Swear away if that’s what you want. 😉
ACME Labs posted a similar story a few years ago on dealing with 1M mail spams per day. The article is a good read on the methods to deal with that volume and tradeoffs on cpu, memory, effectiveness, etc. A bonus is the current graphs of stats for each filter.
I was IT director for a public school district. Some of the more technically adept students would report teacher emails as spam to various spam blocking services. This would get our schools’ external IP addresses placed on blacklists for weeks at a time.
Out of curiosity, Jason, why don’t you use a NOSPAM trick to lower your bounced email count? On that note, why do you bounce instead of silently deleting? Or at least silently delete the spam with the highest “spam index?” I know some people who avoid doing these things for various reasons (the bouncing emails, particularly, because of what happens should a false positive result in someone thinking you’re ignoring them), but I’m curious what yours are.
I use a number of tricks and situations to handle the incoming spam. When I say “bounce”, I mean that my system returns “no such user”. As for “take it in and delete it”, then that means you constantly accept all the messages, in total, which can go into many megabytes, and then delete them, which runs down the system notably. Believe me, I’m doing an awful lot in many directions to handle this, and I’ve minimized it, which means it’s merely at apocalyptic proportions.
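A rough illustration of what answering "no such user" during the SMTP conversation looks like, as opposed to accepting the message and dealing with it later. This is a sketch, not the configuration of the server discussed here: the user list, the domain, and the function name are assumptions. The reply codes are the standard SMTP ones.

```python
# Hypothetical local setup for the example.
LOCAL_DOMAIN = "example.org"
KNOWN_USERS = {"jason", "postmaster"}


def rcpt_reply(address: str) -> str:
    """Decide the reply to an SMTP RCPT TO command for the given address.

    Refusing here means the message body is never transferred, and no
    separate bounce e-mail has to be sent to a possibly forged sender.
    """
    local_part, _, domain = address.lower().rpartition("@")
    if domain != LOCAL_DOMAIN:
        return "550 5.7.1 Relaying denied"
    if local_part not in KNOWN_USERS:
        return "550 5.1.1 No such user"
    return "250 2.1.5 OK"
```

The decision still costs a connection and a lookup per attempt, which is why a million refusals a week are not free, but it is far cheaper than receiving and scanning every message in full.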
Oh, I’m sure you’re doing all sorts of fancy things that I’ve never heard of, having never had to run more than small internal servers. By NOSPAM trick, I just meant getting rid of all the “mailto:” links (or is there only one?) on the site and replacing them with “email me at foo@(NOSPAM)bar.com” or “foo at bar dot com” or some other silly-looking text thing that people can figure out but (most) bots can’t. Or is doing it that way admitting defeat? 😉
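For readers who have not seen the trick, a minimal sketch of the "foo at bar dot com" rewrite in Python. The function name is made up, and how well this holds up against address-harvesting bots is not something the sketch can promise.

```python
def obfuscate(address: str) -> str:
    """Turn 'foo@bar.com' into 'foo at bar dot com' for display on a page."""
    local_part, _, domain = address.partition("@")
    return f"{local_part} at {domain.replace('.', ' dot ')}"
```

The output is meant to replace a clickable "mailto:" link with plain text that a person can still read back into an address.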
I see. No, that trick won’t work; my address is in too many places and I’m too high profile. I always thought that was a sketchy way to do things, anyway.
I found this page while searching for “referer spam” and “ascii.textfiles.com”. The reason I was looking for that combination is that every few days, spanning from 4/26/2010 through today, 46 times now, I get a hit in my website log for my site’s main page that says the referrer is ascii.textfiles.com.
Near as I can tell, however, there’s no mention of davidlauri.com anywhere on ascii.textfiles.com (at least not until I submit this comment today). The log entries are always from the same IP address, 22.214.171.124, which resolves to tokyo.dreamhost.com. My logs aren’t accessible to the public, so there’s no incentive that I can see for SPAMming my logs.
Any clue as to why this is happening?