Tech-geekery: spam Archives

April 6, 2004

Comment spam redux

Okay... the tide of comment spam is now definitely rising. I have taken the following measures:

1) All co-bloggers can now edit all posts, so unless I missed something they will now be able to delete or neuter comment spam.
2) The default setting for comments on new posts is now "closed". For each new posting, bloggers will have to decide if they want to invite comments or not. This is a temporary measure, until I have resolved item number 3.
3) The single most abused feature is the URL input field. I hate having to disable that, both because it is a courtesy to legitimate posters to let them display their URL, and because the level of discourse tends to be higher when people's posts are associated with a public website or known identity. However, I will not let this blog be polluted through this feature, so when I have a little more time, I will edit the appropriate templates (a single Movable Type setting to allow or disallow this would be *very* desirable), and URL display will be gone. Blame the terrorists, not me.
4) I will look into preventative measures like MT-blacklist or Typekey. Both solutions have disadvantages. Shared, non-user-visible blacklists similar to MT-blacklist have failed for Usenet and Email so there's no reason to assume that they will work on blogs in the long run. I'll probably install it but it will only be one extra line of defense.
Typekey, on the other hand, makes any blog that uses it considerably less open and hospitable, unless all blogs use it *and* everyone on the internet develops a high level of interest in blog comment posting. Bleah.
A Bayesian filter plugin exists and that might work better. But my preferred solution is still violent retribution meted out on spammers.
5) In the absence of laws allowing for violent retribution, I will name and shame spammers instead.

Continue reading "Comment spam redux" »

April 22, 2004

Name and shame, updated

It's irritating when you're working or relaxing in a room, and someone else is hovering around you, swatting flies. However, it needs to be done. So for your continued edification, I present to you the Bastard of the week award, which goes to German Rotsberg of various gay porn and fake viagra sites. This piece of vermin has repeatedly comment spammed mine and other blogs. Here, reposted in its entirety, is the info returned by the essential Sam Spade service.

I would just love to wake up in the morning and find that German Rotsberg's little businesses have been closed down. He uses GoDaddy for his domain registration, which has an Anti-spam policy although it doesn't specifically cover blog spam yet.
-------------------------------------------------------------------

Continue reading "Name and shame, updated" »

May 1, 2004

Looks like I'll need another domain hosting provider...

To: support@gandi.net:


To whom it may concern,


I get repeated spam from http://www.buycheapdrugs.biz which is registered at Gandi. I noticed that Gandi, unlike other domain name providers such as GoDaddy, does not have a policy against the use of spam to advertise domains registered through Gandi.
Legally, you can institute such a policy, and ethically you should. I request that you add language to your terms of service prohibiting the spam-advertising of domains registered with Gandi.

I am a Gandi customer myself, and a satisfied one. However, I do not do business with Internet companies that do not make it a priority to protect the Internet against spam, and so your response to this issue will determine whether I will do business with Gandi in the future.

Best wishes,
--
Reinder Dijkhuis, [email address deleted]
http://www.rocr.net
http://whio.keenspace.com

From Gandi to me:

To improve our help desk, we have set up a new interface. From now, to contact our support please visit the link: http://www.gandi.net/support-en There you will easily find answers to the most fequently asked questions, in the area you are interested in. At the end of this research, you will of course be able to contact us by email. An online form will help you to ask your question. Then you will receive our personalized answer. Our help desk is still free of charge, in english or french languages, and only by email. Best regards, Gandi support service

From the link provided in the automated message above:

Cases where Gandi can not act

Gandi is a registrar of domain names. We do not provide any webhosting services nor email accounts, that could be used for spam.

Thus we can not desactivate nor delete a domain name on the only reason that it is used, directly or no directly, to send some spam. Because we can not act as a judge.

However please find below some information that could be useful in two kinds of spam:
[snippage]
When you use the Whois on the domain name that you have found in the spam, you see that the domain name is handled by Gandi:

Gandi is an ICANN accredited registrar, and as such registers domain names on behalf of its customer. Gandi does not provide any webhosting nor email accounts to its customer, but only the registration of the domain name. The use of the domain name is only up to the person who owns the domain, and/or to the contacts of the domain: you can find the details of these persons in the Whois.

Boilerplate above the online form referred to in Gandi's automated response:


Warning: our email help-desk is restricted to questions which are not
answered in our website, particularly in our Frenquently Asked Questions (FAQ).

Mail to legal@gandi.net bounced.

This will not do. Gandi has no policy against spamvertised domains and has explicitly made itself unavailable to complaints about this. The issue is "answered on their website" and if you find their answer unsatisfactory, tough luck to you.

ROCR is registered with Gandi until 2005. If their policy hasn't changed by then, I will switch to a domain registrar that does have such a policy, even if the other domain registrar is considerably worse in all other respects than this one. Even Network Solutions will do if they cut off spamvertised domains.

Continue reading "Looks like I'll need another domain hosting provider..." »

May 12, 2004

Name and Shame, part 3

I went for two whole weeks without comment spams, but now I'm hit by three of them in a day, all tacked onto the same post (this one, which is now closed), and with clear similarities in style (the URLs, for example, were in all caps in the emailed transcripts). I did the Sam Spade whois thing, and while I don't make a habit of pointing and laughing at a person's genitals I will make an exception this week for:

Registrant Name: Georgi Georgius
Registrant Street1: Simen 12
Registrant City: Styaua
Registrant State/Province: Styaua
Registrant Postal Code: 2321
Registrant Country: RO
Registrant Phone: 40.5298762
Registrant Email: pharm@bonishop.com

Continue reading "Name and Shame, part 3" »

May 13, 2004

A new form of spam, or?

I was puzzling over this for a bit...

[Image: blogweirdness.png]

This keeps showing up in my Bloglines account. It isn't in Donna's blog itself, but looking closely at her RSS feed, which is provided by a third-party company, even though blogspot provides a perfectly good Atom feed, I found that it was tacked on to the bottom of her feed.

OK, problem solved. Obviously 2rss.com was just trying out a way to make a buck out of the service they provide. Which will fail because their service isn't particularly needed. Livejournals and blogspot blogs typically have working feeds if you know where to find them.

May 20, 2004

CAPTCHAs

A number of blogs and forums I visit now protect their comment sections with CAPTCHAs, form fields in which you have to type a number/letter combination displayed in an image above the field, in order to prove you're a human. Allow me to state three things for the record, so you can laugh at me if I change my mind in a few months or so.

1. I fucking hate them.
2. I expect spammers will find a way to defeat them, whether this will be Chinese slave labor or a clever combination of OCR and Bayesian filtering to determine whether a form field may be a CAPTCHA.
3. Consequently, I do not intend to use this technology on this weblog right now.

May 29, 2004

Swatting flies again

This spammer crawled out from under a little rock, and it's about time I chased it back. Note the phone numbers: 555 is a fictional area code, right?

Culprit: http://samspade.org/t/lookat?a=www.all-debt-consolidation.org aka
http://samspade.org/t/lookat?a=http%3A%2F%2FWWW.I-directv.net
Spam incidents: 2.
Email address used: hrie@yahoo.com
Originating IPs: 62.233.230.117 and 212.160.201.98

Server Used: [ whois.gandi.net ]

www.all-debt-consolidation.org = [ 80.67.173.5 ]



domain:ALL-DEBT-CONSOLIDATION.ORG
owner-address:Acc. Media
owner-address:586 Drew Street
owner-address:14552
owner-address:Little Rock
owner-address:Arkansas
owner-address:United States of America
owner-phone:1.5552556565
owner-fax:1.5552556565
9403324906c56a445851d0a7096a6c52-847167@owner.gandi.net

Continue reading "Swatting flies again" »

July 3, 2004

A veritable avalanche of spam!

This Friday and Saturday, the Talk About Comics forums, where I am an administrator, were hit by what looked like a zombie attack: a flood of spam from a large number of different computers in a very short time. Many of the machines the spam came from were connected to the same ISP, but quite a few were not.
I have never banned so many IP addresses in such a short time, and one of my co-administrators had to ban a bunch more, just to stem the flow. I have posted a thread on the incident on the forums, with a full list of IPs from which spam was posted. If you administer a public system, you may want to know that these are dangerous, and maybe a talented statistician can use this information to reveal where the machine coordinating this attack may have been!

My modus operandi:


1. First I look at the spammed message and add the spamvertised URLs to the word censor, usually replacing them with an obscenity. This effectively neuters the spam and any spams advertising the same URL. This time, I have seen evidence that this technique is working, because quite a few spams had URLs that were filtered out in my last run of additions to the word censor.
2. I look up the IP address to check whether it is a shared address from a major internet provider like AOL; banning those would result in many innocent people getting banned. However, if it is a shared address from an ISP that I've never heard of, I will still ban it.
3. I delete the message and start looking for the next one.

The idea behind filtering the URLs is that the spammer will derive no benefit in the form of improved google ranking, even if an individual message is not caught. The "obscenity" under point one serves to make posts filtered in that way stand out so that individual forum moderators will still be motivated to delete them. Besides, we have a bit of a tradition on the TAC forums of using the Word Censor filters to turn swearwords into other swearwords. Also being mean to spammers in the morning helps me go through the rest of the day with a smile and a kind word for everyone.
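For the curious, here is roughly what the word-censor trick from step 1 amounts to. This is a minimal sketch in Python, not the forum software's actual filter: the `CENSOR` table, the `censor_urls` function and the example domains are all made up for illustration.

```python
import re

# Hypothetical censor list: spamvertised domains and what replaces them.
CENSOR = {
    "casino-example.test": "[bleep]",
    "pills-example.test": "[bleep]",
}


def censor_urls(text, censor=CENSOR):
    """Replace every mention of a censored domain, ignoring case."""
    for domain, replacement in censor.items():
        # Swallow an optional scheme, an optional "www." and any trailing
        # path, so that no working link survives in the posted message.
        pattern = re.compile(
            r"(?:https?://)?(?:www\.)?" + re.escape(domain) + r"[^\s\[\]<>]*",
            re.IGNORECASE,
        )
        text = pattern.sub(replacement, text)
    return text


print(censor_urls("Visit http://WWW.Casino-Example.test/win now"))
```

Because the replacement happens on display, a spam that slips past the moderators still hands the spammer no usable link, which is the whole point.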

I was glad, when the work was done, to get some props for the hours of police work I'd put in. That's rare. Still, it caused me to get even further behind with work on Monday's comic (which is almost done, so it's not that bad).

July 22, 2004

Another one to disqualify from the human race

Three spams in a row to this weblog, from an outfit called paxilmedication.biz. All of them were posted from IP address 151.37.165.231. They have a phone number which may or may not be genuine. It's in New South Wales, Italy.

Again, the domain name is registered with GANDI, which means they can spam the domain with impunity. If you have a domain that you need registered, avoid GANDI until they have updated their policy to ban spamming, and if you're already registered with GANDI, switch away from them and tell them their support for spammers is the reason.

The domain registration may be fraudulent. The only Krastio Atanassov I can find through Google is a linux HOWTO writer who seems to be concerned with anti-spam measures. The spammed web page itself doesn't appear to have a contact address on it (yes, I checked! The things I do in the battle against spam), which suggests that the spam may have taken place as a way to smear and discredit the real Atanassov. Considering the psychopathic behavior of spammers in the past, it is not at all unlikely that they'd go to such lengths.

Continue reading "Another one to disqualify from the human race" »

July 23, 2004

The day I banned half the internet

Being an administrator for Talk About Comics can be an exhausting experience. This morning, I awoke with a weird feeling of premonition, like a disturbance in the Force or something. I thought to myself "It's going to be one of those days when I start off motivated to get real work done, but this motivation will be defeated the moment I log on because there will be another spam attack on Talk About Comics." And Nostra-Dijkhuis was right again. There were already several complaints, both in the Trouble Ticket forum and elsewhere, about casino and other spams, following the same modus operandi as the attack that gave me so much trouble two weeks ago. Then, as now, old threads were resurrected with postings from Guests duko, bugi and wlulax_60, containing off-topic messages (variants from a small pool of standard texts) with a URL randomly inserted mid-sentence. They were once again posted from a wide range of IP addresses, in disparate ranges, but with about half of them belonging to one Internet Provider, Telefonica. When I logged on, Fearless Leader, who really has much more important things to do like paying me, hyping the Modern Tales sites and inventing new things to conquer the world with, had already deleted 120 of them. But they were still trickling in at a steady pace.

Once again, I set about neutering the spams by feeding the URLs to the Word Censor Filter, then banning the IPs from which they were posted, then deleting the messages themselves. I was frustrated to find that wildcards in the ban list didn't work the way I expected them to. But I've figured it out now. But even with wildcards for the third and fourth blocks of the IP addresses, I'm hitting the ban list often, and it really does feel like I'm banning half the internet, or at least most of Spain.
The process took hours of productive time away from me, in which I did things that were the opposite of fun. I am not a violent man, but I have some interesting ideas about how the appearance of the person behind these spams can be improved.
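For anyone wondering what wildcards for the third and fourth blocks look like in practice, here is a minimal sketch in Python. It shows one plausible reading of wildcard ban patterns, not the forum software's actual matching rules, which I have not checked; `matches_ban`, `is_banned` and the addresses (taken from the ranges reserved for documentation) are hypothetical.

```python
def matches_ban(ip, pattern):
    """True if a dotted-quad IP matches a ban pattern such as '203.0.*.*'."""
    ip_parts = ip.split(".")
    pattern_parts = pattern.split(".")
    if len(ip_parts) != 4 or len(pattern_parts) != 4:
        return False
    # Each block must either be a wildcard or match exactly.
    return all(p == "*" or p == i for i, p in zip(ip_parts, pattern_parts))


def is_banned(ip, ban_list):
    """True if any pattern on the ban list covers this address."""
    return any(matches_ban(ip, pattern) for pattern in ban_list)


# Hypothetical ban list: one exact address and one two-block wildcard.
ban_list = ["198.51.100.23", "203.0.*.*"]
print(is_banned("203.0.113.7", ban_list))
print(is_banned("192.0.2.1", ban_list))
```

A pattern with two wildcard blocks covers 65,536 addresses at once, which is why it starts to feel like banning half the internet.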

Continue reading "The day I banned half the internet" »

July 27, 2004

Third time's the charm... of a diseased toad!

Because I don't like having my time or my resources stolen from me, I am not happy with the proprietors of Adult-movies.org and Hot-gay.tk (both links go to the Sam Spade pages for the spammed URLs) who spammed this weblog three times from the same IP address ( 83.237.7.75 - may be spoofed) in the past hour or so.

Unfortunately, both domains are less than penetrable. No domain registrar that I recognise as a reliable name, so no use complaining to them (although the domainsbyproxy IDs suggest that it's a subletter for Go-Daddy who are usually responsive). However, I have reproduced what I could find in the hope that a smarter person than me can take them down.

Continue reading "Third time's the charm... of a diseased toad!" »

August 8, 2004

I've installed MT-Blacklist

This is the place to test if legitimate comments get through. Post away! Any old nonsense will do, even Republican talking points.

Update, Sep 14, 2004: this entry now seems very popular with real spammers, so I'm ending the test and closing the topic. First topic I've closed since installing MT-Blacklist. I was rather hoping I wouldn't have to do that again...

September 1, 2004

These guys are holding my comments hostage

The timetable keeps getting reset on the liberation of the comments. I will start allowing HTML, links and images as well as display of the URLs you guys all so faithfully type in when you comment, if I can go for two weeks without having spam pass MT-Blacklist. Unfortunately, spam passes MT-blacklist daily; in fact, the problem continues to get worse.

Just minutes ago, I caught a glimpse of how spammers may defeat MT-blacklist altogether: by brute force. I checked my mail and was flooded with transcripts of comment spams from blackjack-123.com who sent over 100 spams from a wide range of IP addresses in a space of a few minutes. Movable Type's built-in flood control caught a few addresses and automatically banned them (yay!) but the flood was so overwhelming that it interfered with my ability to add the casino.blackjack-123 address to MT-Blacklist. The flood continued while MT-Blacklist was unable to process the information.
I am sure that this tactic will be used again, and more powerfully.
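
To make it clearer why per-address flood control only catches a few addresses in an attack like this, here is a minimal sketch in Python of that kind of throttle. It is an assumption about how such a feature might work, not Movable Type's actual code; the `FloodControl` class, its limits and the addresses are invented for illustration.

```python
from collections import defaultdict, deque


class FloodControl:
    """Ban an address that posts too many comments within a time window."""

    def __init__(self, max_comments=3, window_seconds=60):
        self.max_comments = max_comments
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # address -> recent timestamps
        self.banned = set()

    def allow(self, ip, now):
        """Return True if the comment is accepted; `now` is in seconds."""
        if ip in self.banned:
            return False
        stamps = self.recent[ip]
        # Forget timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_comments:
            self.banned.add(ip)
            return False
        stamps.append(now)
        return True


throttle = FloodControl(max_comments=3, window_seconds=60)
# A hundred different (made-up) addresses posting once each all get through.
accepted = sum(throttle.allow("10.0.0.%d" % n, now=n) for n in range(100))
print(accepted)
```

A throttle keyed on the sender's address only bites when one address posts repeatedly. A flood spread over many addresses would need a limit on the total comment rate as well, which this sketch leaves out.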

The news on extracting much-desired retribution from blackjack-123 is also bad.

Continue reading "These guys are holding my comments hostage" »

September 7, 2004

How Bizarre....

I just received the following email:

Subject: :) ever wanted pain pills, overnite shi`ppi'ng to your door

broken, okay? the law. remember the law?


>From: Leatha Gardner [mailto:wgtxv@hrxu.com]
>To: mickey klaers; jacob giernoth
>Sent: Sunday, March, 2004 6:50 AM
...snip...
>with me,fri-ends! --- Gore AR
>wysokinski 10zlobek 91stawaly najslynniejszyprycza

...I have absolutely no idea what that was about.

September 13, 2004

Congratulations!

200.31.79.214 is the 100th IP address to be added to my IP ban list! The lucky winner just spammed the blog with a link to an internet casino. Sam Spade has no info on the URL, and the lucky number points to a university, so I won't do a full exposure of the spam here.

If this blog can go for 14 days without spam passing through MT-Blacklist, I will liberate the comments. The clock has been reset...

December 28, 2004

Spammers fined.

Some good news from the war against spam:


Dutch telecommunications watch-dog Opta has fined its first batch of spammers since the introduction of the anti-spam amendments to the telecom law late last year that granted this power to Opta. (Links lead to Dutch pages.)

Fourteen other small-scale spammers received warnings.

It's not as good as having them hunted with dogs, but until that (and kicking them in the bollocks) becomes legal (uhm, I have some very imaginative ideas about what kind of treatment should be legal to hand out to spammers), it will have to do.

January 1, 2005

Tip for users of Opera's email client

If you've been using M2, the email client coming with Opera, for a while, you may find that the "learning" spam filter's performance, after initially improving, starts trending downhill, leaving more spam messages unfiltered. I was puzzled by that, but I think I've found the cause: backdated spam.

Continue reading "Tip for users of Opera's email client" »

January 15, 2005

mt-commentproxyblock

I have installed mt-commentproxyblock even though the web page didn't say whether it worked with my creaky old version of Movable Type. So this will be your test post to see if it's broken the blog and especially comment submission. This plugin's supposed to be really good, by the way.

Update: The plugin doesn't break commenting, but it doesn't appear to do anything useful either. I had another 50 comment spams to archived posts (I still haven't finished closing the archived comments, which is a hugely time-consuming process) this morning, so another hour of my time, on Sunday morning, was stolen from me by those scum.
I've had some suggestions to help solve the problem ranging from intelligent (non-image-based) Turing tests to upgrading to MT3.*. All of these take time to implement and test so it may be a while before I get around to working on them. In the meantime, I will continue to barricade the blog. Apologies if this causes any inconvenience.
Update #2: I have also just got my first, albeit minor, wave of trackback spam. I really want to kill someone painfully with my bare hands right now.

Continue reading "mt-commentproxyblock" »

January 19, 2005

Nofollow

Hallelujah.

Continue reading "Nofollow" »

January 26, 2005

No-follow a bastard child of grep

During the blog's outage, I continued to follow the debate on whether "nofollow" was useful, harmful, neither or both. Right now, I don't feel like catching up with that. I'm sure anyone who's interested in fighting comment spam has seen most of the arguments. Except this one:


I noticed several hours ago that for some reason the trackback section of my index page was no longer marked up properly. ... The discovery was followed by a series of progressively more outlandish attempts to coax recalcitrant code into revealing itself, without success. What really hurt was how the comments, which were encased in the exact same html code structure, performed flawlessly. Then I remembered I had installed the new "nofollow" Movable Type plugin earlier in the day. I removed it, and my problems were gone. (More about "nofollow" here.)
I briefly considered being a hero and repairing the plugin, but then I saw the grep pattern that adds the "nofollow" rel attributes to comment and trackback links, and it is a monster, so I'll settle for flagging this bug. ...

One or two people in that harmful/useful debate have expressed amazement over the speed at which the concept was rushed into becoming a de facto standard by Google, Yahoo, MSN and major blog software developers. If the implementation was equally rushed, it's no surprise that the plugin is buggy.
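
For reference, the job such a plugin has to do is small: find the links in comment and trackback markup and give each one a rel="nofollow" attribute. Below is a minimal sketch in Python of one way to do that. It is not the Movable Type plugin's code, which I have not reproduced here; `add_nofollow` and the example markup are made up, and the pattern assumes attribute values never contain a ">" character.

```python
import re

# Opening <a ...> tags only; "\b" keeps tags such as <abbr> out of it.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# A quoted rel attribute that is not part of a longer name like data-rel.
REL_ATTR = re.compile(
    r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL
)


def add_nofollow(html):
    """Add rel="nofollow" to every link, keeping existing rel values."""

    def fix_tag(match):
        attrs = match.group(1)
        rel = REL_ATTR.search(attrs)
        if rel is None:
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            return match.group(0)  # already marked, leave untouched
        values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[: rel.start()] + new_rel + attrs[rel.end():] + ">"

    return A_TAG.sub(fix_tag, html)


print(add_nofollow('<a href="http://example.test/">a commenter</a>'))
```

Even this small version needs two patterns and a handful of special cases, so it is easy to see how a single do-everything pattern turns into a monster. An HTML parser would be the sturdier choice.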

January 31, 2005

The scum speaks

Pete Ashton pointed to an interview with a varmint comment spammer oh what the hell, "varmint" covers it quite nicely. Like Pete himself says in his linkblog, a "nailgun in the bollocks is too good for them". But the disgusting parasite is kind enough to fill in some of the gaps in my knowledge of the history of blog spam.

Continue reading "The scum speaks" »

February 3, 2005

Spam prevention link clearinghouse

Via Pete Ashton, I found an article on server-level solutions to the comment/trackback spam problem, which everyone who has a weblog that is open to comments or trackback should read themselves or forward to their web server admin.

Continue reading "Spam prevention link clearinghouse" »

August 7, 2005

Ooh, this is a good one

You all know the fake paypal spams that arrive in your email boxes. They usually have alarming messages about your account being screened or suspended because of an "incident". Just now I got a version that's a little more devious:

Continue reading "Ooh, this is a good one" »

December 24, 2005

Looks like the gmail honeymoon is over

I'm now getting oodles of spam to my gmail address. Most of it's in Chinese, and the amount that passes gmail's spam filters is greater right now than the amount that gets caught in them.

I hope this is temporary and that they upgrade their spam filtering in the next few days - or maybe someone should disconnect China from the Internet for a couple more years until it has a government that enables dissent and stifles organised crime instead of the other way around. If not, I may be forced to switch email addresses again. I need to have an email address that I can publish on the internet without subterfuge, and without being swamped with crap. Maybe despammed.com is working properly again?

July 19, 2006

Bad Behavior

Via Branko, I hear of Bad Behavior, a

fingerprinting method for HTTP requests, [which] has proven, as one user called it, "shockingly effective" at identifying and blocking malicious activity, including blog/wiki spam, e-mail address harvesting, automated cracking attempts, and more. It does all of this looking only at the HTTP request headers; for POST data, the content of the spam is not analyzed at all.

If you have a WordPress blog, you probably need this, but it is designed to be easy to integrate into other PHP-based content management systems. If I read the documentation correctly, I could install it now and have it do basic spam-blocking work in Willow, but I prefer to wait until Mithandir has given me his opinion and maybe done whatever tweaks are necessary to make all the functionality cooperate with Willow.
(Mithandir's own motivation for doing this is probably a bit low right now, though - he's vacationing in Norway, and he reports that the amount of spam on his own website has dropped spontaneously over the past few days. So I may change my mind and muck about with the plugin myself. I should be able to stick in an "Include Once" call....)
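
To give an idea of what looking only at the HTTP request headers can mean, here is a minimal sketch in Python. The three rules are my own illustrative guesses and are not Bad Behavior's actual rule set, which is a PHP package with far more checks; `suspicious_reasons` and the sample headers are hypothetical.

```python
def suspicious_reasons(method, headers):
    """Return reasons a request looks automated, judging by headers alone."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    reasons = []
    if not h.get("user-agent", "").strip():
        reasons.append("missing User-Agent")
    if "accept" not in h:
        reasons.append("missing Accept header")
    if method.upper() == "POST" and "referer" not in h:
        # Assumption: a form submitted from a browser usually carries one.
        reasons.append("POST without Referer")
    return reasons


browser_like = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Referer": "http://blog.example/post",
}
print(suspicious_reasons("POST", browser_like))
print(suspicious_reasons("POST", {}))
```

Note that the body of the POST is never inspected. Rules like these can also turn away real people whose browser or proxy strips headers, so they would need testing before being trusted.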

September 2, 2006

Joey Manley could use some help keeping the TAC forum spam-free

The Talk About Comics forums are once again being overwhelmed on a regular basis, by spambots hiding behind Telefonica's lack of real anti-spam policies. Telefonica de España does have an Acceptable Use Policy but to my knowledge, its enforcement is still a joke.
What Joey wants to know:


There's a flood of fake phpBB user sessions, coming from numerous different IP addresses, crashing the whole server every few hours.

Probably spambots.

Fellow admins: any thoughts on solving this?

Note that I tried my best to install bad behavior, but its header-pushing ways conflicted with sessions.php and page_header.php no matter what I tried.

A large number of the spambots seem to have IP addresses that resolved to:

red.telefonica-wholesale.net

I know that Reinder has banned an entire ISP or two before, but I don't know how to do this. Any help?


So if anyone can help him make Bad Behavior work on PHPBB and/or keep the varmints out through PHPBB's regular banning system, please drop him a line.
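
One way to think about banning an entire ISP is to match on the hostname an address resolves to instead of on the address itself. Here is a minimal sketch of that idea in Python. It does not model PHPBB's own banning system, and `BANNED_SUFFIXES`, `reverse_lookup` and `is_banned_isp` are names I made up; only the hostname suffix comes from Joey's message.

```python
import socket

# Suffix taken from the hostname quoted above; extend as needed.
BANNED_SUFFIXES = ("telefonica-wholesale.net",)


def reverse_lookup(ip):
    """Return the hostname an IP address resolves to, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def is_banned_isp(ip, suffixes=BANNED_SUFFIXES, lookup=reverse_lookup):
    """True if the address resolves to a host under a banned domain."""
    host = lookup(ip)
    if host is None:
        return False
    host = host.lower().rstrip(".")
    # Match the domain itself or any host beneath it, but not lookalikes.
    return any(host == s or host.endswith("." + s) for s in suffixes)


# Offline demonstration with a stand-in resolver instead of real DNS.
fake = lambda ip: "203-0-113-7.red.telefonica-wholesale.net"
print(is_banned_isp("203.0.113.7", lookup=fake))
```

Two caveats: a reverse lookup on every request is slow, and whoever controls the reverse DNS zone decides what the hostname says, so this is a coarse filter and not proof of origin.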

And I could use some fact-checking: Am I right in supposing that Telefonica de Espana are still as bad as ever when it comes to dealing with spam, or have they cleaned up their act in the past few years? I'll be doing my own research, but if you have ready knowledge, please contact me.

March 9, 2007

Why Web BBSes suck

First, a question I've been meaning to ask: does anyone reading this know of a web bbs that
1) runs on PHPBB; and
2) has some version of Bad Behavior, such as this mod, as its only defense against spam? In other words, no CAPTCHAs, no other mods or plugins aimed at preventing the board from being overrun with spam?

If so, I very much want to hear about it. Bad Behavior has done so well at stopping the endless flood of spam on Talk About Comics that I've been wondering if the time has come to stop making new members jump through hoops to get activated, or even to open the forum to guest posters again. You know, make it a more inviting place. I'm not the guy who gets to decide this, by the way, but if there's evidence that Bad Behavior can do the job on its own, I can put in a word. Let me know in email or comments under this post.

I was prompted to bring this up by reading Matt Skala's recent post Why Web BBSes Suck. It's a great post that really opened my eyes to the extent to which I was taking bad functionality for granted for no other reason than that they've always been designed that way. I could quibble about some things, but I think the general thrust of his argument, that Web BBSes have terrible usability and don't serve the needs of their users well, is correct.

There is good news on some issues. Project Wonderful Talk, whose CAPTCHA I've finally been able to defeat, allows the use of Livejournal accounts for identification, which I hope many more boards will adopt (as well as other, similar, multi-site identification methods); PHPBB isn't as ubiquitous as it was a year ago even if it's still very dominant, and BBcode is more standardised than Matt claims it is. I also think the dominance of PHPBB could end very quickly if something truly better came along. Five years ago, when Ultimate Bulletin Board was as ubiquitous as PHPBB is now, it was quickly superseded by PHPBB because PHPBB was less crash-prone and easier to set up. The spambots have since made PHPBB at least as big a nightmare to work with as UBB was then.

So what I'd like to see is a project in which skilled designers and coders who have read Matt's rant build a new Web BBS from the ground up so it has the features the users actually need instead of the ones that Ultimate Bulletin Board happened to have in 1998 and which all other Web BBS systems have copied. And integrated spambot protection that actually works. Those two ingredients together would, I think, make most forum owners drop PHPBB like a hot potato.

April 12, 2007

Well, so much for Movable Type's nifty new spam prevention

The keyword blacklisting in Movable Type has one little drawback: it doesn't work. I've added several variants of "Good Site! Thanks!" including "ood site! Thank" to the blacklist but spams containing those phrases continue to get posted. Update: I've boned up on regular expression syntax and the rules for whole-word blacklisting, and it works well now.
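
Here is a minimal sketch in Python of how whole-word rules can make a fragment like "ood site! Thank" miss. It assumes whole-word matching behaves like regular-expression word boundaries, which is my guess and not a description of Movable Type's actual matching; the pattern names and `is_blacklisted` are made up.

```python
import re

comment = "Good Site! Thanks!"

# The fragment treated as a whole-word phrase: "ood" starts mid-word and
# "Thank" ends mid-word, so neither word boundary is satisfied.
whole_word = re.compile(r"\bood site! Thank\b", re.IGNORECASE)
# The same fragment as a plain substring pattern.
substring = re.compile(r"ood site! Thank", re.IGNORECASE)
# A more tolerant pattern: any spacing, any punctuation, optional "s".
tolerant = re.compile(r"\bgood\s+site\W*thanks?\b", re.IGNORECASE)


def is_blacklisted(text, patterns):
    """True if any blacklist pattern matches somewhere in the text."""
    return any(p.search(text) for p in patterns)


print(is_blacklisted(comment, [whole_word]))
print(is_blacklisted(comment, [substring]))
print(is_blacklisted("GOOD   site thank", [tolerant]))
```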

Worse than that, because of Movable Type's insane resource consumption, forced mass rebuilds after a spam cleanup sometimes hit Xepher's resource limits, causing them to time out and the rebuild to fail, meaning that the spams don't get deleted from the posted entries (though they do get deleted from the database). This is Not Acceptable.

Worse, the filter's performance seems to be worsening. Spams that automagically get junked still outnumber spams that don't, but not nearly by as much as they did a month ago. I've got bad experiences with learning filters (Opera's, for instance, tends to learn it wrong even though I'm pretty damned diligent about catching any spam the filters don't, and marking it as such before deleting it); I don't know which part of the setup is failing to learn about spam, but one of them is. Maybe it's not updating its blackhole list.

This weekend, I'm going to beef up the anti-spam defenses, installing Akismet and everything else that I can find that might block it. Until then, don't be surprised if you suddenly find comments closed across the blog. I'm leaving them open on this one in case someone wants to suggest a neat anti-spam trick or plugin, though.

BTW Trackbacks have already been shut off again, probably for good this time. I've switched off sending trackbacks as well, except possibly to the aggregators that Movable Type auto-pings.

June 12, 2007

Comments on Waffle now moderated thanks to spammers and Movable Type's general uselessness at spam prevention

After an overnight spam attack in which hundreds of spams got posted to the blog, including many that would have been blocked if the regular expression filter actually worked, I have set the comment options in Movable Type to moderated. I was going to switch off commenting entirely, until I realised that moderation would work for the small number of real comments I get here.

There may be delays in getting your comment posted. I haven't switched on email notification for new comments because I'm not that masochistic. The ratio of real comments to spam is very low, and the last thing I want is to have hundreds of spam comments in my incoming email as well as my Movable Type backend.

This message only affects comments on the weblog, not on the webcomic, which has a superior commenting and comment filtering system written by one guy in his spare time.

I could go on forever on how bad Movable Type's spam prevention is. Where to begin? How about with cleanup? I could cook dinner in the time it takes to rebuild a hundred entries - and then let it get cold checking whether the entries have actually rebuilt. At least one batch of twenty rebuilds timed out during today's cleanup, which means that the spam posts on those may or may not have gone from the archived entries.

Why twenty? Why not do a hundred at a time? Partly because of the timeout problem, but today I'd actually have been willing to do the cleanup in batches of seventy-five or a hundred, just to get it over with. All the spams that got posted were gibberish (which I can't filter because there's no regular pattern in it) with links in BBCode (which I can filter using a regex, but as I said, the regex filter doesn't work).

But another problem with MT's commenting system is some very poorly-written AJAX(-ish) programming in the backend, which causes common interface elements to behave differently from how they should. You can see that in the category selector - unlike with regular dropdowns, you can't actually scroll to the category you need unless you keep the mouse button pushed down all the time. If you don't keep the mouse button pushed down, the dropdown will reset itself to its initial position. The same happens within the AJAX(-ish) widget that governs the display options in the commenting backend, so when, brainwashed as I am by more than a decade of using standard dropdown boxes, I thought I'd selected to display 75 rows of comments in my backend, I'd actually chosen to display twenty. So I ended up cleaning them out twenty at a time.

Another example of terrible backend scripting is the checkboxes in each individual entry's backend that you can use to close the entry for comments or trackbacks. You have to click them very decisively and firmly, looking straight at them and mumbling incantations along the lines of "obey, motherfucker". And. Don't. Blink. Otherwise, they will revert to the state they were in before you clicked them. I've observed this in both Opera and Safari, by the way. It's unbelievable that something like this was allowed to pass quality control. If you don't give the act of clicking a check box your full and undivided attention, you'll move your mouse to "Save Changes" and click that thinking you've closed the entry whereas in fact you've left it wide open. It's Movable Type's Christmas gift to spammers.
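For the record, the kind of regex filter I mean is not rocket science. Here is a minimal sketch in Python, not the SpamLookup plugin's actual syntax or code: the pattern and the function name `looks_like_bbcode_spam` are my own assumptions, and it only checks for the opening `[url]` or `[url=...]` tag that a blog which doesn't use BBCode should never see in a real comment.

```python
import re

# Hypothetical pattern (an assumption, not SpamLookup's own rule format):
# matches an opening BBCode link tag, either [url] or [url=something].
BBCODE_LINK = re.compile(r"\[url(=[^\]]*)?\]", re.IGNORECASE)


def looks_like_bbcode_spam(comment_text: str) -> bool:
    """Return True if the comment contains at least one BBCode link tag."""
    return BBCODE_LINK.search(comment_text) is not None
```

Gibberish with a `[url=...]` link in it would be caught by this; an ordinary comment without BBCode would pass.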

What else? Oh yeah. The SpamLookup plugin's word and regular expression filter works only about half of the time. I don't know what causes it to fail, but fail it does. Also lose and suck.

But all this bitching about the superficial design and implementation flaws only serves to conceal Movable Type's fundamental design and implementation flaws. These aren't unique to Movable Type - I could easily write a similarly long and ranty screed about how bad, say, PHPBB is in this regard.

Movable Type and many other content management/commenting/forum posting/yadda yadda yadda systems have this fundamental design problem: There is no single interface for dealing with spam, and far too many of the tools are included as plugins. Bundled plugins, as far as SpamLookup is concerned, but still plugins.

Systems that publish user-contributed material to the web should be written from the ground up to detect and prevent spam. The SpamLookup code, as well as additional code like Akismet and Bad Behavior that users now have to hunt down and install, should be there as part of the core functionality with every installed version of the system, so that the user running the install doesn't have to think about it and spam can be dealt with as quickly and quietly as possible. Spam prevention is as important as the content creation itself, for the simple reason that spam will eventually be posted in such numbers that it will bury and defeat the content creation (see A quick reminder of why there are no comments on this blog from 2005) and, in forums, bury and defeat all other aspects of the forum (see any PHPBB forum that hasn't got a team of rabid, fascist moderators purging the member lists, blocking posts by non-members, blocking fake account creation, blocking whole IP ranges from posting messages or creating accounts, blocking, blocking, blocking).

Over time, the utility of a content creation system that lets spam in drops to zero. For that reason, it's worth it to compromise other aspects of the system, such as ease of use, to prevent spam from getting a foothold. In Movable Type (and PHPBB, and, and, and), we get poor usability anyway, especially in dealing with spam. To close old posts, we need to go to one place, or rather, several places: the posts themselves (there are, of course, plugins for that, but see the previous paragraph). To clean up spam, we need to go to another - the comments backend. To filter our messages, we need to go to yet another - the SpamLookup plugin, and if we have three different kinds of changes to make, we need to open three different boxes to make them. Then there's the general settings in which we decide how to handle comments globally, and we need to go somewhere else again.

Simplifying this isn't a trivial task; in fact, now that I think of it, it's rather daunting. However, adding "Delete and close" and/or "Delete and Blacklist" buttons or checkboxes in the comments backend would shave off quite a bit of time from the daily despamming chores. And those would be easier to add if blacklists weren't governed by plugins to start with.
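To make the idea concrete, here is a rough sketch of what those two combined actions could look like. It is Python rather than anything resembling Movable Type's real code, and every name in it (`Entry`, `Comment`, `delete_and_close`, `delete_and_blacklist`) is hypothetical, invented purely for illustration. The point is only that each button does two chores in one step.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author_ip: str
    body: str


@dataclass
class Entry:
    title: str
    comments_open: bool = True
    comments: list = field(default_factory=list)


def delete_and_close(entry: Entry, comment: Comment) -> None:
    """Remove the spam comment and close the entry to further comments."""
    entry.comments.remove(comment)
    entry.comments_open = False


def delete_and_blacklist(entry: Entry, comment: Comment, blacklist: set) -> None:
    """Remove the spam comment and add the sender's IP to the blacklist."""
    entry.comments.remove(comment)
    blacklist.add(comment.author_ip)
```

With a single shared `blacklist` living in the core system rather than in a plugin, the second function has somewhere obvious to write to.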

See also: Six Apart Picked Apart.

July 16, 2007

Comments on the comic under threat - new measures in place

If you've tried and failed to get a comment published in the Rogues of Clwyd-Rhan comic archives, let me know. The comic has been under a sustained spam attack for over a day now, with almost half of all IPs and pageviews being the node that gets served when a comment is blocked. To deal with this more efficiently, Mithandir has installed an upgrade to the comment system allowing me to quarantine comment spam. Most of the new batch of spam doesn't get blocked until it reaches the content-based filters, which are processor-intensive. With the quarantine, I should be able to see what IP addresses are used for sending the spam, and block those, which is more efficient.
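For the curious, the reasoning goes roughly like the sketch below. This is not Mithandir's code, which I am not reproducing here; it is a minimal Python illustration under my own assumptions, with made-up names (`classify_comment`, `worst_offenders`) and a made-up threshold. The cheap IP lookup runs first, the expensive content filters only run for IPs that aren't blocked yet, and the quarantine is then mined for IPs worth blocking.

```python
from collections import Counter


def classify_comment(ip, body, blocked_ips, content_filters):
    """Return 'blocked', 'quarantined', or 'published' for one comment."""
    # Cheap check first: a set lookup costs almost nothing.
    if ip in blocked_ips:
        return "blocked"
    # The processor-intensive content filters only run for unknown IPs.
    if any(is_spam(body) for is_spam in content_filters):
        return "quarantined"
    return "published"


def worst_offenders(quarantine, threshold=3):
    """List IPs that appear at least `threshold` times in the quarantine.

    `quarantine` is assumed to be a list of (ip, body) pairs.
    """
    counts = Counter(ip for ip, _body in quarantine)
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

Every IP that moves from the quarantine into `blocked_ips` is one more spammer who never reaches the content filters again.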

However, there's always the possibility that something has been broken in the upgrade and legitimate comments get blocked. I already know the upgrade didn't go smoothly, so I'm keeping an eye out for both legitimate comments getting blocked and oodles of spam passing through.

The above does not affect the weblog, where comments will remain closed for the time being.

August 29, 2007

In which CAPTCHAS are not so much a cure that's worse than the disease as a disease in their own right.

While I was on a Moorcock essay tip, I went to Michael Moorcock's website to see if his short essay Epic Pooh actually did have some sort of a sequel as promised*). Multiverse.org is largely built on forum software, which is a less than ideal way to manage a website to start with, but still I was more than a bit surprised to find that I had to fill in a CAPTCHA before said software would show me search results.

You read that right. I had to prove that I wasn't a bot before I could search. What the fuck? I know from bitter experience that spambots can be a cancer on even a well-protected website and that spam can take down a server. And yes, spammers will post into any text box in any web form. But as long as you don't post search terms to anyone other than the searcher's results page, and there's no reason why you should, I don't see how bots carrying out searches are the sort of problem that can be solved by harassing legitimate users with CAPTCHAS. Not that there is any problem for which CAPTCHAS are the solution, but this particular use of them takes the bakery.


*) Answer: Yes. "Continued" didn't look clickable but it was, and clicking it caused the next page to show. The printer-friendly version is probably more convenient to read.

About Tech-geekery: spam

This page contains an archive of all entries posted to Waffle in the Tech-geekery: spam category. They are listed from oldest to newest.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 3.34