What I'm missing the most is any concrete information on false positives. They state they want near-zero false positives, which is great, but what's the actual rate? And how is it measured?
It's easy to reduce spam by X% if you don't care about increasing your false-positive rate.
I'm also curious what ways users have to flag false positives on Twitter. It's easy to 'block or report', but is there a spam box to inspect and mark things that were wrongly classified? If there isn't an easy way, then it's going to be much harder to even measure the false-positive rate, let alone reduce it.
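For what it's worth, measuring a false-positive rate requires exactly the kind of labeled sample the comment is asking about: a set of classified messages whose true status is known. A minimal sketch of the arithmetic (all names and numbers are hypothetical):

```python
# Sketch: computing a false-positive rate from a labeled sample of
# messages the filter has already classified. Hypothetical data.

def false_positive_rate(samples):
    """samples: list of (flagged_as_spam, actually_spam) boolean pairs."""
    false_positives = sum(1 for flagged, spam in samples if flagged and not spam)
    legit_total = sum(1 for _, spam in samples if not spam)
    return false_positives / legit_total if legit_total else 0.0

# 1000 legitimate messages, 2 wrongly flagged -> 0.2% false-positive rate
labeled = [(True, True)] * 50 + [(True, False)] * 2 + [(False, False)] * 998
print(false_positive_rate(labeled))  # 0.002
```

Without a user-facing way to contest a verdict, the `(True, False)` cases are precisely the ones the operator never finds out about.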
I can attest myself to the fact that Twitter has plenty of false positives. I use a custom URL shortener, and on several occasions I have been flagged for spam.
What I'd really like to see is the option to submit a report for the false positive. I'm sure the data mining is picking up not only on my initial submission, but also on reattempts with altered content to try to bypass their greedy algos.
Needless to say it's a bit of a bummer when you curate a tweet only to be denied sending. I think they have it out for .CO url shorteners :)
I haven't had one trigger in a week or two, so I can't recall the exact response. I've had it happen on separate accounts using URL shorteners, as well as when attempting to send DMs containing shortened URLs.
They're fighting spam in realtime, but they can analyze their accuracy later.
One example: if you have a rule "sending the same URL 100 times in a row is spamming" then you'll let 99 spams through before you identify that they were spams.
I was hoping to hear more about what is considered spam. High scores posted from a game through the API at a high rate? What about mass-favoriting tweets to get more followers? Or are we just talking about 'sex pills, free Rolexes, getting girls' links to malware sites?
It would have been interesting if they had shared some more details on the rule engine/framework. How does it compare with Drools (apart from probably being faster)?
I wouldn't say infinitely easier; not necessarily even slightly easier. They have lots of challenges that they did mention in the post, primarily latency. And as the other poster mentioned, spammers can easily check whether they got blocked, which they can't always do so easily with email.
Publishing a 'how we did it' article like this suggests to me that the Twitter engineering team is either justifiably confident in their spam-fighting creation (because they believe it is robust enough to survive scrutiny), or naively, supremely over-confident; in the latter case, publicizing how this works will come back to bite them.
Only time will tell I guess. From the article it would appear that they aren't doing anything magically different with regards to classification of twitter-spam, but they have found a way to deal with the volume of classification tasks in a pertinent manner. It also gives them a way to quickly respond to new types of spam attacks.
Very interesting. They should consider opening it up as an Akismet competitor. The difference between a blog comment and a Twitter post is negligible.
> publicizing how this works will come back to bite them in the butt
I didn't see any rules or heuristics published. Merely that they employ a multi-stage filter (as any engineer could imagine), and that they codify their ruleset with a human-readable DSL (which is kind of interesting, but also kind of weird).
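The article doesn't publish the DSL itself, so anything concrete is guesswork; but the general shape of "human-readable rules compiled into conditions over message features" is well understood. An entirely hypothetical sketch (this is NOT Twitter's syntax, and all feature names are invented):

```python
# Hypothetical sketch of what human-readable rules might compile down to.
# Rule names, feature names, and actions are all invented for illustration.

RULES = [
    # (rule name, condition over a message-feature dict, action)
    ("repeated-url",        lambda m: m["url_repeat_count"] > 50,                  "deny"),
    ("new-account-dm-link", lambda m: m["account_age_days"] < 1 and m["has_link"], "challenge"),
]

def evaluate(message):
    """Return the first matching (rule name, action), or (None, 'allow')."""
    for name, condition, action in RULES:
        if condition(message):
            return name, action
    return None, "allow"

msg = {"url_repeat_count": 80, "account_age_days": 300, "has_link": True}
print(evaluate(msg))  # ('repeated-url', 'deny')
```

The appeal of a DSL over raw code like this is presumably that non-engineers can read and ship new rules quickly when a fresh spam campaign appears.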
> The difference between a blog comment and a twitter post is negligible.
The difference in textual content between a tweet and a blog comment of similar length is perhaps negligible, though blogs often impose no such length constraint. Comment-spam strategies are free to vary string length to optimize for evasion, click-through, proliferation, and other criteria. I would also argue that Twitter and blogs have different demographic distributions with regard to readership and participation.
That said, Twitter just outlined a number of reasons why their case is special. They are a high-availability, high-volume, low-latency service, and their spam solution was designed to handle that very particular set of constraints. I think a multi-stage filter complete with asynchronous post-processing jobs would be a bit much for your average Wordpress blog. People just starting out with (say) PHP probably can't fathom a multiprocess deployment architecture. Not to say they couldn't, but the journey is a long one.
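The multi-stage shape being described (a cheap synchronous check inside the latency budget, with expensive analysis deferred to background workers) can be sketched in a few lines. This is a rough illustration of the pattern, not Twitter's implementation:

```python
import queue
import threading

# Sketch of a two-stage filter: stage 1 is a fast rule on the write path,
# stage 2 is asynchronous post-processing. Pattern illustration only.

slow_queue = queue.Queue()

def cheap_sync_check(message):
    # Stage 1: trivial rules cheap enough to run before accepting the post.
    return "free rolex" in message.lower()

def slow_worker():
    # Stage 2: deferred analysis (model scoring, clustering, etc. would
    # happen here); verdicts can be applied retroactively.
    while True:
        message = slow_queue.get()
        if message is None:
            break
        slow_queue.task_done()

def handle_post(message):
    if cheap_sync_check(message):
        return "denied"
    slow_queue.put(message)  # accepted now, analyzed later
    return "accepted"

threading.Thread(target=slow_worker, daemon=True).start()
print(handle_post("Get a free rolex now!"))  # denied
print(handle_post("Lunch at noon?"))         # accepted
```

The interesting engineering is in stage 2: acting on a message after it was already accepted is what lets the synchronous path stay fast.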
> I think a multi-stage filter complete with asynchronous post-processing jobs would be a bit much for your average Wordpress blog. People just starting out with (say) PHP probably can't fathom a multiprocess deployment architecture. Not to say they couldn't, but the journey is a long one.
That's why he suggested an Akismet competitor - in case you haven't used it, Akismet is a SaaS solution that filters your comments on their servers, not your own, so you don't have to worry about the architecture or deployment.
My point appears to have been missed and I was downvoted to hell for it, but I was merely trying to point out that when you are fighting a war (as they are against spammers), you shouldn't give away any secrets, no matter how insignificant they may appear now.