In case you live under a rock (or, more wisely, have disconnected yourself from social media): Elon Musk gave in today and agreed to buy Twitter ($TWTR) at the price set in his original agreement, $54.20/share. After months of insulting Twitter's executive team on Twitter, accusing the company of fraud, and even fighting it in court, he has thrown in the towel.
So now that Elon's really buying Twitter, warts and all, we were wondering – what's he actually going to do about the bot problem?
The Bot Problem
Zooming out a bit, Twitter has more than "a" bot problem. It has many bot problems.
Some of these problems it presumably wants to solve: for example, highly visible ones like scammers and impersonators that degrade the experience of power users. Others, like sockpuppet accounts and troll farms, are bad in the broadest sense, but a hard-to-detect fake user that boosts Twitter's MAU (monthly active users) numbers may attract a little less attention from the executive team.
Let's zoom back in and pick just one: impersonator accounts.
What does an impersonator account look like?
Impersonator accounts are a specific "bot" problem that we'll focus on for this project. Impersonators masquerade as popular Twitter users, following their followers and trying to scam them.
For example, #FinTwit personalities are frequent targets of impersonators who try to lure their followers into crypto schemes. I've heard of other niches where this happens too. Astrology comes to mind: the impersonator pretends to be the astrologer and scams followers into paying $$ for a fake reading session.
Here's one example – an impersonator of Michael Green, Chief Strategist at Simplify Asset Management. The real account:
The fake account:
The fake looks pretty obvious when you see the two side by side, but the scam only has to be "good enough" once the impersonator messages a user and starts a conversation. As with most spam, only a tiny fraction of targets need to fall for it for the fraud to pay off.
The thesis: this can be stopped
Twitter has thousands of engineers, including a lot of data staff. So presumably if they wanted to stop this problem, they already would have. There's probably some nuance we're missing.
But that's no fun.
Let's go ahead and try to solve this problem ourselves. Can we build a model that reliably detects impersonator accounts, catching nearly all of them without many false positives?
We're doing this on the fly, so we may hit a dead end or take some twists and turns. Still, let's lay out a high-level game plan. Here's what this series will look like if all goes according to plan (which data projects never do):
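In modeling terms, "catching nearly all of them" is recall and "without many false positives" is precision. Below is a minimal sketch of both metrics, assuming each account is labeled 1 for impersonator and 0 for legitimate; the function name and the toy labels are hypothetical, made up for illustration, and not drawn from any real dataset.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (precision, recall), treating label 1 as 'impersonator'."""
    # True positives: impersonators we correctly flagged
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False positives: real accounts we wrongly flagged
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # False negatives: impersonators we missed
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels: 4 impersonators, 6 real accounts
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Here the model catches 3 of 4 impersonators (recall 0.75) and 3 of its 4 flags are correct (precision 0.75). Our goal is to push both toward 1.0 at the same time.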
- Collecting a dataset
- Engineering features for model
- Building a baseline model
- Improving on our model
- Presenting results
Interested in following along? Subscribe to our blog for updates on this project as they come!
Coming Up Next: Collecting a dataset