Feature Spotlight: The Hidden Clues in an Email Address
By Jonathan Hsieh / 6 Nov 2015
We love giving our users an opportunity to discover more about what’s going on behind the scenes. Sometimes we’ll hear one of our customers say, “Can machine learning really do that?” or “Wow, I didn’t realize that’s how Sift Science worked!”
So we’re introducing a new blog series called Feature Spotlight to give you a peek under the hood of Sift Science and dig into the various features that make up our machine learning platform. In each post, we’ll examine one of the features that Sift Science analyzes to detect fraud – ranging from “Does the user share a device fingerprint with a known fraudster?” to “What is the user’s transaction velocity over the past day?” Check it out and send us your questions!
For our inaugural post, we’ll be focusing on how Sift Science takes a simple email address and extracts dozens of fraud signals from that single piece of data.
Why is email such an effective signal for detecting fraud? Not only is it a ubiquitous piece of user data, collected nearly every time someone creates an account or makes a transaction online, it’s also generally unique to a user (or should be). By analyzing the components of an email address, we can actually reveal different “facts” about the person behind the address.
Our initial machine learning analysis looks at the full email address to determine whether we’ve seen it before (and if it’s been associated with past fraudulent or legitimate activity). We calculate the approximate age of the email address based on when we first saw it in our global network. We also normalize email addresses to make our analyses more effective – letter case, supplementary symbols, etc. don’t impact our ability to recognize emails (though we use them as separate signals for fraud).
Here’s what our technology can automatically detect:
- If we’ve seen email@example.com before (on your business’ site or elsewhere in our customer network)
- The approximate age of firstname.lastname@example.org (~10 months old)
- jON.aTh.email@example.com and firstname.lastname@example.org are potentially related
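
To make the normalization and age ideas above concrete, here is a minimal sketch in Python. It is not Sift Science's actual implementation: the function names, the `NormalizedEmail` type, and the specific rules (lowercasing, dropping dots, dropping a `+tag` suffix) are assumptions chosen for illustration, and real mail providers differ in which of these variations they treat as equivalent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NormalizedEmail:
    canonical: str        # form used for "have we seen this before?" lookups
    had_uppercase: bool   # raw-form details kept as separate signals
    dot_count: int
    had_plus_tag: bool


def normalize_email(raw: str) -> NormalizedEmail:
    """Collapse cosmetic variation in an address (hypothetical rules)."""
    local, _, domain = raw.strip().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {raw!r}")
    # Assumption: anything after "+" is a tag, and dots are cosmetic.
    base, plus, _tag = local.partition("+")
    canonical = base.replace(".", "").lower() + "@" + domain.lower()
    return NormalizedEmail(
        canonical=canonical,
        had_uppercase=local != local.lower(),
        dot_count=base.count("."),
        had_plus_tag=bool(plus),
    )


def approximate_age_days(
    email: str, first_seen: dict[str, datetime], now: datetime
) -> int:
    """Days since the canonical address was first recorded in `first_seen`."""
    key = normalize_email(email).canonical
    # Record the address on first sight; later calls measure from that moment.
    seen_at = first_seen.setdefault(key, now)
    return (now - seen_at).days
```

Under these assumed rules, `jON.aTh.an@example.com` and `jonathan@example.com` map to the same canonical key, while the uppercase and dot counts from the raw form remain available as signals in their own right.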
We also break apart the email address into individual elements for analysis. Here are a few examples:
- Email Username
  - Repeat fraudsters will often create an army of email addresses by only tweaking a few characters in a username. Sift Science is able to identify this behavior and determine that email@example.com is probably related to firstname.lastname@example.org (and that the person behind them is likely to be fraudulent).
  - Does the username contain a known name? We’ll check to see if the email and the billing information share similar names.
- Email Domain
  - Is the email domain a known disposable one? Is it a free email address? Signals like these increase the likelihood that someone is a fraudster. Not to worry though – we’ll take care of all these checks.
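
The username and domain checks described above might be sketched roughly as follows. This is an illustration under stated assumptions, not how Sift Science computes its signals: the helper names are hypothetical, string similarity from Python's standard `difflib` stands in for whatever relatedness measure is actually used, and the two domain sets are tiny placeholders for the much larger, maintained lists a real system would need.

```python
from difflib import SequenceMatcher

# Placeholder lists for illustration only (assumption, not a real dataset).
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com"}
FREE_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}


def split_email(email: str) -> tuple[str, str]:
    """Return (username, domain), both lowercased."""
    username, _, domain = email.strip().lower().rpartition("@")
    return username, domain


def username_similarity(email_a: str, email_b: str) -> float:
    """Similarity of the two usernames, from 0.0 (unrelated) to 1.0 (identical)."""
    user_a, _ = split_email(email_a)
    user_b, _ = split_email(email_b)
    return SequenceMatcher(None, user_a, user_b).ratio()


def username_contains_name(email: str, billing_name: str) -> bool:
    """True if any part of the billing name (3+ letters) appears in the username."""
    username, _ = split_email(email)
    parts = [p for p in billing_name.lower().split() if len(p) >= 3]
    return any(p in username for p in parts)


def domain_signals(email: str) -> dict[str, bool]:
    """Flag disposable and free-mail domains using the placeholder lists."""
    _, domain = split_email(email)
    return {
        "disposable": domain in DISPOSABLE_DOMAINS,
        "free": domain in FREE_DOMAINS,
    }
```

In a sketch like this, two usernames that differ by a single trailing digit score close to 1.0, which is the "tweak a few characters" pattern from the list above. None of these outputs is a verdict on its own; each would be one input among many to a model.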
Signals derived from the user’s email address are just a few of the thousands of signals that Sift Science automatically processes through our machine learning platform. We’re able to learn what fraud looks like for each unique business, enabling them to stop fraud before it can hit.
Want to learn more? Stay tuned for our next installment of Feature Spotlight!