Content Moderation Best Practices: Q&A with Kevin Lee and Jeff Sakasegawa
17 Apr 2018
User-generated content is the lifeblood of online marketplaces and communities. But it can also be a powerful tool in the hands of fraudsters. Content moderators must act quickly and decisively to stop the spread of abusive content. If they don’t, they risk serious damage to a company’s brand and bottom line.
Trust and safety architects Kevin Lee and Jeff Sakasegawa draw on their diverse experience to share best practices for content moderation.
What is content moderation?
Kevin Lee (KL): Content moderation involves reviewing any user-generated content (UGC) on your platform. If you’re Yelp, that means moderating user-generated ratings and reviews; if you’re Facebook, that means reviewing any piece of content a user might post on the site.
What kinds of websites, marketplaces, or communities might benefit from a team of content moderators?
Jeff Sakasegawa (JS): All of them! Any site that allows UGC must be sensitive to how customers are experiencing the business. Online marketplaces and communities use content moderation to foster trust, building a safe space for exchanges between users.
How much of content moderation is proactive, and how much of it involves reacting to problems as they arise?
KL: Most companies don’t have the infrastructure and tools to proactively seek out abusive content. That’s because many companies don’t invest in robust systems for content moderation when they create platforms for UGC. The problem, of course, is that by the time they catch someone abusing the platform, they have to scramble to catch up.
Larger companies like Facebook, Yelp, and Google have taken a more proactive stance in two ways: investing in machine learning, and enabling users to flag problematic content. The second approach is still reactive because it relies on users to report abusive content — but it’s more proactive in the sense that the content moderation system can then draw on that reporting to weed out similarly abusive content in the future.
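The feedback loop Kevin describes — user reports seeding a system that flags look-alike content — can be sketched in a few lines. This is a minimal illustration, not any platform's actual system: the tokenizer, the Jaccard similarity metric, and the 0.5 threshold are all assumptions chosen for clarity.

```python
# Minimal sketch of a report-driven moderation loop: user flags are stored,
# and new posts that resemble flagged ones are queued for review.
# Tokenizer, similarity metric, and threshold are illustrative choices.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

class ReportDrivenFilter:
    def __init__(self, threshold: float = 0.5):
        self.flagged: list[set[str]] = []  # token sets of user-reported posts
        self.threshold = threshold

    def report(self, text: str) -> None:
        """Record a user-flagged post (the reactive half)."""
        self.flagged.append(tokenize(text))

    def needs_review(self, text: str) -> bool:
        """Proactively surface new posts similar to anything already flagged."""
        tokens = tokenize(text)
        return any(jaccard(tokens, f) >= self.threshold for f in self.flagged)
```

In practice the similarity check would be a trained classifier rather than token overlap, but the shape is the same: each user report makes the proactive side a little smarter.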
How can fraud and risk teams incorporate content moderation into a coherent strategy for fostering trust and safety in their online marketplace or community?
KL: Companies that allow UGC must bake the ability to moderate content into the product itself, whether by building moderation into the engineering roadmap or by allowing users to flag inappropriate content.
For example, Facebook went a long time without allowing users to flag problematic content. Their users were a massive but untapped source of content moderation. Indeed, content moderation doesn’t have to come from an in-house team; it can be an external community, as well. If anyone in the community is allowed to post something, then anyone should be able to report it.
JS: Teams encounter pitfalls because they aren’t thinking about moderation from the get-go. Thinking about your content moderation from the inception of your business — and thinking about how you can automate using systems like machine learning — can be vitally important. If you incorporate content moderation into your strategy up front, then you can scale your operation quite well.
Content moderators must walk a tightrope: they have to be thorough, but also impartial. How do content moderators approach the diverse and sometimes controversial content they encounter online while also doing their job effectively?
JS: If you could talk to content moderators off the record, maybe over their beverage of choice, you’d learn that it gives them a lot of heartburn! The problem stems largely from potential moderator bias. Let’s say someone makes a questionable post on your site. Many websites can now draw on third-party information to learn more about a user; this information might reveal that the user is a hateful person. Once a moderator learns about a user’s background, they might start making inferences about the user’s intent that color their perception of the user’s post.
For a human moderator, it can be very difficult to make judgment calls based on codified policies and procedures. They have to focus on the terms of service and divorce their subjective perception of the user from the rules at hand.
What are some gray area cases that might come up in content moderation?
KL: Let’s say you don’t allow hate speech on your platform. The problem is that hate speech comes in several shades of gray. On Twitter, you’re not allowed to single out a particular race or religion; that’s fairly clear. But the issue quickly gets blurry: while you might be able to say “I hate Americans,” are you allowed to say “I hate white people”? Probably not.
JS: Generally, companies don’t allow users to express hateful views about protected classes. But what constitutes a protected class might vary from company to company. Most businesses deal with gray areas by drawing a line between expressing an opinion and threatening harmful action. For example, saying “I hate Kevin’s haircut (sorry, Kevin!)” is very different from saying “everyone with Kevin’s haircut should be smacked on the shoulder.” While both take a negative view of his haircut, only one encourages violence.
However, many communities online have their own vernacular and code words. Twitter doesn’t allow you to make hateful comments about Jewish people, for example, but communities can start using code words to refer to Jewish people and try to circumvent these policies.
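The evasion Jeff describes — code words and character substitutions used to slip past keyword filters — is one reason naive blocklist matching falls short. Below is a hypothetical sketch of the normalization step a pipeline might apply before matching; the blocked term, the code word "fleeb", and the character-substitution table are all made-up examples.

```python
# Hypothetical sketch: normalize text before checking a blocklist, so that
# simple obfuscation (character swaps) and known community code words are
# caught. The blocklist and code-word map here are invented placeholders;
# real systems pair automated matching with human review.

import re

# Map common character substitutions back to letters (e.g. "0" -> "o").
SUBSTITUTIONS = str.maketrans("013457$@", "oleastsa")

# Known community code words, mapped to the term they stand in for.
CODE_WORDS = {"fleeb": "blockedterm"}

BLOCKLIST = {"blockedterm"}

def normalize(text: str) -> list[str]:
    text = text.lower().translate(SUBSTITUTIONS)
    text = re.sub(r"[^a-z\s]", "", text)  # drop punctuation and stray symbols
    return [CODE_WORDS.get(tok, tok) for tok in text.split()]

def violates_policy(text: str) -> bool:
    return any(tok in BLOCKLIST for tok in normalize(text))
```

This only ever catches substitutions and code words that moderators have already identified, which is why the code-word map has to be continually updated as communities invent new vernacular.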
How do content moderators create clear guidelines for what is and isn’t acceptable on an online community or marketplace?
KL: The easiest and most effective way to create guidelines is to come up with concrete examples of what is and isn’t acceptable, and to build out an encyclopedia of the gray areas. While theoretical guidelines are important, the hands-on application of these guidelines is equally crucial. This is especially important as you scale, both in hiring content moderators and training models.
What are the limits and downsides of manual content moderation?
KL: The three main limitations are scale, flexibility, and response time. Scale: as your platform grows, it becomes increasingly difficult to hire and train people to match the pace of your growth. Flexibility: if you want to expand your business to Bulgaria, for example, you’ll have to quickly find people who can moderate content in Bulgarian. Response time: content can be posted at any time of day, which means your moderators must constantly be moderating in real time, no matter how fatigued they become. Machine learning can address all three of these limitations.
JS: Scale is critical to this conversation. As long as a piece of abusive content is live, it can be screenshotted and shared; it can get caught up in the press cycle. That can do serious damage to your business and bottom line. Even if you do have a robust team of moderators, there’s a limit to how quickly they can respond to this content, and to their human capacity to review it without bias. That’s where machine learning comes into play: it scales with your business and can catch abusive content before anyone ever sees it.