From content moderation to proactive investigations: Empowering platforms for a safer online environment


Content moderation - a great first line of defense, but limited

Trust and Safety (T&S) workflows and processes are almost everywhere we look online. When they are implemented effectively, most of the time we aren’t even aware of them.

T&S analysts are present and working their magic in companies, organizations and platforms globally and across internal verticals. T&S analysts are on the frontlines of platforms – making content moderation decisions, providing data and insights for policy development, creating safeguards and – more and more often – investigating bad actors.

Bad actors are a headache for T&S analysts and platforms. For most threat vectors, such as misinformation or extremist content, a small core of bad actors is responsible for most malign activity on a given platform.

Content moderation can’t solve these problems completely. Effective, automated content moderation is crucial for protecting platforms from content-based threats, as that’s what it can detect and action. But imperfections in algorithms, policy failings and more let some malign and violative activity through, and even automated enforcement action against caught bad actors can be circumvented.

Bad actors who are banned, shadowbanned, suspended or otherwise enforced against can create new accounts from new IP addresses, with fake names, email addresses and more, to circumvent platform T&S mechanisms.

Taking down repeat or serious offender accounts also doesn’t take down any of their supporting infrastructure, such as backup entities or amplifying accounts, which allows bad actors to easily return to platforms. Essentially, content moderation is a great first line of defense, but it’s limited in what it can offer in terms of medium- or long-term impact.

Moderation almost never leads to any off-platform action against bad actors, who at worst have their account, IP address and other identifiers blocked or banned. The sheer volume of suspect accounts that need actioning precludes anything further.

This is problematic, especially considering that a small core of networks is responsible for most on-platform malign activity. Platforms also have access to closed-source information, such as user IP addresses, email addresses, phone numbers and other information used for registration, all of which could be used to carry out investigations and escalate enhanced, actionable findings to law enforcement.
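As a rough illustration of how registration data could support this kind of investigation, the sketch below groups accounts that share at least one identifier, so that a banned account and its backup accounts surface as a single cluster. It is a minimal sketch under stated assumptions, not a description of any platform's actual tooling: the function name `link_accounts`, the account IDs and the `ip:`/`email:`/`phone:` identifier strings are all hypothetical, and a real pipeline would need to handle shared or recycled IP addresses, which can link unrelated users.

```python
from collections import defaultdict


def link_accounts(accounts):
    """Group accounts that share at least one registration identifier.

    `accounts` maps a (hypothetical) account ID to a set of identifier
    strings such as "ip:203.0.113.7". Returns a list of sets, one per group.
    """
    # Union-find: every account starts as its own group.
    parent = {acc: acc for acc in accounts}

    def find(x):
        # Walk up to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first account seen with each identifier and merge
    # any later account that reuses it into the same group.
    first_seen = {}
    for acc, identifiers in accounts.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], acc)
            else:
                first_seen[ident] = acc

    groups = defaultdict(set)
    for acc in accounts:
        groups[find(acc)].add(acc)
    return list(groups.values())


# Illustrative, made-up data (documentation-range IPs, example.com email).
accounts = {
    "acct_1": {"ip:203.0.113.7", "email:a@example.com"},
    "acct_2": {"ip:203.0.113.7", "phone:+15550100"},
    "acct_3": {"phone:+15550100"},
    "acct_4": {"ip:198.51.100.9"},
}
clusters = link_accounts(accounts)
```

Here `acct_1`, `acct_2` and `acct_3` end up in one cluster even though `acct_1` and `acct_3` share nothing directly, which is the point: the link runs through `acct_2`. A cluster like this is a lead for an analyst to review, not proof of common ownership.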

Content moderation also can’t solve the problem of covert violative activity: any activity that isn’t flagged as offensive, violative of a policy or illegal can’t be effectively caught at scale by content moderation systems.


The importance of deep investigations

Carrying out deep investigations, however, can lead to high-impact results over time by uncovering networks of bad actors and exposing their TTPs (Tactics, Techniques & Procedures), which helps to fill policy gaps, update automated enforcement systems and educate users.

A leading example of this is the inauthentic review industry.

A recent deep-dive investigation showed how real people can make thousands of dollars a month posting fake reviews on online career platforms and marketplaces while attempting to outfox T&S teams. Many of these services also offer to take down negative reviews for brand protection on platforms.

These fake reviews can be detected to a limited extent by tracking posting patterns and metadata for flagging by automated systems, but threat actors will always adapt accordingly and operate in ways unforeseen by T&S teams.
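One simple posting-pattern signal of the kind mentioned above is a burst of reviews from a single account in a short time. The sketch below is only an illustration: the function name `flag_bursts` and the thresholds (more than three reviews within one hour) are assumptions chosen for the example, not values taken from the investigation or from any real system.

```python
from datetime import datetime, timedelta


def flag_bursts(timestamps, max_posts=3, window=timedelta(hours=1)):
    """Return True if more than `max_posts` timestamps fall in any `window`.

    Both thresholds are illustrative assumptions and would need tuning
    against real platform data.
    """
    ordered = sorted(timestamps)
    start = 0
    for end, ts in enumerate(ordered):
        # Shrink the window from the left until it spans at most `window`.
        while ts - ordered[start] > window:
            start += 1
        # Count of posts currently inside the window.
        if end - start + 1 > max_posts:
            return True
    return False


# Made-up example: four reviews ten minutes apart vs. one review per day.
base = datetime(2023, 5, 1, 12, 0)
burst_account = [base + timedelta(minutes=10 * i) for i in range(4)]
steady_account = [base + timedelta(days=i) for i in range(4)]

burst_flagged = flag_bursts(burst_account)
steady_flagged = flag_bursts(steady_account)
```

A fixed rule like this is exactly what a paid reviewer can learn to stay under, for example by spacing posts out, which is why such signals work better as inputs to an investigation than as a standalone defense.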

As the author showed, carrying out these investigations and finding the underlying actors responsible for platform exploitation and manipulation can also empower firms to utilize other resources at their disposal – be it legal recourse against bad actors, PR campaigns, adjusting policy and even escalation to law enforcement for serious or illegal cases.

Trust and Safety investigation capabilities are only going to become more important with the advent of generative AI. Content moderation systems today are already adapting to the new generative AI reality and the multiformat content boom that it has created.

These systems are still in the early stages of being able to reliably detect AI-generated content, and detection isn’t a flawless approach either. Different generative AI algorithms and tools create different styles and types of images, videos and text, often without their own internal T&S safeguards such as watermarks.

Being able to investigate the malicious use of generative AI for myriad threat vectors will be crucial for platforms to be able to stay ahead of the curve, and further develop and train their own protective and proactive content moderation systems.

Generative AI content can also be quite difficult to attribute, due to the nature of its generation and the public, and often free, access given to these tools. Often, the best way to investigate the malign use of generative AI is to look at it indirectly, by investigating the online assets and networks that interact with and promote the malign content.

The importance of these capabilities became apparent in late May 2023.

A fake, presumably AI-generated photo of an explosion at the Pentagon was shared by a Twitter Blue-verified account impersonating Bloomberg News. The prevailing theory is that the image was generated by a text-to-image generative AI tool, but this remains unconfirmed.

The tweet was shared by Russian state media and spread rapidly online, even causing a brief dip in the stock market on the same day. We may never be able to tell who created or generated the image, but by investigating networks promoting it online, we can uncover clues and evidence of potential foreign malign disinformation networks.

The future of proactive T&S departments in industry is clear. Further investing in in-house investigative capabilities and tools for analysis is the order of the day, especially as even more regulation looms for online activity – mandating investigations, escalation to law enforcement and more.

Falkor provides one place for every element of your trust and safety investigations. Analyze data, share insights and track the course of your investigation from one centralized analyst platform. Ensure the integrity of your platform, services, and user base by creating strong and proactive investigative workflows.
