Constructing and querying a data model for online harm – Part 1

By Matar Haller, Ph.D., VP of Data & AI at ActiveFence

One of the biggest challenges facing online platforms today – and especially those hosting UGC (user-generated content) – is detecting harmful content and malicious behavior. Platform abuse poses brand and legal risks, decreases user trust, harms the user experience, and often blurs the line between online and offline harm.

One of the reasons harmful content detection is so challenging is that it is a multidimensional problem. Items can come in various formats (video, text, image, and audio), in any language, and can be violative in many different ways, from extreme gore and hate to suggestive or ambiguous nudity or bullying. For example, white supremacy can be embedded in a first-person shooter game that lets the user reenact the Christchurch mosque shooting from the shooter's perspective.

Content is also uploaded or shared by a myriad of users, some of whom are actively trying to evade bans. Attempts to hide malicious text can rely on leet speak, in which numbers replace letters (for example, Ad0lf h1tL3R), on specific combinations of emojis, such as a fire emoji next to a rainbow flag, or on embedding the text inside an image, which requires image processing capabilities to detect.
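To make the leet-speak evasion concrete, here is a minimal sketch of character-level normalization before matching against a banned-phrase list. The substitution map and the watchlist are illustrative assumptions only – a production system would handle ambiguous substitutions (for example, "1" can stand for either "i" or "l"), obfuscation with punctuation and whitespace, and far richer detection logic than keyword matching.

```python
# Minimal sketch: undo common leet-speak substitutions, then match keywords.
# The substitution map and watchlist are illustrative, not a production ruleset.

LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

WATCHLIST = {"adolf hitler"}  # toy example of a banned-phrase list


def normalize(text: str) -> str:
    """Lowercase the text and undo common character substitutions."""
    return text.lower().translate(LEET_MAP)


def matches_watchlist(text: str) -> bool:
    """Check whether any watchlisted phrase appears after normalization."""
    normalized = normalize(text)
    return any(phrase in normalized for phrase in WATCHLIST)


print(matches_watchlist("Ad0lf h1tL3R"))  # True
```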

Content also cannot be analyzed in isolation. Analyzing a single media type alone may misinform the overall assessment of risk, whether by incorrectly flagging benign material (false positives) or by missing harmful content (false negatives). For example, a video demonstrating knife usage may be flagged as showing weapons and graphic violence; however, when analyzed together with the clip's title and description, it becomes clear that the content is an instructional video meant for chefs.

Similarly, a title and an image may each be benign on its own but, when combined, indicate support for Al Qaeda.
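The knife-video example can be sketched in code. The structure below is purely hypothetical – the signal names, the keyword heuristic, and the weighting are assumptions for illustration, not how an actual contextual risk model works – but it shows the principle: the score for one modality is adjusted by the surrounding context instead of being taken at face value.

```python
# Hypothetical sketch of context-aware risk scoring: a per-modality signal is
# combined with surrounding metadata rather than scored in isolation.
# Signal names, keywords, and weights are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class ContentSignals:
    video_violence_score: float   # e.g. output of a video classifier
    title_text: str
    description_text: str


def contextual_risk(signals: ContentSignals) -> float:
    """Adjust the raw video score using textual context (toy heuristic)."""
    risk = signals.video_violence_score
    context = f"{signals.title_text} {signals.description_text}".lower()
    # Benign context (e.g. cooking tutorials) lowers the effective risk;
    # a real system would use learned models, not keyword checks.
    if any(word in context for word in ("recipe", "chef", "cooking")):
        risk *= 0.3
    return risk


knife_video = ContentSignals(
    video_violence_score=0.85,
    title_text="Knife skills for beginner chefs",
    description_text="How to dice an onion safely",
)
print(contextual_risk(knife_video))  # well below the raw 0.85 score
```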

So if the world is complex and context is everywhere, how can we capture that complexity at scale, across platforms and violation types? And how do we keep things simple enough to build risk score models that are sensitive to context and excel in this complex, changing online world, where bad actors are continuously developing new ways to avoid detection?

Contextual risk score models must be supported by a data model that captures context. The risk score model does not analyze content in isolation; instead, it assesses risk by including the surrounding metadata in the analysis – for example, who posted the content, who commented on it, and what else they commented on. Videos are analyzed along with their titles and descriptions, all of which contribute to the probability that the item is violative. The data model must capture these complex relationships between users, the content they post, the groups they belong to, and the content they interact with.

Furthermore, contextual models must be supported by a data model that is cross-platform. It must capture posts, tweets, and any other type of content in a unified way, so that the risk model can be agnostic to how content is presented on any given platform and instead focus on analyzing the content itself.
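To give a feel for what such a unified, platform-agnostic representation might look like, here is a small sketch using Python dataclasses. The entity and field names are assumptions made for this post, not ActiveFence's actual schema; the point is that posts, tweets, videos, and comments from different platforms all map onto the same content entity, linked to users and groups so that a risk model can traverse those relationships.

```python
# Illustrative sketch of a unified, cross-platform data model; entity and
# field names are assumptions for this post, not an actual production schema.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    user_id: str
    platform: str                      # e.g. "platform_a", "platform_b"


@dataclass
class ContentItem:
    item_id: str
    platform: str
    author_id: str                     # links back to a User
    media_type: str                    # "text", "image", "video", "audio"
    text: Optional[str] = None         # title, caption, post body, tweet ...
    parent_id: Optional[str] = None    # comment -> post, reply -> thread
    group_id: Optional[str] = None     # community / channel the item sits in


@dataclass
class Group:
    group_id: str
    platform: str
    member_ids: List[str] = field(default_factory=list)


# A "post", a "tweet", and a "comment" all map onto ContentItem, so a risk
# model can walk author -> other items -> groups without caring which
# platform the content came from.
video = ContentItem("v1", "video_site", "u42", "video",
                    text="Knife skills for beginner chefs")
comment = ContentItem("c7", "video_site", "u99", "text",
                      text="Great tutorial!", parent_id="v1")
```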

In our next post, we will provide examples of mapping out the problem domain and explain how our data model enables us to detect harmful content at scale.

Read more in Part 2 of this blog.