ActiveFence R&D at PyData Conference

PyData Tel Aviv 2022

Check out the recorded session from PyData Global of Matar Haller and Noam Levy’s presentation on constructing and querying a data model to detect online harm. They discuss how the technology behind detecting harmful content online is multi-layered, just like the content that users generate. A typical social post has text, an image, and interactions; the model assesses each of these with detection algorithms to produce a risk score that is used to rank potentially harmful content. By calculating the probability of risk, this data model helps trust and safety teams scale their efforts to catch malicious content.
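As a rough illustration of folding per-component assessments into a single risk score, here is a minimal Python sketch. The combination rule (a noisy-OR over independent probabilities), the function name `combine_risk`, and the example values are all assumptions made for illustration; the talk does not specify how ActiveFence actually combines scores.

```python
from math import prod


def combine_risk(component_scores):
    """Combine per-component harm probabilities with a noisy-OR rule.

    Hypothetical helper: treats each component (text, image, interactions)
    as an independent signal, so the post is considered risky if any one
    component is. This is an illustrative assumption, not the talk's method.
    """
    for name, p in component_scores.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1], got {p}")
    # Probability that at least one component is harmful.
    return 1.0 - prod(1.0 - p for p in component_scores.values())


# Made-up component scores for a single post.
post = {"text": 0.2, "image": 0.5, "interactions": 0.1}
score = combine_risk(post)
print(f"risk score: {score:.2f}")
```

A rule like this never scores a post lower than its riskiest component, which is one plausible property for ranking content for review; other aggregation rules are equally possible.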

To build algorithms that analyze and detect this harmful activity at scale, we need a data model that can capture the complexities of this online ecosystem. In this talk, we will discuss how ActiveFence models online content, media, creators, and the users who interact with that content through likes, shares, or comments. Modeling the relationships between these items yields a complex connected graph. To calculate a score that accurately reflects the probability of harm, we need to be able to query and access all of the relations of any given item. We will dive into the details of the complex and adversarial online space, the ActiveFence data model, and how we abstract the complexity of querying a graph-like data model using traditional SQL PySpark queries to provide maximum value to our algorithms.
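To make the idea of querying a graph-like model with ordinary SQL concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module rather than PySpark so that it runs without a cluster; the table names (`items`, `edges`), the column names, the helper `relations_of`, and the sample rows are all hypothetical and are not taken from the ActiveFence data model.

```python
import sqlite3

# In-memory database standing in for the real storage layer (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id TEXT PRIMARY KEY, kind TEXT);
    CREATE TABLE edges (src TEXT, dst TEXT, relation TEXT);
    """
)

# Nodes: content, media, creators, and users (made-up sample data).
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("post1", "post"),
        ("post2", "post"),
        ("img1", "media"),
        ("userA", "creator"),
        ("userB", "user"),
    ],
)

# Edges: typed relationships between items.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("userA", "post1", "created"),
        ("post1", "img1", "contains"),
        ("userB", "post1", "liked"),
        ("userB", "post2", "shared"),
    ],
)


def relations_of(conn, item_id):
    """Return (relation, neighbor_id, neighbor_kind) for every edge touching item_id."""
    return conn.execute(
        """
        SELECT e.relation, i.id, i.kind
        FROM edges AS e
        JOIN items AS i
          ON i.id = CASE WHEN e.src = ? THEN e.dst ELSE e.src END
        WHERE e.src = ? OR e.dst = ?
        ORDER BY i.id
        """,
        (item_id, item_id, item_id),
    ).fetchall()


neighbors = relations_of(conn, "post1")
for relation, neighbor_id, kind in neighbors:
    print(relation, neighbor_id, kind)
```

Storing the graph as a node table plus an edge table means that "all relations of an item" becomes a plain join, and the same join can be expressed over DataFrames in PySpark. A scoring algorithm could then consume the returned neighbors without knowing how the graph is stored.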