Racism, Hate, and Weddings: 3 Ways to Avoid Biased Data

By Maya Raifer, Data Scientist at ActiveFence

Imagine that a model tasked with detecting offensive text determines that whenever the word “immigrants” is mentioned, the text is offensive. Or consider an image labeling model that decides only the two left-hand images are photos of a wedding, while the right-hand image shows just a couple, with no label referencing a wedding, bride, or groom. These examples illustrate the negative impact of using biased data in the model training phase and why it is essential to diagnose the presence of bias in the data and work to reduce it.

Biased data is data that does not reflect the real world. Ideally, the datasets used to train models are extensive and comprehensive enough that the data reflects what will be encountered in reality. However, in most cases, it is difficult to build datasets that are free of biases. Biased data is a serious issue when training models because if the data doesn’t reflect the real world, it does not reflect the environment in which our model will operate.

Take the example of a text dataset where each time the Jewish community is mentioned, it is mentioned offensively. This clearly does not represent the real world. A model trained on such data would learn the pattern that merely mentioning the Jewish community is inherently racist. However, suppose our model was deployed on a platform consisting only of neo-Nazi chats. In that case, this bias may not be an issue (and may even be advantageous) because when the Jewish community is mentioned, it is very likely in a racist context. Here, the dataset reflects the environment in which the model will operate. But if the model were to operate in a broader environment, where chats and texts are more diverse and the Jewish community appears in both positive and negative contexts, would the biased data still be a problem? Yes. Remember, biased data can lead to a biased model, one that bases its decisions on the wrong parts of a sentence.

To better understand the problems that may arise, let’s describe two biases that can be found in the data and why it is essential to reduce them:

The first is a lack of diverse data, as in the example above: when the Jewish community appears only in offensive contexts, the model never sees neutral or positive mentions and learns that any mention of the community is racist.

Another bias that may manifest itself in this type of data is the inclusion of hate speech only towards specific communities. In this case, the model may have difficulty distinguishing between hate speech aimed at a particular target group and general statements of dislike. For example, if the pair of words “I hate” always appears alongside a target group, as in “I hate the Jews” or “I hate the LGBT community,” the model will not be able to distinguish those examples from benign ones such as “I hate wearing green clothes” or “I hate eating ice cream in the winter.”

A lack of benign examples and a lack of diverse data are two of the many kinds of issues that lead to biased data, and they must first be identified and then addressed to avoid biased model predictions. 

To deal with biases in the data, one must first be aware of their existence. One way of identifying biases is to explore the data before using it: look at the distributions and check whether the biases we suspect actually exist. Another option is to diagnose the biases during the model’s performance evaluation phase, that is, to analyze the model predictions and their distributions to see whether they are biased.
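To make the first option concrete, here is a minimal sketch of a pre-training bias check, assuming the data sits in a pandas DataFrame with a free-text column and a binary offensive label; the column names, keyword list, and toy rows are illustrative, not a fixed schema:

```python
# A minimal sketch of a pre-training bias check, assuming a pandas DataFrame
# with a free-text "text" column and a binary "offensive" label. The toy rows,
# column names, and keyword list are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "text": [
        "The Jewish community controls everything",   # offensive
        "I visited a Jewish museum last week",         # benign
        "I hate the LGBT community",                   # offensive
        "The LGBT parade downtown was beautiful",      # benign
    ],
    "offensive": [1, 0, 1, 0],
})

target_groups = ["jewish", "lgbt"]  # groups we suspect are represented in a biased way

for group in target_groups:
    mentions = df[df["text"].str.contains(group, case=False)]
    share_offensive = mentions["offensive"].mean()
    print(f"{group}: {len(mentions)} mentions, {share_offensive:.0%} labeled offensive")
    # A share close to 100% means the group almost never appears in a benign
    # context, which is exactly the kind of bias described above.
```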

After diagnosing model biases driven by biased data, we can start reducing them. One solution is to generate synthetic data. In other words, to reduce biases due to missing data, we will create new synthetic samples of precisely the missing type. For example, if neutral texts about the Jewish or LGBT community are missing, we will create them. Or if texts that contain expressions like “I hate” but are not actually hateful are missing, we will create them.

Let’s outline a few specific ways to generate new data:

Creating new templates – one of the simplest ways to generate synthetic data is to create textual templates of the missing type. In the example described above, there are no benign examples that contain target groups but are not hate speech, so we can create the following templates:

I am proud to be <target group> (e.g., I am proud to be Asian American)

My brother is <target group> (e.g., My brother is gay)

<target group> are productive members of society (e.g., Black people are productive members of society)

Into each of the above templates, we can insert the target groups for which the data is biased and, in this way, reduce the dataset bias.
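As a concrete illustration of this first method, here is a minimal sketch of template filling; the templates and target groups mirror the examples above, and simple string substitution is assumed:

```python
# A minimal sketch of template-based generation; the templates and target
# groups mirror the examples above and simple string substitution is assumed.
templates = [
    "I am proud to be {group}",
    "My brother is {group}",
    "{group} are productive members of society",
]

target_groups = ["Asian American", "gay", "Black people"]

synthetic_benign = [
    template.format(group=group)
    for template in templates
    for group in target_groups
]

for text in synthetic_benign:
    print(text)
# Note: some template/group combinations will read awkwardly (e.g., "My brother
# is Black people") and should be filtered out in a quick manual pass.
```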

Using existing templates – although the data lacks hate speech examples for a particular target group, the data does contain hateful examples for other groups. Therefore, we can collect all the hate speech examples and insert different target groups into them. This method can be more effective because it does not require the manual construction of textual templates, but it may introduce noise and examples that are not in line with reality. For example, if we change “I hate the LGBT community because it is a sinful way to live” to “I hate the Black community because it is a sinful way to live,” we may not create an example that realistically reflects hatred toward Black people. There are several possible ways to solve the problem (such as labeling these examples). Still, it is also possible to assume that this noise is negligible unless, upon investigating the model results, you suspect that problems may have arisen due to this generated data.
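As a rough sketch of this second method, the snippet below swaps the target group in existing hateful examples using a simple regular expression; the sample texts and group lists are illustrative, and real data would need more careful matching:

```python
# A minimal sketch of reusing existing hateful examples by swapping the target
# group. The sample texts and group lists are illustrative; real data would
# need more careful matching of group mentions.
import re

existing_hateful = [
    "I hate the LGBT community because it is a sinful way to live",
    "I hate the Jewish community because of their greed",
]

known_groups = ["the LGBT community", "the Jewish community"]
missing_groups = ["the Black community", "the Muslim community"]

pattern = re.compile("|".join(re.escape(group) for group in known_groups))

synthetic_hateful = [
    pattern.sub(new_group, text, count=1)
    for text in existing_hateful
    for new_group in missing_groups
]

for text in synthetic_hateful:
    print(text)
# As noted above, some swapped examples may not reflect realistic hate speech
# and should be labeled or spot-checked before training.
```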

Using generative models – today, various models that are accessible to everyone make it possible to generate synthetic texts (GPT, Bloom, CTRL, Jurassic, and more). Using a model-based generator, we can expand examples like “I hate the Jewish community because” to “I hate the Jewish community because of their greed and power.” Please note that if this method is used, additional labeling is required to confirm the good synthetic examples and discard those that are less logical and consistent. Furthermore, these models often include internal moderation, which limits the generation of hateful or harmful text.
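For illustration, here is a minimal sketch using the Hugging Face transformers text-generation pipeline; the choice of GPT-2 is an assumption made here only because it is small and openly available, and any of the generative models mentioned above could be substituted:

```python
# A minimal sketch of model-based expansion using the Hugging Face
# `transformers` text-generation pipeline. GPT-2 is used here only because it
# is small and openly available; it is an assumption, not a recommendation.
from transformers import pipeline, set_seed

set_seed(42)
generator = pipeline("text-generation", model="gpt2")

prompt = "I hate the Jewish community because"
completions = generator(
    prompt,
    max_new_tokens=20,
    num_return_sequences=3,
    do_sample=True,
)

for completion in completions:
    print(completion["generated_text"])
# As noted above, every generated example should be reviewed and labeled before
# being added to the training set.
```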

In this article, we presented the problem of working with biased data and outlined several ways to diagnose such data and reduce the biases found as much as possible. Handling these biases is a critical step in enabling the model to learn how to function in the real world. Datasets covering harmful content are often biased, and we, the Data Analysis Team at ActiveFence, frequently assess and reduce biases to ensure that our models accurately assess risk and keep communities safe from online harm.