Why is it difficult to receive feedback (even for AI)?

By Iddo Aviram, Data Engineering Team Lead at ActiveFence

Labeled datasets are invaluable assets. There is a direct link between dataset quality and model performance. Efficient, robust and sensitive harmful content detection leverages numerous supervised models, so it is critical for us to have funnels that generate useful labels and make the most of them. Up until here, I’m stating the obvious. However, it turns out that just like in human communication, asking for feedback, providing feedback and actually making good use of feedback are not at all easy in AI. In this blog post, I will survey some of what we learned from the difficulties we ran into while integrating feedback into our AI processes.

Feedback must be given about what the models actually predict.

While this may sound obvious – it isn’t.

At ActiveFence, we deal with Trust & Safety violations on different web-platform entities, including users, content items and collections (a collection being ActiveFence’s abstraction for a ‘virtual place’ for online content). For example, a certain group on a client’s web platform (a collection in our data ontology) can be detected as violative for hate-speech discourse. This detection would then prompt a moderator to review the finding and act upon it, and that action would be recorded as feedback.
In our initial approach, we thought it would be possible to simply record the moderation actions and use them as feedback. However, we quickly realized that even if a moderator takes action against this group – for example, sending a warning – that action may not exactly correspond to what our model is predicting. While the model predicts whether or not the group is hate-speech violative, we were basing feedback on a label that only says whether or not an action was taken against it. Actions might be good proxies for labels, but there are critical differences between them (as the sketch after this list illustrates):

  1. Actions are taken at the item level, but an individual item may be detected with multiple violations with varying probabilities. Maybe, alongside hate speech, the item was also detected as a bullying violation, or perhaps calls for violence? In this case, we won’t know which violation to associate the action with, and it cannot be considered a valid label for training models.
  2. Moreover, even if the item has been detected only for hate speech, the action may have been taken for a completely different violation – which we did not detect, meaning we completely mislabeled it. 
  3. While the group may indeed be found hate-speech violative, the moderation action could be taken against only a select set of users rather than the entire group, also resulting in a wrong label.
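To make this ambiguity concrete, here is a minimal sketch in Python, with hypothetical record and field names that are not our actual data model, of what happens when you naively try to derive a label from a moderation action:

```python
from dataclasses import dataclass

# Hypothetical, simplified records for illustration only.

@dataclass
class Detection:
    item_id: str
    violation: str       # e.g. "hate_speech", "bullying", "calls_for_violence"
    probability: float

@dataclass
class ModerationAction:
    item_id: str
    action: str          # e.g. "warn", "remove"

def labels_from_action(action: ModerationAction, detections: list[Detection]):
    """Naive attempt to turn a moderation action into a training label."""
    candidates = [d for d in detections if d.item_id == action.item_id]
    if len(candidates) != 1:
        # The item was detected with several violations: we cannot tell which
        # one (if any) the moderator actually acted on, so no reliable label
        # can be produced from the action alone.
        return None
    # Even with a single detection, the action may have been taken for a
    # violation we never detected, or against only part of the entity,
    # so this "label" remains a proxy rather than ground truth.
    return {candidates[0].violation: True}
```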

Lessons learned: We learned that we should base our feedback on something that properly corresponds to our predictions, setting our client-facing product labels (or annotations) to values that are more explicit and unambiguous. We also learned to use the exact same data model for labeled data and machine learning (ML) prediction data: while one is machine-generated and the other is human ground truth, they both convey the same thing. Of course, the shared data model contains metadata fields that keep annotations and predictions distinguishable.
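As an illustration, a shared record along these lines could cover both predictions and annotations; the names below are assumptions for the sake of the example, not our production schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class LabelSource(Enum):
    MODEL_PREDICTION = "model_prediction"
    HUMAN_ANNOTATION = "human_annotation"

@dataclass
class ViolationLabel:
    """One record type for both ML predictions and human ground truth."""
    entity_id: str                        # user / content / collection identifier
    violation: str                        # e.g. "hate_speech"
    is_violative: bool                    # the value both humans and models assert
    source: LabelSource                   # metadata keeping the two distinguishable
    confidence: Optional[float] = None    # populated for model predictions
    annotator_id: Optional[str] = None    # populated for human annotations
```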

Providing human feedback is tedious and might distract from the user’s primary goal. 

Implementing the learnings from the previous section, we introduced thumbs-up and thumbs-down buttons to explicitly capture moderator feedback on whether or not a detected violation truly violates the client’s policy.

Unfortunately, this elegant solution did not hold up in the real world, as it was considered too tedious to use. True, the prospect of improved detection models is a good incentive for moderation teams, but the main concern of the individual user is to enforce the policy correctly and quickly. Any other buttons to mess with, especially ones that don’t directly serve that goal, add too much friction. The result: feature adoption falls, and we miss important data that could improve our value proposition and the long-term effectiveness of the moderation teams.

Lessons learned: The UX mechanism that requests feedback should be embedded more deeply in the moderation workflow while still retaining its explicit nature and lack of ambiguity. Our solution involved a button that dismisses the item in the workflow and, in one click, marks all of its detected violations as negative. This technique of reducing UX friction while being very careful to maintain semantics can be applied to positive annotations as well. We are continuing to experiment with this kind of UX improvement.
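As a rough illustration, reusing the hypothetical ViolationLabel record from the sketch above, the dismiss button could translate one workflow click into explicit negative annotations roughly like this:

```python
def on_dismiss(entity_id: str, detected_violations: list[str]) -> list[ViolationLabel]:
    """Translate a single 'dismiss' click into explicit negative annotations.

    The moderator's primary action (clearing the item from the queue) doubles
    as unambiguous feedback: every detected violation is marked as a false
    positive, with no extra buttons to press.
    """
    return [
        ViolationLabel(
            entity_id=entity_id,
            violation=violation,
            is_violative=False,
            source=LabelSource.HUMAN_ANNOTATION,
        )
        for violation in detected_violations
    ]
```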

Internal labeling teams are a critical complement to end-user feedback.

So far we have covered only ways to get true-positive or false-positive annotations. However, how can we get ground-truth labels for items that were never identified by our models in the first place? Clearly, labels for these undetected items are critical to improving the recall of our models. But going back to the learnings above, asking our end-users to annotate samples of negatives will not go over well, given their focus on efficiency.

Moreover, our ML models also include explainability features that break down a violation into indicators; for example, the appearance of a weapon or the logo of a terror group is an indicator of the terrorism violation. These indicators rely on complex AI models that go beyond a binary label into frames and bounding boxes, across a great multitude of classes. End-users cannot be expected to provide us with such complex labels in cases where our model didn’t pick them up. Not only do they not have the time to do so, they are also guided by their own platform’s policies – not ActiveFence’s baseline policy.
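For a sense of what such an indicator carries beyond a binary label, here is a hypothetical sketch; the field names are assumptions, not our actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BoundingBox:
    x: float        # normalized coordinates of the detected region
    y: float
    width: float
    height: float

@dataclass
class Indicator:
    """A single explainability signal behind a violation, e.g. a weapon or a
    terror group's logo supporting a 'terrorism' detection."""
    violation: str                       # the violation the indicator supports
    indicator_class: str                 # one of many fine-grained classes, e.g. "weapon"
    confidence: float
    frame_index: Optional[int] = None    # for video: which frame it appears in
    box: Optional[BoundingBox] = None    # where in the image or frame it appears
```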

Lessons learned: Instead of relying solely on our end-users for labeling, we have set up a robust internal team of annotators to complement our end-user feedback loops. This team manually tags data that is funneled to it, focusing only on labeling that falls under ActiveFence’s baseline-policy models, or on annotations that would create too much friction in native moderation workflows.
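A simplified sketch of this kind of routing might look like the following; the rules and the 5% sampling rate are made up for illustration:

```python
import random

def route_for_labeling(item: dict) -> str:
    """Decide who should label an item (illustrative routing rules only).

    Hypothetical policy: end-user moderators give feedback only on items our
    models already surfaced, as part of their normal workflow; anything that
    would add friction goes to the internal annotation team.
    """
    if item["needs_indicator_labels"]:
        # Fine-grained labels (frames, bounding boxes, many classes) are too
        # complex and too baseline-policy-specific for end users.
        return "internal_team"
    if not item["detected"] and random.random() < 0.05:
        # Sample a small share of undetected items so the internal team can
        # surface false negatives and improve recall.
        return "internal_team"
    if item["detected"]:
        # Detected items get feedback through the moderation UI itself.
        return "end_user_moderators"
    return "skip"
```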

We should strive to store feedback in a consistent manner.

Labeled data comes from different funnels. It may come from customer or internal teams using ActiveFence’s SaaS and its API. Sometimes, as hinted earlier, internal annotation teams use specialized annotation software. Furthermore, we collect labels from various additional sources, including partners, public-domain publications and resources, intelligence operations, and more. One challenge that stems from this is inconsistent formatting of the data, which leads to messy datasets and a lack of scalability.

Lessons learned: All labeled data should be stored in a consistent way, using the same format to represent ML predictions and human annotations. By keeping datasets in the same format, we are able to mix and match data, allowing one dataset to complement another in order to paint a whole picture. Using the right tools enables us to mechanically create dataset tables that contain exactly the data we need for either training or evaluation of a model – ultimately allowing us to make good predictions.
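Continuing the hypothetical shared format from the earlier sketches, building a training table then becomes a simple filter over normalized records rather than a format-conversion exercise; this is an illustrative sketch, not our actual tooling:

```python
def build_training_table(sources: dict[str, list[ViolationLabel]], violation: str):
    """Mix and match labels from different funnels into one training table.

    `sources` maps funnel names (e.g. "saas_ui", "internal_team", "partners")
    to label records already normalized into the shared format; because every
    record looks the same, selecting training data is just a filter.
    """
    rows = []
    for funnel, labels in sources.items():
        for label in labels:
            if label.violation != violation:
                continue
            if label.source is not LabelSource.HUMAN_ANNOTATION:
                continue  # train only on human ground truth
            rows.append({
                "entity_id": label.entity_id,
                "target": label.is_violative,
                "funnel": funnel,   # keep provenance for evaluation slicing
            })
    return rows
```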

Bringing it all together

Over the years of building our ML models for identifying harmful content and working with content moderation teams, we’ve built a clear picture of what is – and isn’t – effective data feedback. Our AI, like many of us, needs to receive its feedback in an organized way, using proper, consistent modeling to get the message across. We’ve also learned what we can and can’t ask of our users and, more importantly, how to ask for it in order to get the best results.

If you have found yourself in similar situations of inconsistent feedback, I hope you’ve been able to learn from our past mistakes and have picked up some ideas on how to tackle the complexities of receiving feedback and maintaining labeled data assets.

Similar to the ML world, my team would also love to hear your feedback – no uniform datasets are needed this time. Please feel free to contact us directly.