I sat inside the photobooth-esque machine and stared into the camera as its algorithms ingested the curves and angles of my face and attempted to mirror my sentiment by drawing a picture reflecting my apparent mood.
I felt great, having just attended several fascinating speeches at an artificial intelligence conference, where I now found myself staring at this robotic artist. My forehead relaxed, I felt the muscles in my cheeks and mouth engage into a half-smile, as I waited for the machine to smile back in its own way. It buzzed politely at first, its arm oscillating back and forth, drawing jagged shapes on a piece of paper. The word “melancholy” appeared on screen and I thought, self-consciously, “maybe I’m not smiling enough.” So, I broadened into a toothy grin, eyes squinting.
Suddenly, the buzz of the machine turned into a loud whir, as it began scribbling frantically. “Melancholy” disappeared from the screen, as the word “Panicking” appeared with a number designating the estimated level of panic. Uncertain which of us was panicking, I forced a bigger smile, hoping to soothe the agitated gadget. The number on-screen rose several points as the machine gestured wildly, its pen moving back and forth on the page in fits. Now I was starting to panic. I slunk away awkwardly like a toddler who just spilled milk on the computer.
In the age of artificial intelligence, sometimes machines get us wrong. For those of us in certain demographic groups, machines get us wrong more often. I’m not sure why the machine miscalculated my mood that day, but it made me wonder: how many faces like mine were in the images that trained its artificial intelligence? The fewer faces like mine in the training data, the harder it would be for the machine to analyze mine.
As a data scientist, I spend a lot of time thinking about the datasets behind machine learning algorithms. At Viacom, we are taking important steps to ensure we build fairness into our data products and that the AI services we use represent the diversity of our fans. Too many AI products are built using biased training data. When this happens, a product may not work effectively (or sometimes at all) for groups who are underrepresented in the training data. This kind of AI bias may not even be apparent to developers until a product goes to market. By then, a brand’s image may be irreparably harmed.
AI Bias in Consumer Products
Nowadays, consumer products are regularly built with AI capabilities like computer vision. Often, however, these capabilities fail on users in specific demographic groups, excluding whole customer segments from the benefits of innovative product features. (Joy Buolamwini, a researcher from the MIT Media Lab, has written and spoken extensively about facial analysis algorithms and their tendency to misclassify darker-skinned and female subjects at higher-than-average rates.)
Demographic bias has affected plenty of consumer product companies. In 2010, controversy arose over a computer-vision-capable camera with game console compatibility, when GameSpot reported that it had more trouble detecting the faces of darker-skinned testers than those of testers with lighter skin. The company maintained that proper lighting would alleviate any challenges with the product’s face recognition features, and encouraged users with similar problems to contact customer support.
Consumers of phones and cameras have complained of bias in the functionality of high-profile products built with AI. In 2010, Time reported on a digital camera feature that detected images of people with their eyes closed, automatically alerting users with the message “Did someone blink?” Users, overwhelmingly of Asian descent, reported too many false alerts of this kind. And as recently as last year, there were anecdotal reports of Asian customers experiencing problems with a phone brand’s facial recognition feature, designed to allow users to automatically unlock their phones. Some users raised concerns that the technology could not properly distinguish among the faces of different Asian users.
Combating AI Bias in Product Development
Issues like these bring to light a critical point: AI may be dispassionate, but it is not unbiased. In fact, it often magnifies human bias. Where there is an unintended human oversight, such as neglecting to use a database of faces diverse enough to properly train a neural network, the resulting AI technologies will, by design, favor faces that look more like those included in the training data and discriminate against those that look less like them.
This is not a trivial problem for a company like Viacom, where we leverage AI to develop innovative data products. We use computer vision technologies to more precisely measure audience attention. We use facial recognition to extract features for use in the predictive models powering the data products that price out our advertising units. We develop machine learning products that make our business decision-making, both creative and financial, smarter. Through all of these exciting innovations, it’s crucial to ensure our AI technologies account for the vast diversity of Viacom audiences and talent.
One way I proposed to do this was to audit a few of the computer-vision-based services we use and determine how well they analyzed different kinds of faces. The results were striking.
First, I identified a set of 600 images that represented a diversity of faces with different skin tones, facial features, and head shapes. [1]
Using this test dataset, I fed the images to each AI service and then measured how well each service did at tasks such as classifying a face’s gender or determining whether the image contained a face at all. If the service failed significantly more often on faces of a particular gender or race, then I could identify a potential bias in the AI and take steps to remedy it.
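To make the audit step concrete, here is a minimal Python sketch of the comparison described above. It is an illustration under assumptions, not the code I ran: the function names, the group labels, the sample outcomes, and the 5-point `tolerance` are all hypothetical, and a fixed threshold is a stand-in for a proper significance test.

```python
from collections import defaultdict


def failure_rates_by_group(results):
    """Return {group: share of test images on which the service failed}.

    `results` is an iterable of (group, succeeded) pairs, one per test image.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for group, succeeded in results:
        totals[group] += 1
        if not succeeded:
            failures[group] += 1
    return {group: failures[group] / totals[group] for group in totals}


def flag_gaps(rates, tolerance=0.05):
    """Return groups whose failure rate exceeds the best group's by more than `tolerance`.

    The threshold is an arbitrary assumption, not a statistical test.
    """
    best = min(rates.values())
    return {group: rate for group, rate in rates.items() if rate - best > tolerance}


# Made-up outcomes for illustration only, not results from the audit.
example_results = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]
example_rates = failure_rates_by_group(example_results)
example_flags = flag_gaps(example_rates)
```

With real data, each pair would record whether the service got one task right (say, detecting that a face is present) for one image, and any flagged group would be a starting point for a closer look rather than proof of bias.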
In one instance, I tested a service for how confident it was that a face was actually a face. There were clear differences in confidence levels among different categories of race and gender. The service was less confident in detecting female faces, especially those of black women.
This is Probably a Face: Average Group Confidence Scores by Gender and Race
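A group-level comparison like this one can be sketched in a few lines of Python. This is a hedged illustration: the record layout (`gender`, `race`, `confidence` keys), the function name, and the sample values are assumptions made for the example, not the service's actual output format or my audit data.

```python
from collections import defaultdict
from statistics import mean


def mean_confidence_by_group(detections):
    """Average the face-detection confidence within each (gender, race) group."""
    grouped = defaultdict(list)
    for detection in detections:
        key = (detection["gender"], detection["race"])
        grouped[key].append(detection["confidence"])
    return {group: mean(scores) for group, scores in grouped.items()}


# Hypothetical records: one per test image, with the score the service returned.
example_detections = [
    {"gender": "female", "race": "group_x", "confidence": 0.5},
    {"gender": "female", "race": "group_x", "confidence": 0.75},
    {"gender": "male", "race": "group_y", "confidence": 1.0},
    {"gender": "male", "race": "group_y", "confidence": 0.75},
]
example_means = mean_confidence_by_group(example_detections)
```

Grouping on the gender and race pair together, rather than on each attribute separately, is what lets an intersectional gap show up in the averages.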
I looked at the faces that the service was most confident in detecting (with confidence scores of close to 100%) and compared them to those faces the service was least confident in detecting (confidence scores of around 50%).
Here are the 10 face images with the highest confidence scores:
Top 10 Confidence Scores for Face Detection (Confidence Scores: 93.4% to 96.6%)
And here are the faces with the lowest confidence scores:
Bottom 10 Confidence Scores for Face Detection (Confidence Scores: 50.5% to 58.8%)
The service clearly had more trouble detecting female faces of color, suggesting an imbalance, perhaps even an absence, of images of black women represented in training the deep neural networks behind this service.
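The top-versus-bottom comparison above can be reproduced along these lines. Again, this is only a sketch under assumptions: the record keys, the helper names, and the sample scores are hypothetical, and counting group membership is just one simple way to summarize who ends up at each extreme.

```python
import heapq
from collections import Counter


def confidence_extremes(detections, n=10):
    """Return the n highest-confidence and n lowest-confidence detections."""
    def score(detection):
        return detection["confidence"]

    top = heapq.nlargest(n, detections, key=score)
    bottom = heapq.nsmallest(n, detections, key=score)
    return top, bottom


def group_makeup(detections):
    """Count how many detections fall into each (gender, race) group."""
    return Counter((d["gender"], d["race"]) for d in detections)


# Hypothetical scores for illustration only.
example_detections = [
    {"gender": "male", "race": "group_y", "confidence": 0.96},
    {"gender": "male", "race": "group_y", "confidence": 0.94},
    {"gender": "female", "race": "group_y", "confidence": 0.90},
    {"gender": "male", "race": "group_x", "confidence": 0.80},
    {"gender": "female", "race": "group_x", "confidence": 0.58},
    {"gender": "female", "race": "group_x", "confidence": 0.51},
]
example_top, example_bottom = confidence_extremes(example_detections, n=2)
example_top_makeup = group_makeup(example_top)
example_bottom_makeup = group_makeup(example_bottom)
```

If one group dominates the bottom of the ranking and is absent from the top, that is the kind of pattern worth raising with the service's developer.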
It was one thing to look at the raw data and discover disparate confidence scores. But when I looked at the actual faces behind the data points, I realized that these images are not unlike the faces that watch our programming every day on BET, VH1, and other Viacom networks. When we develop products using AI, it is imperative that we take audience members such as these into account.
5 Principles for Developing Ethical Algorithms
There’s a lot companies can do to combat bias in AI and protect the quality of their data products. Here are five tips:
- Audit black box AI services for failure rates on key subpopulations. Lots of AI services sold to the business community tout the magic of deep learning. But if it is unclear exactly how the service works, test it out using an external dataset containing a diverse set of data points. If the service fails more often on certain subgroups than on others, you should work with the developer on possible solutions.
- Ask hard questions, even of gold standard data sources and technologies. No matter how trusted the source, it’s important to have some understanding of the algorithms and data that created a final product.
- Pay close attention to the demographic makeup of a dataset. For companies that develop their machine learning models based on human-centric data, make sure the distributions of subgroups like race and gender are appropriate for the intended outcomes. The American Society for Engineering Education is one organization that is attempting to tackle AI bias at the ground level by creating open-source datasets that are less skewed on race and gender.
- Don’t assume that bigger data is better data. There’s a myth that big data, especially passive behavioral data, always leads to better models. However, if a large dataset does not exhibit appropriate representativeness, a smaller, more balanced dataset may be preferable.
- When hiring, think like a financier: manage risk through diversification. While it’s hard to admit, every individual has biases. If you’re building a data science team to leverage AI, a mix of talent from different backgrounds and expertise can make a difference. A diverse team of people, each with overlapping perspectives and different blind spots, can see better collectively, even if the computer vision fails.
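For the two dataset tips above (checking demographic makeup, and preferring a balanced dataset over a merely bigger one), here is a small Python sketch. It is an illustration under assumptions: the function names and sample data are hypothetical, and downsampling every group to the size of the smallest one is only one of several ways to rebalance. Whether equal shares are the "appropriate" distribution depends on the intended outcome.

```python
import random
from collections import Counter, defaultdict


def group_shares(labels):
    """Return each group's share of the dataset, given one group label per record."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def balanced_subsample(items, group_of, seed=0):
    """Downsample every group to the size of the smallest group.

    `group_of` maps an item to its group label; `seed` makes the draw repeatable.
    """
    by_group = defaultdict(list)
    for item in items:
        by_group[group_of(item)].append(item)
    smallest = min(len(members) for members in by_group.values())
    rng = random.Random(seed)
    sample = []
    for members in by_group.values():
        sample.extend(rng.sample(members, smallest))
    return sample


# Hypothetical skewed dataset: 8 records from one group, 2 from another.
example_items = [("group_a", i) for i in range(8)] + [("group_b", i) for i in range(2)]
example_shares = group_shares(group for group, _ in example_items)
example_balanced = balanced_subsample(example_items, group_of=lambda item: item[0])
example_balanced_shares = group_shares(group for group, _ in example_balanced)
```

The balanced sample here is less than half the size of the original, which is the trade-off the fourth tip describes: fewer records, but no group drowned out by another.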
[1] Ma, Correll, & Wittenbrink (2015). The Chicago Face Database: A Free Stimulus Set of Faces and Norming Data. Behavior Research Methods, 47, 1122-1135.