Artificial Stupidity

Computers aren’t bigoted—they’re just based on cold calculations, right?

The past two years have brought a steady drumbeat of problems with various artificial intelligence (AI) systems, centered on a common theme: they reproduce the same kinds of unfounded bias and discrimination, on racial, gender, ethnic, and other grounds, that humans display when they’re not careful. Or worse.

More and more courts are using an AI product called COMPAS to predict whether people convicted of crimes will offend again, and judges are relying on it when handing out sentences. But a thorough third-party study of actual outcomes over time, conducted by the news organization ProPublica, revealed that COMPAS was heavily infected with racial bias. The errors it made in predicting that people would re-offend when in fact they did not were heavily skewed toward black people; the opposite errors, in which people ended up back in court after the program predicted they wouldn’t, skewed heavily toward white folks.
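The asymmetry described above comes down to two error rates calculated separately for each group: the false positive rate (flagged as likely to re-offend, but did not) and the false negative rate (rated low risk, but did re-offend). The short Python sketch below shows that calculation. The records and the `error_rates` helper are hypothetical, invented for illustration; they are not COMPAS data or the study’s actual method.

```python
from collections import defaultdict

# Hypothetical records: (group, predicted_high_risk, actually_reoffended).
# These values are invented for illustration; they are not COMPAS data.
records = [
    ("A", True, False), ("A", True, True), ("A", True, False), ("A", False, True),
    ("B", False, True), ("B", True, True), ("B", False, False), ("B", False, True),
]

def error_rates(rows):
    """Return {group: (false_positive_rate, false_negative_rate)}."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted, actual in rows:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # rated low risk, but re-offended
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # rated high risk, but did not re-offend
    return {
        g: (c["fp"] / c["neg"] if c["neg"] else 0.0,
            c["fn"] / c["pos"] if c["pos"] else 0.0)
        for g, c in counts.items()
    }

for group, (fpr, fnr) in sorted(error_rates(records).items()):
    print(f"group {group}: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```

With these made-up records, group A collects the false positives and group B the false negatives, the same shape of disparity described above. A single overall accuracy figure can hide this kind of split, which is why fairness audits usually report error rates group by group.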

Northpointe, the company that sells COMPAS, responded by saying that race wasn’t even one of the variables their AI looked at. Maybe not—but names were. Addresses were. Schools were. Something caused COMPAS to err in ways that dramatically favored white people over black people.

Part of the underlying problem is that Northpointe won’t reveal how its algorithm works. That’s not unusual; most AI firms have similar policies. A deeper problem, though, is that Northpointe itself may not fully know how its algorithm works. Many of the most advanced AI procedures are essentially “black boxes”: they make no effort to explain how they reach their conclusions, and their decision-making process cannot be retraced.

AI also suffers from the “GIGO” syndrome: Garbage In, Garbage Out. Russian scientists with nothing better to do conducted an online AI beauty contest in 2016, in which thousands of entrants from around the world submitted selfies that were judged against data sets measuring factors like facial symmetry, in order to select the most beautiful faces. When the results were announced, forty-three of the forty-four winners had light skin. Why? When the scientists went back over their process, they discovered that the data sets contained only a small proportion of darker-skinned models. The machine therefore concluded that dark skin was an aberration to be dismissed.
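One inexpensive safeguard against this kind of garbage-in problem is to count how each group is represented in the training data before any model learns from it. The Python sketch below assumes each training image carries a group label; the labels, the `representation_report` function, and the 20 percent threshold are all hypothetical choices for illustration, not details of the contest.

```python
from collections import Counter

def representation_report(labels, min_share=0.2):
    """Return {group: (share_of_data, is_underrepresented)}.

    min_share is an illustrative threshold, not an established standard.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        group: (n / total, n / total < min_share)
        for group, n in counts.items()
    }

# Hypothetical labels for a ten-image training set (invented for illustration).
training_labels = ["lighter"] * 9 + ["darker"] * 1

for group, (share, flagged) in representation_report(training_labels).items():
    print(f"{group}: {share:.0%} of training data, underrepresented={flagged}")
```

A count like this cannot by itself make a model fair, but it surfaces an imbalance while adding more varied examples is still cheap.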

In July, IBM proudly announced a new performance review program, based on its “Watson” AI product, for its 380,000 employees worldwide. The company boasts that it won’t just measure how well each employee has done in the past; it will predict, with what IBM proclaims will be 96 percent accuracy, how well they’ll do in the future. I can’t say whether this will work or not. I can say, though, that if it’s based on a data set featuring “the kind of employees who have produced superior performance over IBM’s long and glorious history,” it’s liable to favor white males.

Facial recognition software is becoming ubiquitous, and much of it seems to work more accurately on (you guessed it) white faces than on darker ones. This isn’t happening because of the way light reflects off lighter skin; it’s happening because the massive data sets used to train these systems have disproportionately featured white faces. Not having your face recognized by a computer could turn out to be a plus in some circumstances, but not once facial recognition becomes an essential component of building or computer security systems.

In coming years, when you apply for credit, employment, or admission to school, or even seek medical treatment, you can expect AI to loom ever larger in decisions that profoundly affect your life. If you’re black and get rejected, will it be because of some error or ineptitude of your own, or because the AI was built on skewed data? How will you know? Who will you yell at? I have yelled at any number of machines, with no discernible effect.

If anything, having a computer produce biased results is even worse than having a human produce biased results. We all know that humans are fallible, but there’s a tendency to over-rely on results produced by a machine. Computers aren’t bigoted—they’re just based on cold calculations, right? Would that it were so.

Fortunately, there is some good news here. The overwhelming majority of AI engineers are not trying to produce discriminatory, shoddy results; they are trying to get it right. One technique for getting it right, being pioneered by Google’s DeepMind unit, is called “Path-Specific Counterfactual Fairness.” I don’t profess to understand how it works. Parmy Olson, writing for Forbes, described it as follows:

Computers can deem a judgement about a person as “fair” if it would have made the same judgement in an imaginary world where that person was in a different demographic group along unfair “pathways”—in other words, if in a parallel universe, a woman were actually a man, or a white man was actually black.

Facebook is also testing a tool called “Fairness Flow,” intended to achieve the same result. I hope these things work, or at least move down the path toward something that works.
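The core idea in Olson’s description can be illustrated with a much cruder relative: a “flip test” that reruns a decision with only the demographic attribute changed and checks whether the outcome changes. The Python sketch below is that naive test, not DeepMind’s Path-Specific Counterfactual Fairness or Facebook’s Fairness Flow; the `Applicant` fields, the `score_applicant` model, and its weights are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Applicant:
    # Hypothetical fields chosen for illustration only.
    group: str             # protected attribute
    prior_offenses: int
    zip_code_risk: float   # a proxy feature that may correlate with group

def score_applicant(a: Applicant) -> bool:
    """Hypothetical risk model: True means 'high risk'. It never reads a.group."""
    return a.prior_offenses * 0.5 + a.zip_code_risk > 1.0

def passes_naive_flip_test(model, person: Applicant, other_group: str) -> bool:
    """Naive counterfactual check: change only the protected attribute and
    report whether the decision stays the same. Unlike a path-specific
    method, this leaves proxy features such as zip_code_risk untouched."""
    counterfactual = replace(person, group=other_group)
    return model(person) == model(counterfactual)

person = Applicant(group="B", prior_offenses=1, zip_code_risk=0.8)
print(passes_naive_flip_test(score_applicant, person, other_group="A"))
```

The check passes because `score_applicant` never reads the `group` field. Yet if `zip_code_risk` tracks group membership, much as the names, addresses, and schools in the COMPAS case might, the model can still treat groups differently. Path-specific methods aim to close that gap by using a causal model to change such proxy features along with the attribute, along the pathways judged unfair, rather than flipping the label alone.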

Whatever tools are ultimately deployed, they’ll almost certainly raise the cost of AI. That’s fine; it’s worth it. Returning to the COMPAS sentencing debacle: it is unquestionably unfair to the black convicts who are rated as greater risks than they should be, but it’s also unfair to everyone who will be victimized by future crimes committed by white offenders who were let off too easily. As the early nineteenth-century French diplomat Charles Maurice de Talleyrand is alleged to have said: “It was worse than a crime. It was a blunder.”

Having AI developers tackle the problem voluntarily is good but not sufficient. We need more. Amnesty International, Human Rights Watch, and other organizations have endorsed a set of principles called the Toronto Declaration, subtitled “Protecting the rights to equality and non-discrimination in machine learning systems.” Governments and consumer-sensitive businesses need to pay heed, and to give AI systems that comply with the Toronto Declaration the same kind of preference (or more) that buildings certified as energy-efficient receive. New York City recently enacted a law addressing transparency and fairness in the automated decision systems used by city agencies, which some are touting as a model for the rest of the country. Humanists need to be fully aware of the ramifications of the AI bias phenomenon, because it’s likely to get a lot worse before it starts to get better.