Talk of artificial intelligence tends to fall into two camps: an interconnected future that streamlines every aspect of human life, or a dystopian technological singularity of the HAL 9000 type, where “I’m sorry, Dave. I’m afraid I can’t do that” is the last thing we hear before the machines take over and make us their slaves.
Right now, we’re in the gray zone of prototypes, so some of our attempts at artificial intelligence are less than perfect. Take language prediction models. They are used in everything from Google searches to legal cases, but a new study conducted by researchers at China’s National University of Defense Technology and the University of Copenhagen shows that they have a systemic racial bias.
Deeply rooted technology
The language models under scrutiny were ELECTRA, GPT-2, BERT, DistilBERT, ALBERT and RoBERTa. If you’re wondering why so many are called ‘BERT’, those models are all offshoots of the original ‘Bidirectional Encoder Representations from Transformers’ – a machine learning technique developed by Google in 2018.
To give an idea of how widespread these models are: By the end of 2019, BERT had been adopted by Google’s search engine in 70 languages. In 2020, the model was used in almost all English-language search queries. This is the technology that fills the gap in the search box when you type “Why am I so ___?”
The study in detail
The study measured the models’ performance differences across demographics using English-language ‘fill-in-the-gap’ (cloze) tests. Since the cloze task is how BERT-style systems are trained, the researchers were able to evaluate the models directly.
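To make the testing setup concrete, here is a minimal sketch of a cloze-style prediction, assuming the open-source Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; these tooling choices are illustrative and not the study’s exact pipeline.

```python
# A minimal cloze-task sketch (assumed tooling, not the study's exact setup).
# BERT-style models are trained to predict a masked word from its context,
# which is why a fill-in-the-gap test can probe them directly.
from transformers import pipeline

# Load a publicly available BERT checkpoint for masked-word prediction.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Ask the model to fill the gap, much like the sentences used in the study.
for prediction in unmasker("Why am I so [MASK] today?"):
    print(f"{prediction['token_str']:>12}  (score: {prediction['score']:.3f})")
```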
About 3,085 sentences were completed by 307 human subjects, who were asked to fill in the most probable word based on their own experience. The subjects were sorted into 16 demographic groups by age, gender, education, and race. The ‘fairness’ of a language model’s responses was measured by whether its risk of error was roughly the same across any two demographic groups.
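That fairness criterion is simple to state in code. The sketch below, with entirely hypothetical helper names and toy data, just compares per-group error rates and reports the gap:

```python
# Hypothetical sketch of the fairness criterion described above:
# a model is "fair" across two groups if its risk of error
# (here, the rate of wrong fill-in-the-gap predictions) is roughly equal.
from typing import List

def error_rate(predictions: List[str], human_answers: List[str]) -> float:
    """Fraction of gaps where the model's word differs from the human's."""
    wrong = sum(p != a for p, a in zip(predictions, human_answers))
    return wrong / len(human_answers)

def risk_gap(group_a_err: float, group_b_err: float) -> float:
    """Absolute difference in error risk between two demographic groups."""
    return abs(group_a_err - group_b_err)

# Toy numbers only: a large gap would indicate the model is tuned
# to one group's way of speaking more than the other's.
gap = risk_gap(error_rate(["tired", "old"], ["tired", "broke"]),
               error_rate(["happy", "busy"], ["sad", "late"]))
print(f"error-risk gap: {gap:.2f}")
```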
The results showed a systemic bias against young non-white male speakers, and the models were also poorly tuned to older white speakers. Not only do the models learn stereotypical associations, they also learn to speak more like some users than like others – in this case, white men under 40 years of age.
Why is it important?
We already know that BERT is an integral part of how we navigate the web, so users whose language does not match the models’ expectations get odd results and suggestions.
When GPT-2 was announced in February 2019 by San Francisco technology company OpenAI, James Vincent of The Verge described its writing as “one of the most exciting examples yet” of language generation programs.
“Give it a fake headline and it will write the rest of the article, complete with fake quotes and statistics. Give it the first line of a short story and it will tell you what happens to your character next,” he wrote.
The Guardian called it “plausible newspaper prose,” while reporters at Vox thought GPT-2 might be the technology that kicks them out of their jobs. A study from the University of Amsterdam even showed that some participants were unable to distinguish poems generated by GPT-2 from poems written by humans.
The remedy, the researchers at the University of Copenhagen argue, should be better training, so that the models more accurately represent the diversity of their users.
Source: The Nordic Page