Imagine you are a male nurse who recently submitted a resume on a recruiting website. What are the odds that a flurry of promotional emails for women’s products hits your inbox a few days later? New research on artificial intelligence suggests they are higher than you might think.
Networking platforms like job sites, booking sites and dating sites store massive amounts of user profile data. These sites increasingly employ machine learning and natural language processing, subfields of artificial intelligence, to analyze that information and better understand their customers.
One popular technique these systems train on is word embeddings, in which algorithms learn from a pool of existing text, establish associations between words and apply those associations to future tasks.
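To make "associations between words" concrete, here is a minimal sketch in Python. It assumes the open-source gensim library and the publicly released Google News word2vec vectors; the file path is illustrative, not taken from the study.

```python
# A minimal sketch, assuming gensim and the pretrained Google News
# word2vec vectors are available locally (file path is hypothetical).
from gensim.models import KeyedVectors

# Load 300-dimensional vectors trained on Google News text.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Each word maps to a 300-dimensional vector; words used in similar
# contexts end up close together, which is how associations form.
print(vectors.similarity("nurse", "she"))  # typically the higher score
print(vectors.similarity("nurse", "he"))   # typically the lower score
```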
In the scenario above, the algorithm mistook a male user for a woman because, based on its previous learning, it associated the word “nurse” in the user’s profile with women.
Such inaccurate gender identification, applied at scale, can amplify the gender bias that already exists in human language, according to a recent study by a group of scholars at Boston University.
“Word embedding, or any kind of machine learning, unless you tell it that this is a bad relationship [between words], it thinks that it’s the same as any other relationship,” Tolga Bolukbasi, a co-author of the research paper, told Observer.
The research shows that even word embeddings trained on Google News articles, a presumably diverse data pool covering a vocabulary of 3 million English words represented in 300 dimensions, exhibit gender stereotypes to a disturbing extent.
“One might have hoped that the Google News embedding would exhibit little gender bias because many of its authors are professional journalists,” researchers wrote in the paper.
For instance, because computer programmer appears as a heavily male-associated occupation in the training text and homemaker as a heavily female-associated one, the embedding will offensively answer the question “man is to computer programmer as woman is to x?” with “x = homemaker.”
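In vector terms, that analogy is answered with simple arithmetic: take the vector for “computer programmer,” subtract “man,” add “woman,” and look for the nearest word. A sketch of that query, reusing the `vectors` object from the earlier snippet (the underscore token name is an assumption about how the phrase is stored in the vocabulary):

```python
# "man is to computer programmer as woman is to x" as vector arithmetic:
# x is approximately programmer - man + woman.
result = vectors.most_similar(
    positive=["computer_programmer", "woman"],
    negative=["man"],
    topn=1,
)
print(result)  # the paper reports "homemaker" topping this list
```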
Homemaker, nurse and receptionist are found to be the most female-skewed occupations, and maestro, skipper and protege are the most male-skewed.
To address the issue, Bolukbasi and his colleagues proposed a de-biasing algorithm that removes the bias from gender-neutral words, such as computer programmer and homemaker, while maintaining appropriate gender associations such as queen and female, and king and male.
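The core of that step can be illustrated in a few lines: identify a “gender direction” in the vector space and subtract each gender-neutral word’s projection onto it, leaving definitional words like king and queen alone. The sketch below is a simplification of the paper’s method, which also equalizes word pairs and derives the direction from several definitional pairs rather than just she/he; it reuses `vectors` from the earlier sketches.

```python
# Simplified neutralizing step, not the authors' full algorithm:
# remove the component of a gender-neutral word that lies along
# an approximate gender direction.
import numpy as np

def neutralize(word_vec, gender_direction):
    """Subtract the word's projection onto the gender direction."""
    g = gender_direction / np.linalg.norm(gender_direction)
    return word_vec - np.dot(word_vec, g) * g

# Approximate the gender direction from one definitional pair.
gender_direction = vectors["she"] - vectors["he"]
debiased_programmer = neutralize(vectors["computer_programmer"], gender_direction)
```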
“That’s not a final solution, but it’s a version trying to mathematically remove bias to a certain extent,” Bolukbasi said.
“It’s hard to measure the immediate effect of machine learning today. Machine learning is more and more prevalent in different areas. If you don’t control that, you can get a lot of bad side effects,” he added.
While there is no evidence yet that such a bias-amplifying effect exists in languages other than English, Bolukbasi observed that Google Translate shows a preset bias when translating from some foreign languages. For example, when translating “He/She is a doctor” from Turkish, a gender-neutral language (the Turkish pronoun o can refer to either a man or a woman), into English, Google Translate automatically produces: “He is a doctor.”