A.I. Can Be a Double-Edged Sword For Language Learners—Columbia Profs Discuss Why

University educators are carefully embracing A.I. in classrooms, despite the many imperfections of both A.I. models and the tools that detect their use.

Students walk on the Columbia University campus in New York City. Mario Tama/Getty Images

In a study published in May, James Zou, a professor of Biomedical Data Science at Stanford University, found that multiple artificial intelligence (A.I.) detectors produced disproportionately high false-positive rates when analyzing text written by non-native English speakers. The study, which set out to examine how reliable A.I. detectors are, recorded how often seven different detectors incorrectly labeled authentic writing as A.I.-generated, using a mix of essays written by U.S.-born eighth-graders and essays written by non-native speakers for the Test of English as a Foreign Language (TOEFL). The result: while the detectors reliably recognized text written by native English speakers, 61 percent of the human-written TOEFL essays were incorrectly flagged as A.I.-generated.


Zou’s study highlights an enormous challenge as A.I. technology entrenches itself in our education system. The ineffectiveness of A.I. detectors means there’s a risk of falsely accusing students of plagiarism, especially those learning a foreign language. More importantly, as large language models (LLMs) rapidly advance and the tools to detect their use prove unreliable, how can teachers prevent students from using A.I. to cheat on homework and exams?

“It’s impossible to detect if someone is actually using it,” Alex Bowers, a professor of Education Leadership at Teachers College, Columbia University, told Observer. “I would encourage educators to be especially cautious right now if they think a student has submitted an assignment that’s written by A.I. That student might not be a native English speaker. Maybe they’ve used an LLM, or maybe they’re just a derivative writer.”

Bowers’s colleague at Teachers College, Erik Voss, who specializes in linguistics, language assessment and teaching English, believes the main issue with current A.I. detectors is that there’s no way of knowing the true extent to which A.I. was used in a writing assignment.

While many people associate A.I. with image and text generators, common software programs like Microsoft Word already come with automated writing assistance tools, and yet such technology is not as heavily scrutinized as A.I. chatbots, Voss said.

“This is a question for the assessment,” he added. “When we’re teaching in a language learning class, we want to see the progress a student is making and their writing ability. If it’s heavily assisted, it’s difficult to see what their true ability is, as a human writer.”

Voss believes A.I. development will force educators to reconsider the purpose of exams and writing assignments. “It’s to assess a human’s ability to produce writing, their ability to think and convey a message. If we allow assistance with A.I., what direction is the world moving?” he asked.

That said, Voss and Bowers both believe educators should work with students to approach A.I. in a productive way that can accommodate their educational needs.

Bowers teaches classes preparing future educators and school leaders to effectively use school data to address students’ needs. He allows A.I. in his classroom and even runs workshops on how to effectively prompt A.I. chatbots in informative and engaging ways. However, he constantly reminds students of the inherent biases present in current A.I. models.

“The models are trained on mostly the Western internet,” Bowers said. “They’re built around English as a primary language. And in doing so they’ve ingested the biases of the Western internet. Without specific prompting, you can easily have ChatGPT, Bard or Claude replicate the biases that you see in writing.”

Bowers has noticed that biases in A.I. models commonly surface in how they portray gender and positions of power. For example, if a user prompts a chatbot to describe a profession such as a doctor or an athlete, it frequently assigns the stereotypical gender to that profession. This may not seem problematic at first, but relying on A.I. models without acknowledging and addressing the biases in their training data could reinforce those stereotypes, he warned.

Voss said he asks all his students to be transparent about how they incorporate A.I. into their work. “My approach is to collaboratively find a way to use these A.I. tools to improve our cognition and our learning,” he told Observer.

Bowers noted that, ironically, while A.I. detectors have issues with text generated by non-native English speakers, A.I. itself could be an effective tool for learning a foreign language. “For students learning a language, they’re great,” he said. “The models can really help you understand what’s a better choice of words in another language, and they can be tutors in different ways.”

Ultimately, educators are going to have a wide array of perspectives on the use of A.I. in classrooms, Voss said, and it’s only after serious discussion about how the technology can be used responsibly and what value A.I. assistance actually brings that we can move forward.
