Questions about the ethics, transparency and regulation of artificial intelligence (A.I.) have proliferated among creatives ever since ChatGPT exploded into the marketplace in late 2022. Now, two best-selling authors are taking the A.I. chatbot’s creators to court over allegedly “ingesting” their novels and regurgitating summaries of them without their consent, credit or compensation.
In a lawsuit against OpenAI, Mona Awad, the writer behind Bunny and 13 Ways of Looking at a Fat Girl, and Paul Tremblay, the author of The Cabin at the End of the World, claim their novels were used to train the popular A.I. tool. OpenAI “relied on harvesting mass quantities of textual material from the public internet, including Plaintiffs’ books,” according to the complaint, filed in a San Francisco federal court.
Books are key to training language models like ChatGPT, said Awad and Tremblay, as they offer longform and high-quality writing for A.I. platforms to analyze. According to 2018 and 2020 papers from OpenAI, earlier iterations of OpenAI language models drew from datasets that included collections of around 7,000 and 357,000 digital books respectively. OpenAI has not disclosed what specific data was used to train ChatGPT.
Awad and Tremblay also allege that, when prompted, ChatGPT generates summaries of their respective books, “something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.” Both authors are currently based in Massachusetts. Margaret Atwood has described Awad as her literary heir, while Tremblay’s The Cabin at the End of the World was recently adapted into the M. Night Shyamalan-directed horror film Knock at the Cabin.
Copyright issues with A.I. over artwork, photographs and code
The renowned writers are the first to file a copyright lawsuit against the parent company of ChatGPT, joining a host of creators who have raised concerns about the regulation of A.I.-powered tools. In January, a trio of artists sued image-generating platforms Stability AI, DeviantArt and Midjourney for allegedly downloading billions of copyrighted images without consent or payment. Claiming their artwork has been used to train A.I. image generators to reproduce similar works, artists Sarah Andersen, Kelly McKernan and Karla Ortiz called the technology a “21st-century collage tool.”
Later that month, Getty Images announced it was taking legal action against Stability AI for allegedly downloading millions of the stock image supplier’s photographs. Craig Peters, CEO of Getty Images, drew parallels between the service and Napster, the free music-sharing platform that was shut down in the early 2000s after a slew of copyright lawsuits.
Creators of code have also taken issue with the powerful technology. In November 2022, a proposed class action suit brought by programmers argued that OpenAI, Microsoft and GitHub violated copyright law through their creation of A.I. coding assistant GitHub Copilot, which allegedly scrapes public code without providing credit.
While no decisions have yet been made on these lawsuits, there is some clarity regarding how copyright law will apply to A.I.-generated works. In February, the U.S. Copyright Office ruled that Midjourney-generated images used in a graphic novel cannot be granted copyright, as they were not created with enough human influence or control. While the novel’s author Kris Kashtanova was given copyright for her text and arrangement of the story, she lost the copyright registration for all aspects created via the A.I. image generator.
OpenAI did not respond to requests for comment.