Sun. Dec 22nd, 2024

2 authors say OpenAI ‘ingested’ their books to train ChatGPT. Now they’re suing, and a ‘wave’ of similar court cases may follow.

Sam Altman, the CEO of OpenAI. The data-collection practices behind ChatGPT are facing increased scrutiny from social media giants and artists alike.
JASON REDMOND/AFP via Getty Images; Jaap Arriens/NurPhoto via Getty Images

OpenAI is again facing scrutiny over its data-collection practices. This time, it’s from authors.
Two writers are suing OpenAI, accusing the company of ingesting their books to train ChatGPT.
A law professor expects more lawsuits involving copyright law and generative AI.

Two award-winning authors recently sued OpenAI, accusing the generative-AI company of violating copyright law by using their published books to train ChatGPT without their consent.

Filed in late June, the lawsuit claims that ChatGPT’s underlying large language model “ingested” the copyrighted work of the case’s plaintiffs, authors Mona Awad and Paul Tremblay. They argue that ChatGPT’s ability to produce detailed summaries of their works indicates their books were included in datasets used to train the technology.

The suit is the latest example of tension between creatives and generative AI tools capable of producing text and images in seconds. Many workers in creative fields are concerned about how the fast-developing technology could affect their careers and livelihoods, and those concerns may increasingly surface as legal challenges.

Daniel Gervais, a law professor at Vanderbilt University, told Insider that the writers’ lawsuit is one of a handful of copyright cases against generative AI tools nationwide. It won’t be the last, he added. 

Gervais expects many more authors to sue companies developing large language models and generative AI as these programs get better at replicating the styles of writers and artists. He believes a nationwide deluge of legal challenges targeting the output of tools like ChatGPT is imminent.

“This one is really about the input,” Gervais said, referring to the lawsuit’s allegations about AI data scraping and training. “The output wave is coming as well.”

Proving that the authors incurred monetary damages because of OpenAI’s data-collection practices, as the complaint alleges, may be challenging. Gervais told Insider that ChatGPT may have learned about Awad’s and Tremblay’s work from sources other than the books themselves, though it was also possible the bot “ingested” their books as the lawsuit claims.

Andres Guadamuz, an expert in AI and copyright at the University of Sussex, echoed this concern, telling Insider that even if the books are in OpenAI’s training datasets, the company could have obtained the work through the lawful collection of another dataset.

Showing that ChatGPT would have behaved differently had it never scooped up the authors’ work would also be difficult, given the vast amount of data scraped from the web to train it, Guadamuz told The Guardian.

The Authors Guild, a US-based advocacy group that supports the working rights of writers, published an open letter last week calling on the chief executives of Big Tech and AI companies to “obtain permission” from writers to use their copyrighted work in training generative AI programs and “compensate writers fairly.” The organization told Insider that its letter has garnered over 2,000 signatures.

Awad and Tremblay’s lawsuit was filed on the same day OpenAI received another legal complaint, alleging the company pilfered “massive amounts of personal data” that it later fed into ChatGPT. The 157-page complaint, which withheld the full names of its 16 plaintiffs, criticized the company for absorbing “essentially every piece of data exchanged on the internet it could take.”

In their lawsuit, filed in a district court in Northern California, Awad and Tremblay are seeking damages and restitution of what they say are lost profits.

The filing also presented documents containing the ChatGPT-produced summaries of Awad’s novels “13 Ways of Looking at a Fat Girl” and “Bunny,” as well as Tremblay’s “The Cabin at the End of the World.” Tremblay’s novel was adapted into the M. Night Shyamalan film “Knock at the Cabin.”

OpenAI and Awad did not respond to Insider’s requests for comment. A representative for Tremblay declined to comment.

Read the original article on Business Insider

By