Sarah Silverman, along with authors Richard Kadrey and Christopher Golden, has filed separate copyright infringement lawsuits against Meta and OpenAI. The authors allege that their copyrighted books were used without their consent to train OpenAI's ChatGPT and Meta's LLaMA, both of which are built on large language models (LLMs). LLMs are AI models trained on vast amounts of text to learn language patterns and generate human-like text.
The lawsuits claim that the LLMs remix the copyrighted works of numerous authors without consent, compensation, or credit. Copyright infringement has long been a concern among critics of AI, particularly since ChatGPT's public release in November 2022, which sparked a surge in generative AI and raised questions about its impact on creativity and copyright.
According to the lawsuits, the LLMs were trained on materials illegally obtained from "shadow library" websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. The OpenAI suit estimates that the Books2 dataset used by OpenAI contains around 294,000 titles, most of which it alleges were sourced from these shadow libraries. The Meta suit likewise identifies where the book training data was gathered, including Project Gutenberg for out-of-copyright books and the "Books3 section of ThePile" dataset available on Hugging Face.
The plaintiffs are represented by lawyers Joseph Saveri and Matthew Butterick, who also represent authors Mona Awad and Paul Tremblay in a separate copyright infringement lawsuit filed against OpenAI in June.
The lawsuits highlight the ongoing debate over AI's impact on intellectual property rights. As AI technology continues to advance, it is crucial to address these concerns and ensure that authors and creators whose works are used to train AI models receive proper consent and compensation.