
The Gray Lady Takes on the Silicon Bros: The New York Times v. OpenAI, Microsoft

Building on the momentum of half a dozen class action lawsuits filed against OpenAI by writers, The New York Times has taken up its own fight against both OpenAI and its partner, Microsoft.[1] Shortly before 2023 drew to a close, The Times filed a lawsuit against the Silicon Valley giants, alleging copyright infringement, unfair competition, and trademark dilution.[2] In addition to monetary damages, The Times seeks the destruction of all GPT and other large language models, along with any training sets, that incorporate its copyrighted works.[3]

For its part, OpenAI said it was surprised and disappointed by the suit, having learned of the filing only by reading The New York Times, despite having been in “partnership” negotiations with The Times since April 2023.[4] OpenAI maintains that The Times’s lawsuit is without merit but says it still hopes to forge a partnership with the newspaper.[5] In 2023, OpenAI established partnerships with the Associated Press and the German mass media company Axel Springer.[6]

Fair Use Doctrine: A Legal Limitation on Copyrighted Materials

In a blog post, OpenAI indicates that it will likely argue that its use of The Times’s content falls within the “fair use” exception for copyrighted works.[7] This judicial doctrine, now codified in statute, is a well-established defense that limits the exclusive rights of copyright owners by allowing the use of copyrighted material for specific purposes, such as criticism, comment, news reporting, teaching, scholarship, or research.[8] Courts apply a four-factor analysis to determine whether a use is fair.[9] Those factors are: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.[10]

The New York Times says:

The Times claims that OpenAI and its partner Microsoft trained GPT models on datasets of scraped content containing millions of copyrighted pieces from the publisher, including news articles, investigations, and commentary.[11] The Times has registered the copyrights in its print edition on a daily basis for over 100 years.[12] Furthermore, The Times claims that the GPT series of large language models memorized its content and produced “near-verbatim copies of significant portions” of that content when prompted.[13] To support the latter claim, The Times provided 100 examples of GPT-4 allegedly reproducing memorized content from the publication.[14]

The Times additionally argues that OpenAI and Microsoft are using this unauthorized copyrighted material for commercial purposes and profiting significantly, while threatening The Times’s bottom line and, potentially, its ability to continue funding its journalism.[15] The Times’s complaint noted that as of August 2023, OpenAI was on track to make more than $1 billion in revenue over the next year, while Microsoft’s Bing search engine, powered by GPT-4, had “reached 100 million daily users for the first time in its 14-year history.”[16] Meanwhile, The Times asserts that if people can access its content by way of OpenAI and Microsoft without having to pay for it through The Times’s paywall, they will.[17] Ultimately, The Times claims that OpenAI’s unauthorized use of its content could divert readers, reduce the number of current and potential subscribers, and thereby reduce “subscription, advertising, licensing, and affiliated revenues.”[18]

OpenAI says:

In addition to claiming that its use of The Times’s content is legal under the fair use doctrine, OpenAI addresses a portion of The Times’s claims with two notes. First, OpenAI holds that the memorized copies of The Times’s content (or, as it calls the phenomenon, “regurgitation”) are a “rare bug” that it is working to eliminate.[19] It contends that “memorization is a rare failure” but that it has measures in place to limit this “inadvertent memorization and prevent regurgitation.”[20] OpenAI also claims that The Times refused to share any examples of the regurgitated content so that OpenAI could address the issue, and that “the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites.”[21] It asserts that The Times “intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates,” which, OpenAI says, “suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”[22] Second, OpenAI says it provides an “opt-out” option for publishers that do not want OpenAI’s tools to access their content, an option it claims The Times exercised in August 2023.[23]

Transformative Use

OpenAI will likely assert that its use of The Times’s content is fair use due to the transformative purpose of its tools. In Authors Guild v. Google, a landmark copyright case, the Second Circuit noted that a “transformative use is one that communicates something new and different from the original or expands its utility, thus serving copyright’s overall objective of contributing to public knowledge.”[24]

The precedent established in Authors Guild could bolster the argument that copyrighted material may be used to train the large language models underlying GPT chatbots.

In that class action lawsuit, the Authors Guild and individual authors sued Google for copyright infringement, claiming that Google’s digitization of entire books for its Library Project and Google Books project violated their copyrights because the copying was not transformative, provided a substitute for the authors’ works, and served a commercial, profit-driven purpose.[25] The Second Circuit, however, ruled in favor of Google, holding that the making of a “digital copy to provide a search function is a transformative use, which augments public knowledge by making available information about Plaintiffs’ books without providing the public with a substantial substitute” (emphasis in the court’s opinion).[26] Additionally, the court held that Google’s profit motivation did not outweigh its fair use.[27]

While Authors Guild is promising for OpenAI and Microsoft, the matter is far from cut-and-dried. The authors of the paper Foundation Models and Fair Use note that fair use is not guaranteed if the output is similar to the copyrighted data and impacts that data’s market.[28] “When the downstream product based on such a model is not transformative . . . courts may decide that generated content, the model deployment, and even potentially the model parameters themselves are not covered by fair use.”[29] On the other hand, the authors acknowledge that transformativeness can take the form of “the outputs themselves, but the purpose of the machine learning model could [also] be transformative.”[30]

No Clear Winner

Ultimately, it would benefit both parties and the public if they could reach the licensing deal they had previously been negotiating. If a court sided with OpenAI and Microsoft, independent news organizations could be slowly squeezed out of business by the resulting financial losses; with less quality content to learn from, large language models would degrade, ultimately hurting public knowledge. On the other hand, if The Times succeeded in having all GPT and other large language models and training sets that include its copyrighted work destroyed, public knowledge could likewise suffer from the potentially severe limitation on content available to train those models. An out-of-court settlement may be the best path to avoiding a lose-lose scenario for both the parties and the public at large.


Cinnamon St. John

Cinnamon St. John is a second-year J.D. candidate at Fordham University School of Law and a staff member of the Intellectual Property, Media & Entertainment Law Journal. She holds a BA in Political Science from DePaul University, an MA in International Peace & Security from King’s College London, and an MPA from the Robert F. Wagner Graduate School of Public Service at New York University.