This is the property of the Daily Journal Corporation and fully protected by copyright. It is made available only to Daily Journal subscribers for personal or collaborative purposes and may not be distributed, reproduced, modified, stored or transferred without written permission.

Intellectual Property, Technology

Aug. 17, 2023

Do I infringe copyrights by using AI?

Jeremy T. Elman

Partner, Duane Morris LLP

Phone: (650) 847-4162

Email: jelman@duanemorris.com

Lauren Silva

Legal Extern, Duane Morris LLP

We asked ChatGPT the question posed in this headline. It created a memo with a bunch of issues to consider. Does this article then infringe ChatGPT’s copyrighted work? ChatGPT said: “Writing an article about whether you might infringe ChatGPT’s copyright is generally not considered copyright infringement itself. You’re engaging in commentary, analysis, or discussion about a topic, which is typically protected as free speech under copyright law.”

Pretty close for a generative artificial intelligence (AGI) system. It’s not exactly “free speech” that is protecting us; our use is probably going to be considered “fair use” under copyright law, specifically 17 U.S.C. § 107, because we are adding our own analysis and information.

How did ChatGPT know that writing this article is not generally considered copyright infringement, and why did it get it wrong (in part) about the likely reason? Because it trained on information about copyright law and appears to have been incorrectly trained about free speech law.

Clients want to use these new and exciting tools without unwittingly becoming copyright infringers or getting the information wrong (like ChatGPT did). The authors are following the initial breadcrumbs of guidance from the government and the courts, and learning from data sources like ChatGPT, to contribute to the discourse (and hope to be included in a future version of ChatGPT).

Machine learning requires learning from copyrightable works

The field of AGI to which copyright protection and infringement liability are most relevant is machine learning, a process in which the AGI program receives feedback and refines its underlying algorithm to improve its performance over time. During the machine learning process, the AI program, such as a “large language model” (LLM), uses artificial neural networks to extract patterns from large quantities of data (including copyrightable works) and uses those patterns to learn the constraints of the output it is expected to produce without being explicitly programmed to produce it.

In its comments to Congress, OpenAI (the producer of ChatGPT) has argued that “[w]ell-constructed AI systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus.” Thus, OpenAI states, infringement “is an unlikely accidental outcome.” But the U.S. Patent and Trademark Office has asserted that the generative AI training process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” Generative Artificial Intelligence and Copyright Law, Congressional Research Service (Feb. 24, 2023) (congress.gov). The training process explicitly involves the AI program making copies of the provided data sets in order to learn how to generate new works. Under Section 106 of the U.S. Copyright Act, 17 U.S.C. § 106, the creation of copies of existing protected work may infringe the copyright holders’ exclusive right under 106(1) to make a reproduction of their work, as well as their right under 106(2) to prepare derivative works “based upon” the copyrighted work.

Difficult to prove substantial similarity by AGIs

Copyright holders are pushing courts to resolve whether these AGIs infringe copyrights. In a recent hearing in Andersen et al v. Stability AI, Midjourney, DeviantArt, No. 23-cv-201 (N.D. Cal. 2023), the court was skeptical that copyright owners could prove that the alleged “derivative works” were “substantially similar” to the approximately five billion compressed images used to train defendant’s systems. The AI model does not reproduce or display the works to humans in a traditional manner; it uses an unknown number of the plaintiffs’ works to learn as part of a huge database. Copyright infringement proceedings seek to punish those who “copy,” i.e., access a work to create something “substantially similar” to it in a way that harms the value of that work. The training process is really no different from these authors using ChatGPT and other sources to create this article.

As ChatGPT artlessly noted, an AGI’s use of copyrightable works to train machine learning systems is likely a fair use under the four fair use factors: (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. 17 U.S.C. § 107.

There is a strong argument that organizations training generative AI programs are engaging in nonexpressive fair use, since the intermediate copies of works made to train AI systems are not themselves the object of human consumption. On this view, the training data is transformed in the process and does not itself threaten the market for the original art, so the use falls within fair use. AGI companies also point to the third factor, the amount of the work used, arguing that the copies are not made available to the public and are used only to train. They cite Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015), where the Second Circuit found that Google’s copying of books in their entirety, in order to create a searchable digital database that displayed only portions of the books to users, constituted fair use.

Conclusion

Courts and policy-making bodies are in the initial stages of resolving how to protect copyright owners from potential misuse of their valuable work, but it does not appear that the mere use of copyrighted works to train AGIs is sufficient to establish infringement. Companies have strong arguments that training does not create a substantially similar work and is a fair use.
