Breaking down the USCO report on Generative AI Training and Re:Create’s “Non-Takeaways”

Take a Big Step Back: A Brief History of Fair Use & Why Everyone’s Talking About Fair Use and AI

Fair use is the legal right to use copyrighted material without permission from the copyright holder when the use serves the goals of copyright itself by enabling new creativity, new insight, and new knowledge. Fair use embraces uses that are serious or fun, personal or commercial, small or quite massive. A student can quote A Wrinkle in Time in a critical essay for school or record herself reacting to Taylor Swift’s surprise song at the Eras tour for social media, both thanks to fair use. A documentarian can include a fleeting glimpse of a copyrighted poster in her film and a university can digitize millions of books to create a new research tool, again both thanks to fair use. (See more examples in our fair use infographics!) The boundaries of fair use are defined by the courts, who weigh four key factors in their decisions: the purpose and character of the use (most importantly, whether the use is transformative), the nature of the work used, the amount and substantiality of the portion taken (and whether that amount is appropriate to a transformative purpose), and the effect of the use on the market.

In recent months, fair use has been the subject of many debates and discussions as AI companies and developers argue that they have a fair use right to train their models with copyrighted materials. Meanwhile, some publishers, record labels, and individual artists argue that in-copyright content can only be analyzed by AI models with express permission. The U.S. Copyright Office (USCO) joined the conversation last week when it released a pre-publication version of the final part of a three-part report series on AI and copyright.

The Copyright Office’s report reinvigorated the fair use conversation, and we agree with its main conclusion that no new copyright laws are needed to accommodate AI at this time. But several positions taken in the report depart from settled fair use law and should not be taken up by courts and others who want to properly apply fair use to AI training. These non-takeaways are detailed below.

On Transformative Use

According to the Supreme Court, transformative uses are “at the heart” of fair use because they use existing works as building blocks to give us new technologies, new art, and new knowledge, serving the ultimate purpose of copyright by enriching the public sphere. The report recognizes that some AI training is transformative, including the training of foundational models. However, it wrongly rejects several arguments that AI training is strongly transformative across a wide variety of use cases.

What the Report says…

“Some argue that the use of copyrighted works to train AI models is inherently transformative because it is not for expressive purposes. We view this argument as mistaken.” (pg. 47)

In reality…

“Non-expressive use” is a technical term with a precise definition and the Report just…ignores it. Professor Matthew Sag coined the term and defined it: “uses that involve copying, but don’t communicate the expressive aspects of the work to be read or otherwise enjoyed.” Sag has explained at length why such uses are transformative, and several professors joined him in comments to the Copyright Office laying out those arguments. Rather than engage with these arguments, which focus on which aspects of the underlying works are communicated to the public, the Office observes that AI training requires the analysis of entire works (including their expressive elements) and that AI outputs can themselves be “expressive.” This response is a non sequitur. The question isn’t whether the word “expressive” can be used in some way to describe aspects of AI training or AI use. It’s whether uses that are “non-expressive” in the specific way defined by Professor Sag are strongly transformative under current law. The Office simply dodges that question, rejecting the argument without responding to it.

What the Report says…

“Nor do we agree that AI training is inherently transformative because it is like human learning. To begin with, the analogy rests on a faulty premise, as fair use does not excuse all human acts done for the purpose of learning. A student could not rely on fair use to copy all the books at the library to facilitate personal education” (pg. 48)

In reality…

The report dodges the argument by analogy to human learning, rejecting an argument that no one made – that any copying that results in “learning” is transformative. What AI defenders have said is that copyright’s exclusive rights have never extended to the activities by which humans learn – reading, watching, listening – or to learning itself, that is, the process by which our human neural networks take unprotected facts, ideas, and generalizations from the works we read, watch, and hear. No human pays a license fee for the right to think about a book they read, or to remember facts, ideas, and even expressive elements they learned from it. Training an AI model is a similar process, and it warrants similarly permissive treatment by copyright law.

What the Report says…

“Where a model is trained on specific types of works in order to produce content that shares the purpose of appealing to a particular audience, that use is, at best, modestly transformative.” (pg. 46)

In reality…

The report makes two mistakes. First, it ignores the massive difference between the purpose of the training data and the purpose of the model itself. A guitar is not a song, a typewriter is not a novel, and an AI model is not any of the things in its training data – it’s something new. Transformativeness is a question of whether the use results in the creation of “something new and important” (like a new mobile operating system, in Google v. Oracle) rather than something that is merely a copy or a derivative (like the Warhol portrait of Prince in Warhol v. Goldsmith). AI tools, including (and perhaps especially) the ones that enable further creativity, pass this test handily. Second, in comparing the outputs of models to their training data, the Office ignores a crucial fact about most outputs: they contain no protected expression from the training data. The creation of wholly new, non-infringing works does not warrant copyright scrutiny. The overlap in purpose between these works and AI training data is thus irrelevant to the transformativeness inquiry. The report simply gets this issue wrong.

Data Sourcing

AI model training requires massive amounts of data. Limiting access to training data has a direct impact on the functionality of AI tools, increasing the likelihood of bias and memorization, among other flaws. The report weighs in on the sourcing of data, putting a thumb on the scale for licensing in situations where copyright law itself does not.

What the Report says…

“In the Office’s view, the knowing use of a dataset that consists of pirated or illegally accessed works should weigh against fair use without being determinative.” (pg. 52)

In reality…

Fair use is about how copyrighted material is used, not where it came from. Every fair use of a work is a use without authorization, so the fact that a copy is obtained without authorization isn’t especially important to the fair use calculus. Whether a particular copy of a work found online is lawfully made or published can be impossible to determine, and is irrelevant to new transformative purposes like AI training. Search engines, web archives, and internet scholars have depended on fair use to index, save, and study publicly accessible data in good faith for decades. There is no reason to subject AI training to an unprecedented and impossible new obligation to track down the pedigree of its training data.

What the Report says…

“Copyright owners have a right to control access to their works, even if someone seeks to obtain them in order to make a fair use” (pg. 52)

“The use of pirated collections of copyrighted works to build a training library…would harm the market for access to those works.” (pg. 63)

In reality…

Copyright does not include a right to control access to works. For example, anyone who owns a copy of a work is free to sell it or lend it without seeking authorization from the copyright holder, even though that transaction provides a new consumer with access to the work. Used bookstores, libraries, and even garage sales would all infringe copyright if it included a “right to control access to works.” At the same time, the impact of an AI model on the market for works depends on whether the model’s outputs contain copies of works in its training data in ordinary use cases. If not, the model’s impact will be zero. The question of whether the model developer itself must pay for access as part of the training process is a question of fair use. As the Copyright Act states explicitly, the rights that copyright does include (reproduction, distribution, public performance, etc.) are “subject to” the right of fair use, and the fair use of a work is not infringement “notwithstanding” those exclusive rights. So the fact that a particular use involves a right ordinarily reserved to the copyright holder can hardly be fatal, or even detrimental, to a fair use argument. Every fair use is, by definition, an unauthorized use.

Market Effects

What the Report says…

“There are instances, however, where the use of works in generative AI training can lead to a loss in sales.” (pg. 63)

In reality…

Fair use does not require that a use have zero market impact. Not every loss in sales (or, more accurately, potential loss) is relevant to fair use. Losses due to market competition from new, non-infringing works are not effects that count in the fair use analysis. When copying is an intermediate step in a process that ultimately results in the creation of new, non-infringing works, that intermediate copying is fair use. As the Ninth Circuit wrote in Sega v. Accolade, “an attempt to monopolize the market by making it impossible for others to compete runs counter to the statutory purpose of promoting creative expression and cannot constitute a strong equitable basis for resisting the invocation of the fair use doctrine.”

What the Report says…

“Where licensing markets are available to meet AI training needs, unlicensed uses will be disfavored under the fourth factor” (pg. 70)

“Where licensing options exist or are likely to be feasible, this consideration will disfavor fair use under the fourth factor.” (pg. 73)

In reality…

By definition, fair use is the right to use a work without seeking a license. Merely offering a license is never sufficient to undermine a fair use argument, or even to weaken it. If it were, then fair use would quickly cease to exist. Transformative uses, in particular, are immune to erosion from the creation of licensing markets. As the Second Circuit has explained at length, “a copyright holder cannot prevent others from entering fair use markets merely ‘by developing or licensing a market for parody, news reporting, educational or other transformative uses of its own creative work.’” Similarly, “a publisher’s willingness to pay license fees for reproduction of images does not establish that the publisher may not, in the alternative, make fair use of those images.” So, too, in AI training—a company’s willingness to obtain a license for some of its training data does not amount to a surrender of its, or anyone else’s, fair use rights with respect to all data, or even all licensable data.

What the Report says…

“While concerns about the effects of licensing on competition among AI companies should not be discounted, we do not believe they alter the fair use analysis. Licensing will always be easier for those with deeper pockets, and the more works to be licensed, the greater the effect.” (pg. 75)

In reality…

If AI training were an ordinary exploitation of creative work, then it would be reasonable for the law to reward companies that devoted more resources to licensing. But since AI training is a new, transformative use, diverting resources to licensing is wasteful. Punishing companies that lack the resources to pay billions or trillions of dollars in license fees suppresses innovation with no positive return to society. It’s a drag on competition similar to the drag created by corruption and bureaucratic red tape, a barrier to entry with no public benefit. Since promoting the progress of science is the very reason copyright law was created, it is perverse for copyright to be a barrier to progress in this context. And because AI tools enable downstream creative and innovative activity, licensing requirements would create a barrier to entry not just for AI developers, but also downstream in the creative and innovative markets where AI users would operate.

While Re:Create appreciates the diligent work of the Copyright Office on the three parts of its AI and copyright report series, the misunderstandings of fair use in its latest report are concerning. The report makes several valid points about fair use, but at key points its arguments simply do not hold up. We urge the courts, which have the final say on what is and is not fair use, to avoid these “non-takeaways” in order to safeguard creativity, innovation, and the freedom of expression.

###

About Re:Create: Re:Create is a coalition with a broad membership of think tanks, advocacy organizations, libraries, and technology companies – large and small – that serves as the leading coalition united in the fight for a balanced copyright system that is pro-innovation, pro-creator, and pro-consumer. Not every member of the Re:Create Coalition necessarily agrees on every issue, but the views we express represent the consensus among the bulk of our membership.