Use of Copyrighted Works for AI Training Purposes is Exceedingly, Spectacularly, and Quintessentially Transformative

On June 23, 2025, the U.S. District Court for the Northern District of California issued the first substantive district court decision on the intersection of copyright law and generative artificial intelligence. The case involved Anthropic’s Claude AI system and a fair use defense to the unauthorized use of copyrighted works to train large language models for generative AI systems. The case is Bartz v. Anthropic PBC, Case No. 3:24-cv-05417-WHA (N.D. Cal.).

Key Takeaways from the Bartz Decision

- Use of lawfully acquired works for LLM training is “spectacularly” transformative, and qualifies as fair use of those works.
- Use of lawfully acquired hard copies of works to create a digitized library is fair use, as long as the hard copies were immediately
- Use of pirated works to create a digitized library is not fair use, and infringes the copyrights in those works. This is true even if a lawful copy of the previously pirated work is later acquired.

Not Decided

- Whether any AI-generated output from Claude infringes any of the plaintiffs’ works (no such infringement was alleged).
- Whether training LLMs on pirated works constitutes fair use.
- Whether copies made by the defendant from the digitized library copies qualify for fair use.

Background

Anthropic copied millions of works with two primary goals: (1) to retain them as part of a central library of all of the books in the world, and (2) to create data sets to train large language models (“LLMs”) for its “Claude” generative artificial intelligence system.

To accomplish these goals, Anthropic required a substantial volume of written text. Anthropic focused in particular on books like those written by the plaintiff authors because those books demonstrated a higher quality of creative expression than many other types of written works. Initially, it acquired copies of hundreds of thousands of copyrighted books from third-party sites that had themselves acquired those works illegally (so-called “pirate” copies). Eventually, Anthropic acquired lawful copies of many of the previously pirated works, including all of the works owned by the plaintiffs. It digitized all of the works acquired, whether pirated or authorized.

The court found that, in digitizing the works of the plaintiffs and others, Anthropic copied the works in four ways: (1) a working copy of the central library version was created for the data set; (2) each work was “cleaned” to remove certain repetitive or lower-value text (footers, page numbers, etc.), resulting in a “clean” copy; (3) each clean copy was translated into a “token” copy, where all characters were broken into short sequences and translated into corresponding number sequences (“tokens”); and (4) each fully trained LLM contained a “compressed” version of the works used to train the model.
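For readers unfamiliar with the second and third kinds of copy, the following is a minimal Python sketch of what a “clean” copy and a “token” copy might look like. It is an illustration only, not Anthropic’s actual process: the function names `clean_pages` and `tokenize` are hypothetical, the cleaning rule (drop bare page numbers and any line repeated on every page) is an assumption, and the fixed three-character chunks are a simplification of the learned subword tokenizers that production systems typically use.

```python
from collections import Counter


def clean_pages(pages: list[str]) -> str:
    """Return a 'clean' copy: drop page numbers and lines repeated on every page."""
    # Count on how many pages each distinct (stripped) line appears.
    seen_on = Counter()
    for page in pages:
        seen_on.update({line.strip() for line in page.splitlines()})

    kept = []
    for page in pages:
        for line in page.splitlines():
            text = line.strip()
            if not text or text.isdigit():
                continue  # blank line or bare page number
            if len(pages) > 1 and seen_on[text] == len(pages):
                continue  # on every page: assumed to be a header or footer
            kept.append(text)
    return "\n".join(kept)


def tokenize(text: str, vocab: dict[str, int], width: int = 3) -> list[int]:
    """Return a 'token' copy: split text into short chunks and map each to a number."""
    ids = []
    for start in range(0, len(text), width):
        piece = text[start:start + width]
        # Assign the next unused number to any chunk not seen before.
        ids.append(vocab.setdefault(piece, len(vocab)))
    return ids


pages = ["My Book\nCall me Ishmael.\n1", "My Book\nSome years ago.\n2"]
clean_copy = clean_pages(pages)
token_copy = tokenize(clean_copy, vocab={})
```

In this sketch the running title “My Book” and the page numbers are stripped from the clean copy, and the token copy is simply a list of integers. Each step produces a further copy of the underlying text in a different form, which is why the court analyzed each one.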

The plaintiffs sued Anthropic for copyright infringement, on the grounds that it made copies of their copyrighted works without permission. Anthropic did not deny that copies were made, but denied infringement on fair use grounds. More specifically, Anthropic argued that its copying was in pursuit of the development of generative artificial intelligence, which was transformative. Anthropic moved for summary judgment on the fair use issue.

Fair Use Analysis

Fair use excuses an otherwise infringing use of a copyrighted work. 17 U.S.C. § 107. Four factors are relevant to the fair use inquiry: (1) the purpose and character of the defendant’s use of the copyrighted work; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the potential market for or value of the copyrighted work. The court applied each factor to each act of copying alleged by the plaintiffs to be infringing.

Purpose and Character of Use.

Use for LLM Training. This is the primary factor relied on by Anthropic. It argued that it used the copyrighted works to train its LLMs, and that such a use significantly “transformed” those works from literary works telling stories to readers to training materials allowing a machine to recognize writing styles, grammar rules, and other aspects of language, so that Claude could in turn respond to user prompts with rich, detailed, accurate and effective communication – essentially a human response. It noted that the output created by Claude had no resemblance to any of the plaintiff authors’ works.

The court agreed, finding that Anthropic’s use for LLM training was “spectacularly” transformative. It rejected the plaintiffs’ argument that using their works for LLM training was unlike using works to train any person to read and write. The court said that what the plaintiffs sought would be like charging a user each time it read, recalled or drew upon a book that the user had purchased. The court also rejected the argument that the LLM training was intended to memorize the creative elements of the copied works. The court focused on Claude’s output, noting that it merely “outputted grammar, composition, and style that the underlying LLM distilled from thousands of works”, and that copyright does not extend to “methods of operation, concepts, principles . . . illustrated or embodied in a work.” 17 U.S.C. § 102(b). Finally, the court distinguished the recent case of Thomson Reuters Enter. Centre GmbH v. Ross Intell. Inc., 765 F. Supp. 3d 382 (D. Del. 2025), on the grounds that that case involved a defendant’s unauthorized copying to create a directly competitive legal research tool.

The court summed up its “purpose and character” analysis regarding LLM training by noting that Anthropic used the plaintiffs’ copyrighted works “not to race ahead and replicate or supplant them – but to turn a hard corner and create something different.”

Use to Build Central Library. Anthropic argued that its copying related to the digital library was fair use because that library served a function in connection with LLM training. The court found this use to be fair, although it did so for reasons having nothing to do with LLM training. Instead, it held that merely replacing a physical copy of a work with a digital copy was fair use (a “mere format change”), reducing storage space and facilitating digital searching. As a result, “the digital copy should be treated just as if the purchased print copy had been placed in the central library.”

The court analogized to several earlier cases: Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417 (1984) (taping copyrighted audiovisual works for later viewing was a “time shifting” fair use); Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015) (digitally copying all published books to create a searchable database for reference and research was fair use); Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007) (use of visual artworks to create thumbnail images for searching was fair use). The court noted that this fair use finding had nothing to do with any eventual use of the works for LLM training.

Finally, the court rejected the argument that a defendant’s commercial use of a copyrighted work prevented a fair use finding under the first factor. First, it found that evidence of commercial use was “indicative, not dispositive”. Second, it dismissed the possibility that the authors could have charged Anthropic more money for digital than for print copies of their works, noting that neither the Constitution nor the Copyright Act contains any suggestion “that [the copyright owner’s] limited exclusive right should include a right to divide markets or a concomitant right to charge different purchasers different prices for the same book, [merely] say to increase or to maximize gain.” (citing Kirtsaeng v. John Wiley & Sons, Inc., 568 U.S. 519, 552 (2013)).

Nature of the Copyrighted Work.

Based on the plaintiffs’ description of their works as containing expressive elements and Anthropic’s acknowledgment that it selected those works for their expressive qualities, the court found that this factor points against a fair use finding for all works and all copies made by Anthropic.

Amount and Substantiality of Portion Used.

Use for LLM Training. All parties acknowledged that Anthropic copied the entirety of each work at issue in the case. However, the court directed the analysis to “the amount and substantiality of what is thereby made accessible to a public [in the purported secondary use] for which it may serve as a competing substitute [for the primary use].” (citing Authors Guild v. Google, 804 F.3d at 222). The court found that the lawsuit did not involve any allegation that copies made by Anthropic were made available to typical book users, and that “[t]he accused use here of the incremental copies is as orthogonal as can be imagined to the ordinary use of a book.”

The plaintiffs argued, and the court agreed, that Anthropic did not have to copy these specific books for its LLM training needs. However, given the massive number of works required for adequate LLM training, “using any one work for actually training LLMs was about as reasonable as [using] the next.” The court further found that, since no copies were provided by Anthropic to the public, “[w]hat was copied was therefore especially reasonable”.

Use to Build Central Library. Because the goal of the central library was to keep full copies of digitized works, “[c]opying the entire work was exactly what this purpose required.”

Effect on Market For or Value of Copyrighted Works.

Use for LLM Training. The plaintiffs argued that LLM training using their copyrighted works will result in “an explosion of works competing with their works.” However, the court dismissed any focus on generalized competition, stating that the plaintiffs may as well be arguing that “training schoolchildren to write well would result in an explosion of competing works.” This factor favored a fair use finding with respect to copying works for LLM training purposes.

Use to Build Central Library. Because Anthropic did not, and was not shown to be likely to, distribute unauthorized digital copies of plaintiffs’ works, the court found that this factor was neutral with respect to copies made to build a central library.

Use of Pirated Copies. The court rejected nearly all fair use claims related to pirated works, as long as those works could have been acquired lawfully. This is true regardless of the intended use of the works, and even if the defendant later acquires a lawful copy of the work in question. Essentially, the court found that once works had been pirated, all future uses were also infringing (“the person who copies the textbook from a pirate site has infringed already, full stop.”). Regarding the use of pirated copies to train LLMs, the court determined that relevant factual issues remained, and denied summary judgment.

Next Steps

- The court will hold a trial regarding the unauthorized copying of pirated works to create a central library. It will consider evidence of willfulness at trial.
- Summary judgment was denied with respect to the use of pirated works to train LLMs, leaving that issue for further factual development and eventual trial.
- Settlement negotiations may follow, given that each side was the winner on different key issues.

This article summarizes aspects of the law and does not constitute legal advice. For legal advice for your situation, you should contact an attorney.
