How AI Copyright Law Works—and Why Courts Disagree

AI companies train models on billions of copyrighted works, sparking lawsuits and legislation worldwide. Here is how fair use, text-data mining rules, and landmark court rulings are shaping the legal battle over AI training data.

The Billion-Dollar Question

Every major generative AI model—from ChatGPT to Claude to Midjourney—was trained on vast quantities of text, images, and code scraped from the internet. Much of that material is protected by copyright. Whether AI companies need permission to use it is now the most consequential intellectual-property question of the decade, with more than 50 lawsuits pending in U.S. courts alone and regulators on three continents writing new rules.

How Fair Use Applies to AI Training

In the United States, the legal debate centers on fair use, a doctrine that permits limited use of copyrighted material without the rights holder's consent. Courts weigh four factors when deciding whether a use qualifies:

  • Purpose and character — Is the new use "transformative," adding something different rather than substituting for the original? Some courts have found that training a general-purpose AI model on a large, diverse dataset is highly transformative because the model learns statistical patterns rather than reproducing specific works.
  • Nature of the original work — Highly creative or unpublished works receive stronger protection, making fair use harder to claim.
  • Amount used — AI training typically ingests entire works, which weighs against fair use, though courts have accepted that copying whole works can be necessary for a transformative purpose.
  • Market effect — If an AI's output competes with or replaces the original work, this factor cuts against fair use. The U.S. Copyright Office has noted that where licensing markets exist, unlicensed training is harder to justify.

No single factor is decisive. Each case turns on how the four interact, which is why judges have reached conflicting conclusions on nearly identical facts.

Landmark Rulings So Far

Three key U.S. decisions have begun to sketch the boundaries. In Bartz v. Anthropic, a federal judge ruled that training Claude on books was fair use because it was "quintessentially transformative"—but held that downloading pirated copies of those books was not. In Kadrey v. Meta, a different judge found fair use even though Meta obtained training books from pirated "shadow libraries." And in Thomson Reuters v. Ross Intelligence, the court rejected the fair use defense entirely, ruling that a competitor's use of copyrighted legal content to train its own AI crossed the line.

The highest-profile case—The New York Times v. OpenAI—is still in discovery. A judge has ordered OpenAI to hand over 20 million ChatGPT interaction logs, and a ruling on fair use is not expected before mid-2026 at the earliest.

How Europe Takes a Different Path

The European Union sidesteps fair use altogether. Under the 2019 Digital Single Market Directive, a text-and-data-mining (TDM) exception allows anyone to scrape lawfully accessible content—unless the rights holder explicitly opts out using machine-readable protocols such as robots.txt. The EU AI Act layers transparency on top: providers of general-purpose AI must publish a "sufficiently detailed summary" of their training data, including copyrighted content.
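In practice, the machine-readable opt-out the directive contemplates can be as simple as a robots.txt file blocking known AI crawlers. A minimal sketch follows; the user-agent tokens shown (GPTBot, CCBot, Google-Extended) are ones published by OpenAI, Common Crawl, and Google respectively, though whether a robots.txt entry alone satisfies the directive's reservation requirement is itself a contested legal question.

```
# Illustrative robots.txt for a publisher opting out of AI training crawls.
# Block OpenAI's training crawler:
User-agent: GPTBot
Disallow: /

# Block Common Crawl, whose archives feed many training datasets:
User-agent: CCBot
Disallow: /

# Block Google's AI-training crawler without affecting Search indexing:
User-agent: Google-Extended
Disallow: /

# All other crawlers (e.g., ordinary search engines) remain unaffected:
User-agent: *
Allow: /
```

A site owner would place this file at the domain root (e.g., example.com/robots.txt); each AI company publishes its own crawler token, so the list must be maintained as new crawlers appear.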

The United Kingdom, meanwhile, considered a broad TDM exemption with an opt-out mechanism but abandoned the plan in March 2026 after fierce opposition from the creative industries. The government said it would not legislate until it finds a solution that satisfies both AI developers and rights holders.

Why It Matters

The outcome will shape who profits from AI and who gets left behind. If courts broadly endorse fair use, AI companies can continue training on the open internet at little cost. If they do not, the industry will need licensing deals—potentially worth billions—with publishers, artists, and other creators. The U.S. Copyright Office has urged Congress to create "scalable mechanisms" for rights clearance, but legislation remains stalled.

For creators, the stakes are existential. Writers, visual artists, and musicians argue that uncompensated training devalues their work. AI companies counter that restricting training data would concentrate power among a few firms wealthy enough to negotiate licenses, slowing innovation for everyone.

With major rulings expected later this year and regulatory frameworks still in flux on both sides of the Atlantic, the legal architecture governing AI and copyright is being built in real time—one case, one statute, and one opt-out protocol at a time.
