Copyright ends the data rush: No mining for AI chatbots


The Regional Court of Munich I recently issued a landmark decision in the GEMA vs. OpenAI case on the use of copyright-protected content in AI training. This article takes a closer look at the case and considers parallel developments in France, the UK and the EU.

1. GEMA vs. OpenAI – Is OpenAI liable for copyright infringement?

According to the Court, the answer in this case is “Yes”.

So, what happened? The German collecting society GEMA brought an action against OpenAI after its chatbot, ChatGPT, reproduced lyrics from well-known German songs almost verbatim when prompted by users.
OpenAI argued that its language models neither stored nor copied specific training data but rather reflected in their parameters what they had learned from the training data set as a whole. Since the outputs were generated only in response to user input (prompts), it was not the defendants but the respective user who, as the producer of the output, was responsible for it. In any event, OpenAI argued, any infringement would be covered by the limitations of copyright law, in particular the exception for so-called text and data mining.

The Court, however, held that the models had memorised the lyrics and that this memorisation constituted reproduction within the meaning of copyright law, an act reserved to authors and their licensees. It therefore essentially granted the asserted claims for injunctive relief, disclosure, and damages.

According to the Court, the outputs proved that the song lyrics were stored in, and remained reproducible from, the previous ChatGPT models (4 and 4o). This storage makes the works indirectly perceptible, which is sufficient to qualify as reproduction under copyright law.

No Text or Data Mining Exception

Memorisation therefore falls outside the text and data mining exception introduced into the German Copyright Act (§ 44b UrhG) to implement the Digital Single Market (DSM) Directive. This exception covers only reproductions needed to convert training data into other digital formats for analytical purposes, such as detecting patterns, trends, or correlations. Since memorisation involves storing the works themselves and directly interferes with the authors' exploitation rights, neither direct nor analogous application of the exception is possible.

Finally, the Court clarified that responsibility for memorisation rests with the chatbot operators, who control both the system architecture and the selection of training data, not with end users.

2. France

As in Germany, the DSM Directive has been transposed into French law. Regarding the text and data mining exception in particular, Articles L. 122-5 et seq. of the French Intellectual Property Code provide that the exception applies in two cases:

Either solely for the purposes of scientific research, by research organisations, certain listed institutions, or other persons acting on their behalf and at their request, including within a non-profit partnership with private persons; in this case, no authorisation from the authors is required. However, this exception does not apply if a company, shareholder or partner of the organisation or institution conducting the research has privileged access to the results.

Or for any other purpose, provided that the author has not opted out.

To date, the French courts have not yet had to deal with a case similar to GEMA vs. OpenAI, so the scope of the French text and data mining exception remains to be clarified. However, a first decision could be handed down soon: several authors' unions have filed a copyright infringement action with the Tribunal judiciaire de Paris, accusing Meta of using their works contained in the Books3 database to train its LLaMA AI model.

3. The UK Getty Images v. Stability AI judgment – secondary copyright infringement

Only a week before the Munich Regional Court judgment, the High Court in London rejected a claim of secondary copyright infringement based on AI training in the Getty Images v. Stability AI case.

The dispute centred on whether using Getty-owned images to train the AI model Stable Diffusion infringed UK copyright law. Getty argued that the model weight files generated during training qualified as an “article” under copyright law and that bringing them into the UK amounted to importing such an “article”, constituting secondary infringement. These arguments were based on Sections 22 and 23 of the Copyright, Designs and Patents Act 1988 (CDPA), which govern importing, and possessing or dealing with, infringing copies.

Intangible “article” under the CDPA 1988

The Court clarified that, under the CDPA, storing a work in any medium, including an intangible one, can be relevant for copyright purposes. The rules on importing and possessing infringing copies therefore generally apply to intangible “articles” as well, including the model weight files for Stable Diffusion.

However, the Court emphasised that infringement still requires an actual copy of a work, regardless of whether the medium is tangible or intangible. In this case, it concluded that Stable Diffusion is not an infringing copy: the model learns patterns and features from the training data without copying or storing any of the original works.

Key Difference to ChatGPT

Unlike the claim against ChatGPT in the GEMA vs. OpenAI case in Germany, Getty Images’ UK claim did not assert that the various model weights reproduced any particular copyright work, nor that the model had “memorised” any such work as a result of overfitting. This was despite agreement between the experts that the model could produce “images that are near identical (a memorized image)”. The UK Court therefore did not need to address whether memorisation amounts to infringement under the CDPA. The issue is likely to be explored further in subsequent cases.

4. The CJEU and the Google Gemini case

The debate over generative AI and copyright has now reached the Court of Justice of the European Union (CJEU), which a Hungarian court has asked, for the first time, to rule on whether chatbot responses that incorporate copyright-protected content are permissible.

A Hungarian publisher accuses Google Ireland Ltd, which operates the Gemini chatbot, of generating responses containing passages from its news articles without prior authorisation.

When prompted by a user, the chatbot generated a summary of an article which, the Hungarian publisher alleges, contained substantial elements of the original content. According to the publisher, these excerpts go beyond permitted quotation and constitute both reproduction and “communication to the public”, because passages from the protected content are displayed in the chatbot’s responses.

The legal question is therefore whether responses produced by generative AI amount to acts of reproduction and of communication to the public that infringe EU copyright law.

No new audience, no information retrieval system

Google argues that Gemini does not “copy” but generates text via statistical prediction.

According to Google, the chatbot’s responses are not addressed to a “new audience” within the meaning of the Court’s case law but are accessible to the same audience as the original protected content.

Google further argues that, even if displaying the answers were considered reproduction or making available to the public, such reproduction would in any event fall within the exceptions for temporary acts of reproduction (Article 5(1) of the Information Society Directive) and for text and data mining (Article 4 of the DSM Directive).

Furthermore, Google emphasises that the Gemini base model, a “large language model” (LLM), is neither an information database nor an information retrieval system: it does not store copies of the data collected but tokenises and incorporates it.

In other words, it has no permanent database from which it can extract content at users’ request; instead, it uses the Google Search database to collect data.

5. What’s next

The CJEU’s forthcoming response will be pivotal in defining how generative AI and copyright interact in generated outputs and will most probably set the tone for ongoing and future disputes across the European Union. Likewise, the Munich decision, although not yet final, could shape cases regarding AI training, reinforcing the principle that innovation must respect creators’ rights.

By contrast, the UK Getty Images judgment shows that, without clear evidence of memorisation, copyright protection in the context of AI training has its limits. It remains to be seen, however, whether the concept of memorisation adopted by the Munich Court can be applied as readily to artistic works as to literary works.

About the author(s)

Miray Kavruk

Partner at Gowling WLG

Miray Kavruk advises national and international clients in all fields of intellectual property (IP) law, specialising in trademark, copyright, unfair competition, know-how and employee invention matters.

Alessandra Birkendorf

Associate at Gowling WLG

Alessandra is an Associate in the Intellectual Property team of Gowling WLG's Frankfurt office.

Céline Bey

Partner at Gowling WLG

Céline Bey is an expert in intellectual property (IP) and information technology (IT) with almost 20 years of experience.

She has broad legal expertise in all fields of IP (trademarks, patents, design, software and copyright), IT (data protection and domain names) and in unfair competition.

Sophia Allouache

Principal Associate at Gowling WLG

Sophia is a Paris-based associate in Gowling WLG's Dispute Resolution & International Arbitration practice.

Alexis Augustin

Principal Associate at Gowling WLG

Alexis Augustin is a principal associate registered at the Paris Bar, specialising in industrial property.

Ines Rosen

Senior Associate at Gowling WLG

Inès is a Senior Associate in the IP/IT team based in Paris and is registered at the Paris Bar.

Michael Carter

Partner at Gowling WLG

Michael Carter is an experienced IP litigator and IP strategist with more than 12 years' experience of working with clients in the technology, engineering and automotive sectors to develop and implement their IP strategies and enforce their IP rights.

Ollie Carpenter

Associate at Gowling WLG

Ollie is a London-based associate in the Intellectual Property team, with a Master's degree in Chemistry. He advises clients on intellectual property rights, with a particular emphasis on patent matters.

Amnic Atwal

Trainee at Gowling WLG

Amnic Atwal is a Trainee in Gowling WLG's Birmingham office.
