Generative AI under legal attack! What does this mean for its use and legal compliance?

17 August 2023

Artificial intelligence (AI) has the potential to replace a huge number of jobs. According to a study by the US bank Goldman Sachs, as many as 300 million full-time jobs could be eliminated worldwide. In the United States and Europe, two-thirds of all jobs could be affected by automation, and up to a quarter of all work could be taken over entirely by AI (Goldman Sachs, The Potentially Large Effects of Artificial Intelligence on Economic Growth (Briggs/Kodnani), 26 March 2023).

At the same time, this development represents enormous potential for companies to reduce costs and increase productivity – and an urgent need to evaluate that potential in order to remain competitive. New challenges and requirements will emerge, and companies will need to develop new processes to make the most of these changes.

According to Goldman Sachs, productivity gains could add 7 % to global GDP, compared with an average of 3.5 % over the past 50 years. Only companies that embrace the opportunities offered by new technology will be able to participate in these developments.

At the same time, there is uncertainty as the providers of the largest and most popular generative AI systems are currently facing a wave of lawsuits alleging liability on the part of the operators for a wide variety of legal violations.

In Delaware, on 6 May 2020, legal database owner Thomson Reuters sued legal tech startup ROSS Intelligence for using its database content to train an artificially intelligent legal research tool. Thomson Reuters accused the startup of copyright infringement and inducing breach of contract. ROSS Intelligence defended itself with a counterclaim seeking a declaration of fair use and alleging misuse of copyright.

On 3 November 2022, the first class action lawsuit was filed in San Francisco against GitHub, Microsoft and OpenAI on behalf of anonymous programmers who had made code available on the GitHub platform under open source licences. They accuse the operators of GitHub's Copilot system of using the code they created for AI training without complying with the terms of the licences – in particular, without crediting the programmers – and of making it available to the public.

On 13 January 2023, a group of artists also filed a class action lawsuit in San Francisco against the providers of the image generators Stable Diffusion, Midjourney and DeviantArt. The artists object to Stability AI's use of billions of copyrighted images, without the creators' permission, to train and build the system. They base their lawsuit on allegations of breach of contract and terms and conditions, unfair competition, and violation of the Digital Millennium Copyright Act (DMCA). The defendants' main defence is that the images generated by the AI are not similar to the artists' original works.

The stock photo agency Getty Images also sued Stability AI. On 3 February 2023, it filed suit against the AI provider in Delaware for copyright infringement, alteration of watermarks, trademark infringement and unfair competition. Stability AI had allegedly used copies of 12 million copyrighted images to train and build the system without the creators' permission.

On 5 June 2023, radio host Mark Walters filed a defamation lawsuit against OpenAI in Gwinnett County. Walters contests ChatGPT's claim that he embezzled more than five million dollars as chief financial officer of a non-profit gun rights foundation. In fact, no such allegation has been made against him, and he was neither on the board of the organisation nor involved in the case.

In lawsuits filed on 28 June 2023 and 7 July 2023, several authors – including the popular comedian Sarah Silverman – brought class action claims in San Francisco against OpenAI and Meta. The lawsuits allege that the AI models ChatGPT and LLaMA infringed the authors' copyrights by using illegal copies from databases such as Library Genesis (also known as LibGen), Z-Library (also known as B-ok), Sci-Hub and Bibliotik for training, removing copyright notices, creating modified versions of the works and distributing those versions without copyright notices. The authors further accuse the AI providers of violating their privacy rights and the right to be forgotten.

Google parent company Alphabet is facing a class action lawsuit filed by several individuals in the Northern District of California on 11 July 2023. It accuses the company of violating privacy law as well as the right to be forgotten, among other things, by using personal and professional information, creative and textual works as well as photos and emails – in short, their entire digital footprint – to develop its AI products, most notably Google Bard. The lawsuit follows Google's change to its privacy policy, under which it grants itself the right to use any data available online to train its artificial intelligence systems.

In addition, the US Federal Trade Commission began investigating OpenAI on 13 July 2023 after ChatGPT frequently made up reputation-damaging falsehoods about individuals.

Most recently, on 19 July 2023, the Authors Guild published an open letter to the CEOs of OpenAI, Alphabet, Meta, Stability AI and Microsoft, signed by more than 8,000 authors, including well-known names such as Dan Brown and Margaret Atwood. They demand that authors' works be licensed for the training of AI systems.

In Germany, too, the first cases of this kind have become public. In Hamburg, the German photographer Robert Kneschke requested the deletion of his images from the AI training data set provided by LAION e.V. This data set was used, among other things, to train Stable Diffusion. LAION e.V. refused to delete the images, whereupon Kneschke filed a lawsuit. He accuses LAION of copyright infringement through the unauthorised use of his works.

Legal certainty for the use of copyrighted works in the training of generative AI does not currently exist, and is unlikely to emerge in the foreseeable future, given the time it will take to litigate these cases. The safest approach therefore remains the use of licensed datasets – and licensed AI services. Content from open source projects can also be used, subject to the relevant licence terms. In certain cases, the extraction of information from copyrighted works may be covered by the text and data mining exceptions (Article 4 of the DSM Directive and Section 44b of the German Copyright Act). In the US, fair use may play a role in certain circumstances.

Not all AI is the same – the use and exploitation of copyrighted works in the training of a particular AI system, or within its operation, must be considered carefully on a case-by-case basis so that the right steps can be taken for legally secure AI training or the integration of AI systems.