The IP In AI: Can AI Infringe IP Rights? – Copyright

In this series, we have explored whether IP rights protect AI
systems themselves and whether copyright or patents provide
protection for AI-generated works or inventions. Equally
controversial, however, is the way in which AI systems use
others’ works. At their core, AI systems are computer systems
operating on large volumes of data. Those systems – and most
obviously those data – are often the products of others’
intellectual and economic investment. This article explores the
degree of protection likely afforded to IP rights holders against
unsanctioned use of this material by an AI system.

Copyright infringement

What amounts to infringement?

In general, copyright prevents the unauthorised use of certain
categories of subject matter (for example, literary or artistic
works). Although the requirements for infringement differ between
jurisdictions, in general if copyright subsists in a work (which we
discussed in more detail in part 3 of our series), proving
infringement of that copyright requires the copyright owner to show
that:

  • a relevant act was done in relation to a work (the
    “junior work”), for example that it has
    been reproduced, published or transmitted electronically;

  • the junior work bears objective similarity to the work that is
    protected by copyright (the “senior work”), or a
    “substantial part” of it; and

  • that objective similarity arises because of copying of the
    senior work.

There are also a number of exceptions or defences to
infringement that can apply, which differ from jurisdiction to
jurisdiction. For example:

  • In Australia, use of a copyrighted work will not amount to
    infringement if it is a “fair dealing” for certain
    specified purposes, such as research and study,1
    criticism and review,2 news reporting,3 or
    parody and satire.4 These exceptions are, however,
    relatively narrow.5 As well as being for one of the
    prescribed purposes, the use must be “fair”, which
    requires the assessment of a number of factors such as the purpose
    and character of the dealing, the possibility of obtaining the work
    on commercial terms, and the effect of the dealing on the market
    for the original work.

  • In the EU there are multiple exceptions and limitations
    (“E&Ls”) available but each applies separately in
    each EU Member State as there is no harmonisation of the exceptions
    and limitations across the EU. However, the so-called EU
    “Copyright Directive”6 introduced some
    mandatory E&Ls, such as text and data mining (separate E&Ls
    for research and other purposes), teaching and educational
    purposes, and preservation of cultural heritage.

  • In the UK there are exceptions to copyright infringement for
    “fair dealings” for certain prescribed purposes including
    non-commercial research and private study;7 criticism,
    review and news reporting;8 and parody, caricature and
    pastiche.9 Again, these exceptions require the use to be
    “fair”, which involves consideration of factors such as
    whether the use affects the market for the original work, and
    whether the amount of the work used is reasonable and appropriate.
    Exceptions also exist for certain purposes such as text and data
    mining for non-commercial research10 and assisting
    accessibility for the disabled.11 All also require
    sufficient acknowledgement.

  • In China there are exceptions to copyright infringement for
    purposes including personal study, research or appreciation;
    introducing or commenting on a certain work, or illustrating a
    point; news reporting; certain uses for non-commercial teaching or
    research; and provision of published works to dyslexic persons in a
    barrier-free way through which they can perceive them.

It is noteworthy that none of these jurisdictions have an
equivalent to the relatively broad and flexible “fair
use” doctrine that applies in the US.

Copyright infringement by training

Central to almost any AI system is a large mass of data on which
the system is trained. Although referred to as “data”,
the training materials are frequently themselves original works, in
which copyright subsists. For example, these may be artworks (as in
Stable Diffusion), or passages of code (as in CoPilot). The process
of training an AI or ML system on those inputs almost certainly
involves the creation of a copy (in a copyright sense) – and
most likely many copies – of those copyright works, even if
those copies are only ever used “internally” within the
system (eg in training the system) and never reproduced as outputs
from it.

This kind of copying is among the key allegations in proceedings
brought by Getty Images against Stability AI (Getty Images).
Getty Images, a global media provider
distributing royalty-free images, photos, music and video, has sued
Stability AI in the UK and US for allegedly using over 12 million
of its copyrighted images and associated captions and meta-data to
train its AI text-to-image tool, Stable Diffusion, without consent
or compensation. In the US, the case is in its discovery
stages,12 whilst in the UK the High Court, on 1 December
2023, set the case down for trial on the basis of real prospects of
success.13 The UK litigation also involves allegations
of infringement of database rights, trade mark infringement and
passing off, as well as copyright infringement (see our recent
update on this case and other generative AI litigation worldwide here).

Authors including Jodi Picoult and George RR Martin have also
sued OpenAI in the US (Authors Guild, et al. v. OpenAI,
Inc.), alleging the infringement of fiction authors’ rights
in the AI system’s wholesale copying of their works, without
permission or compensation, to train its large language models
(LLMs). They also argue that the outputs of these LLMs are
derivative works which mimic or paraphrase the authors’ work
and harm the market. The Authors Guild alleges that this threatens
the livelihood of authors, and most recently has joined Microsoft as a defendant.
Unsurprisingly, many other groups of authors have brought separate
suits against OpenAI based on similar concerns
(including Tremblay v. OpenAI, Inc.).

Practical challenges

Although, in a legal sense, this act of infringement may be
conceptually straightforward, there are practical matters that make
it difficult to establish:

  • Proving that a particular copyright work was part of
    the training data – because the training set is rarely
    published (and often protected as a trade secret), this may be
    difficult. In Getty Images, this problem was to some
    extent circumvented because Getty located its watermark on some
    output images from Stable Diffusion. Similarly, in J. Doe 1, et
    al. v GitHub, Inc., et al, No. 22-cv-06823
    (GitHub), some outputs were shown to be
    almost identical to specific code stored on GitHub. In other cases,
    however, there may be no such link. In those cases, pre-action
    investigative processes, such as preliminary discovery or
    subpoenas, may be required.

  • Jurisdictional considerations – namely,
    that the act of infringement must occur in the jurisdiction in
    question. In the UK, in Getty Images for example,
    Stability AI made an application for reverse summary judgement in
    the UK litigation on the basis that the acts did not occur in the
    UK, though the court did not accept that the position was
    sufficiently clear for summary judgement and the matter has gone
    to full trial. In that
    case there is also a claim of infringement based on an alleged
    importation of an infringing “article” (ie the LLM),
    raising the question whether a service is an “article” in
    the sense anticipated by this element of the UK legislation. The
    specific location of the act of infringement may also raise
    difficulties if, for example, the jurisdiction in which the
    training takes place has specific defences that are not available
    in the copyright owner’s jurisdiction – such as the
    US’s fair use defence, or Singapore’s computational data
    analysis provisions.

Government responses

These issues, and the challenges they present for rights
holders, are a high priority for governments worldwide. For
example, the current draft of the EU AI Act, which is being
negotiated between the EU Council, EU Parliament and EU Commission,
contains provisions mandating transparency of training data, so
that copyright-protected materials used in training an AI can be
identified (see our blog post here). In addition, the EU AI Act
requires providers of general purpose AI models to make publicly
available a sufficiently detailed summary of the content (including
text and data protected by copyright) used for training the model.

In Australia, in December 2023, Commonwealth Attorney-General
Mark Dreyfus announced the establishment of a copyright and AI
reference group “to better prepare for future copyright
challenges emerging from AI”, expressly referring to the need
to address copyright issues concerning “the material used to
train AI models” and “transparency of inputs and
outputs”.

The UK House of Lords Communications and Digital Committee
issued a report on LLMs and Generative AI in February 2024 (see our
blog post here), which called on the UK Government to
support copyright holders, saying the Government “cannot sit
on its hands” while LLM developers exploit the works of
rightsholders. The report expressly called for a way for
rightsholders to check training data for copyright breaches, and
the Committee Chair was quoted as saying:

One area of AI disruption that can and should be tackled
promptly is the use of copyrighted material to train LLMs. LLMs
rely on ingesting massive datasets to work properly but that does
not mean they should be able to use any material they can find
without permission or paying rightsholders for the privilege. This
is an issue the Government can get a grip of quickly and it should
do so.

In its response following the consultation on its AI Regulation
White Paper published in February 2024, the UK Government did not
produce the definitive solution that the House of Lords had called
for, but referenced the UK IPO’s failed attempts to find a
solution between stakeholders over the last 18 months. Instead the
response referred to further examination of ways to improve
transparency of use of copyright material (see our blog post here). As a result, in the UK it may well be
for the courts to determine the copyright position in the short
term, although this may not be to the liking of those investing in
AI development.

Copyright infringement by outputs

Aside from infringement during the training of an AI system, an
AI system may also produce outputs that themselves infringe
copyright, in the sense that they bear sufficient
objective similarity to an original work. Since this only requires
a side-by-side comparison of a given output from the AI system and
a given original work (rather than a forensic enquiry into whether
the original work was in fact among the training data set), this
kind of claim avoids some of the difficulties referred to above.
However, here the difficulty is primarily in showing the requisite
degree of objective similarity between a given output and a given
input.

This was the primary challenge faced in GitHub and
Andersen v. Stability AI Ltd (Andersen),14 where many of
the claims originally brought have been dismissed because the
plaintiffs were unable to establish a specific original work that
bore sufficient objective similarity to a specific output work.
This difficulty is caused by a multitude of practical factors,
including poor or inaccurate referencing and lack of transparency
from developers, as well as the technical nature of AI systems.

This problem can be exacerbated by the “Snoopy problem” (also referred to as the
“Italian plumber problem”). If the training data uses
enough example images of a particular and well-known
subject (such as Snoopy), or a particular style of work, it may be
difficult to draw a sufficient causal link between a given output
and a specific input image. This, too, is an issue in
Getty Images, where one area that is being debated in
relation to potential defences (the Defence has yet to be filed) is
that the outputs are “inspired by” rather than directly
copying the originals, since they mix elements from multiple
sources. In that respect the replication of watermarks or parts of
them (discussed above) may assist Getty.

Academics and software developers have recently sought to
develop methods to identify whether text is generated by an AI, but
these methods appear to be currently limited to text-based output,
and have limited reliability and accuracy. Any input from
governments to mandate, as a matter of policy, a framework for
watermarking or indicating the source of an AI output will also
need to consider countervailing issues including economic policy,
competition, and the promotion of innovation.

Another challenge faced with these kinds of cases is the
identification of the infringer. If an AI system can be used to
generate an output work bearing similarity to a given input, but
only when that AI system is used by a user who is determined to
infringe, who is (or should be) liable for that
infringement?15 In many jurisdictions the answer may be
both the user and the AI system owner – the former for the
primary infringement and the latter for “authorisation”,
“vicarious” or “secondary” infringement.
However, assessment of such “secondary” liability often
requires an examination of the degree to which the AI system owner
can control or prevent the allegedly infringing conduct of the
user.

Aside from copyright infringement, the owners of works used to
train an AI system may have other causes of action in relation to a
given output. For example, even if a given input work is available
on open-source licence terms, those terms may require attribution
information, or require that any derivative works are licensed on
terms no less open than those applying to the inputs (so-called
“copyleft” licences). Indeed, the removal of attribution
(or copyright management information) is part of the complaint
brought by the plaintiffs in GitHub.

Patent infringement

With the rapid development of AI systems, companies like Google,
Samsung and Microsoft led the market in terms of AI-related patent
applications at the EPO in the period 2016 to
2020.16

While copyright infringement has dominated current IP litigation
brought in the context of works generated by AI, there are emerging
patent disputes involving AI systems. Given that, following Dr
Thaler’s series of applications, it is now relatively well
established in most jurisdictions worldwide that an AI system
cannot itself be an “inventor” for the purposes of patent
law (see our blog post here on the UK Supreme Court decision of
December 2023 in that regard, and our previous article here), the
focus has shifted to infringement of patents seeking to protect
the AI system itself.

In July 2023, FriendliAI commenced proceedings in the United
States District Court for the District of Delaware against Hugging
Face (FriendliAI Inc. v. Hugging Face,
Inc.), which offers an inference server for Large Language
Models (“LLMs”) called Text Generation Inference
(“TGI”). The founder and CEO of FriendliAI is Dr.
Byung-gon Chun, the inventor of PeriFlow/Orca, which utilises a
system of iteration-level or dynamic “batching” that
allegedly improves AI systems through more efficient and scalable
serving of generative AI transformer models. This allows the AI to
process multiple requests at once. Hugging Face clearly states on
its website that it uses PeriFlow/Orca, which FriendliAI contends
constitutes infringement of its patent entitled ‘Dynamic
Batching for Inference System for Transformer-Based Generation
Tasks’. This matter is in its early stages, and it will be one
of the first patent infringement cases relating to an AI
technology.
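By way of technical illustration only, the efficiency gain claimed for iteration-level batching can be sketched in a few lines of Python. This is a simplified toy model of batch-slot accounting under assumptions of our own (fixed per-request iteration counts, no memory constraints); it is not FriendliAI’s or Hugging Face’s actual implementation:

```python
# Toy comparison of static vs iteration-level ("dynamic") batching when
# serving a generative model. Each request needs a different number of
# decoding iterations. Static batching holds every slot in the batch until
# the longest request finishes; iteration-level batching re-forms the batch
# every iteration, so a finished request frees its slot immediately.

def static_batching_cost(iterations_per_request):
    """Slot-iterations consumed when the batch is held until the longest
    request completes."""
    steps = max(iterations_per_request)
    return steps * len(iterations_per_request)

def dynamic_batching_cost(iterations_per_request):
    """Slot-iterations consumed when each request occupies a slot only for
    the iterations it actually needs."""
    return sum(iterations_per_request)

# Three requests needing 2, 3 and 10 decoding iterations respectively.
requests = [2, 3, 10]
print(static_batching_cost(requests))   # 30 slot-iterations
print(dynamic_batching_cost(requests))  # 15 slot-iterations
```

In this toy example the dynamic approach halves the slot-iterations consumed, which is the kind of serving efficiency the patent claims are directed at.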

These patent cases, while dealing with AI subject matter, will
grapple with relatively traditional patent law concepts, including
construction of the patent claims, considerations of whether those
claims have been exploited, as well as counterclaims attacking the
patent’s validity (see our previous article on patent protection of AI systems).
An example of the latter occurred in December 2023, just before the
Supreme Court’s decision in the DABUS/Thaler case on
inventorship: the High Court of England and Wales rejected a
challenge to the patentability of an AI system, relating to an
autonomous neural network, which was held not to be excluded from
patentability (see our blog post here). The UK IPO has been granted leave to
appeal the decision to the Court of Appeal. However, in the
interim, in response to the High Court’s decision, the UK IPO
has temporarily suspended its guidance on the examination of AI inventions
while it considers the impact of this decision and has issued a practice update specifically relating to the
examination of ANNs.

Conclusions

A common thread amongst the cases discussed above is the
normative considerations associated with IP protection and
enforcement in relation to materials used and produced by AI
systems. These include the adequate compensation of copyright
owners, lost opportunities to license their works, and market
usurpation through derivative works.

Copyright holders asserting their rights, including Getty
Images, have often reiterated that they do not seek to have a
chilling effect on the development of AI technology, but instead
are focusing on ethical sourcing of data, including compensation
for copyright holders, consent (including by exploring opt-out
models), and an opportunity to license. These issues have been
behind the debates worldwide over regulation of AI and attempts to
balance opportunity with equity.

At the same time, organisations hosting large volumes of data
are realising the potential value of those data to new and upcoming
AI systems and putting in place systems to protect them. Reddit,
for example, has announced that it plans to charge companies
for accessing its application programming interface (which is used
by external entities to download conversations from the forum),
even though its User Agreement confirms that users retain
ownership of content they post to the platform.

Outside of the strict bounds of the law, developers of AI
systems may also begin to see the fair and ethical sourcing of
their input data as forming a part of their ESG public image and
“social licence to operate”. In late 2023, for example,
Canva announced a commitment not to train its
proprietary AI models on its creators’ content without express
permission, and established a $200 million compensation program for
creators who consent to having their content used to train those
models.

The growing frequency of attempts to regulate these issues
– and of disputes arising from them – demonstrates the
challenge IP law is currently contending with in striking the
balance between encouraging investment in AI technologies and
protecting investments that have already been made in the material
being used to train them. The legal reforms and market restrictions
that might lead to this balance are yet to be implemented, but the
results of the various disputes around the world may help to
illustrate the difficulties of the current position and provide
added impetus towards an international solution to an international
problem.

Footnotes

1. Copyright Act 1968 (Cth) s 40.

2. Copyright Act 1968 (Cth) s 41.

3. Copyright Act 1968 (Cth) s 42.

4. Copyright Act 1968 (Cth) s 41A.

5. See our previous articles on this topic, including
‘Not all’s “fair dealing” in war
and Greenpeace: Federal Court confirms limits of the “parody
or satire” exception to copyright infringement’
and ‘Copyright owners “Don’t have to take
it”: Federal Court of Australia awards substantial remedy for
copyright infringement, plus double damages for
flagrancy’.

6. Directive (EU) 2019/790 on copyright and related rights in the
Digital Single Market.

7. Copyright, Designs and Patents Act 1988
(CDPA) s 29.

8. CDPA s 30.

9. CDPA s 30A.

10. CDPA s 29A. The UK Government considered extending
this exception to commercial use but withdrew its proposals in 2023
(see our blog post here).

11. CDPA ss 31A-31F.

12. https://dockets.justia.com/docket/delaware/dedce/1:2023cv00135/81407.

13. Getty Images (US) et al., v Stability AI Ltd
[2023] EWHC 3090, [108].

14. Case No. 3:23-cv-00201-WHO.

15. https://crsreports.congress.gov/product/pdf/LSB/LSB10922;
http://eprints.lse.ac.uk/117745/1/McDonagh_can_artificial_intelligence_infringe_copyright_accepted.pdf
.

16. Google and Samsung top the list of applicants for
AI-related patents at the EPO – IAM
(iam-media.com).

The content of this article is intended to provide a general
guide to the subject matter. Specialist advice should be sought
about your specific circumstances.

