A Doctrinal Critique of the DPIIT Working Paper on Generative AI and Copyright

The DPIIT Committee’s Working Paper on Generative AI (“AI”) and Copyright proposes a mandatory blanket license permitting AI developers to use all lawfully accessed copyrighted works for training, coupled with a statutory remuneration right. The proposal suffers from three fundamental defects: it constructs remedial architecture without establishing that any cognisable wrong has occurred under existing copyright doctrine; it mischaracterises the nature of copyright’s exclusive rights and their scope limitations; and its prescriptions would paradoxically harm the content owners it purports to protect.

The Proposal Precedes the Problem

The Working Paper proposes an elaborate regulatory apparatus – a centralized collecting entity, a rate-setting committee, mandatory registration systems, grievance redressal mechanisms – without first establishing that AI training constitutes infringement under the Copyright Act, 1957.

The paper candidly acknowledges this uncertainty. It notes that “there are strong reasons to believe that training AI Systems may raise concerns about copyright infringement” while simultaneously recognising “an ongoing debate regarding whether the ‘fair dealing’ exception can be applied.” It observes that litigation is pending before the Delhi High Court and that “awaiting finality on such pending litigations may not be optimal.”

This acknowledgment is revealing. The Committee has effectively conceded that the legal position is unsettled, yet proceeds to construct a comprehensive licensing regime premised on the assumption of infringement. This approach – creating remedies before establishing wrongs – represents a departure from sound policymaking. The Committee warns of “declining incentives” and “underproduction” but adduces no evidence that such effects are materialising. What the paper offers is not legal analysis but regulatory speculation.

The Doctrinal Position: No Infringement in Training

A proper doctrinal analysis suggests that AI training does not implicate the exclusive rights conferred by Section 14 of the Act at all. Section 14(a)(i) confers upon copyright owners the exclusive right to reproduce a work “in any material form including the storing of it in any medium by electronic means.” The Committee’s analysis treats any digital copying or storage per se as presumptively infringing. This interpretation, however, neglects the critical statutory qualifier: the exclusive right extends only to reproduction “in respect of a work or any substantial part thereof.”

The Supreme Court’s formulation in R.G. Anand v. Delux Films is instructive. The test for infringement asks whether the reader, spectator, or viewer, after encountering both works, receives an unmistakable impression that the subsequent work is a copy of the original. Where similarities exist alongside material dissimilarities negating any intention to copy, no infringement arises. This test is directed at human perception of expressive similarity – not at technical processes invisible to any audience.

During AI training, no human reads or is exposed to the works processed. No market for any specific work is created or substituted – markets being fundamentally constituted by humans transacting with humans over particular resources. What the model extracts are statistical patterns: mathematical relationships between tokens across millions of works. This is categorically different from reproducing expression for human consumption.

The outputs demonstrated in the ANI litigation itself illustrate this point. Even when prompted with queries specifically designed to elicit copyrighted content – prompts unlike anything ordinary users would employ – the AI responses summarise information in different words, refer to sources in the third person, and provide attribution. They do not create the unmistakable impression of copying that the R.G. Anand test requires.

Transformative Use as Scope Limitation

The Committee’s analysis conflates two distinct doctrinal categories: the scope of exclusive rights under Section 14, and the exceptions to infringement under Section 52. This conflation obscures a critical point: transformative use operates as a limitation on the scope of the reproduction right itself, not merely as an exception that excuses otherwise infringing conduct.

The Delhi High Court’s reasoning in University of Cambridge v. B.D. Bhandari and the Calcutta High Court’s reasoning in Barbara Taylor Bradford v. Sahara Media are apposite. The Courts recognised that where copyrighted works are used for a transformative purpose resulting in output materially different from the original, the use falls outside the scope of both the reproduction right and the adaptation right. This is not an exemption in the nature of fair dealing – it is a determination that the exclusive right was never implicated in the first instance. Guidebooks or transformative derivatives do not implicate copyright at all.

AI training is paradigmatically transformative in this sense. The purpose is not to reproduce or distribute any work but to derive statistical relationships enabling generation of new content. The copyrighted works constitute infinitesimal fractions of massive training datasets. The output – a model capable of responding to prompts – bears no resemblance to any individual training work. Finding such use to constitute reproduction would expand Section 14 beyond any recognisable doctrinal boundary.

Copyright has never extended to information, knowledge, or patterns derivable from protected works. This principle is foundational to copyright’s architecture. The Supreme Court in Eastern Book Company v. D.B. Modak affirmed that copyright protects expression, not facts or information embodied in that expression. One may read an article, extract every insight, and deploy that knowledge competitively – without liability. A film student may watch thousands of films to develop aesthetic judgment; a musician may absorb countless compositions to develop melodic instinct. The value thus derived belongs to the learner, not to the creators whose works provided the educational substrate.

AI training involves precisely this extraction of unprotectable elements at computational scale. The model learns that certain word sequences correlate with certain contexts – not any particular author’s expression, but aggregate patterns across expression generally. What emerges is statistical metainformation about how language functions across millions of works, not reproduction of any discrete expressive content.

The Working Paper implicitly assumes that because creators invested labour in their works, they deserve control over all benefits flowing therefrom. This assumption, though widespread, is doctrinally mistaken. Copyright protects expression by conferring transactional exclusivity for a limited period. It does not protect labour as such, nor does it create entitlements over ideas, facts, or patterns that others may extract and utilise. Treating pattern extraction as infringement would transform copyright into a regime of information control it has never been and was never intended to be.

The Proposal Harms Content Owners and Is Short-Sighted

The Committee presents its proposal as balancing creator and developer interests. A closer examination reveals that the mandatory licensing regime would leave content owners worse off than the status quo.

Under existing law, if AI-generated output actually reproduces protected expression – if it produces the unmistakable impression of copying – the copyright owner possesses full remedies: injunction, damages, and an account of profits. The substantial similarity analysis applies to AI outputs precisely as it applies to any allegedly infringing work.

The proposed regime would alter this calculus. A licensing fee paid into a collective pool could operate as a liability shield for genuine infringement at the output stage. Where generated content actually reproduces a creator’s work verbatim – the harm copyright exists to prevent – the creator’s remedy might be circumscribed to their share of the collective pool rather than damages reflecting actual harm.

This is precisely backwards. The proposal would immunise training activity that likely does not constitute infringement while potentially limiting remedies for output-stage copying that clearly does. Content owners would trade robust protection against genuine infringement for modest participation in a collective pool funded by activity that may be lawful in the first instance.

The Arithmetic of Insignificance

The Committee proposes that royalties be calculated as a percentage of AI companies’ global revenue – a metric bearing no relationship to harm suffered by any particular creator – and then distributed through a centralized entity to Copyright Societies and, through them, to registered claimants.

Consider the mathematics. Even assuming substantial sums collected at the apex, distribution across millions of training works and countless creators renders per-creator amounts trivial. The transaction costs of the proposed apparatus – rate-setting committees, collecting entities, registration systems, grievance mechanisms – will absorb substantial resources before anything reaches individual creators.

What remains provides the appearance of compensation without its substance. A negligible royalty share does not address livelihood displacement. It provides moral cover for a regime that extracts control from creators while delivering little in return.
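The dilution described above can be made concrete with a minimal sketch. Every input below is a hypothetical placeholder chosen purely for illustration; the Working Paper's actual rate, revenue base, administrative costs, and number of registered works are not stated here and are not assumed. The function name per_work_share and its parameters are likewise illustrative, not drawn from the proposal.

```python
def per_work_share(revenue, royalty_rate, admin_cost_fraction, registered_works):
    """Average payout per registered work after administrative costs.

    All parameters are illustrative assumptions, not figures from the
    Working Paper:
      revenue             -- revenue base the royalty is levied on
      royalty_rate        -- fraction of revenue paid into the pool
      admin_cost_fraction -- fraction of the pool absorbed by the apparatus
      registered_works    -- number of works sharing the pool
    """
    if registered_works <= 0:
        raise ValueError("registered_works must be positive")
    pool = revenue * royalty_rate                      # sum collected at the apex
    distributable = pool * (1 - admin_cost_fraction)   # what survives transaction costs
    return distributable / registered_works            # equal split, for simplicity


# Hypothetical inputs only: a revenue base of 1 billion currency units,
# a 2% royalty, 25% administrative overhead, 50 million registered works.
share = per_work_share(
    revenue=1_000_000_000,
    royalty_rate=0.02,
    admin_cost_fraction=0.25,
    registered_works=50_000_000,
)
print(f"{share:.2f} currency units per work")  # prints "0.30 currency units per work"
```

The equal split is a simplification; any weighting scheme would redistribute the same pool rather than enlarge it. On these assumed inputs, even a large headline collection reduces to a fraction of one currency unit per work, and the result shrinks further as the number of registered works or the overhead grows.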

The Category Error

The Working Paper’s genuine concern is labour displacement: that AI-generated content will flood markets, undermining human creators’ economic position. This concern warrants serious policy attention. But copyright cannot address it, and the attempt to make it do so represents a fundamental category error.

Even if AI companies paid licensing fees for every training work, the competitive dynamic remains unchanged. AI would still generate content at near-zero marginal cost, scale infinitely, and compete with human creators across every market simultaneously. A licensing fee is not competitive parity – it is a modest tax paid while displacement continues unabated. The creator receiving a meagre royalty share is not assisted if their capacity to earn a living from new works is undermined by AI systems generating comparable content instantly and cheaply.

The Committee’s proposal thus mistakes the nature of the problem it confronts. It reaches for copyright because copyright is familiar, because it appears to offer a mechanism, because the alternative – acknowledging that genuine responses require difficult structural reforms – is less tractable. But familiarity is not adequacy. The proposal would burden copyright with work it cannot perform while failing to address the displacement concerns that motivated the intervention.


(Views expressed are personal)