By Xiaoyun Hu, LL.M. Candidate, 2025
Introduction
AI tools can now generate not only text and images but also voice clones built from audio samples. Some people have used these tools to fabricate a singer ‘covering’ someone else’s song, or to replace the dubbing of a child voice actor whose voice has changed with puberty. These uses raise copyright infringement problems at both the input stage (training AI models on audio samples) and the output stage (generating cover songs with cloned voices).
AI Training and Fair Use
Some argue that training AI with copyrighted sound recordings should qualify as fair use. Courts, however, have not yet endorsed this view. For instance, in Thomson Reuters v. Ross Intelligence, the first significant judicial ruling on AI training and fair use, the court rejected Ross Intelligence’s fair use defense.
Even though many academics support the fair use argument for AI, the defense has mostly been applied to general-purpose AI models such as ChatGPT. These models rely on massive, diversified training datasets to enable multifunctional outputs. By contrast, this rationale does not necessarily extend to voice cloning AI tools, whose training datasets are distinctly narrow.
Although courts assess fair use under a four-factor balancing test, much of the debate over fair use in AI training focuses on two factors: (1) whether the training constitutes a “transformative use,” and (2) whether the training serves as a market replacement for the source material. These arguments also emphasize that fair use should be assessed at the AI training stage itself, not only at the output stage.
- Transformative Use Factor
Some scholars contend that using copyrighted materials to train generative AI models should be considered transformative, as the purpose is to develop new technology rather than to replace the original work. This argument reflects the U.S. Supreme Court’s recent shift (laid out in Andy Warhol Foundation v. Goldsmith) from analyzing transformative use based on whether the secondary use adds new expression or meaning to focusing on whether it serves a different purpose from the original. However, voice cloning is not a transformative use under this framework. Unlike general-purpose AI models, the purpose of voice cloning is not to extract statistical patterns for further applications, but to generate voices identical to the input subject. Models built through this process, aimed at imitating others’ voices, do not advance the innovation-promoting purpose pursued by copyright law and the fair use doctrine. Thus, such training is not justified under the transformative use factor.
- Market Harm Factor
Scholars contend that it is difficult to establish an effective licensing market for general-purpose AI. The volume of copyrighted works used for training is enormous, and the number of rights holders from whom permission would be required is equally vast. In the absence of centralized or collective licensing systems, obtaining such permissions is effectively impossible. As a result, no viable market exists for these uses, which suggests that generative AI training causes minimal market harm under the fair use analysis.
However, such a licensing market does exist for voice cloning because the necessary licenses can be granted by a limited and clearly identifiable group of rights holders. First, training an AI to imitate a particular person’s voice with audio materials would require that person’s consent to avoid infringing their right of publicity. Second, where music or videos serve as training data, permission from the copyright holders would be required, as reproduction of sound recordings during the training process may otherwise infringe the reproduction right. AI toolmakers have already attempted to strike deals with record labels in hopes of licensing music to train AI models to mimic the voices of famous musicians. While these attempts have not yet succeeded, they demonstrate that obtaining licenses to train voice clone models could be possible. Additionally, when assessing market harm, courts should focus not only on the training process but also on the output of such voice models. The works generated by cloned voices, such as AI-rendered songs or dubbed voice tracks, directly compete with the training materials, which weighs against fair use.
A lack of market harm may justify fair use for training general-purpose AI, but that justification does not extend to voice clones, which serve narrow purposes and involve only a limited number of identifiable rights holders.
AI Covers and Derivative Work Right
As noted, unauthorized use of materials for voice cloning may be infringing. However, even with proper authorization, songs produced with voice-cloning AI do not necessarily qualify for protection as derivative works under copyright law. An “AI cover” refers to the use of AI-generated voice clones to replace the vocal timbre in a released sound recording, making Artist A’s original recording sound like a cover version performed by Artist B. Though AI covers may appear to be derivative works, they often lack the originality required for protection.
For a derivative work to be protected by copyright, it must have substantial originality: it cannot be merely a trivial variation of the preexisting work. Courts have held in sculpture-related cases that minor alterations do not constitute substantial originality, and this reasoning could apply equally to sound recordings. In L. Batlin & Son v. Snyder, the Second Circuit held that Snyder’s Uncle Sam bank was “extremely similar” to the original bank it replicated, “save in size and material.” The two works were almost identical in overall appearance and detailed design, such as Uncle Sam’s clothing elements and prop arrangements. The changes in size and material did not constitute an original contribution, and thus Snyder’s bank was a mere copy rather than a derivative work.
AI covers closely resemble such modifications. The AI changes only the vocal timbre in the sound recording, from Artist A’s voice to Artist B’s, while preserving A’s musical interpretation and individual performance. This is akin to changing the material and size of a sculptural work while keeping the general design and most details. In essence, merely replacing a voice adds no new expression. Therefore, an AI cover differs only trivially from its source material and does not qualify as a copyrightable derivative work.
Conclusion
AI voice cloning faces two key legal challenges: weak fair use defenses at the training stage, and a lack of substantial originality at the output stage. These challenges reveal that specific-purpose AI tools raise different legal concerns than general-purpose AI models. Voice cloning tools are closely tied to identifiable copyrighted materials and personal attributes, which makes their legal risks more pronounced.
Although voice cloning technology faces serious copyright and publicity rights challenges, it is far from doomed. Future AI covers may incorporate enough original expression to gain copyright protection, and systematic licensing solutions could help overcome current legal barriers. Despite ongoing uncertainties, the technology still holds promise if innovation and rights protection are properly balanced.