Hyunsoo Kim, J.D. Class of 2028
The development of generative artificial intelligence (AI) is transforming industries at an unprecedented pace, with nearly all sectors incorporating AI models into their practices. While AI has advanced significantly in recent years, its use in the legal industry has highlighted two major concerns: privacy risks arising from how AI platforms use user-inputted data to train their models, and questions of intellectual property ownership.
Many traditional cloud-based AI platforms, such as OpenAI’s ChatGPT, collect large quantities of user input as datasets for training, evaluating, and fine-tuning their large language models. This creates serious privacy concerns when users input privileged or sensitive information: doing so on public AI platforms risks violating attorney-client privilege and data protection regulations, and exposes users to data breaches.
In fact, the American Bar Association’s (ABA) 2024 ethics guidance highlights the confidentiality risks of self-learning generative AI: improper disclosures can occur when information reaches those “prohibited from access to said information because of an ethical wall,” or when output generated to help one client inadvertently draws on other clients’ confidences.
These concerns are sometimes addressed through contractual requirements imposed on closed-instance AI platforms, such as SOC 2 Type II certification (North America) and ISO 27001 compliance (international). Closed-instance AI platforms are implementations of AI that operate within a private, tightly controlled environment, isolated from the public internet and bound by the organization’s security and compliance policies. Because the AI’s data access, updates, and integrations are restricted and auditable, such platforms are viewed as “safer” for business use, especially when handling sensitive internal data. Similarly, the Artificial Intelligence Underwriting Company (AIUC) is developing a SOC 2-equivalent safety standard for AI agents, called AIUC-1, and plans to provide insurance to companies that deploy AI agents meeting that standard. AI agents are large language models (LLMs) wrapped with software to complete tasks, much like how humans interact with computers and the internet. In short, these certifications, coupled with closed-instance AI systems, are methods companies are implementing to prevent data leakage while allowing users to leverage AI’s technological capabilities consistent with their privacy obligations.
Foreseeing the risk of sensitive information leakage and concerns over data security, several companies, including aerospace and defense contractor Northrop Grumman, have blocked the use of ChatGPT. Alternatively, entities such as Amazon and the Department of Defense have opted to develop their own proprietary, organizationally compliant AI platforms.
Inputting information into cloud-based AI platforms also poses intellectual property risks for users. Using generative AI raises the question of who retains intellectual property rights in inputs and outputs. Terms of service vary by provider: ChatGPT’s terms assign users ownership of all rights to their content, whereas Gemini’s terms permit Google to use inputted content to provide, improve, and develop Google products, services, and machine-learning technologies. In June 2025, the ruling in Bartz v. Anthropic highlighted the issue of copyright infringement by AI developers. Book authors brought a class action lawsuit against Anthropic, a major AI firm, for using digital books found on the internet to train its large language models, including Claude, without the authors’ permission. Anthropic had downloaded millions of books, many from pirate websites, to build a central library for training its AI platform. The U.S. District Court for the Northern District of California held that Anthropic’s use of lawfully obtained digital copies to train specific LLMs was fair use, but that its retention of pirated copies in a central library constituted infringement. Accordingly, when corporations use cloud-based AI platforms, depending on the provider’s terms of service, their intellectual property may be stored externally in datasets and used to train future iterations of LLMs.
While nearly all sectors are adopting generative AI to enhance operational effectiveness, the technology carries significant risks for its users. Because of how large language models train on and process datasets, their use exposes corporations to privacy and intellectual property issues. Amid rapid innovation, these issues are being addressed through regulatory practice and contracting considerations. However, these measures are far from perfect, and managing these evolving issues remains a complex challenge.