This is the property of the Daily Journal Corporation and fully protected by copyright. It is made available only to Daily Journal subscribers for personal or collaborative purposes and may not be distributed, reproduced, modified, stored or transferred without written permission.

May 17, 2023

Artificial intelligence authored software development


Aaron M. Levine

Shareholder
Polsinelli PC


Annika P. Pollard

Associate
Polsinelli PC


Artificial Intelligence (AI) has seized headlines worldwide with millions witnessing and experimenting with the potential of "generative AI." Generative AI refers to the ability of a category of AI tools to create texts, images, music, videos or other types of media in response to user prompts.

Generative AI can help you write a high school book report or a comic book, and it can create amusing (or disturbing) images of celebrities. More serious applications include drafting market research reports, legal research memos and patent applications. One of the most interesting abilities of generative AI, however, is writing software code, allowing AI to create new applications and capabilities for itself and others. Theoretically, the power of these tools is nearly limitless.

Limitless capabilities, however, seemingly come with limitless issues, including legal ones. Because of their capacity to "create," generative AI tools have raised many legal and even philosophical questions about what constitutes authorship and inventorship, and where the line between influence and derivation really lies. Since, in a way, generative AI does not "create" anything but instead applies complex probability models to supply the most appropriate outputs for the received prompts, the question of where those outputs ultimately derive from must be considered.

Very recently, Apple approved the first application authored from the ground up by generative AI. See https://www.cultofmac.com/810270/5-movies-first-chatgpt-generated-app/. According to the app's developer, approximately 95% of the app's code base was produced by generative AI. While the app itself may not be remarkable, the proof of concept is impressive. Undoubtedly, generative AI - and specifically "large language model" AI tools such as ChatGPT - has a bright future in the software industry. In the context of AI-generated software code, the question of derivation inevitably focuses on Open Source Software (OSS), as its public nature makes it "free" to use in the training of AI tools.

Developers who use generative AI to create software should begin strategizing their approach to these tools to ensure that AI-assisted development does not create more licensing and copyright problems than it solves.

Again, it is important to recognize that generative AI does not create software code from scratch. Instead, it pieces together code based on its probability model, its training data and the knowledge base it has access to. Notably, the terms and conditions for OpenAI, the creator of the popular generative AI model ChatGPT, make no representation or warranty that any output - software code included - is free of third-party infringement.

When queried as to what constitutes its knowledge base, ChatGPT itself explains: "The code I generate comes from a large corpus of existing code that I was trained on during my development. When I receive a request for code, ... [I] generate code that is syntactically correct, follows best practices and meets the user's requirements." When queried further for specifics, ChatGPT responds with: "I do not have access to licensing information for the large corpus of existing code that I was trained on. However, the vast majority of code in public repositories like GitHub is typically licensed under an open-source license."

As background, almost all OSS available on the internet is protected by a license. In essence, the license describes the conditions for use of the OSS and allows the programmer to use the code free from concerns of copyright infringement. There are two primary types of OSS licenses: permissive and copyleft. Typically, permissive licenses provide the software as-is with no warranties. Downstream developers use the software at their own risk, but they are also free to make their own changes to the code, license their modified code for a fee, and so on. Copyleft licenses, on the other hand, typically require that all new or modified software developed using the licensed software as a base also be released under an identical copyleft license. Worse, from a commercial developer's standpoint, a copyleft license may require the resulting license to be free of charge and the modified software code to be made available to the public.

Accordingly, disregarding OSS licenses can have serious consequences for a developer's business model. For example, the Affero GPL 3 copyleft license purports to cover nearly every interaction an end user could have with the code - even over a network. Further, like a typical copyleft license, it requires that "modified" works - which can include larger works created using the Affero-licensed code as a mere building block - be licensed under the Affero GPL as well. See https://www.gnu.org/licenses/agpl-3.0.en.html. This means the resulting work must also be licensed for free and the larger work's code made available to others. As a result, many developers simply ban the use of code licensed under the Affero GPL (and some other copyleft licenses) in their development projects. Concerningly, as ChatGPT admits, "it's possible that the resulting code may contain elements or snippets of code that were derived from or inspired by code that is licensed under the A[ffero] GPL [license]."

However, this issue is more fundamental than merely the type of OSS licenses used (i.e., copyleft or permissive). Even the MIT license - often considered the most user-friendly and least restrictive OSS license - can be violated. Specifically, even the MIT license requires that the "copyright notice and this permission notice ... be included in all copies or substantial portions of the Software." https://opensource.org/license/mit/. Accordingly, if generative AI cannot cite its sources, all of the code it generates is potentially suspect from a licensing and copyright infringement point of view.
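To make the notice requirement concrete, here is a minimal sketch (in Python, with a hypothetical copyright holder and helper function) of how a developer reusing MIT-licensed code would carry the required notice along as a comment at the top of the file:

```python
# Copyright (c) 2023 Hypothetical Upstream Author
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction ... [the full MIT permission notice and
# warranty disclaimer must be reproduced here verbatim].

# A hypothetical helper adapted from the MIT-licensed upstream project.
def dedupe_sorted(values):
    """Return the unique values in sorted order."""
    return sorted(set(values))
```

The license text imposes no fee or copyleft obligation, but omitting that comment block when copying a substantial portion of the code is itself a violation - which is exactly the risk when an AI emits such code with no notice attached.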

This is problematic, but one can theoretically instruct a generative AI model to use only source code licensed under MIT or Apache (Apache being another popular permissive license). When prompted with this instruction, ChatGPT responds by stating that, "I can generate code using only Open Source Software Code that is licensed under the MIT or Apache license." Yet, this seems to contradict other answers provided by ChatGPT regarding its function, as it stated that "I do not have access to licensing information" in response to an earlier query. ChatGPT certainly cannot follow instructions to use only MIT or Apache code if it does not have access to licensing information.

These interactions make two things clear: 1) prompt engineering is going to be a skill in the same way that academic and legal research is a skill and 2) ChatGPT is somewhat like a puppy - it is eager to please and give you answers, but it does not truly "understand" context or implications. Caveat prompt.

Finally, generative AI models do not test code; they are not software development environments. Accordingly, there is no assurance that the code these tools generate is free of common exploits, backdoors, trojans or the like. Indeed, because these models are built on training data, any security issues in existing open-source code may be baked into the new code an AI creates, as the AI does not "understand" anything. Additionally, there is now an incentive to surreptitiously feed AIs malicious training data in hopes of "training" the software to insert malicious code into later output. The fate of an earlier attempt at a conversational language model by a major technology corporation is ample evidence of this risk. See https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist. The bottom line is that any AI-authored code should be rigorously examined and tested for such concerns.
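As a minimal illustration of that "examine and test" advice (the function and inputs here are hypothetical, not drawn from any real AI output), a developer might wrap AI-generated code in ordinary tests covering normal, boundary and hostile inputs before trusting it:

```python
# Hypothetical AI-generated function, pasted in for human review and testing.
def parse_port(value: str) -> int:
    """Parse a TCP port number from untrusted string input."""
    port = int(value)  # raises ValueError on non-numeric input
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

def test_parse_port() -> None:
    """Exercise normal, boundary and hostile inputs before trusting the code."""
    assert parse_port("8080") == 8080
    assert parse_port("65535") == 65535
    for bad in ("0", "70000", "-1", "8080; rm -rf /"):
        try:
            parse_port(bad)
        except ValueError:
            continue  # rejected, as it should be
        raise AssertionError(f"accepted bad input: {bad!r}")

test_parse_port()
```

Tests like these catch functional bugs; they do not substitute for security review, license scanning or static analysis, all of which remain the developer's responsibility.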

Generative AI has unlocked a whole new world for software developers, but it should be seen as a tool to improve, not replace, traditional software development.

Aaron M. Levine is a shareholder, and Annika P. Pollard is an associate at Polsinelli PC.
