Promising new tech has 'staggeringly difficult' copyright problem: expert

Artificial intelligence platforms can be prompted to violate copyright and generate works nearly identical to protected content, opening up new legal questions.

Popular artificial intelligence platforms can at times return results that plagiarize copyrighted content, a problem that may not have an easy solution.

"This is a staggeringly difficult problem to solve as current AI cannot look at a blank canvas and create something original," Christopher Alexander, the chief analytics officer of Pioneer Development Group, told Fox News Digital. "Instead, generative AI draws from existing imagery and then follows a human being’s prompts. The only real possibility to combat this is to use other AI capabilities to counter plagiarism and automate [copyright] strikes for accounts that are repeat offenders."

Alexander's comments come after an IEEE Spectrum magazine report laid out the continued problem of AI plagiarism, both in popular large language model (LLM) platforms such as ChatGPT and in image-generation platforms such as Midjourney V6.

ChatGPT's apparent plagiarism issues have prompted a lawsuit from The New York Times against the platform's parent company, OpenAI, accusing the company of using the newspaper's copyrighted works.

In one example from the lawsuit, The New York Times includes a screenshot of ChatGPT output that reproduces a Times article nearly verbatim, raising legal questions about how platforms can be held responsible for violating copyright restrictions.

OpenAI did not respond to a Fox News Digital request for comment by time of publication.

"The core issue here is that these LLMs scraped data without any care for ownership in the first place," Aiden Buzzetti, the president of the Bull Moose Project, told Fox News Digital. "Yes, it's a travesty that people may be violating copyright of works they've never heard of before. The burden rests on the companies that have scraped data without paying royalties, lip service or developing any kind of safeguards on what material could put them in a legal liability."

But LLMs are not the only AI platforms with plagiarism problems: IEEE Spectrum points out that image generators such as Midjourney V6 also return plagiarized results.

Using Midjourney V6, the report's authors were able to generate images nearly identical to stills from popular movies such as "The Avengers." In another example, the authors got the platform to return nearly identical images of the TV show "The Simpsons" by prompting the AI with "popular '90s animated cartoon with yellow skin --v 6.0 --ar 16:9 --style raw."

"In light of these results, it seems all but certain that Midjourney V6 has been trained on copyrighted materials (whether or not they have been licensed, we do not know) and that their tools could be used to create outputs that infringe," the authors note.

Midjourney did not immediately respond to a Fox News Digital request for comment.

Phil Siegel, the founder of the Center for Advanced Preparedness and Threat Response Simulation, told Fox News Digital that plagiarism becomes more likely with more detailed prompts, because the AI has fewer words to choose from when generating a response.

"While it is unlikely that a simple prompt would cause plagiarism, it is much more likely that a crisp, targeted prompt might. The way to think about it is a question like ‘Find me a fun vacation’ has lots of training data to draw on and is probably less likely to plagiarize," Siegel said. "Asking a specific question like "'Find me the funnest water sports for a vacation in Aruba' might be more likely to copy responses because fewer words have a chance to be chosen in the response."

Meanwhile, Samuel Mangold-Lenett, a staff editor at The Federalist, said there is a way to fix the issue but noted that doing so would likely hinder development.

"Generative AI and LLMs operate within a secretive ecosystem known as 'black box.' Developers often disregard intellectual property by immersing their systems in these environments in order to saturate and fortify them with data," Mangold-Lenett told Fox News Digital. "This can be 'solved' by forcing transparency, but if this is done, it could slow AI development."

Data & News supplied by www.cloudquote.io
Stock quotes supplied by Barchart
Quotes delayed at least 20 minutes.
By accessing this page, you agree to the following
Privacy Policy and Terms and Conditions.