December 23, 2024

Brighton Journal

Complete News World

Your online content is free material for training models • Record

Mustafa Suleyman, CEO of Microsoft AI, said this week that machine learning companies can scrape most content posted online and use it to train neural networks because it is essentially “free software.”

Shortly afterward, the Center for Investigative Reporting filed a lawsuit against OpenAI and its largest investor, Microsoft, “for using the content of the non-profit news organization without permission or providing compensation.”

This follows in the footsteps of eight newspapers that filed a lawsuit against OpenAI and Microsoft over alleged content misappropriation in April, as The New York Times had done four months earlier.

Then there are the two authors who filed a lawsuit against OpenAI and Microsoft in January, claiming that the companies trained AI models on the authors’ works without permission. Also, in 2022, several anonymous developers filed a lawsuit against OpenAI and GitHub over allegations that the organizations used publicly posted programming code to train generative models in violation of software license terms.

Asked in an interview with CNBC’s Andrew Ross Sorkin at the Aspen Ideas Festival whether AI companies have effectively stolen the world’s intellectual property, Suleyman acknowledged the controversy and tried to distinguish between content people put online and content backed by corporate copyright holders.

“I think in terms of content that already exists on the open web, the social contract for that content since the 1990s has been fair use,” he said. “Anyone could copy it, recreate with it, reproduce with it. It was free software, if you like. That was the understanding.”

Suleyman pointed out that there is another category of content, which is material published by companies that have lawyers.

“There’s a separate category where a website, publisher or news organization has explicitly said, ‘Don’t scrape or crawl me for any reason other than to index me, so others can find this content,’” he explained. “But that’s a gray area. I think this will make its way through the courts.”

That’s an understatement. While Suleyman’s comments seem certain to anger content creators, he’s not entirely wrong: it’s not clear where the legal lines lie when it comes to training AI models and the models’ output.

Most people who post content online as individuals have signed away their rights in some way by accepting the terms of service agreements offered by major social media platforms. Reddit’s decision to license its users’ posts to OpenAI would not have happened if the social media giant believed its users had a legitimate claim to the memes and data it distributes.

The fact that OpenAI and other companies that make AI models are striking content deals with major publishers shows that a strong brand, enough money, and a legal team can bring big tech operations to the table.

In other words, those who create and publish content online are making free software unless they retain, or can attract, lawyers willing to challenge Microsoft and its ilk.

In a paper published on SSRN last month, Frank Pasquale, a professor of law at Cornell Tech and Cornell Law School in the US, and Hao-chen Sun, an associate professor of law at the University of Hong Kong, explore the legal uncertainty surrounding the use of copyrighted data to train AI and whether courts will find such use fair. They conclude that AI must be addressed at the policy level, because current laws are inadequate to answer the questions now being raised.

“Because there is significant uncertainty about the legality of AI providers’ use of copyrighted works, lawmakers will need to articulate a bold new vision for rebalancing rights and responsibilities, just as they did in the wake of the development of the Internet (leading to the Digital Millennium Copyright Act of 1998),” they argue.

The authors point out that the continued unpaid harvesting of creative work threatens not only writers, composers, journalists, actors, and other creative professionals, but also generative AI itself, which will eventually be deprived of training data. They predict that people will stop making work available online if it is simply used to power AI models that reduce the marginal cost of content creation to zero and deprive creators of the possibility of any compensation.

This is the future that Suleyman foresees. “The economics of information are about to change radically because we can reduce the cost of producing knowledge to zero in terms of marginal cost,” he said.

All that free software you probably helped create can be yours for a small monthly subscription fee. ®