Microsoft AI CEO Says AI Use of Web Content Doesn’t Require Permission

"Public content is free for AI to use," he declares.

By Kayne Andersen - Technology Editor
Mustafa Suleyman @ Google. Photo by Patrick T. Fallon / AFP

OpenAI’s CTO recently suggested that some of the creative jobs AI may eliminate perhaps should not have existed in the first place. The comments sparked outrage, especially among creative professionals. The situation is likely to worsen with new AI video generation tools such as Sora and Runway’s Gen-3, which are trained on the opaque pool of content that is the web. Copyright has been at the heart of the debate since generative AI emerged. While rights defenders push for licensing agreements, Microsoft AI’s CEO doesn’t seem concerned. He asserts that artificial intelligence can be trained on freely accessible web content without prior permission from its creators.

In an interview with American journalist Andrew Ross Sorkin (CNBC) at the Aspen Ideas Festival, Mustafa Suleyman, CEO of Microsoft AI since March, gave his personal definition of intellectual property on the web. Asked whether AI companies were stealing content from the web to train their large language models, Suleyman said: “I think when it comes to content that’s already on the open web, the social contract of that content since the ’90s has been fair use.” He added, “Everyone can copy it, recreate it, and reproduce it. That’s what we call ‘freeware,’ if you will, and that’s what we’ve understood.”

Fair use or theft? OpenAI in the crosshairs of justice

Microsoft AI’s CEO thus believes that content published and freely accessible online belongs to everyone and can be used by LLMs. This is far from the case. Fair use is a legal defense, assessed by a court, that permits limited use of copyrighted material for purposes such as criticism, review, research, or news reporting. Among other things, the court weighs whether the copying harms the copyright holder. What AI models do arguably goes beyond these limits. And given the vast amount of content these models process, it is difficult to determine how much any individual piece of content contributes to the resulting model.

Suleyman does acknowledge an exception to the rule, which he calls “the gray area” and which he expects the courts to settle. By gray area, he means a distinct category of websites, publishers, and news organizations that have explicitly stated they do not want their content scraped or crawled for any purpose other than indexing by search engines. “We then find ourselves in a gray area on which the courts will have to rule,” says Suleyman.

In any case, content published on the web remains in principle protected by copyright, whether in France, the United States, or elsewhere. It is precisely over alleged violations of this right that OpenAI and Microsoft now face several lawsuits, starting with the one filed by the New York Times in December 2023. Alden Global Capital followed with a series of lawsuits last May.

At the same time, OpenAI has signed agreements with publishers and press groups such as Le Monde, Axel Springer, the Financial Times, and News Corp. to use their content in exchange for payment. With these much-criticized deals, is the company implicitly acknowledging that content accessible to all should nonetheless be paid for?
