OpenAI’s Mira Murati is “not sure” where Sora’s training data comes from

Mira Murati, OpenAI’s chief technology officer, is unclear about where the training data for Sora, the company’s upcoming video-generating artificial intelligence model, comes from.

During an interview with The Wall Street Journal published on March 13, Murati offered vague responses when asked about the source of data for the company’s Sora model, which is capable of generating videos from text instructions.

“We used publicly available data and licensed data,” Murati said when asked how the company, valued at $80 billion, was training its upcoming model.

Joanna Stern of the Journal then asked whether Sora was trained on data from social media platforms, such as YouTube, Instagram, or Facebook. “I’m actually not sure about that,” Murati replied, adding:

“You know, if they were publicly available — publicly available to use. But I’m not sure. I’m not confident about it.”

Before moving to another topic, Stern mentioned OpenAI’s partnership with stock image company Shutterstock, asking if its data could be used to train Sora. “I’m just not going to go into detail about the data that was used. But it was publicly available or licensed data,” Murati added. Later, she confirmed to the Journal that Shutterstock data was used for Sora.

AI models are trained using large sets of data, known as training data sets, which help the model learn to recognize patterns, make predictions, or understand language.

OpenAI’s CTO Mira Murati during an interview with The Wall Street Journal. Source: WSJ

Murati has been at OpenAI since 2018, leading some of the company’s most popular projects, including the image-generation model DALL-E 3, the speech-recognition tool Whisper, and GPT-4, the latest version of the model behind the company’s chatbot, ChatGPT. In November 2023, she briefly took over as interim CEO after OpenAI’s board ousted Sam Altman.

OpenAI has been targeted by several legal actions involving its AI models’ training data. In July 2023, authors Sarah Silverman, Richard Kadrey, and Christopher Golden filed a lawsuit against the company, alleging that ChatGPT generates summaries of the authors’ works based on copyrighted content.

In December 2023, The New York Times sued Microsoft and OpenAI in a similar copyright infringement complaint, alleging that the companies used the newspaper’s content to train AI chatbots. A separate class-action lawsuit filed in California alleges that OpenAI scraped private user information from the internet to train ChatGPT without user consent.