Cointime

OpenAI’s Mira Murati is “not sure” where Sora’s training data comes from

The data source of OpenAI’s upcoming video-generating artificial intelligence model, Sora, is unclear to the company’s chief technology officer, Mira Murati.

During an interview with The Wall Street Journal published on March 13, Murati offered vague responses when asked about the source of data for the company’s Sora model, which is capable of generating videos from text instructions.

“We used publicly available data and licensed data,” Murati replied when asked how the company, valued at $80 billion, was training its upcoming model.

Joanna Stern, from the Journal, then asked whether Sora was trained with data from social media platforms, such as YouTube, Instagram, or Facebook. “I’m actually not sure about that,” Murati replied, adding:

“You know, if they were publicly available — publicly available to use. But I’m not sure. I’m not confident about it.”

Before moving to another topic, Stern mentioned OpenAI’s partnership with stock image company Shutterstock, asking if its data could be used to train Sora. “I’m just not going to go into detail about the data that was used. But it was publicly available or licensed data,” Murati added. Later, she confirmed to the Journal that Shutterstock data was used for Sora.

AI models are trained using large sets of data, known as training data sets, which help the model learn to recognize patterns, make predictions, or understand language.

OpenAI's CTO Mira Murati during an interview with The Wall Street Journal. Source: WSJ

Murati has been at OpenAI since 2018, leading some of the company’s most popular projects, including the image-generator model DALL-E 3, the speech-recognition tool Whisper and the latest version of the company’s chatbot GPT-4. In November 2023, she briefly took over as interim CEO after OpenAI’s board ousted Sam Altman.

OpenAI has been targeted by several legal actions involving its AI models’ training data. In July 2023, authors Sarah Silverman, Richard Kadrey, and Christopher Golden filed a lawsuit against the company, alleging that ChatGPT generates summaries of the authors’ works based on copyrighted content.

In December 2023, The New York Times sued Microsoft and OpenAI in a similar copyright infringement complaint, alleging that the companies used the newspaper’s content to train AI chatbots. A separate class-action lawsuit filed in California alleges that OpenAI scraped private user information from the internet to train ChatGPT without user consent.
