Sora: OpenAI’s text-to-video model

Written by Romina Makrooni

Romina Makrooni received her bachelor’s degree in English literature in 2020. She spent a few years teaching English as a second language, both during university and after graduating. At Holoflow, she has found her true passion in modern media technology, the metaverse, and volumetric capture.

February 26, 2024

OpenAI’s jaw-dropping text-to-video model, Sora, has caused quite a stir in tech and media since the company shared its research progress a few days ago. Sora turns text prompts into striking, lifelike videos up to a minute long. It can “generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.”

Sora has a deep comprehension of language, so it can follow prompts coherently and interpret requests accurately. Sora makes mistakes too, which I will mention shortly. For the most part, however, it seems to understand how things work in the physical world and produces highly realistic simulations in superb visual quality. The morphing that plagued earlier AI video systems is far less of a problem with Sora, because it generates an entire video at once rather than frame by frame. Sora can also animate a still image and turn it into a video.
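
To make the “all at once” point more concrete, here is a minimal conceptual sketch in Python (not OpenAI’s code; `generate_next_frame` and `denoise_step` are hypothetical placeholders) contrasting frame-by-frame generation, where small errors compound and subjects drift, with diffusion-style generation that refines a whole clip jointly.

```python
import numpy as np

# Purely illustrative; `generate_next_frame` and `denoise_step` are
# hypothetical stand-ins, not part of any real Sora API.

def frame_by_frame(generate_next_frame, first_frame, n_frames):
    """Autoregressive style: each frame depends only on the previous one,
    so small errors accumulate and subjects can 'morph' over the clip."""
    frames = [first_frame]
    for _ in range(n_frames - 1):
        frames.append(generate_next_frame(frames[-1]))
    return np.stack(frames)

def whole_clip_diffusion(denoise_step, prompt_embedding, n_frames, height, width, steps=50):
    """Diffusion style: the entire clip is treated as one block of
    (time, height, width, channel) values and refined jointly, so every
    frame stays consistent with every other frame at every step."""
    clip = np.random.randn(n_frames, height, width, 3)  # start from pure noise
    for t in reversed(range(steps)):
        clip = denoise_step(clip, prompt_embedding, t)  # update all frames together
    return clip
```

The second pattern is why a subject that appears in the first frame is far more likely to still look like itself a few hundred frames later.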

At this stage, Sora is not perfect. OpenAI shared flawless example videos alongside ones that failed. Sora struggles with some logical concepts such as cause and effect: a person might blow at the candles on a birthday cake, yet the candles stay lit and the flames don’t even flicker. Sora may also confuse left and right in a prompt, and it can have trouble with “precise descriptions of events that take place over time, like following a specific camera trajectory.” It is also worth noting that generating these videos demands enormous computing power, so Sora will need substantial server-side hardware to serve thousands of users at once.

Sora AI’s Problems and Solutions

There’s no clear information yet on when Sora will be available to the public. OpenAI is working with red teamers on safety concerns, including efforts to reduce misinformation, violent and hateful content, and bias. To make AI-generated videos distinguishable from other footage, OpenAI plans to include C2PA metadata. However, metadata is fragile and can easily be stripped or altered. The truth is that, despite all the safety and protection efforts, there will still be ways to abuse this product. Hopefully, the ways it can be put to good use will far outweigh them.
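
To see why metadata is such a weak safeguard, here is a toy Python sketch under heavy simplification: real C2PA uses cryptographically signed manifests and certificate chains rather than a shared secret, and every name and key below is hypothetical. The point it illustrates is that stripping the manifest does not flag a video as fake; it simply leaves the file with no provenance claim at all.

```python
import hashlib
import hmac

# Toy illustration only -- real C2PA provenance relies on signed manifests
# and certificate chains, not a shared secret like this.
SECRET_KEY = b"demo-signing-key"  # hypothetical key for the sketch

def attach_provenance(video_bytes: bytes) -> dict:
    """Bundle the video with a 'manifest' binding a signature to its content."""
    digest = hashlib.sha256(video_bytes).hexdigest()
    signature = hmac.new(SECRET_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"video": video_bytes, "manifest": {"sha256": digest, "sig": signature}}

def check_provenance(package: dict) -> str:
    manifest = package.get("manifest")
    if manifest is None:
        # Stripping the metadata does not mark the file as fake --
        # it just removes the evidence of how it was made.
        return "no provenance claim"
    digest = hashlib.sha256(package["video"]).hexdigest()
    expected = hmac.new(SECRET_KEY, digest.encode(), hashlib.sha256).hexdigest()
    if digest == manifest["sha256"] and hmac.compare_digest(expected, manifest["sig"]):
        return "valid provenance"
    return "tampered"

clip = b"\x00" * 1024                 # stand-in for video bytes
package = attach_provenance(clip)
print(check_provenance(package))      # -> valid provenance
package.pop("manifest")               # someone strips the metadata
print(check_provenance(package))      # -> no provenance claim
```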

As of now, OpenAI has granted access to Sora to selected visual artists, designers, and filmmakers, in the hope that their feedback will make the final product suitable for professionals in these fields. It is no secret that this text-to-video model has sparked heated discussions among video artists and filmmakers. I believe, however, that like ChatGPT or Midjourney, Sora is simply a productivity tool to aid professionals in their fields.

I want to refer to science fiction writer Arthur C. Clarke, the famous author of “2001: A Space Odyssey”. In his book “Profiles of the Future: An Inquiry into the Limits of the Possible”, first published in 1962, Clarke formulated his famous Three Laws, of which the third is the best known and most widely cited: “Any sufficiently advanced technology is indistinguishable from magic”.

Undoubtedly, a vast and unpredictable future with AI lies ahead of us. How we perceive and use it is what matters. It could be a friend or an enemy. The choice is ours to make.
