
OpenAI Sora: How good is it and where will it go?

Gong Zhe, Updated 12:57, 22-Feb-2024

OpenAI has stunned the world with Sora, its newest AI innovation that generates realistic videos from simple text descriptions.

For those unfamiliar with the wonders of AI-generated content, it's important to clarify that Sora doesn't simply stitch together multiple images. It creates dynamic video sequences with several key advantages over existing models.

Unlike other models, which are limited to clips of a few seconds, Sora generates videos up to a minute long. It is not restricted to static shots: it can produce pan shots, close-ups and wide shots. What's more, objects and backgrounds stay consistent throughout the video, avoiding jarring errors like hands whose number of fingers changes from frame to frame. This surpasses the capabilities of many community-driven projects.

Despite these impressive feats, Sora isn't flawless. While the generated environments look real, text elements like shop signs often lack meaning. They display nonsensical characters instead of accurate language. The first demo video on Sora's website – with a woman walking down a street – is a clear example of that.

Though adept at detail, Sora can still make mistakes. In the same street video, the feet of people in the crowd appear deformed.

However, these hiccups shouldn't overshadow Sora's potential. Models like this lay the groundwork for real-time video generation. Imagine computers creating video based on live input, revolutionizing fields like video games and entertainment.

To achieve this dream, significant computing power is required. One second of video needs at least a dozen frames, but current text-to-image models take seconds to produce a single frame, even on the best consumer PC hardware. Keeping pace with playback could therefore require at least a tenfold increase in computing power, creating a vast new market for hardware providers.
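The arithmetic behind that estimate can be sketched in a few lines of Python. This is a rough back-of-envelope illustration, not a measurement: the function name `realtime_speedup` is made up for this example, the frame rate of 12 comes from the "at least a dozen frames" figure above, and the per-frame times are assumed values chosen only to show how the gap scales.

```python
def realtime_speedup(frames_per_second: float, seconds_per_frame: float) -> float:
    """Return how many times faster generation must become to keep pace with playback.

    One second of video needs `frames_per_second` frames, and each frame
    currently takes `seconds_per_frame` seconds to generate, so one second
    of video costs their product in seconds of computing time.
    """
    if frames_per_second <= 0 or seconds_per_frame <= 0:
        raise ValueError("both arguments must be positive")
    return frames_per_second * seconds_per_frame


# "At least a dozen frames" per second of video, as stated in the article.
FRAMES_PER_SECOND = 12

# Assumed per-frame generation times in seconds (illustrative only).
for seconds_per_frame in (1.0, 2.0, 5.0):
    factor = realtime_speedup(FRAMES_PER_SECOND, seconds_per_frame)
    print(f"{seconds_per_frame:.0f} s per frame -> about {factor:.0f}x faster needed")
```

Even at an optimistic one second per frame, the required speedup is already above ten, which is why a tenfold increase reads as a floor rather than a ceiling.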

In conclusion, text-to-video models like Sora have crossed a critical threshold, becoming truly usable with exciting potential. Despite facing technical and moral hurdles, they're poised to propel the already booming AI market to new heights.

(Cover via CFP.)
