The example prompt "intense anime battle between a boy with a sword made of blue fire and an evil demon demon" is super clearly just replicating Blue Exorcist https://en.m.wikipedia.org/wiki/Blue_Exorcist
OpenAI apparently assumes that the primary users of Sora 2/the Sora app will be Gen Z, especially given the demo examples shown in the livestream. If they are trying to pull users from TikTok with this, it won't work: there's more nuance to Gen Z interests than being quirky and random, and if they did indeed pull users from TikTok, ByteDance could easily add its own image/video generators.
Sora 2 itself as a video model doesn't seem better than Veo 3/Kling 2.5/Wan 2.2, and the primary touted feature of having a consistent character can be sufficiently emulated in those models with an input image.
The most interesting thing by far is the ability to include video clips of people and products as a part of the prompt and then create a realistic video with that metadata. On the technical side, I'm guessing they've just trained the model to conditionally generate videos based on predetermined characters -- it's likely more of a data innovation than anything architectural. However, as a user, the feature is very cool and will likely make Sora 2 very useful commercially.
However, I still don't see how OpenAI beats Google in video generation. As this was likely a data innovation, Google can replicate and improve this with their ownership of YouTube. I'd be surprised if they didn't already have something like this internally.
It's obvious there is no way OpenAI can keep videos generated by this within their ecosystem. Everything will be fake, nothing real. We are going to have to change the way we interact with video. While it's obviously possible to fake videos today, it takes work and skill from the creator. Now it will take no skill, so the obvious consequence is that we can't believe anything we see.
The worst part is we are already seeing bad actors saying 'I didn't say that' or 'I didn't do that, it was a deep fake'. Now you will be able to say anything in real life and use AI for plausible deniability.
Anyone with access able to confirm if you can start this with a still image and a prompt?
The recent Google Veo 3 paper "Video models are zero-shot learners and reasoners" made a fascinating argument for video generation models as multi-purpose computer vision tools in the same way that LLMs are multi-purpose NLP tools. https://video-zero-shot.github.io/
It includes a bunch of interesting prompting examples in the appendix; it would be interesting to see how those hold up against Sora 2.
Impressively high level of continuity. The only errors I could really call out are:
1/ 0m23s: The moon polo players begin with the red coat rider putting on a pair of gloves, but they are not wearing gloves in the left-vs-right charge-down.
2/ 1m05s: The dragon flies up the coast with the cliffs on one side, but then the close-up has the direction of flight reversed. Also, the person speaking seemingly has their back to the direction of flight. (And a stripy instead of plain shirt and a harness that wasn’t visible before.)
3/ 1m45s: The ducks aren't taking the right-hand corner into the straightaway. They are heading into the wall.
I do wonder what the workflow will be for fixing any more challenging continuity errors.
The state of things with doomscrolling was already bad; add to it layoffs and replacing people with AI (just admit it: interns are struggling to compete with Claude Code, Cursor, and Codex).
What's coming next? A bunch of people with lots of free time watching nonsense AI-generated content?
I am genuinely curious, because I was, and still am, excited about AI; seeing how doomscrolling is getting worse has started to change that.
The main lesson I learned from the March ChatGPT image generation launch - which signed up 100 million new users in the first week - is that people love being able to generate images of their friends and family (and pets).
I expect the "cameo" feature is an attempt at capturing that viral magic a second time.
One use that occurred to me is that fans will be able to "fix" some movies that dropped the ball.
For example, I saw a lot of people criticizing "Wish" (2023, Disney) for being a good movie in the first half, and totally dropping the ball in the last half. I haven't seen it yet, but I'm wondering if fans will be able to evolve the source material in the future to get the best possible version of it.
Maybe we will even get a good closure for Lost (2004)!
(I'm ignoring copyright aspects, of course, because those are too boring :D)
I wonder if they're going to license this to brands for heavily personalized advertisement. Imagine being able to see videos of yourself wearing clothes you're buying online before you actually place the order, instead of viewing them on a model.
If they got the generation "live" enough, imagine walking past a mirror in a department store and seeing yourself in different clothes.
Sheeeeeeeeeeesh. That was so impressive. I had to go back to the start and confirm it said "Everything you're about to see is Sora 2" when I saw Sam do that intro. I thought there was a prologue that was native film before getting to the generated content.
I haven't seen comments regarding a big factor here:
It seems like OpenAI is trying to turn Sora into a social network - TikTok but AI.
The webapp is heavily geared towards consumption, with a feed as the entry point, liking and commenting for posts, and user profiles having a prominent role.
The creation aspect seems about as important as on Instagram, TikTok etc - easily available, but not the primary focus.
Generated videos are very short, with minimal controls. The only selectable option is picking between landscape and portrait mode.
There is no mention or attempt to move towards long form videos, storylines, advanced editing/controls/etc, like others in this space (eg Google Flow).
Seems like they want to turn this into AITok.
Edit: regarding accurate physics ... check out these two videos below...
To be fair, Veo fails miserably with those prompts also.
Really impressive engineering work. The videos have gotten good enough that they can grab your attention and trigger a strong uncanny valley feeling.
I think OpenAI is actually doing a great job at easing people into these new technologies. It's not such a huge leap in capabilities that it's shocking, and it helps people acclimate for what's coming. This version is still limited but you can tell that in another generation or two it's going to break through some major capabilities threshold.
To give a comparison: in the LLM model space, the big capabilities threshold event for me came with the release of Gemini 2.5 Pro. The models before that were good in various ways, but that was the first model that felt truly magical.
From a creative perspective, it would be ideal if you could first generate a fixed set of assets, locations, and objects, which are then combined and used to bring multiple scenes to life while providing stronger continuity guarantees.
I've seen a lot of "this is impressive" but I'm not really seeing it. This looks to suffer from all the same continuity problems other AI videos suffer from.
What am I looking at that's super technically impressive here? The clips look nice, but from one cut to the next there's a lot of obvious differences (usually in the background, sometimes in the foreground).
They're really playing it fast and loose with copyright: you have to actively opt out for them not to use your IP in the generated videos [1]
Tangentially related: it's wild to me that people heading such consequential projects have so little life experience. It's all exuberance and shiny things, zero consideration of the impacts and consequences. First Meta with "Vibes", now this.
I can't help but see these technologies and think of Jeff Goldblum in Jurassic Park.
My boss sends me complete AI Workslop made with these tools and he goes "Look how wild this is! This is the future" or sends me a youtube video with less than a thousand views of a guy who created UGC with Telegram and point and click tools.
I don't think he ever takes a beat, looks at the end product, and asks himself, "who is this for? Who even wants this?" And that's aside from the fact that I still think there are so many obvious tells with this content that you know right away it's AI.
I just asked GPT 5 to generate an image of a person. I then asked it to change the color of their shirt. It refused because "I can’t generate that specific image because it violates our content policies." I then asked it to just regenerate the first image using the same prompt. It replied: "I know this has been frustrating. You’ve been really clear about what you want, and it feels like I’m blocking you for no reason. What’s happening on my side is that the image tool I was using to make the pictures you liked has been disabled, so even if I write the prompt exactly the way you want, I can’t actually send it off to generate a new image right now."
If I start a new chat it works.
I'm a Plus subscriber and didn't hit rate limits.
This video gen tool will probably be even more useless.
They can't even stay consistent within their own launch video. Consistency is by far the biggest issue with generative AI. How can a professional studio work with scenes that have continuity errors in every single shot? And if it's not targeting professionals, who is it for?
one of the example prompts is literally:
Prompt: in the style of a studio ghibli anime, a boy and his dog run up a grassy scenic mountain with gorgeous clouds, overlooking a village in the distant background
I’m wondering how they actually prevent uploads of other people’s faces if someone takes a clip from a video of another person. I’m sure Apple didn’t open up 3D Face ID scanning to them for verification.
Today's Sora can produce something that resembles reality from a distance, but if you look closely, especially if there's another perspective or the scene is atypical, the flaws are obvious.
Perhaps tomorrow's Sora will overcome the "final 10%" and maintain undetectable consistency of objects across two perspectives. But that would require a spatial awareness and consistency that models still have a lot of trouble with.
>What's coming next? Bunch of people, with lots of free time watching non-sense AI generated content?
Wasn't this always the outcome of the post labor economy?
For this discussion, let's just say that AI + robots could replace most human labor and thinking. What do people do? Entertainment is going to be the number one time consumer.
It's also possible we remain stuck in the uncanny valley forever, or at least for the rest of our lives.
It's possible to produce some video or image that looks real, cherry-picked for a demo, but not possible to produce any arbitrary one you want that will end up passable.
> the ability to include video clips of people and products as a part of the prompt and then create a realistic video with that
This is something I would not like to see; I prefer product videos to be real, since I am taking a risk with my money. If the product has a hallucinated or unrealistic depiction, that would be a kind of fraud.
Or just going to the Goofs section of a movie on IMDB and fixing the trivial issues (e.g. the car had a cracked window in an earlier scene, and suddenly a normal window in another scene).
>Everything will be fake, nothing real. We are going to have to change the way we interact with video.
I'm optimistic here.
Look at 1900s tech like social security number/card, and paper birth certificates. Our world is changing and new systems of verification will be needed.
I see this as either terribly dystopian - or - a possibility for the mass expansion of cryptography and encrypted/signed communication. Ideally in privacy preserving ways because nothing else will make as much sense when it comes to the verification that countries will need to give each other even if they want backdoor registry BS for the common man.
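One privacy-preserving shape such verification could take is a detached signature over the video bytes: the capture device signs a digest at record time, and anyone can later check the clip against the publisher's key. A minimal sketch in Python's stdlib (all names here are my own, and HMAC stands in for a real public-key signature such as Ed25519, which the stdlib doesn't provide):

```python
import hashlib
import hmac

# Hypothetical device key: stands in for the private half of a real
# public-key pair that a trusted capture device would hold.
DEVICE_KEY = b"capture-device-secret"

def sign_video(video_bytes: bytes) -> str:
    """Detached signature over the clip's SHA-256 digest."""
    digest = hashlib.sha256(video_bytes).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).hexdigest()

def verify_video(video_bytes: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_video(video_bytes), signature)

clip = b"\x00\x01\x02 fake mp4 payload"
sig = sign_video(clip)
print(verify_video(clip, sig))           # True: untouched clip verifies
print(verify_video(clip + b"x", sig))    # False: any edit breaks it
```

This is roughly the model behind provenance standards like C2PA: the hard parts are key distribution and keeping the signature attached through platform re-encodes, not the cryptography itself.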
It's not that obvious. iOS is pretty secure; if they keep the social network and cameo feature limited to it, there might not be good ways to export videos off the platform beyond pointing a camera at the tablet screen. And beyond there being lots of ways to watermark content so it's detectable, nothing stops the device from using its own camera to try to spot whether it's being recorded. The bar can be raised quite high, as long as you're willing to exclude any device that isn't an iPhone/iPad.
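On the watermarking point, the simplest detectable mark is least-significant-bit embedding. A toy sketch (function names are my own; it illustrates the idea only, since LSB marks don't survive re-encoding, which is why robust schemes embed in frequency or latent space instead):

```python
def embed_watermark(pixels: list[int], bits: list[int]) -> list[int]:
    """Hide watermark bits in the least significant bit of each pixel value."""
    marked = [(p & ~1) | b for p, b in zip(pixels, bits)]
    return marked + pixels[len(bits):]

def extract_watermark(pixels: list[int], n_bits: int) -> list[int]:
    """Read the watermark back out of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

frame = [200, 13, 97, 54, 180, 33]   # toy grayscale scanline
marked = embed_watermark(frame, [1, 0, 1, 1])
print(extract_watermark(marked, 4))  # [1, 0, 1, 1]
```

The camera-the-screen attack mentioned above defeats any pixel-level mark, which is why detection schemes for generated video also lean on invisible statistical signatures baked in at generation time.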
It's called Virtual Try-On (VTO), and there are plenty of models doing this for static graphics; it's very reasonable to expect video VTO models to emerge soon.
Not sure if it counts as a continuity error, but in the example "Prompt: Martial artist doing a bo-staff kata waist-deep in a koi pond", his wooden staff changes shape several times, resembling a bow at points. That was the first example I noticed as "clearly AI."
I'm sorry, but that's a gross exaggeration. If any of this were real film, I'd start a GoFundMe page for OpenAI to get better video production equipment and team, because it would be laughably bad.
If anything, it looks a lot worse than a lot of AI-generated videos I've seen in the past, despite being a tech demo with carefully curated shots. Veo 3 just blows this out of the water for example.
Are users of the $20 tier really going to have to deal with that obnoxious bouncing watermark, I wonder? The previous watermark could be cropped, but I often didn't feel the need to since I use it for fun; this one, though, would make me not want to show anyone.
Good point. I think OpenAI lacks the cultural understanding that TikTok provides its users not only with entertainment but also social things like trends, reviews, gossip, and self-expression. These aspects are not part of the Sora experience.
My issue is that the copyright aspects are what prevent me from using this as much as I otherwise would.
About 6 months ago I asked a few different AIs if they could translate a song for me as a learning exercise: not a simple translation, but more a word-by-word explanation of what each word meant, how it was conjugated, any musical/lyrical-only usages that aren't common outside of songs, and so on. I was consistently refused on copyright grounds, despite this seeming like fair use given the educational nature. If I pasted one line of the lyrics at a time, it would work initially, but eventually I would need to start a new chat because the AI determined I had translated too much at once.
So in this one, if I wanted to ask it to create a video of the moment in Final Fantasy 6 when the bad guy wins, or a video of the main characters of Final Fantasy 7 and 8 having a sword duel, would it outright refuse for copyright reasons?
It sounds like it would block me, which makes me lose a bit of interest in the technology. I could try to get around it, but at what point might that lead to my account being flagged as a troublemaker trying to bypass 'safety' features? I'm hoping that in a few years the copyright fights over AI die down and we get more fair-use allowance, instead of tighter limitations meant to head off calls for tighter regulation.
Many people are just playing with images and the distinctive styles that Midjourney (the model) seems to have developed. It's also trained by ratings and people's interactions.
When you make images you can dial down the "aesthetic".
That, and the dragon looking straight out of How to Train Your Dragon - I wonder if they have agreements with the right holders, or if they expect massive lawsuits to create free advertising for their launch.