
DeepSeek V4 analysis: What's the point of topping the AI leaderboard if nobody can afford you?

Roughly two weeks ago, two large AI models dropped almost simultaneously. OpenAI released GPT-5.5, a flagship model whose benchmark scores reset the leaderboard, while DeepSeek, the AI lab based in Hangzhou, China, released its own new model, V4.

The contrast was not what anyone expected. OpenAI's launch was a victory lap. DeepSeek came with a quiet admission buried in its technical report: V4, by its own reckoning, trails GPT-5.4 and Gemini 3.1 by roughly three to six months.

In an industry where every model launch is accompanied by a chart showing you're somehow beating everyone at something, this was close to unheard of.

So why would DeepSeek, a lab that has terrified Western AI companies with its cost efficiency, openly admit it is not winning the raw capability race?

The engineering that matters more than benchmarks

The specs that got the headlines – free download, million-token context, absurdly low pricing – are not what makes V4 strategically significant. The real story is what happens when you actually try to use a million-token context window.


In a standard model, memory requirements explode as the context grows longer, because the model keeps a record of every token it has seen. V4 solves this with a brutally effective trick: compressing that record. Think of it as the difference between keeping every frame of a movie versus keeping a summary of each scene. As a result, the memory burden dropped by as much as 90 percent.
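The scale of that saving is easier to feel with a little arithmetic. The sketch below uses an invented per-token figure, not published V4 numbers, purely to show how cache size grows with context length and what a roughly 90 percent compression buys:

```python
# Rough illustration of why long contexts strain memory and what
# ~90% cache compression buys. All figures are illustrative
# assumptions, not published V4 specifications.

BYTES_PER_TOKEN_FULL = 1_000_000   # assumed per-token cache cost (1 MB)
COMPRESSION_RATIO = 0.10           # keep ~10% after compression (90% savings)

def kv_cache_gb(context_tokens: int, compressed: bool = False) -> float:
    """Estimated cache size in gigabytes for a given context length."""
    per_token = BYTES_PER_TOKEN_FULL * (COMPRESSION_RATIO if compressed else 1.0)
    return context_tokens * per_token / 1e9

for ctx in (128_000, 1_000_000):
    full = kv_cache_gb(ctx)
    small = kv_cache_gb(ctx, compressed=True)
    print(f"{ctx:>9,} tokens: full cache ~{full:.0f} GB, compressed ~{small:.0f} GB")
```

Under these toy numbers, a million-token context that would demand a cache in the terabyte range shrinks to something a serious but ordinary machine can hold, which is why the technique matters most at exactly the context lengths V4 advertises.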

Another piece of engineering worth understanding is that the DeepSeek team traded raw benchmark performance for training stability – a choice that benchmark-obsessed labs won't typically make.


Taken together, these engineering choices point in a single direction. DeepSeek is not trying to build the most powerful model possible. It is trying to build the most practical one. The one that runs on hardware people actually own, at costs that do not require a corporate budget meeting to approve. That turns out to be a much harder engineering problem, and a much more interesting one.

The hidden migration

V4 was supposed to ship months earlier. It did not. The reason was not some mysterious training instability or last-minute architecture change. It was chips.

DeepSeek decided to migrate V4 from Nvidia's CUDA ecosystem to Huawei's Ascend compute platform.

For years, the unspoken rule of frontier AI has been that you run on Nvidia – not because Nvidia's silicon is inherently superior in every dimension, but because the CUDA software ecosystem is so deeply embedded that switching is economically irrational. Developers trained on CUDA. Frameworks were built for CUDA. Operator libraries assumed CUDA. Changing chips meant rebuilding your entire toolchain.

The DeepSeek team had to rewrite more than 200 core operators from scratch. Early training runs on Ascend hardware repeatedly crashed. Huawei sent engineers on-site. The collaboration eventually produced a working pipeline, but it took 15 months of what multiple sources have described as a punishing, grinding migration.

One engineer calculated on social media that running V4-Flash on an Ascend 950 super-node, at equivalent concurrency, costs 40% to 60% less in hardware than a comparable Nvidia setup. "Not because I want to support domestic chips," they wrote. "Because the math works." V4 and Ascend have demonstrated that a non-CUDA path is viable.
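The engineer's exact figures are not public, but the shape of the calculation is simple: price per unit times units needed at equal concurrency, for each platform. The sketch below uses invented unit prices and node counts, chosen only so the result lands inside the 40 to 60 percent range the post described:

```python
# Hypothetical hardware-cost comparison at equal concurrency.
# All numbers are invented placeholders showing the shape of the
# math, not real Ascend or Nvidia pricing.

def hardware_cost(unit_price: float, units_needed: int) -> float:
    """Total hardware outlay for a deployment of identical units."""
    return unit_price * units_needed

nvidia = hardware_cost(unit_price=30_000, units_needed=16)  # assumed setup
ascend = hardware_cost(unit_price=18_000, units_needed=14)  # assumed setup

savings = 1 - ascend / nvidia
print(f"Ascend setup costs {savings:.0%} less than the Nvidia setup")
```

The point of the quote is not the exact percentage but that the comparison has become worth making at all.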

The architecture that legislates for future hardware

There is a subtler thread here that has gotten almost no English-language coverage. V4's hybrid attention architecture is an implicit specification for the next generation of AI chips.

The memory compression technology shifts the bottleneck away from memory and toward compute. If a model like V4 becomes the dominant architecture, the optimal chip design changes. You no longer need enormous high-bandwidth memory. The software is, in effect, writing the hardware spec.

This is the inverse of how Nvidia built its dominance. CUDA created a software ecosystem so deeply entrenched that hardware customers could not leave. DeepSeek is attempting the mirror image: create a model architecture so compelling and so efficient that any chip hoping to run future AI workloads will have to follow its blueprint. It is software defining hardware, and it would put a Chinese AI company – not a Silicon Valley chip giant – in the architect's chair.

By openly conceding the raw capability crown, DeepSeek is doing something that no Western AI lab has been willing to do: reframing what "winning" means.
