DeepSeek just ignited the next phase of the AI price war. On Monday, the Hangzhou-based lab announced a 75% discount on its freshly launched V4-Pro model, dropping input tokens to $0.435 per million from $1.74 and output to $0.87 from $3.48. The promotion runs through May 5, 2026. Cache-hit input prices across the entire API lineup fell to one-tenth of prior levels, immediately benefiting heavy users.
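As a quick sanity check on the announced figures, the sketch below applies a 75% discount to the two list prices quoted above. The helper name `discounted` is illustrative, and `Decimal` is used only to avoid floating-point rounding noise on currency values.

```python
from decimal import Decimal


def discounted(list_price: str, discount_pct: int) -> Decimal:
    """Return the price per million tokens after a percentage discount."""
    price = Decimal(list_price)
    return price * (Decimal(100) - Decimal(discount_pct)) / Decimal(100)


# List prices per million tokens, as quoted in the article.
input_promo = discounted("1.74", 75)
output_promo = discounted("3.48", 75)

print(f"input:  ${input_promo}")
print(f"output: ${output_promo}")
```

Both results agree with the promotional prices in the announcement, $0.435 for input and $0.87 for output.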
This isn’t hype. At full price—before the cut—V4-Pro already undercut OpenAI’s GPT-5.5 ($5 input, $30 output per million tokens) and Anthropic’s Claude Opus 4.7 ($5 input, $25 output). Now? Developers can process massive workloads for pennies. A billion tokens monthly on V4-Pro with caching might cost $28. GPT-5.4? Around $2,500. Claude Opus 4.6? Closer to $15,000. VentureBeat crunched those numbers, highlighting how DeepSeek compresses frontier economics into budget tiers.
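To see how the per-million prices translate into a monthly bill, here is a minimal sketch using only the list and promotional prices quoted in this article. The 80/20 input-to-output split is an assumption made for illustration, and caching is ignored entirely, so the totals will not match the VentureBeat figures, which depend on cache-hit rates and a different workload mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Prices as quoted in the article; no cache-hit pricing is modeled.
PRICES = {
    "V4-Pro (promo)": Price(0.435, 0.87),
    "V4-Pro (list)": Price(1.74, 3.48),
    "GPT-5.5": Price(5.0, 30.0),
    "Claude Opus 4.7": Price(5.0, 25.0),
}


def monthly_cost(price: Price, total_tokens: int, input_share: float = 0.8) -> float:
    """Estimate cost in USD; input_share is an assumed fraction of input tokens."""
    input_m = total_tokens * input_share / 1_000_000
    output_m = total_tokens * (1 - input_share) / 1_000_000
    return input_m * price.input_per_m + output_m * price.output_per_m


for name, price in PRICES.items():
    print(f"{name}: ${monthly_cost(price, 1_000_000_000):,.2f}")
```

Under that assumed split, a billion tokens comes to roughly $522 at the promotional price, against about $10,000 for GPT-5.5 and $9,000 for Claude Opus 4.7. Changing `input_share` shows how sensitive the gap is to output-heavy workloads.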
But performance delivers the punch. V4-Pro packs 1.6 trillion total parameters in a mixture-of-experts setup, activating just 49 billion per task. Its sibling, V4-Flash, scales down to 284 billion total (13 billion active). Both handle one million-token contexts natively—a boon for codebases, documents, or agent chains that choke lesser models.
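The mixture-of-experts figures are easier to compare as ratios. This small sketch computes what share of each model's parameters is active per token, using only the counts quoted above; the dictionary layout and function name are illustrative.

```python
# (total parameters, active parameters) as quoted in the article.
MODELS = {
    "V4-Pro": (1.6e12, 49e9),
    "V4-Flash": (284e9, 13e9),
}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters used for a single token in a mixture-of-experts model."""
    return active_params / total_params


for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.2%} of parameters active")
```

V4-Pro activates about 3% of its weights per token and V4-Flash about 4.6%, which is why compute cost tracks the active count rather than the headline total.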
Architecture Drives the Efficiency Edge
Hybrid attention makes it possible. Compressed Sparse Attention interleaves with Heavily Compressed Attention, slashing single-token inference FLOPs to 27% of DeepSeek-V3.2’s at full context. KV cache? Down to 10%. Add FP4/FP8 quantization, and memory halves versus pure FP8. No wonder it runs on Huawei Ascend 950 chips or Cambricon gear, sidestepping Nvidia dependence. Wei Sun, principal analyst at Counterpoint Research, called this shift a way to build AI “without relying solely on Nvidia,” speeding domestic and global adoption (The Next Web).
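A back-of-the-envelope sketch of what those ratios mean in practice follows. It assumes every weight is stored at a single precision and ignores quantization scale factors, activations, and runtime overhead, so it is an upper bound on the FP4 saving rather than a measurement. The 100 GB baseline KV cache is a hypothetical figure chosen only to show the scaling.

```python
def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in bytes, assuming a uniform precision for all weights."""
    return num_params * bits_per_param / 8


V4_PRO_PARAMS = 1.6e12  # total parameters, as quoted in the article

fp8_tb = weight_bytes(V4_PRO_PARAMS, 8) / 1e12
fp4_tb = weight_bytes(V4_PRO_PARAMS, 4) / 1e12
print(f"FP8 weights: {fp8_tb:.1f} TB, FP4 weights: {fp4_tb:.1f} TB")

# Ratios relative to DeepSeek-V3.2 at full context, as quoted in the article.
FLOPS_RATIO = 0.27
KV_CACHE_RATIO = 0.10


def scaled(baseline: float, ratio: float) -> float:
    """Apply a quoted relative ratio to a baseline measurement."""
    return baseline * ratio


# Hypothetical baseline: a 100 GB KV cache on the older model.
print(f"KV cache: {scaled(100.0, KV_CACHE_RATIO):.0f} GB instead of 100 GB")
```

A real FP4/FP8 mix would land somewhere between the two weight totals, depending on which layers keep the higher precision.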
Benchmarks back the claims. DeepSeek-V4-Pro-Max sits at or near the top among open models. LiveCodeBench: 93.5% pass@1, edging Gemini-3.1-Pro High’s 91.7%. Codeforces rating: 3206, ahead of GPT-5.4’s 3168. GPQA Diamond: 90.1%, just behind Kimi K2.6 Thinking’s 90.5%. Agentic tasks shine too: Terminal Bench 2.0 at 67.9%, topping GLM-5.1’s 63.5%; SWE Verified resolved at 80.6%, matching Gemini-3.1-Pro. On Artificial Analysis’s Intelligence Index, V4-Pro-Max scores 52, second among open-weights models behind Kimi K2.6’s 54 (Hugging Face model card; Artificial Analysis).
And long context holds up. MRCR 1M: 83.5 MMR. CorpusQA 1M: 62.0 accuracy. That’s coherence where rivals falter. DeepSeek’s official preview claims V4-Pro leads open models in world knowledge (trailing only Gemini-3.1-Pro), beats other open models on math, STEM, and coding, and rivals closed frontier models overall (DeepSeek API Docs).
Three reasoning modes sweeten the deal. Non-Think for speed. High for logic. Max unleashes full power, boosting GPQA Diamond to 90.1% from Non-Think’s 72.9%. V4-Flash keeps pace on simpler agents, making tiered deployment straightforward.
The weights are open under an MIT license. Download them from Hugging Face or ModelScope. The API mirrors the OpenAI and Anthropic formats. Akshar Keremane, O-Health co-founder, sees this combination of price, source availability, and a million-token window as lowering barriers for startups and enterprises (The Next Web).
Pressure Mounts on Western Giants
So what now? Enterprises rethink budgets. Cline CEO Saoud Rizwan noted that Uber’s Claude spend, enough for four months, would stretch to seven years on DeepSeek (Yahoo Tech). Indie devs ditch $20 subs; Ollama cloud runs V4-Pro for zero recurring cost. X chatter echoes this: developers praise its coding prowess, with one survey of 85 DeepSeek users showing 52% ready to swap primary models (Hugging Face blog).
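The four-months-versus-seven-years comparison implies a specific cost ratio. The sketch below works it out, assuming the same workload and a constant monthly burn on both providers; the function name is illustrative.

```python
def runway_multiplier(baseline_months: float, alternative_years: float) -> float:
    """How many times longer the same budget lasts on the cheaper provider."""
    return alternative_years * 12 / baseline_months


multiplier = runway_multiplier(baseline_months=4, alternative_years=7)
print(f"Same budget lasts {multiplier:.0f}x longer")
```

Seven years is 84 months, so the claim amounts to the same work costing about one twenty-first as much.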
China’s labs keep closing gaps. DeepSeek-V4-Pro trails GPT-5.5 or Opus 4.7 by 3-6 months, per its engineers, yet costs one-sixth as much on the API (The Register). Zhang Yi of iiMedia hails the architecture as an inflection point for long-context apps. But risks linger: hallucination rates hit 94%, per Artificial Analysis. Geopolitics too: U.S. export curbs push reliance on Huawei, yet broaden non-Nvidia paths.
Frontier AI democratized. Again. DeepSeek forces the question: Why pay premiums when open alternatives deliver 90% there for 10% the bill? Pricing floors collapse. Expect cascades—more discounts, rushed releases, frantic optimizations. The race accelerates.


WebProNews is an iEntry Publication