Insane. 3 points behind opus on the artificialanalysis index.
Mimo cost ~$400 at the old price, so about $40 today.
Opus cost ~$5000
That's over 100x cheaper, and just 3 points behind.
I can't wait to experiment with an llm consortium of 100 deepseek and mimo models.
Crazy times.
Shut up and take my m̶o̶n̶e̶y̶ data!
Edit: Gemini on google search told me I could write strikethrough text on hn using <s>. Mimo told me it was unsupported and then went on to list some tags that are supported, like <b>bold</b>. I tried copy pasting the word in strikethrough from a word processor but it lost the format. I ended up using mimo in an agent shell wrapper to produce it, and copy pasting from the terminal worked for some reason.
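For anyone wondering how the terminal output survived the copy-paste: the struck-through word is plain text, with each letter followed by the Unicode combining character U+0336 (COMBINING LONG STROKE OVERLAY). A minimal sketch of the trick; the function name is just illustrative, and this is not necessarily what the agent wrapper ran:

```python
# U+0336 is the Unicode "COMBINING LONG STROKE OVERLAY" character.
COMBINING_LONG_STROKE_OVERLAY = "\u0336"


def strikethrough(text: str) -> str:
    """Append U+0336 after every character so renderers draw a line through it."""
    return "".join(ch + COMBINING_LONG_STROKE_OVERLAY for ch in text)


print(strikethrough("money"))
```

Because the result is ordinary text rather than markup, it works on sites that strip HTML tags. How cleanly the stroke renders depends on the font.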
I had a subscription before the price was cut down; the model kept randomly looping with the same character (burning 30% of the budget in one shot), and the overall performance for agentic purposes is, simply put, terrible.
It finds non-existing bugs and randomly removes chunks of code to fix them, then even presents it as an "extra fix".
Maybe it's a good generalist model; I haven't tested it in that regard.
MiniMax (currently 2.7) which is a ~270B model tuned exclusively for agentic purposes, performs so MUCH better; it's more reliable and cheaper. Both are still far away from Opus 4.7 that I'm using at work. IMO benchmarks are just a very rough estimation; everyone cheats as much as they can get away with. Test the model yourself; do not make any assumptions based on the benchmarks.
I would love to see specialized, cheaper, bleeding-edge models like MiniMax for other non-agentic purposes as well. Why pay $1 for a general model when, for example, you can pay $0.1 for a content-moderator model that you actually need?
Funny, I had the opposite experience with MiniMax and MiMo when using OpenCode. MiniMax got stuck looping through broken tool calls all the time, and MiMo just powered through things and for the most part just worked.
benchmarks we deserve: google search quick ai answers vs full llm model :)
search answers use Flash 3.5
they use a "low" flavor of it to scale it on billions of users
So I tried the $16/mo token plan. Burned through 31% of monthly budget in one 1-2h session of a small C project refactoring, saw some not great behavior (hey subagent, read me back these 6 files exactly - which probably burned a lot of output tokens) and will cancel, obviously.
This is waaaaay more constrained than even Claude Pro plan, let alone Deepseek V4 or Kimi K2.6 pricing.
What did MiMo say?
Says it's not supported and lists a few tags that are, like <b>bold</b>
Does this work: s̶t̶r̶i̶k̶e̶t̶h̶r̶o̶u̶g̶h̶
Well done. Unicode wins again. 𓂺
That's deliberate. US AI companies have no chance of recouping even a fraction of their valuations.
PS: Have not tried this but DeepSeek V4 Flash (not even the DeepSeek V4 Pro version) set to "high" has pretty much Claude Opus 4.7-level capabilities and is lightning fast and dirt cheap. Hours and hours of conversation barely cost a few cents.
DeepSeek Flash on high (not max) is a freak of nature indeed.
Very disproportionate intelligence-to-cost ratio.
I'm leveraging this temporary anomaly and using it as my coding workhorse.
The weights are open and when prices settle down again will be runnable with less than 10k of hardware.
I can easily run it in an 8-bit quant with the 4 x 48GB Radeon Pro W7900 GPUs I snagged for 2k each before the memory squeeze.
A 158B parameter model, especially in an architecture as efficient as DS4 is not that hard to drive currently if you got in before the craze, and will be relatively easy to drive with future hardware generations.
Doesn't DeepSeek-V4-Flash have 284B parameters?
You are correct.
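A back-of-the-envelope check of why the parameter count matters here, counting weights only. This ignores KV cache, activations, and runtime overhead, and treats GPU "GB" as decimal gigabytes, so real requirements are higher:

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB: parameters (in billions) x bytes per weight."""
    return params_billion * bits_per_weight / 8


# Four 48 GB cards, as described upthread.
total_vram_gb = 4 * 48

for params in (158, 284):      # the figure claimed upthread vs. the corrected one
    for bits in (8, 4):        # 8-bit and 4-bit quantization
        need = weight_gb(params, bits)
        verdict = "fits" if need <= total_vram_gb else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{need:.0f} GB of weights, {verdict} in {total_vram_gb} GB")
```

On these numbers, 158B at 8-bit squeezes into 192 GB, while 284B at 8-bit does not. The larger count would need roughly a 4-bit quant to fit on the same cards.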
what makes you so confident it’s temporary
I am very happy with DSv4 for their price/performance but neither of them are comparable to Opus.
But they're overall a good thing for us consumers even if we'll never use these models, it forces the prices down all around.
Yeah, I really like and use DSv4Pro for personal projects, but I also use Opus all the time at work and they are definitely not at the same level.
I can only conclude that people who claim they are aren't doing anything close to the edge of what these models are capable of or any niche things.
I would say DSv4Pro is around the same level as Sonnet.
> US AI companies have no chance of recouping even fraction of their valuations.
A big caveat here is that many US companies (particularly in sensitive industries, like defense) will likely not want to (or not be allowed to) use Chinese models for anything of substance.
What about self-hosting Chinese models?
Definitely more leeway there IMO. But the optics aren't ideal as someone mentioned. And unfortunately, optics is more important than it should be sometimes.
Still no.
Why?
trust, optics, laws
The USA is as censored as we believe China to be. You will get cancelled by Americans if you use "communist" stuff. That is why there are hardly any Chinese EVs in the USA: because it is communist stuff. The odd thing is that the iPhone is made in China, so it is more a matter of selective enforcement when convenient. With Chinese AI, even self-hosting supposedly means you will be influenced by communism. Do you want the McCarthy era back again?
Massive overgeneralization and hyperbole.
You're referring to a very small subset of the American population. It's ironic because you seem to be claiming Americans are closed-minded here but I think that may actually describe your mindset as well.
Chinese EV policy in the US is about propping up our auto industry despite its best efforts to lose the EV battle. This has nothing to do with "communism", it's a purely economic thing that ties into internal US voting blocs.
I have been using DeepSeek API within Claude Code. So far it has been legitimately superior to Claude, and Codex that I used before.
Anecdotal evidence is nice but hard to take seriously given the myriad variables at play here.
I worked part time with MiMo 2.5-pro over the last month, and barely managed to use 500 Million of the 700 Million tokens I had allocated.
My plan was just upgraded to 38 BILLION tokens per month. That's at least 10X the tokens I've used in my entire agentic development so far.
I should probably downgrade my plan, but we'll see. :)
Token allocation/cost aside, how was the quality of the model? Any comparison with any other model you've used?
For example, I've heard DeepSeek v4 Pro is comparable to Sonnet 4.7, so I just bought some credits to try it out.
Did you not get 38B units? And a token = 2.5 units (cache hit) or up to 600 units (cache miss)
Yeah, I think they did switch the unit type.
Yep. I also got stupefied after I logged in and saw how many tokens they stuffed into my account...
Since the 3rd party providers on openrouter have all converged on much higher prices in serving these models (both mimo and dsv4), there's obviously a question on how/why are they lowering the prices so much.
It's possible they've finally integrated cheap(er) chinese chips. It's also possible they're just subsidising inference for real-world usage data. Interesting either way.
> there's obviously a question on how/why are they lowering the prices so much.
Same reason they release some of the models for free: They are trying to capture market share.
The difference is that releasing the model for free doesn't have an ongoing cost for the company. Providing cheap tokens is very expensive, especially if you don't have access to the latest transistor node chips. So I think the parent comment is right, there's something else at play allowing DS and Xiaomi to offer these nearly free tokens.
LLM providers can't "capture" anything. People loved Claude Code because it was cheap and good. Not cheap anymore? People switching to Codex, DS4 etc.
Their only moat is maybe being SOTA but that only lasts so long before everyone else catches up.
This is why they are pushing more for non-tech folks to use their products with desktop apps. They are not going to switch on a whim.
I mean there is a minor moat. Most people don't enjoy switching providers or models. If you can get people to trust you'll stay near frontier, they'll stick around even when you aren't the best. Claude is a prime example of this
I switch models all the time.
/model in OpenCode
There is no "moat" for me.
Using the standard chat applications for normal conversation/questions has a little bit of a moat, as it's able to cross-reference existing conversations, but I mostly disable that anyway to prevent as much data retention as possible.
> how/why are they lowering the prices so much
Like I responded to someone else:
- Cheap electricity
- Cheap, domestically produced GPUs
- Efficiency research. (a lot of it from Deepseek's research)
Also, the Chinese government wants the AI to be as accessible as EVs so everyone will use it.
Also if this is on the path of anything the Chinese do in the physical goods world, inference will be rockbottom cheap in a few years because they'll invest in the hell out of energy, GPUs, research, etc. The same thing they did with EVs.
Only artificial barriers will keep people using some of the frontier stuff in a couple of years. No costs will justify.
National security, training data
These and the DeepSeek ones that were cost-reduced recently are perfectly capable models for the vast majority of light work and more.
It's funny thinking the US companies are hiking prices and Chinese ones do the opposite, it's obviously a strategy, but pretty funny
How are these "capacity constrained" Chinese companies running inference without Hoppers and Blackwells ?
Huawei Ascend AI Accelerators. DeepSeek V4 model architecture was optimized for Chinese hardware.
They can (not entirely sure how 'grey' market this is) either have subsidiaries outside of china (eg: singapore) that provide the inference and/or just rent it off the public gpu clouds.
Making their own NPUs for inference probably, you don't have to buy NVidia for inference. Google doesn't.
Looks like this is basically to mostly match (or slightly undercut, in the case of MiMo-v2.5-Pro) DeepSeek's pricing for DSV4-Pro and DSV4-Flash.
This seems great! Between just these two providers, this is a couple pairs of models that seem suitable for replacing Claude Sonnet and Claude Haiku, at around 1/20th the price.
It's a bummer for me that nothing can match at least Opus 4.6 or GPT-5.5 yet, since I'd characterize those as the first models to actually be good enough to be useful for writing code, at least in my experience at work.
But for simple stuff, or situations where you can have the huge model dispatch to subagents or just "advise" or "supervise" smaller agents on their work, this looks great. Wherever the frontier models end up in a year, if there are open-weight contenders like this around GPT-5.5's level by then, I think I can be happy and productive doing most prototyping with those models and hand-editing for quality or more serious work.
interesting/funny: their off-peak rates apply 00:00-08:00 Beijing time, so nine-to-five for someone on the NA west coast :p
China has a population of 1.4B, US is 349M. 0-8 Beijing time is their off-peak? How is that funny, that's literally how timezones work?
It's funny, in a good way, because their off-peak times match perfectly the western peak demand.
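The overlap is easy to verify. A small sketch using fixed UTC offsets; the date is arbitrary and the Pacific offset assumes daylight time (UTC-7), so in winter (PST, UTC-8) the window shifts an hour earlier:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: China Standard Time has no DST; Pacific is assumed to be on PDT here.
BEIJING = timezone(timedelta(hours=8))
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7))


def to_pacific(hour_beijing: int) -> datetime:
    """Convert a Beijing wall-clock hour (on an arbitrary summer date) to Pacific Daylight Time."""
    t = datetime(2026, 6, 1, hour_beijing, 0, tzinfo=BEIJING)
    return t.astimezone(PACIFIC_DAYLIGHT)


start = to_pacific(0)   # off-peak window opens at 00:00 Beijing time
end = to_pacific(8)     # and closes at 08:00 Beijing time
print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"), "Pacific (previous calendar day)")
```

With the daylight-time offset this comes out to 09:00 - 17:00 Pacific, i.e. the nine-to-five mentioned above.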
Can folks in China run US-based models? Seems like they should take advantage of this overlap in peak timing.
Yes, use VPN; they are the main clients
Why do they use a VPN?
Chinese state firewall?
Well not just that, OpenAI explicitly blocks them
Blocks Chinese users or blocks VPNs? Are they the only one?
You can check for yourself here to see that China and Hong Kong are conveniently missing. We do see blocking from Anthropic and Gemini as well in some regions
Also even though Vietnam and the Philippines are technically supported we do see blocking from some IP addresses in those regions too
I see - I was just curious. Does China permit citizens to access American AI models if they were permitted by the American companies?
Well I can tell you lots of them access them no matter what, our service provides a proxy for them if a request gets blocked and lots of AI providers do the same since they access the APIs through a central server without passing along the actual user's IP address
Understood, I was just curious if the CCP blocks Chinese citizens even if they were permitted by the US. It looks quite a bit like the general economic policy of China - block foreign companies and artificially drive pricing down for your products globally. I have yet to see evidence to the contrary but was just wondering
Will try MiMo now. I have been mainly using just DeepSeek lately because of the fact that V4-Flash destroys basic work for basically 0 cost. Haven't exceeded even 50% of my OpenCode Go weekly limits using V4 Flash and Pro.
You can use Codex as an orchestrator and claude code via mimo/deepseek api as executor. I've read this a lot before but when you really try it, it is really something in the way you can stretch your credits.
The industry seems to be moving from "best model wins" toward "good enough model at lowest cost wins."
just wondering when the overwhelming consensus among devs is just gonna be that companies should go horizontal instead of vertical. I feel 5.5 is 100% good enough for literally anything i could point it at, and its shortcomings are largely gonna be paired to not grasping wider context it can't really load into a context window to correct (not understanding when a user is gonna miss something that might be implied knowledge at that stage or similar issues).
So, at this stage of time I'm not even totally looking at lesser models to save costs or usage, im using lesser models because they fit the task more than fine and will nail it. Instead though, 6 months from now the model landscape will be totally different, costs will not have gotten better(for US companies), because their priorities are almost entirely on chasing capability of models.
So i hope you're right and the overall market is moving the direction you mention, but I think the US will continue this absurd race to... just being #1 regardless of how much it stops making sense.
It is a combination of 3 things
1. Some companies are very good in training and serving at much lower cost
2. Some companies have access to new much cheaper hardware
3. People have realized that you don't need a 3.2T model when a 310B one (Opus vs MiMo 2.5) performs equally well for your particular task.
The 99% is with regards to cached inputs. It now seems to be at the same price as DeepSeek V4 Pro.
These reductions come as Microsoft and Uber say AI is too expensive. The play is right there.
The token plan is confusing.
From their docs "After using 10M input (cache miss) tokens of MiMo-V2.5-Pro, it is equivalent to consuming 3000M Credits, and you can still enjoy 1100M Credits of MiMo-V2.5". So it's around 12M input credit vs Earlier 60M tokens.
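The conversion rate implied by that docs quote can be worked out directly. A rough sketch: it uses only the 10M tokens / 3000M Credits figure quoted above, and the 38B budget is the number mentioned elsewhere in the thread, applied here under the simplifying assumption that everything is spent on MiMo-V2.5-Pro cache-miss input tokens:

```python
# From the docs quote: 10M cache-miss input tokens of MiMo-V2.5-Pro == 3000M Credits.
QUOTED_TOKENS = 10_000_000
QUOTED_CREDITS = 3_000_000_000

credits_per_token = QUOTED_CREDITS / QUOTED_TOKENS


def pro_cache_miss_tokens(credit_budget: float) -> float:
    """Tokens a budget buys if spent only on Pro cache-miss input (an assumption)."""
    return credit_budget / credits_per_token


print(f"{credits_per_token:.0f} Credits per Pro cache-miss input token")
print(f"38B Credits -> ~{pro_cache_miss_tokens(38e9) / 1e6:.0f}M such tokens")
```

Cache hits and the non-Pro model are billed at different rates, so a real mix of usage would stretch the same budget further.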
Hot take: The reason this is happening is because the market for Chinese AI models hosted by Chinese companies is struggling. Even the market for Chinese AI models hosted by western companies is soft: During the week of May 18, OpenRouter processed 3.4T DeepSeek v3 Flash tokens (their most popular model). Google has announced that Gemini is processing 746T per week; Claude is probably processing more. And the Chinese models were already staggeringly cheap, far cheaper than most Gemini, Claude, or GPT models, before this recent array of pricing changes.
Broadly: No one is using the Chinese AI models. Everyone, globally, everywhere, including in China, is using the models from OpenAI, Anthropic, and Google. The models from the Big Three western labs represent >80% of all tokens processed and likely >95% of all revenue.
> OpenRouter processed 3.4T DeepSeek v3 Flash
> Gemini is processing 746T per week
I read this totally differently. A startup nobody really knows is doing half a percent of Google on a commodity task?!? Google, which puts Gemini on billions of devices by default, without the user asking? Google, which is distributing Gemini to users who are unaware they are even using it?
Versus a startup that does not even have a login button on its homepage?
This is astonishing.
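For reference, the "half a percent" figure follows from the two weekly numbers quoted above. Both are taken as given from the parent comment, and they measure different things (one model on one router vs. all Gemini usage):

```python
# Weekly token volumes as quoted in the parent comment, in trillions of tokens.
deepseek_flash_on_openrouter_t = 3.4
gemini_total_t = 746.0

share = deepseek_flash_on_openrouter_t / gemini_total_t
print(f"{share:.2%} of Gemini's quoted weekly volume")
```

That works out to a little under half a percent.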
Unfortunately, the market doesn't generally let you buy Blackwells with "we got half a percent of Google's marketshare with a model we're literally giving away for free [1]". You need that thing we call Capital. But, they may certainly opt to have it written on their gravestone, as Google is (checks notes) continuing to put Gemini on billions of devices and doing quadrillions of tokens per month.
This is a bizarre comment for a couple of reasons.
First, obviously everyone involved understands that someone has to pay to provide a free service. Everyone involved also knows that this sometimes makes sense as a business strategy (I have not paid to ship anything from Amazon for close to two decades).
Second, OpenRouter's business model specifically does not require them to run all (any?) of the models available through the platform. Provider is one of the choices when you choose a model, and each provider can have separate pricing.
The link you posted shows only one provider, Crucible. That may/may not be affiliated with OpenRouter? Even assuming an affiliation, it's opaque who is subsidizing this usage. Is it OpenRouter or Crucible?
All of this is somewhat of a distraction. Even if someone gave search away for free (like Google), it would still be an accomplishment to get to half a percent of Google's volume. Or to sell half a percent of the volume of Android phones. Or whatever.
Kudos to the OpenRouter team!
In the statement "we got half a percent of Google's marketshare with a model we're literally giving away for free" the term "we're" here refers to the conglomeration of "DeepSeek" (for making a model small enough to be capable of being hosted for free) and the model providers who do offer it for free (why they do this is... unknowable). It does not refer to OpenRouter, who are merely middlemen.
My original DeepSeek v4 Flash token counts spanned all providers of that model, both paid and free; I merely pointed out the free provider to substantiate a point that DeepSeek's product may be so bad that they could quite literally give it away and people would still prefer to pay (a lot) to OpenAI, Anthropic, or Google. Why this is the case, I leave as an exercise to the reader; I'm just citing numbers and facts.
Agreed.
Not to mention, week on week more and more tokens are being processed via OpenRouter. [0]. The number keeps going up, with no end in sight in my opinion, if the China models continue offering cheaper inference, whilst tailing behind not too far, the line will keep going up.
OpenRouter is not the only "router" type AI company. More fixed providers like OpenCode and commandcode are offering subscription services on open/China models, likely consuming billions of tokens each. Who knows how many tokens are being processed directly against DeepSeek's and Kimi's APIs?
OpenRouter is not indicative of volume. Most high volume clients will go to the providers directly. There's no point in paying the 5% OR cut if you know what you want.
That's just it: This is not happening with the Chinese models, because western corporations are the primary drivers of AI adoption globally and western corporations are not signing up for a DeepSeek API key. If they're working with Chinese models at all, which they rarely are, it is via a western-hosted provider like Bedrock, Vertex, or OpenRouter; or self-hosting. Sure, hobbyists and individual programmers might be comfortable forming a business relationship with a nationalized Chinese entity, but you'd need a microscope to see that relative to the spend that, say, Eli Lilly is throwing at Anthropic every week.
But you're right that OpenRouter is only one data point. It is, unfortunately, one of the few we have.
If you're going to compare OpenRouter numbers for DeepSeek at least use the same metric to compare Gemini. During last week DeepSeek V4 Flash did 3.72T tokens which is way higher than combined token counts for Gemini (2.5 Flash + 3.5 Flash + 3.1 Pro)
DeepSeek's official API, which has 10x cheaper cached input cost isn't even on OpenRouter as a provider, so just like Google, most volume is not going through OpenRouter. (Gemini's official hosted api is on OpenRouter BTW)
Also you're comparing an API with Google's internal corporate and consumer app use. Bytedance announced they were using 63T tokens/day (441T / week) at the end of 2025, so they are probably even higher than Google now. We don't know how many weekly tokens the DeepSeek chat app uses, but it would also be a very high number, much higher than the OpenRouter tokens.
For the real reason of the recent price drops, go ask your AI about how much it would cost to run DeepSeek V4 or MiMo 2.5 after Ascend 950 PR have started to be mass delivered in 2026 Apr at $10k / card.
The issue you're not seeing is: Western corporations, the primary drivers of AI spend globally, are not forming business relationships with nationalized Chinese AI labs in order to directly use the DeepSeek API. They're using it through western proxies like OpenRouter, if they're doing it at all (newsflash: they aren't). They are forming business relationships with Anthropic, Google, and OpenAI to directly use their APIs.
Why are you using Openrouter as metric here? Most people use the APIs directly.
comparing deepseek usage on openrouter to google usage in total is not statistically correct
you could equally say, in the last complete week openrouter processed more deepseek tokens than any other provider including google
that also would not tell you much about how many tokens are used on deepseek
That makes some sense.
I mean, I am going to use the best I can afford. And at work that's Opus, but while work is happy to let me spend $50+/day, that's just not viable for personal hobby use, I need to keep that in the realm of a WOW/mmo subscription.
Yup; it's not just that people want the best models, so they use Opus or GPT-5.5. It's also that we're talking about nationalized Chinese companies. Western corporations are not en masse forming business relationships with Chinese firms and subjecting their proprietary code to whatever harness they cook up just to save a million dollars. It's not happening. And that's why the Chinese labs are failing; they're struggling to build a domestic market for token consumption. The Chinese domestic market sucks for almost everything China builds, they need export partners like the US to keep most of their factories on, but in the case of AI: no one overseas is buying.
"The api pricing for mimo-v2-pro and mimo-v2-omni remain unchanged" could we presume this means the discount isn't from hardware improvement or availability ?
VSCode + Cline + MiMo v2.5 Pro works! Great! Give it a try.
So exactly same as deepseek 4 api pricing
One difference is that MiMo 2.5 (non-Pro) has image, audio and video input capabilities.
DeepSeek does not understand image, audio or video.
I've heard non-Pro isn't nearly as good for coding as Pro?
Is this the DeepSeek v4 API pricing for May (at 75% off - offer supposedly ends June 1)? Or the non-discounted API pricing?
Deepseek made the discounted price permanent before this.
How realistic is this:
A Chinese model incidentally slurps up some terms that lead it to unflattering words you wrote about the CCP in a random journal entry, or maybe a social media CSV export. You go to China one day and are denied entry because of what you said.
Realistic or no? (yes i know the us is getting bad in re. to what you write online as well)
Models hosted in China are a siren call that I don't feel bad about resisting.
China knows tourists are not Chinese; that's why when you visit China your non-Chinese SIM card silently bypasses the Great Firewall when you use roaming data.
Besides, the Chinese government doesn't really care about individual criticisms, even in public, especially in languages other than Chinese. What they really care about censoring is attempts to organize collective action. They don't care about personal opinions stated in the blog posts of tourists, let alone diary entries.
I really like the US model of free speech, at its best. It feels natural and right to me. It would be cool if Chinese people had stronger freedom of political speech. I'd love to hear Chinese people publicly share their thoughts online without restraint or censorship; it's a huge country with a lot of smart people with diverse opinions.
But maybe you should go visit China sooner rather than later, tbf. It's friendlier and weirder and more interesting than you think, including w/r/t the censorship regime.
This statement makes no sense, because you literally said the "US is getting bad". We already gave up all of our data, if you wrote something about the CCP you should already expect they know about it.
Besides that, the us govt already has all your data and yet people are criticising it all around, in the open. They can, without repercussions, because the us is a free country.
Chinese people can’t really do the same.
This may be true about any models hosted by others than you.
At least the Xiaomi models are open weights and you can host them yourself, avoiding such concerns.
Well, at least for the Chinese models you can run them locally vs. the US models that require you to go through their servers. But to answer your question:
> How realistic is this:
Completely unrealistic unless you are a high value target (journalist, spy, business man, etc...)
> CBP denies travelers entry because of anti-Trump comments
China won't deny entry for anti-Trump comments, guess I'll use MiMo
China also won't deny entry for anti-Israel comments, so even more reason to use MiMo.
You're projecting the US doing this with criticism of Trump and Israel on China, when there's no proof of China ever doing something like this.
Everyone already said what I wanted to say. That all US companies (OpenAI, Anthropic, Google, MS Copilot) have increased price recently while Chinese companies (Deepseek, Xiaomi) are reducing price.
The question is how they are managing to do so? They are supposed to struggle due to chip sanctions.
Secondly, why now? The US companies were supposed to subsidize too but now they are unable to keep up. Everyone going to usage based pricing, so it's unsustainable for them. They are well funded too.
If there is a genuine hardware breakthrough reducing compute needs then that is good for the whole world I believe.
> They are supposed to struggle due to chip sanctions
As Jensen has been pointing out for almost a year now, these sanctions were ineffective and probably had the opposite effect of the desired goal.
The history is fairly long, but an inflection point could likely be traced to Trump v1 era DOJ enforcement on (among others) Huawei's CFO Meng Wanzhou in 2018. Huawei was hit with the (really big) stick in international transactions: OFAC violation accusations, and it was a seminal moment in the company's internal operations -- they concluded they needed a fully internal supply chain in China, and retooled for it. Meng Wanzhou cases in the US were eventually dismissed, but she was on house arrest in Canada through 2021 or so.
Fast forward to 2024 -- Huawei was culturally and technically ready to build AI accelerators -- one of the externalities of the sanctions was to provide additional benefit to Chinese companies for buying from Huawei; those economics seem to have provided a boost to on-shore development.
Competition I guess, they must be burning some resources to make this price reduction happen...
The state of the art models (mostly GPT 5.5, but also Gemini and Claude) are better so they cost more. Qwen 3.7 Max is their only direct competition and it is not any cheaper.
Are they?
I have been using DeepSeek, and I am finding it better than Claude or Codex, to be honest.
I don't see myself going back.
I love ds4, us models are better imo, but like 5% not 500% better, so the valuation doesn't really make sense
that being said, deepseek v4 needs to be on amazon bedrock to actually be feasible in the US Enterprise market and start driving other provider prices down
I just wish there was a MiMo REAP gguf that was reduced enough to fit within RAM of computers of mere mortals
as someone from the 3rd world - this is pleasant - even 3rd world countries will have affordable "A.I" access via Chinese models.
as someone who now lives & has lived in the west for the majority of their adult life - yeah the US western models r fucked n the crazy valuations of the A.I labs - which also filters down to the economy - since all money instead of being put to productive use is being wasted on this shit. hell electricity bills are up - cz datacenters need power. the current crooks in power don't believe in clean energy.
>>as someone from the 3rd world - this is pleasant - even 3rd world countries will have affordable "A.I"
I stopped tagging my country as developing and then third world and call it for what it is, a POOR country. I know with increasing certainty that my country will be poor for the rest of my life. I also expect AI to be as available as computers: there are the "have", and there are the "don't have", which is almost always a lifetime condition.
I will adopt your take. it's precise.
The price cut is 50%.
Can you explain any further?
If you open the page you will see what is reduced by how much.
Everyone adding "Permanent" to price cuts now
Can't be mistaken for someone like, ugh... Anthropic and OpenAI...
This sort of pressure will force them to though.
I hope so, but I don't know if they are in a position where they can offer these kinds of prices. They are already struggling with not losing a lot of money with their models, while chinese models can be independently hosted by inference providers at a profit already. We need to drive these prices down so AI doesn't become a thing for the few who can pay for expensive subscriptions.
nah, just ban using Chinese models and ban open source models. This will allow them to keep the high price. Got to recoup the money spent somehow, time to lobby the government.
They can't afford it. OpenAI and Anthropic bleed money and are desperate for an IPO so that they can get some extra mileage.
First Deepseek, Now Xiaomi. A price cut of 99%.
This is why Anthropic wants these Chinese AI models banned, as they are in the lead in the AI race to zero and they know that there is no model moat.
So don't tell Dario.
Like I said. China doesn't care about money. We want AI in people's hands.
I mean the AI companies probably just want to make American model pricing look ridiculous in comparison (it's working imo). I think the government probably wants actually-useful AI that could be put into chips and actually revolutionize factory work or mining or whatever. Large, SOTA models are not gonna change factory work but extremely efficient and optimized models may
Every industry-wide scale technological revolution has happened because government funded a technology and then opened it up to the masses. Just look at your iPhone: GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't)
Economist Mariana Mazzucato wrote a great book about this called The Entrepreneurial State: Debunking Public vs. Private Sector Myths
> I mean the AI companies probably just want to make American model pricing look ridiculous in comparison (it's working imo)
I really don't think China cares about that. Chinese government's governance logic is making everything so cheap that everyone can get and use it. They did it with EVs and other things. Now they are doing it with the AI.
They do want to see the American bubble burst, this is the quickest way
With all the price increases in everything else, I think we are all tired of this bubble, to be honest...
OK. Google was just killed.
How is it possible to reduce the price by 99%???????
This is crazy
The reduction is in cached inputs. I've commented about this before, but many labs, with DeepSeek and now Xiaomi as the exceptions, absolutely scam you for cached reads.
You are basically paying out the nose for a few seconds of VRAM residence if you are giving significant money for cache reads.
The very nature of autoregressive language modeling is that every single output token produced "reads" the cache.
So in principle the price floor for a cache hit is the flat cost of 1 output token.
Now in reality it has to be more than that because you are occupying VRAM with the cache that forces out other users. But it can still be really cheap.
No one is producing one output token though.
And using up gpus for that cache is a pretty big opportunity cost. I highly doubt it's done in vram. That would be insane for the one hour caches.
So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token
Is it a scam? Idk
I've read on X that the DeepSeek API cache can stay alive for hours vs. 5 minutes tops for other providers. They do it with RAM and SSD, not only VRAM.
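A rough way to see why the cache-read price matters so much for agentic use: in an agent loop every request re-sends the whole conversation so far, so cache reads quickly outnumber everything else. The sketch below is a toy model only; the session shape and all three prices are hypothetical placeholders, not any provider's actual rates.

```python
def session_cost(turns, new_in, out, price_miss, price_hit, price_out):
    """Toy cost model for one agent session.

    Token counts are per turn; prices are dollars per million tokens.
    Assumes (hypothetically) that the whole previous conversation is
    served from cache on every turn and only the new input is a miss.
    """
    prefix = 0                      # tokens already in the conversation
    hit = miss = generated = 0
    for _ in range(turns):
        hit += prefix               # old context is re-read from cache
        miss += new_in              # fresh input has to be processed
        generated += out
        prefix += new_in + out      # next turn sees all of it again
    dollars = (hit * price_hit + miss * price_miss + generated * price_out) / 1e6
    return hit, miss, generated, dollars


# Hypothetical session: 100 turns, 2,000 new input + 500 output tokens per turn.
before = session_cost(100, 2_000, 500, price_miss=1.0, price_hit=0.5, price_out=3.0)
after = session_cost(100, 2_000, 500, price_miss=1.0, price_hit=0.005, price_out=3.0)
print(f"cache reads: {before[0]:,} tokens vs {before[1]:,} uncached input tokens")
print(f"cost before: ${before[3]:.2f}, after a 99% cut on cache reads: ${after[3]:.2f}")
```

With these made-up numbers cache reads outnumber uncached input by roughly 60 to 1, which is why a cut that only touches cached inputs can still shrink most of the bill.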
[deleted]
- Cheap electricity
- Cheap, domestically produced GPUs
- Efficiency research by many PhDs (many AI companies have used DeepSeek's research, though)
Industrial electricity in China costs about the same as in Texas: 8-9 cents a kWh. The only benefit is that China has decided to put millions of solar panels down, so electricity costs can drop significantly during "peak" sunlight hours, since their rates are highly dynamic.
Add to that home made inference chips and dirt cheap RAM from CXMT
State backed loss leaders.
Is that worse than VC-backed loss leaders? :)
Yes. VCs can’t compete with the second largest economy on the planet.
I think this is probably correct based on the way state investment into the Chinese EV market has been working - fund a whole bunch of them and let them fight it out to be one of the few brands that will have the longevity. It's pretty brutal with the cars.
> let them fight it out
yep, from what I hear, the govt makes sure there is intense local competition in the market so it produces a few really good companies that survive... it's kind of ironic considering what is going on with mono/oligopolies over here...
The rest of the best of the business is paying for it
Anything to destroy US tech companies is welcome.
They aren't aiming at companies but at users, many of whom have no common sense and grant these agentic AIs access to everything.
All the restrictions the US imposed on CH will be reverted, and it will be even worse, because now the data is reaching not the US gov (we all know they have access to US big tech's data) but CH.
I really hope this goes viral and breaks Nvidia/OpenAI.
fun fact, CH is an ISO code of Switzerland, and China is CN
I even know the reasoning but CH for Switzerland always bothers me
Insane. 3 points behind opus on the artificialanalysis index.
Mimo cost ~$400 at the old price, so about $40 today. Opus cost ~$5000
That's over 100x cheaper, and just 3 points behind.
I can't wait to experiment with an llm consortium of 100 deepseek and mimo models. Crazy times.
Shut up and take my m̶o̶n̶e̶y̶ data!
Edit: Gemini on google search told me I could write strikethrough text on hn using <s>. Mimo told me it was unsupported and then went on to list some tags that are supported, like <b>bold</b>. I tried copy pasting the word in strikethrough from a word processor but it lost the format. I ended up using mimo in an agent shell wrapper to produce it, and copy pasting from the terminal worked for some reason.
I had a subscription before the price was cut down; the model kept randomly looping with the same character (burning 30% of the budget in one shot), and the overall performance for agentic purposes is, simply put, terrible. It finds non-existent bugs and randomly removes chunks of code to fix them, then even presents it as an "extra fix". Maybe it's a good generalist model; I haven't tested it in that regard.
MiniMax (currently 2.7) which is a ~270B model tuned exclusively for agentic purposes, performs so MUCH better; it's more reliable and cheaper. Both are still far away from Opus 4.7 that I'm using at work. IMO benchmarks are just a very rough estimation; everyone cheats as much as they can get away with. Test the model yourself; do not make any assumptions based on the benchmarks.
I would love to see specialized, cheaper, bleeding-edge models like MiniMax for other non-agentic purposes as well. Why pay $1 for a general model when, for example, you can pay $0.1 for a content-moderator model that you actually need?
Funny, I had the opposite experience with MiniMax and Mimo when using OpenCode. MiniMax got stuck with looping through broken tool calls all the time and MiMo just powered through things and for the most part just worked.
benchmarks we deserve: google search quick ai answers vs full llm model :)
search answers use Flash 3.5
they use a "low" flavor of it to scale it on billions of users
So I tried the $16/mo token plan. Burned through 31% of monthly budget in one 1-2h session of a small C project refactoring, saw some not great behavior (hey subagent, read me back these 6 files exactly - which probably burned a lot of output tokens) and will cancel, obviously.
This is waaaaay more constrained than even Claude Pro plan, let alone Deepseek V4 or Kimi K2.6 pricing.
What did MiMo say?
Says it's not supported and lists a few tags that are, like <b>bold</b>
Does this work: s̶t̶r̶i̶k̶e̶t̶h̶r̶o̶u̶g̶h̶
Well done. Unicode wins again. 𓂺
That's deliberate. US AI companies have no chance of recouping even fraction of their valuations.
PS: I have not tried this, but DeepSeek V4 Flash (not even the DeepSeek V4 Pro version) set to "high" has pretty much Claude Opus 4.7 level capabilities and is lightning fast and dirt cheap. Hours and hours of conversation barely cost a few cents.
DeepSeek Flash on high (not max) is a freak of nature indeed.
Very disproportionate intelligence-to-cost ratio.
I'm leveraging this temporary anomaly and using it as my coding workhorse.
The weights are open and when prices settle down again will be runnable with less than 10k of hardware.
I can easily run it in an 8-bit quant with the 4 x 48GB Radeon Pro W7900 GPUs I snagged for 2k each before the memory squeeze.
A 158B parameter model, especially in an architecture as efficient as DS4 is not that hard to drive currently if you got in before the craze, and will be relatively easy to drive with future hardware generations.
Doesn't DeepSeek-V4-Flash have 284B parameters?
You are correct.
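The parameter count matters for the "runs on 4 x 48GB" claim, so here is a back-of-the-envelope check. It is a sketch that counts weight storage only (parameters times bits per weight); it assumes no quantization overhead and ignores KV cache and activations, so real requirements are higher.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


vram_gb = 4 * 48                     # four 48 GB cards, as in the comment above
for params in (158, 284):            # the two parameter counts mentioned
    for bits in (8, 4):
        need = weight_gb(params, bits)
        verdict = "fits" if need <= vram_gb else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{need:.0f} GB of weights, {verdict} in {vram_gb} GB")
```

Under these assumptions a 284B model at 8 bits needs about 284 GB for weights alone, more than the 192 GB available, while a 4-bit quant (about 142 GB) would leave some room for everything else.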
what makes you so confident it’s temporary
I am very happy with DSv4 for their price/performance but neither of them are comparable to Opus.
But they're overall a good thing for us consumers even if we'll never use these models, it forces the prices down all around.
Yeah, I really like and use DSv4Pro for personal projects, but I also use Opus all the time at work and they are definitely not at the same level.
I can only conclude that people who claim they are aren't doing anything close to the edge of what these models are capable of or any niche things.
I would say DSv4Pro is around the same level as Sonnet.
> US AI companies have no chance of recouping even fraction of their valuations.
A big caveat here is that many US companies (particularly in sensitive industries, like defense) will likely not want to (or not be allowed to) use Chinese models for anything of substance.
What about self-hosted Chinese models?
Definitely more leeway there IMO. But the optics aren't ideal as someone mentioned. And unfortunately, optics is more important than it should be sometimes.
Still no.
Why?
trust, optics, laws
The USA is as censored as what we believe about China. You will get cancelled by Americans if you use "communist" stuff. That is why there are hardly any Chinese EVs in the USA: because it is communist stuff. The odd thing is that the iPhone is made in China, so it is more a case of selective enforcement when convenient. Using Chinese AI, even self-hosted, supposedly means you will be influenced by communism. Do you want the McCarthy era back again?
Massive overgeneralization and hyperbole.
You're referring to a very small subset of the American population. It's ironic because you seem to be claiming Americans are closed-minded here but I think that may actually describe your mindset as well.
Chinese EV policy in the US is about propping up our auto industry despite its best efforts to lose the EV battle. This has nothing to do with "communism", it's a purely economic thing that ties into internal US voting blocs.
I have been using DeepSeek API within Claude Code. So far it has been legitimately superior to Claude, and Codex that I used before.
Anecdotal evidence is nice but hard to take seriously given the myriad variables at play here.
I worked part time with MiMo 2.5-pro over the last month, and barely managed to use 500 Million of the 700 Million tokens I had allocated.
My plan was just upgraded to 38 BILLION tokens per month. That's at least 10X the tokens I've used in my entire agentic development so far.
I should probably downgrade my plan, but we'll see. :)
Token allocation/cost aside, how was the quality of the model? Any comparison with any other model you've used?
For example, I've heard DeepSeek v4 Pro is comparable to Sonnet 4.7, so I just bought some credits to try it out.
Did you not get 38B units? And a token = 2.5 units (cache hit) or up to 600 units (cache miss)
Yeah, I think they did switch the unit type.
Yep. I also got stupefied after I logged in and saw how many tokens they stuffed into my account...
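To put the unit switch in perspective, here is a quick conversion using only the two rates quoted above (2.5 units per cache-hit token, up to 600 per cache-miss token). It is a sketch: the cache-hit fractions are illustrative guesses, and it treats 600 as the cost of every miss even though the thread only gives it as an upper bound.

```python
PLAN_UNITS = 38_000_000_000          # "38B units" from the thread
UNITS_PER_HIT = 2.5                  # per cache-hit token, as quoted
UNITS_PER_MISS = 600                 # upper bound per cache-miss token, as quoted


def tokens_in_plan(hit_fraction, units=PLAN_UNITS):
    """Tokens the plan covers if `hit_fraction` of all tokens are cache hits."""
    avg_units = hit_fraction * UNITS_PER_HIT + (1 - hit_fraction) * UNITS_PER_MISS
    return units / avg_units


for f in (1.0, 0.95, 0.0):           # hypothetical cache-hit fractions
    print(f"{f:.0%} cache hits -> about {tokens_in_plan(f) / 1e9:.2f}B tokens")
```

So 38B units is somewhere between roughly 63M tokens (all worst-case misses) and 15.2B tokens (all cache hits), depending heavily on how cache-friendly the workload is.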
Since the 3rd party providers on openrouter have all converged on much higher prices in serving these models (both mimo and dsv4), there's obviously a question on how/why are they lowering the prices so much.
It's possible they've finally integrated cheap(er) chinese chips. It's also possible they're just subsidising inference for real-world usage data. Interesting either way.
> there's obviously a question on how/why are they lowering the prices so much.
Same reason they release some of the models for free: They are trying to capture market share.
The difference is that releasing the model for free doesn't have an ongoing cost for the company. Providing cheap tokens is very expensive, especially if you don't have access to the latest transistor node chips. So I think the parent comment is right: there's something else at play allowing DS and Xiaomi to offer these nearly free tokens.
LLM providers can't "capture" anything. People loved Claude Code because it was cheap and good. Not cheap anymore? People switching to Codex, DS4 etc.
Their only moat is maybe being SOTA but that only lasts so long before everyone else catches up.
This is why they are pushing more for non-tech folks to use their products with desktop apps. They are not going to switch on a whim.
I mean there is a minor moat. Most people don't enjoy switching providers or models. If you can get people to trust you'll stay near frontier, they'll stick around even when you aren't the best. Claude is a prime example of this
I switch models all the time.
/model in OpenCode
There is no "moat" for me. Using the standard chat applications for normal conversations/questions has a little bit of moat, as they're able to cross-reference existing conversations, but I mostly disable that anyway to prevent as much data retention as possible.
> how/why are they lowering the prices so much
Like I responded to someone else:
- Cheap electricity
- Cheap, domestically produced GPUs
- Efficiency research (a lot of it from DeepSeek's research)
Also, the Chinese government wants the AI to be as accessible as EVs so everyone will use it.
Also, if this follows the path of anything the Chinese do in the physical goods world, inference will be rock-bottom cheap in a few years, because they'll invest the hell out of energy, GPUs, research, etc. The same thing they did with EVs.
Only artificial barriers will keep people using some of the frontier stuff in a couple of years. No costs will justify.
National security, training data
[dead]
These and the DeepSeek ones that were cost-reduced recently are perfectly capable models for the vast majority of light work and more.
It's funny thinking that the US companies are hiking prices and the Chinese ones are doing the opposite. It's obviously a strategy, but pretty funny.
How are these "capacity constrained" Chinese companies running inference without Hoppers and Blackwells ?
Huawei Ascend AI accelerators. The DeepSeek V4 model architecture was optimized for Chinese hardware.
They can (not entirely sure how 'grey' market this is) either have subsidiaries outside of China (e.g. Singapore) that provide the inference and/or just rent it off the public GPU clouds.
Making their own NPUs for inference probably, you don't have to buy NVidia for inference. Google doesn't.
https://www.cnbc.com/2026/03/19/us-tech-execs-smuggled-nvidi...
Looks like this is basically to mostly match (or slightly undercut, in the case of MiMo-v2.5-Pro) DeepSeek's pricing for DSV4-Pro and DSV4-Flash.
This seems great! Between just these two providers, this is a couple pairs of models that seem suitable for replacing Claude Sonnet and Claude Haiku, at around 1/20th the price.
It's a bummer for me that nothing can match at least Opus 4.6 or GPT-5.5 yet, since I'd characterize those as the first models to actually be good enough to be useful for writing code, at least in my experience at work.
But for simple stuff, or situations where you can have the huge model dispatch to subagents or just "advise" or "supervise" smaller agents on their work, this looks great. Wherever the frontier models end up in a year, if there are open-weight contenders like this around GPT-5.5's level by then, I think I can be happy and productive doing most prototyping with those models and hand-editing for quality or more serious work.
interesting/funny: their off-peak rates apply 00:00-08:00 Beijing time, so nine-to-five for someone on the NA west coast :p
China has a population of 1.4B, US is 349M. 0-8 Beijing time is their off-peak? How is that funny, that's literally how timezones work?
It's funny, in a good way, because their off-peak times perfectly match the western peak demand.
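The overlap is easy to check. A minimal sketch using fixed UTC offsets (Beijing is UTC+8 all year; the US west coast is UTC-7 during daylight time and UTC-8 otherwise). The sample date is arbitrary and only there to build valid datetimes.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))    # China Standard Time, no DST
PDT = timezone(timedelta(hours=-7))       # Pacific Daylight Time
PST = timezone(timedelta(hours=-8))       # Pacific Standard Time


def off_peak_window_in(tz, date=(2026, 6, 1)):
    """Convert the 00:00-08:00 Beijing off-peak window to local clock times."""
    start = datetime(*date, 0, 0, tzinfo=BEIJING)
    end = datetime(*date, 8, 0, tzinfo=BEIJING)
    return start.astimezone(tz).strftime("%H:%M"), end.astimezone(tz).strftime("%H:%M")


print("PDT:", off_peak_window_in(PDT))
print("PST:", off_peak_window_in(PST))
```

During daylight time the window lands on 09:00-17:00 Pacific (on the previous calendar day); in winter it shifts to 08:00-16:00.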
Can folks in China run US-based models? Seems like they should take advantage of this overlap in peak timing.
Yes, use VPN; they are the main clients
Why do they use a VPN?
Chinese state firewall?
Well not just that, OpenAI explicitly blocks them
Blocks Chinese users or blocks VPNs? Are they the only one?
https://developers.openai.com/api/docs/supported-countries
You can check for yourself here to see that China and Hong Kong are conveniently missing. We do see blocking from Anthropic and Gemini as well in some regions
Also even though Vietnam and the Philippines are technically supported we do see blocking from some IP addresses in those regions too
I see - I was just curious. Does China permit citizens to access American AI models if they were permitted by the American companies?
Well, I can tell you lots of them access them no matter what. Our service provides a proxy for them if a request gets blocked, and lots of AI providers do the same, since they access the APIs through a central server without passing along the actual user's IP address.
Understood, I was just curious if the CCP blocks Chinese citizens even if they were permitted by the US. It looks quite a bit like the general economic policy of China - block foreign companies and artificially drive pricing down for your products globally. I have yet to see evidence to the contrary but was just wondering
Will try MiMo now. I have been mainly using just DeepSeek lately because of the fact that V4-Flash destroys basic work for basically 0 cost. Haven't exceeded even 50% of my OpenCode Go weekly limits using V4 Flash and Pro.
You can use Codex as an orchestrator and claude code via mimo/deepseek api as executor. I've read this a lot before but when you really try it, it is really something in the way you can stretch your credits.
The industry seems to be moving from "best model wins" toward "good enough model at lowest cost wins."
just wondering when the overwhelming consensus among devs is just gonna be that companies should go horizontal instead of vertical. I feel 5.5 is 100% good enough for literally anything i could point it at, and its shortcomings are largely gonna be paired to not grasping wider context it can't really load into a context window to correct (not understanding when a user is gonna miss something that might be implied knowledge at that stage or similar issues).
So, at this stage I'm not even really looking at lesser models to save costs or usage; I'm using lesser models because they fit the task more than fine and will nail it. But 6 months from now the model landscape will be totally different, and costs will not have gotten better (for US companies), because their priorities are almost entirely on chasing model capability.
So i hope you're right and the overall market is moving the direction you mention, but I think the US will continue this absurd race to... just being #1 regardless of how much it stops making sense.
It is a combination of 3 things
1. Some companies are very good in training and serving at much lower cost
2. Some companies have access to new much cheaper hardware
3. People have realized that you don't need a 3.2T model when a 310B one (Opus vs MiMo 2.5) performs equally well for your particular task.
The 99% is with regard to cached inputs. It now seems to be at the same price as DeepSeek V4-Pro.
These reductions come as Microsoft and Uber say AI is too expensive. The play is right there.
The token plan is confusing.
From their docs "After using 10M input (cache miss) tokens of MiMo-V2.5-Pro, it is equivalent to consuming 3000M Credits, and you can still enjoy 1100M Credits of MiMo-V2.5". So it's around 12M input credit vs Earlier 60M tokens.
Hot take: The reason this is happening is because the market for Chinese AI models hosted by Chinese companies is struggling. Even the market for Chinese AI models hosted by western companies is soft: During the week of May 18, OpenRouter processed 3.4T DeepSeek V4 Flash tokens (their most popular model). Google has announced that Gemini is processing 746T per week; Claude is probably processing more. And the Chinese models were already staggeringly cheap, far cheaper than most Gemini, Claude, or GPT models, before this recent array of pricing changes.
Broadly: No one is using the Chinese AI models. Everyone, globally, everywhere, including in China, is using the models from OpenAI, Anthropic, and Google. The models from the Big Three western labs represent >80% of all tokens processed and likely >95% of all revenue.
> OpenRouter processed 3.4T DeepSeek V4 Flash
> Gemini is processing 746T per week
I read this totally differently. A startup nobody really knows is doing half a percent of Google on a commodity task?!? Google, which puts Gemini on billions of devices by default, without the user asking? Google, which is distributing Gemini to users who are unaware they are even using it?
Versus a startup that does not even have a login button on its homepage?
This is astonishing.
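For what it's worth, the "half a percent" figure checks out from the two numbers quoted. This is just the arithmetic; it says nothing about whether one router's traffic and a vendor's total volume are a fair comparison.

```python
def share_percent(part_tokens_t, whole_tokens_t):
    """Return part/whole as a percentage (both in trillions of tokens)."""
    return 100 * part_tokens_t / whole_tokens_t


deepseek_on_openrouter = 3.4    # T tokens/week, figure quoted in the thread
gemini_total = 746              # T tokens/week, figure quoted in the thread
pct = share_percent(deepseek_on_openrouter, gemini_total)
print(f"{pct:.2f}% of Gemini's quoted weekly volume")
```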
Unfortunately, the market doesn't generally let you buy Blackwells with "we got half a percent of Google's marketshare with a model we're literally giving away for free [1]". You need that thing we call Capital. But, they may certainly opt to have it written on their gravestone, as Google is (checks notes) continuing to put Gemini on billions of devices and doing quadrillions of tokens per month.
[1] https://openrouter.ai/deepseek/deepseek-v4-flash:free
This is a bizarre comment for a couple of reasons.
First, obviously everyone involved understands that someone has to pay to provide a free service. Everyone involved also knows that this sometimes makes sense as a business strategy (I have not paid to ship anything from Amazon for close to two decades).
Second, OpenRouter's business model specifically does not require them to run all (any?) of the models available through the platform. Provider is one of the choices when you choose a model, and each provider can have separate pricing.
The link you posted shows only one provider, Crucible. That may/may not be affiliated with OpenRouter? Even assuming an affiliation, it's opaque who is subsidizing this usage. Is it OpenRouter or Crucible?
All of this is somewhat of a distraction. Even if someone gave search away for free (like Google), it would still be an accomplishment to get to half a percent of Google's volume. Or to sell half a percent of the volume of Android phones. Or whatever.
Kudos to the OpenRouter team!
In the statement "we got half a percent of Google's marketshare with a model we're literally giving away for free" the term "we're" here refers to the conglomeration of "DeepSeek" (for making a model small enough to be capable of being hosted for free) and the model providers who do offer it for free (why they do this is... unknowable). It does not refer to OpenRouter, who are merely middlemen.
My original DeepSeek v4 Flash token counts spanned all providers of that model, both paid and free; I merely pointed out the free provider to substantiate a point that DeepSeek's product may be so bad that they could quite literally give it away and people would still prefer to pay (a lot) to OpenAI, Anthropic, or Google. Why this is the case, I leave as an exercise to the reader; I'm just citing numbers and facts.
Agreed.
Not to mention, week on week more and more tokens are being processed via OpenRouter [0]. The number keeps going up, with no end in sight in my opinion. If the Chinese models continue offering cheaper inference while not trailing too far behind, the line will keep going up.
[0] - https://openrouter.ai/rankings
OpenRouter is not the only "router" type AI company. More fixed providers like OpenCode and commandcode are offering subscription services on open/Chinese models, likely consuming billions of tokens each. Who knows how many tokens are being processed directly against DeepSeek's and Kimi's APIs?
OpenRouter is not indicative of volume. Most high-volume clients will go to the providers directly. There's no point in paying the 5% OR cut if you know what you want.
That's just it: This is not happening with the Chinese models, because western corporations are the primary drivers of AI adoption globally and western corporations are not signing up for a DeepSeek API key. If they're working with Chinese models at all, which they rarely are, it is via a western-hosted provider like Bedrock, Vertex, or OpenRouter; or self-hosting. Sure, hobbyists and individual programmers might be comfortable forming a business relationship with a nationalized Chinese entity, but you'd need a microscope to see that relative to the spend that, say, Eli Lilly is throwing at Anthropic every week.
But you're right that OpenRouter is only one data point. It is, unfortunately, one of the few we have.
If you're going to compare OpenRouter numbers for DeepSeek at least use the same metric to compare Gemini. During last week DeepSeek V4 Flash did 3.72T tokens which is way higher than combined token counts for Gemini (2.5 Flash + 3.5 Flash + 3.1 Pro)
DeepSeek's official API, which has 10x cheaper cached input cost isn't even on OpenRouter as a provider, so just like Google, most volume is not going through OpenRouter. (Gemini's official hosted api is on OpenRouter BTW)
Also, you're comparing an API with Google's internal corporate and consumer app use. Bytedance announced they were using 63T tokens/day (441T/week) at the end of 2025, so they are probably even higher than Google now. We don't know how many weekly tokens the DeepSeek chat app uses, but it would also be a very high number, much higher than the OpenRouter tokens.
For the real reason of the recent price drops, go ask your AI about how much it would cost to run DeepSeek V4 or MiMo 2.5 after Ascend 950 PR have started to be mass delivered in 2026 Apr at $10k / card.
The issue you're not seeing is: Western corporations, the primary drivers of AI spend globally, are not forming business relationships with nationalized Chinese AI labs in order to directly use the DeepSeek API. They're using it through western proxies like OpenRouter, if they're doing it at all (newsflash: they aren't). They are forming business relationships with Anthropic, Google, and OpenAI to directly use their APIs.
Why are you using Openrouter as metric here? Most people use the APIs directly.
comparing deepseek usage on openrouter to google usage in total is not statistically correct
you could equally say, in the last complete week openrouter processed more deepseek tokens than any other provider including google
that also would not tell you much about how many tokens are used on deepseek
That makes some sense.
I mean, I am going to use the best I can afford. And at work that's Opus, but while work is happy to let me spend $50+/day, that's just not viable for personal hobby use, I need to keep that in the realm of a WOW/mmo subscription.
Yup; it's not just that people want the best models, so they use Opus or GPT-5.5. It's also that we're talking about nationalized Chinese companies. Western corporations are not en masse forming business relationships with Chinese firms and subjecting their proprietary code to whatever harness they cook up just to save a million dollars. It's not happening. And that's why the Chinese labs are failing; they're struggling to build a domestic market for token consumption. The Chinese domestic market sucks for almost everything China builds, they need export partners like the US to keep most of their factories on, but in the case of AI: no one overseas is buying.
"The api pricing for mimo-v2-pro and mimo-v2-omni remain unchanged" could we presume this means the discount isn't from hardware improvement or availability ?
VSCode + Cline + MiMo v2.5 Pro works! Great! Give it a try.
So exactly same as deepseek 4 api pricing
One difference is that MiMo 2.5 (non-Pro) has image, audio and video input capabilities.
DeepSeek does not understand image, audio or video.
I've heard non-Pro isn't nearly as good for coding as Pro?
Is this deepseek v4 api pricing for May (at 75% off - offer supposedly end June 1)? Or the non-discounted api pricing?
https://news.ycombinator.com/item?id=48237663
Deepseek made the discounted price permanent before this.
How realistic is this:
A Chinese model incidentally slurps up some terms that lead it to find unflattering words that you wrote about the CCP in a random journal entry, or maybe a social media CSV export. You go to China one day and are denied entry due to what you said.
Realistic or no? (yes i know the us is getting bad in re. to what you write online as well)
Models hosted in China are a siren call that I don't feel bad about resisting.
China knows tourists are not Chinese; that's why when you visit China your non-Chinese SIM card silently bypasses the Great Firewall when you use roaming data.
Besides, the Chinese government doesn't really care about individual criticisms, even in public, especially in languages other than Chinese. What they really care about censoring is attempts to organize collective action. They don't care about personal opinions stated in the blog posts of tourists, let alone diary entries.
I really like the US model of free speech, at its best. It feels natural and right to me. It would be cool if Chinese people had stronger freedom of political speech. I'd love to hear Chinese people publicly share their thoughts online without restraint or censorship; it's a huge country with a lot of smart people with diverse opinions.
But maybe you should go visit China sooner rather than later, tbf. It's friendlier and weirder and more interesting than you think, including w/r/t the censorship regime.
This statement makes no sense, because you literally said the "US is getting bad". We already gave up all of our data, if you wrote something about the CCP you should already expect they know about it.
Besides that, the us govt already has all your data and yet people are criticising it all around, in the open. They can, without repercussions, because the us is a free country.
Chinese people can’t really do the same.
This may be true about any models hosted by others than you.
At least the Xiaomi models are open weights and you can host them yourself, avoiding such concerns.
Well, at least for the Chinese models you can run them locally vs. the US models that require you to go through their servers. But to answer your question:
> How realistic is this:
Completely unrealistic unless you are a high value target (journalist, spy, business man, etc...)
https://immpolicytracking.org/policies/reported-french-scien...
> CBP denies travelers entry because of anti-Trump comments
China won't deny entry for anti-Trump comments, guess I'll use MiMo
China also won't deny entry for anti-Israel comments, so even more reason to use MiMo.
You're projecting the US doing this with criticism of Trump and Israel on China, when there's no proof of China ever doing something like this.
Everyone already said what I wanted to say. That all US companies (OpenAI, Anthropic, Google, MS Copilot) have increased price recently while Chinese companies (Deepseek, Xiaomi) are reducing price.
The question is how they are managing to do so? They are supposed to struggle due to chip sanctions.
Secondly, why now? The US companies were supposed to subsidize too but now they are unable to keep up. Everyone going to usage based pricing, so it's unsustainable for them. They are well funded too.
If there is a genuine hardware breakthrough reducing compute needs, then that is good for the whole world, I believe.
> They are supposed to struggle due to chip sanctions
As Jensen has been pointing out for almost a year now, these sanctions were ineffective and probably had the opposite effect of the desired goal.
The history is fairly long, but an inflection point could likely be traced to Trump v1 era DOJ enforcement on (among others) Huawei's CFO Meng Wanzhou in 2018. Huawei was hit with the (really big) stick in international transactions: OFAC violation accusations, and it was a seminal moment in the company's internal operations -- they concluded they needed a fully internal supply chain in China, and retooled for it. Meng Wanzhou cases in the US were eventually dismissed, but she was on house arrest in Canada through 2021 or so.
Fast forward to 2024 -- Huawei was culturally and technically ready to build AI accelerators -- one of the externalities of the sanctions was to provide additional benefit to Chinese companies for buying from Huawei; those economics seem to have provided a boost to on-shore development.
Competition I guess, they must be burning some resources to make this price reduction happen...
The state of the art models (mostly GPT 5.5, but also Gemini and Claude) are better so they cost more. Qwen 3.7 Max is their only direct competition and it is not any cheaper.
Are they?
I have been using DeepSeek, and I am finding it better than Claude or Codex, to be honest.
I don't see myself going back.
I love ds4, us models are better imo, but like 5% not 500% better, so the valuation doesn't really make sense
that being said, deepseek v4 needs to be on amazon bedrock to actually be feasible in the US Enterprise market and start driving other provider prices down
I just wish there was a MiMo REAP gguf that was reduced enough to fit within RAM of computers of mere mortals
as someone from the 3rd world - this is pleasant - even 3rd world countries will have affordable "A.I" access via Chinese models.
as someone who now lives & has lived in the west for the majority of their adult life - yeah the US western models r fucked n the crazy valuations of the A.I labs - which also filters down to the economy - since all money instead of being put to productive use is being wasted on this shit. hell electricity bills are up - cz datacenters need power. the current crooks in power don't believe in clean energy.
>>as someone from the 3rd world - this is pleasant - even 3rd world countries will have affordable "A.I"
I stopped tagging my country as developing and then third world and call it for what it is, a POOR country. I know with increasing certainty that my country will be poor for the rest of my life. I also expect AI to be as available as computers: there are the "have", and there are the "don't have", which is almost always a lifetime condition.
I will adopt your take. it's precise.
The price cut is 50%.
Can you explain any further?
If you open the page you will see what is reduced by how much.
Everyone adding "Permanent" to price cuts now
Can't be mistaken for someone like, ugh... Anthropic and OpenAI...
This sort of pressure will force them to though.
I hope so, but I don't know if they are in a position where they can offer these kinds of prices. They are already struggling not to lose a lot of money on their models, while Chinese models can already be independently hosted by inference providers at a profit. We need to drive these prices down so AI doesn't become a thing for the few who can pay for expensive subscriptions.
nah, just ban using Chinese models and ban open source models. This will allow them to keep the high price. Got to recoup the money spent somehow, time to lobby the government.
They can't afford it. OpenAI and Anthropic bleed money and are desperate for an IPO so that they can get some extra mileage.
First Deepseek, Now Xiaomi. A price cut of 99%.
This is why Anthropic wants these Chinese AI models banned, as they are in the lead in the AI race to zero and they know that there is no model moat.
So don't tell Dario.
Like I said. China doesn't care about money. We want AI in people's hands.
I mean the AI companies probably just want to make American model pricing look ridiculous in comparison (it's working imo). I think the government probably wants actually-useful AI that could be put into chips and actually revolutionize factory work or mining or whatever. Large, SOTA models are not gonna change factory work but extremely efficient and optimized models may
Every industry-wide scale technological revolution has happened because government funded a technology and then opened it up to the masses. Just look at your iPhone: GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't)
Economist Mariana Mazzucato wrote a great book about this called The Entrepreneurial State: Debunking Public vs. Private Sector Myths
> I mean the AI companies probably just want to make American model pricing look ridiculous in comparison (it's working imo)
I really don't think China cares about that. The Chinese government's governance logic is making everything so cheap that everyone can get and use it. They did it with EVs and other things. Now they are doing it with AI.
They do want to see the American bubble burst, this is the quickest way
with all the price increases in everything else, i think we are all tired of this bubble to be honest...
OK. Google was just killed. How is it possible to reduce the price by 99%??????? This is crazy
The reduction is in cached inputs. I've commented about this before, but many labs, except DeepSeek and now Xiaomi, absolutely scam you for cached reads.
You are basically paying out the nose for a few seconds of VRAM residence if you are giving significant money for cache reads.
The very nature of autoregressive language modeling is that every single output token produced "reads" the cache.
So in principle the price floor for a cache hit is the flat cost of 1 output token.
Now in reality it has to be more than that because you are occupying VRAM with the cache that forces out other users. But it can still be really cheap.
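To put rough numbers on that, here is a minimal sketch of how a multi-turn agent session gets billed when every turn re-reads the earlier conversation from cache. The function name, the token counts, and all the prices are made-up placeholders for illustration, not any provider's real rates.

```python
def session_cost(turns, new_tokens_per_turn, output_tokens_per_turn,
                 input_price, cached_price, output_price):
    """Dollar cost of a multi-turn session; prices are per million tokens.

    Each turn re-reads the whole earlier conversation as cached input,
    pays the full input price only for the new tokens, and pays the
    output price for what the model generates.
    """
    per_token = 1e-6
    cached = fresh = output = 0.0
    context = 0  # tokens already sitting in the cache from earlier turns
    for _ in range(turns):
        cached += context * cached_price * per_token
        fresh += new_tokens_per_turn * input_price * per_token
        output += output_tokens_per_turn * output_price * per_token
        context += new_tokens_per_turn + output_tokens_per_turn
    return {"cached": cached, "fresh": fresh, "output": output,
            "total": cached + fresh + output}


# Hypothetical prices in dollars per million tokens; only the cache price differs.
expensive = session_cost(50, 2000, 500, input_price=3.0,
                         cached_price=0.30, output_price=15.0)
cheap = session_cost(50, 2000, 500, input_price=3.0,
                     cached_price=0.003, output_price=15.0)

for name, bill in (("expensive cache", expensive), ("cheap cache", cheap)):
    share = bill["cached"] / bill["total"]
    print(f"{name}: total ${bill['total']:.4f}, cache reads {share:.0%} of the bill")
```

With these placeholder numbers, cache reads are the largest line item at the higher cache price, because the re-read context grows every turn while new input and output stay flat. Cutting only the cache price by 99% then removes a bit over half of the total bill.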
No one is producing one output token though.
And using up gpus for that cache is a pretty big opportunity cost. I highly doubt it's done in vram. That would be insane for the one hour caches.
So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token
Is it a scam? Idk
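A back-of-the-envelope version of that sum, as a sketch only. Every name and number here is an assumption picked for illustration (cache size, hold time, storage and GPU prices); none of it comes from a real provider.

```python
def cache_hit_cost(cache_gb, hold_seconds, storage_price_per_gb_hour,
                   load_seconds, gpu_price_per_hour,
                   output_tokens, extra_price_per_output_token):
    """Rough dollar cost of keeping a prefix cache around and using it once.

    storage: paying for memory (VRAM, RAM, or SSD) while the cache sits idle
    reload:  GPU time spent moving the cache back into VRAM on a hit
    decode:  extra cost per output token from attending over the cached prefix
    """
    storage = cache_gb * (hold_seconds / 3600) * storage_price_per_gb_hour
    reload = (load_seconds / 3600) * gpu_price_per_hour
    decode = output_tokens * extra_price_per_output_token
    return storage + reload + decode


# Hypothetical comparison: the same 10 GB cache held for one hour.
# Held in VRAM: pricey storage, nothing to reload.
in_vram = cache_hit_cost(10, 3600, 0.05, 0, 2.0, 500, 1e-6)
# Held in RAM or SSD: cheap storage, a few seconds of GPU time to load it back.
in_ram = cache_hit_cost(10, 3600, 0.001, 5, 2.0, 500, 1e-6)

print(f"VRAM-resident: ${in_vram:.4f}  RAM/SSD-resident: ${in_ram:.4f}")
```

The point of splitting it this way is that the storage term scales with hold time, so a one-hour cache only looks cheap if it lives in a cheaper tier than VRAM.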
I've read on X that deepseek api can stay alive for hours vs 5 minutes tops for other providers. they do it with ram and ssd, not only vram.
- Cheap electricity
- Cheap, domestically produced GPUs
- Efficiency research by many PhDs (many AI companies used DeepSeek's research though)
Industrial electricity in China costs about the same as in Texas: 8-9 cents a kWh. The only benefit is that China has decided to put millions of solar panels down, so "peak" sunlight hours can drop electricity costs significantly, since their rates are highly dynamic.
Add to that home made inference chips and dirt cheap RAM from CXMT
State backed loss leaders.
Is that worse than VC-backed loss leaders? :)
Yes. VCs can’t compete with the second largest economy on the planet.
I think this is probably correct based on the way state investment into the Chinese EV market has been working - fund a whole bunch of them and let them fight it out to be one of the few brands that will have the longevity. It's pretty brutal with the cars.
The rest of the best of the business is paying for it
Anything to destroy US tech companies is welcome.
They aren't aiming at companies but at users, many of whom have no common sense and grant these agentic AIs access to everything.
All the restrictions the US imposed on CH will be turned back on it, and it will be even worse, because now the data is reaching not the US gov (we all know they have access to US big tech's data) but CH.
I really hope this goes viral and breaks Nvidia/OpenAI.
fun fact, CH is the ISO code for Switzerland, and China is CN
I even know the reasoning but CH for Switzerland always bothers me