414

DeepSeek v4

I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.

Flash: https://gist.github.com/simonw/4a7a9e75b666a58a0cf81495acddf...

Pro: https://gist.github.com/simonw/9e8dfed68933ab752c9cf27a03250...

Both generated using OpenRouter.

For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/

And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/

And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/

an hour agosimonw

DeepSeek pelicans are the angriest pelicans I’ve seen so far.

9 minutes agomurkt

they're just late for work.

7 minutes agokristopolous

No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining! He’s definitely living the best life.

an hour agoJSR_FDED

yeah. look at these 4 feathers (?) on his bum too.

an hour agow4yai

a lot of dumplings

40 minutes agooliver236

What was your prompt for the image? Apologies if this should be obvious.

4 minutes agobrutal_chaos_

>Generate an SVG of a pelican riding a bicycle

at the top of the linked pages.

2 minutes agoshawn_w

The Flash one is pretty impressive. Might be my favorite so far in the pelican-riding-a-bicycle series

an hour agonickvec

Being a bicycle geometry nerd I always look at the bicycle first.

Let me tell you how much the Pro one sucks... It looks like failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.

The flash one looks surprisingly correct with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations. The seat post has different angle than the seat tube, so good luck lowering that.

[1] https://en.wikipedia.org/wiki/Pedersen_bicycle

[2] https://en.wikipedia.org/wiki/Lowrider_bicycle

23 minutes agomikae1

This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.

19 minutes agosimonw

Some other reactions:

I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro one uses 16 spoke rims, which actually exist[1] but are not super common. The Pro model fails badly at the spoke pattern.

[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...

6 minutes agomikae1

The Pedersen looks like someone failed the "draw a bicycle" test and decided to adjust the universe.

3 minutes agojojobas

Where is the GPT 5.5 Pelican?

10 minutes agotheanonymousone

Why they so angry?

10 minutes agolobochrome

I really like the pro version. The pelican is so cute.

an hour agoycui1986

[flagged]

33 minutes agowhateveracct

It's just Simon Willison (the person you are replying to) who always makes a pelican, as his personal flippant benchmark. It's not that deep.

30 minutes agofastball

No benchmark will be perfect, especially if it's public but it's a fun experiment to visually see how these models get better and better.

30 minutes agodewey

Why is it so wrong?

30 minutes agopost-it

Thanks for the "scientific air" remark, that gave me a genuine LOL.

22 minutes agosimonw

I think the pelican on a bike is known widely enough that of seizes to be useful as a benchmark. There is even a pelican briefly appearing in the promo video of GPT-5, if I'm not mistaken https://openai.com/gpt-5/. So the companies are apparently aware of it.

7 minutes agocatelm

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.

2 hours agonthypes

I don't think we need to compare models to Opus anymore. Opus users don't care about other models, as they're convinced Opus will be better forever. And non-Opus users don't want the expense, lock-in or limits.

As a non-Opus user, I'll continue to use the cheapest fastest models that get my job done, which (for me anyway) is still MiniMax M2.5. I occasionally try a newer, more expensive model, and I get the same results. I have a feeling we might all be getting swindled by the whole AI industry with benchmarks that just make it look like everything's improving.

an hour ago0xbadcafebee

Which model's best depends on how you use it. There's a huge difference in behaviour between Claude and GPT and other models which makes some poor substitutes for others in certain use cases. I think the GPT models are a bad substitute for Claude ones for tasks such as pair-programming (where you want to see the CoT and have immediate responses) and writing code that you actually want to read and edit yourself, as opposed to just letting GPT run in the background to produce working code that you won't inspect. Yes, GPT 5.4 is cheap and brilliant but very black-box and often very slow IME. GPT-5.4 still seems to behave the same as 5.1, which includes problems like: doesn't show useful thoughts, can think for half an hour, says "Preparing the patch now" then thinks for another 20 min, gives no impression of what it's doing, reads microscopic parts of source files and misses context, will do anything to pass the tests including patching libraries...

23 minutes agoversteegen

Agree with your assessment, I think after models reached around Opus 4.5 level, its been almost indistinguishable for most tasks. Intelligence has been commoditized, what's important now is the workflows, prompting, and context management. And that is unique to each model.

28 minutes agoind-igo

What do you run these on? I've gotten comfortable with Claude but if folks are getting Opus performance for cheaper I'll switch.

29 minutes agopost-it

Try Charm Crush first, it's a native binary. If it's unbearable, try opencode, just with the knowledge your system will probably be pwned soon since it's JS + NPM + vibe coding + some of the most insufferable devs in the industry behind that product.

If you're feeling frisky, Zed has a decent agent harness and a very good editor.

24 minutes agoslopinthebag

This resonates with me a lot.

I do some stuff with gemini flash and Aider, but mostly because I want to avoid locking myself into a walled garden of models, UIs and company

32 minutes agokmarc

[dead]

28 minutes agoszundi

How does it compare to Opus 4.7? I've been immersed in 4.7 all week participating in the Anthropic Opus 4.7 hackathon and it's pretty impressive even if it's ravenous from a token perspective compared to 4.6

2 hours agoonchainintel

The thing is, it doesnt need to beat 4.7. it just needs to do somewhat well against it.

This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.

2 hours agogreenknight

Do you think a lot of people have “systems” to run a 1.6T model?

2 hours agop1esk

To me, the important thing isn't that I can run it, it's that I can pay someone else to run it. I'm finding Opus 4.7 seems to be weirdly broken compared to 4.6, it just doesn't understand my code, breaks it whenever I ask it to do anything.

Now, at the moment, i can still use 4.6 but eventually Anthropic are going to remove it, and when it's gone it will be gone forever. I'm planning on trying Deepseek v4, because even if it's not quite as good, I know that it will be available forever, I'll always be able to find someone to run it.

30 minutes agoCJefferson

No, but businesses do. Being able to run quality LLMs without your business, or business's private information, being held at the mercy of another corp has a lot of value.

an hour agoapplfanboysbgon

What type of system is needed to self host this? How much would it cost?

an hour agoforrestthewoods

Depends how many users you have and what is "production grade" for you but like 500k gets you a 8x B200 machine.

34 minutes agodisiplus

Depends on fast you want it to be. I’m guessing a couple of $10k mac studio boxes could run it, but probably not fast enough to enjoy using it.

39 minutes agop1esk

One GB200 NVL72 from Nvidia would do it. $2-3 million, or so. If you're a corporation, say Walmart or PayPal, that's not out of the question.

If you want to go budget corporate, 7 x H200 is just barely going to run it, but all in, $300k ought to do it.

21 minutes agofragmede

How many users can you serve with that?

16 minutes agogloflo

Not really - on prem llm hosting is extremely labor and capital intensive

an hour agocholdstare

But can be, and is, done. I work for a bootstrapped startup that hosts a DeepSeek v3 retrain on our own GPUs. We are highly profitable. We're certainly not the only ones in the space, as I'm personally aware of several other startups hosting their own GLM or DeepSeek models.

an hour agoapplfanboysbgon
[deleted]
an hour ago

Completely agree, not suggesting it needs ot just genuinely curious. Love that it can be run locally though. Open source LLMs punching back pretty hard against proprietary ones in the cloud lately in terms of performance.

an hour agoonchainintel

What's the hardware cost to running it?

an hour agokelseyfrog

I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:

- To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).

- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.

- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, one of the most powerful consumer GPUs available. Even that would clock in around $15k for the cards alone and ~$0.22/h for the electricity (in the US).

Truly an insane industry. This is a good reminder of why datacenter capex from since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...

an hour agobbor

That article is a total hallucination.

37 minutes agozargon

Probably like 100 USD/hour

an hour agoredox99

"if you have to ask..."

an hour agoslashdave

... if you have 800 GB of VRAM free.

2 hours agojohnmaguire

I remember reading about some new frameworks have been coming out to allow Macs to stream weights of huge models live from fast SSDs and produce quality output, albeit slowly. Apart from that...good luck finding that much available VRAM haha

an hour agoinventor7777

It is more than good enough and has effectively caught up with Opus 4.6 and GPT 5.4 according to the benchmarks.

It's about 2 months behind GPT 5.5 and Opus 4.7.

As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.

It should be obvious now why Anthropic really doesn't want you to run local models on your machine.

an hour agorvz

Vibes > Benchmarks. And it's all so task-specific. Gemini 3 has scored very well in benchmarks for very long but is poor at agentic usecases. A lot of people prefering Opus 4.6 to 4.7 for coding despite benchmarks, much more than I've seen before (4.5->4.6, 4->4.5).

Doesn't mean Deepseek v4 isn't great, just benchmarks alone aren't enough to tell.

an hour agodeaux

With the ability of the Qwen3.6 27B, I think in 2 years consumers will be running models of this capability on current hardware.

an hour agosnovv_crash

What's going to change in 2 years that would allow users to run 500B-800B parameter models on consumer hardware?

an hour agocolordrops

I think its just an estimate

an hour agoDiscourseFan

But the question remains

11 minutes agoindigodaddy

> (better than Opus 4.6)

There we go again :) It seems we have a release each day claiming that. What's weird is that even deepseek doesn't claim it's better than opus w/ thinking. No idea why you'd say that but anyway.

Dsv3 was a good model. Not benchmaxxed at all, it was pretty stable where it was. Did well on tasks that were ood for benchmarks, even if it was behind SotA.

This seems to be similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by ds themselves now, more providers will come and we'll see the median price) at 1.74$ in / 3.48$ out / 0.14$ cache. Really cheap for what it offers.

The small one is at 0.14$ in / 0.28$ out / 0.028$ cache, which is pretty much "too cheap to matter". This will be what people can run realistically "at home", and should be a contender for things like haiku/gemini-flash, if it can deliver at those levels.

an hour agoNitpickLawyer

Anthropic fans would claim God itself is behind Opus by 3-6 months and then willingly be abused by Boris and one of his gaslighting tweets.

LMAO

22 minutes agoslopinthebag

> Anthropic fans ...

I have no idea why you'd think that, but this is straight from their announcement here (https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg):

> According to evaluation feedback, its user experience is better than Sonnet 4.5, and its delivery quality is close to Opus 4.6's non-thinking mode, but there is still a certain gap compared to Opus 4.6's thinking mode.

This is the model creators saying it, not me.

12 minutes agoNitpickLawyer

Is it honestly better than Opus 4.6 or just benchmaxxed? Have you done any coding with an agent harness using it?

If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.

2 hours agodoctoboggan

Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.

an hour agomadagang

I appreciate this, makes me trust it more than benchmarks.

an hour agomchusma

That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.

an hour agodeaux
[deleted]
2 hours ago

For the curious, I did some napkin math on their posted benchmarks and it racks up 20.1 percentage point difference across the 20 metrics where both were scored, for an average improvement of about 2% (non-pp). I really can't decide if that's mind blowing or boring?

Claude4.6 was almost 10pp better at at answering questions from long contexts ("corpuses" in CorpusQA and "multiround conversations" in MRCR), while DSv4 was a staggering 14pp better at one math challenge (IMOAnswerBench) and 12pp better at basic Q&A (SimpleQA-Verified).

an hour agobbor

FWIW it's also like 10x cheaper.

an hour agoQuasimarion

[dead]

2 minutes agoasamarts

The dragon awakes yet again!

2 hours agosergiotapia

There appears a flight of dragons without heads. Good fortune.

That's literally what the I Ching calls "good fortune."

Competition, when no single dragon monopolizes the sky, brings fortune for all.

an hour agokindkang2024

Pop?

2 hours agorapind

This is shockingly cheap for a near frontier model. This is insane.

For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.

I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.

Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.

3 minutes agorohanm93

865 GB: I am going to need a bigger GPU.

a minute agogardnr

Already on Openrouter. Pro version is $1.74/m/input, $3.48/m/output, while flash $0.14/m/input, 0.28/m/output.

an hour agoyanis_t

Getting 'Api Error' here :( Every other model is working fine.

an hour agoastrod

Try interacting with it through the website, it will give an error and some explanation on the issue. I had to relax my guardrail settings.

13 minutes agopoglet

https://openrouter.ai/deepseek/deepseek-v4-pro

https://openrouter.ai/deepseek/deepseek-v4-flash

an hour agoesafak

Its on OR - but currently not available on their anthropic endpoint. OR if you read this, pls enable it there! I am using kimi-2.6 with Claude Code, works well, but Deepseek V4 gives an error:

`https://openrouter.ai/api/messages with model=deepseek/deepseek-v4-pro, OR returns an error because their Anthropic-compat translator doesn't cover V4 yet. The Claude CLI dutifully surfaces that error as "model...does not exist"

an hour ago77ko

There's something heartwarming about the developer docs being released before the flashy press release.

2 hours agofblp

Insert obligatory "this is the way" Mando scene. Indeed!

2 hours agoonchainintel

Where's the training data and training scripts since you are calling this open source?

an hour agonecovek

doesn't it get tiring after a while? using the same (perceived) gotcha, over and over again, for three years now?

no one is ever going to release their training data because it's full of copyrighted stuff. everyone, even the hecking-wholesome safety-first Anthropic uses copyrighted data without permission to train their models. there you go.

31 minutes agob65e8bee43c2ed0

it's not a gotcha but people using words in ways others don't like.

8 minutes agofragmede

Truly open source coming from China. This is heartwarming. I know if the potential ulterior motives.

an hour agosidcool

Open weight!

an hour agoI_am_tiberius

Please don't slander the most open AI company in the world. Even more open than some non-profit labs from universities. DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there. And their papers are extremely pro-social to help the broader open AI community. This is why they struggle getting funded because investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.

Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels

Others worth mentioning:

https://github.com/deepseek-ai/DeepGEMM a competitive foundational library

https://github.com/deepseek-ai/Engram

https://github.com/deepseek-ai/DeepSeek-V3

https://github.com/deepseek-ai/DeepSeek-R1

https://github.com/deepseek-ai/DeepSeek-OCR-2

They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all

And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.

The models from Chinese Big Tech and some of the small ones are open weights only. (and allegedly benchmaxxed) (see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.

29 minutes agoalecco

For comparison on openrouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b, more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.

an hour agomchusma

We will be hosting it soon at getlilac.com!

5 minutes agoluew

Oh well, I should have bought 2x 512GB RAM MacStudios, not just one :(

31 minutes agostorus

The Flash version is 284B A13B in mixed FP8 / FP4 and the full native precision weights total approximately 154 GB. KV cache is said to take 10% as much space as V3. This looks very accessible for people running "large" local models. It's a nice follow up to the Gemma 4 and Qwen3.5 small local models.

2 hours agozargon

Price is appealing to me. I have been using gemini 3 flash mainly for chat. I may give it a try.

input: $0.14/$0.28 (whereas gemini $0.5/$3)

Does anyone know why output prices have such a big gap?

an hour agosbinnee

A few hours after GPT5.5 is wild. Can’t wait to try it.

8 minutes agosibellavia

Anyone tried with make web UI with it? How good is it? For me opus is only worth because of it.

13 minutes agotariky

I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.

2 hours agogbnwl

Don't keep up. Much like with news, you'll know when you need to know, because someone else will tell you first.

21 minutes agosatvikpendem

The players barely ever change. People don't have problems following sports, you shouldn't struggle so much with this once you accept top spot changes.

2 hours agowordpad

I didn't express this well but my interest isn't "who is in the top spot", and is more _why and _how various labs get the results they do. This is also magnified by the fact that I'm not only interested in hosted providers of inference but local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor it's more than following the national leagues but also following college and even high school leagues as well. And the real interest isn't even who's doing well but WHY, at each level.

37 minutes agogbnwl

It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.

At this point I would just pick the one who's "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes in some domain.

Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.

an hour agoehnto

I find ChatGPT annoying mostly

an hour agoDiscourseFan

Open settings > personalization. Set it to efficient base style. Turn off enthusiasm and warmth. You’re welcome

an hour agoawakeasleep

It honestly has all kinda felt like more of the same ever since maybe GPT4?

New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.

Feels like the field has stagnated to a point where only the enthusiasts care.

6 minutes agovrganj

holy shit im right there with you

24 minutes agotrueno

At this point 'frontier model release' is a monthly cadence, Kimi 2.6 Claude 4.6 GPT 5.5, the interesting question is which evals will still be meaningful in 6 months.

2 hours agojessepcc

What's the current best framework to have a 'claude code' like experience with Deepseek (or in general, an open-source model), if I wanted to play?

28 minutes agoCJefferson

You can use deepseek with Claude code

25 minutes agowhoopdeepoo

claude-code-cli/opencode/codex

21 minutes ago0x142857

MMLU-Pro:

Gemini-3.1-Pro at 91.0

Opus-4.6 at 89.1

GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5

Pretty impressive

2 hours agoAliabid94

Funny how Gemini is theoretically the best -- but in practice all the bugs in the interface mean I don't want to use it anymore. The worst is it forgets context (and lies about it), but it's very unreliable at reading pdfs (and lies about it). There's also no branch, so once the context is lost/polluted, you have to start projects over and build up the context from scratch again.

an hour agoant6n

Looking forward to DeepSeek Coding Plan

26 minutes agoclark1013

Excited that the long awaited v4 is finally out. But feel sad that it's not multimodal native.

2 hours agojdeng

For those who didn't check the page yet, it just links to the API docs being updated with the upcoming models, not the actual model release.

2 hours agoluyu_wu
[deleted]
2 hours ago

My submission here https://news.ycombinator.com/item?id=47885014 done at the same time was to the weights.

dang, probably the two should be merged and that be the link

2 hours agocmrdporcupine

there's no pinging. Someone's gotta email dang

an hour agoculi

How can you reasonably try to get near frontier (even at all tps) on hardware you own? Maybe under 5k in cost?

an hour agoaliljet

For flash? 4 bit quant, 2x 96GB gpu (fast and expensive) or 1x 96GB gpu + 128GB ram (still expensive but probably usable, if you’re patient).

A mac with 256 GB memory would run it but be very slow, and so would be a 256GB ram + cheapo GPU desktop, unless you leave it running overnight.

The big model? Forget it, not this decade. You can theoretically load from SSD but waiting for the reply will be a religious experience.

Realistically the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you’re willing to go on quantization. Even this is fairly expensive, and that’s before RAM went to the moon.

28 minutes agorevolvingthrow

Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.

9 minutes agozargon

A loaded macbook pro can get you to the frontier from 24 months ago at ~10-40tok/s, which is plenty fast enough for regular chatting.

27 minutes agodatadrivenangel

The same way you fit a bucket wheel excavator in your garage

an hour agoawakeasleep

Very carefully

17 minutes agofloam

The low end could be something like an eBay-sourced server with a truckload of DDR3 ram doing all-cpu inference - secondhand server models with a terabyte of ram can be had for about 1.5K. The TPS will be absolute garbage and it will sound like a jet engine, but it will nominally run.

The flash version here is 284B A13B, so it might perform OK with a fairly small amount of VRAM for the active params and all regular ram for the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).

34 minutes ago542458

More like 500k

an hour agojdoe1337halo

SOTA MRCR (or would've been a few hours earlier... beaten by 5.5), I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. Beats Opus 4.7 here

2 hours agoKaoruAoiShiho

Is there a Quantized version of this?

an hour agonamegulf

Any visualised benchmark/scoreboard for comparison between latest models? DeepSeek v4 and GPT-5.5 seems to be ground breaking.

2 hours agoswrrt

Does deepseek has any coding plan?

an hour agomariopt

no

32 minutes agojeffzys8

How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a Macbook.

2 hours agols612

Weren't there some frameworks recently released to allow Macs to stream weights from fast SSDs and thus fit way more parameters than what would normally fit in RAM?

I have never tried one yet but I am considering trying that for a medium sized model.

an hour agoinventor7777

I've been calling that the "streaming experts" trick, the key idea is to take advantage of Mixture of Expert models where only a subset of the weights are used for each round of calculations, then load those weights from SSD into RAM for each round.

As I understand it if DeepSeek v4 Pro is a 1.6T, 49B active that means you'd need just 49B in memory, so ~100GB at 16 bit or ~50GB at 8bit quantized.

v4 Flash is 284B, 13B active so might even fit in <32GB.

an hour agosimonw

> ~100GB at 16 bit or ~50GB at 8bit quantized.

V4 is natively mixed FP4 and FP8, so significantly less than that. 50 GB max unquantized.

17 minutes agozargon

Ahh, that actually makes more sense now. (As you can tell, I just skimmed through the READMEs and starred "for later".)

My Mac can fit almost 70B (Q3_K_M) in memory at once, so I really need to try this out soon at maybe Q5-ish.

an hour agoinventor7777

The paper is here: [0]

Was expecting that the release would be this month [1], since everyone forgot about it and not reading the papers they were releasing and 7 days later here we have it.

One of the key points of this model to look at is the optimization that DeepSeek made with the residual design of the neural network architecture of the LLM, which is manifold-constrained hyper-connections (mHC) which is from this paper [2], which makes this possible to efficiently train it, especially with its hybrid attention mechanism designed for this.

There was not that much discussion around it some months ago here [3] about it but again this is a recommended read of the paper.

I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.

Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

[1] https://news.ycombinator.com/item?id=47793880

[2] https://arxiv.org/abs/2512.24880

[3] https://news.ycombinator.com/item?id=46452172

2 hours agorvz

> this is why Anthropic wants to ban open weight models

Do you have a source?

2 hours agojeswin

More like he wants to ban accelerator chip sales to China, which may be about “national security” or self preservation against a different model for AI development which also happens to be an existential threat to Anthropic. Maybe those alternatives are actually one and the same to him.

33 minutes agolouiereederson
[deleted]
an hour ago
[deleted]
2 hours ago

congrats

an hour agohongbo_zhang

Ah now !

42 minutes agodhruv3006

[dead]

2 hours agocreamyhorror

[dead]

an hour agohubertzhang

[dead]

an hour agomaryjeiel

[flagged]

an hour agominhajulmahib

Why did you bother to submit an AI comment?

an hour agopolski-g

I suspect you may have replied to a bot. Dead internet theory

an hour agosidcool

OMG

OMG ITS HAPPENING

30 minutes agoslopinthebag

I hope the update is an improvement. Losing 3.2 would be a real loss, it's excellent.

2 hours agoshafiemoji

History doesn't always repeat itself.

But if it does, then in the following week we'll see DeepSeek4 floods every AI-related online space. Thousands of posts swearing how it's better than the latest models OpenAI/Anthropic/Google have but only costs pennies.

Then a few weeks later it'll be forgotten by most.

2 hours agoraincole

It's difficult because even if the underlying model is very good, not having a pre-built harness like Claude Code makes it very un-sticky for most devs. Even at equal quality, the friction (or at least perceived friction) is higher than the mainstream models.

an hour agosbysb

OpenCode? Pi?

If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.

The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.

an hour agoraincole

They have instructions right on their page on how to use claude code with it.

an hour agocmrdporcupine

[flagged]