
Gemini 3.5 Flash: frontier intelligence with action

Per million input/output tokens:

Gemini 2.5 flash: $0.30/$2.50

Gemini 3.0 flash preview: $0.50/$3.00

Gemini 3.5 flash preview: $1.50/$9.00

Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate successor in the same size class.

3.5 Flash costs about the same as Gemini 2.5 Pro, which was $1.25/$10.

3 minutes agoGodelNumbering
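
The ratios behind the "3x" figure can be checked from the prices listed in the comment above. This is a minimal Python sketch; the dictionary and function names are illustrative, and the only data used is the three price pairs quoted there.

```python
# Prices per million tokens (input, output), as quoted in the comment above.
PRICES = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.0 Flash preview": (0.50, 3.00),
    "Gemini 3.5 Flash preview": (1.50, 9.00),
}


def price_ratio(old: str, new: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of `new` relative to `old`."""
    old_in, old_out = PRICES[old]
    new_in, new_out = PRICES[new]
    return new_in / old_in, new_out / old_out


if __name__ == "__main__":
    names = list(PRICES)
    # Compare each model with the one listed right before it.
    for old, new in zip(names, names[1:]):
        r_in, r_out = price_ratio(old, new)
        print(f"{old} -> {new}: input x{r_in:.2f}, output x{r_out:.2f}")
```

On these numbers the 3.0 to 3.5 step is 3x on both input and output, while the 2.5 to 3.0 step was well under 2x on both.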

  > Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG
3.5 Flash: Thinking Medium - 7516 tokens

https://gistpreview.github.io/?5c9858fd2057e678b55d563d9bff0...

3.5 Flash: Thinking High - 7280 tokens

https://gistpreview.github.io/?1cab3d70064349d08cf5952cdc165...

3.1 Pro - 28,258 tokens

https://gistpreview.github.io/?6bf3da2f80487608b9525bce53018...

3.1 took 3 minutes of thinking to generate, but it is the only one that got animated movement.

43 minutes agoSXX

All three links animate for me.

32 minutes agocaptn3m0

I think they mean the boat is moving. In the flash ones the paddles are animated but the boat is stationary for me.

27 minutes agoNitpickLawyer

The boat moves in all three for me

19 minutes agocodazoda

The boat itself rocks, but do you see the background changing to indicate the boat is progressing through the environment? I only see that in the 3.1 Pro example. I believe that's what the OP meant.

13 minutes agoFishkins

I think this illustrates the problem with OP's prompt. If the goal is specifically to implement a scrolling background, this should have been in the prompt.

6 minutes agoManuel_D

Your links are broken FYI.

41 minutes agoabi

They work for me.

41 minutes agoJohn7878781

They do work here too.

32 minutes agoTacticalCoder

[delayed]

4 minutes agonpn

Beats 3.1 Pro on price per token, but Artificial Analysis is showing it's dumber per token and costs more overall.

23 minutes agoOsrsNeedsf2P

The pricing is an issue.

3 minutes agodroidjj

Yeah, bummer. I was very excited for this release, but this killed it.

15 minutes agosauwan

Engineers at Google have publicly stated that the models are too big and are far from their potential. Glad they're being proven right with every release.

They continue to focus on smaller models while openai and anthropic are increasing compute requirements for their SOTA models.

an hour agohimata4113

Given the cost increase associated with this model, and previous model releases, I think the size is trending upwards, not down.

an hour agostri8ed

The speed says otherwise. I think they're increasing costs since they want to start seeing ROI.

an hour agohimata4113

Those are (mostly) new, faster TPU

21 minutes agoJanSt

The latest TPUs appear to reach 800 tok/s rather than the advertised 300 tok/s.

17 minutes agohimata4113

Don’t let that fool you. Google will have SOTA models as big as or even bigger than their competitors.

They are just refining their current models while they finish training the next generation.

They will all come out at about the same time. Anthropic, OpenAi, Google, xAI

an hour agomaipen

Anthropic has been sitting on Mythos for a while now. I guess they don't feel pressured to "fuck it, ship it" until someone else gets a 10T to work.

43 minutes agoACCount37

According to people who have access to Mythos, it is slightly worse than GPT-5.5-xhigh. At least for security tasks.

2 minutes agothrowa356262

I suspect that Mythos doesn't have a business model that works

18 minutes agooutside1234

It's doubtful they have the compute to make mythos publicly available even after the SpaceX datacenter deal. And why sell it publicly if people are still willing to pay for Opus 4.7?

35 minutes agoSevii

Google’s pro models are almost certainly bigger than Openai’s lol

44 minutes agohowdareme

GPT-5.5 still seems to perform better than this on the benchmarks

Plus the vibe of the Gemini models is so weird, particularly when it comes to tool calling

At this point I kinda need them to shock me to make the switch

9 minutes agowarthog

$1.5/m input tokens $9/m output tokens

6x the price of 3.1 flash lite

an hour agoasar

I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models are by far the most token-light per task of any model.

Cost per task is a more productive measure, but obviously a more difficult one to benchmark.

21 minutes agoWarmWash

I don't think input/output pricing matters, 90% of the cost is cache. $0.15 is pretty good, but still very expensive.

an hour agohimata4113

It depends on the use-case. yes, 90% of cost is cache in agentic coding scenarios (actually 95% in my experience). But not when the model reasons for 200k+ tokens before answering a complex problem.

44 minutes agowolttam
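
To make the two scenarios in the comment above concrete, here is a rough Python sketch of where the money goes in each. The per-token prices are the ones quoted in this thread ($1.50 input, $9.00 output, $0.15 cached input); the token counts are made up for illustration, and the sketch assumes reasoning tokens are billed at the output rate.

```python
# $ per 1M tokens, as quoted in the thread.
INPUT, CACHED, OUTPUT = 1.50, 0.15, 9.00


def cost_breakdown(fresh_in: int, cached_in: int, out: int) -> dict[str, float]:
    """Dollar cost per category for one workload (arguments are token counts)."""
    return {
        "fresh input": fresh_in / 1e6 * INPUT,
        "cached input": cached_in / 1e6 * CACHED,
        "output": out / 1e6 * OUTPUT,
    }


def share(costs: dict[str, float], key: str) -> float:
    """Fraction of the total cost attributable to one category."""
    return costs[key] / sum(costs.values())


# Hypothetical agentic coding session: a long context re-read on every turn.
agentic = cost_breakdown(fresh_in=200_000, cached_in=50_000_000, out=50_000)
# Hypothetical single hard problem: short prompt, 200k tokens of reasoning.
reasoning = cost_breakdown(fresh_in=5_000, cached_in=0, out=200_000)

if __name__ == "__main__":
    for name, costs in (("agentic", agentic), ("reasoning", reasoning)):
        for key in costs:
            print(f"{name:9s} {key:12s} {share(costs, key):.1%}")
```

With these invented counts, cached input is about 91% of the agentic bill, while output is over 99% of the reasoning-heavy one, which is the contrast the comment is drawing.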

Gemini models solve a problem in 80% fewer tokens, so that's something to think about.

29 minutes agohimata4113
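
Taking the "80% fewer tokens" figure above at face value (it is not sourced in the thread), the break-even arithmetic looks like this. A small sketch; the function name is illustrative.

```python
def break_even_price_multiplier(token_reduction: float) -> float:
    """Per-token price multiple at which a model that uses `token_reduction`
    fewer tokens (0.8 means 80% fewer) costs the same per task."""
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1 / (1 - token_reduction)


if __name__ == "__main__":
    print(round(break_even_price_multiplier(0.8), 2))  # 5.0
```

In other words, a model that really needs only a fifth of the tokens could charge up to 5x per token before it becomes more expensive per task.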

In our experience, caching is not very reliable with Google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also what kind of cache hit rate you get.

42 minutes ago__jl__
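
The point about hit rates can be sketched as a blended input price. The prices are the ones quoted in this thread ($1.50 input, $0.15 cached); the hit rates are hypothetical, since the comment above gives no numbers.

```python
INPUT, CACHED = 1.50, 0.15  # $ per 1M tokens, as quoted in the thread


def effective_input_price(hit_rate: float) -> float:
    """Blended $ per 1M input tokens when `hit_rate` of them hit the cache."""
    if not 0 <= hit_rate <= 1:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * CACHED + (1 - hit_rate) * INPUT


if __name__ == "__main__":
    for rate in (0.95, 0.90, 0.80):  # assumed hit rates, not measurements
        price = effective_input_price(rate)
        print(f"hit rate {rate:.0%}: ${price:.4f} per 1M input tokens")
```

Under these assumed rates, falling from a 95% to an 80% hit rate nearly doubles the blended input price ($0.2175 to $0.42 per 1M tokens), so an unreliable cache can matter more than the list price.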

A cache price of 10% of input pricing is standard, especially compared to the competition.

an hour agominimaxir

Yeah, which means that in the end the only values worth paying attention to are the input cost and the cache discount (x10). If Google started offering a x20 discount, it would make it twice as cheap while input and output prices stayed the same.

an hour agohimata4113
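
How close a x20 discount gets to "twice as cheap" depends on how much of the input is actually cached. A short sketch, using the $1.50 input price quoted in this thread and treating the hit rate as a free parameter; the function names are illustrative.

```python
INPUT = 1.50  # $ per 1M input tokens, standard price quoted in the thread


def blended_price(hit_rate: float, discount: float) -> float:
    """$ per 1M input tokens when cached tokens cost INPUT / discount."""
    return hit_rate * INPUT / discount + (1 - hit_rate) * INPUT


def saving_from_doubling_discount(hit_rate: float) -> float:
    """Fractional input-cost reduction when the discount goes from x10 to x20."""
    return 1 - blended_price(hit_rate, 20) / blended_price(hit_rate, 10)


if __name__ == "__main__":
    for rate in (1.0, 0.95, 0.90):  # assumed hit rates
        print(f"hit rate {rate:.0%}: saves {saving_from_doubling_discount(rate):.1%}")
```

At a 100% hit rate the saving is the full 50%; at an assumed 95% it is roughly 33%, because the uncached remainder is still billed at the full input price. Output tokens are ignored here.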

[deleted]

an hour agoJohn7878781

Output cost is 3x from Gemini 3 flash.

an hour agostri8ed

Yikes. I think the concept of a 'flash' model is changing, no? Google used to market this as its lower-intelligence, faster, cheaper option. I appreciate that they are delivering on both of those, but personally I would appreciate if they could create an incremental knowledge improvement while holding price steady. Fortune 500 companies have to make their money I guess.

18 minutes agos3p

The Artificial Analysis benchmark results are pretty underwhelming. Roughly the same "intelligence" as MiMo-V2.5-Pro for over 3x the cost. We'll have to see how that translates to actual usage but it's not a great sign.

15 minutes agonoelsusman

Is there a good benchmark tracking hallucinations? The models are all incredibly good now, even the open ones, and my hope is that the rate of hallucinations is something that's falling off in concert with larger and larger context lengths.

41 minutes agoaliljet

People complain about them incessantly, but I can almost never get people to actually post receipts. Every provider allows sharing chats, and anyone can share a prompt that reliably produces hallucinations.

More often than not, people are using images in responses that go awry. Which is fair, the models are sold as multi-modal, but image analysis is still at GPT-4.0 text-analysis levels.

23 minutes agoWarmWash

I see hallucinations ALL the time. It's only obvious when you're prompting about a subject you know well.

And when I say all the time, I mean it, and this is for Opus 4.7 Adaptive.

I often have to say "please do searches and cite sources", because if it doesn't, it will confidently give me wrong or outdated information.

If you're often asking questions about a topic that's not in your specialist knowledge you won't notice them.

15 minutes agosaberience

As long as the model uses web search, they almost never hallucinate anymore. The fast models (haiku, gpt-instant, flash) still sometimes have the problem where they don't search before answering so they can hallucinate

10 minutes agoFergusArgyll

I haven't been bothered by hallucinations in premier models since early last year. Still see it in smaller local models though.

36 minutes agoSevii

I'm really running into this deep at the edges of content creation. Take, for example, a need to generate some kind of legal work. The cost of painstakingly checking and rechecking each case cited is reducing the value of these frontier models immensely.

Coding, however, is solved like magic. Easier to add tests, to be fair.

32 minutes agoaliljet

if last year's models were the ones people got familiar with in late 2022, hallucinations would be an underrepresented rumor, there would be no articles about it because it's so rare. overconfident lawyers wouldn't have messed up dockets in court with fake case law. in other domains that move faster, sources would be only partially outdated, with agentic search and mcp servers filling in the gaps

AI psychosis would be the problem people talk about more, not just outright agreement but subtle ways of making you feel confident in your ideas. "yes, buy that domain name buy these other ones for defensibility"

(the domain name is dumb and completely unmarketable)

28 minutes agoyieldcrv

The models still hallucinate badly when called via APIs, especially if web search is not enabled. Gemini hallucinates quite frequently even with the app and search enabled. More recent prompts/harnesses (e.g. ChatGPT 5.x and Deepseek v3) search very aggressively, which does greatly mitigate hallucinations.

11 minutes agojampekka

benchmarks look REALLY good, the price hike is big but it also beats sonnet 4.6 in every discipline?

an hour agomixtureoftakes
[deleted]
an hour ago

Just updated my HN Wrapped project with it and it does well on my totally unscientific LLM humor benchmark: https://hn-wrapped.kadoa.com

7 minutes agohubraumhugo

No one talking about how this Flash beats Pro? Imagine what 3.5 Pro looks like?

Also concerned about Gemini models being benchmaxxed generally

8 minutes agosimianwords
[deleted]
an hour ago

3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite: $1551 for 3.5 Flash [0] vs $892 for 3.1 Pro [1]. That's 74% more cost while ranking lower. It's 2.5x as fast, but I don't think the bang for the buck is there anymore like it was with 3.0 Flash. I'm a bit bummed out to be honest.

I did not expect such a huge (3x) price increase from 3.0 Flash, and I bet many people will not just blindly upgrade, as the value proposition is very different.

One interesting point to note is that Google marked the model as Stable in contrast to nearly everything else being perpetually set as Preview.

[0] https://artificialanalysis.ai/models/gemini-3-5-flash
[1] https://artificialanalysis.ai/models/gemini-3-1-pro-preview

35 minutes agoeis
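
The 74% figure follows directly from the two suite costs quoted in the comment above. A tiny sketch; the constant and function names are illustrative.

```python
# Artificial Analysis suite costs quoted in the comment above (USD).
COST_35_FLASH = 1551
COST_31_PRO = 892


def percent_more(new: float, old: float) -> float:
    """How much more `new` costs than `old`, in percent."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    print(f"{percent_more(COST_35_FLASH, COST_31_PRO):.0f}% more")  # 74% more
```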

Seems like the only good thing about 3.5 Flash is its speed. Not cost-competitive or benchmark-leading by any means.

21 minutes agoekojs

>3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite

That's everything I needed to know.

25 minutes agols_stats

It's Gemini 3.5 Flash

an hour agoalexdns

Yeah, Google chose a misleading title for the blog post.

41 minutes agonerdalytics

AI being a product is not the future. It's more like an operating system that deserves to be open and free (aka Linux). Unless that happens we are in for a very dystopian future. I wish I had the intelligence, resources and/or connections to try and make that happen.

32 minutes agonightski

Oh boy.

GDM is making (or has been backed into a corner into making) the bet that high throughput, low latency, low capability models are the path forward.

That probably works for vibe coded apps by non-practitioners.

I suspect that practitioners/professionals will wait longer for better results.

28 minutes agoHardCodedBias

Where do you see that it’s low capability?

And Google is trying to make something affordable enough for a mass market, ad-supported audience.

They aren’t hyper focused on enterprise like Anthropic is. And that’s okay. There’s room for different players in different markets.

17 minutes agobrokencode

Triple the price of the last Flash model ($3 -> $9 per 1M output). Quickly approaching Sonnet prices.

Feels like the AI pricing noose is tightening sooner rather than later.

39 minutes agobakugo

Add Flash to the title, please.

42 minutes agocesarvarela

edited it.

24 minutes agomeetpateltech

[dead]

an hour agomugivarra69

Pricing is now live on ai.google.dev/pricing:

Gemini 3.5 Flash: $0.75 input / $4.50 output per 1M tokens, 1M context window. The output price explicitly "includes thinking tokens" — which is why it's higher than a typical flash-class model.

For comparison within the Gemini lineup:
- Gemini 2.5 Flash: $0.30 / $2.50
- Gemini 3.1 Flash-Lite: $0.25 / $1.50
- Gemini 3.1 Pro Preview: $2.00 / $12.00

So 3.5 Flash is ~2.5x more expensive input vs 2.5 Flash. The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization.

41 minutes agobenbencodes

You’re quoting the batch pricing. On demand is $1.50 per 1M input and $9 per 1M output tokens. This is effectively comparable in cost to Gemini 2.5 Pro, in a flash-tier model.

32 minutes agolyjackal

Please delete/edit your AI-written and factually wrong post.

15 minutes agoTiberium

I think you have your pricing wrong there, Gemini 3.5 flash is $1.50 input and $9 output.

35 minutes agoconorh

Okay, it's kind of somewhere between haiku and sonnet level pricing, at somewhere between sonnet and opus level performance. It's a great option. I was hoping to see opus class intelligence at haiku level pricing out of google, and this is close to that!

28 minutes agomchusma

Never mind, after looking at more benchmarks, it seems closer to sonnet level intelligence at slightly lower cost. Speed is great for latency sensitive applications, but if this was 1/2 the cost it would have been priced to win.

If this is the big model release out of google, it's a disappointment.

18 minutes agomchusma

You are seeing batch inference, standard inference is $1.5/$9. I was excited until I saw that price.

29 minutes agols_stats

Standard pricing is showing for me as $1.50 / $9.

(I suspect you're viewing the "flex" pricing).