Per million input/output tokens:
Gemini 2.5 flash: $0.30/$2.50
Gemini 3.0 flash preview: $0.50/$3.00
Gemini 3.5 flash preview: $1.50/$9.00
Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate next same-sized model.
3.5 flash costs similar to Gemini 2.5 pro which was $1.25/$10
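The jump between Flash generations can be checked with a few lines of arithmetic. This is a minimal sketch using only the per-million-token prices quoted above; the dictionary and function names are illustrative and not part of any Google API.

```python
# Per-million-token list prices (USD) as quoted in this thread: (input, output).
PRICES = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.0 Flash": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def price_ratio(newer: str, older: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of newer vs. older list prices."""
    new_in, new_out = PRICES[newer]
    old_in, old_out = PRICES[older]
    return new_in / old_in, new_out / old_out


if __name__ == "__main__":
    for older in ("Gemini 3.0 Flash", "Gemini 2.5 Flash", "Gemini 2.5 Pro"):
        in_ratio, out_ratio = price_ratio("Gemini 3.5 Flash", older)
        print(f"3.5 Flash vs {older}: {in_ratio:.2f}x input, {out_ratio:.2f}x output")
```

List-price ratios say nothing about how many tokens each model spends per task, so they are only one half of the cost picture.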
https://gistpreview.github.io/?5c9858fd2057e678b55d563d9bff0...
3.5 Flash: Thinking High - 7280 tokens
https://gistpreview.github.io/?1cab3d70064349d08cf5952cdc165...
3.1 Pro - 28,258 tokens
https://gistpreview.github.io/?6bf3da2f80487608b9525bce53018...
3.1 took 3 minutes of thinking to generate, but it is the only one that got animated movement.
Gemini 3.1 Flash Lite Thinking High - 2,526 tokens:
https://gistpreview.github.io/?3496285c5dac5ba10ebbc0b201a1a...
Gemini 2.5 Pro - 5,325 tokens:
https://gistpreview.github.io/?cc5e0fefeaaffecd228c16c95e736...
Gemini 2.5 Flash - 7,556 tokens:
https://gistpreview.github.io/?263d6058fe526a62b8f270f0620ec...
All three links animate for me.
I think they mean the boat is moving. In the flash ones the paddles are animated but the boat is stationary for me.
The boat moves in all three for me
The boat itself rocks, but do you see the background changing to indicate the boat is progressing through the environment? I only see that in the 3.1 Pro example. I believe that's what the OP meant.
I think this illustrates the problem with OP's prompt. If the goal is specifically to implement a scrolling background, this should have been in the prompt.
hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF @ Q6_K
8112 tokens @ 52.97 TPS, 0.85s TTFT
https://gistpreview.github.io/?7bdefff99aca89d1bc12405323bd4...
Full session: https://gist.github.com/abtinf/7bdefff99aca89d1bc12405323bd4...
Generated with LM Studio on a Macbook Pro M2 Max
https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6...
Can you try with a more complex story such as "three little pigs"? I tried but it created a storybook instead of the SVG animation. I am looking to partially imitate Godogen [1][2] which is really great, even for animations.
[1] https://github.com/htdt/godogen
[2] https://drive.google.com/file/d/1ozZmWcSwieZQG0muYjbj7Xjhhlz...
Your links are broken FYI.
They work for me.
They do work here too.
Still no new processor version for Document AI https://docs.cloud.google.com/document-ai/docs/release-notes, which is so weird. (Customer extractor)
It’s not possible to uptrain on preview releases and it did not get that much love for a while.
Beats 3.1 Pro on price per token, but Artificial Analysis is showing it's dumber per token and costs more overall.
The pricing is an issue.
Yeah, bummer. I was very excited for this release, but this killed it.
Engineers at Google have publicly stated that the models are too big and are far from their potential. Glad they're being proven right with every release.
They continue to focus on smaller models while openai and anthropic are increasing compute requirements for their SOTA models.
Given the cost increase associated with this model, and previous model releases, I think the size is trending upwards, not down.
The speed says otherwise. I think they're increasing costs since they want to start seeing ROI.
Those are (mostly) new, faster TPU
The latest TPUs appear to reach 800 tok/s rather than the advertised 300 tok/s.
Don’t let that fool you. Google will have SOTA models as big as or even bigger than their competitors.
They are just refining their current models while they finish training the next generation.
They will all come out at about the same time: Anthropic, OpenAI, Google, xAI.
Anthropic has been sitting on Mythos for a while now. I guess they don't feel pressured to fuck it ship it until anyone else gets a 10T to work.
According to people who have access to Mythos, it is slightly worse than GPT-5.5-xhigh. At least for security tasks.
I suspect that Mythos doesn't have a business model that works
It's doubtful they have the compute to make mythos publicly available even after the SpaceX datacenter deal. And why sell it publicly if people are still willing to pay for Opus 4.7?
Google’s pro models are almost certainly bigger than Openai’s lol
GPT-5.5 still seems to perform better than this on the benchmarks.
Plus, the vibe of the Gemini models is so weird, particularly when it comes to tool calling.
At this point I kinda need them to shock me to make the switch
$1.5/M input tokens
$9/M output tokens
6x the price of 3.1 flash lite
I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models are by far more token-light per task than any other model.
Cost per task is a more productive measure, but obviously a more difficult one to benchmark.
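Cost per task is just tokens used multiplied by the per-token price, so a model with a higher list price can still come out cheaper if it needs fewer tokens. A rough sketch, using the token counts reported for the SVG test earlier in this thread (7,280 for 3.5 Flash, 28,258 for 3.1 Pro) and the $9 and $12 per-million output prices quoted by commenters. It assumes those counts are output tokens, ignores input and cached tokens, and a single prompt is not a benchmark.

```python
def output_cost_usd(output_tokens: int, price_per_million: float) -> float:
    """Cost of the output tokens alone, in USD."""
    return output_tokens * price_per_million / 1_000_000


# Token counts from the SVG test in this thread; prices as quoted by commenters.
flash_cost = output_cost_usd(7_280, 9.00)    # Gemini 3.5 Flash, Thinking High
pro_cost = output_cost_usd(28_258, 12.00)    # Gemini 3.1 Pro

print(f"3.5 Flash: ${flash_cost:.4f}")
print(f"3.1 Pro:   ${pro_cost:.4f}")
print(f"Pro / Flash cost ratio: {pro_cost / flash_cost:.1f}x")
```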
I wonder why they didn't discuss price in the post?
Compare to the GPT-5.5 announcement: https://openai.com/index/introducing-gpt-5-5/
I don't think input/output pricing matters, 90% of the cost is cache. $0.15 is pretty good, but still very expensive.
It depends on the use case. Yes, 90% of the cost is cache in agentic coding scenarios (actually 95% in my experience), but not when the model reasons for 200k+ tokens before answering a complex problem.
Gemini models solve a problem in 80% fewer tokens, so that's something to think about.
In our experience, caching is not very reliable with Google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also what kind of cache hit rate you get.
10% of input pricing is standard, especially compared to the competition.
Yeah, which means that in the end the only values worth paying attention to are the input cost and the cache discount (x10). If Google started offering a x20 discount, it would make it twice as cheap while input and output prices stayed the same.
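To make the cache argument concrete: the effective input price is a blend of the cached and uncached rates, weighted by the cache hit rate. A minimal sketch, assuming the $1.50 input price and a cached rate of 10% of that ($0.15) as discussed above; the hit rates are hypothetical, chosen only to show why both the discount and the hit rate matter.

```python
def blended_input_price(input_price: float, cache_discount: float, hit_rate: float) -> float:
    """Effective USD per 1M input tokens.

    cache_discount: cached tokens cost input_price / cache_discount (10 means 10% of list).
    hit_rate: fraction of input tokens served from cache, between 0 and 1.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_price = input_price / cache_discount
    return hit_rate * cached_price + (1.0 - hit_rate) * input_price


for discount in (10, 20):
    for hit_rate in (0.95, 0.80):  # hypothetical hit rates
        price = blended_input_price(1.50, discount, hit_rate)
        print(f"x{discount} discount, {hit_rate:.0%} hits: ${price:.4f} per 1M input tokens")
```

Under these assumptions, doubling the discount from x10 to x20 cuts the blended price by about a third at a 95% hit rate rather than by half, because the uncached 5% still pays full price. Dropping the hit rate from 95% to 80% at the x10 discount nearly doubles it.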
Output cost is 3x from Gemini 3 flash.
Yikes. I think the concept of a 'flash' model is changing, no? Google used to market this as its lower-intelligence, faster, cheaper option. I appreciate that they are delivering on both of those, but personally I would appreciate if they could create an incremental knowledge improvement while holding price steady. Fortune 500 companies have to make their money I guess.
The Artificial Analysis benchmark results are pretty underwhelming. Roughly the same "intelligence" as MiMo-V2.5-Pro for over 3x the cost. We'll have to see how that translates to actual usage but it's not a great sign.
Is there a good benchmark tracking hallucinations? The models are all incredibly good now, even the open ones, and my hope is that the rate of hallucinations is something that's falling off in concert with larger and larger context lengths.
People complain about them incessantly, but I can almost never get people to actually post receipts. Every provider allows sharing chats, and anyone can share a prompt that reliably produces hallucinations.
More often than not, people are using images in responses that go awry. Which is fair, the models are sold as multi-modal, but image analysis is still at GPT-4.0 text-analysis levels.
I see hallucinations ALL the time. It's only obvious when you're prompting about a subject you know well.
And when I say all the time, I mean it, and this is for Opus 4.7 Adaptive.
I often have to say "please do searches and cite sources", because if it doesn't, it will confidently give me wrong or outdated information.
If you're often asking questions about a topic that's not in your specialist knowledge you won't notice them.
well there is https://artificialanalysis.ai/evaluations/omniscience
As long as the model uses web search, it almost never hallucinates anymore. The fast models (haiku, gpt-instant, flash) still sometimes have the problem where they don't search before answering, so they can hallucinate.
I haven't been bothered by hallucinations in premier models since early last year. Still see it in smaller local models though.
I'm really running into this deep at the edges of content creation. Take, for example, a need to generate some kind of legal work. The cost of painstakingly checking and rechecking each case cited is reducing the value of these frontier models immensely.
Coding, however, is solved like magic. Easier to add tests, to be fair.
maybe something like this? https://petergpt.github.io/bullshit-benchmark/viewer/index.v...
If last year's models were the ones people got familiar with in late 2022, hallucinations would be an underrepresented rumor; there would be no articles about them because they're so rare. Overconfident lawyers wouldn't have messed up court dockets with fake case law. In other domains that move faster, sources would be only partially outdated, with agentic search and MCP servers filling in the gaps.
AI psychosis would be the problem people talk about more, not just outright agreement but subtle ways of making you feel confident in your ideas. "yes, buy that domain name buy these other ones for defensibility"
(the domain name is dumb and completely unmarketable)
The models still hallucinate badly when called via APIs, especially if web search is not enabled. Gemini hallucinates quite frequently even with the app and search enabled. More recent prompts/harnesses (e.g. ChatGPT 5.x and Deepseek v3) search very aggressively, which does greatly mitigate hallucinations.
Here's the benchmark scoreboard they published:
https://storage.googleapis.com/gweb-uniblog-publish-prod/ori...
benchmarks look REALLY good, the price hike is big but it also beats sonnet 4.6 in every discipline?
Just updated my HN Wrapped project with it and it does well on my totally unscientific LLM humor benchmark: https://hn-wrapped.kadoa.com
No one is talking about how this Flash beats Pro? Imagine what 3.5 Pro looks like.
Also concerned about Gemini models being benchmaxxed generally
3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite. $1551 for 3.5 Flash [0] vs $892 for 3.1 Pro [1]. That's 74% more cost while ranking lower. It's 2.5x as fast but I don't think the bang for the buck is there anymore like it was with 3.0 Flash. I'm a bit bummed out to be honest.
I did not expect such a huge (3x) price increase from 3.0 Flash and I bet many people will not just blindly upgrade as the value proposition is wildly different.
One interesting point to note is that Google marked the model as Stable in contrast to nearly everything else being perpetually set as Preview.
[0] https://artificialanalysis.ai/models/gemini-3-5-flash
[1] https://artificialanalysis.ai/models/gemini-3-1-pro-preview
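The 74% figure follows directly from the two totals quoted above. A quick check, assuming $1551 and $892 are the full costs of running the suite on each model:

```python
def percent_more(cost_a: float, cost_b: float) -> float:
    """How much more cost_a is than cost_b, as a percentage."""
    return (cost_a / cost_b - 1.0) * 100.0


extra = percent_more(1551, 892)
print(f"3.5 Flash cost {extra:.0f}% more than 3.1 Pro to run the suite")
```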
Seems like the only good thing about 3.5 Flash is its speed. Not cost-competitive or benchmark-leading by any means.
>3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite
That's everything I needed to know.
That's what I came here to check. Last model release they only put it into preview[0] at first.
Does that mean this model is production ready?
[0] https://news.ycombinator.com/item?id=47076484
It's Gemini 3.5 Flash
Yeah, Google chose a misleading title for the blog post.
AI being a product is not the future. It's more like an operating system that deserves to be open and free (aka Linux). Unless that happens we are in for a very dystopian future. I wish I had the intelligence, resources and/or connections to try and make that happen.
Oh boy.
GDM is making (or has been backed into a corner into making) the bet that high throughput, low latency, low capability models are the path forward.
That probably works for vibe coded apps by non-practitioners.
I suspect that practitioners/professionals will wait longer for better results.
Where do you see that it’s low capability?
And Google is trying to make something affordable enough for a mass market, ad-supported audience.
They aren’t hyper focused on enterprise like Anthropic is. And that’s okay. There’s room for different players in different markets.
Triple the price of the last Flash model ($3 -> $9 per 1M output). Quickly approaching Sonnet prices.
Feels like the AI pricing noose is tightening sooner rather than later.
Add Flash to the title, please.
edited it.
Pricing is now live on ai.google.dev/pricing:
Gemini 3.5 Flash: $0.75 input / $4.50 output per 1M tokens, 1M context window. The output price explicitly "includes thinking tokens" — which is why it's higher than a typical flash-class model.
For comparison within the Gemini lineup:
- Gemini 2.5 Flash: $0.30 / $2.50
- Gemini 3.1 Flash-Lite: $0.25 / $1.50
- Gemini 3.1 Pro Preview: $2.00 / $12.00
So 3.5 Flash is ~2.5x more expensive input vs 2.5 Flash. The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization.
You’re quoting the batch pricing. On-demand is $1.50 per M input and $9 per M output. This is effectively comparable in cost to Gemini 2.5 Pro, in a Flash-tier model.
Please delete/edit your AI-written and factually wrong post.
I think you have your pricing wrong there, Gemini 3.5 flash is $1.50 input and $9 output.
Okay, it's kind of somewhere between haiku and sonnet level pricing, at somewhere between sonnet and opus level performance. It's a great option. I was hoping to see opus class intelligence at haiku level pricing out of Google, and this is close to that!
Never mind, after looking at more benchmarks, it seems closer to sonnet level intelligence at slightly lower cost. Speed is great for latency-sensitive applications, but if this was 1/2 the cost it would have been priced to win.
If this is the big model release out of Google, it's a disappointment.
You are seeing batch inference; standard inference is $1.5/$9.
I was excited until I saw that price.
Standard pricing is showing for me as $1.50 / $9.
(I suspect you're viewing the "flex" pricing).