
Show HN: a small API layer for real-time AI streaming, retries, and debugging

While building AI features that rely on real-time streaming responses, I kept running into failures that were hard to reason about once things went async.

Requests would partially stream, providers would throttle or fail mid-stream, and retry logic ended up scattered across background jobs, webhooks, and request handlers.

I built ModelRiver as a thin API layer that sits between an app and its AI providers and centralizes streaming, retries, failover, and request-level debugging.
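Concretely, the shape is: the app makes one streaming request to the proxy and reads a single stream back, while retries and provider failover happen behind it. A rough TypeScript sketch against Node's fetch (the URL and request fields here are illustrative, not the real API):

    // Illustrative only: endpoint and field names are made up for this sketch.
    const res = await fetch("https://proxy.example.com/v1/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-4o",                  // primary model the proxy routes to
        fallbacks: ["claude-3-5-sonnet"], // tried if the primary fails mid-stream
        stream: true,
        messages: [{ role: "user", content: "Hello" }],
      }),
    });

    // The app consumes one continuous stream; provider throttling and
    // failover are absorbed before chunks reach this loop.
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      process.stdout.write(decoder.decode(value, { stream: true }));
    }

Request-level debug info (attempt counts, which provider actually served the stream) then lives at that one hop instead of scattered across app logs.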

It’s early and opinionated, and there are tradeoffs. Happy to answer technical questions or hear how others are handling streaming reliability in production AI apps.

akarshc, 4 hours ago

At what point does adding this layer become more complex than just handling streaming failures directly in the app?

amalv, 3 hours ago

If streaming behavior is still product-specific and changing fast, this adds friction. It only pays off once failure handling stabilizes and starts repeating across the system.

akarshc, 3 hours ago

Why not just handle this in the application with queues and background jobs?

arxgo, 4 hours ago

Queues work well before or after a request, but they’re awkward once a response is already streaming. This layer exists mainly to handle failures during a stream without spreading that logic across handlers, workers, and client code.
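
To make that concrete, here's roughly the kind of logic that otherwise gets duplicated: retry a stream that dies partway through and hide the restart from the consumer. This is an illustrative TypeScript sketch, not the actual implementation, and it leans on an assumption (that a retry replays the same prefix) that real models don't guarantee:

    // Sketch of mid-stream retry with deduplication. `provider` stands in
    // for any raw streaming call that yields text chunks and may throw
    // partway through (throttling, dropped connection, etc.).
    async function* resilientStream(
      prompt: string,
      provider: (p: string) => AsyncIterable<string>,
      maxRetries = 3,
    ): AsyncGenerator<string> {
      let sent = 0; // characters already forwarded to the caller
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        let seen = 0; // characters produced by this attempt so far
        try {
          for await (const chunk of provider(prompt)) {
            seen += chunk.length;
            if (seen > sent) {
              // Forward only the tail the caller hasn't received yet.
              yield chunk.slice(chunk.length - (seen - sent));
              sent = seen;
            }
          }
          return; // stream completed cleanly
        } catch (err) {
          if (attempt === maxRetries) throw err; // retries exhausted
          // Otherwise retry from the top; the dedupe above hides the
          // restart. Caveat: this assumes a retry replays the same prefix,
          // which real models don't do without provider-side support.
        }
      }
    }

That replay caveat is most of why this is painful to get right in app code, and why pulling it into one layer made sense for us.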

akarshc, 4 hours ago
