Wow, Home Assistant should try something along these lines. Home Assistant’s current handling of time series data is comically poor.
Another decent option might be ClickHouse. Sadly, as far as I know, DuckDB has no real understanding of sorted or ordered data, so it might be challenging to avoid absurd amounts of read amplification.
I have several years' worth of timestamped sensor data in a SQLite DB that is 12+ GB and growing. One idea was to experiment with DuckDB. Your comment about DuckDB has me worried, as time-ranged queries are common. Any link for me to dig deeper?
How does this differ from a log file?
I've been thinking about this question for a while. It's confusing at first blush because it's an append-only database and it has a WAL, and it feels like a WAL is already an append-only database, so what's even happening?
Looking back at the project now, I think the value comes from querying it, and especially from automatic aggregations.
This is a reasonable use case that can't immediately be resolved by just logging to a file. It creates an aggregation profile, so a sensor could log temperature every minute and the database will automatically average temperature by the hour. That's a straightforward and meaningful use case.
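To make the rollup idea concrete, here is a minimal sketch of averaging per-minute readings into hourly buckets. It is not the project's actual implementation; the function name `hourly_average` and the `(unix_seconds, value)` tuple shape are assumptions made up for illustration.

```python
from collections import defaultdict


def hourly_average(readings):
    """Average (unix_seconds, value) readings into hourly buckets.

    Hypothetical helper for illustration: returns a dict mapping the
    start of each hour (unix seconds) to the mean of that hour's values.
    """
    sums = defaultdict(lambda: [0.0, 0])  # hour_start -> [running total, count]
    for ts, value in readings:
        hour_start = ts - (ts % 3600)  # truncate the timestamp to the hour
        bucket = sums[hour_start]
        bucket[0] += value
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(sums.items())}


# Two readings in the first hour, three in the second.
readings = [(0, 20.0), (60, 22.0), (3600, 30.0), (3660, 32.0), (3720, 34.0)]
rollup = hourly_average(readings)
```

A database that does this at write time only has to keep a running total and a count per bucket, which is presumably why it suits small devices better than re-scanning a log file on every query.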
There's also some query support, but that may be closer to something you can sort of do if you just have a log file.
I think the aggregations are the most direct value proposition. OP/author: worth making this pitch "above the fold" in the README, imho.
Also, I've done a lot of analytics work, and a fun feature to add that I've built in the past is an approximate median. I might open a PR and remind myself how to build that. Cheers!
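For anyone curious what an approximate median can look like, here is one possible sketch using a fixed-width histogram. This is only one of several techniques and not necessarily the one I built before; the class name `HistogramMedian` and the `bin_width` parameter are hypothetical.

```python
import math
from collections import Counter


class HistogramMedian:
    """Approximate median using fixed-width buckets (illustrative sketch).

    Memory grows with the number of distinct buckets, not the number of
    values. The estimate is the midpoint of the bucket that holds the
    lower median, so it is off by at most half a bin width from it.
    """

    def __init__(self, bin_width=0.5):
        self.bin_width = bin_width
        self.counts = Counter()  # bucket index -> number of values seen
        self.total = 0

    def add(self, value):
        self.counts[math.floor(value / self.bin_width)] += 1
        self.total += 1

    def median(self):
        if self.total == 0:
            raise ValueError("no values recorded")
        target = (self.total + 1) // 2  # rank of the lower median
        seen = 0
        for bin_index in sorted(self.counts):
            seen += self.counts[bin_index]
            if seen >= target:
                return (bin_index + 0.5) * self.bin_width


est = HistogramMedian(bin_width=1.0)
for v in range(1, 102):  # 1..101, true median is 51
    est.add(v)
approx = est.median()
```

The trade-off is the usual one: a smaller `bin_width` gives a tighter estimate but more buckets to store.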
Seems the IoT / embedded device constraint is what is driving the query feature. You don't have to scan the whole file, and depending on where it is running, the rollup functionality could be a big help.
You can query it? And it may be faster?
the history of every append-only database:
* we will make it append only, the type of data makes sense for it and it will simplify the design
* whoops, devs fucked something up and added a bunch of nonsense that has to be removed, let's figure out how to make at least occasional deletes work
these match my experiences living with these in production.