Not clear if the author realises that all commercial SQL database engines support querying of the database's metadata using SQL. Or maybe I have misunderstood - I only skimmed the article.
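For anyone who hasn't tried it, here is a minimal sketch of what querying metadata with plain SQL looks like. It uses SQLite through Python's standard `sqlite3` module only because that runs anywhere. The `orders` table is made up for illustration, and other engines expose the same kind of information through `information_schema` or their own catalog views rather than `sqlite_master`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created only so there is something to inspect.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL)"
)

# sqlite_master is SQLite's catalog table; it is queried like any other table.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# pragma_table_info() is a table-valued function that exposes column metadata.
columns = conn.execute(
    "SELECT name, type FROM pragma_table_info('orders') ORDER BY cid"
).fetchall()

print(tables)
print(columns)
```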
Yeah, this seemed like a very long way to say, "Our RDBMS has system catalogs," as if it's 1987.
But then, they're also doing JOINs with the USING clause, which seems like one of those things that everybody tries... until they hit one of the several reasons not to use it, and then they go back to the ON clause, which is explicit and concrete and works great in all cases.
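One commonly cited difference between the two, which may or may not be what the commenter had in mind: USING requires the join column to have the same name on both sides and merges it into a single output column, while ON keeps both copies. A small sketch with SQLite via Python's `sqlite3`, using made-up `customers` and `orders` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables for illustration only.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0);
""")

def column_names(sql):
    """Return the result-set column names for a query."""
    return [d[0] for d in conn.execute(sql).description]

# USING merges the shared join column into one output column.
using_cols = column_names(
    "SELECT * FROM customers JOIN orders USING (customer_id)"
)

# ON keeps the join column from both tables in SELECT *.
on_cols = column_names(
    "SELECT * FROM customers c JOIN orders o ON c.customer_id = o.customer_id"
)

print(len(using_cols), using_cols)
print(len(on_cols), on_cols)
```

With ON, the join condition can also be any expression and the column names need not match, which is presumably part of why people describe it as working in all cases.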
Personally, I'd like to hear more about the claims made about Snowflake IDs.
Isn't this like it is in many relational databases, where you can query them about the tables they contain?
i think the key difference is making that metadata first-class and queryable across the whole system (lineage, stats, access patterns), not just information_schema / catalog tables. most rdbms expose schema metadata, but not things like which queries produced which rows, freshness, or cost/latency signals unless you bolt it on with tracing. curious if floe is treating metadata as data (versioned, joinable) or as observability sidecars?
That's quite expensive. Most systems that need this sort of data will instead implement some form of audit log or audit table. Which is still quite expensive.
At the record level, I've seldom seen more than an add timestamp, an add user id, a last change timestamp, and a last change user id. Even then, it covers any change to the whole row, not every field. It's still relatively expensive.
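To make the four-column pattern concrete, here is a rough sketch using SQLite via Python's `sqlite3`. The `account` table, its column names, and the `add_account` / `change_account` helpers are all hypothetical. Timestamps are passed in as strings to keep the example deterministic; a real system would more likely use the database clock or a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id         INTEGER PRIMARY KEY,
        email      TEXT,
        phone      TEXT,
        added_at   TEXT NOT NULL,
        added_by   TEXT NOT NULL,
        changed_at TEXT NOT NULL,
        changed_by TEXT NOT NULL
    )
""")

EDITABLE = {"email", "phone"}  # whitelist, since column names go into the SQL text

def add_account(email, phone, user, now):
    """Insert a row; the add and last-change stamps start out identical."""
    cur = conn.execute(
        "INSERT INTO account (email, phone, added_at, added_by, changed_at, changed_by) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (email, phone, now, user, now, user),
    )
    return cur.lastrowid

def change_account(account_id, user, now, **fields):
    """Update editable columns and overwrite the row-level change stamp."""
    if not fields or set(fields) - EDITABLE:
        raise ValueError("expected one or more of: " + ", ".join(sorted(EDITABLE)))
    assignments = ", ".join(f"{col} = ?" for col in fields)
    conn.execute(
        f"UPDATE account SET {assignments}, changed_at = ?, changed_by = ? WHERE id = ?",
        (*fields.values(), now, user, account_id),
    )

account_id = add_account("a@example.com", "555-0100", "alice", "2024-01-01T09:00:00")
change_account(account_id, "bob", "2024-02-01T17:30:00", phone="555-0199")

row = conn.execute(
    "SELECT added_by, added_at, changed_by, changed_at FROM account WHERE id = ?",
    (account_id,),
).fetchone()
print(row)
```

After the update, the row says that bob changed something and when, but nothing records that it was `phone` rather than `email`, and any earlier change stamp is overwritten. Field-level history needs a separate audit table.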
> which queries produced which rows
I doubt many real-world applications could tolerate the amount of data/performance degradation this implies. If you need this (and I can't think why you would), then I think writing your own logging code is the answer, rather than lumbering everyone else with it.