Yev from Backblaze here -> Y'all, this is one of our favorite things and I've shared it internally. So cool <3
OP here - thank you! I'd actually like to do more with your data sets in the future. It's so cool that you publish this.
Love that! I think I re-shared your tweet about it a few days ago! I've been kinda staring at it on a loop on my monitor, fun to watch the balls bouncing around :D
Why doesn't it show a pile of failed drives?
YES
And I'd want to see failed drives somehow organized by TimeInService and maybe origin...
We of course expect their drive usage to grow, but what would be surprising (& provide more info) is how the drives fail or age out. None of us without huge data centers can get that kind of info.
Exactly. I was hoping to see all-time drive failure data as well.
I'd like to do another vis that includes failed drives. Please keep helping me brainstorm.
Yeah, I think it would be a cooler visualization if the drives were in a line instead of a circle, with new drives added on the right and failed drives piling up at the bottom.
Still fun to watch as it is, though.
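One way to get at the "failed drives by TimeInService" idea from raw data is sketched below. It assumes daily records of (date, serial number, model, failure flag), roughly the shape of Backblaze's published daily snapshots, and it treats a drive's first appearance as its install date, so drives already running when the data starts will look younger than they are. The function name is hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date


def days_in_service_at_failure(
    records: Iterable[tuple[date, str, str, int]],
) -> dict[str, list[int]]:
    """Map each model to the ages (in days) at which its drives failed.

    records: (day, serial_number, model, failure) tuples, with failure == 1
    on the day a drive failed. A drive's first appearance stands in for its
    install date (an assumption), which under-counts drives that were
    already running when the data set begins.
    """
    first_seen: dict[str, date] = {}
    ages: dict[str, list[int]] = defaultdict(list)
    # Process in date order so first_seen really is the earliest day seen.
    for day, serial, model, failure in sorted(records):
        first_seen.setdefault(serial, day)
        if failure == 1:
            ages[model].append((day - first_seen[serial]).days)
    return dict(ages)
```

The per-model age lists could then be bucketed (for example by year of service) to decide where each failed drive lands in the pile.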
What exactly is this visualizing? Does each dot represent a chunk of data?
1 small node is 100 drives. So the small circles represent 100 drives each, I think. Not sure what they … do, though.
Maybe the number of drives they purchased?
I'm assuming it's the acquisition and removal of drives over time.
> 1 small node -> 100 drives
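The legend's "1 small node -> 100 drives" implies a conversion from drive counts to node counts. A minimal sketch, assuming counts are rounded up so a model with only a handful of drives still gets one node; the visualization's actual rounding rule isn't stated, and the names are hypothetical.

```python
import math

DRIVES_PER_NODE = 100  # from the legend: "1 small node -> 100 drives"


def nodes_for(drive_count: int) -> int:
    """Number of small nodes drawn for a model with drive_count drives.

    Rounding up is an assumption; the visualization may round differently.
    """
    if drive_count < 0:
        raise ValueError("drive_count must be non-negative")
    return math.ceil(drive_count / DRIVES_PER_NODE)
```

Since every node is its own element on the page, the total node count scales linearly with fleet size.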
Nice one, we can see the (logical) shift to bigger drives. One small comment, if I may: after 2020-2021 it gets really crowded with dots, and the sheer number of drives makes it hard to see the overall picture ;-)
That's good feedback. It definitely gets crowded, and sluggish due to the number of DOM nodes. I didn't spend much time optimizing performance.
There are a few places where all the dots seem to drop (guessing there are some discontinuities in the data?)
There are also a few places with duplicate labels (e.g. Hitachi 3TB).
Would be great to group by manufacturer somehow (e.g. color) and make the size more prominent.
Very cool visualization regardless.
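For the color-by-manufacturer idea, and for one possible cause of duplicate labels (several model numbers collapsing to the same "Hitachi 3TB" name), here is a sketch that derives a manufacturer from the model string and sums counts under a single label. The prefix table and function names are hypothetical; the real model strings in the data set may need more entries.

```python
from collections import Counter

# Hypothetical prefix table: model-string prefix -> manufacturer name.
PREFIXES = [
    ("HGST", "HGST"),
    ("Hitachi", "Hitachi"),
    ("WDC", "Western Digital"),
    ("TOSHIBA", "Toshiba"),
    ("ST", "Seagate"),
]


def manufacturer(model: str) -> str:
    """Guess the manufacturer from the start of a model string."""
    for prefix, name in PREFIXES:
        if model.upper().startswith(prefix.upper()):
            return name
    return "Unknown"


def merge_labels(counts: dict[tuple[str, int], int]) -> dict[str, int]:
    """Collapse {(model, capacity_tb): drive_count} into {label: drive_count}.

    Different model numbers that share a label like "Hitachi 3TB" are summed
    under one key, so the label is drawn only once.
    """
    merged: Counter[str] = Counter()
    for (model, capacity_tb), n in counts.items():
        merged[f"{manufacturer(model)} {capacity_tb}TB"] += n
    return dict(merged)
```

The manufacturer string could also double as the key into a color palette.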
I took that to be a massive decommissioning of drives.
Yeah, when drives fall off the bottom that represents decommissioning.
No, it's an obvious data error. The same drives reappear in future months.
Pause and look at 2013-08-01 through 2013-11-01, or 2015-09-01 through 2016-01-01 to confirm this.
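A dropout like the ones at 2013-08-01 through 2013-11-01 can be told apart from real decommissioning with a simple check: a model that goes missing and then comes back points to a data gap, while one that never returns was retired. A minimal sketch, assuming monthly per-model drive counts keyed by ISO date strings; the input shape and function name are hypothetical.

```python
def find_gaps(monthly_counts: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """Return {model: [months]} for months where a model vanishes but reappears later.

    monthly_counts: {"YYYY-MM-DD": {model: drive_count}}. ISO date strings
    sort chronologically, so plain sorting gives month order. A model that
    disappears and never comes back (real decommissioning) is not flagged.
    """
    months = sorted(monthly_counts)
    models = {m for counts in monthly_counts.values() for m in counts}
    gaps: dict[str, list[str]] = {}
    for model in models:
        present = [monthly_counts[mo].get(model, 0) > 0 for mo in months]
        if not any(present):
            continue
        first = present.index(True)
        last = len(present) - 1 - present[::-1].index(True)
        # Only months strictly between first and last sighting can be gaps.
        missing = [months[i] for i in range(first, last + 1) if not present[i]]
        if missing:
            gaps[model] = missing
    return gaps
```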
The data takes a while to load – a dozen megabytes of data or so. After a while the visualization loaded.
(It loaded about 0.3MB/s for me)
It probably wouldn't be that difficult to order the nodes down the y axis by drive capacity. Also, all the pointless jostling of small nodes makes it noticeably bog down as the years go by.
Interesting visualization though.
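Ordering the y axis by capacity could look roughly like this: each distinct capacity gets its own horizontal band, largest at the top, and the band centers could serve as per-node targets for a y-positioning force (d3.forceY accepts a per-node accessor). The function below is a hypothetical sketch of the band calculation; in the browser, d3.scaleBand covers similar ground.

```python
def capacity_bands(capacities_tb: list[float], height: float) -> dict[float, float]:
    """Return the vertical center of each distinct capacity's band.

    The largest capacity sits at the top (y = 0 is the top edge in SVG).
    """
    levels = sorted(set(capacities_tb), reverse=True)
    if not levels:
        return {}
    band = height / len(levels)
    return {cap: band * (i + 0.5) for i, cap in enumerate(levels)}
```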
Fun to look at! I also had a mini project that utilized this data. Sadly, I haven't maintained it in a while. It's a Show HN on my profile if you're curious.
I hope you had a better time with ingesting the data than I did :)
I actually built this primarily with ChatGPT o1.
One of the things LLMs are really good at is writing scripts for processing and paring down data. I wanna do a blog post about how I did some of this, maybe coming up!
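As an example of the kind of paring-down script mentioned here, the sketch below reduces one-row-per-drive-per-day CSVs to first-of-month counts per model, which is far smaller to ship to a browser. It assumes CSVs with "date" (YYYY-MM-DD) and "model" columns, as in Backblaze's published daily files; the monthly-snapshot approach and the function names are assumptions, not how this visualization was actually built.

```python
import csv
import json
from collections import Counter, defaultdict
from collections.abc import Iterable, Iterator


def count_first_of_month(rows: Iterable[dict[str, str]]) -> dict[str, dict[str, int]]:
    """Count drives per model, keeping only rows dated on the 1st of a month."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        # For YYYY-MM-DD dates, only day 01 ends with "-01".
        if row["date"].endswith("-01"):
            counts[row["date"]][row["model"].strip()] += 1
    return {day: dict(c) for day, c in sorted(counts.items())}


def read_rows(csv_paths: Iterable[str]) -> Iterator[dict[str, str]]:
    """Stream rows from each daily CSV without loading whole files into memory."""
    for path in csv_paths:
        with open(path, newline="") as f:
            yield from csv.DictReader(f)


def write_snapshot(csv_paths: Iterable[str], out_path: str) -> None:
    """Write the monthly snapshot as compact JSON."""
    snapshot = count_first_of_month(read_rows(csv_paths))
    with open(out_path, "w") as f:
        # Compact separators trim the JSON the browser has to download.
        json.dump(snapshot, f, separators=(",", ":"))
```

Usage might look like `write_snapshot(sorted(glob.glob("data/*.csv")), "drives.json")`, where the paths are placeholders.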
As a data guy, I'm not sure how useful this chart is.
But it sure is fun to look at. I enjoyed it ;-)
Not sure why it would intermittently redraw the whole scene, though. Could be a Chrome thing.
It's empty in Firefox Android. Play button, slider and nothing else.
[OP] Interesting! I just tested it on Android/FF and it works. Could be a version-specific thing. Could also just be taking a while to load (it has to download a 32MB JSON file).
I had to allow d3.js in NoScript for Fennec. Had the same intermittent redraw of the entire screen mentioned above.
Doesn’t appear to work on mobile Safari
Or Firefox and Chromium on Ubuntu desktop
Works fine for me on Firefox Linux. Interestingly took a lot longer to load in Chromium and Brave but they all work.
Or Chrome and Brave on Linux?
Doesn't appear to work in Firefox or Chrome on Android either.
Works fine for me in Firefox Android.
It does, it just takes some time to download all the data
It works on my iPhone with Safari
needs the ball sizes to represent the storage capacity!
That’s not true. Don’t shame me like that
Great suggestion.
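If ball sizes encoded capacity, the usual approach is to scale circle area rather than radius; otherwise large drives look far bigger than they are. A minimal sketch, where the function name and default maximum radius are hypothetical; d3.scaleSqrt does the same job in the browser.

```python
import math


def radius_for(capacity_tb: float, max_capacity_tb: float, max_radius: float = 20.0) -> float:
    """Radius that makes circle *area* proportional to capacity.

    Area grows with radius squared, so radius scales with sqrt(capacity):
    a 16 TB ball gets four times the area of a 4 TB ball, not sixteen times.
    """
    return max_radius * math.sqrt(capacity_tb / max_capacity_tb)
```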
I personally thought that SSD density and prices would have crossed HDDs' by now, and that we would start seeing SSDs in these stats from Backblaze, but HDD manufacturers seem to have stayed ahead of them so far.
Density wins but not cost. Maybe in another 5-10 years we'll get to parity.
A pile of microSD cards has been more dense than a hard drive for 20 years, basically as soon as the format existed and stabilized. But at that point you were paying 100x as much.
I hope we reach parity. Prices have gone up since 2023, and right now flash is about 3x as expensive as hard drives.
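The parity question reduces to compound-rate arithmetic. If flash costs R times as much per TB today and its price falls by a fraction f per year while HDD prices fall by h, the ratio after t years is R((1-f)/(1-h))^t, which reaches 1 at t = ln R / ln((1-h)/(1-f)). The annual rates are inputs you have to assume; this hypothetical helper only does the arithmetic, starting from a ratio such as the roughly 3x mentioned above.

```python
import math


def years_to_parity(price_ratio: float, flash_decline: float, hdd_decline: float) -> float:
    """Years until flash $/TB matches HDD $/TB under constant annual declines.

    price_ratio: flash $/TB divided by HDD $/TB today.
    flash_decline, hdd_decline: assumed fractional price drops per year (0 <= x < 1).
    """
    if price_ratio <= 1:
        return 0.0  # already at or past parity
    if flash_decline <= hdd_decline:
        return math.inf  # the gap never closes
    return math.log(price_ratio) / math.log((1 - hdd_decline) / (1 - flash_decline))
```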
It would be nice if it was just a conventional graph...
Data from https://www.backblaze.com/cloud-storage/resources/hard-drive... if you want to make one.
[OP] Backblaze writes about these data sets, and includes some more conventional graphs. For example:
https://www.backblaze.com/blog/backblaze-drive-stats-for-202...
I wanted to do something more fun!