Anyone else see the irony?
> Your hex editor should colour-code bytes so it is easier for users to distinguish patterns
> Article is fully in lowercase, which makes it harder for readers to make out sentences and the flow of the article
> mfw
The "010 Editor" (commercial) can apply templates to color code actual semantic meaning. Very useful. A basic C struct can be enough, but it's powerful enough to handle quite complex formats. Besides colors this also gives you semantic navigation and folding etc.
For anyone who regularly has to look at/analyze binary files, I highly recommend ImHex [1].
It's a hex editor built with imgui and has a lot of built-in tools. IMO the best feature is the data structure editor. You can write a data type definition similar to C and it overlays it on the hexdump and parses it in a structured way while you type.
It also has a node based editor.
1: https://github.com/WerWolv/ImHex
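To make the "C-like definition overlaid on the hexdump" idea concrete, here is a minimal Python sketch using the standard `struct` module. It is not ImHex's pattern language or 010 Editor's template format; the `Header` layout, the field names and the `overlay` helper are all made up for illustration.

```python
import struct
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    offset: int    # byte offset of the field within the buffer
    size: int      # field length in bytes
    value: object  # decoded value


# Hypothetical layout, roughly what a C-like template might declare:
#   struct Header { char magic[4]; uint16_t version; uint32_t length; };
LAYOUT = [("magic", "4s"), ("version", "H"), ("length", "I")]


def overlay(data: bytes, layout=LAYOUT, byte_order: str = "<") -> list:
    """Decode each field and remember which bytes it covers."""
    fields, offset = [], 0
    for name, fmt in layout:
        size = struct.calcsize(byte_order + fmt)
        (value,) = struct.unpack_from(byte_order + fmt, data, offset)
        fields.append(Field(name, offset, size, value))
        offset += size
    return fields


if __name__ == "__main__":
    sample = b"DEMO" + struct.pack("<HI", 2, 4096) + b"payload"
    for f in overlay(sample):
        raw = sample[f.offset:f.offset + f.size].hex(" ")
        print(f"{f.offset:04x}  {raw:<12} {f.name} = {f.value!r}")
```

Once every field knows its offset and size, a viewer can color, fold or jump to those byte ranges instead of treating the dump as a flat stream.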
ImHex (https://imhex.werwolv.net/) is also a really nice Hex editor with tons of plugins (patterns, file support, etc.) and even an embedded language for adding more patterns easily
Everything should try to do some basic syntax highlighting IMO. Not too much, or it just becomes a sea of formatting that doesn't help at all. It is surprising how much difference just a little splash of colour can make if it isn't overdone. If possible, always include configuration options for the user though, so those with colour-blindness issues can tweak things to their needs, those who are just fussy can make the output fit with their finely adjusted system-wide colour schemes¹, and even better, where you can, allow bold/italic/other as well as colours so that those who barely see colour at all can play too.
Of course none of this helps those using screen-readers and other tech, so make sure that all your fancy colouring & such is additive so if it is all “lost” no meaning is absolutely lost with it.
--------
¹ Some people can be very vocal about this, more so than if highlighting isn't possible at all. If you give any output formatting they'll expect you to match, or be able to be made to match, their preferred style.
Why did the author decide that the best way to demonstrate his idea would be by cutting contrast in half?
Color-coding might be a great solution, but you don't really know beforehand which byte values are important. Manually selecting C0 to make it stand out is just ctrl+f with extra steps. (But I wouldn't mind something like "color 00 separate from ascii separate from the rest".)
> Manually selecting C0 to make it stand out
That's not what they did, actually. C0 is the only byte in there that's above 3F or so, and it's far from it. Hence the very different colour, and the lack of contrast between the colours of the other bytes.
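A rough Python sketch of the "color 00 separate from ASCII separate from the rest" variant, as opposed to a per-value gradient. The class boundaries and the ANSI styles are arbitrary choices for illustration, not what the article or any particular tool uses. `use_color=False` gives the plain dump, so the colouring stays purely additive.

```python
def byte_class(b: int) -> str:
    """Put a byte value into one of a few coarse classes."""
    if b == 0x00:
        return "null"
    if b == 0xFF:
        return "ff"
    if 0x20 <= b <= 0x7E:
        return "ascii"  # printable ASCII, space included
    return "other"


# ANSI SGR parameters per class (arbitrary: dim, green, bold red, default).
STYLES = {"null": "2", "ascii": "32", "ff": "1;31", "other": "0"}


def colorize(data: bytes, width: int = 16, use_color: bool = True) -> str:
    """Render a hex dump, optionally wrapping each byte in its class style."""
    lines = []
    for start in range(0, len(data), width):
        cells = []
        for b in data[start:start + width]:
            cell = f"{b:02x}"
            if use_color:
                cell = f"\x1b[{STYLES[byte_class(b)]}m{cell}\x1b[0m"
            cells.append(cell)
        lines.append(f"{start:08x}  " + " ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(colorize(b"\x00\x00Hello\xc0\xff\x01\x02"))
```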
This article made me think about how I could use similar techniques to colour-code the data in database tables. Has anyone here tried that and has some recommendations on where to start, etc.?
DataGrip (and Pycharm by extension) lets you apply heatmap colors to dataframes and database tables: https://www.jetbrains.com/help/datagrip/tables-view-data.htm...
Another option would be to load data in pandas and display it in a Jupyter notebook with style.background_gradient()
Polars delegates styling to Great Tables, but it's also doable there: https://posit-dev.github.io/great-tables/get-started/coloriz...
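If you just want to try the heatmap idea in a terminal first, here is a dependency-free Python sketch. It is a stand-in for the concept, not how DataGrip, pandas or Great Tables implement it; `gradient_index` and `heat_cell` are made-up helper names, and the 24-step greyscale ramp is an arbitrary choice.

```python
def gradient_index(value: float, lo: float, hi: float, steps: int = 24) -> int:
    """Map value in [lo, hi] onto a step index in [0, steps - 1]."""
    if hi == lo:
        return 0
    frac = (value - lo) / (hi - lo)
    return max(0, min(steps - 1, int(frac * steps)))


def heat_cell(value: float, lo: float, hi: float, width: int = 8) -> str:
    """Render one table cell with a greyscale background scaled to its value."""
    idx = gradient_index(value, lo, hi)
    background = 232 + idx              # 232..255: xterm 256-colour greyscale ramp
    foreground = 15 if idx < 12 else 0  # white text on dark cells, black on light
    return f"\x1b[38;5;{foreground};48;5;{background}m{value:>{width}}\x1b[0m"


if __name__ == "__main__":
    column = [3, 18, 42, 7, 99]  # made-up values standing in for a DB column
    lo, hi = min(column), max(column)
    for v in column:
        print(heat_cell(v, lo, hi))
```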
Emacs's hexl-mode does this, incidentally, though annoyingly by default it makes all faces the same color. I never understood why it defines the faces but then doesn't customize them.
When I read this article a few days ago it inspired me to create my own hex viewer: https://ar-ms.me/thoughts/3sl-a-sweet-hex-utility/
The cool thing about it imo (outside of colors) is a `--windows` flag. Which separates the hex view into partitions: so `-w 2:-3:5` shows the first two bytes on a line, then skips three bytes, then shows the next 5 bytes on a line, then the rest of the file. Easy to use combined with a terminal's up arrow.
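For readers trying to picture the window spec, here is a small Python sketch of how a string like `2:-3:5` could be interpreted, going only by the description above: positive numbers show that many bytes on their own line, negative numbers skip bytes, and whatever is left follows. This is an interpretation, not the tool's actual code; `split_windows` is a hypothetical name, and the leftover bytes are returned as a single row here.

```python
def split_windows(data: bytes, spec: str) -> list:
    """Split data into rows according to a colon-separated window spec."""
    rows, pos = [], 0
    for part in spec.split(":"):
        n = int(part)
        if n < 0:
            pos += -n  # negative entry: skip that many bytes
        else:
            rows.append(data[pos:pos + n])  # positive entry: one row of n bytes
            pos += n
    if pos < len(data):
        rows.append(data[pos:])  # the rest of the file
    return rows


if __name__ == "__main__":
    for row in split_windows(bytes(range(12)), "2:-3:5"):
        print(row.hex(" "))
```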
That said, even colored, these dumps still feel unappealing to me, so yes, this is admittedly a subjective gut reaction jumping into the conversation. I get that occult form can also be an attractive force.
The post puts an interesting point on the table about how to improve the presentation layer to fit what human cognition is good at spotting (in general, or at least for the expected audience with some training). And it does start proposing something with these color schemes. But isn’t it kind of missing the forest for the trees? Actually why do we even have rendering with [0123456789ABCDEF], when a specific set of (colored/imaged?) glyphs would be able to make more obvious what’s on the table? Or even beyond the hexadecimal grouping, wouldn’t it be more relevant to render something "intuitively" far easier to grasp without several layers of internalized interpretation through acculturation?
I can't think of anything better than a hex dump for representing raw binary data. I don't mean that there are no other, equally good representations, but hex dumps win because of familiarity.
Of course, if you know about the format, there are better ways, but it goes beyond the scope of a hex editor, though the most advanced ones support things like template files and can display structured data, disassembly, etc...
> Actually why do we even have rendering with [0123456789ABCDEF], when a specific set of (colored/imaged?) glyphs would be able to make more obvious what’s on the table?
Most of us have internalized the relationship between digits in [0-9] for a very long time. Adding 6 more glyphs after that is quite easy (and they're also somewhat well known in the world), and after a while you stop even thinking about the glyphs consciously anyway. A hex 'C' intuitively means to me '4 from the end'. A hex 'F' intuitively means to me 'all 4 bits are set to 1'. I don't see any advantage to switching to a different glyph set for this base, other than disruption for disruption's sake.
> Or even beyond the hexadecimal grouping, wouldn’t it be more relevant to render something "intuitively" far easier to grasp without several layers of internalized interpretation through acculturation?
Modern computers deal with 8-bit bytes, and their word sizes are a multiple of bytes - unless you're dealing with bit-packed data, which is comparatively rare (closest is bit twiddling of MMIO registers, which is when you sometimes switch to binary; although for a 4-bit hex nibble you can still learn arbitrary combinations of bits on/off into its value).
This means you can group 8 bits into 1 digit of 8 bits as one glyph (alphabet too large to be useful), 2 digits of 4 (hex), 4 digits of 2 (alphabet too small to give a benefit over binary) and 8 digits of 1 (binary). Hex just works really well as a practical middle ground.
Back when computers used 12 bit words (PDP-8 and friends) octal (4 digits of 3 bits represented in the 0-7 alphabet) was more popular.
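The groupings above are easy to check with a few lines of Python. This is just an illustrative helper (`digits` is a made-up name): it splits a value of `total_bits` bits into equally sized digits, most significant first.

```python
def digits(value: int, total_bits: int, bits_per_digit: int) -> list:
    """Split value into total_bits / bits_per_digit digits, most significant first."""
    if total_bits % bits_per_digit:
        raise ValueError("digit size must divide the word size evenly")
    mask = (1 << bits_per_digit) - 1
    count = total_bits // bits_per_digit
    return [(value >> (bits_per_digit * i)) & mask for i in reversed(range(count))]


if __name__ == "__main__":
    byte = 0xC0
    print(digits(byte, 8, 8))      # one digit, alphabet of 256 symbols
    print(digits(byte, 8, 4))      # two hex digits
    print(digits(byte, 8, 2))      # four base-4 digits
    print(digits(byte, 8, 1))      # eight binary digits
    print(digits(0o7251, 12, 3))   # a 12-bit word as four octal digits
```

Note that 3-bit digits do not divide an 8-bit byte evenly (the function raises `ValueError` for that case), which is the practical reason octal fits 12-bit words better than bytes.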
If you analyze binary files often, I highly recommend binvis - http://binvis.io/. It creates a colored minimap for files it loads and has two available arrangements. Pixel color is based on range of bytes, eg ASCII/null bytes/FF bytes. Besides, it’s a pretty basic hex viewer that runs in your browser. The minimap is extremely powerful for identifying interesting areas and patterns in unknown data.
> it’s a pretty basic hex viewer that runs in your browser
excuse me? "basic" and "runs in your browser" together sound very contradictory to me. while doing things i actually feel (yes, emotionally) much better when there is no browser open on my machine, but only text editors, vcs gui and file managers, and terminals of course. and sometimes i reject the idea of starting a browser just thinking how much ram it will take (ha, what progress we have made - one github issue tab, with text only and no images, takes 180mb of ram).
It's basic because it does like two things. It's not advanced or complex. HN is also a basic forum, even though it runs in a browser.
I really like hexyl [1], which does this by default.
[1] https://github.com/sharkdp/hexyl
The author uses hexyl as an example of trying, but not doing it right.
I think semantic coloring (based on structure) is more useful. Also (can't help it as someone working with z/OS), if you really want to make hex output readable, I recommend using a big-endian machine.
I've started doing this with hashes in a CLI I'm working on. For slow prints, it's somewhat helpful https://asciinema.org/a/aD38Pk88CZgSZqtq but for debug dumps with many many hashes it really helps readability and tracking hashes across lines.
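One simple way to get that effect, sketched in Python: derive a stable terminal colour from the hash itself, so the same hash always shows up in the same colour across lines. This is not the code from the CLI mentioned above; `hash_color` and `paint` are hypothetical names, and the mapping onto the 216-colour ANSI cube is an arbitrary choice.

```python
import hashlib


def hash_color(hex_digest: str) -> int:
    """Map a hex digest onto the 6x6x6 colour cube of the 256-colour palette (16..231)."""
    return 16 + int(hex_digest[:8], 16) % 216


def paint(hex_digest: str, length: int = 12) -> str:
    """Return a shortened digest wrapped in its own foreground colour."""
    return f"\x1b[38;5;{hash_color(hex_digest)}m{hex_digest[:length]}\x1b[0m"


if __name__ == "__main__":
    for payload in (b"alpha", b"beta", b"alpha"):
        digest = hashlib.sha256(payload).hexdigest()
        print(paint(digest), payload.decode())
```

Distinct hashes can collide on the same colour, so the colour is only a visual aid for tracking; the printed digits remain the source of truth.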
radare2 also has excellent hex viewing/editing support, if one manages to grok the usage of it.
> it’s much easier to pick out the unique byte when it’s a different color! human brains are really good at spotting visual patterns—given the right format
Don't really see the advantage. Unique bytes have no unique meaning across data types.
The only good syntax highlight to me is 00 and perhaps FF. But that's my opinion of course.
Anything else that has no direct relation to what you're looking at is meaningless.
To me, the random colors on each byte mess with my brain, making it hard to quickly identify C0 or any other value that I could more easily spot in all black.
Color based on the logic of the bytes would be nice, though.
Perhaps 00 in a shaded grey instead of black and, in the best case, coloring by logical unit based on your protocol. In the worst case, by groups of words or so.