Unicode started with the mission to encode all characters needed for written communication in the world.
This was already broad, but was not unusual for its time. Unlike Wikipedia, Unicode never went through a battle between inclusionists and deletionists. Moreover, with Han Unification it strayed from its core mission to "encode all characters needed for written communication in the world" (emphasis mine).
Instead it ended up as a fancy clip art library that every piece of software somehow has to support, but with no way to implement the standard in its entirety.
Unicode still does lots of work on language support. The notion that emoji support impedes that is simply not true.
And people were already doing emojis with phpBB, MSN Messenger, etc. The alternative to Unicode emojis would not be "no emojis", but "every platform with their own proprietary incompatible implementation".
Han Unification has been discussed a million times already. Originally Unicode was limited to 2 bytes, i.e. about 65k characters. Maybe it was a mistake, maybe not – I don't speak these languages and those who do often disagree on this as well. However, changing it now would probably introduce more pain than it solves.
"Unicode still does lots of work on language support."
Yes, and that's a good thing.
"The notion that emoji support impedes that is simply not true."
If it does not, why have so many unresolved issues and shortcomings been lingering around for years?
It's not that the issues around Han Unification will go away by ignoring them. There are related issues in western languages, like the umlaut/trema distinction. Pushing these topics, which are core to Unicode's original mission, into OpenType is not a solution.
Why do we continue adding pictures of random everyday objects, like disco balls, when not even characters common in ordinary books can be represented?
Don't you think reallocating resources from emoji work to more serious issues would make sense?
These are not "unresolved issues"; these are opinionated views. That's okay, but please, don't fool yourself into thinking this is somehow objective fact because it's just not. Encoding all human scripts in one encoding was always going to involve some choices, and no matter which choices would have been made some people were going to disagree with them.
I have no idea which "characters common in ordinary books" are missing; the explicit goal of Unicode is to encode exactly that sort of thing.
Unicode's original mission was to encode all characters needed for written communication in the world.
Han Unification fails this mission and that is not a matter of opinion.
"I have no idea which "characters common in ordinary books" are missing; the explicit goal of Unicode is to encode exactly that sort of thing."
In
«Günther a souligné l’ambigüité de son discours.»
there is an umlaut and a dieresis.
They are different characters with different functions. In traditional book printing they used to look different and quality fonts still have both. Unfortunately Unicode does not encode both of them.
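To see what "does not encode both" means in practice, here is a minimal Python sketch. It assumes the sentence is typed with the usual precomposed letter (written with escapes below so there is no ambiguity); the variable names are mine.

```python
import unicodedata

# The example sentence, with the precomposed letter U+00FC spelled out
# as an escape so the input is unambiguous.
sentence = "G\u00fcnther a soulign\u00e9 l\u2019ambig\u00fcit\u00e9 de son discours."

# Code points of both occurrences: the German umlaut in "Günther"
# and the French diaeresis in "ambigüité".
u_codepoints = [f"U+{ord(ch):04X}" for ch in sentence if ch == "\u00fc"]

# Canonical decomposition (NFD) splits the letter into "u" plus one combining mark.
decomposed = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u00fc")]

# The formal name of that combining mark.
combining_name = unicodedata.name("\u0308")

print(u_codepoints, decomposed, combining_name)
```

Both letters come out as the same code point, and the only combining mark available for the decomposed form is named COMBINING DIAERESIS, so plain text alone cannot say which of the two was meant.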
To claim that these letters "cannot be represented" is just outright bizarre. You literally did so yourself. Expecting Unicode to contain a codepoint for every single rendering variation is not realistic and the line must be drawn somewhere, with other rendering information provided in another way (e.g. lang=de, font-style, whatnot).
You can disagree with how Unicode does this (or how other encodings do it, for that matter) but this is just an utterly disingenuous thing to say. I no longer believe you are engaging in good faith. You have either not understood Unicode or you're intentionally misrepresenting it. Good bye.
"To claim that these letters "cannot be represented" is just outright bizarre. You literally did so yourself."
I did not. In every book printed before 1950 and every quality book printed now the different characters would actually look different. This is not about rendering variations but about different characters (linguistically and functionally, e.g. wrt collation) that coincidentally look similar and that Unicode conflates.
Here is a source from DIN (Deutsches Institut für Normung) with more background: https://www.unicode.org/L2/L2003/03215-n2593-umlaut-trema.pd...
If you think it's just crazy Germans arguing a moot point, Yannis Haralambous has a paragraph specifically about the umlaut/trema issue in his O'Reilly book "Fonts & Encodings".
Haven't read the book yet, but isn't that more like a matter of the font/rendering engine? I have a murky notion that for Cyrillic, for example, there are some nuances in rendering certain glyphs in cursive between languages [1], but these nuances are usually resolved by cooperation of the font and client interpreting the language hints, so not in the "physical" text.
(Not saying I see this as a good thing or anything: it is way beyond my expertise; I definitely can see the motivation for introducing as many variants in the Unicode register as there are in the real world)
Isn't the umlaut vs trema/diaeresis in a similar situation?
[1] made me test it and cobble a demo. (Sadly, not speaking any of these languages, so cannot verify it is correct; just wanted to see the difference in practice): https://myfonj.github.io/sandbox.html#%3C!doctype%20html%3E%...
Arguably, depending on a wide (physical text ↔ specific font ↔ rendering agent) ecosystem feels quite fragile, but I cannot tell if there is any better alternative for this particular case.
>Expecting Unicode to contain a codepoint for every single rendering variation
It's not just rendering variations. While they are etymologically related they are made with different strokes and are incorrect to substitute for one another.
Technically Unicode has variation selectors that can be used for selecting between variations of the characters, but they do not have sufficient adoption. So that means pretty much everything has to annotate what language it is written in so it can be rendered correctly, else the system has to check the system settings to guess what language the user likely wants to see things rendered as.
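For what the selector mechanism looks like at the code point level, a rough Python sketch: the selector is simply an extra code point appended to the base character. The base ideograph and selector chosen here are only an illustration (whether this exact pair is a registered variation sequence is an assumption on my part), and `strip_variation_selectors` is a hypothetical helper name.

```python
import unicodedata

# A base ideograph followed by an ideographic variation selector.
# Assumption: the pair is illustrative only; no claim that it is registered.
base = "\u845b"
selector = "\U000E0100"
sequence = base + selector

def strip_variation_selectors(text: str) -> str:
    """Drop code points whose formal name starts with VARIATION SELECTOR."""
    return "".join(
        ch for ch in text
        if not unicodedata.name(ch, "").startswith("VARIATION SELECTOR")
    )

selector_name = unicodedata.name(selector)
print(len(sequence), selector_name, strip_variation_selectors(sequence) == base)
```

Software that ignores the selector still sees the plain base character, which is part of why rendering ends up depending on language tags and fonts.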
Going to be nitpicky, while also saying +1
MSN, Skype, etc used emoticons and not emoji, and there wasn’t a standard that I’m aware of.
I'm reasonably sure they had graphical "emojis" rather than just ASCII "emoticons" like :-). They weren't called "emojis" at the time of course, but basically, it's the same thing.
True. And interestingly, the term "emoticon" was used both for the basic "ASCII" and fancy graphical representation. And while some "ASCII" sequences were present on multiple platforms, the meaning quite unsurprisingly often diverged, just like you implied. See some:
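[MSN emoticons] https://web.archive.org/web/20031206095746/http://messenger.... vs [Yahoo emoticons] https://web.archive.org/web/20080408053458/http://messenger....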
:-* "Secret telling" vs "kiss"
8-| "Nerd" vs "rolling eyes"
:| "Disappointed" vs "straight face"
N.B. some of them are animated. Animated fonts coming when?
Unicode did not magically get rid of proprietary implementations... e.g. Apple's "pistol".
That's not a proprietary implementation, it's a font. Unicode doesn't define how a character renders more than giving a suggestion. It's fully up to the font how it should display.
A single disagreement over something does not make things "proprietary". And it's been what, ten years? Honestly, get over it.
Apple displays the pistol emoji as a squirt gun. That’s wrong. It has always been wrong. It will always be wrong, because a squirt gun is not a pistol. Time doesn’t erase an error. ‘Get over it’ is the wrong response: ‘Apple, stop being wrong’ is the correct one.
The name in Unicode for U+1F52B is "water pistol". For better or worse, everyone uses a water pistol now and has for years.
Not in my book, which is The Unicode Standard, Version 16.0: https://www.unicode.org/charts/PDF/U1F300.pdf
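The formal character name is easy to check from Python's bundled copy of the Unicode Character Database. A minimal sketch; it only looks at the formal name, and my assumption is that the "water pistol" wording comes from a separate annotation layer (such as CLDR short names), which is not queried here.

```python
import unicodedata

# U+1F52B as an escape, so the source does not depend on emoji fonts.
pistol = "\U0001F52B"

# Formal character names are stable once assigned, so this should not
# vary between Python versions that know the character at all.
formal_name = unicodedata.name(pistol)

print(f"U+{ord(pistol):04X} {formal_name}")
```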
Humans have other purposes than satisfying stated goals.
That’s often described as a flaw, e.g. to err is human, but it’s what we do. Some degree of chaos can help for efficient problem-solving.
Based on past history, we may never get perfect encoding for historical Earthlings, e.g. what about the following list looks well-planned and coordinated for the future?: ASCII, ISO 8859-1 (Latin-1), ISO 8859-2 (Latin-2), ISO 8859-3, ISO 8859-4, ISO 8859-5 (Cyrillic), ISO 8859-6 (Arabic), ISO 8859-7 (Greek), ISO 8859-8 (Hebrew), ISO 8859-8-I, ISO 8859-10, ISO 8859-13, ISO 8859-14, ISO 8859-15, ISO 8859-16, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258, KOI8-R, KOI8-U, KOI8-RU, Shift_JIS, EUC-JP, EUC-KR, GB2312, GBK, Big5, HZ-GB-2312, TIS-620, MacRoman, MacCyrillic, UTF-8, UTF-16 (BE/LE), UTF-32 (BE/LE), CESU-8, UTF-7, IBM866, IBM437, IBM850, IBM852, IBM855, IBM857, IBM862, IBM864, IBM866, KZ1048, IBM874 (TIS-620), VNI, Windows-874, Mac Thai, Mac Central European.
Emoji can be seen as bait for implementations to support under-represented or ancient scripts as a side effect. In fact, emoji worked so well that we now have full non-BMP support almost everywhere. For example, MySQL used to have the cursed BMP-only utf8 charset aka utf8mb3! It lasted until everyone started to complain about emojis.
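To make the BMP point concrete, a small Python sketch: an emoji and an Egyptian hieroglyph both sit above U+FFFF and need four bytes in UTF-8, which is exactly what a charset capped at three bytes per character cannot store. The two sample code points are my own picks, and `fits_in_three_bytes` is a hypothetical helper, not anything from MySQL.

```python
# Two sample characters outside the Basic Multilingual Plane.
samples = {
    "grinning face": "\U0001F600",
    "Egyptian hieroglyph": "\U000130B8",
}

def fits_in_three_bytes(ch: str) -> bool:
    """True if the character's UTF-8 form is at most three bytes long."""
    return len(ch.encode("utf-8")) <= 3

report = {
    label: {
        "code_point": f"U+{ord(ch):04X}",
        "in_bmp": ord(ch) <= 0xFFFF,
        "utf8": ch.encode("utf-8").hex(),
        "utf16_units": len(ch.encode("utf-16-be")) // 2,
        "fits_in_three_bytes": fits_in_three_bytes(ch),
    }
    for label, ch in samples.items()
}

for label, row in report.items():
    print(label, row)
```

Anything that handles the emoji correctly gets the hieroglyph for free, which is the side effect described above.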
>Moreover, with Han Unification it strayed from its core mission to "encode all characters needed for written communication in the world" (emphasis mine).
Why do you say that? Because Unicode now has become balkanised between various CJK regions?
Because it conflates distinct characters and therefore fails its original mission to encode all characters needed for written communication.
Han Unification is just the most obvious case but the issues do not stop there. I'll give you a western example. In the sentence
«Günther a souligné l’ambigüité de son discours.»
there is an umlaut and a dieresis.
They are different things with different functions. In traditional book printing they used to look different.
With Unicode all this cultural nuance is lost. The characters necessary to communicate precisely simply have never been encoded, because Unicode forgot about its core mission.
Fixing things like that is where I want to see efforts go.
>The iconic decoration reflects light in all directions and transforms every room - no matter how big its size - into a glamorous space in which people can dance or dream.
I never thought about dreaming in a room with a disco ball in it. I think the informality of emoji proposals is really special!
It's been in the Unicode standard as #129705 since 14.0 but I don't think I've ever sent or received one.
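The decimal #129705 and the usual hexadecimal notation are the same number. A tiny Python check, with variable names of my own choosing:

```python
# Decimal code point as quoted above.
decimal_code_point = 129705

# Conventional U+ notation uses uppercase hex, padded to at least four digits.
hex_label = f"U+{decimal_code_point:04X}"

# And back again from the hex digits.
round_trip = int("1FAA9", 16)

print(hex_label, round_trip)
```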
I'd be curious to know how the actual usage stats aligned with their expectations.
Would be the most culturally neutral "party emoji". Party popper is not used that much outside of Anglosphere, confetti ball even less so and its emoji looks like medusa.
It is part of Unicode, and I've never seen it being used. I'll venture to say it didn't catch on as a party emoji.
>confetti ball even less so
Because it's a Japan only thing.
Ah yes the super smash brothers ball
Why would you have such a thing? When you communicate, you know the receivers' culture, don't you? Otherwise wouldn't it be a rather infrequent symbol with less practical use than e.g. "incomplete infinity"?
Do you? All I know about the readers of this post is that they either know how to read English, use some translation service or won't understand much of what I was writing.
But even when you do know exactly who you're addressing, they might be a very diverse group.
This is not about any message, this is about an icon signifying "party". In your argumentation: they might not even know what a party is! That's how useless such an icon would be.
I would love a mirror ball emoji, why can't we have nice things?
It should reflect the colors of pixels on the screen in all different directions in real time, and also cast bright spots of colored light all over the screen, while spinning. The stress test would be to fill the entire screen with many disco balls, over live video. Also a set of colored spotlight and smoke machine emojis would go nicely with it too.
Naturally, it should also make all full body emoji start dancing, but only when proximate and with a line of sight not obstructed by a U+99385 SOLID WALL. Specific dance moves are implementation dependent, but SHOULD adapt to the user’s locale.
Can we have hardware ray tracing for emojis?
And why is there still no standard way to embed audio tracks into fonts? Each Unicode character deserves a sound! And Rick Astley needs to be heard more often...
> Rick Astley
Agreed. Do the kids even still know about him?
Any platform that introduces hardware ray tracing for the first time will do so for the sake of emojis.
Cute, but all of the example emojis look pretty poor, murky if you will.
U+1FAA9
Added in Unicode 14: https://unicode.org/emoji/charts-14.0/emoji-released.html
I also appreciate melting face, dotted outline face, and face with salute. Low battery & ID card.
Also the pregnant man / Bill Gates was added there :)
How do I switch to the timeline where emojis, crypto, NFTs and generative AI weren't invented?
Lumping together emojis with these other nightmare technologies is wild.
Name 1 bad thing that came from the invention of emojis that is comparable to the others
Nope - you've got me on that one. I can't.
I’m convinced AI is a conspiracy by Big Emoji to get us using more emoji. That’s why there are so many in its responses.
It is the year 2118. ASI is taking over the light cone. The economy grows 100% each solar year. Still no emoji support on Hacker News.
For a glorious few hours after one of the last emoji updates Hacker News did support it because they didn’t filter those ones out yet. They do not filter out some of the quasi emojis like the hieroglyphs.
𓂸 Out for harambe.
Hacker News supports emojis just fine, as you can see here: https://news.ycombinator.com/item?id=23659248
It's just that most people can't deal with them responsibly, so it has not been made easy.
¯ \ _ ( ツ ) _ / ¯
It's not a matter of not supporting it; if you can store UTF-8 (as HN does) then you can store emojis. It's that HN actively strips emojis. One of the more childish and petty aspects of the site IMO.
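Purely as an illustration of "stored fine, stripped on purpose", here is one way such a filter could look in Python. This is a guess at the general technique, not HN's actual code: `strip_symbols` is a hypothetical function, and filtering on the general category "So" (Symbol, other) is my assumption.

```python
import unicodedata

def strip_symbols(text: str) -> str:
    """Remove code points whose general category is 'So' (Symbol, other).

    Hypothetical filter for illustration only.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "So")

with_emoji = "ok \U0001F600"     # text plus an emoji
with_hieroglyph = "\U000130B8"   # an Egyptian hieroglyph
with_katakana = "\u30c4"         # the katakana letter used in the shrug

print(repr(strip_symbols(with_emoji)))
print(strip_symbols(with_hieroglyph) == with_hieroglyph)
print(strip_symbols(with_katakana) == with_katakana)
```

A category-based filter like this drops typical emoji while letting hieroglyphs and katakana through, because those are classified as letters. It also only knows the characters in the Unicode data it ships with, so freshly added emoji would pass until that data is updated.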
Yeah. If you look at somewhat similar forums like Less Wrong, people don't even use emojis that often. So not a big loss on HN anyway. Markdown and LaTeX support however...
Clip art is dead, long live clip art!
Emoji filtration != lack of support.
:'(