> > "Important: Supplementary code points must be supported for full
> > Unicode support, regardless of the encoding form.
> 
> That's the theory. But UTF-16 is strictly less convenient than UTF-32,
> which means that a lot of code working in terms of UTF-16 doesn't bother
> to support supplementary code points.
From Wikipedia:
"Unfortunately using UTF-16 makes characters outside the Basic 
Multilingual Plane a special case which increases the risk of oversights 
related to their handling. That said, programs that mishandle surrogate 
pairs probably also have problems with combining sequences, so using 
UTF-32 is unlikely to solve the more general problem of poor handling of 
multi-code-unit characters."
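
To make the quoted point concrete, here is a minimal Python sketch. The helper names `utf16_units` and `utf32_units` are my own, chosen for illustration. It counts code units for one supplementary character and for one combining sequence. The surrogate pair only shows up in UTF-16, but the combining sequence is two units in both encodings, so UTF-32 alone does not make "one unit = one character" true.

```python
def utf16_units(s):
    # Each UTF-16 code unit is 2 bytes; "-le" avoids the BOM that plain "utf-16" adds.
    return len(s.encode("utf-16-le")) // 2

def utf32_units(s):
    # Each UTF-32 code unit is 4 bytes, one per code point.
    return len(s.encode("utf-32-le")) // 4

clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(utf16_units(clef), utf32_units(clef))        # 2 1  (surrogate pair in UTF-16 only)
print(utf16_units(e_acute), utf32_units(e_acute))  # 2 2  (two code points either way)
```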
> The only advantage of UTF-16 over UTF-32 is memory usage, and data
> exchange with those who already use UTF-16. *Nothing* in UTF-16 is more
> convenient or simpler than UTF-32, it's an additional complexity layer.
"The only advantage of fixnums over bignums is [performance and] memory 
usage, and data exchange with those who already use fixnums. *Nothing* in 
fixnums is more convenient or simpler than bignums, it's an additional 
complexity layer."
> > But I'll tell you what. Find a document, written by someone with
> > substantial Unicode experience, that recommends UTF-32 as the best
> > overall in-memory encoding.
I don't agree with everything you said, but more to the point, none of it 
relates to the question I asked: can you find a single document written by 
a Unicode expert that recommends UTF-32? Every such document I can find 
recommends UTF-16 as the best overall encoding, with UTF-8 a second choice 
(based on expected usage). UTF-32 is always the third choice and it always 
has the caveat "if space doesn't matter."
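
As a rough illustration of why "expected usage" and space matter in that ranking, the following Python sketch compares encoded byte counts for two short samples. The function name `encoded_sizes` and the sample strings are my own assumptions, not taken from any of the documents mentioned.

```python
def encoded_sizes(text):
    # Byte length of the same text in each encoding form (no BOM).
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

ascii_text = "hello"              # 5 ASCII characters
cjk_text = "\u65e5\u672c\u8a9e"   # 3 CJK characters, all in the BMP

print(encoded_sizes(ascii_text))  # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes(cjk_text))    # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
```

UTF-8 is smallest for ASCII-heavy text, UTF-16 is smallest for BMP text outside the ASCII range, and UTF-32 is the largest in both samples.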
Received on Mon Mar 26 2007 - 11:29:06 UTC