Why Was Unicode Created?
Before Unicode was invented, hundreds of different encoding systems existed, and no single encoding could contain enough characters for all of the world's languages. These encoding systems also conflicted with one another: two encodings could use the same number for two different characters, or different numbers for the same character. Unicode was created to solve the problems caused by these incompatible encoding mechanisms.
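To make the conflict concrete, here is a small Python sketch using three legacy encodings that Python's standard codecs happen to support: ISO 8859-1 (`latin-1`), Windows-1251 (`cp1251`), and IBM PC code page 437 (`cp437`). These particular encodings are our illustrative choice; the text above does not name any.

```python
# The same byte, 0xE9, means different characters in different legacy encodings.
raw = bytes([0xE9])
print(raw.decode("latin-1"))  # é  (ISO 8859-1, Western European)
print(raw.decode("cp1251"))   # й  (Windows-1251, Cyrillic)

# The same character, é, gets different byte values in different legacy encodings.
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("cp437"))    # b'\x82'  (IBM PC code page 437)
```

A file saved in one of these encodings and opened as another silently shows the wrong text, which is exactly the problem a single shared numbering avoids.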
How Does Unicode Solve the Problem?
Unicode provides a unique number for every character. The Unicode Standard has been adopted by industry leaders such as IBM, HP, Microsoft, Oracle, SAP, Sun, Sybase, and Unisys.
The Unicode Standard provides the capacity to encode all of the characters used for the written languages of the world. To keep character coding simple and efficient, it assigns each character a unique numeric value, called a code point (written as U+0041 for "A"), and a unique name.
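The number-and-name pairing can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the sample characters are arbitrary choices of ours.

```python
import unicodedata

# Each character maps to exactly one code point and one official name.
for ch in "Aé€й":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The mapping also works in reverse, from name or number back to the character.
print(unicodedata.lookup("EURO SIGN"))  # €
print(chr(0x0041))                      # A
```

The loop prints `U+0041  LATIN CAPITAL LETTER A`, `U+00E9  LATIN SMALL LETTER E WITH ACUTE`, `U+20AC  EURO SIGN`, and `U+0439  CYRILLIC SMALL LETTER SHORT I`. Unlike the legacy encodings, these numbers do not depend on which platform or program reads them.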
Encoding Forms
The Unicode Standard defines three encoding forms that allow the same data to be transmitted in a byte-, word-, or double-word-oriented format (that is, 8, 16, or 32 bits per code unit). All three encoding forms encode the same character repertoire and can be efficiently transformed into one another without loss of data.
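As a rough illustration, Python's built-in codecs can encode one string in all three forms and convert between them. The sample text, including the emoji U+1F600 (outside the 16-bit range), is our own choice. We use the `-le` codec variants so that Python does not prepend a byte order mark.

```python
text = "Aé€😀"  # characters that take 1, 2, 3 and 4 bytes in UTF-8

# Encode the same text in each form and count its code units.
for codec, unit_size in (("utf-8", 1), ("utf-16-le", 2), ("utf-32-le", 4)):
    data = text.encode(codec)
    print(f"{codec:10} {len(data) // unit_size:2} code units  {data.hex(' ')}")

# Converting between forms is lossless: UTF-8 -> UTF-16 -> UTF-32 -> original text.
utf16 = text.encode("utf-8").decode("utf-8").encode("utf-16-le")
utf32 = utf16.decode("utf-16-le").encode("utf-32-le")
assert utf32.decode("utf-32-le") == text
```

The same four characters take 10 code units in UTF-8, 5 in UTF-16, and 4 in UTF-32. The repertoire is identical; only the code-unit size and count differ.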
UTF-8 is a way of transforming all Unicode characters into a variable-length encoding of one to four bytes. Its advantages are that the Unicode characters corresponding to the familiar ASCII set have the same byte values as in ASCII, and that Unicode text transformed into UTF-8 can be used with much existing software without extensive rewrites.
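The variable-length scheme can be sketched by hand to show the bit layout. `utf8_encode` is a hypothetical helper written for illustration only; real code should simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (illustrative sketch, not a library API)."""
    # Reject values outside the code space and the UTF-16 surrogate range.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 7 bits  -> 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:       # 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 21 bits -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


# Compare with Python's built-in encoder, and confirm ASCII text is byte-identical.
for ch in "Aé€😀":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
assert "Hello".encode("utf-8") == "Hello".encode("ascii")
```

For example, `utf8_encode(0x20AC)` (the euro sign) returns `b'\xe2\x82\xac'`. Because every byte of a multi-byte sequence has its high bit set, those bytes can never be mistaken for ASCII characters, which is why older byte-oriented software tends to cope with UTF-8.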
UTF-16 is popular in many environments that need to balance efficient access to characters with economical use of storage. It is reasonably compact: all the heavily used characters (those in the Basic Multilingual Plane, U+0000 to U+FFFF) fit into a single 16-bit code unit, while all other characters are accessible via pairs of 16-bit code units, known as surrogate pairs.
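A minimal sketch of how a character beyond U+FFFF is split into a surrogate pair. `utf16_code_units` is a hypothetical helper for illustration; Python's own `utf-16` codec does this internally.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]                 # Basic Multilingual Plane: one code unit
    v = cp - 0x10000                # remaining value fits in 20 bits
    high = 0xD800 | (v >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 | (v & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return [high, low]


print([f"{u:04X}" for u in utf16_code_units(0x20AC)])   # ['20AC']
print([f"{u:04X}" for u in utf16_code_units(0x1F600)])  # ['D83D', 'DE00']
```

The surrogate range U+D800 to U+DFFF is reserved and never assigned to characters, so a decoder can always tell a lone 16-bit character from half of a pair.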
UTF-32 is useful where memory space is no concern but fixed-width, single-code-unit access to characters is desired. When using UTF-32, each Unicode character is encoded in a single 32-bit code unit.
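Fixed width means the i-th character sits at byte offset 4 × i, so it can be read directly without scanning earlier characters. `char_at` is a hypothetical helper for this sketch, and it does no bounds checking.

```python
def char_at(data: bytes, i: int) -> str:
    """Return the i-th character of UTF-32-LE data by direct offset (no bounds check)."""
    return chr(int.from_bytes(data[4 * i : 4 * i + 4], "little"))


text = "Aé€😀"
data = text.encode("utf-32-le")  # 4 bytes per character, no byte order mark

print(len(data))         # 16 bytes for 4 characters (UTF-8 needs 10)
print(char_at(data, 3))  # 😀, located by arithmetic alone
```

The trade-off is visible in the byte count: UTF-32 spends four bytes even on plain ASCII, which is why it suits in-memory processing better than storage or transmission.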
For more information about Unicode