Saturday 14 June 2014

Unicode support for Java

Java's char data type is based on the UTF-16 format. The original Unicode specification defined characters as fixed-width 16-bit entities, but the standard was later extended beyond 16 bits. The range of legal code points is now U+0000 to U+10FFFF.

Basic Multilingual Plane
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane.

Supplementary Characters
Characters whose code points are greater than U+FFFF are called supplementary characters.
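As a minimal sketch of the two ranges defined above, the following class sorts a code point into "BMP", "supplementary", or "invalid". The class name CodePointPlanes and the helper describe are made up for this illustration; Character.isValidCodePoint and Character.isSupplementaryCodePoint are the standard library checks.

```java
public class CodePointPlanes {

    // Hypothetical helper (not part of the JDK): names the range a code point falls in.
    public static String describe(int codePoint) {
        // Legal code points run from U+0000 to U+10FFFF.
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";
        }
        // Anything above U+FFFF is a supplementary character.
        return Character.isSupplementaryCodePoint(codePoint) ? "supplementary" : "BMP";
    }

    public static void main(String[] args) {
        System.out.println(describe('A'));       // BMP
        System.out.println(describe(0x10000));   // supplementary
        System.out.println(describe(0x110000));  // invalid
    }
}
```

Note that the helper takes an int, not a char, because a single char cannot hold a value above U+FFFF.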

The Java platform represents Unicode characters in UTF-16 format. A supplementary character is represented as a pair of char values: the first from the high-surrogates range (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
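The surrogate-pair representation can be observed directly with the Character class. The sketch below is only an illustration: the class name SurrogatePairs and its two helpers are assumptions, and U+1F600 is picked simply as a sample supplementary code point.

```java
public class SurrogatePairs {

    // Hypothetical helper: returns the UTF-16 char values for a code point
    // (one char for BMP characters, two for supplementary characters).
    public static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Hypothetical helper: combines a high and a low surrogate back into a code point.
    public static int decode(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1F600);
        System.out.println(pair.length);                        // 2
        System.out.println(Integer.toHexString(pair[0]));       // d83d (high surrogate)
        System.out.println(Integer.toHexString(pair[1]));       // de00 (low surrogate)

        String s = new String(pair);
        System.out.println(s.length());                         // 2 char values
        System.out.println(s.codePointCount(0, s.length()));    // 1 Unicode character
    }
}
```

The last two lines show the practical consequence: String.length() counts char values, so a string holding one supplementary character reports a length of 2.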

Note:
  1. Methods that accept only a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters.
  2. Methods that accept an int value support all Unicode characters, including supplementary characters.
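The difference between the two kinds of methods in the note can be sketched with Character.isLetter, which has both a char and an int overload. The class name CharVsInt and its two counting helpers are assumptions made for this example; U+10400 (a Deseret letter) is used as a sample supplementary character.

```java
public class CharVsInt {

    // Hypothetical helper: counts letters using the char overload.
    // Each half of a surrogate pair is tested on its own and is not a letter.
    public static int countLettersByChar(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: counts letters using the int (code point) overload.
    public static int countLetters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (Character.isLetter(codePoint)) {
                count++;
            }
            // Advance by 1 char for BMP characters, 2 for supplementary characters.
            i += Character.charCount(codePoint);
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x10400));
        System.out.println(countLettersByChar(text)); // 1
        System.out.println(countLetters(text));       // 2
    }
}
```

Stepping through the string with codePointAt and Character.charCount is one way to make sure a surrogate pair is always handled as a single character.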
