Siva Nookala - 05 Apr 2016
In earlier versions of the JDK, a character was assumed to fit in two bytes (16 bits), so only characters with codes ranging from 0 to FFFF were supported. As more languages were added to Unicode, the range of character codes grew to 10FFFF, which no longer fits in 16 bits, so Java needed a way to handle these larger values.

Let's understand the following terms first.
  • Code Point : A character value in the range 0 to 10FFFF.
  • Supplemental Characters (also called supplementary characters) : Characters whose value is greater than FFFF.
  • Basic Multilingual Plane (BMP) : Characters whose value is between 0 and FFFF.
The growth of the character range beyond 16 bits caused a problem for Java, since a supplemental character cannot be stored in a regular char, whose size is 16 bits. To solve this, Java uses two chars to represent a single supplemental character. The first char is called the high surrogate, whereas the second char is called the low surrogate.
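
The surrogate pair idea can be sketched as follows. The class name SurrogatePairDemo, the helper storedLength and the choice of U+1D11E (the musical G clef symbol) are assumptions made only for this illustration.

```java
public class SurrogatePairDemo {
    // U+1D11E (MUSICAL SYMBOL G CLEF) is greater than FFFF, so it is a supplemental character.
    static final int G_CLEF = 0x1D11E;

    // Hypothetical helper: number of char values a String needs to hold one code point.
    static int storedLength(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    public static void main(String[] args) {
        char[] units = Character.toChars(G_CLEF);

        System.out.println(units.length);                        // 2
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true

        // The two chars that together stand for the single code point.
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D834 DD1E

        // A BMP character such as 'A' still needs only one char.
        System.out.println(storedLength('A'));                   // 1
    }
}
```

Note that String.length() counts chars, not code points, so a string holding one supplemental character reports a length of 2.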

To help with this problem, Java provides a method called codePointAt(). This method returns an int instead of a char. Since an int is 32 bits wide, it can hold any code point, including one that is stored as two chars. Java also provides overloaded forms of existing Character methods that operate on an int code point. Some sample methods are:
static boolean isDigit(int codePoint)
static boolean isLetter(int codePoint)
static int toLowerCase(int codePoint)
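
A minimal sketch of how codePointAt() and the int overloads might be combined to walk through a string one code point at a time. The class name CodePointAtDemo, the helper countLetters and the sample text (which contains U+10400, a Deseret capital letter) are assumptions made for this illustration.

```java
public class CodePointAtDemo {
    // Hypothetical helper: counts letters by stepping through the text code point by code point.
    static int countLetters(String text) {
        int letters = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i); // int, so it can hold a supplemental character
            if (Character.isLetter(cp)) {            // int overload of isLetter
                letters++;
            }
            i += Character.charCount(cp);            // advance by 1 or 2 chars
        }
        return letters;
    }

    public static void main(String[] args) {
        // 'a', then U+10400 (stored as two chars), then the digit '1'
        String text = "a" + new String(Character.toChars(0x10400)) + "1";

        System.out.println(text.length());       // 4 chars in total
        System.out.println(countLetters(text));  // 2 letters

        // The char overload sees only half of the pair, which is not a letter on its own.
        System.out.println(Character.isLetter(text.charAt(1))); // false

        // The int overload of toLowerCase works on the whole code point.
        int lower = Character.toLowerCase(Character.codePointAt(text, 1));
        System.out.println(Integer.toHexString(lower)); // 10428
    }
}
```

Looping with charAt() and the char overloads would treat each surrogate separately, which is why the int forms are the safer choice when the text may contain supplemental characters.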

Listed below are more methods which help in handling 32-bit Unicode Code Points.
static int charCount(int codePoint)
Returns 1 if codePoint can be represented using a single 16-bit char and returns 2 if it needs two chars.
static int codePointAt(CharSequence chars, int loc)
static int codePointAt(char chars[], int loc)
Returns the code point at the location specified by loc.
static int codePointBefore(CharSequence chars, int loc)
static int codePointBefore(char chars[], int loc)
Returns the code point immediately before the location specified by loc.
static boolean isBmpCodePoint(int cp)
Returns true if cp is part of the Basic Multilingual Plane (BMP). See above for more info on BMP.
static boolean isHighSurrogate(char ch)
Returns true if ch contains a valid high surrogate character.
static boolean isLowSurrogate(char ch)
Returns true if ch contains a valid low surrogate character.
static boolean isSupplementaryCodePoint(int cp)
Returns true if cp contains a supplemental character.
static boolean isSurrogatePair(char highCh, char lowCh)
Returns true if highCh and lowCh form a valid surrogate pair.
static boolean isValidCodePoint(int cp)
Returns true if cp contains a valid code point.
static char[] toChars(int cp)
Converts the code point in cp into its char equivalent, which might require two chars. An array holding the result is returned.
static int toChars(int cp, char target[], int loc)
Converts the code point in cp into its char equivalent, storing the result in target, beginning at loc. Returns 1 if cp can be represented by a single char. It returns 2 otherwise.
static int toCodePoint(char highCh, char lowCh)
Converts highCh and lowCh into their equivalent code point.
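
Several of the methods listed above can be tried together in a short sketch. The class name CodePointUtilsDemo, the helper combine (including its choice of -1 as a "not a pair" result) and the sample code point U+1F600 are assumptions made for this illustration.

```java
public class CodePointUtilsDemo {
    // Hypothetical helper: rebuilds a code point from two chars,
    // or returns -1 if they do not form a valid surrogate pair.
    static int combine(char highCh, char lowCh) {
        return Character.isSurrogatePair(highCh, lowCh)
                ? Character.toCodePoint(highCh, lowCh)
                : -1;
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // a supplemental character (GRINNING FACE)

        System.out.println(Character.isValidCodePoint(cp));         // true
        System.out.println(Character.isValidCodePoint(0x110000));   // false, beyond 10FFFF
        System.out.println(Character.isBmpCodePoint(cp));           // false
        System.out.println(Character.isSupplementaryCodePoint(cp)); // true

        // Store 'x' followed by the code point in a char array.
        char[] target = new char[3];
        target[0] = 'x';
        int written = Character.toChars(cp, target, 1);
        System.out.println(written);                                // 2

        // Read the code point back from the front and from the end.
        System.out.println(Character.codePointAt(target, 1) == cp);     // true
        System.out.println(Character.codePointBefore(target, 3) == cp); // true

        // Recombine the two surrogates into the original code point.
        System.out.println(combine(target[1], target[2]) == cp);    // true
        System.out.println(combine('a', 'b'));                      // -1
    }
}
```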

© meritcampus 2016 - 2017

All Rights Reserved.