Programming for beginners: Character Category

XML regular expressions can match characters by category and by block. A category classifies characters by their usage, independent of their localization, whereas a block classifies characters by their localization, independent of their usage.

In a regular expression, a character category is written as an escape sequence. An inclusive character category that represents any uppercase letter looks like the following:

\p{Lu}

An exclusive category that represents any character except an uppercase letter looks like the following:

\P{Lu}

Note: inclusive requires a lowercase 'p', whereas exclusive requires an uppercase 'P'.
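
The difference between the two forms can be sketched in Python. Python's built-in re module does not understand \p{...}, so this sketch emulates the escapes with unicodedata.category from the standard library. The helper name matches_escape is hypothetical and exists only for illustration; it handles a single character, not a full regular expression.

```python
import re
import unicodedata

# Hypothetical helper: emulates \p{X} and \P{X} for ONE character.
# Assumption: the category name is one or two letters, e.g. "L" or "Lu".
_ESCAPE = re.compile(r"\\([pP])\{([A-Z][a-z]?)\}")

def matches_escape(escape: str, ch: str) -> bool:
    """Return True if the single character ch matches the category escape."""
    parsed = _ESCAPE.fullmatch(escape)
    if parsed is None:
        raise ValueError(f"not a category escape: {escape!r}")
    kind, category = parsed.groups()
    # unicodedata.category returns a two-letter code such as "Lu";
    # a one-letter category like "L" matches every code starting with it.
    inside = unicodedata.category(ch).startswith(category)
    # Lowercase 'p' is inclusive, uppercase 'P' is exclusive.
    return inside if kind == "p" else not inside

print(matches_escape(r"\p{Lu}", "A"))  # True: 'A' is an uppercase letter
print(matches_escape(r"\p{Lu}", "a"))  # False: 'a' is a lowercase letter
print(matches_escape(r"\P{Lu}", "a"))  # True: 'a' is not an uppercase letter
```

Regex engines with native support for these escapes make such a helper unnecessary; it is shown only to make the inclusive and exclusive semantics concrete.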

Example

\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.
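
As a rough illustration of "any script", the sketch below filters a string by top-level category using Python's standard unicodedata module. The function name chars_in_category is an assumption made for this example, and the lookup stands in for what \p{L} or \p{N} would select in an engine that supports them.

```python
import unicodedata

def chars_in_category(text: str, category: str) -> list[str]:
    """Collect the characters of text whose category code starts with category."""
    return [ch for ch in text if unicodedata.category(ch).startswith(category)]

# Non-ASCII characters are written as escapes so the sample is unambiguous:
# \u00e9 = e with acute, \u03a9 = Greek capital omega, \u4e2d = a CJK ideograph,
# \u00bd = vulgar fraction one half, \u2167 = Roman numeral eight,
# \u0663 = Arabic-Indic digit three.
sample = "\u00e9 \u03a9 \u4e2d 1 \u00bd \u2167 \u0663"

letters = chars_in_category(sample, "L")  # what \p{L} would pick out
numbers = chars_in_category(sample, "N")  # what \p{N} would pick out

print(len(letters))  # 3
print(len(numbers))  # 4
```

Note that the Roman numeral and the fraction count as numbers even though they are not decimal digits.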

Character Category	Description
C	Matches other characters (control, format, unassigned, and private use), that is, characters that are not letters, marks, numbers, punctuation, symbols, or separators
Cc	Matches Control characters
Cf	Matches format characters
Cn	Matches unassigned code points
Co	Matches private use characters
L	Matches letters
Ll	Matches lower case letters
Lm	Matches modifier letters
Lo	Matches other letters
Lt	Matches title case letters
Lu	Matches upper case letters
M	Matches any mark
Mc	Matches spacing combining mark
Me	Matches enclosing mark
Mn	Matches non-spacing mark
N	Matches numbers
Nd	Matches decimal digits
Nl	Matches number letters
No	Matches other numbers
P	Matches punctuation
Pc	Matches connector punctuation
Pd	Matches dashes
Pe	Matches closing punctuation
Pf	Matches final quotes
Pi	Matches initial quotes
Po	Matches other forms of punctuation
Ps	Matches opening punctuation
S	Matches symbols
Sc	Matches currency symbols
Sk	Matches modifier symbols
Sm	Matches mathematical symbols
So	Matches other symbols
Z	Matches separators
Zl	Matches line separators
Zp	Matches paragraph separators
Zs	Matches spaces
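
To see which two-letter codes from the table apply to a given piece of text, the characters can be grouped by category. This is a minimal sketch using Python's standard unicodedata module; the function name group_by_category is hypothetical.

```python
import unicodedata

def group_by_category(text: str) -> dict[str, list[str]]:
    """Map each two-letter category code to the characters of text in it."""
    groups: dict[str, list[str]] = {}
    for ch in text:
        groups.setdefault(unicodedata.category(ch), []).append(ch)
    return groups

for code, chars in group_by_category("Hi, $5!").items():
    print(code, chars)
# Lu ['H']
# Ll ['i']
# Po [',', '!']
# Zs [' ']
# Sc ['$']
# Nd ['5']
```

A newline character lands in Cc (control), not in Zl, which is reserved for the dedicated line separator character.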


Friday, 7 November 2014
