Friday, 7 November 2014

Character Category


XML Schema regular expressions can match characters by category or by block. A category classifies characters by their usage, independent of their localization, whereas a block classifies characters by their localization, independent of their usage.
In a regular expression, a character category is written as an escape sequence. An inclusive character category that represents any uppercase letter looks like the following:
    \p{Lu}
An exclusive category that represents any character except an uppercase letter looks like the following:
    \P{Lu}
Note: inclusive requires a lowercase 'p', whereas exclusive requires an uppercase 'P'.
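To make the inclusive/exclusive distinction concrete, here is a minimal sketch in Python. Python's built-in re module does not support \p{...}, so the sketch emulates the two forms with the standard unicodedata module. The helper names in_category and not_in_category are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Emulates \p{category}: true when the character's Unicode general
# category starts with the given name, so "L" also covers Lu, Ll, etc.
def in_category(ch, category):
    return unicodedata.category(ch).startswith(category)

# Emulates \P{category}: the exact complement of the inclusive form.
def not_in_category(ch, category):
    return not in_category(ch, category)

print(in_category("A", "Lu"))      # uppercase letter, inclusive form
print(not_in_category("a", "Lu"))  # lowercase letter, exclusive form
```

The Unicode data shipped with a given Python version may differ slightly from the data an XML processor uses, so treat the results as indicative only.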
Example
\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.
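The two examples above can be sketched in Python as follows. This is only an approximation of what \p{L} and \p{N} select, built on the standard unicodedata module. The function find_all and the sample string are assumptions made up for this illustration.

```python
import unicodedata

# Collects every character whose general category starts with `category`,
# roughly what repeatedly applying \p{category} to the text would find.
def find_all(text, category):
    return [ch for ch in text if unicodedata.category(ch).startswith(category)]

# Latin x, Cyrillic zhe (U+0436), CJK ideograph (U+4E2D),
# ASCII 7, Arabic-Indic digit three (U+0663),
# Roman numeral eight (U+2167), vulgar fraction one half (U+00BD).
sample = "x=7, \u0436=\u0663, \u4e2d=\u2167, \u00bd"

letters = find_all(sample, "L")  # letters from three different scripts
numbers = find_all(sample, "N")  # decimal digits, a letter number, a fraction
print(letters)
print(numbers)
```

The numeric matches include a Roman numeral and a fraction, which is why \p{N} is broader than a digit class such as [0-9].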


Character Category    Description
C                     Matches other characters (those that are not letters, marks, numbers, punctuation, symbols or separators)
Cc                    Matches control characters
Cf                    Matches format characters
Cn                    Matches unassigned code points
Co                    Matches private use characters
L                     Matches letters
Ll                    Matches lowercase letters
Lm                    Matches modifier letters
Lo                    Matches other letters
Lt                    Matches titlecase letters
Lu                    Matches uppercase letters
M                     Matches any mark
Mc                    Matches spacing combining marks
Me                    Matches enclosing marks
Mn                    Matches non-spacing marks
N                     Matches numbers
Nd                    Matches decimal digits
Nl                    Matches letter numbers (for example, Roman numerals)
No                    Matches other numbers
P                     Matches punctuation
Pc                    Matches connector punctuation
Pd                    Matches dashes
Pe                    Matches closing punctuation
Pf                    Matches final quotes
Pi                    Matches initial quotes
Po                    Matches other forms of punctuation
Ps                    Matches opening punctuation
S                     Matches symbols
Sc                    Matches currency symbols
Sk                    Matches modifier symbols
Sm                    Matches mathematical symbols
So                    Matches other symbols
Z                     Matches separators
Zl                    Matches line separators
Zp                    Matches paragraph separators
Zs                    Matches space separators
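
The table uses a two-level naming scheme: a one-letter class (such as L or P) and a two-letter subcategory inside it (such as Lu or Pd). As a rough sketch, the Python snippet below reports both levels for a character using the standard unicodedata module. The function name classify is hypothetical.

```python
import unicodedata

# Returns (one-letter class, two-letter subcategory) for a character,
# mirroring the two levels of the table above.
def classify(ch):
    sub = unicodedata.category(ch)
    return sub[0], sub

for ch in ["A", "5", "$", "+", "-", "_", "(", " "]:
    print(repr(ch), classify(ch))
```

A character that matches a two-letter subcategory always matches its one-letter class as well, so \p{Lu} selects a subset of what \p{L} selects.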



