XML regular expressions can match characters by category and by block. A category
classifies characters by their usage, independent of their localization, whereas a
block classifies characters by their localization, independent of their usage.
In a regular expression, a character category is written as an escape sequence. An
inclusive character category that represents any uppercase letter looks like the
following:
\p{Lu}
An exclusive category that represents any character except an uppercase letter looks
like the following:
\P{Lu}
Note: an inclusive category requires a lowercase 'p', whereas an exclusive category
requires an uppercase 'P'.
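The inclusive and exclusive forms can be sketched in Python. Python's standard `re` module does not support `\p{...}`, so this minimal sketch emulates the two escapes with `unicodedata.category`. The helper names `in_category` and `not_in_category` are hypothetical, chosen only for illustration.

```python
import unicodedata

def in_category(ch: str, category: str) -> bool:
    # Emulates the inclusive escape \p{category}: true when the character's
    # Unicode general category starts with the given code ("Lu", or just "L").
    return unicodedata.category(ch).startswith(category)

def not_in_category(ch: str, category: str) -> bool:
    # Emulates the exclusive escape \P{category}: the negation of the above.
    return not in_category(ch, category)

print(in_category("A", "Lu"))      # uppercase letter, inclusive test
print(not_in_category("a", "Lu"))  # lowercase letter, exclusive test
```

Matching on the prefix means a one-letter code such as `L` covers all of its subcategories (`Lu`, `Ll`, `Lt`, `Lm`, `Lo`).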
Example
\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.
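To illustrate that these one-letter categories apply across scripts, here is a small sketch that picks the letters and the numeric characters out of a mixed-script string. It uses Python's `unicodedata` module as a stand-in for `\p{L}` and `\p{N}`, since Python's built-in `re` module does not accept those escapes. The function name `select` and the sample string are assumptions made for this example.

```python
import unicodedata

def select(text: str, major_class: str) -> list[str]:
    # Keep the characters whose general category begins with the given
    # one-letter class, which is what \p{L} or \p{N} would match one at a time.
    return [ch for ch in text if unicodedata.category(ch).startswith(major_class)]

# Latin letters, an Arabic-Indic digit, two CJK ideographs, and an ASCII digit.
sample = "a\u00f1o \u0663 \u6771\u4eac 7"

letters = select(sample, "L")
numbers = select(sample, "N")
print(letters)
print(numbers)
```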
Character Category | Description
C | Matches other characters, i.e. those that are not letters, marks, numbers, punctuation, symbols, or separators
Cc | Matches control characters
Cf | Matches format characters
Cn | Matches unassigned code points
Co | Matches private use characters
L | Matches letters
Ll | Matches lowercase letters
Lm | Matches modifier letters
Lo | Matches other letters
Lt | Matches titlecase letters
Lu | Matches uppercase letters
M | Matches any mark
Mc | Matches spacing combining marks
Me | Matches enclosing marks
Mn | Matches non-spacing marks
N | Matches numbers
Nd | Matches decimal digits
Nl | Matches letter numbers (for example, Roman numerals)
No | Matches other numbers
P | Matches punctuation
Pc | Matches connector punctuation
Pd | Matches dashes
Pe | Matches closing punctuation
Pf | Matches final quotes
Pi | Matches initial quotes
Po | Matches other forms of punctuation
Ps | Matches opening punctuation
S | Matches symbols
Sc | Matches currency symbols
Sk | Matches modifier symbols
Sm | Matches mathematical symbols
So | Matches other symbols
Z | Matches separators
Zl | Matches line separators
Zp | Matches paragraph separators
Zs | Matches spaces
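One way to check which row of the table a given character falls under is to look up its general category directly. The sketch below does this with Python's `unicodedata.category`, which returns the same two-letter codes used in the table. The `SAMPLES` list and the `category_of` helper are hypothetical names used only for illustration.

```python
import unicodedata

# A few sample characters: letters, digits, a Roman numeral, punctuation,
# symbols, separators, and a control character.
SAMPLES = ["A", "a", "5", "\u2167", "_", "-", "(", ")", "+", "\u20ac", " ", "\u2028", "\n"]

def category_of(ch: str) -> str:
    # Returns the two-letter general category code, e.g. "Lu" or "Nd".
    return unicodedata.category(ch)

for ch in SAMPLES:
    # Print the code point rather than the character itself, because some
    # samples (line separator, newline) are not visible when printed.
    print(f"U+{ord(ch):04X} -> {category_of(ch)}")
```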