Regular expressions are a powerful tool for text processing. A regular expression
is a sequence of characters that forms a search pattern, mainly used for pattern
matching in strings. Regular expressions are used to search, edit, or manipulate
text and data. Once you are comfortable with regular expressions, their
advantages become clear.
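As a rough illustration of "search, edit, or manipulate", here is a minimal sketch in base R. The sample vector `text` is made up for this example, and the pattern is a plain literal word, so no special characters are involved yet.

```r
# Hypothetical sample data, used only for illustration.
text <- c("colour chart", "color wheel", "colour by colour")

# Search: which elements contain the pattern?
found <- grepl("colour", text)

# Edit: replace only the first match in each element.
first_fixed <- sub("colour", "color", text)

# Edit: replace every match in each element.
all_fixed <- gsub("colour", "color", text)

print(found)
print(first_fixed)
print(all_fixed)
```

`grepl()` answers yes or no per element, while `sub()` and `gsub()` return edited copies and leave `text` itself unchanged.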
Regular expressions are built from literal characters and meta characters such
as *, ., ?, ^, and $. Meta characters have special meanings.
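Because meta characters are not taken literally, the same pattern can behave differently depending on whether a character is treated as special. The sketch below uses "." to show this in base R; the vector `x` is a made-up example, and escaping with a doubled backslash or passing `fixed = TRUE` are two common ways to ask for the literal character.

```r
# Hypothetical sample data, used only for illustration.
x <- c("abc", "a.c")

# As a meta character, "." stands for any single character.
as_meta <- grepl("a.c", x)

# A backslash (doubled inside an R string) makes the dot literal.
as_escaped <- grepl("a\\.c", x)

# fixed = TRUE treats the whole pattern as plain text.
as_fixed <- grepl("a.c", x, fixed = TRUE)

print(as_meta)     # TRUE TRUE
print(as_escaped)  # FALSE TRUE
print(as_fixed)    # FALSE TRUE
```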
Meta characters
Meta character | Meaning
? | Match the preceding item zero or one time
* | Match the preceding item zero or more times
+ | Match the preceding item one or more times
. | Match any single character
? : Match the preceding item zero or one time
For example, colou?r matches both "color" and "colour".
> gregexpr("colou?r", "color is wrongly typed as colour")
[[1]]
[1]  1 27
attr(,"match.length")
[1] 5 6
attr(,"useBytes")
[1] TRUE
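The `gregexpr()` result can be hard to read at first: the first vector holds the start position of each match, and the `match.length` attribute holds how many characters each match covers. A small sketch of turning those numbers back into text, using only base R (the variable names are chosen here for illustration):

```r
x <- "color is wrongly typed as colour"
m <- gregexpr("colou?r", x)

# Start position of each match (as.integer() drops the attributes).
starts <- as.integer(m[[1]])

# Number of characters in each match.
match_lens <- attr(m[[1]], "match.length")

# Cut the matched text out of the string by position.
by_position <- substring(x, starts, starts + match_lens - 1)

# regmatches() performs the same extraction directly.
by_regmatches <- regmatches(x, m)[[1]]

print(by_position)    # "color" "colour"
print(by_regmatches)  # "color" "colour"
```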
* : Match the preceding item zero or more times
For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on.
> gregexpr("ab*c", "abc bc abc abbbc ac babbc")
[[1]]
[1]  1  8 12 18 22
attr(,"match.length")
[1] 3 3 5 2 4
attr(,"useBytes")
[1] TRUE
+ : Match the preceding item one or more times
For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
> gregexpr("ab+c", "abc bc abc abbbc ac babbc")
[[1]]
[1]  1  8 12 22
attr(,"match.length")
[1] 3 3 5 4
attr(,"useBytes")
[1] TRUE
. : Match any single character
For example, a.c matches "abc", "acc", "azc", "axc", "a1c", "a7c", and so on. The "." can be any single character.
> gregexpr("a.c", "abc bc abc abbbc ac babbc axc azc")
[[1]]
[1]  1  8 27 31
attr(,"match.length")
[1] 3 3 3 3
attr(,"useBytes")
[1] TRUE
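To see the four meta characters side by side, it can help to run them against one input and look at the matched text rather than the positions. The helper `matches_of()` and the sample string below are hypothetical, written only for this comparison; the helper simply wraps base R's `regmatches()` and `gregexpr()`.

```r
# Hypothetical helper: return every substring of x matched by pattern.
matches_of <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x))[[1]]
}

# Hypothetical sample string, used only for illustration.
x <- "ac abc abbc axc"

zero_or_one  <- matches_of("ab?c", x)  # "b" may appear once or not at all
zero_or_more <- matches_of("ab*c", x)  # any number of "b", including none
one_or_more  <- matches_of("ab+c", x)  # at least one "b"
any_single   <- matches_of("a.c", x)   # exactly one character of any kind

print(zero_or_one)   # "ac" "abc"
print(zero_or_more)  # "ac" "abc" "abbc"
print(one_or_more)   # "abc" "abbc"
print(any_single)    # "abc" "axc"
```

Note that "a.c" does not match "ac", because "." must consume exactly one character between the "a" and the "c".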
Quantifiers
Quantifiers specify the number of occurrences to match against.
Quantifier | Meaning
{n} | The preceding item is matched exactly n times.
{n,} | The preceding item is matched n or more times.
{n,m} | The preceding item is matched at least n times, but not more than m times.
> gregexpr("a{2}", "aaa, abcd, aabc, bcaad, aaaa")
[[1]]
[1]  1 12 20 25 27
attr(,"match.length")
[1] 2 2 2 2 2
attr(,"useBytes")
[1] TRUE

> gregexpr("a{2,4}", "aaa, abcd, aabc, bcaad, aaaa")
[[1]]
[1]  1 12 20 25
attr(,"match.length")
[1] 3 2 2 4
attr(,"useBytes")
[1] TRUE

> gregexpr("a{2,}", "aaa, abcd, aabc, bcaad, aaaa")
[[1]]
[1]  1 12 20 25
attr(,"match.length")
[1] 3 2 2 4
attr(,"useBytes")
[1] TRUE
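In the input above, no run of "a" is longer than four characters, so {2,4} and {2,} happen to return the same matches. A longer run makes the upper bound visible. The sketch below assumes a made-up string of six "a" characters and uses base R's `regmatches()` to show the matched text.

```r
# Hypothetical sample string: a run of six "a" characters.
x <- "aaaaaa"

# {2}: exactly two at a time, so the run splits into three matches.
exactly_two <- regmatches(x, gregexpr("a{2}", x))[[1]]

# {2,4}: at most four per match, so a second match picks up the rest.
two_to_four <- regmatches(x, gregexpr("a{2,4}", x))[[1]]

# {2,}: no upper bound, so the whole run is one match.
two_or_more <- regmatches(x, gregexpr("a{2,}", x))[[1]]

print(exactly_two)  # "aa" "aa" "aa"
print(two_to_four)  # "aaaa" "aa"
print(two_or_more)  # "aaaaaa"
```

Each quantifier takes as many repetitions as it is allowed, which is why {2,4} takes four characters first and leaves two for the next match.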