Regular expressions are a powerful tool for text processing. A regular expression is a sequence of characters that forms a search pattern, used mainly for pattern matching in strings. Regular expressions are used to search, edit, or manipulate text and data. Once you are comfortable with them, their advantages become clear.
A regular expression is composed of two types of characters.
a. Metacharacters (*, ., ?, ^, $, etc.): characters that have a special meaning.
b. Literals: normal characters without a special meaning.
For example: a*b
a and b are literals and '*' is the metacharacter. 'a*b' matches strings like b, ab, aab, aaab, aaaaab, etc. a* means 'a' can be repeated zero or more times.
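Julia provides the ismatch function to work with regular expressions. (In Julia 1.0 and later, ismatch was replaced by occursin(r, s), which performs the same test.)
ismatch(r::Regex, s::AbstractString) -> Bool
Tests whether a string contains a match of the given regular expression.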
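As a quick illustration of the a*b pattern described above, here is a minimal sketch. It assumes Julia 1.0 or later and therefore uses occursin in place of ismatch. The ^ and $ anchors are my addition so that the whole string has to match, and the sample strings are made up.

```julia
# Assumes Julia 1.0+, where occursin(pattern, string) replaces ismatch.
# The anchors ^ and $ force the whole string to match a*b.
pattern = r"^a*b$"

for s in ["b", "ab", "aaab", "ba", "abc"]
    # Print each sample string next to the result of the test.
    println(rpad(repr(s), 8), occursin(pattern, s))
end
```

Without the anchors, "ba" and "abc" would also return true, because the test only asks whether the string contains a match somewhere.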
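The following table summarizes the regular expression constructs.

Regular Expression | Description
. | Matches any character (except a newline, by default)
\d | Matches a digit: [0-9]
\D | Matches a non-digit: [^0-9]
\h | Matches a horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H | Matches a non-horizontal whitespace character: [^\h]
\s | Matches a whitespace character: [ \t\n\x0B\f\r]
\S | Matches a non-whitespace character: [^\s]
\v | Matches a vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V | Matches a non-vertical whitespace character: [^\v]
\w | Matches a word character: [a-zA-Z_0-9]
\W | Matches a non-word character: [^\w]
[abc] | Matches a, b, or c
[^abc] | Matches any character except a, b, or c (negation)
[a-zA-Z] | Matches a through z or A through Z, inclusive (range)
[a-d[m-p]] | Matches a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | Matches d, e, or f (intersection)
[a-z&&[^bc]] | Matches a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | Matches a through z, and not m through p: [a-lq-z] (subtraction)

Note: the union, intersection, and subtraction forms in the last four rows are Java regex syntax. Julia uses the PCRE library, which, as far as I know, does not interpret nested classes or && this way by default, so in Julia write the equivalent plain class ([a-dm-p], [def], [ad-z], [a-lq-z]) instead. Also, the bracketed equivalents given for \d, \s, and \w describe ASCII text; Julia matches these classes using Unicode properties by default, so they can also match non-ASCII digits, spaces, and letters.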
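To see a few of these classes at work, here is a small sketch. It assumes Julia 1.0 or later (eachmatch, split with a Regex, and occursin); the sample text and the variable names are made up for illustration.

```julia
# Assumes Julia 1.0+. The sample text is invented for this example.
text = "Order 66 shipped\ton 2024-05-01"

numbers = [m.match for m in eachmatch(r"\d+", text)]   # runs of digits
words   = [m.match for m in eachmatch(r"\w+", text)]   # runs of word characters
pieces  = split(text, r"\s+")                          # split on runs of whitespace

println(numbers)
println(words)
println(pieces)

# "a through z, except b and c" written as a plain class with two ranges.
no_bc = r"^[ad-z]$"
println(occursin(no_bc, "a"), " ", occursin(no_bc, "b"))
```

The hyphen is not a word character, so \w+ breaks the date into three pieces, while splitting on \s+ keeps it whole.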
Quantifiers

Quantifier | Example | Description
? | X? | X, once or not at all
* | X* | X, zero or more times
+ | X+ | X, one or more times
{n} | X{n} | X, exactly n times
{n,} | X{n,} | X, at least n times
{n,m} | X{n,m} | X, at least n but not more than m times
Match any character with .
The metacharacter '.' matches any single character.

julia> ismatch(r".", "")
false

julia> ismatch(r".", "h")
true

julia> ismatch(r".", "he")
true

julia> ismatch(r".", "12")
true

julia> ismatch(r".", "#\$")
true
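One detail the examples above do not show is how '.' treats a newline. The sketch below assumes Julia 1.0 or later (so it uses occursin rather than ismatch) and the standard s flag on regex literals.

```julia
# Assumes Julia 1.0+, where occursin replaces ismatch.
println(occursin(r".", "\n"))     # false: "." does not match a newline by default
println(occursin(r"."s, "\n"))    # true: the s flag lets "." match a newline too
println(occursin(r"^.$", "he"))   # false: anchored, so exactly one character is required
```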
Match any digit with \d
\d matches a digit 0-9.
For example, \d+ matches one or more digits.

julia> ismatch(r"\d+", "123")
true

julia> ismatch(r"\d+", "123aa")
true

julia> ismatch(r"\d+", "aa")
false

julia> ismatch(r"\d+", "aa123")
true

julia> ismatch(r"\d+", "")
false
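The results above are true for "123aa" and "aa123" because the test only asks whether the string contains digits somewhere. If the goal is to check that the whole string is digits, or to pull the digits out, something like the following sketch may help. It assumes Julia 1.0 or later; all_digits is a hypothetical helper name, not a built-in.

```julia
# Assumes Julia 1.0+. all_digits is a made-up helper, not part of Julia.
all_digits(s) = occursin(r"^\d+$", s)   # anchors require digits from start to end

println(all_digits("123"))     # true
println(all_digits("123aa"))   # false
println(all_digits(""))        # false: + needs at least one digit

# match returns a RegexMatch (or nothing when there is no match).
m = match(r"\d+", "aa123bb")
println(m.match)               # 123
```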
Match any word character with \w
\w matches a word character: [a-zA-Z_0-9].

julia> ismatch(r"\w", "")
false

julia> ismatch(r"\w", "a")
true

julia> ismatch(r"\w", "A")
true

julia> ismatch(r"\w", "1")
true

julia> ismatch(r"\w", "_")
true

julia> ismatch(r"\w", "#")
false

julia> ismatch(r"\w", "^")
false

julia> ismatch(r"\w", "!")
false
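A common use of \w is checking identifier-like names. The sketch below is only an illustration: it assumes Julia 1.0 or later, and is_identifier is a hypothetical helper whose rule (a letter or underscore followed by word characters) is my own choice, not Julia's actual identifier grammar.

```julia
# Assumes Julia 1.0+. is_identifier is a made-up helper with a simplified rule:
# first character is an ASCII letter or underscore, the rest are word characters.
is_identifier(s) = occursin(r"^[A-Za-z_]\w*$", s)

for s in ["my_var1", "_tmp", "1abc", "a-b", ""]
    println(rpad(repr(s), 12), is_identifier(s))
end
```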
Range of Characters
'-' is used inside a character class to specify a range of characters.
For example:
[a-fA-F0-9]{4}
The above regular expression checks for four consecutive hexadecimal digits, that is, a 16-bit hexadecimal value. Ranges inside a class are written back to back; a comma placed between them, as in [a-f,A-F,0-9], would itself become a character that the class matches. Because the pattern is not anchored, longer strings that contain four such digits also match.

julia> ismatch(r"[a-fA-F0-9]{4}", "ABCD")
true

julia> ismatch(r"[a-fA-F0-9]{4}", "ABCDEF")
true

julia> ismatch(r"[a-fA-F0-9]{4}", "1234")
true

julia> ismatch(r"[a-fA-F0-9]{4}", "_123")
false

julia> ismatch(r"[a-fA-F0-9]{4}", "_12_")
false

julia> ismatch(r"[a-fA-F0-9]{4}", "_12234")
true
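To accept only strings that are exactly four hexadecimal digits, the range pattern can be anchored. This is a minimal sketch assuming Julia 1.0 or later; is_hex4 is a hypothetical helper name, and the class is written without commas so that a comma is not accepted as a digit.

```julia
# Assumes Julia 1.0+. is_hex4 is a made-up helper, not part of Julia.
is_hex4(s) = occursin(r"^[0-9a-fA-F]{4}$", s)

println(is_hex4("ABCD"))      # true
println(is_hex4("ABCDEF"))    # false: six digits, not four
println(is_hex4("_123"))      # false: "_" is not a hex digit
println(is_hex4("a,b,"))      # false: commas are not in the class

# Once validated, the string can be converted with parse.
println(parse(Int, "ABCD", base=16))   # 43981
```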
Negated Character classes
[^..] matches any character that is not listed in the given class.
Ex:
[^abc] matches any character except a, b, or c.
c[^aom] matches two characters: 'c' followed by any character other than 'a', 'o', or 'm'.

julia> ismatch(r"c[^aom]", "cat")
false

julia> ismatch(r"c[^aom]", "cot")
false

julia> ismatch(r"c[^aom]", "ceat")
true

julia> ismatch(r"c[^aom]", "century")
true
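It can help to see which two characters the negated class actually consumed. The sketch below assumes Julia 1.0 or later and uses match, which returns nothing when there is no match; the sample words are made up.

```julia
# Assumes Julia 1.0+. The sample words are invented for this example.
pattern = r"c[^aom]"

for s in ["cat", "century", "circa", "c"]
    m = match(pattern, s)
    # A negated class still has to consume one character, so a lone "c" fails.
    println(rpad(s, 8), m === nothing ? "no match" : m.match)
end
```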
Quantifier examples

julia> ismatch(r"a?", "ab")
true

julia> ismatch(r"a?", "ba")
true

julia> ismatch(r"a?", "bet")
true

julia> ismatch(r"a*", "bet")
true

julia> ismatch(r"a*", "aaaa")
true

julia> ismatch(r"a+", "bet")
false

julia> ismatch(r"a+", "bat")
true

julia> ismatch(r"a{5}", "bat")
false

julia> ismatch(r"a{5}", "baaaaat")
true

julia> ismatch(r"a{5}", "baaaat")
false

julia> ismatch(r"a{5,}", "baaaaaaaat")
true

julia> ismatch(r"a{5,}", "baaaat")
false

julia> ismatch(r"a{5,6}", "baaaat")
false

julia> ismatch(r"a{5,6}", "baaaaat")
true
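Note that a? and a* return true even for "bet", which contains no 'a' at all: both quantifiers allow zero repetitions, so they match the empty string at the start of any input. The sketch below shows what is actually matched. It assumes Julia 1.0 or later; exactly_five is a hypothetical helper name.

```julia
# Assumes Julia 1.0+. exactly_five is a made-up helper, not part of Julia.
println(repr(match(r"a?", "bet").match))              # "": zero a's matched at the start
println(repr(match(r"a+", "bat").match))              # "a"
println(repr(match(r"a{5,6}", "baaaaaaaat").match))   # "aaaaaa": greedy, but capped at six

# Anchoring turns "contains five a's" into "is exactly five a's".
exactly_five(s) = occursin(r"^a{5}$", s)
println(exactly_five("aaaaa"))    # true
println(exactly_five("aaaaaa"))   # false
```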