Friday 19 February 2016

Julia: Regular Expressions

Regular expressions are a powerful tool for text processing. A regular expression is a sequence of characters that forms a search pattern, used mainly for pattern matching in strings. Regular expressions are used to search, edit, or manipulate text and data. Once you are comfortable with them, their advantages become clear.

A regular expression is composed of two types of characters:
a.   Metacharacters (*, ., ?, ^, $, etc.) : Characters with a special meaning.
b.   Literals : Normal characters without a special meaning.

For example: a*b
'a' and 'b' are literals and '*' is a metacharacter. 'a*b' matches strings like b, ab, aab, aaab, aaaaab, etc.

a* means 'a' can be repeated zero or more times.

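Julia provides the ismatch function for testing a string against a regular expression. In Julia 1.0 and later, ismatch has been replaced by occursin, which takes the same arguments in the same order.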

ismatch(r::Regex, s::AbstractString) -> Bool
Test whether a string contains a match of the given regular expression.
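
As a minimal sketch of how this test behaves, the block below checks a few strings against a*b. It assumes Julia 1.0 or later, where occursin takes the place of ismatch; the helper names contains_match and whole_match are made up for illustration.

```julia
# Assumes Julia 1.0+, where `occursin` replaces `ismatch`.

# True if any substring of `s` matches a*b.
contains_match(s) = occursin(r"a*b", s)

# Anchored with ^ and $, so the match must span the string
# (PCRE's $ also tolerates a single trailing newline).
whole_match(s) = occursin(r"^a*b$", s)

for s in ["b", "ab", "aaab", "abc", "xyz"]
    println(rpad(s, 6), "contains: ", contains_match(s), "  whole: ", whole_match(s))
end
```

The unanchored form answers "does the string contain a match?", which is why a string such as "abc" passes the first test but not the second.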

The following table summarizes common regular expression constructs.

Regular Expression    Description
.                     Matches any character
\d                    Matches a digit: [0-9]
\D                    Matches a non-digit: [^0-9]
\h                    Matches a horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H                    Matches a non-horizontal whitespace character: [^\h]
\s                    Matches a whitespace character: [ \t\n\x0B\f\r]
\S                    Matches a non-whitespace character: [^\s]
\v                    Matches a vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V                    Matches a non-vertical whitespace character: [^\v]
\w                    Matches a word character: [a-zA-Z_0-9]
\W                    Matches a non-word character: [^\w]
[abc]                 Matches a, b, or c
[^abc]                Matches any character except a, b, or c (negation)
[a-zA-Z]              Matches a through z or A through Z, inclusive (range)
[a-d[m-p]]            Matches a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]          Matches d, e, or f (intersection)
[a-z&&[^bc]]          Matches a through z, except for b and c
[a-z&&[^m-p]]         Matches a through z, and not m through p

The last four rows use the nested class syntax of Java's regular expressions. Julia's regular expressions are based on PCRE, which does not support class union or intersection by default, so in Julia write [a-dm-p], [def], [ad-z], and [a-lq-z] instead.
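
The union and intersection forms in the last rows follow Java's nested class syntax, which PCRE (the engine behind Julia's regular expressions) does not interpret that way by default. The sketch below shows equivalent patterns that work in Julia. It assumes Julia 1.0 or later (occursin), and the variable names are hypothetical.

```julia
# Assumes Julia 1.0+ (`occursin`). Each pattern is anchored to test one character.
union_ad_mp  = r"^[a-dm-p]$"        # a through d, or m through p
only_def     = r"^[def]$"           # the overlap of [a-z] and [def]
az_except_bc = r"^(?![bc])[a-z]$"   # a through z, but not b or c (negative lookahead)
az_except_mp = r"^[a-lq-z]$"        # a through z, but not m through p

# For each pattern, test one character that should match and one that should not.
for (name, re, yes, no) in [
        ("union",        union_ad_mp,  "n", "f"),
        ("intersection", only_def,     "e", "g"),
        ("except b, c",  az_except_bc, "a", "b"),
        ("except m-p",   az_except_mp, "q", "m"),
    ]
    println(rpad(name, 14), occursin(re, yes), " ", occursin(re, no))
end
```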

Quantifiers

Quantifier    Example    Description
?             X?         X, once or not at all
*             X*         X, zero or more times
+             X+         X, one or more times
{n}           X{n}       X, exactly n times
{n,}          X{n,}      X, at least n times
{n,m}         X{n,m}     X, at least n but not more than m times

Match any character with .
The metacharacter '.' matches any single character (by default, any character except a newline).
julia> ismatch(r".", "")
false

julia> ismatch(r".", "h")
true

julia> ismatch(r".", "he")
true

julia> ismatch(r".", "12")
true

julia> ismatch(r".", "#\$")
true
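
One case the examples above do not cover is the newline. As a small sketch, assuming Julia 1.0 or later (occursin), the following compares the default behavior of '.' with the s flag; the variable names are illustrative only.

```julia
# Assumes Julia 1.0+ (`occursin`).
dot_default = r"."    # by default, '.' does not match a newline
dot_all     = r"."s   # the `s` flag lets '.' match a newline as well

println(occursin(dot_default, "\n"))    # string holds only a newline
println(occursin(dot_all, "\n"))        # same string, with the `s` flag
println(occursin(dot_default, "a\n"))   # the 'a' can still be matched
```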

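Match any digit with \d
\d matches a digit, 0-9.

For example, \d+ matches one or more digits. Because ismatch tests whether the string contains a match, "123aa" and "aa123" also return true.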
julia> ismatch(r"\d+", "123")
true

julia> ismatch(r"\d+", "123aa")
true

julia> ismatch(r"\d+", "aa")
false

julia> ismatch(r"\d+", "aa123")
true

julia> ismatch(r"\d+", "")
false
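
When the digits themselves are needed rather than a yes/no answer, match and eachmatch from Julia's Base can return them. The following is a sketch under the assumption of Julia 1.0 or later; all_digits and digit_runs are hypothetical helper names.

```julia
# Assumes Julia 1.0+ (`occursin`); `match` and `eachmatch` are part of Base.

# Anchored with ^ and $, so letters before or after the digits are rejected.
all_digits(s) = occursin(r"^\d+$", s)

# Return every run of digits found in `s`, as strings.
digit_runs(s) = [m.match for m in eachmatch(r"\d+", s)]

println(all_digits("123"))
println(all_digits("123aa"))
println(digit_runs("a1b22c333"))

# `match` returns the first match, or `nothing` when there is none.
m = match(r"\d+", "aa123")
println(m === nothing ? "no match" : m.match)
```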


Match any word character with \w
\w matches a word character: [a-zA-Z_0-9].

julia> ismatch(r"\w", "")
false

julia> ismatch(r"\w", "a")
true

julia> ismatch(r"\w", "A")
true

julia> ismatch(r"\w", "1")
true

julia> ismatch(r"\w", "_")
true

julia> ismatch(r"\w", "#")
false

julia> ismatch(r"\w", "^")
false

julia> ismatch(r"\w", "!")
false
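
A common use of \w is validating names. As a rough sketch, assuming Julia 1.0 or later (occursin), the hypothetical helper is_identifier below accepts strings that start with a letter or underscore and continue with word characters only.

```julia
# Assumes Julia 1.0+ (`occursin`). `is_identifier` is an illustrative helper.
# A letter or underscore first, then zero or more word characters.
is_identifier(s) = occursin(r"^[A-Za-z_]\w*$", s)

for s in ["total_1", "_tmp", "1total", "my-var", ""]
    println(rpad(repr(s), 12), is_identifier(s))
end
```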


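Range of characters
'-' specifies a range of characters inside a character class.

For example
[a-fA-F0-9]{4}
The above regular expression matches four consecutive hexadecimal digits, which together represent a 16-bit value. Ranges inside a character class are written without separators, because a comma inside the brackets would be matched as a literal comma. ismatch only checks whether the string contains a match, so longer strings such as "ABCDEF" also return true.
julia> ismatch(r"[a-fA-F0-9]{4}", "ABCD")
true

julia> ismatch(r"[a-fA-F0-9]{4}", "ABCDEF")
true

julia> ismatch(r"[a-fA-F0-9]{4}", "1234")
true

julia> ismatch(r"[a-fA-F0-9]{4}", "_123")
false

julia> ismatch(r"[a-fA-F0-9]{4}", "_12_")
false

julia> ismatch(r"[a-fA-F0-9]{4}", "_12234")
true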
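
To accept only strings that are exactly four hexadecimal digits, the pattern can be anchored at both ends. The sketch below assumes Julia 1.0 or later (occursin and the base keyword of parse); is_hex4 and hex4_value are hypothetical names. Four hexadecimal digits cover 16 bits, so the value fits in a UInt16.

```julia
# Assumes Julia 1.0+ (`occursin`, keyword `base`).

# Four hexadecimal digits, anchored at both ends with ^ and $.
is_hex4(s) = occursin(r"^[a-fA-F0-9]{4}$", s)

# Convert a validated string to its 16-bit value, or return `nothing`.
hex4_value(s) = is_hex4(s) ? parse(UInt16, s; base = 16) : nothing

println(is_hex4("ABCD"))
println(is_hex4("ABCDEF"))
println(is_hex4("_12234"))
println(hex4_value("ABCD"))
println(hex4_value("00ff"))
```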


Negated Character classes
[^...] matches any character that is not listed in the given class.

For example:
[^abc] matches any character except a, b, or c.
c[^aom] matches two characters: 'c' followed by any character other than 'a', 'o', or 'm'.
julia> ismatch(r"c[^aom]", "cat")
false

julia> ismatch(r"c[^aom]", "cot")
false

julia> ismatch(r"c[^aom]", "ceat")
true

julia> ismatch(r"c[^aom]", "century")
true
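
A negated class still has to consume a character, so a 'c' at the very end of a string does not match c[^aom]. The sketch below contrasts that with a negative lookahead, which only inspects what follows. It assumes Julia 1.0 or later (occursin), and the variable names are illustrative.

```julia
# Assumes Julia 1.0+ (`occursin`).
negated_class = r"c[^aom]"      # 'c' plus one more character that is not a, o, or m
neg_lookahead = r"c(?![aom])"   # 'c' not followed by a, o, or m; end of string is fine

# Show which two characters the negated class actually matched.
m = match(negated_class, "century")
println(m.match)

println(occursin(negated_class, "mac"))   # no character follows the 'c'
println(occursin(neg_lookahead, "mac"))   # nothing forbidden follows the 'c'
println(occursin(neg_lookahead, "cat"))   # 'a' follows the only 'c'
```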


Quantifier examples

julia> ismatch(r"a?", "ab")
true

julia> ismatch(r"a?", "ba")
true

julia> ismatch(r"a?", "bet")
true

julia> ismatch(r"a*", "bet")
true

julia> ismatch(r"a*", "aaaa")
true

julia> ismatch(r"a+", "bet")
false

julia> ismatch(r"a+", "bat")
true

julia> ismatch(r"a{5}", "bat")
false

julia> ismatch(r"a{5}", "baaaaat")
true

julia> ismatch(r"a{5}", "baaaat")
false

julia> ismatch(r"a{5,}", "baaaaaaaat")
true

julia> ismatch(r"a{5,}", "baaaat")
false

julia> ismatch(r"a{5,6}", "baaaat")
false

julia> ismatch(r"a{5,6}", "baaaaat")
true
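
Some of the results above, such as a? matching "bet", are easier to follow when the matched text is shown: a? and a* can match the empty string, so they succeed on any input. The following sketch assumes Julia 1.0 or later; matched_text is a hypothetical helper built on match from Base.

```julia
# Return the matched text, or `nothing` when the pattern does not match.
function matched_text(re, s)
    m = match(re, s)
    return m === nothing ? nothing : m.match
end

seven_a = "b" * "a"^7 * "t"   # 'b', seven 'a' characters, then 't'

println(repr(matched_text(r"a?", "bet")))        # empty match at the start
println(repr(matched_text(r"a?", "ab")))         # one 'a' is available
println(repr(matched_text(r"a*", "aaaa")))       # greedy: takes every 'a'
println(repr(matched_text(r"a+", "bet")))        # needs at least one 'a'
println(repr(matched_text(r"a{5,6}", seven_a)))  # greedy, but capped at six
```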






