Saturday, 1 November 2014

Greedy, Reluctant, and Possessive Quantifiers

Greedy
Reluctant
Possessive
Description
X?
X??
X?+
X, once or not at all
X*
X*?
X*+
X, zero or more times
X+
X+?
X++
X, one or more times
X{n}
X{n}?
X{n}+
X, exactly n times
X{n,}
X{n,}?
X{n,}+
X, at least n times
X{n,m}
X{n,m}?
X{n,m}+
X, at least n but not more than m times

import java.util.regex.*;
import java.io.*;

public class RegExHarness {
    public static void main(String args[]) throws IOException{
        
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        
        Pattern pattern;
        Matcher matcher;
        
        String regex;
        String text;
              
        System.out.println("Enter the string");
        text = br.readLine();
        while(true){
            try{
                System.out.println("Enter Regular Expression");
                regex = br.readLine();
                pattern = Pattern.compile(regex);

                matcher = pattern.matcher(text);

                while (matcher.find()) {
                    System.out.print("I found the text " + matcher.group());
                    System.out.print(" starting at index " + matcher.start());
                    System.out.println(" Ending at index " + matcher.end());
                }
            }
            catch(IOException e){
                System.out.println(e);
            }
        }     
    }
}

Greedy Quantifier (Longest Match)
Greedy quantifier gives you the longest match in the given string. For Example for the input '12345', the valid values for '\d+' are 1, 12, 123, 1234, 12345 etc., But the Greedy quantifier gives you the longest match '12345'. Run the above program with input string '12345' like below.

Enter the string
12345
Enter Regular Expression
\d+
I found the text 12345 starting at index 0 Ending at index 5

For the input string 'xfooxxxxxxfoo' and regular expression '.*foo'.

Enter the string
xfooxxxxxxfoo
Enter Regular Expression
.*foo
I found the text xfooxxxxxxfoo starting at index 0 Ending at index 13

A greedy quantifier first matches as much as possible. So the .* matches entire string. Then the matcher tries to match the 'f' following, but there are no characters left. So it "backtracks", making the greedy quantifier match one less thing. That still doesn't match the f in the regex, so it "backtracks" one more step, making the greedy quantifier match one less thing again (leaving the "oo" at the end of the string unmatched). That still doesn't match the 'f' in the regex, so it backtracks one more step (leaving the "foo" at the end of the string unmatched). Now, the matcher finally matches the 'f' in the regex, and the 'o' and the next 'o' are matched too. So it return entire string as output for the given regular expression.

Reluctant quantifier
Reluctant quantifier starts by first consuming 'nothing'. It match as little as possible and once it finds match, it starts again the same process, until the string exhausted.

Enter the string
xxxfooxxfoo
Enter Regular Expression
.*?foo
I found the text xxxfoo starting at index 0 Ending at index 6
I found the text xxfoo starting at index 6 Ending at index 11

Enter the string
12345
Enter Regular Expression
\d+?
I found the text 1 starting at index 0 Ending at index 1
I found the text 2 starting at index 1 Ending at index 2
I found the text 3 starting at index 2 Ending at index 3
I found the text 4 starting at index 3 Ending at index 4
I found the text 5 starting at index 4 Ending at index 5

Possessive Quantifier
A possessive quantifier is just like the greedy quantifier, but it doesn't backtrack.

Enter the string
xxxfooxxfoo
Enter Regular Expression
.*+foo
Enter Regular Expression

In the above case  it starts out with .* matching the entire string, leaving nothing unmatched. Then there is nothing left for it to match with the 'f' in the regex. Since the possessive quantifier doesn't backtrack, the match fails there.

Enter the string
\d+?
Enter Regular Expression
\d+?

Enter Regular Expression


Prevoius                                                 Next                                                 Home

No comments:

Post a Comment