Regular expressions

Regular expressions (regex) are a powerful tool for pattern matching and text manipulation. They define search patterns for strings and support operations such as validation, searching, and replacing. Java provides them through the java.util.regex package, which is built around two classes: Pattern, which represents a compiled regex, and Matcher, which applies a compiled pattern to a specific input text and performs match operations on it.

Every example below assumes the statement import java.util.regex.*; at the top of the source file. The shorter snippets are meant to run inside a method such as main.

Pattern.compile() and Matcher.find() - finding a match in a string

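The Pattern.compile() method compiles a regex into a Pattern object, and Pattern.matcher() creates a Matcher for a given input. Matcher.find() then searches the input for the next subsequence that matches the pattern; on the first call, that is the first occurrence. It returns true if a match is found and false otherwise.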


String pattern = "Java";
String input = "I am learning Java programming.";
Pattern compiledPattern = Pattern.compile(pattern); // compiling the regex into a pattern
Matcher matcher = compiledPattern.matcher(input); // creating a matcher for the input text

// Checking if the pattern matches
if (matcher.find())  
    System.out.println("Match found!");  
else
    System.out.println("No match");  

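Because each successful call to find() resumes the search just after the previous match, calling it in a loop visits every occurrence in turn. The sketch below wraps that loop in a small class (FindAllMatches and findAll are illustrative names, not part of the API) and also uses Matcher.start() and Matcher.end() to report where each match sits; the end index is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

public class FindAllMatches {
    // Returns each match formatted as "text@start-end" (end is exclusive)
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> results = new ArrayList<>();
        // Each call to find() continues after the previous match
        while (matcher.find()) {
            results.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "Java is fun. I like Java.";
        System.out.println(findAll("Java", input)); // [Java@0-4, Java@20-24]
    }
}
```

An empty list means that find() returned false on its very first call, so the pattern does not occur in the input at all.
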
Matcher.matches() - matching the entire string

The Matcher.matches() method returns true only if the entire input string matches the pattern. In contrast, find() succeeds if the pattern occurs anywhere in the input.


String pattern = "I am learning Java programming\\."; // the period is escaped because an unescaped . matches any character
String input = "I am learning Java programming.";
Pattern compiledPattern = Pattern.compile(pattern);
Matcher matcher = compiledPattern.matcher(input);

if (matcher.matches())
    System.out.println("Exact match found!");  
else
    System.out.println("No exact match");  

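The difference between matches() and find() shows up as soon as the pattern covers only part of the input. The following sketch (MatchModes and check are hypothetical names) runs both on the same text, together with Matcher.lookingAt(), which requires the match to start at the beginning of the input but not to reach its end.

```java
import java.util.regex.*;

public class MatchModes {
    // Reports the matches(), lookingAt() and find() results for one pattern and input
    static String check(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // matches(): the entire input must match
        boolean whole = matcher.matches();
        // lookingAt(): the match must start at index 0 but need not reach the end
        boolean prefix = matcher.lookingAt();
        // find(): the pattern may occur anywhere; reset() so the search starts at index 0
        matcher.reset();
        boolean anywhere = matcher.find();
        return whole + " " + prefix + " " + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(check("Java", "Java programming")); // false true true
        System.out.println(check("Java", "I love Java"));      // false false true
    }
}
```

The matcher is reset before find() so that the search starts from the beginning of the input rather than from where the previous successful match ended.
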
Pattern.split() - splitting strings

The Pattern.split() method splits the input around matches of the pattern and returns the pieces as an array of strings. The matched separators themselves are not included in the result.


String input = "apple,banana,orange";
String pattern = ",";
Pattern compiledPattern = Pattern.compile(pattern);
String[] result = compiledPattern.split(input);

for (String s: result)
    System.out.println(s); 

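Pattern.split() also has a two-argument form, split(input, limit), whose limit controls how many pieces are produced and whether trailing empty strings are kept. The sketch below (SplitLimits and splitToString are illustrative names) prints the result for a few limits and also shows a regex separator that absorbs the spaces around each comma.

```java
import java.util.Arrays;
import java.util.regex.*;

public class SplitLimits {
    // Splits input around matches of regex and formats the pieces for printing
    static String splitToString(String regex, String input, int limit) {
        return Arrays.toString(Pattern.compile(regex).split(input, limit));
    }

    public static void main(String[] args) {
        String input = "a,b,,c,,";
        // limit 0 (what split(input) uses): trailing empty strings are discarded
        System.out.println(splitToString(",", input, 0));  // [a, b, , c]
        // negative limit: split as often as possible and keep trailing empty strings
        System.out.println(splitToString(",", input, -1)); // [a, b, , c, , ]
        // positive limit n: at most n pieces; the last piece holds the unsplit remainder
        System.out.println(splitToString(",", input, 2));  // [a, b,,c,,]
        // A regex separator can also swallow optional whitespace around each comma
        System.out.println(splitToString("\\s*,\\s*", "apple , banana,orange", 0)); // [apple, banana, orange]
    }
}
```

Empty strings between consecutive separators (the "" between b and c) are always kept; only the trailing ones depend on the limit.
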
Matcher.group() - extracting specific parts of a match

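The Matcher.group() method returns the text captured by a group in the most recent match. Groups are defined using parentheses in the regex and are numbered from 1, counting opening parentheses from left to right; group() with no argument (equivalent to group(0)) returns the entire match.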


String pattern = "(\\d{3})-(\\d{2})-(\\d{4})"; // grouping digits
String input = "My SSN is 123-45-6789.";
Pattern compiledPattern = Pattern.compile(pattern);
Matcher matcher = compiledPattern.matcher(input);

if (matcher.find()) {
    System.out.println("Entire match: " + matcher.group()); // 123-45-6789
    System.out.println("Group 1: " + matcher.group(1)); // 123
    System.out.println("Group 2: " + matcher.group(2)); // 45
    System.out.println("Group 3: " + matcher.group(3)); // 6789
}           

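Besides numbered groups, Java supports named capturing groups written as (?<name>...), which can be read back with group(name). The sketch below assumes a year-month-day date format purely for illustration; NamedGroups and describe are hypothetical names.

```java
import java.util.regex.*;

public class NamedGroups {
    // Hypothetical date format (year-month-day) with one named group per field
    private static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    static String describe(String input) {
        Matcher matcher = DATE.matcher(input);
        if (!matcher.find())
            return "no date";
        // Named groups can be read by name, but they are still numbered too: year is group 1
        return "day=" + matcher.group("day")
            + ", month=" + matcher.group("month")
            + ", year=" + matcher.group(1)
            + ", groups=" + matcher.groupCount();
    }

    public static void main(String[] args) {
        System.out.println(describe("Released on 2024-03-15.")); // day=15, month=03, year=2024, groups=3
    }
}
```

Names make long patterns easier to maintain, because adding a new group does not shift the names of the existing ones the way it shifts their numbers.
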
Special sequences

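Java regex supports predefined character classes (shorthand sequences) that simplify pattern creation. Because the backslash is also the escape character in Java string literals, it must be doubled: the regex \d is written as "\\d" in source code.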

\\d Matching any digit (equivalent to [0-9]).
\\D Matching any non-digit character (equivalent to [^0-9]).
\\w Matching any word character: a letter, digit, or underscore (by default equivalent to [a-zA-Z_0-9]).
\\W Matching any non-word character.
\\s Matching any whitespace character, such as a space, tab, or newline.
\\S Matching any non-whitespace character.

String pattern = "\\d+";
String input = "There are 12 cats and 34 dogs.";
Pattern compiledPattern = Pattern.compile(pattern);
Matcher matcher = compiledPattern.matcher(input);

while (matcher.find()) 
    System.out.println(matcher.group()); // 12 and 34

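These shorthand classes are often combined with replaceAll() and find(). The sketch below (ShorthandClasses and its method names are illustrative) uses \\D+ to strip everything except digits, \\s+ to collapse runs of whitespace, and \\w+ to count word-like tokens. Since \\w also matches digits and underscores, "12" counts as a word here.

```java
import java.util.regex.*;

public class ShorthandClasses {
    // \\D+ matches runs of non-digits; removing them leaves only the digits
    static String digitsOnly(String input) {
        return Pattern.compile("\\D+").matcher(input).replaceAll("");
    }

    // \\s+ matches runs of whitespace (spaces, tabs, newlines); each run becomes one space
    static String collapseWhitespace(String input) {
        return Pattern.compile("\\s+").matcher(input.trim()).replaceAll(" ");
    }

    // \\w+ matches runs of word characters, so each match is one word-like token
    static int countWords(String input) {
        Matcher matcher = Pattern.compile("\\w+").matcher(input);
        int count = 0;
        while (matcher.find())
            count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 123-4567"));                 // 5551234567
        System.out.println(collapseWhitespace("  too   many\tspaces "));  // too many spaces
        System.out.println(countWords("There are 12 cats and 34 dogs.")); // 7
    }
}
```
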
Regex options (flags)

The Pattern class provides several flags to modify regex behavior:

Pattern.CASE_INSENSITIVE Making the regex case-insensitive.
Pattern.DOTALL Allowing the . character to match newline characters.
Pattern.COMMENTS Ignoring whitespace in the pattern and treating # as the start of a comment that runs to the end of the line, so long patterns can be laid out more readably.

// Using CASE_INSENSITIVE
Pattern pattern = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("I love JAVA programming.");
if (matcher.find())
    System.out.println("Case-insensitive match found!");  

// Using DOTALL
pattern = Pattern.compile("Hello.*World", Pattern.DOTALL);
matcher = pattern.matcher("Hello\nWorld");
if (matcher.find())
    System.out.println("Match found with DOTALL!");  

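// Using COMMENTS: whitespace is ignored and # starts a comment that runs to the end of the line
pattern = Pattern.compile(
    "\\d{3}   # first three digits\n" +
    "- \\d{2} # middle two digits\n" +
    "- \\d{4} # last four digits",
    Pattern.COMMENTS
);
matcher = pattern.matcher("123-45-6789");
if (matcher.find())
    System.out.println("Match found with comments enabled!");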

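Because the flags are int bit masks, several can be passed to compile() at once by combining them with the bitwise | operator. The same options can also be switched on inside the pattern itself with embedded flags, such as (?i) for CASE_INSENSITIVE and (?s) for DOTALL. The sketch below (CombinedFlags and matchesWithFlags are illustrative names) compares both forms with a pattern compiled without DOTALL.

```java
import java.util.regex.*;

public class CombinedFlags {
    // Compiles regex with the given flag bit mask and tests whether the whole input matches
    static boolean matchesWithFlags(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "hello\nWORLD";
        // Both flags combined with |
        System.out.println(matchesWithFlags("hello.world",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL, input));                   // true
        // The same options as embedded flags: (?i) and (?s)
        System.out.println(matchesWithFlags("(?is)hello.world", 0, input));       // true
        // Without DOTALL, . does not match the newline
        System.out.println(matchesWithFlags("hello.world",
            Pattern.CASE_INSENSITIVE, input));                                    // false
    }
}
```

Embedded flags are handy when the pattern string comes from configuration and the code that compiles it cannot easily pass extra arguments.
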
Useful methods

Pattern.compile(regex, flags) Compiling the given regex with optional flags (the single-argument form compiles it without flags).
Pattern.matcher(input) Returning a Matcher that will match the specified input text against this pattern.
Pattern.pattern() Returning the regex string from which this pattern was compiled.
Pattern.flags() Returning the flags bit mask used to compile the pattern.
Pattern.split(input, limit) Splitting the input around matches of the pattern; a positive limit caps the number of pieces, and a negative limit keeps trailing empty strings.
Matcher.find(start) Resetting the matcher and then finding the next match, starting at the specified index.
Matcher.matches() Checking if the entire input matches the pattern.
Matcher.group(group) Returning the subsequence captured by the specified group during the previous match.
Matcher.start(group) Returning the start index of the subsequence captured by the specified group during the previous match.
Matcher.end(group) Returning the index just after the last character captured by the specified group during the previous match.
Matcher.replaceAll(replacement) Replacing every match with the replacement string, in which $n refers to group n and $ or \ must be escaped to appear literally.
Matcher.reset(input) Resetting the matcher with new input text.
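Several of these methods work well together. The sketch below (ReplaceAndReset, swapNames, and countDigitRuns are hypothetical names) uses $n group references in a replaceAll() replacement string, reuses one Matcher across several inputs with reset(input), and escapes a literal $ with Matcher.quoteReplacement().

```java
import java.util.regex.*;

public class ReplaceAndReset {
    // Turns "Last, First" into "First Last" using $n references to capturing groups
    static String swapNames(String input) {
        Matcher matcher = Pattern.compile("(\\w+), (\\w+)").matcher(input);
        return matcher.replaceAll("$2 $1");
    }

    // Reuses a single Matcher for several inputs via reset(input)
    static int countDigitRuns(String... inputs) {
        Matcher matcher = Pattern.compile("\\d+").matcher("");
        int total = 0;
        for (String input : inputs) {
            matcher.reset(input); // switch to the next input and clear previous match state
            while (matcher.find())
                total++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Lovelace, Ada; Turing, Alan"));      // Ada Lovelace; Alan Turing
        System.out.println(countDigitRuns("a1b22", "no digits", "3 4 5")); // 5
        // $ and \ are special in replacement strings; quoteReplacement() escapes them
        System.out.println("cost: 5".replaceAll("\\d+", Matcher.quoteReplacement("$10"))); // cost: $10
    }
}
```

Reusing a Matcher this way avoids allocating a new one per input. Matcher instances are not thread-safe, so each thread should use its own, while a compiled Pattern can be shared.
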