Why should I use Pattern Search Algorithms?
PatternSearchAlgorithms provides high efficient regular expression search and matching through using a deterministic finite automaton (DFA). It does not provide features as group capturing, matching modes or backreferences. It is just efficient for complex patterns and large texts. The match time of such an automaton is linear dependent on the number of chars to match (O(n), where n = number of chars to match). Common regex packages (as provided by the JDK) use nondeterministic automatons (NFA) to capture the regular expression, allowing several special features but implying low performance on complex patterns and large texts. The match time of an NFA based solution is dependent on the nodes in the automaton and the chars to match (O(m2n), where m = number of automaton nodes, n = number of chars to match).
Usage
Available PatternOptions:
Pattern.compile allows us to pass a varargs argument of PatternOptions. The current available options are: -
OptimizationTarget.MATCHfor fast matching andOptimizationTarget.SEARCHfor fast searching -
RegexOption.DOTALLto make the dot match all characters (by default it does not match line endings) -
CharsetOptionto specify the charset to be matched -
SearchModeto specify the search variant (default is longest non overlapping)
Matching Patterns:
Matching regular expressions is similar to matching with the JDK. Just use Pattern.compile with OptimizationTarget.MATCH.
Pattern pattern = Pattern.compile("(more|less) efficient", OptimizationTarget.MATCH);
Matcher matcher = pattern.matcher("more efficient");
System.out.prinltn("matches: " + matcher.matches()); // matches:true
...
Matcher matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
System.out.prinltn("matches: " + matcher.matches()); // matches:falseSearching Patterns:
Searching regular expressions is similar to matching with the JDK. Just use Pattern.compile with OptimizationTarget.SEARCH.
Pattern pattern = Pattern.compile("(more|less) efficient", OptimizationTarget.SEARCH);
Finder matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
while (matcher.find()) {
System.out.println("found text = " + matcher.match.text()); // found text = more efficient
System.out.println("at = " + matcher.match.start()); // at = 27
System.out.println("to = " + matcher.match.end()); // to = 14
}
...
Finder matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
for (Match match : matcher.findAll()) {
System.out.println("found text = " + match.text()); // found text = more efficient
System.out.println("at = " + match.start()); // at = 27
System.out.println("to = " + match.end()); // to = 14
}Benchmarks
To keep track of the performance of each algorithm we created the benchmark project RegexBench.
RegexBench compares all available algorithms (of Pattern Search Algorithms and other libraries) is planned to benchmark regular expression search in realistic scenarios.