Why should I use Pattern Search Algorithms?
PatternSearchAlgorithms provides high efficient regular expression search and matching through using a deterministic finite automaton (DFA). It does not provide features as group capturing, matching modes or backreferences. It is just efficient for complex patterns and large texts. The match time of such an automaton is linear dependent on the number of chars to match (O(n), where n = number of chars to match). Common regex packages (as provided by the JDK) use nondeterministic automatons (NFA) to capture the regular expression, allowing several special features but implying low performance on complex patterns and large texts. The match time of an NFA based solution is dependent on the nodes in the automaton and the chars to match (O(m2n), where m = number of automaton nodes, n = number of chars to match).
Usage
Available PatternOptions:
Pattern.compile
allows us to pass a varargs argument of PatternOption
s. The current available options are: -
OptimizationTarget.MATCH
for fast matching andOptimizationTarget.SEARCH
for fast searching -
RegexOption.DOTALL
to make the dot match all characters (by default it does not match line endings) -
CharsetOption
to specify the charset to be matched -
SearchMode
to specify the search variant (default is longest non overlapping)
Matching Patterns:
Matching regular expressions is similar to matching with the JDK. Just use Pattern.compile
with OptimizationTarget.MATCH
.
Pattern pattern = Pattern.compile("(more|less) efficient", OptimizationTarget.MATCH);
Matcher matcher = pattern.matcher("more efficient");
System.out.prinltn("matches: " + matcher.matches()); // matches:true
...
Matcher matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
System.out.prinltn("matches: " + matcher.matches()); // matches:false
Searching Patterns:
Searching regular expressions is similar to matching with the JDK. Just use Pattern.compile
with OptimizationTarget.SEARCH
.
Pattern pattern = Pattern.compile("(more|less) efficient", OptimizationTarget.SEARCH);
Finder matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
while (matcher.find()) {
System.out.println("found text = " + matcher.match.text()); // found text = more efficient
System.out.println("at = " + matcher.match.start()); // at = 27
System.out.println("to = " + matcher.match.end()); // to = 14
}
...
Finder matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
for (Match match : matcher.findAll()) {
System.out.println("found text = " + match.text()); // found text = more efficient
System.out.println("at = " + match.start()); // at = 27
System.out.println("to = " + match.end()); // to = 14
}
Benchmarks
To keep track of the performance of each algorithm we created the benchmark project RegexBench.
RegexBench compares all available algorithms (of Pattern Search Algorithms and other libraries) is planned to benchmark regular expression search in realistic scenarios.