Pattern Search Algorithms

PatternSearchAlgorithms provides highly efficient algorithms for pattern (regular expression) search. PatternSearchAlgorithms supports search in Strings, char-arrays, byte-arrays, InputStreams, Readers and Buffers.

Get Pattern Search Algorithms Github

Why should I use Pattern Search Algorithms?

PatternSearchAlgorithms provides high efficient regular expression search and matching through using a deterministic finite automaton (DFA). It does not provide features as group capturing, matching modes or backreferences. It is just efficient for complex patterns and large texts. The match time of such an automaton is linear dependent on the number of chars to match (O(n), where n = number of chars to match). Common regex packages (as provided by the JDK) use nondeterministic automatons (NFA) to capture the regular expression, allowing several special features but implying low performance on complex patterns and large texts. The match time of an NFA based solution is dependent on the nodes in the automaton and the chars to match (O(m2n), where m = number of automaton nodes, n = number of chars to match).

Usage

Available PatternOptions:

Pattern.compile allows us to pass a varargs argument of PatternOptions. The current available options are:
  • OptimizationTarget.MATCH for fast matching and OptimizationTarget.SEARCH for fast searching
  • RegexOption.DOTALL to make the dot match all characters (by default it does not match line endings)
  • CharsetOption to specify the charset to be matched
  • SearchMode to specify the search variant (default is longest non overlapping)

Matching Patterns:

Matching regular expressions is similar to matching with the JDK. Just use Pattern.compile with OptimizationTarget.MATCH.

 Pattern pattern = Pattern.compile("(more|less) efficient", OptimizationTarget.MATCH);
 
 Matcher matcher = pattern.matcher("more efficient");
 System.out.prinltn("matches: " + matcher.matches()); // matches:true
 
 ...
 
 Matcher matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
 System.out.prinltn("matches: " + matcher.matches()); // matches:false

Searching Patterns:

Searching regular expressions is similar to matching with the JDK. Just use Pattern.compile with OptimizationTarget.SEARCH.

 Pattern pattern = Pattern.compile("(more|less) efficient", OptimizationTarget.SEARCH);
 
 Finder matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
 while (matcher.find()) {
     System.out.println("found text = " + matcher.match.text()); // found text = more efficient
     System.out.println("at = " + matcher.match.start()); // at = 27
     System.out.println("to = " + matcher.match.end()); // to = 14
 }
 
 ...
 
 Finder matcher = pattern.matcher("patternsearchalgorithms is more efficient than java.util.regex");
 for (Match match : matcher.findAll()) {
     System.out.println("found text = " + match.text()); // found text = more efficient
     System.out.println("at = " + match.start()); // at = 27
     System.out.println("to = " + match.end()); // to = 14
 }

Benchmarks

To keep track of the performance of each algorithm we created the benchmark project RegexBench.

RegexBench compares all available algorithms (of Pattern Search Algorithms and other libraries) is planned to benchmark regular expression search in realistic scenarios.

Authors

Stefan Mandel