Learn Regex The Hard Way: Scanning And Parsing Text Without Going Insane

If you run into strings like “\s+.?(?i)a+b?” and your eyes glaze over like a pair of old-fashioned donuts, then this book is for you. When you’re done, you will be able to read that string, understand what’s going on with regular expressions, know when to use them and how to write them, and write simple parsers so that you don’t abuse regex to process your strings.
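
As a rough illustration of reading that pattern piece by piece, here is a minimal Python sketch. Python's `re` module is an assumption (the blurb names no language), and the sample string is made up. Because Python 3.11 and later reject an inline `(?i)` in the middle of a pattern, the sketch passes `re.IGNORECASE` as a flag instead; only `a+b?` is case-sensitive to begin with, so the result is the same here.

```python
import re

# The pattern from the text, with (?i) moved into a flag:
#   \s+  one or more whitespace characters
#   .?   optionally, any single character (except a newline)
#   a+   one or more "a" (either case, because of the flag)
#   b?   optionally, one "b" (either case)
PATTERN = re.compile(r"\s+.?a+b?", re.IGNORECASE)

sample = "foo  xAAb bar"  # hypothetical input
match = PATTERN.search(sample)
matched_text = match.group(0) if match else None
print(repr(matched_text))
```

Reading the pattern as four small pieces, each with its own quantifier, is usually easier than trying to take in the whole string at once.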


XRegExp: JavaScript regex library

XRegExp is an open source (MIT license) JavaScript library that provides an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods.

XRegExp is fully compliant with the regular expression flavor specified in ES3 and ES5, and has been tested with Internet Explorer 5.5–8, Firefox 2–3.6, Safari 3–4, Chrome 1–4, and Opera 9.5–10.5. It uses feature detection for its cross-browser support—no browser sniffing.


RE2: a principled approach to regular expression matching

The feature-rich regular expression implementations of today are based on a backtracking search with a potential for exponential run time and unbounded stack usage. At Google, we use regular expressions as part of the interface to many external and internal systems, including Code Search, Sawzall, and Bigtable. Those systems process large amounts of data; exponential run time would be a serious problem. On a more practical note, these are multithreaded C++ programs with fixed-size stacks: the unbounded stack usage in typical regular expression implementations leads to stack overflows and server crashes. To solve both problems, we’ve built a new regular expression engine, called RE2, which is based on automata theory and guarantees that searches complete in linear time with respect to the size of the input and in a fixed amount of stack space.
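
The automata idea mentioned above can be sketched in a few lines. This is not RE2's implementation (RE2 is a C++ library); it is a hand-built automaton for the single pattern `(a|aa)+b`, with hypothetical state names, meant only to show how tracking every possible state at once avoids backtracking. Each input character is examined once, and no recursion is used.

```python
# Hand-built NFA for the pattern (a|aa)+b, chosen because a naive
# backtracking matcher can take exponential time on inputs like "aaaa...c".
# State names are hypothetical labels for this sketch.
TRANSITIONS = {
    ("start", "a"): {"unit_done", "mid_aa"},
    ("mid_aa", "a"): {"unit_done"},
    ("unit_done", "a"): {"unit_done", "mid_aa"},
    ("unit_done", "b"): {"accept"},
}


def full_match(text: str) -> bool:
    """Return True if the whole text matches (a|aa)+b."""
    current = {"start"}
    for ch in text:
        # Advance every live state in lockstep. The set can never hold
        # more states than the automaton has, so work per character is bounded.
        current = {
            nxt
            for state in current
            for nxt in TRANSITIONS.get((state, ch), ())
        }
        if not current:  # no live states left: fail early
            return False
    return "accept" in current


print(full_match("aaab"))            # a match
print(full_match("a" * 5000 + "c"))  # a long non-match, still handled in one pass
```

The running time grows with the length of the input times the (fixed) number of states, and the loop needs no stack beyond the current state set.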

The Regex Coach – interactive regular expressions

The Regex Coach is a graphical application for Linux and Windows which can be used to experiment with (Perl-compatible) regular expressions interactively. It has the following features:

  • It shows whether a regular expression matches a particular target string.
  • It can also show which parts of the target string correspond to captured register groups or to arbitrary parts of the regular expression.
  • It can “walk” through the target string one match at a time.
  • It can simulate Perl’s split and s/// (substitution) operators.
  • It tries to describe the regular expression in plain English.
  • It can show a graphical representation of the regular expression’s parse tree.
  • It can single-step through the matching process as performed by the regex engine.
  • Everything happens in “real time”, i.e. as soon as you make a change somewhere in the application all other parts are instantly updated.
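
Several of the operations listed above (testing a match, inspecting captured groups, walking through matches, splitting, and substituting) can be approximated in a short script. The sketch below uses Python's `re` module, whose flavor is similar but not identical to Perl's; the pattern and target string are made up for illustration, and nothing here reflects how The Regex Coach itself is implemented.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")  # hypothetical pattern with two groups
target = "pages 10-12, 30-31"         # hypothetical target string

# Does the pattern match, and which parts did the groups capture?
first = pattern.search(target)
groups = first.groups() if first else ()

# "Walk" through the target string one match at a time.
all_matches = [m.group(0) for m in pattern.finditer(target)]

# Rough analogues of Perl's split and s/// operators.
pieces = re.split(r",\s*", target)
replaced = pattern.sub(r"\1 to \2", target)

print(groups, all_matches, pieces, replaced, sep="\n")
```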
