
Regent

Regular expressions tutorial

Search and replace basics

Regular expressions are used for text search when you need to find not an exact phrase but a kind of text, such as any e-mail address or any IP address. Once the text is found, it can be modified with a formatting regular expression (a format string), for example to replace @ in an e-mail address with  at .

Simplistically, an e-mail address like account1@example.com can be described as one or more letters and digits, followed by @, followed by letters, followed by a dot, followed by more letters. Regular expressions have keywords to represent any letter, any digit, an explicit symbol and repetition. The keywords differ between regular expression languages. In this tutorial I will use the Perl-compatible regular expression language as the most capable one. In this language the keyword for any letter is [[:alpha:]], for any letter or digit it is [[:alnum:]], @ is simply @, and the dot needs to be escaped with a backslash because an unescaped dot matches any character. Repetition of one or more is +. Combining these, we get [[:alnum:]]+@[[:alpha:]]+\.[[:alpha:]]+, a simplistic regular expression to search for e-mail addresses in text.
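The search can be tried out with a short script. This is only a sketch using Python's re module as a stand-in engine. Python does not support the [[:alpha:]] and [[:alnum:]] keywords, so they are written as plain ASCII ranges here. The sample text is made up for illustration.

```python
import re

# Assumption: Python's re has no POSIX bracket classes, so
# [[:alnum:]] becomes [A-Za-z0-9] and [[:alpha:]] becomes [A-Za-z].
EMAIL = re.compile(r"[A-Za-z0-9]+@[A-Za-z]+\.[A-Za-z]+")

# Hypothetical sample text containing two simple e-mail addresses.
text = "Write to account1@example.com or admin@example.org today."

# findall returns every non-overlapping match, left to right.
emails = EMAIL.findall(text)
print(emails)
```

As the tutorial says, the expression is simplistic: addresses with dots or dashes in the account name, or with more than one dot in the domain, are matched only in part.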

To replace @ with  at  in an e-mail address, we need to take the part before @, append  at , and append the part after @. In the search regular expression we must tag the parts before and after @ with parentheses: ([[:alnum:]]+)@([[:alpha:]]+\.[[:alpha:]]+). Tagged expressions are automatically numbered from left to right starting with 1, and we can refer to them in the formatting regular expression by preceding the numbers with $: $1 at $2.
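A minimal sketch of the same replacement, again using Python's re module as a stand-in engine. Two spellings differ from the tutorial's language and are assumptions of this example: the character classes are written as ASCII ranges, and Python refers to tagged expressions as \1 and \2 rather than $1 and $2.

```python
import re

# Two tagged (capturing) groups: the part before @ and the part after it.
SEARCH = re.compile(r"([A-Za-z0-9]+)@([A-Za-z]+\.[A-Za-z]+)")

# Python spelling of the format string "$1 at $2".
FORMAT = r"\1 at \2"

result = SEARCH.sub(FORMAT, "account1@example.com")
print(result)  # account1 at example.com
```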

Greedy and lazy matching

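Greedy versus lazy matching can be illustrated with the following example. Given the string MessageBox("Hello", "World"); the greedy search regular expression ".*" matches "Hello", "World" in one piece, because .* takes as many characters as possible. The lazy search regular expression ".*?" takes as few characters as possible, so it matches "Hello" on the first search and "World" on the next one.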
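The difference is easy to check in a script. The sketch below uses Python's re module, which accepts both expressions exactly as written above. The variable names are chosen only for this example.

```python
import re

line = 'MessageBox("Hello", "World");'

# Greedy: .* runs to the last quote on the line, giving one long match.
greedy = re.findall(r'".*"', line)

# Lazy: .*? stops at the nearest closing quote, giving two short matches.
lazy = re.findall(r'".*?"', line)

print(greedy)  # ['"Hello", "World"']
print(lazy)    # ['"Hello"', '"World"']
```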

Back reference

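A back reference in a search regular expression can find duplicates. Given the string long long int, (.+)\1 will match long long followed by a space: the group captures long plus the space after it, and \1 requires the same text to follow immediately.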
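A short sketch in Python, whose re module uses the same \1 syntax for back references inside a search expression, shows what the group and the whole match contain:

```python
import re

match = re.search(r"(.+)\1", "long long int")

whole = match.group(0)     # the full match, trailing space included
repeated = match.group(1)  # the text captured by the group

print(repr(whole))     # 'long long '
print(repr(repeated))  # 'long '
```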

Non-capture group

A non-capture group separates organizational groups from groups destined for the replace operation. If we want to replace int m_i; char m_c; with int i; char c; we can use the regular expression ((?:int|char) )m_([[:alpha:]]+;) and the format string $1$2. Without the non-capture symbols ?: the inner group (int|char) would be numbered 2, and the format string would be the less obvious $1$3.
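Both variants can be compared in a short script. This sketch uses Python's re module, so [[:alpha:]] is written as the ASCII range [A-Za-z] and the format strings use \1, \2 and \3 in place of $1, $2 and $3. These spellings are assumptions of the example, not part of the tutorial's language.

```python
import re

source = "int m_i; char m_c;"

# With the non-capture group, only two groups are numbered.
with_non_capture = re.sub(r"((?:int|char) )m_([A-Za-z]+;)", r"\1\2", source)

# With a plain group, (int|char) takes number 2 and the name moves to 3.
with_capture = re.sub(r"((int|char) )m_([A-Za-z]+;)", r"\1\3", source)

print(with_non_capture)  # int i; char c;
print(with_capture)      # int i; char c;
```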

Lookahead and lookbehind

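Given a++; a--; a + 1; a - 1; we can search for a in unary operations using the negative lookahead a(?! ), which matches a only when it is not followed by a space. It will match only the first two a symbols. Alternatively, we can use the positive lookahead a(?=[+-]), which matches a only when it is followed by + or -, to achieve the same result.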
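As a quick check, the sketch below uses Python's re module, which accepts both lookahead expressions unchanged, and lists the positions at which each one matches:

```python
import re

code = "a++; a--; a + 1; a - 1;"

# Start offsets of every match; the lookahead itself consumes nothing.
negative = [m.start() for m in re.finditer(r"a(?! )", code)]
positive = [m.start() for m in re.finditer(r"a(?=[+-])", code)]

print(negative)  # [0, 5]
print(positive)  # [0, 5]
```

Offsets 0 and 5 are the a in a++ and the a in a--. The a symbols at offsets 10 and 17 are followed by a space and are skipped.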

Given the same a++; a--; a + 1; a - 1; both the positive lookbehind (?<=a)[+-] and the negative lookbehind (?<![ +-])[+-] will match only the first + of a++ and the first - of a--.
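The lookbehind expressions can be checked the same way. This sketch uses Python's re module, which supports lookbehind as long as the looked-behind text has a fixed length, as it does in both expressions here.

```python
import re

code = "a++; a--; a + 1; a - 1;"

# Each entry is (offset, matched symbol).
positive = [(m.start(), m.group()) for m in re.finditer(r"(?<=a)[+-]", code)]
negative = [(m.start(), m.group()) for m in re.finditer(r"(?<![ +-])[+-]", code)]

print(positive)  # [(1, '+'), (6, '-')]
print(negative)  # [(1, '+'), (6, '-')]
```

The second + of a++ and the second - of a-- are rejected because they are preceded by + or -, and the binary operators are rejected because they are preceded by a space.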