Regent

Microsoft Word wildcards as regular expressions

The Find and Replace dialog in Microsoft Word (XP, 2003 and 2007) supports a wildcard mode that is very similar to regular expressions. For example, finding ([a-z]{1,}) and replacing it with "\1" wraps every run of lowercase English letters in double quotes: ABCabc becomes ABC"abc".
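For comparison, the same find and replace can be sketched with Python's re module. This only illustrates how close the wildcard syntax is to regular expressions; it runs in Python's engine, not in Microsoft Word, and the function name quote_lowercase_runs is made up for this example.

```python
import re

def quote_lowercase_runs(text):
    # ([a-z]{1,}) is the wildcard pattern from the text; Python accepts it unchanged.
    # \1 refers to the first tagged expression, as in Word's "Replace with" box.
    return re.sub(r"([a-z]{1,})", r'"\1"', text)

print(quote_lowercase_runs("ABCabc"))
```

Both the pattern and the \1 reference carry over unchanged, which is the similarity the paragraph above describes.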

Microsoft Word has no special symbol for a greedy "one or more" repetition, but {1,} is a good workaround. (The exception is the any-character wildcard: ?{1,} matches no more than one character.)

The separator in the repeat count expression {min,max} in Microsoft Word depends on the user's regional options. For example, it is {min,max} for English and {min;max} for Russian. Regent follows this and uses the list separator from the current user's regional options as the repeat count separator in searches.
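A minimal sketch of building a repeat count with a configurable separator follows. The helper names repeat_count and one_or_more are hypothetical, and the separator is passed in as a plain argument; reading the list separator from the regional options is left out because it is platform specific. This is not Regent's actual code.

```python
def repeat_count(minimum, maximum=None, separator=","):
    # Build {min,max}; an omitted maximum gives the open-ended {min,} form.
    upper = "" if maximum is None else str(maximum)
    return "{" + str(minimum) + separator + upper + "}"

def one_or_more(element, separator=","):
    # Word has no "+" symbol, so "one or more" is spelled as {1,}.
    return element + repeat_count(1, separator=separator)

print(one_or_more("[a-z]"))        # English-style list separator
print(one_or_more("[a-z]", ";"))   # Russian-style list separator
```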

Microsoft Word wildcards do not support every part of regular expressions. In particular, the lack of nested tagged expressions and non-capturing groups is the reason why Regent doesn't suggest repetition of symbols and elements for Microsoft Word.

To search for letters in all languages, Regent combines English letters with the Unicode ranges 00C0-00D6, 00D8-00F6, 00F8-02AF and 0370-1FFF into [A-Za-zÀ-ÖØ-öø-ʯͰ-῿]. It's OK that the last three characters in the expression are displayed as squares on most systems; Microsoft Word search uses the correct character values in spite of that.
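The character class can be assembled from the code point ranges listed above, which avoids typing characters that may render as squares. The sketch below does this in Python and checks the result with Python's re module; the names RANGES and letter_class are assumptions for illustration, and the matching behaviour shown is Python's, not Word's.

```python
import re

# Inclusive code point ranges: English letters plus the four ranges from the text.
RANGES = [
    (0x0041, 0x005A), (0x0061, 0x007A),
    (0x00C0, 0x00D6), (0x00D8, 0x00F6),
    (0x00F8, 0x02AF), (0x0370, 0x1FFF),
]

def letter_class(ranges=RANGES):
    # Join each range as "first-last" inside one pair of square brackets.
    return "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in ranges) + "]"

LETTERS = letter_class()

# Cyrillic and accented Latin letters fall inside the ranges.
print(bool(re.fullmatch(LETTERS + "{1,}", "Привет")))
# The multiplication sign U+00D7 sits in the gap between 00D6 and 00D8.
print(bool(re.fullmatch(LETTERS, "\u00d7")))
```

The gaps at 00D7 and 00F7 leave out the multiplication and division signs, which are not letters.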

In the Replace with expression, ^92 must be used instead of the backslash character. For example, to find 1/2 and replace it with 1\2, use the regex ([0-9])/([0-9]) and the format \1^92\2. Additionally, if a digit follows ^92 in the format string, it must be represented by its ASCII code. For example, to find 1/2 and replace it with 01\02, use the regex ([0-9])/([0-9]) and the format 0\1^92^48\2. Regent handles these cases automatically.
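The two rules above can be sketched as a small encoder. This is not Regent's implementation: the function name word_replace_format and the input convention (a list where an int is a tagged expression number and a str is literal text) are invented for this example. One extra assumption is labeled in the code: digits that directly follow an already encoded digit are encoded too, so they cannot be read as part of the preceding ^ code.

```python
def word_replace_format(parts):
    out = []
    after_code = False  # True right after a ^NN code has been emitted
    for part in parts:
        if isinstance(part, int):
            out.append("\\" + str(part))  # reference to a tagged expression
            after_code = False
            continue
        for ch in part:
            if ch == "\\":
                out.append("^92")  # literal backslash
                after_code = True
            elif ch in "0123456789" and after_code:
                # Rule from the text: a digit after ^92 is written as its ASCII code.
                # Assumption: the same applies to digits after any other ^NN code.
                out.append("^" + str(ord(ch)))
            else:
                out.append(ch)
                after_code = False
    return "".join(out)

print(word_replace_format([1, "\\", 2]))        # 1/2 to 1\2
print(word_replace_format(["0", 1, "\\0", 2]))  # 1/2 to 01\02
```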

In at least two cases Microsoft Word incorrectly processes escaped characters next to a right parenthesis in a regular expression: (\\) matches any symbol instead of a backslash, and (\() is not recognized as an expression that searches for a left parenthesis at all. As a workaround, Regent encloses an escaped backslash or left parenthesis in square brackets when it appears at the end of a tagged expression: ([\\]) and ([\(]).
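The workaround can be illustrated with a short scanner that wraps an escaped backslash or left parenthesis in square brackets when it sits directly before a right parenthesis. This is a sketch under the assumption that a simple left-to-right pass with basic bracket tracking is enough; the function name bracket_escapes_before_paren is hypothetical and the code is not taken from Regent.

```python
def bracket_escapes_before_paren(pattern):
    out = []
    in_class = False  # True while inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            token = pattern[i:i + 2]    # backslash plus the escaped character
            i += 2
            following = pattern[i:i + 1]
            if not in_class and token in ("\\\\", "\\(") and following == ")":
                out.append("[" + token + "]")  # apply the workaround
            else:
                out.append(token)
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)

print(bracket_escapes_before_paren(r"(\\)"))
print(bracket_escapes_before_paren(r"(\()"))
```

Escapes that are already inside square brackets, or that are not followed by a right parenthesis, are left untouched.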

Microsoft Word supports up to 9 tagged expressions. (1)(2)(3)(4)(5)(6)(7)(8)(9) is valid, (1)(2)(3)(4)(5)(6)(7)(8)(9)(10) is not.
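A quick way to check a pattern against this limit is to count its unescaped left parentheses outside square brackets. The sketch below assumes that this count equals the number of tagged expressions; count_tagged_expressions and fits_word_limit are illustrative names, not part of Regent or Microsoft Word.

```python
MAX_TAGGED = 9  # limit stated in the text

def count_tagged_expressions(pattern):
    count = 0
    in_class = False  # True while inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "(" and not in_class:
            count += 1
        i += 1
    return count

def fits_word_limit(pattern):
    return count_tagged_expressions(pattern) <= MAX_TAGGED

print(fits_word_limit("(1)(2)(3)(4)(5)(6)(7)(8)(9)"))
print(fits_word_limit("(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)"))
```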