Regular Expressions Syntax

Literals

All characters are taken literally except the following:
".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\".
These characters have special meaning and must be preceded by a "\" to be taken literally.
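
As an illustration, the sketch below uses Python's "re" module. It is a different engine from the one described here, but it treats "." and "\" the same way in this case. The sample strings are made up.

```python
import re

# Unescaped, "." is special, so the pattern "1.5" also accepts "1x5".
unescaped_matches_x = re.fullmatch(r"1.5", "1x5") is not None

# Preceded by "\", the dot is taken literally and only "1.5" is accepted.
escaped_matches_dot = re.fullmatch(r"1\.5", "1.5") is not None
escaped_matches_x = re.fullmatch(r"1\.5", "1x5") is not None

print(unescaped_matches_x, escaped_matches_dot, escaped_matches_x)
# prints: True True False
```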

Wildcards

The dot "." matches any single character, including the new line symbols [CR] and [LF].
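
A rough sketch of this behaviour in Python follows. Note the assumption: Python's "re" module is a different engine, and its dot skips [LF] unless the DOTALL flag is given, so the flag is used here to mimic the rule above.

```python
import re

text = "a\r\nb"  # "a", [CR], [LF], "b"

# With DOTALL, each "." consumes one character, new line symbols included.
with_dotall = re.fullmatch(r"a..b", text, re.DOTALL) is not None

# Python's default "." refuses [LF], so the same pattern fails without the flag.
without_flag = re.fullmatch(r"a..b", text) is not None

print(with_dotall, without_flag)
# prints: True False
```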

Repeats

An expression followed by "*" may be repeated any number of times, including zero.
An expression followed by "+" may be repeated any number of times, but at least once.
An expression followed by "?" may occur at most once, that is, zero times or one time.
The bounds "{" "}" specify the number of repetitions: "{N}" means that the expression must be repeated exactly N times, and "{N,M}" means that the expression must be repeated at least N and at most M times.
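
The repeat operators can be tried out with a small Python sketch. Python's "re" module is not the engine documented here, but these operators behave the same way in both. The helper name "accepts" and the test strings are illustrative only.

```python
import re

def accepts(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("a", "ab", "abb", "abbb")

star = [accepts(r"ab*", s) for s in samples]         # zero or more "b"
plus = [accepts(r"ab+", s) for s in samples]         # one or more "b"
optional = [accepts(r"ab?", s) for s in samples]     # zero or one "b"
exact = [accepts(r"ab{2}", s) for s in samples]      # exactly two "b"
bounded = [accepts(r"ab{1,2}", s) for s in samples]  # one to two "b"

for name, row in (("*", star), ("+", plus), ("?", optional),
                  ("{2}", exact), ("{1,2}", bounded)):
    print(name, row)
```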

Subexpressions and parentheses

Parentheses "(" ")" mark subexpressions, which are numbered from left to right starting from 1. Subexpression zero is the whole match of the expression.
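
The numbering can be seen in a short Python sketch. Python's "re" module is a separate engine, but it numbers groups the same way; the sample text is invented for the example.

```python
import re

match = re.search(r"(\w+)@(\w+)", "mail user@host now")

whole = match.group(0)   # subexpression zero: the whole match
first = match.group(1)   # first "(" from the left
second = match.group(2)  # second "(" from the left

print(whole, first, second)
# prints: user@host user host
```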

Alternatives

Alternative expressions are separated by "|" or put on separate lines in the expression.
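
A minimal Python sketch of the "|" form follows. Python's "re" module is a different engine and does not accept alternatives on separate lines, so only "|" is shown. The sample words are arbitrary.

```python
import re

# Either alternative may match at each position in the text.
found = re.findall(r"cat|dog", "cat, cow, dog")

# Parentheses limit the alternatives to part of the expression.
spellings = [re.fullmatch(r"gr(a|e)y", s) is not None
             for s in ("gray", "grey", "groy")]

print(found, spellings)
# prints: ['cat', 'dog'] [True, True, False]
```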

Line anchors

The "^" character matches the empty string at the beginning of a line.
The "$" character matches the empty string at the end of a line.
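
The line anchors can be sketched in Python as below. This rests on an assumption: Python's "re" module is a different engine in which "^" and "$" work per line only with the MULTILINE flag, so the flag is set to reproduce the behaviour described here.

```python
import re

text = "one\ntwo\nsix"

# First character of every line.
line_starts = re.findall(r"^\w", text, re.MULTILINE)

# Last character of every line.
line_ends = re.findall(r"\w$", text, re.MULTILINE)

print(line_starts, line_ends)
# prints: ['o', 't', 's'] ['e', 'o', 'x']
```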

Text anchors

"\`" matches the start of the whole text.
"\A" matches the start of the whole text.
"\'" matches the end of the whole text.
"\z" matches the end of the whole text.
"\Z" matches the end of the whole text, or the position before any new line characters at the end of the text.

Character sets

A character set enclosed in brackets "[" "]" matches any single symbol it contains, for example "[abc]" matches either "a", "b" or "c".
A set that starts with "^" matches any character that is not a member of the set, for example "[^abc]" matches any character except "a", "b" and "c".
Character ranges can be specified as "[a-d]", which matches any symbol between "a" and "d", inclusive.
Character classes are denoted by "[:class:]" within a set declaration, for example "[[:digit:]]".
Commonly used character classes are:
[:alnum:]     Alphanumeric character.
[:alpha:]     Alphabetical character a-z and A-Z.
[:blank:]     Blank character, either a space or a tab.
[:cntrl:]     Control character.
[:digit:]     Digit 0-9.
[:graph:]     Graphical character.
[:lower:]     Lower case character a-z.
[:print:]     Printable character.
[:punct:]     Punctuation character.
[:space:]     Whitespace character.
[:upper:]     Upper case character A-Z.
[:xdigit:]    Hexadecimal digit character, 0-9, a-f and A-F.
[:word:]      Word character - all alphanumeric characters plus the underscore.
[:Unicode:]   Character whose code is greater than 255 (applies to Unicode characters only).
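
The basic set forms can be checked with a small Python sketch. Python's "re" module is a different engine and does not support the "[:class:]" notation, so only plain sets, negated sets and ranges are shown. The sample text is arbitrary.

```python
import re

text = "abcdef"

any_of = re.findall(r"[abc]", text)     # members of the set
none_of = re.findall(r"[^abc]", text)   # everything outside the set
in_range = re.findall(r"[a-d]", text)   # "a" through "d", inclusive

print(any_of, none_of, in_range)
# prints: ['a', 'b', 'c'] ['d', 'e', 'f'] ['a', 'b', 'c', 'd']
```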

Character codes

Characters may be matched by octal code "\0NNN" or hexadecimal code "\xHH", with the digits enclosed in braces "{" "}" if necessary: "\0{NNN}", "\x{HH}".
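
The sketch below, in Python, shows where the digits come from for the letter "A". It is an illustration under assumptions: Python's "re" module is a different engine, its octal escapes follow other rules, and it has no brace forms, so only the "\xHH" escape is actually matched.

```python
import re

code = ord("A")                 # numeric character code of "A"
hex_digits = format(code, "X")  # digits for the "\xHH" form
oct_digits = format(code, "o")  # digits for the "\0NNN" form

# The hexadecimal escape matches the character with that code.
matches_hex = re.fullmatch(r"\x41", "A") is not None

print(code, hex_digits, oct_digits, matches_hex)
# prints: 65 41 101 True
```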

Word operators

"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of a word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches the null string within a word.
The beginning of the text is a potential start of a word and the end of the text is a potential end of a word.
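
A hedged Python sketch of the word operators follows. Python's "re" module is a different engine: it supports "\b" and "\B" but not "\<" and "\>", so those two are approximated here with "\b" plus a lookaround. That approximation is an assumption made for this example, not part of the syntax described above.

```python
import re

def positions(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# "\b" on both sides: "cat" only as a whole word.
whole_words = positions(r"\bcat\b", "cat concat cats cat.")

# "\B" on both sides: "cat" only strictly inside a word.
inside_word = positions(r"\Bcat\B", "concatenate cat")

# Stand-ins for "\<" and "\>" (assumed equivalents in Python).
word_starts = positions(r"\b(?=\w)", "hi there")
word_ends = positions(r"\b(?<=\w)", "hi there")

print(whole_words, inside_word, word_starts, word_ends)
# prints: [0, 16] [3] [0, 3] [2, 8]
```

The last two results include offsets 0 and 8, the beginning and the end of the text, which act as a word start and a word end.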

Back references

The text matched by a subexpression can be used again later in the same expression through the labels "\1" to "\9", where the digit is the number of the subexpression.
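
For illustration, the Python sketch below uses "\1" to find a word that is immediately repeated. Python's "re" module is a different engine, but back references work the same way there. The sample sentence is made up.

```python
import re

# "(\w+)" captures a word; "\1" must then match that same text again.
match = re.search(r"\b(\w+) \1\b", "it was the the best")

repeated = match.group(0)  # the whole match
word = match.group(1)      # the text that "\1" referred back to

print(repeated, "|", word)
# prints: the the | the
```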

Miscellaneous escape sequences

\w        Equivalent to [[:word:]].
\W        Equivalent to [^[:word:]].
\s        Equivalent to [[:space:]].
\S        Equivalent to [^[:space:]].
\d        Equivalent to [[:digit:]].
\D        Equivalent to [^[:digit:]].
\l        Equivalent to [[:lower:]].
\L        Equivalent to [^[:lower:]].
\u        Equivalent to [[:upper:]].
\U        Equivalent to [^[:upper:]].
\C        Any single character, equivalent to ".".
\X        Matches any Unicode combining character sequence, for example "a" followed by the combining acute accent with code 0301 (a letter a with an acute).
\Q        The begin quote operator: everything that follows is treated as literal characters until a \E end quote operator is found.
\E        The end quote operator, terminates a sequence started with \Q.
\a        Bell character 0x07.
\f        Form feed character 0x0C.
\n        Newline character 0x0A.
\r        Carriage return character 0x0D.
\t        Tab character 0x09.
\v        Vertical tab character 0x0B.
\e        ASCII Escape character 0x1B.
\0dd      An octal character code, where dd is one or more octal digits.
\xXX      A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX}    A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a Unicode character.
\cZ       An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.
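
A few of these escapes can be tried in Python, as sketched below. Python's "re" module is a different engine and supports only part of the list (for example "\w", "\s", "\d", "\D" and "\t", but not "\l", "\u", "\C", "\Q" or "\E"). As a stand-in for "\Q...\E", the sketch uses Python's re.escape function; that substitution is an assumption of this example.

```python
import re

digits = re.findall(r"\d+", "tel 555 1234")          # runs of digits
words = re.findall(r"\w+", "snake_case, two")        # runs of word characters
fields = re.split(r"\s+", "a \t b\nc")               # split on whitespace
digits_only = re.sub(r"\D", "", "tel. 555-1234")     # drop every non-digit

# Stand-in for \Q1+1=2\E: escape the text so "+" is taken literally.
quoted = re.escape("1+1=2")
literal_hit = re.search(quoted, "so 1+1=2.") is not None

print(digits, words, fields, digits_only, literal_hit)
```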