Regex
A regular expression (regex) is a specially encoded sequence of characters that defines a search pattern. Using that pattern, you can find matching character combinations in a string or validate data input. If you are familiar with wildcard notation, you can think of regexes as an advanced version of wildcards.
Regular expressions have their own syntax consisting of special characters, operators, and constructs.
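The two uses mentioned above, finding a match and validating input, can be sketched in Python with the standard `re` module. This is a minimal illustration, not part of any particular product: the sample strings and the five-digit rule in `is_five_digits` are assumptions made up for the example.

```python
import re

# Find: search() returns the first place in the string where the pattern matches.
match = re.search(r"\d+", "Order 66 shipped")
found = match.group() if match else None

# Validate: fullmatch() succeeds only if the entire string fits the pattern.
def is_five_digits(text):
    """Return True if text consists of exactly five digits (hypothetical rule)."""
    return re.fullmatch(r"\d{5}", text) is not None

print(found)                     # 66
print(is_five_digits("90210"))   # True
print(is_five_digits("9021a"))   # False
```

Other regex engines expose the same two operations under different names, so treat the function names here as specific to Python.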
Characters
Purpose: Match specific characters
Pattern | Description | Example | Matches |
---|---|---|---|
. | Wildcard character: matches any single character except a line break | .at | cat, mat, pat, @at |
\d | Digit character: any single digit from 0 to 9 | \d | In W1N, matches 1 |
\D | Any character that is NOT a digit | \D | In W1N, matches W and N |
\s | Whitespace character: space, tab, new line and carriage return | .\s. | In 1 space, matches 1 s |
\S | Any non-whitespace character | \S+ | In big space, matches big and space |
\w | Word character: any ASCII letter, digit or underscore | \w+ | In under_score***, matches under_score |
\W | Any character that is NOT an alphanumeric character or underscore | \W+ | In under_score***, matches *** |
\t | Tab | ||
\n | New line | \n\d+ | In the two-line string "2 calling birds" / "3 French hens", matches the line break together with the 3 that follows it |
\ | Escapes the special meaning of a character, so you can search for it literally | \. | Escapes a period so you can find the literal "." character in a string |
 | | \w+\. | Mr., Mrs., Prof. |
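The character patterns in this table can be tried out with Python's `re` module, as in the sketch below. The inputs reuse the table's examples. The `\w+\.` pattern and the sentence it is run against are assumptions chosen to reproduce the "Mr., Mrs., Prof." row. Note that in Python, `\d`, `\w` and `\s` also match non-ASCII digits, letters and whitespace unless you pass `re.ASCII`; other engines may differ.

```python
import re

any_char = re.findall(r".at", "cat mat pat @at")      # ['cat', 'mat', 'pat', '@at']
digits = re.findall(r"\d", "W1N")                     # ['1']
non_digits = re.findall(r"\D", "W1N")                 # ['W', 'N']
around_space = re.search(r".\s.", "1 space").group()  # '1 s'
chunks = re.findall(r"\S+", "big space")              # ['big', 'space']
word = re.findall(r"\w+", "under_score***")           # ['under_score']
non_word = re.findall(r"\W+", "under_score***")       # ['***']

# \n\d+ matches the line break itself plus the digits that follow it.
after_newline = re.search(r"\n\d+", "2 calling birds\n3 French hens").group()  # '\n3'

# An escaped period matches a literal "." instead of "any character".
titles = re.findall(r"\w+\.", "Mr. and Mrs. Smith met Prof. Jones")  # ['Mr.', 'Mrs.', 'Prof.']
```

The patterns are written as raw strings (`r"..."`) so that Python passes the backslashes to the regex engine unchanged.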
Character classes
Purpose: Match elements of character sets
Pattern | Description | Example | Matches |
---|---|---|---|
[characters] | Matches any single character in the brackets | b[ae]g | bag and beg |
[^characters] | Matches any single character NOT in the brackets | b[^ae]g | Matches big, bug, b1g; does not match bag or beg |
[from-to] | Matches any single character in the range between the brackets | [0-9] | Any single digit from 0 to 9 |
 | | [a-z] | Any single lowercase letter |
 | | [A-Z] | Any single uppercase letter |
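As a quick check of the character classes above, the following Python sketch filters a small word list with `b[ae]g` and `b[^ae]g` and then combines three ranges in one pattern. The word list and the "Room7 and Desk42" string are made up for illustration.

```python
import re

words = ["bag", "beg", "big", "bug", "b1g"]

# [ae] accepts only "a" or "e" in the middle position.
in_set = [w for w in words if re.fullmatch(r"b[ae]g", w)]       # ['bag', 'beg']

# [^ae] accepts any single character except "a" or "e".
not_in_set = [w for w in words if re.fullmatch(r"b[^ae]g", w)]  # ['big', 'bug', 'b1g']

# Ranges: one uppercase letter, one or more lowercase letters, then one digit.
labels = re.findall(r"[A-Z][a-z]+[0-9]", "Room7 and Desk42")    # ['Room7', 'Desk4']
```

Each bracket expression consumes exactly one character, which is why `Desk42` yields `Desk4`: the trailing `[0-9]` takes a single digit.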
Quantifiers
Purpose: Specify how many times to match the preceding element (a quantifier always applies to the single character, character class, or group immediately before it)
Pattern | Description | Example | Matches |
---|---|---|---|
* | Zero or more occurrences | 1a* | 1, 1a, 1aa, 1aaa, etc. |
+ | One or more occurrences | po+ | In pot, matches po; in poor, matches poo |
? | Zero or one occurrence | roa?d | road, rod |
*? | Zero or more occurrences, but as few as possible | 1a*? | In 1a, 1aa and 1aaa, matches 1 |
+? | One or more occurrences, but as few as possible | po+? | In pot and poor, matches po |
?? | Zero or one occurrence, but as few as possible | roa?? | In road and rod, matches ro |
{n} | Matches the preceding pattern n times | \d{3} | Exactly 3 digits |
{n,} | Matches the preceding pattern n or more times | \d{3,} | 3 or more digits |
{n,m} | Matches the preceding pattern between n and m times | \d{3,5} | From 3 to 5 digits |
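The difference between greedy quantifiers (`*`, `+`, `?`) and their lazy forms (`*?`, `+?`, `??`) is easiest to see side by side. The sketch below uses Python's `re` module. The digit strings are made-up inputs, and the last line is an added illustration of a quantifier applied to a group.

```python
import re

# Greedy quantifiers take as many characters as they can.
greedy_star = re.search(r"1a*", "1aaa").group()    # '1aaa'
greedy_plus = re.search(r"po+", "poor").group()    # 'poo'

# Lazy quantifiers take as few characters as they can.
lazy_star = re.search(r"1a*?", "1aaa").group()     # '1'  (zero "a" is enough)
lazy_plus = re.search(r"po+?", "poor").group()     # 'po' (one "o" is enough)
lazy_optional = re.search(r"roa??", "road").group()  # 'ro'

# Counted repetition.
exact = re.findall(r"\d{3}", "1234567")            # ['123', '456']
at_least = re.findall(r"\d{3,}", "12 123 1234567")   # ['123', '1234567']
bounded = re.findall(r"\d{3,5}", "12 123 1234567")   # ['123', '12345']

# A quantifier after a group repeats the whole group, not just one character.
repeated_group = re.fullmatch(r"(ab)+", "ababab") is not None  # True
```

A lazy quantifier still expands when the rest of the pattern requires it: `1a*?b` against `1aab` matches the whole string, because `b` cannot match until both `a` characters are consumed.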
Grouping
Purpose: Capture a substring from the source string so you can perform some operation on it.
Syntax | Description | Example | Matches |
---|---|---|---|
(pattern) | Capturing group: captures a matching substring and assigns it an ordinal number | (\d+) | In 5 cats and 10 dogs, matches 5 and then 10; each match holds its digits in group 1 |
(?:pattern) | Non-capturing group: matches a group but does not capture it | (\d+)(?: dogs) | In 5 cats and 10 dogs, captures 10 |
\1 | Contents of group 1 | (\d+)\+(\d+)=\2\+\1 | 5+10=10+5 |
\2 | Contents of group 2 |
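The sketch below shows capturing groups, a non-capturing group, and backreferences in Python's `re` module. Two assumptions to note: the literal plus signs in the sum are escaped as `\+` so they are not read as quantifiers, and the variable names are made up for the example.

```python
import re

text = "5 cats and 10 dogs"

# One capturing group: each match stores its digits in group 1.
numbers = re.findall(r"(\d+)", text)             # ['5', '10']

# The non-capturing group (?: dogs) must match but is not stored.
dog_match = re.search(r"(\d+)(?: dogs)", text)
dog_count = dog_match.group(1)                   # '10'
stored_groups = dog_match.groups()               # ('10',)

# Backreferences: \2 and \1 must repeat exactly what groups 2 and 1 captured.
swapped_sum = r"(\d+)\+(\d+)=\2\+\1"
is_swapped = re.fullmatch(swapped_sum, "5+10=10+5") is not None   # True
not_swapped = re.fullmatch(swapped_sum, "5+10=5+10") is not None  # False
```

Groups are numbered by the position of their opening parenthesis, counting from 1. Non-capturing groups are skipped in that count.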
Anchors
Purpose: Specify a position in the input string at which to look for a match.
Anchor | Description | Example | Matches |
---|---|---|---|
^ | Start of string. Note: inside brackets, [^...] means "not" | ^\d+ | One or more digits at the start of the string. In 5 cats and 10 dogs, matches 5 |
$ | End of string | \d+$ | One or more digits at the end of the string. In 10 plus 5 gives 15, matches 15 |
\b | Word boundary | \bjoy\b | Matches joy as a separate word, but not in enjoyable. |
\B | NOT a word boundary | \Bjoy\B | Matches joy in enjoyable, but not as a separate word. |
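The anchors above can be checked with Python's `re` module using the table's own examples. The sentence "joy is enjoyable" is a made-up input that contains both cases.

```python
import re

# ^ ties the match to the start of the string, $ to the end.
leading = re.search(r"^\d+", "5 cats and 10 dogs").group()   # '5'
trailing = re.search(r"\d+$", "10 plus 5 gives 15").group()  # '15'

# \b requires a word boundary on each side, so only the standalone word matches.
whole_word = re.findall(r"\bjoy\b", "joy is enjoyable")      # ['joy']

# \B requires that there is NO boundary, so only the embedded "joy" matches.
inside_word = re.search(r"\Bjoy\B", "enjoyable") is not None  # True
standalone = re.search(r"\Bjoy\B", "joy") is not None         # False
```

In Python, passing `re.MULTILINE` makes `^` and `$` also match at the start and end of each line. Other engines offer a similar switch under their own names.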
Alternation (OR) construct
Purpose: Enables OR logic, so that a pattern can match any one of several alternatives.
Construct | Description | Example | Matches |
---|---|---|---|
| | Matches any single element separated by the vertical bar | (s|sh)ells | In she sells sea-shells, matches sells and shells |
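A short Python sketch of alternation follows. One detail is specific to Python's `re.findall`: when the pattern contains a capturing group, it returns the group contents rather than the whole match. The example therefore uses a non-capturing group `(?:s|sh)`, which is a choice made for this illustration. The `cat|category` line is an added example showing that alternatives are tried from left to right.

```python
import re

text = "she sells sea-shells"

# With a capturing group, findall() returns only what the group captured.
captured = re.findall(r"(s|sh)ells", text)       # ['s', 'sh']

# With a non-capturing group, findall() returns the full matches.
full = re.findall(r"(?:s|sh)ells", text)         # ['sells', 'shells']

# Alternatives are tried left to right; the first one that works wins.
first_wins = re.search(r"cat|category", "category").group()   # 'cat'
longer_first = re.search(r"category|cat", "category").group()  # 'category'
```

When one alternative is a prefix of another, listing the longer one first is a common way to get the longer match.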
Lookahead and Lookbehind
Purpose: Helpful for matching something that is or isn't followed or preceded by something else.
These expressions are sometimes called "zero-width assertions" or "zero-width matches" because they match a position rather than actual characters.
Pattern | Description | Example | Matches |
---|---|---|---|
(?=) | Positive lookahead | X(?=Y) | Matches expression X when it is followed by Y (i.e. if there is Y ahead of X) |
(?!) | Negative lookahead | X(?!Y) | Matches expression X if it is NOT followed by Y |
(?<=) | Positive lookbehind | (?<=Y)X | Matches expression X when it is preceded by Y (i.e. if there is Y behind X) |
(?<!) | Negative lookbehind | (?<!Y)X | Matches expression X when it is NOT preceded by Y |
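To make the four lookaround forms concrete, here is a minimal Python sketch in which X is a number and Y is a neighboring word or symbol. The input strings are made up for illustration. The `\b` word boundaries are an addition: without them, a negative lookaround can still succeed on part of a number (for example the 1 in 10).

```python
import re

animals = "5 cats and 10 dogs"
prices = "cost $30 or 40 euros"

# Positive lookahead: a number only if " dogs" follows. " dogs" is not part of the match.
followed = re.findall(r"\d+(?= dogs)", animals)            # ['10']

# Negative lookahead: a whole number that is NOT followed by " dogs".
not_followed = re.findall(r"\b\d+\b(?! dogs)", animals)    # ['5']

# Positive lookbehind: a number only if "$" comes right before it.
preceded = re.findall(r"(?<=\$)\d+", prices)               # ['30']

# Negative lookbehind: a whole number that is NOT preceded by "$".
not_preceded = re.findall(r"(?<!\$)\b\d+", prices)         # ['40']
```

Python's `re` module requires the pattern inside a lookbehind to have a fixed width, such as the single `\$` used here. Some other engines are more permissive.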