Regex

A regular expression (regex) is a specially encoded sequence of characters that defines a search pattern. With that pattern, you can find matching character combinations in a string or validate input data. If you are familiar with wildcard notation, you can think of regexes as an advanced version of wildcards.

Regular expressions have their own syntax consisting of special characters, operators, and constructs.
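
The two uses mentioned above, searching and validating, can be sketched as follows. This is a minimal illustration that assumes Python's built-in re module; the text itself does not name a regex engine, and the sample strings and the five-digit rule are made up for the example.

```python
import re

# Search: find the first run of digits anywhere in the string.
found = re.search(r"[0-9]+", "Order 66 shipped")
print(found.group())

# Validate: fullmatch succeeds only when the entire input fits the pattern.
def is_five_digit_code(text: str) -> bool:
    """Hypothetical rule for illustration: exactly five ASCII digits."""
    return re.fullmatch(r"[0-9]{5}", text) is not None

print(is_five_digit_code("90210"))
print(is_five_digit_code("9021a"))
```

Searching looks for the pattern anywhere in the text, while validation usually requires the whole input to match, which is why the sketch uses fullmatch for the second case.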

Characters

Purpose: Match specific characters

Pattern Description Example Matches
. Wildcard character: matches any single character except a line break .at cat, mat, pat, @at
\d Digit character: any single digit from 0 to 9 \d In W1N, matches 1
\D Any character that is NOT a digit \D In W1N, matches W and N
\s Whitespace character: space, tab, new line and carriage return .\s. In 1 space, matches 1 s
\S Any non-whitespace character \S+ In big space, matches big and space
\w Word character: any ASCII letter, digit or underscore \w+ In under_score***, matches under_score
\W Any character that is NOT an alphanumeric character or underscore \W+ In under_score***, matches ***
\t Tab
\n New line \n\d+ In the two-line string below, matches the line break followed by 3:
2 calling birds
3 French hens
\ Escapes the special meaning of a character, so you can search for it literally \. Matches a literal period (".") in a string
\w+\. Mr., Mrs., Prof.
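
The character patterns in this table can be tried out with a short script. The sketch below assumes Python's re module, which the text does not prescribe; the re.ASCII flag is passed so that \d and \w stay limited to ASCII as described, and the sentence with titles is a made-up sample.

```python
import re

# re.ASCII restricts \d, \w and \s to ASCII; without it, Python also
# accepts Unicode digits and letters.
any_char = re.findall(r".at", "cat mat pat @at")
digits = re.findall(r"\d", "W1N", re.ASCII)
non_digits = re.findall(r"\D", "W1N", re.ASCII)
around_space = re.search(r".\s.", "1 space").group()
non_space = re.findall(r"\S+", "big space")
words = re.findall(r"\w+", "under_score***", re.ASCII)
non_words = re.findall(r"\W+", "under_score***", re.ASCII)

# \n matches the line break itself, so the match text starts with it.
after_break = re.search(r"\n\d+", "2 calling birds\n3 French hens").group()

# Escaping the period makes it literal instead of "any character".
titles = re.findall(r"\w+\.", "Mr. Smith, Mrs. Jones and Prof. Lee")

for result in (any_char, digits, non_digits, around_space, non_space,
               words, non_words, repr(after_break), titles):
    print(result)
```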

Character classes

Purpose: Match elements of character sets

Pattern Description Example Matches
[characters] Matches any single character in the brackets b[ae]g bag and beg
[^characters] Matches any single character NOT in the brackets b[^ae]g Matches big, bug, b1g; does not match bag and beg
[from-to] Matches any character in the range between the brackets [0-9] Any single digit from 0 to 9
[a-z] Any single lowercase letter
[A-Z] Any single uppercase letter
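
As a quick check of the bracket syntax, the following sketch runs the table's examples through Python's re module (an assumed engine; the input strings are invented for the demonstration).

```python
import re

sample = "bag beg big bug b1g"

# [ae] accepts either listed character in the middle position.
listed = re.findall(r"b[ae]g", sample)

# [^ae] accepts any single character except the listed ones.
negated = re.findall(r"b[^ae]g", sample)

# Ranges: digits and uppercase letters in a postcode-like string.
range_digits = re.findall(r"[0-9]", "W1N 4GT")
range_upper = re.findall(r"[A-Z]", "W1N 4GT")

print(listed, negated, range_digits, range_upper)
```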

Quantifiers

Purpose: Specify how many times the preceding element must occur. A quantifier always applies to the element directly before it: a single character, a character class, or a group.

Pattern Description Example Matches
* Zero or more occurrences 1a* 1, 1a, 1aa, 1aaa, etc.
+ One or more occurrences po+ In pot, matches po; in poor, matches poo
? Zero or one occurrence roa?d road, rod
*? Zero or more occurrences, but as few as possible 1a*? In 1a, 1aa and 1aaa, matches 1
+? One or more occurrences, but as few as possible po+? In pot and poor, matches po
?? Zero or one occurrence, but as few as possible roa?? In road and rod, matches ro
{n} Matches the preceding pattern n times \d{3} Exactly 3 digits
{n,} Matches the preceding pattern n or more times \d{3,} 3 or more digits
{n,m} Matches the preceding pattern between n and m times \d{3,5} From 3 to 5 digits
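
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below assumes Python's re module and uses invented test strings; it is an illustration, not a reference for any particular engine.

```python
import re

# Greedy quantifiers take as many repetitions as they can.
greedy_star = re.findall(r"1a*", "1 1a 1aa")
greedy_plus = re.search(r"po+", "poor").group()
optional = re.findall(r"roa?d", "road rod")

# Lazy variants take as few repetitions as the rest of the pattern allows.
lazy_star = re.search(r"1a*?", "1aaa").group()   # zero "a" is enough
lazy_plus = re.search(r"po+?", "poor").group()   # one "o" is the minimum
lazy_opt = re.search(r"roa??", "road").group()   # the "a" is skipped

# Counted repetition.
exactly_3 = re.findall(r"\d{3}", "12 345 67890")
at_least_3 = re.findall(r"\d{3,}", "12 345 67890")
three_to_5 = re.findall(r"\d{3,5}", "1234567")

# A quantifier after a group repeats the whole group.
repeated_group = re.fullmatch(r"(ab)+", "ababab") is not None

print(greedy_star, greedy_plus, optional)
print(lazy_star, lazy_plus, lazy_opt)
print(exactly_3, at_least_3, three_to_5, repeated_group)
```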

Grouping

Purpose: Capture a substring of the source string so you can perform some operation with it.

Syntax Description Example Matches
(pattern) Capturing group: captures a matching substring and assigns it an ordinal number (\d+) In 5 cats and 10 dogs, matches 5 and 10; each match stores its number in group 1
(?:pattern) Non-capturing group: matches a group but does not capture it (\d+)(?: dogs) In 5 cats and 10 dogs, captures 10
\1 Contents of group 1 (\d+)\+(\d+)=\2\+\1 5+10=10+5
\2 Contents of group 2
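
Capturing groups, non-capturing groups and backreferences can be sketched like this. The example assumes Python's re module; note that the literal plus signs in the backreference pattern are escaped, because an unescaped + would be read as a quantifier.

```python
import re

text = "5 cats and 10 dogs"

# One capturing group: every match fills group 1 with its own number.
numbers = [m.group(1) for m in re.finditer(r"(\d+)", text)]

# The non-capturing group takes part in the match but is not stored.
dogs = re.search(r"(\d+)(?: dogs)", text)
whole_match = dogs.group(0)
captured = dogs.groups()

# Backreferences: \2 and \1 must repeat what groups 2 and 1 captured.
swap = r"(\d+)\+(\d+)=\2\+\1"
swapped_ok = re.fullmatch(swap, "5+10=10+5") is not None
not_swapped = re.fullmatch(swap, "5+10=5+10") is not None

print(numbers, whole_match, captured, swapped_ok, not_swapped)
```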

Anchors

Purpose: Specify a position in the input string at which to look for a match.

Anchor Description Example Matches
^ Start of string ^\d+ One or more digits at the start of the string. In 5 cats and 10 dogs, matches 5
Note: inside brackets, as in [^characters], ^ means "not"
$ End of string \d+$ One or more digits at the end of the string. In 10 plus 5 gives 15, matches 15
\b Word boundary \bjoy\b Matches joy as a separate word, but not in enjoyable.
\B NOT a word boundary \Bjoy\B Matches joy in enjoyable, but not as a separate word.
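
Because anchors match positions rather than characters, it helps to look at where each match starts. This sketch assumes Python's re module and uses the sample phrases from the table plus one invented sentence.

```python
import re

# ^ and $ tie the match to the start and the end of the string.
at_start = re.search(r"^\d+", "5 cats and 10 dogs").group()
at_end = re.search(r"\d+$", "10 plus 5 gives 15").group()

sentence = "joy is enjoyable"

# \b requires a word boundary on both sides: only the standalone word.
whole_word_starts = [m.start() for m in re.finditer(r"\bjoy\b", sentence)]

# \B requires the absence of a boundary: only "joy" inside "enjoyable".
inner_starts = [m.start() for m in re.finditer(r"\Bjoy\B", sentence)]

print(at_start, at_end, whole_word_starts, inner_starts)
```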

Alternation (OR) construct

Purpose: Enables OR logic in order to match any one of several elements.

Construct Description Example Matches
| Matches any one of the alternatives separated by the vertical bar (s|sh)ells In she sells sea-shells, matches sells and shells
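
A short sketch of alternation, assuming Python's re module. One Python-specific detail is worth noting: when a pattern contains a capturing group, re.findall returns the group contents rather than the full matches, so the example uses a non-capturing group to list whole words.

```python
import re

phrase = "she sells sea-shells"

# Non-capturing group: findall returns the complete matches.
full_matches = re.findall(r"(?:s|sh)ells", phrase)

# Capturing group: findall returns only what the group captured.
group_only = re.findall(r"(s|sh)ells", phrase)

print(full_matches, group_only)
```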

Lookahead and Lookbehind

Purpose: Match something that is or isn't followed or preceded by something else.
These expressions are sometimes called "zero-width assertions" or "zero-width matches" because they match a position rather than actual characters.

Pattern Description Example Matches
(?=) Positive lookahead X(?=Y) Matches expression X when it is followed by Y (i.e. if there is Y ahead of X)
(?!) Negative lookahead X(?!Y) Matches expression X when it is NOT followed by Y
(?<=) Positive lookbehind (?<=Y)X Matches expression X when it is preceded by Y (i.e. if there is Y behind X)
(?<!) Negative lookbehind (?<!Y)X Matches expression X when it is NOT preceded by Y
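
The four lookarounds can be sketched as below, assuming Python's re module and invented sample strings. The \b anchors in the negative cases are an addition for this example: without them, backtracking lets the engine match part of a number (for instance the 1 of 10), which is rarely what is intended.

```python
import re

animals = "5 cats and 10 dogs"
prices = "cost $30, weight 15 kg"

# Positive lookahead: digits followed by " dogs"; " dogs" is not consumed.
before_dogs = re.findall(r"\d+(?= dogs)", animals)

# Negative lookahead: whole numbers NOT followed by " dogs".
not_before_dogs = re.findall(r"\b\d+\b(?! dogs)", animals)

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(before_dogs, not_before_dogs, after_dollar, not_after_dollar)
```

In each case the returned text contains only X; the Y part is checked but never included, which is what "zero-width" refers to.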