Regex

A regular expression (regex) is a specially encoded sequence of characters that defines a search pattern. Using that pattern, you can find a matching character combination in a string or validate data input. If you are familiar with wildcard notation, you can think of regexes as an advanced version of wildcards.

Regular expressions have their own syntax consisting of special characters, operators, and constructs.
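
The two uses mentioned above, finding a match inside a string and validating an input, can be sketched in Python with the standard `re` module. This is only an illustration: the sample strings and the helper name `is_five_digits` are assumptions made for the example, not part of any product.

```python
import re

# Search: find the first run of digits anywhere in the string.
found = re.search(r"[0-9]+", "Order 66 shipped")
print(found.group())  # 66


def is_five_digits(text: str) -> bool:
    """Validate: the whole input must be exactly five digits (hypothetical helper)."""
    return re.fullmatch(r"[0-9]{5}", text) is not None


print(is_five_digits("90210"))  # True
print(is_five_digits("9021a"))  # False
```

`re.search` looks for the pattern anywhere in the string, while `re.fullmatch` succeeds only when the entire string fits the pattern, which is usually what you want for validation.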

Characters

Purpose: Match specific characters

Pattern | Description | Example | Matches
. | Wildcard character: matches any single character except a line break | .at | cat, mat, pat, @at
\d | Digit character: any single digit from 0 to 9 | \d | In W1N, matches 1
\D | Any character that is NOT a digit | \D | In W1N, matches W and N
\s | Whitespace character: space, tab, new line, or carriage return | \s. | In "1 space", matches " s" (the space and the character after it)
\S | Any non-whitespace character | \S+ | In "big space", matches big and space
\w | Word character: any ASCII letter, digit, or underscore | \w+ | In under_score***, matches under_score
\W | Any character that is NOT a letter, digit, or underscore | \W+ | In under_score***, matches ***
\t | Tab | |
\n | New line | \n\d+ | In the two-line string "2 calling birds" / "3 French hens", matches the line break followed by 3
\ | Escapes the special meaning of a character so you can search for it | \. | Matches a literal "." character in a string
\ | Escapes the special meaning of a character so you can search for it | \w+\. | Mr., Mrs., Prof.
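
To try the character patterns from the table above, here is a minimal Python sketch using the standard `re` module. The sample strings are taken from the table where possible; the sentence used for `\w+\.` is made up for the example. Note that in Python `\w` and `\W` are Unicode-aware by default, so the sketch passes `re.ASCII` to get the ASCII-only behavior described in the table.

```python
import re

wildcard = re.findall(r".at", "cat mat pat @at")
digits = re.findall(r"\d", "W1N")
non_digits = re.findall(r"\D", "W1N")
space_then_any = re.findall(r"\s.", "1 space")
non_space_runs = re.findall(r"\S+", "big space")
word_runs = re.findall(r"\w+", "under_score***", flags=re.ASCII)
non_word_runs = re.findall(r"\W+", "under_score***", flags=re.ASCII)
after_newline = re.findall(r"\n\d+", "2 calling birds\n3 French hens")
# The backslash before the period makes it a literal "." instead of a wildcard.
titles = re.findall(r"\w+\.", "Mr. and Mrs. Smith met Prof. Lee")

print(wildcard)        # ['cat', 'mat', 'pat', '@at']
print(digits)          # ['1']
print(non_digits)      # ['W', 'N']
print(space_then_any)  # [' s']
print(non_space_runs)  # ['big', 'space']
print(word_runs)       # ['under_score']
print(non_word_runs)   # ['***']
print(after_newline)   # ['\n3']
print(titles)          # ['Mr.', 'Mrs.', 'Prof.']
```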

Character classes

Purpose: Match elements of character sets

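Pattern | Description | Example | Matches
[characters] | Matches any single character in the brackets | b[ae]g | bag and beg
[^characters] | Matches any single character NOT in the brackets | b[^ae]g | Matches big, bug, b1g. Does not match bag and beg
[from-to] | Matches any single character in the range between the brackets | [0-9] | Any single digit from 0 to 9
[from-to] | Matches any single character in the range between the brackets | [a-z] | Any single lowercase letter
[from-to] | Matches any single character in the range between the brackets | [A-Z] | Any single uppercase letter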
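
The character classes above can be checked with a short Python sketch. The word list and the string "Ab1" are sample inputs chosen for the example.

```python
import re

words = "bag beg big bug b1g"

in_set = re.findall(r"b[ae]g", words)       # a or e in the middle
not_in_set = re.findall(r"b[^ae]g", words)  # anything except a or e in the middle

# Ranges: one digit, one lowercase letter, one uppercase letter.
one_digit = re.findall(r"[0-9]", "Ab1")
one_lower = re.findall(r"[a-z]", "Ab1")
one_upper = re.findall(r"[A-Z]", "Ab1")

print(in_set)      # ['bag', 'beg']
print(not_in_set)  # ['big', 'bug', 'b1g']
print(one_digit, one_lower, one_upper)  # ['1'] ['b'] ['A']
```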

Quantifiers

Purpose: Specify how many times to match the preceding element (a quantifier always applies to the character, character class, or group immediately before it)

Pattern | Description | Example | Matches
* | Zero or more occurrences | 1a* | 1, 1a, 1aa, 1aaa, etc.
+ | One or more occurrences | po+ | In pot, matches po. In poor, matches poo
? | Zero or one occurrence | roa?d | road, rod
*? | Zero or more occurrences, but as few as possible | 1a*? | In 1a, 1aa and 1aaa, matches 1
+? | One or more occurrences, but as few as possible | po+? | In pot and poor, matches po
?? | Zero or one occurrence, but as few as possible | roa?? | In road and rod, matches ro
{n} | Matches the preceding pattern exactly n times | \d{3} | Exactly 3 digits
{n,} | Matches the preceding pattern n or more times | \d{3,} | 3 or more digits
{n,m} | Matches the preceding pattern between n and m times | \d{3,5} | From 3 to 5 digits
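
The difference between greedy and lazy quantifiers is easiest to see side by side. The following Python sketch uses the examples from the table; the string "12 123 123456" is a sample input added for the counted quantifiers. Note that a lazy `*?` or `??` takes zero occurrences whenever the rest of the pattern allows it, so `1a*?` stops right after the 1.

```python
import re

# Greedy quantifiers take as many characters as they can.
star = re.findall(r"1a*", "1 1a 1aa 1aaa")
plus_greedy = re.search(r"po+", "poor").group()
optional = re.findall(r"roa?d", "road rod")

# Lazy quantifiers take as few characters as they can.
star_lazy = re.search(r"1a*?", "1aaa").group()
plus_lazy = re.search(r"po+?", "poor").group()
optional_lazy = re.search(r"roa??", "road").group()

# Counted quantifiers.
three_to_five = re.findall(r"\d{3,5}", "12 123 123456")
three_or_more = re.findall(r"\d{3,}", "12 123 123456")

print(star)           # ['1', '1a', '1aa', '1aaa']
print(plus_greedy)    # poo
print(optional)       # ['road', 'rod']
print(star_lazy)      # 1
print(plus_lazy)      # po
print(optional_lazy)  # ro
print(three_to_five)  # ['123', '12345']
print(three_or_more)  # ['123', '123456']
```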

Grouping

Purpose: Used to capture a substring from the source string, so you can perform some operation with it.

Syntax | Description | Example | Matches
(pattern) | Capturing group: captures a matching substring and assigns it an ordinal number | (\d+) | In 5 cats and 10 dogs, finds two matches, 5 and 10; each is captured as group 1 of its own match
(?:pattern) | Non-capturing group: matches a group but does not capture it | (\d+)(?: dogs) | In 5 cats and 10 dogs, matches 10 dogs and captures only 10
\1 | Contents of group 1 | (\d+)\+(\d+)=\2\+\1 | 5+10=10+5
\2 | Contents of group 2 | (\d+)\+(\d+)=\2\+\1 | 5+10=10+5
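
Here is a minimal Python sketch of capturing groups, non-capturing groups, and backreferences, using the sample strings from the table. In the backreference pattern the literal plus signs are escaped as `\+`, because an unescaped `+` would be read as a quantifier.

```python
import re

text = "5 cats and 10 dogs"

# One capturing group: every match stores its digits in group 1.
numbers = re.findall(r"(\d+)", text)

# The non-capturing group must match " dogs", but only the number is captured.
dogs = re.search(r"(\d+)(?: dogs)", text)
whole_match = dogs.group(0)
captured = dogs.groups()

# Backreferences: \2 and \1 must repeat what groups 2 and 1 captured.
swapped = re.fullmatch(r"(\d+)\+(\d+)=\2\+\1", "5+10=10+5")
not_swapped = re.fullmatch(r"(\d+)\+(\d+)=\2\+\1", "5+10=5+10")

print(numbers)                 # ['5', '10']
print(whole_match)             # 10 dogs
print(captured)                # ('10',)
print(swapped is not None)     # True
print(not_swapped is not None) # False
```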

Anchors

Purpose: Specify a position in the input string at which to look for a match.

Anchor | Description | Example | Matches
^ | Start of string. Note: inside brackets, as in [^characters], ^ means "not" | ^\d+ | One or more digits at the start of the string. In 5 cats and 10 dogs, matches 5
$ | End of string | \d+$ | One or more digits at the end of the string. In 10 plus 5 gives 15, matches 15
\b | Word boundary | \bjoy\b | Matches joy as a separate word, but not in enjoyable
\B | NOT a word boundary | \Bjoy\B | Matches joy in enjoyable, but not as a separate word
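
The anchors above match positions, not characters, which the following Python sketch illustrates by printing where each match starts. The sentence "joy is enjoyable" is a sample input made up for the example.

```python
import re

at_start = re.search(r"^\d+", "5 cats and 10 dogs").group()
at_end = re.search(r"\d+$", "10 plus 5 gives 15").group()

sentence = "joy is enjoyable"

# \b...\b finds joy only as a separate word (it starts at index 0).
whole_word_starts = [m.start() for m in re.finditer(r"\bjoy\b", sentence)]

# \B...\B finds joy only inside a longer word (it starts at index 9, in "enjoyable").
inner_starts = [m.start() for m in re.finditer(r"\Bjoy\B", sentence)]

print(at_start)           # 5
print(at_end)             # 15
print(whole_word_starts)  # [0]
print(inner_starts)       # [9]
```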

Alternation (OR) construct

Purpose: Enables OR logic, so you can match any one of several alternatives.

Construct | Description | Example | Matches
| | Matches any one of the alternatives separated by the vertical bar | (s|sh)ells | In she sells sea-shells, matches sells and shells
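
The alternation example above can be tried in Python as follows. One Python-specific detail is worth knowing: when a pattern contains a capturing group, `re.findall` returns the group contents rather than the full matches, so this sketch uses `re.finditer` and reads the whole match from each result.

```python
import re

text = "she sells sea-shells"

# Full matches: either "s" or "sh", followed by "ells".
full_matches = [m.group(0) for m in re.finditer(r"(s|sh)ells", text)]

# With a capturing group, findall returns only what the group captured.
group_only = re.findall(r"(s|sh)ells", text)

print(full_matches)  # ['sells', 'shells']
print(group_only)    # ['s', 'sh']
```

If you only need the full matches, a non-capturing group such as `(?:s|sh)ells` gives the same result with `re.findall`.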

Lookahead and Lookbehind

Purpose: Helpful for matching something that is or isn't followed or preceded by something else.
These expressions are sometimes called "zero-width assertions" or "zero-width matches" because they match a position rather than actual characters.

Pattern | Description | Example | Matches
(?=) | Positive lookahead | X(?=Y) | Matches expression X when it is followed by Y (i.e. if there is Y ahead of X)
(?!) | Negative lookahead | X(?!Y) | Matches expression X when it is NOT followed by Y
(?<=) | Positive lookbehind | (?<=Y)X | Matches expression X when it is preceded by Y (i.e. if there is Y behind X)
(?<!) | Negative lookbehind | (?<!Y)X | Matches expression X when it is NOT preceded by Y
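
To make the X and Y placeholders concrete, here is a minimal Python sketch in which X is a number and Y is a unit or currency sign. Both sample strings are made up for the example. The `\b` anchors in the negative cases are an addition of this sketch: without them the regex engine could backtrack and match just part of a number (for example, 10 out of 100) to satisfy the "NOT" condition.

```python
import re

amounts = "100 USD and 20 kg"
prices = "$30 or 40"

# Positive lookahead: digits followed by " USD" (the unit is not part of the match).
followed_by_usd = re.findall(r"\d+(?= USD)", amounts)

# Negative lookahead: whole numbers NOT followed by " USD".
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", amounts)

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(followed_by_usd)      # ['100']
print(not_followed_by_usd)  # ['20']
print(after_dollar)         # ['30']
print(not_after_dollar)     # ['40']
```

Support for lookbehind varies between regex flavors. Python's `re` module, for instance, only accepts lookbehind patterns of a fixed width.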
