Python Regular Expressions | Special Sequences

Special sequences are written as a backslash followed by another character.

The \1-99 special sequence

One useful special sequence is a backslash followed by a number between 1 and 99, such as \1. It matches the same text that the group with that number has already matched.

Example:

import re

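# \1 must repeat exactly the text captured by group 1.
pattern = r"(.+) \1"

# No match: "Dan" is followed by "D Dan", not by a second "Dan".
match = re.match(pattern, "Dan D Dan")
if match:
    print("Match 1")

# Matches "?! ?!" at the start; re.match ignores the trailing " ?".
match = re.match(pattern, "?! ?! ?")
if match:
    print("Match 2")

# Matches: "Dan", a space, then "Dan" again.
match = re.match(pattern, "Dan Dan")
if match:
    print("Match 3")

# No match: the comparison is case-sensitive, so "dan" is not "Dan".
match = re.match(pattern, "Dan dan")
if match:
    print("Match 4")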

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 2
Match 3
ddn_ro@linux:~/Desktop$

Note:
The pattern r"(.+) \1" is not the same as r"(.+) (.+)", because \1 refers to the text that the first group actually matched, not to the group's regex pattern. That is why "Dan dan" does not match.
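
The difference is easy to see by running both patterns on the same input. The following is a minimal sketch, assuming Python 3; the variable names backref and two_groups are illustrative and not part of the original example.

```python
import re

backref = r"(.+) \1"        # the second part must repeat the text captured by group 1
two_groups = r"(.+) (.+)"   # the second part may be any non-empty text

text = "Dan dan"

# The backreference fails: "dan" is not the same text as "Dan".
backref_match = re.match(backref, text)
print(backref_match)

# Two independent groups succeed and capture different text.
two_groups_match = re.match(two_groups, text)
print(two_groups_match.groups())
```

The first print shows None, while the second shows that the two groups captured "Dan" and "dan" separately.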

The special sequences \d, \s, and \w

These match digits (\d), whitespace (\s), and word characters (\w), respectively.
In ASCII mode they are equivalent to [0-9], [ \t\n\r\f\v], and [a-zA-Z0-9_], respectively.

In Unicode mode, which is the default for string patterns in Python 3, they match certain other characters as well. For instance, \w also matches letters with accents.
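
The two modes can be compared with the re.ASCII flag. This is a small sketch, assuming Python 3 and a string pattern; the sample text is made up for illustration.

```python
import re

text = "caf\u00e9 42"  # "café 42", written with an escape to avoid encoding issues

# Unicode matching (the Python 3 default for str patterns):
# the accented letter counts as a word character.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented letter ends the word.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```

With the flag, the first word is cut off before the accented letter; without it, the whole word is returned.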

Note:
The upper-case versions of these special sequences (\D, \S, and \W) match the opposite of the lower-case versions. For instance, \D matches anything that isn’t a digit.

Example:

import re

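# One or more non-digits followed by a single digit.
pattern = r"(\D+\d)"

# Matches "Hi 9".
match = re.match(pattern, "Hi 999")
if match:
    print("Match 1")

# Matches "Hi! 9"; punctuation and spaces are non-digits too.
match = re.match(pattern, "Hi! 999!")
if match:
    print("Match 2")

# Matches "Hi8".
match = re.match(pattern, "Hi8 999")
if match:
    print("Match 3")

# No match: the string starts with a digit, so \D+ cannot match there.
match = re.match(pattern, "8Hi 999")
if match:
    print("Match 4")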

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 1
Match 2
Match 3
ddn_ro@linux:~/Desktop$

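The pattern \D+\d matches one or more non-digits followed by a single digit. Match 4 is not printed because the string starts with a digit, and re.match only tries to match at the beginning of the string.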
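
To see exactly which text the group captures in each case, the match object can be inspected with group(1). This is a hedged sketch that reuses the four strings from the example; the results dictionary is only an illustration aid.

```python
import re

pattern = r"(\D+\d)"

results = {}
for text in ["Hi 999", "Hi! 999!", "Hi8 999", "8Hi 999"]:
    match = re.match(pattern, text)
    # group(1) is the text captured by the parentheses; None means no match.
    results[text] = match.group(1) if match else None
    print(repr(text), "->", repr(results[text]))
```

Only the first digit after the run of non-digits is part of the captured text, because \d appears without a quantifier.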

The special sequences \A, \Z, and \b

The sequences \A and \Z match the beginning and end of a string, respectively.
The sequence \b matches the empty string between \w and \W characters, or \w characters and the beginning or end of the string. Informally, it represents the boundary between words.
The sequence \B matches the empty string anywhere else.
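
The anchors \A and \Z, and the contrast between \b and \B, can be tried on a short string. This is a minimal sketch, assuming Python 3; the sample text and variable names are chosen only for illustration.

```python
import re

text = "cat concat"

# \A anchors at the very start of the string, \Z at the very end.
at_start = re.search(r"\Acat", text) is not None      # the string starts with "cat"
at_end = re.search(r"cat\Z", text) is not None        # the string ends with "cat"
con_at_start = re.search(r"\Acon", text) is not None  # "con" is not at the start

# \b needs a word boundary before "cat"; \B needs the absence of one.
boundary_starts = [m.start() for m in re.finditer(r"\bcat", text)]
non_boundary_starts = [m.start() for m in re.finditer(r"\Bcat", text)]

print(at_start, at_end, con_at_start)
print(boundary_starts, non_boundary_starts)
```

The \b version finds the standalone "cat" at index 0, while the \B version finds only the "cat" inside "concat" at index 7.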

Example:

import re

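# "cat" with a word boundary on each side.
pattern = r"\b(cat)\b"

# "cat" has a space on each side.
match = re.search(pattern, "The cat sat!")
if match:
    print("Match 1")

# "cat" sits between ">" and "<".
match = re.search(pattern, "We s>cat<tered?")
if match:
    print("Match 2")

# "cat" sits between the letters "s" and "t".
match = re.search(pattern, "We scattered.")
if match:
    print("Match 3")

# "cat" sits between the digit "8" and "!".
match = re.search(pattern, "We s8cat!tered.")
if match:
    print("Match 4")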

The terminal output will be:

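ddn_ro@linux:~/Desktop$ python file.py
Match 1
Match 2
ddn_ro@linux:~/Desktop$

The pattern r"\b(cat)\b" matches the word “cat” only when it is surrounded by word boundaries. Match 3 is not printed because “cat” sits between letters in “scattered”. Match 4 is not printed because the digit 8 is a word character (\w includes 0-9), so there is no boundary between “8” and “c”.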
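
Which neighbouring characters create a boundary can be checked by listing the match positions in a longer string. This is a small sketch, assuming Python 3; the sample text is invented for illustration.

```python
import re

text = "cat concat cat_1 cat!"

# Collect the start index of every "cat" that has a word boundary on both sides.
starts = [m.start() for m in re.finditer(r"\bcat\b", text)]
print(starts)
```

Only the first and last occurrences qualify. The "cat" in "concat" is preceded by a letter, and the "cat" in "cat_1" is followed by an underscore, which is also a word character.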
