Metacharacters for repetitions in Python

The following metacharacters specify how many times the preceding item may repeat: *, +, ?, and {}.

The metacharacter *

The metacharacter * means “zero or more repetitions of the previous thing”. It tries to match as many repetitions as possible. The “previous thing” can be a single character, a class, or a group of characters in parentheses.
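Here is a minimal sketch of both points, using made-up test strings. It applies * to a single character, a class, and a group, then uses the match object's group() method to show how much text each pattern consumed.

```python
import re

# * after a single character: "b" may appear zero or more times
single = re.search(r"ab*", "abbbc").group()
zero = re.search(r"ab*", "ac").group()  # zero b's still counts as a match

# * after a character class: any run of digits
digits = re.search(r"x[0-9]*", "x2024y").group()

# * after a group in parentheses: "ab" repeats as a unit
group = re.search(r"(ab)*c", "ababc").group()

# Greedy: * consumes as many repetitions as it can
greedy = re.search(r"Dan(40)*", "Dan404040!").group()

print(single, zero, digits, group, greedy)  # abbb a x2024 ababc Dan404040
```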

Example:

import re

pattern = r"Dan(40)*"

if re.search(pattern, "dan is 40 years old."):
	print("Match 1")

if re.search(pattern, "Dan is 40 years old."):
	print("Match 2")

if re.search(pattern, "Dan Dumitrache"):
	print("Match 3")

if re.search(pattern, "Dan@mail40.com"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 2
Match 3
Match 4
ddn_ro@linux:~/Desktop$

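The pattern matches any string that contains “Dan” followed by zero or more “40”s. Because re.search looks for the pattern anywhere in the string and zero repetitions are allowed, every string containing “Dan” matches. Only the first string fails, because matching is case-sensitive and “dan” is not “Dan”.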
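Whether a match occurs is only half the story. The match object's group() method returns the text that was actually matched. The sketch below reuses the pattern above on a few illustrative strings; the last one is made up for this demonstration.

```python
import re

pattern = r"Dan(40)*"
texts = ["Dan is 40 years old.", "Dan40@mail.com", "Hi, Dan4040!"]

# Map each string to the exact text found by re.search
matched = {text: re.search(pattern, text).group() for text in texts}
for text, found in matched.items():
    print(repr(text), "->", repr(found))
```

When no “40” directly follows “Dan”, the pattern still matches, but the matched text is just “Dan”. The last string also shows that re.search finds “Dan” even when it is not at the start.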

The metacharacter +

The metacharacter + is very similar to *, except it means “one or more repetitions”, as opposed to “zero or more repetitions”.
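A quick side-by-side sketch on illustrative strings shows where * and + disagree. The only difference is a string with no “40” right after “Dan”.

```python
import re

texts = ["Dan", "Dan40", "Dan4040"]

# For each string, record whether Dan(40)* and Dan(40)+ find a match
comparison = {
    text: (bool(re.search(r"Dan(40)*", text)), bool(re.search(r"Dan(40)+", text)))
    for text in texts
}
for text, (star, plus) in comparison.items():
    print(text, "* ->", star, "| + ->", plus)
```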

Example:

import re

pattern = r"Dan(40)+"

if re.search(pattern, "dan is 40 years old."):
	print("Match 1")

if re.search(pattern, "Dan is 40 years old."):
	print("Match 2")

if re.search(pattern, "Dan Dumitrache"):
	print("Match 3")

if re.search(pattern, "Dan40@mail.com"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 4
ddn_ro@linux:~/Desktop$

The pattern matches only strings that contain “Dan” immediately followed by at least one “40”. In “Dan is 40 years old.” a space separates “Dan” from “40”, so that string does not match.

The metacharacter ?

The metacharacter ? means “zero or one repetition” of the previous thing. In other words, it makes that thing optional.
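A small sketch with a made-up pattern: u? makes the letter “u” optional, so both spellings match. A doubled “u” does not match, because ? allows at most one occurrence.

```python
import re

# "u?" means zero or one "u"
pattern = r"colou?r"
results = {word: bool(re.search(pattern, word)) for word in ["color", "colour", "colouur"]}
print(results)  # {'color': True, 'colour': True, 'colouur': False}
```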

Example:

import re

pattern = r"Dan(-)?Dumitrache.?"

if re.search(pattern, "DanDumitrache."):
	print("Match 1")

if re.search(pattern, "Dan-Dumitrache."):
	print("Match 2")

if re.search(pattern, "Dan--Dumitrache"):
	print("Match 3")

if re.search(pattern, "Dan@mail--40.com"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 1
Match 2
ddn_ro@linux:~/Desktop$

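The pattern matches only “DanDumitrache” or “Dan-Dumitrache”, optionally followed by any single character (that is the role of .?). It won’t match “Dan--Dumitrache” (two hyphens) or, for example, “Dan Dumitrache” (with a space between Dan and Dumitrache).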

The metacharacters {}

Braces specify a range for the number of repetitions.
The expression {x,y} means “between x and y repetitions of the previous thing”.
Hence {0,1} is the same thing as ?.

If the first number is missing, it is taken to be zero. If the second number is missing, it is taken to be infinity.
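The sketch below uses an illustrative string to show the brace forms, including the missing-number cases. It then checks, on a handful of sample strings, that {0,}, {1,} and {0,1} behave like *, + and ?. This is a demonstration, not a proof. The helper matched() is hypothetical and exists only for this example.

```python
import re

text = "aaaa"

# {x,y}: between x and y repetitions; greedy, so it takes as many as allowed
between = re.match(r"a{2,3}", text).group()
# Missing first number is taken as zero: {,3} means 0 to 3 repetitions
up_to = re.match(r"a{,3}", text).group()
# Missing second number means no upper limit: {2,} means 2 or more
at_least = re.match(r"a{2,}", text).group()
print(between, up_to, at_least)  # aaa aaa aaaa


def matched(pattern, s):
    """Hypothetical helper: return the text re.match finds at the start of s, or None."""
    m = re.match(pattern, s)
    return m.group() if m else None


# Compare each brace form with its shorthand metacharacter
pairs = [(r"a{0,}", r"a*"), (r"a{1,}", r"a+"), (r"a{0,1}", r"a?")]
samples = ["", "a", "aaa", "baa"]
equivalent = {
    (braces, short): all(matched(braces, s) == matched(short, s) for s in samples)
    for braces, short in pairs
}
print(equivalent)
```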

Example:

import re

pattern = r"5{1,3}$"

if re.match(pattern, "5"):
	print("Match 1")

if re.match(pattern, "55"):
	print("Match 2")

if re.match(pattern, "555"):
	print("Match 3")

if re.match(pattern, "5555"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 1
Match 2
Match 3
ddn_ro@linux:~/Desktop$

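The pattern 5{1,3}$ matches strings that consist of one to three fives. re.match anchors the match at the start of the string and $ anchors it at the end, so “5555” does not match.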
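To see why the $ matters, the sketch below compares the pattern with and without it on illustrative strings. Without $, re.match is satisfied by the first three fives and ignores the rest.

```python
import re

# Without $, re.match only needs one to three fives at the start,
# so it accepts "5555" and simply stops after the first three
prefix = re.match(r"5{1,3}", "5555").group()
print(prefix)  # 555

# With $, the run of fives must also reach the end of the string
anchored = re.match(r"5{1,3}$", "5555")
print(anchored)  # None

# Any other character after the fives also defeats the $
trailing = re.match(r"5{1,3}$", "55x")
print(trailing)  # None
```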
