Character Classes in Python

A character class matches exactly one character out of a specific set. You create a character class by putting the characters it should match inside square brackets.

Example:

import re

pattern = r"[abcd]"

if re.search(pattern, "dan"):
	print("Match 1")

if re.search(pattern, "roof"):
	print("Match 2")

if re.search(pattern, "ZORO"):
	print("Match 3")

if re.search(pattern, "DAN"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 1
ddn_ro@linux:~/Desktop$

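The pattern [abcd] matches a single character that is a, b, c, or d, so the search function succeeds for any string that contains at least one of those characters. Matching is case-sensitive, which is why "DAN" does not match.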
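To see which characters the class actually picks out, here is a minimal sketch using re.findall, which returns every non-overlapping match. The variable names are only illustrative and are not part of the example above.

```python
import re

pattern = r"[abcd]"

# findall returns each single character that the class matched
matches_dan = re.findall(pattern, "dan")    # 'd' and 'a' are in the class, 'n' is not
matches_roof = re.findall(pattern, "roof")  # no character of "roof" is in the class
matches_upper = re.findall(pattern, "DAN")  # the class only lists lowercase letters

print(matches_dan)    # ['d', 'a']
print(matches_roof)   # []
print(matches_upper)  # []
```

An empty list here corresponds to re.search returning None in the example above.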

Character classes can also match ranges of characters.

The class [a-z] matches any lowercase alphabetic character.
The class [M-V] matches any uppercase character from M to V.
The class [0-9] matches any digit.

Multiple ranges can be included in one class. For example, [A-Za-z] matches a letter of any case.
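As a quick illustration of the ranges listed above, the sketch below runs each class against one sample string. The sample text and variable names are made up for this demonstration.

```python
import re

text = "Room 42, Mount View"  # hypothetical sample text

lowercase = re.findall(r"[a-z]", text)   # lowercase letters only
m_to_v = re.findall(r"[M-V]", text)      # uppercase letters from M to V
digits = re.findall(r"[0-9]", text)      # digits only
letters = re.findall(r"[A-Za-z]", text)  # two ranges in one class

print("".join(lowercase))  # oomountiew
print(m_to_v)              # ['R', 'M', 'V']
print(digits)              # ['4', '2']
print("".join(letters))    # RoomMountView
```

Note that the space and the comma are skipped by every class, because none of the ranges include them.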

Example:

import re

pattern = r"[A-Z][A-Z][A-Z][0-9]"

if re.search(pattern, "dan"):
	print("Match 1")

if re.search(pattern, "roof"):
	print("Match 2")

if re.search(pattern, "ZORO"):
	print("Match 3")

if re.search(pattern, "DAN7"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 4
ddn_ro@linux:~/Desktop$

The pattern in the example above matches strings that contain three consecutive uppercase letters immediately followed by a digit.
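Because re.search scans the whole string, the four characters can appear anywhere in it. The following sketch, with made-up input strings, shows the matched text and two inputs that fail.

```python
import re

pattern = r"[A-Z][A-Z][A-Z][0-9]"

# The match may sit in the middle of a longer string
match = re.search(pattern, "ticket ABC1 issued")
found = match.group()

# Only two uppercase letters before the digit: no match
too_short = re.search(pattern, "AB1")

# Lowercase letters are not in the range A-Z: no match
lowercase = re.search(pattern, "abc1")

print(found)      # ABC1
print(too_short)  # None
print(lowercase)  # None
```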

If you place a ^ at the start of a character class it inverts it. This causes it to match any character other than the ones included.

Note:
The metacharacters $ and . have no special meaning within character classes. The metacharacter ^ has no special meaning unless it is the first character in a class.
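The note can be checked with a small sketch. The input strings and variable names are chosen only for illustration.

```python
import re

# Inside a class, . and $ are ordinary characters
dot_dollar = re.findall(r"[.$]", "a.b$c")

# Outside a class, . matches any character (except a newline)
dot_outside = re.findall(r".", "a.b")

# ^ is not the first character in the class, so it is a literal caret
caret_literal = re.findall(r"[a^]", "a^b")

print(dot_dollar)     # ['.', '$']
print(dot_outside)    # ['a', '.', 'b']
print(caret_literal)  # ['a', '^']
```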

Example:

import re

pattern = r"[^A-Z]"

if re.search(pattern, "dan"):
	print("Match 1")

if re.search(pattern, "roof"):
	print("Match 2")

if re.search(pattern, "ZORO"):
	print("Match 3")

if re.search(pattern, "DAN"):
	print("Match 4")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
Match 1
Match 2
ddn_ro@linux:~/Desktop$

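The pattern [^A-Z] matches any single character that is not an uppercase letter. The search therefore fails only for strings made up entirely of uppercase letters, such as "ZORO" and "DAN".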
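It is worth stressing that the inverted class looks at one character at a time, so a string with a single non-uppercase character still matches. A short sketch, using made-up inputs:

```python
import re

pattern = r"[^A-Z]"

# "Dan" contains uppercase 'D', but 'a' is not uppercase, so the search succeeds
mixed = re.search(pattern, "Dan")

# A space or a digit is also "not an uppercase letter"
with_space = re.search(pattern, "ZORO 1")

# Every character is uppercase, so nothing matches
all_upper = re.search(pattern, "ZORO")

print(repr(mixed.group()))       # 'a'
print(repr(with_space.group()))  # ' '
print(all_upper)                 # None
```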

Note:
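The ^ must be the first character inside the brackets to invert the character class. Placed outside the brackets, ^ instead anchors the match to the start of the string.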
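The position of ^ changes its meaning entirely. The sketch below contrasts the caret inside the brackets with the caret in front of them; the inputs are illustrative only.

```python
import re

# ^ inside the brackets: any character that is NOT an uppercase letter
inverted = re.search(r"[^A-Z]", "Dan")

# ^ outside the brackets: an uppercase letter at the START of the string
anchored = re.search(r"^[A-Z]", "Dan")

# The string starts with a lowercase letter, so the anchored pattern fails
anchored_miss = re.search(r"^[A-Z]", "dan")

print(inverted.group())  # a
print(anchored.group())  # D
print(anchored_miss)     # None
```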
