Regular Expressions in Python | Introduction

Regular expressions are a powerful tool for various kinds of string manipulation.

The two main uses of regular expressions are:

  • Verifying that strings match a pattern
  • Performing substitutions in a string

Regular expressions are a domain-specific language (DSL) that is available as a library in most modern programming languages, not just Python. Domain-specific languages are small, highly specialized programming languages. Regular expressions are a popular example, and SQL (for database manipulation) is another.

Note:
Private domain-specific languages are often used for specific industrial purposes.

Regular expressions in Python can be accessed using the re module, which is part of the standard library.

After you’ve defined a regular expression, the re.match() function can be used to determine whether it matches at the beginning of a string. If it does, match() returns an object representing the match; if not, it returns None.

Example:

import re

pattern = r"Dan"

if re.match(pattern, "Dan is a web developer!"):
	print("Match")
else:
	print("No match")

The terminal output will be:

ddn_ro@linux:~/Desktop$ python3 file.py
Match
ddn_ro@linux:~/Desktop$

The above example checks if the pattern “Dan” matches the string and prints “Match” if it does.
Here the pattern is a simple word, but various characters have a special meaning when they are used in a regular expression.
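
As a small illustration of such special characters, the sketch below uses two common ones: the dot, which matches any single character except a newline, and \d, which matches a single digit. The sample strings and variable names are made up for this example.

```python
import re

# "." matches any single character (except a newline by default),
# so this pattern matches "Dan", "Don", "Den" and so on.
dot_pattern = r"D.n"
dot_match = re.match(dot_pattern, "Don is a web developer!")
print(dot_match is not None)  # True

# "\d" matches a single digit; findall() collects every match.
digits = re.findall(r"\d", "Dan has 2 laptops and 3 monitors.")
print(digits)  # ['2', '3']
```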

Other functions to match patterns are re.search() and re.findall().
The function re.search() finds a match of a pattern anywhere in the string.
The function re.findall() returns a list of all substrings that match a pattern.

Example:

import re

pattern = r"developer"

if re.match(pattern, "Dan is a web developer!"):
	print("Match")
else:
	print("No match")

if re.search(pattern, "Dan is a web developer!"):
	print("Match")
else:
	print("No match")

print(re.findall(pattern, "Dan is a web developer!"))

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
No match
Match
['developer']
ddn_ro@linux:~/Desktop$

In the example above, the match() function did not match the pattern, because it only looks at the beginning of the string.
The search() function found a match further along in the string.
The function re.finditer() finds the same matches as re.findall(), except that it returns an iterator of match objects rather than a list of strings.
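
The difference between the two is easier to see when a pattern matches more than once. The following is a minimal sketch; the single-letter pattern is chosen only for illustration, and start() on a match object gives the index where that match begins.

```python
import re

pattern = r"e"
text = "Dan is a web developer!"

# findall() returns a list of the matched substrings.
all_matches = re.findall(pattern, text)
print(all_matches)  # ['e', 'e', 'e', 'e']

# finditer() yields one match object per match, so extra details
# such as the position of each match are available.
positions = [m.start() for m in re.finditer(pattern, text)]
print(positions)  # [10, 14, 16, 20]
```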

The re.search() function returns a match object with several methods that give details about the match.
The group() method returns the string that was matched.
The start() and end() methods return the start and end positions of the match.
The span() method returns the start and end positions of the match as a tuple.

Example:

import re

pattern = r"dev"
match = re.search(pattern, "webdeveloper")

if match:
	print(match.group())
	print(match.start())
	print(match.end())
	print(match.span())

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
dev
3
6
(3, 6)
ddn_ro@linux:~/Desktop$
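
The positions reported by a match object line up with ordinary string slicing: the end position is exclusive, so slicing the original string with the two values from span() gives back the matched text. A short sketch, reusing the same string and pattern:

```python
import re

text = "webdeveloper"
match = re.search(r"dev", text)

if match:
    start, end = match.span()
    # end is exclusive, so this slice equals match.group().
    print(text[start:end])  # dev
    print(text[start:end] == match.group())  # True
```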

One of the most important functions in the re module is sub().

#Syntax:
re.sub(pattern, repl, string, count=0)

This function replaces occurrences of pattern in string with repl. All occurrences are replaced unless count is given, in which case at most count replacements are made. It returns the modified string.

Example:

import re

text = "Dan is a web developer."
pattern = r"web developer"
# Replace every occurrence of the pattern with "teacher".
new_text = re.sub(pattern, "teacher", text)
print(new_text)

The terminal output will be:

ddn_ro@linux:~/Desktop$ python3 file.py
Dan is a teacher.
ddn_ro@linux:~/Desktop$
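
To limit the number of replacements, Python's re.sub() accepts a keyword argument named count. The sketch below compares the default behaviour with count=1; the sample string is invented for the example.

```python
import re

text = "web web web"

# Without count, every occurrence is replaced.
replaced_all = re.sub(r"web", "app", text)
print(replaced_all)  # app app app

# With count=1, only the first occurrence is replaced.
replaced_first = re.sub(r"web", "app", text, count=1)
print(replaced_first)  # app web web
```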
