Python Regular Expressions | Email Extraction

The following program extracts email addresses from a string.

Take the text “Contact me at info@saigon.ro for more information”.
The goal is to extract the substring “info@saigon.ro”.

A basic email address starts with a word, which may include dots or dashes. This is followed by the @ sign and the domain (a name, a dot, and a suffix such as "ro").

This structure is the basis for our regular expression:

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"

The character class [\w\.-]+ matches one or more word characters (letters, digits, or underscores), dots, or dashes.
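One way to check what the character class accepts is to test it on its own with re.fullmatch, which succeeds only when the whole string matches. The sample strings below are invented for illustration. Note that for str patterns in Python 3, \w also matches Unicode letters.

```python
import re

# The character class used for the first two groups: word characters, dots, or dashes
part = r"[\w\.-]+"

# fullmatch succeeds only if every character of the string is allowed
print(bool(re.fullmatch(part, "first.last-name_2")))  # True
print(bool(re.fullmatch(part, "first+last")))         # False: '+' is not in the class
```

As a consequence, addresses with a "+" in the first part are not matched in full by this simple pattern.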

In words, the regex says that the string must contain a word (dots and dashes allowed), followed by the @ sign, then another such word, then a dot followed by one or more word characters or dots (the suffix).
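A small sketch of that requirement in practice, using made-up inputs: without a dot after the @ sign, or without the @ sign at all, re.search finds nothing.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"

# The third group requires a dot after the domain name
print(re.search(pattern, "info@saigon.ro") is not None)  # True
print(re.search(pattern, "info@saigon") is not None)     # False: no dot after @
print(re.search(pattern, "info.saigon.ro") is not None)  # False: no @ sign
```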

Our regex contains three groups:

  • the part before the @ sign
  • the domain name without the suffix
  • the domain suffix, including its leading dot
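The three groups can be read back from the match object. This sketch uses the sample text from this post plus one invented multi-level domain.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
match = re.search(pattern, "Contact me at info@saigon.ro for more information")

# groups() returns all three captured parts; group(n) returns one (numbered from 1)
print(match.groups())   # ('info', 'saigon', '.ro')
print(match.group(1))   # info

# The greedy second group gives back characters only until a dot can start the suffix,
# so with a multi-level domain the suffix group gets the part after the last dot
print(re.search(pattern, "a@mail.example.com").groups())  # ('a', 'mail.example', '.com')
```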

Putting it all together:

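import re

# Three capturing groups: part before @, domain name, domain suffix
pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Contact me at info@saigon.ro for more information"

# re.search returns the first match, or None if there is no match
match = re.search(pattern, text)
if match:
    # group() with no argument returns the whole matched address
    print(match.group())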

The terminal output will be:

ddn_ro@linux:~/Desktop$ python file.py
info@saigon.ro
ddn_ro@linux:~/Desktop$

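If the string contains several email addresses, use re.findall or re.finditer instead of re.search to extract all of them. Because this pattern contains groups, re.findall returns a tuple of the three groups for each address rather than the full address; re.finditer yields match objects, so match.group() still returns the whole address.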
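A minimal sketch of both options; the text with two addresses is invented for illustration.

```python
import re

# Same pattern as above: part before @, domain name, domain suffix
pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Write to info@saigon.ro or sales@example.com for details"

# With capturing groups, findall returns one tuple of groups per match
print(re.findall(pattern, text))
# [('info', 'saigon', '.ro'), ('sales', 'example', '.com')]

# finditer yields match objects, so group() gives each full address
addresses = [m.group() for m in re.finditer(pattern, text)]
print(addresses)  # ['info@saigon.ro', 'sales@example.com']
```

Another option is to turn the groups into non-capturing ones, (?:...), in which case re.findall returns the full matches directly.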

Note:
The regex in this example is for demonstration purposes only.
A much more complex regex is required to fully validate an email address.
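Two made-up inputs illustrate the limits of this simple pattern: a period that ends the sentence is swallowed, and an invalid domain with consecutive dots is accepted.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"

# The suffix class [\w\.]+ also accepts dots, so a sentence-ending period is captured
print(re.search(pattern, "Write to info@saigon.ro.").group())  # info@saigon.ro.

# Consecutive dots in the domain are accepted, although such a domain is invalid
print(re.search(pattern, "a@b..c").group())  # a@b..c
```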
