The following program extracts email addresses from a string.
Consider the text “Contact me at firstname.lastname@example.org for more information”.
The goal is to extract the substring “firstname.lastname@example.org”.
A basic email address consists of a word, which may include dots or dashes, followed by the @ sign and the domain name (a name, a dot, and a suffix such as org or com).
This is the basis for building our regular expression.
pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
[\w\.-]+ matches one or more word characters (letters, digits, and underscores), dots, or dashes.
In other words, the regex says that the string should contain a word (dots and dashes allowed), followed by the @ sign, then another such word, then a dot and a final word.
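To make the three parts easier to read, the same pattern can be written with Python's re.VERBOSE flag, which ignores unescaped whitespace and #-comments outside character classes. The sketch below should behave exactly like the one-line pattern; the constant name EMAIL_PATTERN is just an illustrative choice.

```python
import re

# Equivalent to r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)", one part per line.
# In verbose mode, whitespace and '#' comments outside character classes are ignored.
EMAIL_PATTERN = re.compile(
    r"""
    ([\w\.-]+)    # group 1: word characters, dots or dashes before the @
    @             # the literal @ sign
    ([\w\.-]+)    # group 2: the domain name, dots and dashes allowed
    (\.[\w\.]+)   # group 3: a dot followed by the suffix, e.g. .org
    """,
    re.VERBOSE,
)

text = "Contact me at firstname.lastname@example.org for more information"
match = EMAIL_PATTERN.search(text)
if match:
    print(match.group())  # firstname.lastname@example.org
```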
Our regex contains three groups:
- the part before the @ sign
- the domain name without the suffix
- the domain suffix, including its leading dot
Putting it all together:
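```python
import re

# Three groups: the part before the @, the domain name, and the suffix
pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Contact me at firstname.lastname@example.org for more information"

# re.search returns the first match, or None if nothing matches
match = re.search(pattern, text)
if match:
    print(match.group())  # the whole match (group 0)
```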
The terminal output will be:
```
ddn_ro@linux:~/Desktop$ python file.py
firstname.lastname@example.org
ddn_ro@linux:~/Desktop$
```
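Besides the whole match, each group can be read on its own: match.group(n) returns group n, and match.groups() returns all of them as a tuple. A short sketch reusing the pattern and text from the program above; the second sample string is made up to show how the greedy second group handles a multi-part domain.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Contact me at firstname.lastname@example.org for more information"

match = re.search(pattern, text)
if match:
    print(match.group(1))  # firstname.lastname
    print(match.group(2))  # example
    print(match.group(3))  # .org
    print(match.groups())  # ('firstname.lastname', 'example', '.org')

# Group 2 is greedy and gives back only as much as group 3 needs,
# so with a multi-part domain group 3 captures just the final segment.
sub = re.search(pattern, "Mail a@mail.example.co.uk now")
print(sub.groups())  # ('a', 'mail.example.co', '.uk')
```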
If the string contains multiple email addresses, use re.findall instead of re.search to extract all of them. Keep in mind that when the pattern contains groups, re.findall returns the groups of each match as a tuple, not the full address.
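The sketch below runs three variants on a made-up string with two addresses: re.findall with the grouped pattern (one tuple of groups per address), re.finditer (whose match objects return the full address via group()), and re.findall with the groups removed from the pattern.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Write to alice@example.com or bob.smith@mail.example.org today"

# With capturing groups, findall returns one tuple of groups per match
groups = re.findall(pattern, text)
print(groups)
# [('alice', 'example', '.com'), ('bob.smith', 'mail.example', '.org')]

# finditer yields match objects; group() is the full match (group 0)
emails = [m.group() for m in re.finditer(pattern, text)]
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']

# Without groups, findall returns the full matches directly
plain = re.findall(r"[\w\.-]+@[\w\.-]+\.[\w\.]+", text)
print(plain)  # ['alice@example.com', 'bob.smith@mail.example.org']
```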
The regex in this example is for demonstration purposes only.
A much more complex regex is required to fully validate an email address.
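A few inputs show where this demonstration regex falls short. The sample strings below are made up for illustration: plus-addressing is cut off at the +, consecutive dots in the local part are accepted, and a sentence-ending period is swallowed into the suffix.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"

# '+' is not in [\w\.-], so the match starts after the plus sign
plus = re.search(pattern, "user+tag@example.com").group()
print(plus)  # tag@example.com

# Consecutive dots are accepted, although an unquoted local part may not contain them
dots = re.search(pattern, "a..b@example.com").group()
print(dots)  # a..b@example.com

# [\w\.] also matches '.', so a period ending the sentence becomes part of the match
period = re.search(pattern, "Write to bob@example.com.").group()
print(period)  # bob@example.com.
```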