
Amr Tarek
Software Engineer


Mastering Regular Expressions (Regex) from Scratch - A Comprehensive Guide
Regular Expressions, or Regex, are a powerful tool for text processing and pattern matching. They are widely used for search, validation, text manipulation, and data extraction tasks across different programming languages, making them an essential skill for developers and data professionals alike.
We will go over Regex from the basics to more advanced concepts, using clear examples to help you apply what you learn.
At its core, a Regex is a sequence of characters that forms a search pattern. This pattern can be used to match sequences of characters within a larger text. For instance, if you want to find all the dates in a document, a single pattern can locate every date written in a given format in one pass.
Regex Syntax
Literal Characters:
Regex can match specific characters exactly. For example:
/cat/
matches the character sequence "cat" wherever it appears in the text, including inside longer words such as "concatenate".
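As a small illustration in Python (the sample sentences are made up for this sketch), a literal pattern finds its characters both as a standalone word and inside a longer one:

```python
import re

# A literal pattern matches that exact run of characters wherever it occurs.
literal = re.compile(r"cat")

standalone = literal.search("The cat sat down.")
embedded = literal.search("Please concatenate the files.")

print(standalone.group())  # the matched text
print(embedded.span())     # where "cat" sits inside "concatenate"
```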
Meta Characters:
Meta characters have special meanings in Regex. Here are some of the most common ones:
- . (dot): Matches any single character except a newline.
- ^: Matches the start of the string (or of each line in multiline mode).
- $: Matches the end of the string (or of each line in multiline mode).
- \ (backslash): Escapes a metacharacter, treating it as a literal character. For instance, \. matches a literal dot instead of any character.
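The metacharacters above can be tried out with Python's re module. This is a minimal sketch with made-up sample strings. Note that in Python, ^ and $ anchor to the whole string unless re.MULTILINE is passed:

```python
import re

# "." matches any single character except a newline.
dot_matches = re.findall(r"c.t", "cat cot c\nt")
print(dot_matches)

# "^" anchors to the start of the string by default;
# re.MULTILINE makes it apply to the start of each line.
lines = "first line\nsecond line"
default_start = re.findall(r"^\w+", lines)
multiline_start = re.findall(r"^\w+", lines, re.MULTILINE)
print(default_start, multiline_start)

# Escaping the dot makes it match only a literal ".".
escaped = re.findall(r"3\.14", "3.14 and 3x14")
unescaped = re.findall(r"3.14", "3.14 and 3x14")
print(escaped, unescaped)
```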
Character Classes
Character classes allow you to match specific sets of characters. Some common ones include:
- [abc]: Matches any one of a, b, or c.
- [^abc]: Matches any character that is not a, b, or c.
- [a-z]: Matches any lowercase letter from a to z.
- [0-9]: Matches any digit.
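A short Python sketch of these bracket classes, using sample strings invented for illustration:

```python
import re

text = "cab 42 Zebra"

# [abc] matches one character at a time from the set.
any_of_abc = re.findall(r"[abc]", text)

# [^abc] matches any single character outside the set.
not_abc = re.findall(r"[^abc]", "cab-xyz")

# Ranges combine with quantifiers: runs of lowercase letters.
lowercase_runs = re.findall(r"[a-z]+", text)

# [0-9] matches one digit per match.
digits = re.findall(r"[0-9]", text)

print(any_of_abc, not_abc, lowercase_runs, digits)
```

Because [a-z] covers only lowercase letters, the capital Z in "Zebra" is skipped and the run starts at "e".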
Predefined Character Classes:
- \d: Matches any digit (equivalent to [0-9]).
- \D: Matches any non-digit.
- \w: Matches any word character (alphanumeric + underscore).
- \W: Matches any non-word character.
- \s: Matches any whitespace character (spaces, tabs, etc.).
- \S: Matches any non-whitespace character.
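The shorthand classes can be sketched in Python as follows (the sample string is hypothetical). One caveat specific to Python: with ordinary text patterns, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is used, so \d is only strictly equivalent to [0-9] in ASCII mode.

```python
import re

sample = "Order 66: ship_to = Bay 7"

digit_runs = re.findall(r"\d+", sample)     # runs of digits
word_runs = re.findall(r"\w+", sample)      # letters, digits, underscore
non_space_runs = re.findall(r"\S+", sample) # anything but whitespace

# \s+ is handy for collapsing mixed whitespace into single spaces.
collapsed = re.sub(r"\s+", " ", "a \t b\n c")

print(digit_runs, word_runs, non_space_runs, repr(collapsed))
```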
Quantifiers
Quantifiers specify how many times a character or group should be matched.
- *: Matches 0 or more occurrences.
- +: Matches 1 or more occurrences.
- ?: Matches 0 or 1 occurrence.
- {n}: Matches exactly n occurrences.
- {n,}: Matches n or more occurrences.
- {n,m}: Matches between n and m occurrences.
Example:
The pattern a{2,4} will match aa, aaa, or aaaa.
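To see the quantifiers in action, here is a minimal Python sketch. It uses re.fullmatch to test whole strings against a{2,4}, and re.search to show that quantifiers are greedy by default (they take as many characters as allowed). The test strings are made up for illustration.

```python
import re

pattern = re.compile(r"a{2,4}")

# fullmatch succeeds only if the entire string fits the pattern.
results = {s: bool(pattern.fullmatch(s)) for s in ["a", "aa", "aaa", "aaaa", "aaaaa"]}
print(results)

# Inside a longer run, the greedy quantifier grabs the maximum of four.
greedy = pattern.search("aaaaa").group()
print(greedy)

# "?" makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour")

# "*" allows zero occurrences, so "ab*" also matches a lone "a".
zero_ok = re.fullmatch(r"ab*", "a") is not None

print(optional, zero_ok)
```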
Grouping and Capturing
Groups are created with parentheses and allow parts of your pattern to be treated as a unit.
- (abc): Matches abc as a group.
- (?:abc): Matches abc but does not capture it (non-capturing group).
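The difference between capturing and non-capturing groups shows up in what the match object hands back. A small Python sketch, with patterns and sample strings chosen only for illustration:

```python
import re

# Two capturing groups: each one is available separately afterwards.
capturing = re.search(r"(\d{4})-(\d{2})", "Released 2024-05")
year_month = capturing.groups()

# (?:...) groups the alternatives without capturing them,
# so only the name ends up in groups().
non_capturing = re.search(r"(?:Mr|Ms)\. (\w+)", "Dear Ms. Lovelace")
names = non_capturing.groups()

# A group can be quantified as a unit.
repeated_unit = re.fullmatch(r"(?:ab)+", "ababab") is not None

print(year_month, names, repeated_unit)
```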
Back references:
Captured groups can be reused later in the expression using back references.
- Example: (\w+)\s\1 matches a repeated word, like "hello hello".
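The back reference pattern can be checked in Python. One thing worth knowing: as written, the pattern can also match across word fragments (the "is" at the end of "this" followed by "is"). The stricter variant below, which adds \b word boundaries, is an assumption of this sketch and not part of the original pattern.

```python
import re

# The pattern from the example above.
doubled = re.compile(r"(\w+)\s\1")

# A stricter variant (an addition for this sketch) that requires whole words.
doubled_strict = re.compile(r"\b(\w+)\s\1\b")

plain_hit = doubled.search("hello hello world").group()

# Without boundaries, the tail of "this" plus "is" counts as a repeat.
fragment_hit = doubled.search("this is fine").group()

# With boundaries, that false positive disappears.
strict_miss = doubled_strict.search("this is fine")
strict_hit = doubled_strict.search("hello hello world").group()

print(plain_hit, fragment_hit, strict_miss, strict_hit)
```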
Anchors and Boundaries
Anchors and boundaries restrict the position of the match.
- ^: Start of a line.
- $: End of a line.
- \b: Word boundary (e.g., \bword\b matches "word" as a whole word, not within "sword").
- \B: Non-word boundary.
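A quick Python sketch of boundaries and anchors, using a sample sentence invented for this purpose:

```python
import re

text = "A sword is not a word; password neither."

# \bword\b matches only the standalone word.
whole_words = re.findall(r"\bword\b", text)

# \Bword requires a word character right before "word",
# so it matches inside "sword" and "password" only.
inside_words = re.findall(r"\Bword", text)

# ^ and $ tie the match to the start and end of the text.
starts_with_a = re.search(r"^A\b", text) is not None
ends_with_neither = re.search(r"neither\.$", text) is not None

print(whole_words, inside_words, starts_with_a, ends_with_neither)
```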
Common Regex Patterns
Here are some real-world patterns that show how Regex is used to match typical formats:
- Email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
- Phone Number: /\(\d{3}\) \d{3}-\d{4}/ (matches the format (123) 456-7890)
- URL: /https?:\/\/(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
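The three patterns above can be exercised in Python. The surrounding slashes are delimiters from languages such as JavaScript and are dropped here, and the forward slashes in the URL pattern need no escaping in Python. The sample sentence and the constant names are made up for this sketch.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
URL = re.compile(r"https?://(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

sample = "Call (123) 456-7890, mail info@example.org, or visit https://www.example.com/docs."

email_hit = EMAIL.search(sample).group()
phone_hit = PHONE.search(sample).group()
url_hit = URL.search(sample).group()

print(email_hit)
print(phone_hit)
print(url_hit)
```

Note that the URL pattern stops at the domain: the /docs path in the sample is not part of the match. These patterns are practical approximations and do not cover every valid email address or URL.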
Practical Examples
- Extracting All Emails from Text
If you have a document with scattered emails, use:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- Validating a Phone Number
For a standard U.S. phone format, use:
\(\d{3}\) \d{3}-\d{4}
- Finding Dates in the Format YYYY-MM-DD
A pattern to match dates could be:
\d{4}-\d{2}-\d{2}
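Two practical points are easy to miss with the patterns above. For validation, the whole input should match, which re.fullmatch handles in Python. And the date pattern only checks the shape of the text, so something like 2024-13-45 still matches. The sketch below shows one possible way to handle both; the helper names is_valid_phone and find_real_dates are hypothetical, and the calendar check via datetime.strptime is an addition, not part of the original patterns.

```python
import re
from datetime import datetime

PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_phone(value):
    """True only if the entire string is in the (123) 456-7890 format."""
    return PHONE.fullmatch(value) is not None


def find_real_dates(text):
    """Return YYYY-MM-DD matches that are also real calendar dates."""
    real = []
    for candidate in DATE.findall(text):
        try:
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue  # right shape, but not an actual date
        real.append(candidate)
    return real


print(is_valid_phone("(123) 456-7890"))
print(is_valid_phone("call (123) 456-7890 now"))
print(find_real_dates("Due 2024-02-29, not 2024-13-45."))
```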
Putting It All Together: Building a Regex Tool
Creating a small Regex testing tool in your favorite programming language (like Python) can help you practice. Here’s a quick snippet in Python:
import re
# Text to search
text = "Contact us at support@example.com or sales@example.com."
# Regex pattern to find emails
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
# Find all matches
emails = re.findall(pattern, text)
print("Emails found:", emails)
This code will output:
Emails found: ['support@example.com', 'sales@example.com']
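To take the snippet one step closer to a reusable testing tool, one option is a small helper that accepts any pattern and reports each match with its position. This is only a sketch: the function name try_pattern and its return shape are assumptions made for illustration.

```python
import re


def try_pattern(pattern, text, flags=0):
    """Return (matched_text, start, end) for every match of pattern in text."""
    try:
        compiled = re.compile(pattern, flags)
    except re.error as exc:
        # Surface bad patterns with a clearer message.
        raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc
    return [(m.group(0), m.start(), m.end()) for m in compiled.finditer(text)]


results = try_pattern(r"\d{4}-\d{2}-\d{2}", "Shipped 2024-01-15, due 2024-02-01.")
for matched, start, end in results:
    print(f"{matched!r} at {start}-{end}")
```

Seeing the start and end offsets makes it easier to spot when a pattern matches more, or less, than intended.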
Regex is an incredibly versatile tool that can save hours in text processing tasks. With practice, you’ll be able to craft patterns that quickly filter, validate, and extract data from text. Dive into these basics, experiment with patterns, and explore advanced features like lookahead/lookbehind assertions and conditionals as you grow more confident.
Happy Regexing!