Regular Expressions, or Regex, are a powerful tool for text processing and pattern matching. They are widely used for search, validation, text manipulation, and data extraction tasks across different programming languages, making them an essential skill for developers and data professionals alike.
We will go over Regex from the basics to more advanced concepts, using clear examples to help you apply what you learn.
At its core, Regex is a sequence of characters that forms a search pattern. This pattern can be used to match sequences of characters within a larger text. For instance, if you want to find every date in a document, a single Regex pattern can locate all dates written in a given format in one go.
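To make this concrete, here is a minimal Python sketch using the standard re module, the same module the testing snippet later in this post uses. It assumes dates are written as YYYY-MM-DD. The pattern's pieces (\d and {n}) are explained in the sections that follow, and the sample text is made up for illustration.

```python
import re

# Sample text (made up for illustration) with dates in two different styles.
text = "Released 2023-01-15, patched 2023-02-01, retired in March 2024."

# \d{4}-\d{2}-\d{2}: four digits, a hyphen, two digits, a hyphen, two digits.
date_pattern = r"\d{4}-\d{2}-\d{2}"

# findall returns every non-overlapping match, in order of appearance.
dates = re.findall(date_pattern, text)
print(dates)  # ['2023-01-15', '2023-02-01']
```

"March 2024" is not found. A pattern describes one specific format, so each additional date style needs its own pattern.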
Regex can match specific characters exactly. For example:
/cat/
matches the characters "cat" wherever they appear in the text, including inside longer words such as "concatenate".
Metacharacters have special meanings in Regex. Here are some of the most common ones:
. (dot): Matches any single character except a newline.
^: Matches the start of the input, or the start of each line in multiline mode.
$: Matches the end of the input, or the end of each line in multiline mode.
\ (backslash): Escapes a metacharacter, treating it as a literal character. For instance, \. matches a literal dot instead of any character.
Character classes allow you to match specific sets of characters. Some common ones include:
[abc]: Matches any one of a, b, or c.
[^abc]: Matches any character that is not a, b, or c.
[a-z]: Matches any lowercase letter from a to z.
[0-9]: Matches any digit.
\d: Matches any digit (equivalent to [0-9] for ASCII input).
\D: Matches any non-digit.
\w: Matches any word character (alphanumeric + underscore).
\W: Matches any non-word character.
\s: Matches any whitespace character (spaces, tabs, etc.).
\S: Matches any non-whitespace character.
Quantifiers specify how many times a character or group should be matched.
*: Matches 0 or more occurrences.
+: Matches 1 or more occurrences.
?: Matches 0 or 1 occurrence.
{n}: Matches exactly n occurrences.
{n,}: Matches n or more occurrences.
{n,m}: Matches between n and m occurrences.
Example: The pattern a{2,4} will match aa, aaa, or aaaa.
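Here is a short Python sketch that uses re.findall to exercise a few of the metacharacters, character classes, and quantifiers above. The sample strings are arbitrary and chosen only to show each behavior.

```python
import re

# A quantifier applies to the token right before it. {2,4} is greedy,
# so a run of five "a"s yields "aaaa" and the leftover "a" is too short.
runs = re.findall(r"a{2,4}", "a aa aaa aaaaa")

# ? makes the preceding "u" optional, so both spellings match.
spellings = re.findall(r"colou?r", "color or colour")

# \d+ matches one or more digits in a row.
numbers = re.findall(r"\d+", "Order 66 shipped 3 items")

# An unescaped dot matches any character; \. matches only a literal dot.
any_char = re.findall(r"3.14", "3.14 and 3x14")
literal_dot = re.findall(r"3\.14", "3.14 and 3x14")

# [^aeiou\s] is a negated class: anything except a vowel or whitespace.
consonants = re.findall(r"[^aeiou\s]", "regex is fun")

print(runs)         # ['aa', 'aaa', 'aaaa']
print(spellings)    # ['color', 'colour']
print(numbers)      # ['66', '3']
print(any_char)     # ['3.14', '3x14']
print(literal_dot)  # ['3.14']
print(consonants)   # ['r', 'g', 'x', 's', 'f', 'n']
```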
Groups are created with parentheses and allow parts of your pattern to be treated as a unit.
(abc): Matches abc as a capturing group.
(?:abc): Matches abc but does not capture it (non-capturing group).
Back references:
Captured groups can be reused later in the expression using back references. For example, (\w+)\s\1 matches a repeated word, like "hello hello". Without word boundaries it can also match across word edges, such as the "is is" inside "this is"; \b(\w+)\s\1\b avoids that.
Anchors and boundaries restrict the position of the match.
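^: Start of the input, or of each line in multiline mode.
$: End of the input, or of each line in multiline mode.
\b: Word boundary (e.g., \bword\b matches "word" as a whole word, not within "sword").
\B: Non-word boundary (e.g., \Bword matches the "word" inside "sword").
Here are some real-world patterns that show how Regex is used to match typical formats: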
Email address: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
Phone number: /\(\d{3}\) \d{3}-\d{4}/ (matches the format (123) 456-7890)
URL: /https?:\/\/(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
Date (YYYY-MM-DD): /\d{4}-\d{2}-\d{2}/
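When one of these patterns is used for validation rather than search, the whole input should match, not just part of it. The sketch below assumes Python's re.fullmatch for that check. By contrast, re.search would accept "call (123) 456-7890 now". The FORMATS dictionary and the is_valid helper are hypothetical names chosen for illustration. The \/ escapes are dropped because they are only needed when / delimits the pattern, as in the /.../ notation above.

```python
import re

# Hypothetical lookup table built from the patterns above.
# In a Python string, "/" needs no escaping, so "\/" becomes "/".
FORMATS = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "phone": r"\(\d{3}\) \d{3}-\d{4}",
    "url": r"https?://(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "date": r"\d{4}-\d{2}-\d{2}",
}


def is_valid(kind: str, value: str) -> bool:
    """Return True only if the entire value matches the named format."""
    # fullmatch fails unless the pattern covers the string from start to end.
    return re.fullmatch(FORMATS[kind], value) is not None


print(is_valid("phone", "(123) 456-7890"))           # True
print(is_valid("phone", "call (123) 456-7890 now"))  # False
print(is_valid("url", "https://www.example.com"))    # True
print(is_valid("date", "2024-13-45"))                # True: shape only
```

The last line shows a limit of these patterns: they check the shape of the text, not whether the value is real. A date validator would still need a calendar check, for example with datetime.strptime.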
Creating a small Regex testing tool in your favorite programming language (like Python) can help you practice. Here’s a quick snippet in Python:
import re
# Text to search
text = "Contact us at support@example.com or sales@example.com."
# Regex pattern to find emails
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
# Find all matches
emails = re.findall(pattern, text)
print("Emails found:", emails)
This code will output:
Emails found: ['support@example.com', 'sales@example.com']
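The same re module supports the groups, back references, and word boundaries covered earlier. This minimal sketch reuses the sample text from the snippet above, and the other sample strings are made up for illustration. When a pattern contains capturing groups, re.findall returns the captured groups instead of the whole match. This is one practical reason to reach for non-capturing (?:...) groups.

```python
import re

text = "Contact us at support@example.com or sales@example.com."

# Two capturing groups: findall returns a (user, domain) tuple per match.
pairs = re.findall(r"([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})", text)
print(pairs)  # [('support', 'example.com'), ('sales', 'example.com')]

# With a capturing group, findall reports only the group's last repetition;
# a non-capturing group lets it return the whole match instead.
print(re.findall(r"(ab)+", "ababab ab"))    # ['ab', 'ab']
print(re.findall(r"(?:ab)+", "ababab ab"))  # ['ababab', 'ab']

# Back reference \1 repeats whatever group 1 matched.
# Without \b, the "is is" inside "this is" counts as a repeated word.
print(re.findall(r"(\w+)\s\1", "this is a test test"))      # ['is', 'test']
print(re.findall(r"\b(\w+)\s\1\b", "this is a test test"))  # ['test']
```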
Regex is an incredibly versatile tool that can save hours in text processing tasks. With practice, you’ll be able to craft patterns that quickly filter, validate, and extract data from text. Dive into these basics, experiment with patterns, and explore advanced features like lookahead/lookbehind assertions and conditionals as you grow more confident.
Happy Regexing!