
Regex Syntax & Patterns

Interactive Regex Tester

Test regular expressions in real time with live match highlighting, group capture display, and pattern explanation.


This page is a reference for regular expression syntax. It covers the major constructs, with Python examples and practical patterns for everyday use.


Character Classes

Character classes match a single character from a defined set.

Predefined Classes

Pattern  Matches                       Equivalent
.        Any character except newline  [^\n] (with s flag: any char)
\d       Any digit                     [0-9]
\D       Any non-digit                 [^0-9]
\w       Word character                [a-zA-Z0-9_]
\W       Non-word character            [^a-zA-Z0-9_]
\s       Whitespace                    [ \t\n\r\f\v]
\S       Non-whitespace                [^ \t\n\r\f\v]
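
As a rough illustration of the shorthand classes in Python's re module, the sketch below runs \d, \w, and \s over a made-up sample string (the string and variable names are assumptions for this example). One caveat worth showing: for str patterns in Python 3, these classes are Unicode-aware by default, so the ASCII equivalents in the table only hold exactly when re.ASCII is set.

```python
import re

# Illustrative sample; \u00eb is the single code point for "e with diaeresis"
sample = "Order 66 shipped_to: Zo\u00eb, 2 boxes"

digits = re.findall(r'\d+', sample)                   # runs of digits
words = re.findall(r'\w+', sample)                    # Unicode-aware word characters
ascii_words = re.findall(r'\w+', sample, re.ASCII)    # restrict \w to [a-zA-Z0-9_]
spaces = re.findall(r'\s', sample)                    # individual whitespace characters

print(digits)        # ['66', '2']
print(words)         # the accented letter stays inside its word
print(ascii_words)   # the accented letter splits the word: 'Zo'
print(len(spaces))   # 5
```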

Custom Character Classes

[abc] Match a, b, or c
[a-z] Match any lowercase letter
[A-Z] Match any uppercase letter
[0-9] Match any digit
[a-zA-Z] Match any letter
[a-zA-Z0-9] Match any alphanumeric character
[^abc] Match anything EXCEPT a, b, or c (negation)
[^0-9] Match any non-digit
Special characters inside classes:
[.] Literal dot (most special chars are literal inside [])
[\-] Literal hyphen (or put it first/last: [-abc] or [abc-])
[\]] Literal closing bracket (must be escaped)
[\\] Literal backslash
import re
# Match hex color codes
hex_pattern = r'#[0-9a-fA-F]{6}\b'
text = "Colors: #FF5733, #00cc99, #invalid, #abc"
matches = re.findall(hex_pattern, text)
# ['#FF5733', '#00cc99']
# Match vowels only
vowels = re.findall(r'[aeiouAEIOU]', "Hello World")
# ['e', 'o', 'o']
# Match non-ASCII characters
non_ascii = re.findall(r'[^\x00-\x7F]+', "Hello café résumé")
# ['é', 'é', 'é']

Quantifiers

Quantifiers specify how many times a pattern should repeat.

Quantifier  Meaning          Example   Matches
*           0 or more        a*        "", "a", "aaa"
+           1 or more        a+        "a", "aaa" (not "")
?           0 or 1           colou?r   "color", "colour"
{n}         Exactly n        \d{4}     "2025"
{n,}        n or more        \d{2,}    "42", "123", "9999"
{n,m}       Between n and m  \d{2,4}   "42", "123", "9999"
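
One way to check how each quantifier behaves is to test whole strings with re.fullmatch, which only succeeds when the pattern consumes the entire input. The cases below are an illustrative selection, not an exhaustive test.

```python
import re

# Each entry: (pattern, candidate string, expected full-string match)
cases = [
    (r'a*', '', True),            # * accepts zero repetitions
    (r'a+', '', False),           # + needs at least one
    (r'colou?r', 'colour', True), # ? makes the "u" optional
    (r'\d{4}', '2025', True),     # exactly four digits
    (r'\d{2,}', '7', False),      # at least two digits required
    (r'\d{2,4}', '12345', False), # five digits exceed the upper bound
]

results = [bool(re.fullmatch(pattern, text)) for pattern, text, _ in cases]

for (pattern, text, expected), got in zip(cases, results):
    print(f'{pattern!r} on {text!r}: {got} (expected {expected})')
```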

Greedy vs Lazy Quantifiers

By default, quantifiers are greedy — they match as much as possible. Adding ? makes them lazy — they match as little as possible.

Text: <div>hello</div><div>world</div>
Greedy: <.*> matches "<div>hello</div><div>world</div>" (entire string)
Lazy: <.*?> matches "<div>" (first tag only)
Greedy  Lazy     Meaning
*       *?       0 or more (as few as possible)
+       +?       1 or more (as few as possible)
?       ??       0 or 1 (prefer 0)
{n,m}   {n,m}?   Between n and m (as few as possible)
import re
html = '<b>bold</b> and <i>italic</i>'
# Greedy: matches from first < to LAST >
greedy = re.findall(r'<.+>', html)
# ['<b>bold</b> and <i>italic</i>']
# Lazy: matches from each < to NEXT >
lazy = re.findall(r'<.+?>', html)
# ['<b>', '</b>', '<i>', '</i>']
# Even better: use negated class (no backtracking needed)
negated = re.findall(r'<[^>]+>', html)
# ['<b>', '</b>', '<i>', '</i>']

Anchors

Anchors match positions in the string, not characters.

Anchor  Matches
^       Start of string (or start of line with m flag)
$       End of string (or end of line with m flag)
\b      Word boundary (between \w and \W)
\B      Non-word boundary
Text: "cat concatenate education"
\bcat\b → "cat" (whole word only)
\bcat → "cat" (only the first word starts with "cat"; "concatenate" starts with "con")
cat\b → "cat" (word ends with "cat")
cat → "cat", "cat", "cat" (substring anywhere — 3 matches)
import re
text = "cat concatenate education"
# Word boundary - exact word
re.findall(r'\bcat\b', text)
# ['cat']
# Words starting with "cat" ("concatenate" starts with "con", so it is not matched)
re.findall(r'\bcat\w*', text)
# ['cat']
# Multiline anchors
multiline = "first line\nsecond line\nthird line"
re.findall(r'^\w+', multiline, re.MULTILINE)
# ['first', 'second', 'third']
re.findall(r'\w+$', multiline, re.MULTILINE)
# ['line', 'line', 'line']

Groups and Captures

Parentheses create groups that can capture matched text for later use.

Capturing Groups

Pattern: (\d{4})-(\d{2})-(\d{2})
Text: 2025-01-15
Group 0 (full match): "2025-01-15"
Group 1: "2025"
Group 2: "01"
Group 3: "15"
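
In Python, these groups can be read back from the match object. A minimal sketch using the pattern and text above (variable names are illustrative):

```python
import re

match = re.fullmatch(r'(\d{4})-(\d{2})-(\d{2})', '2025-01-15')

if match:
    full = match.group(0)              # group 0 is always the full match
    year, month, day = match.groups()  # groups 1..3 as a tuple
    print(full)              # 2025-01-15
    print(year, month, day)  # 2025 01 15
```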

Non-Capturing Groups

Use (?:...) when you need grouping for alternation or quantification but do not need to capture:

Capturing: (https?|ftp):// → group 1 = "https" or "http" or "ftp"
Non-capturing: (?:https?|ftp):// → no capture, just grouping
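
The difference is easy to see with re.findall, which returns the captured group when a pattern has one and the whole match when it has none. The sample text below is made up for illustration.

```python
import re

text = 'https://a.example ftp://b.example'

# With a capturing group, findall returns only what the group captured
captured = re.findall(r'(https?|ftp)://\S+', text)

# With a non-capturing group, findall returns the whole match
whole = re.findall(r'(?:https?|ftp)://\S+', text)

print(captured)  # ['https', 'ftp']
print(whole)     # ['https://a.example', 'ftp://b.example']
```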

Named Groups

Named groups make patterns more readable and maintainable:

import re
log_line = '2025-01-15 14:30:45 ERROR Database connection failed'
# Named groups
pattern = r'(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<level>\w+)\s+(?P<message>.+)'
match = re.match(pattern, log_line)
if match:
    print(match.group('date'))     # 2025-01-15
    print(match.group('level'))    # ERROR
    print(match.group('message'))  # Database connection failed
    print(match.groupdict())
    # {'date': '2025-01-15', 'time': '14:30:45',
    #  'level': 'ERROR', 'message': 'Database connection failed'}

Alternation

The pipe | acts as an OR operator:

cat|dog → "cat" or "dog"
(cat|dog) food → "cat food" or "dog food"
(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day → any day-of-week name
# Alternation is ordered: a backtracking (NFA) engine tries alternatives left to right
# and keeps the first one that lets the overall match succeed, not the longest
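
The effect of alternative order can be seen in a small Python sketch. The word and variable names are chosen purely for illustration.

```python
import re

word = 'category'

# 'cat' is listed first and already succeeds, so the longer option is never tried
short_first = re.search(r'cat|category', word).group()

# Listing the longer alternative first lets it win
long_first = re.search(r'category|cat', word).group()

# Requiring a full-string match forces backtracking into the second alternative
anchored = re.fullmatch(r'cat|category', word).group()

print(short_first)  # cat
print(long_first)   # category
print(anchored)     # category
```

A common rule of thumb that follows from this: when one alternative is a prefix of another, put the longer one first or anchor the pattern.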

Backreferences

Backreferences match the same text that a previous capturing group matched:

Pattern: (\w+)\s+\1
Text: "the the quick brown fox fox"
Matches: "the the" and "fox fox" (duplicate words)
\1 refers to whatever group 1 captured
import re
# Find duplicate words
text = "the the quick brown fox fox jumped"
dupes = re.findall(r'\b(\w+)\s+\1\b', text)
# ['the', 'fox']
# Remove duplicate words
cleaned = re.sub(r'\b(\w+)(\s+\1\b)+', r'\1', text)
# 'the quick brown fox jumped'
# Match repeated HTML tags
html = '<div>content</div>'
match = re.match(r'<(\w+)>.*?</\1>', html)
# Matches because opening and closing tags match

Lookahead and Lookbehind

Lookaround assertions check what comes before or after the current position without consuming characters.

Lookahead

Syntax   Name                Matches
(?=...)  Positive lookahead  Position followed by pattern
(?!...)  Negative lookahead  Position NOT followed by pattern

Lookbehind

Syntax    Name                 Matches
(?<=...)  Positive lookbehind  Position preceded by pattern
(?<!...)  Negative lookbehind  Position NOT preceded by pattern
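Text: "$100 €200 £300"
\d+(?=\s*€) → "100" (digits followed by optional whitespace and €)
(?<=\$)\d+ → "100" (digits preceded by $)
\d+(?!.*€) → "200", "300" (digits with no € anywhere later in the string)
(?<!\$)\d+ → "00", "200", "300" (digits NOT immediately preceded by $)
Note: "00" matches because the engine retries one character later, where the preceding character is "1" rather than "$"; use (?<![$\d])\d+ to get only "200" and "300"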
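
Lookaround results are easy to misjudge by eye, so it can help to run the patterns directly. The sketch below applies them to the same text with re.findall; the last pattern is an added variant (an assumption of this example, not part of the list above) that also rejects a preceding digit so that only whole numbers are returned.

```python
import re

text = '$100 €200 £300'

followed_by_euro = re.findall(r'\d+(?=\s*€)', text)     # lookahead allows whitespace before €
after_dollar = re.findall(r'(?<=\$)\d+', text)          # digits right after $
no_euro_later = re.findall(r'\d+(?!.*€)', text)         # no € anywhere to the right
not_after_dollar = re.findall(r'(?<!\$)\d+', text)      # also matches inside "100"
whole_numbers_only = re.findall(r'(?<![$\d])\d+', text) # reject a preceding $ or digit

print(followed_by_euro)    # ['100']
print(after_dollar)        # ['100']
print(no_euro_later)       # ['200', '300']
print(not_after_dollar)    # ['00', '200', '300']
print(whole_numbers_only)  # ['200', '300']
```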

Practical Lookaround Examples

import re
# Password validation using lookahead
# Must have: 8+ chars, uppercase, lowercase, digit, special char
password_pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$'
passwords = ['Str0ng!Pass', 'weak', '12345678', 'NoSpecial1']
for pw in passwords:
    valid = bool(re.match(password_pattern, pw))
    print(f'{pw}: {"valid" if valid else "invalid"}')
# Str0ng!Pass: valid
# weak: invalid
# 12345678: invalid
# NoSpecial1: invalid
# Add commas to numbers: 1234567 → 1,234,567
def add_commas(n):
    return re.sub(r'(?<=\d)(?=(\d{3})+(?!\d))', ',', str(n))
print(add_commas(1234567)) # 1,234,567
print(add_commas(1000000000)) # 1,000,000,000
# Extract domain from URL (lookbehind)
urls = ['https://example.com/path', 'http://api.test.org/v1']
for url in urls:
    domain = re.search(r'(?<=://)[^/]+', url)
    print(domain.group())
# example.com
# api.test.org

Flags (Modifiers)

Flag  Name                  Effect
g     Global                Match all occurrences (not just first)
i     Case-insensitive      A matches a
m     Multiline             ^ and $ match line boundaries
s     DotAll (Single-line)  . matches newline characters
u     Unicode               Enable full Unicode matching
x     Verbose/Extended      Allow comments and whitespace (Python, Ruby)
d     hasIndices            Provide match index information (JS)
y     Sticky                Match only from lastIndex position (JS)
import re
# Verbose mode — allows comments and whitespace
phone_pattern = re.compile(r"""
    ^                    # Start of string
    (?:\+1[-.\s]?)?      # Optional country code
    \(?                  # Optional opening paren
    (\d{3})              # Area code (captured)
    \)?                  # Optional closing paren
    [-.\s]?              # Optional separator
    (\d{3})              # Exchange (captured)
    [-.\s]?              # Optional separator
    (\d{4})              # Subscriber number (captured)
    $                    # End of string
""", re.VERBOSE)
numbers = ["+1 (555) 123-4567", "555.123.4567", "5551234567"]
for num in numbers:
    match = phone_pattern.match(num)
    if match:
        print(f"Area: {match.group(1)}, Exchange: {match.group(2)}, "
              f"Number: {match.group(3)}")
# Combine flags
pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)

Common Patterns

Here are battle-tested regex patterns for common validation and extraction tasks:

Email (Simplified)

[\w.+-]+@[\w-]+\.[\w.]+
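
A short sketch of the pattern used for extraction, with a made-up sentence. It also shows one limit of the simplified form: because the final class [\w.]+ accepts dots, a sentence-ending period is swallowed into the match, so some cleanup (here a simple rstrip, chosen for illustration) may be needed.

```python
import re

email_pattern = re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+')

text = 'Contact alice@example.com or bob.smith+news@mail.example.org.'

found = email_pattern.findall(text)
# The second match keeps the sentence's trailing period
cleaned = [address.rstrip('.') for address in found]

print(found)    # ['alice@example.com', 'bob.smith+news@mail.example.org.']
print(cleaned)  # ['alice@example.com', 'bob.smith+news@mail.example.org']
```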

URL

# Python
url_pattern = r'https?://(?:[\w-]+\.)+[\w]{2,}(?:/\S*)?'

IP Address (IPv4)

# IPv4 validation: each octet must be 0-255 (leading zeros such as 001 are accepted)
ipv4 = r'^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$'
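
To see what the pattern accepts and rejects, the sketch below checks a few hand-picked candidates. The candidate list is illustrative; note that octets with leading zeros pass, which may or may not be acceptable depending on the application.

```python
import re

ipv4 = re.compile(
    r'^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$'
)

candidates = ['192.168.1.1', '255.255.255.255', '256.1.1.1', '1.2.3', '01.02.03.004']
verdicts = {candidate: bool(ipv4.match(candidate)) for candidate in candidates}

for candidate, ok in verdicts.items():
    print(f'{candidate}: {"valid" if ok else "invalid"}')
# 192.168.1.1: valid
# 255.255.255.255: valid
# 256.1.1.1: invalid
# 1.2.3: invalid
# 01.02.03.004: valid
```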

Date Formats

import re
# ISO 8601: YYYY-MM-DD
iso_date = r'\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])'
# US format: MM/DD/YYYY
us_date = r'(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}'
# European: DD.MM.YYYY
eu_date = r'(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.\d{4}'
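
These patterns check digit ranges but not month lengths, so a string like 2025-02-30 passes. One possible approach, sketched below, is to let the regex check the shape and then let datetime.strptime check the calendar. The helper name is_real_iso_date is hypothetical.

```python
import re
from datetime import datetime

iso_date = r'\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])'

def is_real_iso_date(value):
    """Hypothetical helper: regex checks the shape, datetime checks the calendar."""
    if not re.fullmatch(iso_date, value):
        return False
    try:
        datetime.strptime(value, '%Y-%m-%d')
    except ValueError:
        return False
    return True

shape_only = bool(re.fullmatch(iso_date, '2025-02-30'))

print(shape_only)                      # True: the regex alone accepts February 30
print(is_real_iso_date('2025-02-30'))  # False
print(is_real_iso_date('2025-01-15'))  # True
print(is_real_iso_date('2025-13-01'))  # False
```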

Phone Numbers

# US phone numbers (various formats)
phone = r'(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

HTML Tag

# Match opening/closing tags (not for parsing HTML!)
tag = r'</?(\w+)(?:\s+[\w-]+(?:=(?:"[^"]*"|\'[^\']*\'|[\w-]+))?)*\s*/?>'

Log Entry

# Apache log format
apache_log = r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)'
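
A minimal sketch of applying the pattern to a single line in Common Log Format. The field names given to the seven groups are assumptions for this example, not names the pattern itself defines.

```python
import re

apache_log = r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)'

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')

# Illustrative labels for the capture groups, in order
field_names = ('host', 'time', 'method', 'path', 'protocol', 'status', 'size')

match = re.match(apache_log, line)
fields = dict(zip(field_names, match.groups())) if match else {}

print(fields['host'])    # 127.0.0.1
print(fields['method'])  # GET
print(fields['status'])  # 200
```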

Pattern Testing and Debugging

When building complex patterns, use these strategies:

  1. Start simple — Build your regex incrementally, testing each part
  2. Use a regex tester — Tools like regex101.com explain each component
  3. Test edge cases — Empty strings, special characters, very long input
  4. Check for catastrophic backtracking — Nested quantifiers can cause exponential time
  5. Add comments — Use verbose mode (x flag) for complex patterns
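
To make point 4 concrete, here is a small timing sketch, assuming a backtracking engine such as Python's re. The pattern (a+)+$ nests one quantifier inside another; on input that almost matches, the engine tries a number of ways to split the run of "a" characters that roughly doubles with each extra character. The input length of 18 is deliberately small so the example finishes quickly; exact timings will vary by machine.

```python
import re
import time

subject = 'a' * 18 + 'b'   # almost matches, then fails on the final character

def timed_match(pattern, text):
    """Return the match result and the elapsed time in seconds."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

nested_result, nested_secs = timed_match(r'(a+)+$', subject)  # nested quantifiers
flat_result, flat_secs = timed_match(r'a+$', subject)         # same strings accepted, no nesting

print(f'nested: {nested_result}, {nested_secs:.4f}s')
print(f'flat:   {flat_result}, {flat_secs:.4f}s')
```

Both patterns reject the input, but the nested form typically takes far longer to do so. Removing the redundant nesting, as in the second pattern, is one way to avoid the problem.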
# Incremental development example:
# Goal: Parse "2025-01-15T14:30:00Z"
# Step 1: Match digits
r'\d+'
# Step 2: Match date
r'\d{4}-\d{2}-\d{2}'
# Step 3: Add time
r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
# Step 4: Add timezone (Z, or an offset such as +05:30)
r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[Z+-][\d:]*'
# Step 5: Add captures
r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})([Z+-][\d:]*)'
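
As a final check, the step 5 pattern can be run against the goal string. The second input, with a numeric offset, is an extra test case added for this sketch.

```python
import re

timestamp = re.compile(
    r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})([Z+-][\d:]*)'
)

utc_parts = timestamp.fullmatch('2025-01-15T14:30:00Z').groups()
offset_parts = timestamp.fullmatch('2025-01-15T14:30:00+05:30').groups()

print(utc_parts)     # ('2025', '01', '15', '14', '30', '00', 'Z')
print(offset_parts)  # ('2025', '01', '15', '14', '30', '00', '+05:30')
```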