RegEx Cheat Sheet
Every pattern, anchor, quantifier, group, and flag. Searchable, copy-ready, with live match testing.
Most developers have copy-pasted a regex pattern at least once without fully understanding what it does. That works, until it doesn't.
Regular expressions are one of those tools that seem cryptic at first but make a lot of sense once the syntax clicks. Pattern matching, input validation, string manipulation, find and replace across files - regex handles all of it.
This regex cheat sheet covers the core syntax you actually need: character classes, quantifiers, anchors, capture groups, lookaheads, and common patterns across Python, JavaScript, and other languages.
Bookmark it. You'll come back to it.
A regular expression (RegEx) is a sequence of characters that defines a search pattern used to match, find, or manipulate text.
It works across almost every programming language: Python, JavaScript, PHP, Java, Ruby, and more.
Not a formatting tool. Not a beautifier. A precision instrument for string manipulation, input validation, text parsing, and pattern matching.
RegEx syntax is built from a small set of rules. Learn these, and every pattern starts to make sense.
These characters have reserved meaning in any regular expression engine:
| Character | Meaning |
|---|---|
| `.` | Matches any single character except newline |
| `^` | Start of string (or line in multiline mode) |
| `$` | End of string (or line in multiline mode) |
| `*` | Zero or more repetitions |
| `+` | One or more repetitions |
| `?` | Zero or one repetition (also makes quantifiers lazy) |
| `\` | Escape character |
| `\|` | Alternation (OR) |
| `()` | Capturing group |
| `[]` | Character class |
| `{}` | Quantifier range |
Any character not listed above matches itself literally.
cat matches the exact string "cat" - case-sensitive by default.
Use \ before a special character to match it literally.
| Sequence | Matches |
|---|---|
| `\.` | A literal dot |
| `\*` | A literal asterisk |
| `\(` | A literal opening parenthesis |
| `\\` | A literal backslash |
| `\n` | Newline |
| `\t` | Tab |
| `\r` | Carriage return |
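Escaping by hand gets error-prone when the text to match comes from user input. A minimal sketch in Python, assuming the standard re module; literal_pattern is a hypothetical helper name, not part of any library:

```python
import re

def literal_pattern(text):
    """Compile a pattern that matches `text` literally, metacharacters included."""
    # re.escape puts a backslash in front of every character with special meaning.
    return re.compile(re.escape(text))

# Unescaped, the dot matches any character, so "3.14" also matches "3x14".
unescaped_hit = re.search(r"3.14", "3x14") is not None   # True

# Escaped, only a literal dot matches.
price = literal_pattern("3.14")
escaped_miss = price.search("3x14") is None               # True
escaped_hit = price.search("pi is 3.14") is not None      # True
```

Other languages offer similar helpers, but check your engine's documentation before relying on one.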
Character classes let you match any one character from a defined set.
| Class | Description | Matches |
|---|---|---|
| `\d` | Digit | `[0-9]` |
| `\D` | Non-digit | Anything except `[0-9]` |
| `\w` | Word character | `[a-zA-Z0-9_]` |
| `\W` | Non-word character | Anything except `[a-zA-Z0-9_]` |
| `\s` | Whitespace | Space, tab, newline |
| `\S` | Non-whitespace | Anything except whitespace |
| `.` | Any character | Except newline (by default) |
Square brackets define a custom set. [aeiou] matches any single vowel.
Ranges work too: [a-z] matches any lowercase letter, [0-9] matches any digit, [a-zA-Z0-9] matches any alphanumeric character.
[aeiou] → matches a, e, i, o, or u
[a-z] → matches any lowercase letter
[A-Za-z0-9] → matches any alphanumeric character
[.,!?] → matches any of these punctuation marks
Add ^ inside brackets to match anything NOT in the set.
[^aeiou] → matches any character that is NOT a vowel
[^0-9] → matches any non-digit character
[^\s] → matches any non-whitespace character
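To see shorthand classes, custom sets, and negated sets side by side, here is a small Python sketch. The sample text and variable names are made up for illustration:

```python
import re

text = "Order #42 shipped to Apt. 7B, zone_9!"

digits = re.findall(r"\d+", text)          # runs of digits
words = re.findall(r"\w+", text)           # runs of letters, digits, underscores
vowels = re.findall(r"[aeiou]", "regex")   # custom set: one vowel per match
digits_only = re.sub(r"[^0-9]", "", text)  # negated set: drop every non-digit
chunks = re.findall(r"[^\s]+", "a b\tc")   # negated set: runs of non-whitespace

print(digits)       # ['42', '7', '9']
print(words)        # ['Order', '42', 'shipped', 'to', 'Apt', '7B', 'zone_9']
print(vowels)       # ['e', 'e']
print(digits_only)  # 4279
print(chunks)       # ['a', 'b', 'c']
```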
Quantifiers control how many times a pattern repeats.
Greedy quantifiers match as much as possible. They're the default behavior.
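| Quantifier | Meaning | Example |
|---|---|---|
| `*` | 0 or more | `ab*` matches "a", "ab", "abbb" |
| `+` | 1 or more | `ab+` matches "ab", "abbb", but not "a" |
| `?` | 0 or 1 | `colou?r` matches "color" and "colour" |
| `{n}` | Exactly n times | `\d{4}` matches "2024" |
| `{n,}` | n or more times | `\d{2,}` matches "12" and "12345" |
| `{n,m}` | Between n and m times | `\d{2,4}` matches "12", "123", "1234" |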
Add ? after any greedy quantifier to make it lazy - matches as little as possible.
| Quantifier | Meaning |
|---|---|
| `*?` | 0 or more (lazy) |
| `+?` | 1 or more (lazy) |
| `??` | 0 or 1 (lazy) |
| `{n,m}?` | Between n and m (lazy) |
Input: <b>bold</b>
Greedy: <.+> → matches <b>bold</b> (entire string)
Lazy: <.+?> → matches <b> (stops at first >)
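The difference shows up more clearly on input with several tags. A short Python sketch, using a made-up sample string; the negated-class variant is an alternative worth knowing, not something the lazy form requires:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)        # one match, first "<" to last ">"
lazy = re.findall(r"<.+?>", html)         # stops at the first ">" each time
negated = re.findall(r"<[^>]+>", html)    # same result here, with no backtracking over ">"

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```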
Possessive quantifiers are supported in Java, PCRE, and a few other engines (Python's re module added them in 3.11). They are not available in JavaScript.
They match greedily and never give back characters, so the engine cannot backtrack into them.
| Quantifier | Meaning |
|---|---|
| `*+` | 0 or more (possessive) |
| `++` | 1 or more (possessive) |
| `?+` | 0 or 1 (possessive) |
Anchors don't match characters. They match positions in a string.
| Anchor | Matches |
|---|---|
| `^` | Start of string (or line with `m` flag) |
| `$` | End of string (or line with `m` flag) |
| `\A` | Start of string (ignores `m` flag) |
| `\Z` | End of string (ignores `m` flag) |
^Hello → matches "Hello" only at the start
world$ → matches "world" only at the end
^\d{3}-\d{4}$ → matches only when the whole string is three digits, a hyphen, and four digits, e.g. "123-4567"
Word boundaries match the position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.
| Boundary | Matches |
|---|---|
| `\b` | Word boundary |
| `\B` | Non-word boundary |
\bcat\b → matches "cat" in "the cat sat" but NOT in "category"
\Bcat\B → matches "cat" inside "concatenate" but NOT as a standalone word
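A quick Python check of both boundaries, using a made-up sentence that contains "cat" as a word, as a prefix, and in the middle of a word:

```python
import re

sentence = "the cat sat near the category index; concatenate"

whole_words = re.findall(r"\bcat\b", sentence)    # only the standalone word
inside_words = re.findall(r"\Bcat\B", sentence)   # only the one inside "concatenate"
replaced = re.sub(r"\bcat\b", "dog", sentence)    # whole-word replace

print(len(whole_words))   # 1
print(len(inside_words))  # 1
print(replaced)           # the dog sat near the category index; concatenate
```

"category" matches neither pattern: its "cat" starts at a word boundary, so \B fails before it, and continues into the word, so \b fails after it.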
By default, ^ and $ match the start and end of the entire string.
Enable multiline mode (m flag) and they match the start and end of each individual line instead.
Without m flag: ^cat$ → only matches if the entire string is "cat"
With m flag: ^cat$ → matches "cat" on any line within a multiline string
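The same behavior in Python, where the m flag is spelled re.MULTILINE. The sample string is an assumption for illustration:

```python
import re

log = "cat\ncatalog\ncat"

# Default mode: ^ and $ refer to the whole string, which is not just "cat".
whole_string = re.search(r"^cat$", log)                 # None

# Multiline mode: ^ and $ also match at every line break.
per_line = re.findall(r"^cat$", log, re.MULTILINE)      # ['cat', 'cat']

# \A ignores the flag and only ever matches at the very start.
line_starts = re.findall(r"^cat", log, re.MULTILINE)    # three lines start with "cat"
string_start = re.findall(r"\Acat", log, re.MULTILINE)  # only the first one counts

# Anchored validation: the whole input must fit the pattern.
valid = re.search(r"^\d{3}-\d{4}$", "123-4567") is not None        # True
invalid = re.search(r"^\d{3}-\d{4}$", "call 123-4567") is not None  # False
```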
Groups let you isolate, reuse, and reference parts of a matched pattern.
Wrap any pattern in () to capture it. The engine stores the match so you can reference it later.
(\d{4})-(\d{2})-(\d{2})
→ Matches "2024-01-15", captures year, month, day in groups 1, 2, 3
(?:...) groups without storing the match. Use when you need grouping for structure but don't need the captured value.
(?:https?|ftp):// → groups "https", "http", or "ftp" without capturing
Assign a name instead of a number. Cleaner to reference, especially in long patterns.
(?P<year>\d{4})-(?P<month>\d{2}) # Python
(?<year>\d{4})-(?<month>\d{2}) # JavaScript, .NET, Java
Access via match.group('year') in Python or match.groups.year in JavaScript.
Reference a previously captured group inside the same pattern using \1, \2, etc.
(\w+)\s\1 → matches repeated words like "the the" or "go go"
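Named groups and backreferences pay off mostly in replacements. A hedged Python sketch; the variable names and sample strings are illustrative, and the duplicate-word pattern adds \b anchors that the shorter pattern above does not have:

```python
import re

date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date_re.search("Released on 2024-01-15.")
year = m.group("year")     # access by name
parts = m.groupdict()      # all named groups as a dict

# \g<name> refers to a named group inside a replacement string.
us_style = date_re.sub(r"\g<month>/\g<day>/\g<year>", "2024-01-15")

# \1 inside the pattern refers back to whatever group 1 matched.
dup_re = re.compile(r"\b(\w+)\s+\1\b")
deduped = dup_re.sub(r"\1", "this is is a test")

print(year)      # 2024
print(parts)     # {'year': '2024', 'month': '01', 'day': '15'}
print(us_style)  # 01/15/2024
print(deduped)   # this is a test
```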
Lookarounds check what surrounds a match without including that context in the result.
Zero-width assertions. They check position, not content.
(?=...) matches if what follows fits the pattern.
\d+(?= dollars) → matches "100" in "100 dollars" but not in "100 euros"
(?!...) matches if what follows does NOT fit the pattern.
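\d+(?! dollars) → matches "100" in "100 euros". In "100 dollars" it does not skip the number: the engine backtracks and matches "10", because "10" is followed by "0", not " dollars". Use \d+\b(?! dollars) to reject the whole number.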
(?<=...) matches if what precedes fits the pattern.
(?<=\$)\d+ → matches "99" in "$99" but not in "99 USD"
(?<!...) matches if what precedes does NOT fit the pattern.
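(?<!\$)\d+ → matches "99" in "99 USD". In "$99" it still matches the second "9", which is preceded by a digit rather than "$". Use (?<![\$\d])\d+ to skip the whole number.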
Note: JavaScript (pre-ES2018) does not support lookbehind. Most other major engines do.
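Lookarounds are easiest to trust after running them. The Python sketch below uses made-up sample strings and also shows a common surprise: a negative lookaround next to a greedy \d+ can still match part of a number, because the engine backtracks to a shorter run of digits.

```python
import re

prices = "100 dollars, 200 euros"
amounts = "$99 or 45 USD"

followed = re.findall(r"\d+(?= dollars)", prices)       # positive lookahead
naive_not = re.findall(r"\d+(?! dollars)", prices)      # backtracks into "100"
strict_not = re.findall(r"\d+\b(?! dollars)", prices)   # \b pins the end of the number

preceded = re.findall(r"(?<=\$)\d+", amounts)           # positive lookbehind
naive_nb = re.findall(r"(?<!\$)\d+", amounts)           # second "9" slips through
strict_nb = re.findall(r"(?<![\$\d])\d+", amounts)      # also refuse a preceding digit

print(followed)    # ['100']
print(naive_not)   # ['10', '200']
print(strict_not)  # ['200']
print(preceded)    # ['99']
print(naive_nb)    # ['9', '45']
print(strict_nb)   # ['45']
```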
Flags change how the entire pattern behaves. Placed after the closing delimiter or passed as a second argument.
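| Flag | Name | Effect | Example |
|---|---|---|---|
| `i` | Case insensitive | Matches regardless of letter case | `/hello/i` matches "Hello" |
| `g` | Global | Find all matches, not just the first | `/\d/g` matches every digit |
| `m` | Multiline | `^` and `$` match at the start and end of each line | `/^cat$/m` matches "cat" on any line |
| `s` | Dotall | `.` matches newlines too | `/a.b/s` matches "a", newline, "b" |
| `x` | Extended | Allows whitespace and comments in patterns | Supported in Python, PHP, Ruby |
| `u` | Unicode | Enables full Unicode support | `/\p{L}+/u` matches letters in any script (JavaScript) |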
/hello/gi → matches "hello", "Hello", "HELLO" anywhere in the string
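In Python the flags are passed as an argument rather than written after a delimiter, and there is no g flag: re.findall returns every match, while re.search returns the first. A small sketch with illustrative sample strings:

```python
import re

# i: case-insensitive. findall plays the role of the g flag.
hellos = re.findall(r"hello", "Hello HELLO hello", re.IGNORECASE)

# s: dotall. Without it, "." refuses to cross a newline.
plain_dot = re.search(r"a.b", "a\nb")               # None
dotall = re.search(r"a.b", "a\nb", re.DOTALL)       # matches

# Flags combine with the | operator.
lines = re.findall(r"^cat$", "Cat\ncat", re.IGNORECASE | re.MULTILINE)

# x: verbose mode ignores pattern whitespace and allows # comments.
date_re = re.compile(
    r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    """,
    re.VERBOSE,
)
year_month = date_re.search("Date: 2024-01").groups()

print(hellos)      # ['Hello', 'HELLO', 'hello']
print(lines)       # ['Cat', 'cat']
print(year_month)  # ('2024', '01')
```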
The core syntax is consistent. Delimiters, flags, and a few edge cases vary by language.
JavaScript uses the RegExp object or the literal /pattern/flags syntax. Lookbehind requires ES2018+.
const pattern = /^\d{4}-\d{2}-\d{2}$/;
pattern.test("2024-01-15"); // true
const matches = "one 1, two 2".match(/\d+/g); // ["1", "2"]
Key methods: .test(), .match(), .replace(), .split(), .exec()
Python uses the built-in re module. It supports named groups, lookbehind, and verbose mode (re.X).
import re
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
match = pattern.search("Date: 2024-01-15")
print(match.group(1)) # "2024"
Key functions: re.search(), re.match(), re.findall(), re.sub(), re.compile()
PHP uses PCRE via the preg_* functions. Patterns are wrapped in delimiters (usually /).
preg_match('/(\d{4})-(\d{2})/', '2024-01', $matches);
echo $matches[1]; // "2024"
preg_replace('/\s+/', '-', 'hello world'); // "hello-world"
Key functions: preg_match(), preg_match_all(), preg_replace(), preg_split()
Java uses the java.util.regex package. Patterns are compiled from strings, so backslashes need escaping (\\d, not \d).
import java.util.regex.*;
Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher m = p.matcher("2024-01-15");
if (m.find()) {
System.out.println(m.group(1)); // "2024"
}
Key classes: Pattern, Matcher
Ruby uses the Regexp class or the literal /pattern/ syntax. Named groups work cleanly with symbol access.
str = "2024-01-15"
match = str.match(/(?<year>\d{4})-(?<month>\d{2})/)
puts match[:year] # "2024"
Key methods: .match(), .scan(), .gsub(), .split()
Patterns you'll actually use. Copy, test, adjust for your input.
| Use Case | Pattern | Notes |
|---|---|---|
| Email |  | Basic validation; RFC 5322 is far more complex |
| URL |  | Simplified - use a library for production |
| Phone (US) |  | Handles common US formats |
| IP Address (IPv4) |  | Does not validate the 0–255 range |
| Date (YYYY-MM-DD) |  | Validates month and day ranges |
| HTML tag |  | Matches any tag - not for parsing full HTML |
| Whitespace (trim) |  | Matches leading and trailing whitespace |
| Password |  | 8+ chars, uppercase, digit, special char |
| Hex color |  | Matches 3 or 6 digit hex codes |
| Username |  | 3–16 alphanumeric characters or underscores |
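As a hedged sketch, here is one set of Python patterns that behaves the way the notes in the table describe. These are illustrative assumptions, not canonical or production-grade patterns; the PATTERNS dictionary and its keys are hypothetical names.

```python
import re

# One possible pattern per use case. Adjust before relying on any of them.
PATTERNS = {
    # Something, "@", something, ".", something; no whitespace or extra "@".
    "email": re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$"),
    # http or https, a host, then an optional path or query.
    "url": re.compile(r"^https?://[^\s/]+(?:/\S*)?$"),
    # 555-123-4567, (555) 123-4567, 555.123.4567, 5551234567.
    "phone_us": re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
    # Four dot-separated groups of 1-3 digits; 999.1.1.1 still passes.
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    # Month 01-12 and day 01-31; does not know month lengths (02-31 passes).
    "date": re.compile(r"^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$"),
    # Any opening or closing tag; not a substitute for an HTML parser.
    "html_tag": re.compile(r"</?[A-Za-z][^>]*>"),
    # Leading or trailing whitespace, for use with sub().
    "trim": re.compile(r"^\s+|\s+$"),
    # 8+ chars with an uppercase letter, a digit, and a non-alphanumeric char.
    "password": re.compile(r"^(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}$"),
    # "#" followed by exactly 3 or 6 hex digits.
    "hex_color": re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$"),
    # 3-16 letters, digits, or underscores.
    "username": re.compile(r"^[A-Za-z0-9_]{3,16}$"),
}

def is_valid(kind, value):
    """Return True if `value` matches the pattern registered under `kind`."""
    return PATTERNS[kind].search(value) is not None

trimmed = PATTERNS["trim"].sub("", "  hi there \n")                 # 'hi there'
tags = PATTERNS["html_tag"].findall("<p class='x'>hi</p>")
```

For example, is_valid("hex_color", "#fff") returns True and is_valid("username", "ab") returns False. Test each pattern against your own matching and non-matching inputs before adopting it.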
| acts as a logical OR between two expressions.
cat|dog → matches "cat" or "dog"
jpg|jpeg|png → matches any of these file extensions
Alternation has the lowest precedence in a regular expression.
cat|dog food matches "cat" OR "dog food", not "cat food" OR "dog food". Wrap in a group to control scope: (cat|dog) food.
Combine alternation with groups to build clean branching patterns.
^(Mr|Mrs|Ms|Dr)\. [A-Z][a-z]+$
→ matches "Dr. Smith", "Ms. Jones"
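The precedence rule is easy to verify. A short Python sketch comparing the ungrouped and grouped forms; fullmatch is used so each test string must match in its entirety:

```python
import re

loose = re.compile(r"cat|dog food")          # "cat" OR "dog food"
grouped = re.compile(r"(?:cat|dog) food")    # "cat food" OR "dog food"

loose_cat = loose.fullmatch("cat") is not None               # True
loose_cat_food = loose.fullmatch("cat food") is not None     # False
grouped_cat_food = grouped.fullmatch("cat food") is not None # True
grouped_cat = grouped.fullmatch("cat") is not None           # False

# Alternatives are tried left to right; "Mr" fails at the dot, then "Mrs" succeeds.
title_re = re.compile(r"^(Mr|Mrs|Ms|Dr)\. [A-Z][a-z]+$")
title = title_re.match("Mrs. Jones").group(1)

print(title)  # Mrs
```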
These tools give real-time match highlighting, group inspection, and engine selection.
Regex101 - best overall; supports PCRE, JavaScript, Python, Go; explains each token inline
Regexr - clean interface, good for beginners learning pattern matching
Debuggex - renders your pattern as a visual railroad-style diagram
Test against both matching and non-matching strings
Break complex patterns into smaller pieces, then build up
Use verbose mode (x flag) to add whitespace and inline comments
Check for catastrophic backtracking before running on large inputs
Regex101's step-through debugger shows exactly how the engine processes each character
Backtracking happens when the engine tries a path, fails, then backs up and retries.
Minor backtracking is normal. Catastrophic backtracking is a bug - it can freeze an application on specific inputs.
Nested quantifiers on overlapping patterns. (a+)+b on a string like "aaaaaa" with no b at the end triggers an exponential number of retries.
Other common causes:
(.+)* or (.*)+ - redundant quantifier nesting
Alternation with shared prefixes: (cat|catch) - use (cat(?:ch)?) instead
Overly broad .+ patterns mid-string with nothing to anchor the match
Use possessive quantifiers (*+, ++) in engines that support them - Java, PCRE
Use atomic groups (?>...) in PCRE/Java to prevent backtracking into a completed group
Prefer specific character classes over . wherever possible
Test adversarial inputs on Regex101's debugger before shipping to a production environment
During the code review process, flag any pattern with nested quantifiers for manual testing
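To get a feel for the problem, you can time a risky pattern against a rewrite that matches the same strings. This is a rough Python sketch, assuming a backtracking engine such as Python's re; time_search is a hypothetical helper, and the input sizes are kept small on purpose so the script finishes quickly.

```python
import re
import time

risky = re.compile(r"(a+)+b")   # nested quantifiers over the same characters
safe = re.compile(r"a+b")       # accepts the same strings without the nesting

def time_search(pattern, text):
    """Run one search and return (matched, elapsed_seconds)."""
    start = time.perf_counter()
    matched = pattern.search(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # No trailing "b", so every search fails and the engine must exhaust its options.
    # Keep n small: the risky pattern's work roughly doubles with each extra "a".
    for n in (8, 12, 16):
        text = "a" * n
        _, risky_time = time_search(risky, text)
        _, safe_time = time_search(safe, text)
        print(f"n={n}: risky {risky_time:.5f}s, safe {safe_time:.5f}s")
```

Timings vary by machine, so the script prints measurements rather than promising specific numbers. The trend is the point: the risky column grows much faster than the safe one as n increases.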
RegEx is used for pattern matching, input validation, text parsing, and string manipulation.
Common applications: validating email addresses, extracting URLs, finding duplicates, parsing log files, and running find-and-replace across large codebases.
What does .* mean in RegEx?
. matches any single character except a newline.
* is a greedy quantifier meaning zero or more repetitions. Together, .* matches any sequence of characters on a single line - as much as possible.
Greedy quantifiers match as much text as possible. Lazy quantifiers (add ?) match as little as possible.
<.+> grabs everything between the first < and the last >. <.+?> stops at the first > it finds.
What does ^ mean in a RegEx pattern?
Outside a character class, ^ is a start anchor - it matches the position at the beginning of a string.
Inside brackets like [^abc], it negates the set, matching any character that is NOT a, b, or c.
Flags modify how the regex engine processes a pattern.
Common ones: i (case-insensitive), g (global - find all matches), m (multiline - ^ and $ match per line), s (dotall - . matches newlines too).
Parentheses () create a capturing group, storing the matched text for later use via backreferences or in replacement strings.
Non-capturing groups (?:...) group without storing. Named groups (?<name>...) let you reference matches by name instead of index.
\b matches the position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.
\bcat\b matches "cat" as a standalone word but skips "category" or "concatenate". Useful for precise string search without partial matches.
Yes, RegEx differs between languages. Core syntax is consistent, but engine-specific features vary.
JavaScript lacks lookbehind in older versions and has no possessive quantifiers. Python's re module supports named groups. PCRE (used in PHP) is one of the most feature-complete regex flavors available.
Catastrophic backtracking happens when a pattern with nested quantifiers - like (a+)+ - tries an exponential number of combinations on a non-matching string.
The regex engine keeps retrying, causing serious performance slowdowns. Fix it by rewriting ambiguous patterns or using possessive quantifiers where supported.
Use an online regex tester like Regex101, Regexr, or Debuggex.
These tools show real-time match highlighting, break down each part of your pattern, and let you switch between regex flavors like PCRE, JavaScript, and Python.