RegEx Cheat Sheet

Every pattern, anchor, quantifier, group, and flag. Searchable, copy-ready, with live match testing.

Anchors

Position Anchors
JS / PY / PCRE
beginner
Start of string
^
^Hello matches Hello world
End of string
$
world$ matches Hello world
Word boundary
\b
\bcat\b matches cat but not concatenate
Non-word boundary
\B
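\Bcat\B matches the cat inside concatenate
Absolute start of string (ignores multiline mode; Python and PCRE, not JavaScript)
\A
\AHello matches Hello only at the very start of the string
Absolute end of string (ignores multiline mode; Python and PCRE, not JavaScript)
\Z
world\Z matches world only at the end of the string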

Character Classes

Built-in Classes
JS / PY / PCRE
beginner
Any digit (0 through 9)
\d
\d+ matches 42 in "42 hello"
Any non-digit
\D
\D+ matches non-numeric runs
Word character (a-z, A-Z, 0-9, _)
\w
\w+ matches hello_42
Non-word character
\W
\W matches the ! in hello!world
Whitespace (space, tab, newline)
\s
\s+ matches spaces between words
Non-whitespace
\S
\S+ matches each word token
Any character except newline
.
c.t matches cat, cut, c4t
Custom Character Classes
JS / PY / PCRE
beginner
Match any character in set
[aeiou]
[aeiou] matches vowels in "hello"
Match any character NOT in set
[^aeiou]
[^aeiou] matches every character that is not a lowercase vowel: consonants, digits, spaces, punctuation
Character range
[a-z]
[a-z]+ matches lowercase words
Alphanumeric range
[a-zA-Z0-9]
Matches any letter or digit
Literal dot inside class
[.]
Inside [], dot is literal. No escaping needed
POSIX Classes (PCRE only; Python's re module does not support them)
PCRE
intermediate
Any letter (locale-aware)
[[:alpha:]]
Equivalent to [a-zA-Z] in basic locales
Any digit
[[:digit:]]
Equivalent to [0-9]
Alphanumeric
[[:alnum:]]
Equivalent to [a-zA-Z0-9]
Whitespace
[[:space:]]
Includes space, tab, newline, form feed
Uppercase letters
[[:upper:]]
Equivalent to [A-Z] in C locale
Lowercase letters
[[:lower:]]
Equivalent to [a-z] in C locale
Punctuation characters
[[:punct:]]
Matches . , ! ? ; : etc.

Quantifiers

Greedy Quantifiers
JS / PY / PCRE
beginner
Zero or more (greedy)
*
a* matches "", "a", "aaa"
One or more (greedy)
+
a+ matches "a", "aaa" but not ""
Zero or one (optional)
?
colou?r matches "color" and "colour"
Exactly n times
{n}
\d{4} matches "2024"
At least n times
{n,}
\d{2,} matches "42", "123"
Between n and m times
{n,m}
\d{2,4} matches "42", "123", "2024"
Lazy Quantifiers
JS / PY / PCRE
intermediate
Zero or more (lazy, shortest match)
*?
<.*?> matches <b> in "<b>bold</b>", not the whole string
One or more (lazy)
+?
a+? matches only the first "a" in "aaa"
Zero or one (lazy)
??
a?? prefers to match nothing; it takes the "a" only if the rest of the pattern requires it
Between n and m (lazy)
{n,m}?
\d{2,4}? matches "20" in "2024": it stops as soon as the minimum is met

Groups

Capturing & Non-Capturing
JS / PY / PCRE
beginner
Capturing group
(abc)
(foo) captures "foo" in group 1
Non-capturing group
(?:abc)
(?:foo)+ groups without capturing
Named capturing group (Python's re uses (?P<name>abc) instead)
(?<name>abc)
(?<year>\d{4}): access via match.groups.year in JavaScript
Alternation (OR)
cat|dog
cat|dog matches "cat" or "dog"
Backreference to group 1
\1
(\w+) \1 matches "hello hello"
Named backreference (Python's re uses (?P=name) instead)
\k<name>
(?<w>\w+) \k<w> matches repeated words

Lookaround

Lookahead & Lookbehind
JS / PY / PCRE
advanced
Positive lookahead: followed by
(?=abc)
\d+(?= dollars) matches "100" in "100 dollars"
Negative lookahead: NOT followed by
(?!abc)
\d+(?! dollars) matches "200" in "200 euros"
Positive lookbehind: preceded by
(?<=abc)
(?<=\$)\d+ matches "100" in "$100"
Negative lookbehind: NOT preceded by
(?<!abc)
(?<!\$)\d+ matches digits not preceded by $
Lookaround Tips
note
Lookarounds are zero-width; they don't consume characters
(?=\d)\w+
Checks next char without moving the cursor
Chain multiple lookaheads for validation
(?=.*\d)(?=.*[A-Z]).{8,}
Validates: 8+ chars, has digit, has uppercase

Flags

Regex Flags / Modifiers
JS / PY / PCRE
beginner
Case-insensitive matching
i
/hello/i matches "Hello", "HELLO"
Global: find all matches
g
/\d/g finds every digit in the string
Multiline: ^ and $ match line boundaries
m
/^\w+/gm matches the first word of each line
Dot-all: . matches newlines too
s
/a.b/s matches "a\nb"
Unicode mode
u
/\u{1F600}/u matches emoji codepoints
Sticky: match only at lastIndex (JS)
y
Anchors the match to regex.lastIndex position
Verbose: allow whitespace & comments (Python/PCRE)
x
re.compile(r"\d+ # digits", re.VERBOSE)
Combine multiple flags
gi
/hello/gi (global, case-insensitive)

Escapes

Special Characters & Escapes
JS / PY / PCRE
beginner
Must-escape metacharacters
\ ^ $ . | ? * + ( ) [ ] { }
Escape with \ to match literally: \. \* \+
Tab character
\t
\t matches a tab character
Newline
\n
\n matches a line feed
Carriage return
\r
\r\n matches Windows line endings
Unicode code point (JS and Python; PCRE uses \x{XXXX})
\uXXXX
\u0041 matches "A"
Hex character
\xXX
\x41 matches "A"

Common Patterns

Email & URLs
JS / PY / PCRE
pattern
Email address (basic)
[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}
URL (http / https)
https?://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*)?
Matches https://example.com/path?q=1
URL slug (kebab-case)
[a-z0-9]+(?:-[a-z0-9]+)*
Matches "my-blog-post", "product-v2"
Domain name
(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}
Dates & Times
JS / PY / PCRE
pattern
ISO date (YYYY-MM-DD)
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
Matches 2024-01-15
Time (HH:MM or HH:MM:SS)
(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?
Matches 14:30, 09:05:22
Numbers & IDs
JS / PY / PCRE
pattern
Integer (positive or negative)
-?\d+
Matches -42, 0, 1000
Decimal number
-?\d+(?:\.\d+)?
Matches -3.14, 42, 0.5
Hex color code
#(?:[0-9a-fA-F]{3}){1,2}
Matches #fff, #1a2b3c
UUID v4
[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}
Matches 550e8400-e29b-41d4-a716-446655440000
IPv4 address
(?:\d{1,3}\.){3}\d{1,3}
Matches 192.168.1.1
Semantic version (semver)
\d+\.\d+\.\d+(?:-[a-zA-Z0-9.]+)?
Matches 1.0.0, 2.3.1-beta.1
File extension
\.\w+$
Matches .jpg in photo.jpg; only the last extension is matched (.js in app.min.js, .gz in archive.tar.gz)
Password strength: 8+ chars, digit, uppercase
(?=.*\d)(?=.*[A-Z]).{8,}
Validates password complexity
Phone number (international)
\+?[\d\s\-().]{7,20}
Matches +1 (555) 123-4567
Credit card number (basic format)
\b(?:\d{4}[\s-]?){3}\d{4}\b
Matches 4111 1111 1111 1111, 4111-1111-1111-1111
Substitution Syntax
JS / PY / PCRE
intermediate
Replace using group reference (JS)
"hello world".replace(/(\w+) (\w+)/, "$2 $1")
Returns "world hello"
Replace using named group (JS)
"2024-01-15".replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, "$<d>/$<m>/$<y>")
Returns "15/01/2024"
Replace using group reference (Python)
re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
Returns "world hello"
Replace using named group (Python)
re.sub(r"(?P<first>\w+) (?P<last>\w+)", r"\g<last> \g<first>", s)
Returns "Smith John" from "John Smith"
Replace using group reference (PCRE/PHP)
preg_replace('/(\w+) (\w+)/', '$2 $1', $str)
Returns "world hello"

Most developers have copy-pasted a regex pattern at least once without fully understanding what it does. That works, until it doesn't.

Regular expressions are one of those tools that seem cryptic at first but make a lot of sense once the syntax clicks. Pattern matching, input validation, string manipulation, find and replace across files - regex handles all of it.

This regex cheat sheet covers the core syntax you actually need: character classes, quantifiers, anchors, capture groups, lookaheads, and common patterns across Python, JavaScript, and other languages.

Bookmark it. You'll come back to it.

What is RegEx

A regular expression (RegEx) is a sequence of characters that defines a search pattern used to match, find, or manipulate text.

It works across almost every programming language: Python, JavaScript, PHP, Java, Ruby, and more.

Not a formatting tool. Not a beautifier. A precision instrument for string manipulation, input validation, text parsing, and pattern matching.


RegEx Syntax

RegEx syntax is built from a small set of rules. Learn these, and every pattern starts to make sense.

Special Characters

These characters have reserved meaning in any regular expression engine:

Character   Meaning
.           Matches any single character except newline
^           Start of string (or line in multiline mode)
$           End of string (or line in multiline mode)
*           Zero or more repetitions
+           One or more repetitions
?           Zero or one repetition (also makes quantifiers lazy)
\           Escape character
|           Alternation (OR)
()          Capturing group
[]          Character class
{}          Quantifier range

Literal Characters

Any character not listed above matches itself literally.

cat matches the exact string "cat" - case-sensitive by default.

Escape Sequences

Use \ before a special character to match it literally.

Sequence   Matches
\.         A literal dot
\*         A literal asterisk
\(         A literal opening parenthesis
\\         A literal backslash
\n         Newline
\t         Tab
\r         Carriage return
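
Escaping by hand gets error-prone when the literal text comes from somewhere else. A minimal Python sketch, assuming the standard re module; the sample strings are made up for illustration:

```python
import re

# Hypothetical literal text that happens to contain metacharacters.
literal = "1+1=2 (approx.)"
text = "Note: 1+1=2 (approx.) holds here."

# re.escape backslash-escapes every character that could be special.
escaped = re.compile(re.escape(literal))
match = escaped.search(text)
found = match.group() if match else None
print(found)

# Unescaped, "+" acts as a quantifier and "(...)" as a group,
# so the same text no longer matches literally.
unescaped = re.search(literal, text)
print(unescaped)
```

The escaped pattern finds the text verbatim; the unescaped one finds nothing in this input.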


RegEx Character Classes

Character classes let you match any one character from a defined set.

Predefined Character Classes

Class   Description          Matches
\d      Digit                0-9
\D      Non-digit            Anything except 0-9
\w      Word character       a-z, A-Z, 0-9, _
\W      Non-word character   Anything \w won't match
\s      Whitespace           Space, tab, newline
\S      Non-whitespace       Anything \s won't match
.       Any character        Except newline (by default)
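
A quick way to see these classes in action, sketched in Python with a made-up sample string. One caveat: on str patterns, Python's \d and \w also match non-ASCII digits and letters unless re.ASCII is passed, so the ranges in the table describe the ASCII case.

```python
import re

# Illustrative sample; any ASCII text works the same way.
sample = "Order 42: hello_world, ready!"

digits = re.findall(r"\d+", sample)      # runs of digits
words = re.findall(r"\w+", sample)       # runs of letters, digits, underscore
non_words = re.findall(r"\W+", sample)   # runs of everything else

print(digits)
print(words)
print(non_words)
```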

Custom Character Classes

Square brackets define a custom set. [aeiou] matches any single vowel.

Ranges work too: [a-z] matches any lowercase letter, [0-9] matches any digit, [a-zA-Z0-9] matches any alphanumeric character.

[aeiou]       → matches a, e, i, o, or u
[a-z]         → matches any lowercase letter
[A-Za-z0-9]   → matches any alphanumeric character
[.,!?]        → matches any of these punctuation marks

Negated Character Classes

Add ^ inside brackets to match anything NOT in the set.

[^aeiou]    → matches any character that is NOT a vowel
[^0-9]      → matches any non-digit character
[^\s]       → matches any non-whitespace character
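
The same sets, tried out in a small Python sketch. The input string is illustrative:

```python
import re

text = "hello world 42"

# Custom set: any single lowercase vowel.
vowels = re.findall(r"[aeiou]", text)

# Negated set: anything that is neither a vowel nor whitespace.
not_vowel_or_space = re.findall(r"[^aeiou\s]", text)

# Negated range with a quantifier: runs of non-digit characters.
non_digit_runs = re.findall(r"[^0-9]+", text)

print(vowels, not_vowel_or_space, non_digit_runs)
```

Note that the negated set still matches digits: "not a vowel" is broader than "consonant".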

RegEx Quantifiers

Quantifiers control how many times a pattern repeats.

Greedy Quantifiers

Greedy quantifiers match as much as possible. They're the default behavior.

Quantifier   Meaning                 Example
*            0 or more               a* matches "", "a", "aaa"
+            1 or more               a+ matches "a", "aaa" but not ""
?            0 or 1                  colou?r matches "color" and "colour"
{n}          Exactly n times         \d{4} matches exactly 4 digits
{n,}         n or more times         \d{2,} matches 2 or more digits
{n,m}        Between n and m times   \d{2,4} matches 2, 3, or 4 digits
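
A range quantifier limits how much one match consumes, not how long the surrounding run can be. A short Python sketch with made-up input shows the difference a word boundary makes:

```python
import re

text = "7 42 123 2024 99999"

# Unanchored: the 5-digit run still yields a 4-digit match.
loose = re.findall(r"\d{2,4}", text)

# With word boundaries: only whole numbers of 2 to 4 digits.
strict = re.findall(r"\b\d{2,4}\b", text)

# Optional character: both spellings match, "colr" does not.
spellings = re.findall(r"colou?r", "color colour colr")

print(loose, strict, spellings)
```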

Lazy Quantifiers

Add ? after any greedy quantifier to make it lazy - matches as little as possible.

Quantifier   Meaning
*?           0 or more (lazy)
+?           1 or more (lazy)
??           0 or 1 (lazy)
{n,m}?       Between n and m (lazy)

Input: <b>bold</b>
Greedy:  <.+>   → matches <b>bold</b> (entire string)
Lazy:    <.+?>  → matches <b> (stops at first >)
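
The comparison above can be reproduced with a few lines of Python, assuming the standard re module:

```python
import re

html = "<b>bold</b>"

# Greedy: .+ runs to the end, then backs up to the last ">".
greedy = re.search(r"<.+>", html).group()

# Lazy: .+? grows one character at a time and stops at the first ">".
lazy = re.search(r"<.+?>", html).group()

# With findall, the lazy form picks out each tag separately.
tags = re.findall(r"<.+?>", html)

print(greedy, lazy, tags)
```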

Possessive Quantifiers

Supported in Java, PCRE, and a few other engines, including Python 3.11+. Not available in JavaScript.

They match greedily and never give back characters, which prevents the engine from backtracking into the quantified part.

Quantifier   Meaning
*+           0 or more (possessive)
++           1 or more (possessive)
?+           0 or 1 (possessive)


RegEx Anchors and Boundaries

Anchors don't match characters. They match positions in a string.

Start and End Anchors

Anchor   Matches
^        Start of string (or line with m flag)
$        End of string (or line with m flag)
\A       Start of string (ignores m flag)
\Z       End of string (ignores m flag)

^Hello        → matches "Hello" only at the start
world$        → matches "world" only at the end
^\d{3}-\d{4}$ → matches a whole string such as "123-4567" (3 digits, hyphen, 4 digits) and nothing longer

Word Boundaries

Word boundaries match the position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.

Boundary   Matches
\b         Word boundary
\B         Non-word boundary

\bcat\b    → matches "cat" in "the cat sat" but NOT in "category"
\Bcat\B    → matches "cat" inside "concatenate" but NOT as a standalone word
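
A small Python sketch of both boundaries on one made-up sentence. It records where each match starts so the two results are easy to tell apart:

```python
import re

text = "the cat sat near a category; concatenate"

# \b on both sides: only the standalone word qualifies.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B on both sides: only a "cat" with word characters around it qualifies.
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(whole_word)   # offset of the standalone "cat"
print(inside_word)  # offset of the "cat" inside "concatenate"
```

The "cat" in "category" matches neither pattern: it has a boundary on its left but not on its right.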

Line vs String Anchors

By default, ^ and $ match the start and end of the entire string.

Enable multiline mode (m flag) and they match the start and end of each individual line instead.

Without m flag:   ^cat$  → only matches if the entire string is "cat"
With m flag:      ^cat$  → matches "cat" on any line within a multiline string
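
In Python the m flag is spelled re.MULTILINE. A minimal sketch with an illustrative three-line string:

```python
import re

text = "dog\ncat\nbird"

# Default: ^ and $ only match at the ends of the whole string.
default = re.findall(r"^cat$", text)

# Multiline: ^ and $ also match at every line break.
multiline = re.findall(r"^cat$", text, re.MULTILINE)

# \A and \Z ignore the flag and stay tied to the whole string.
absolute = re.findall(r"\Acat\Z", text, re.MULTILINE)

print(default, multiline, absolute)
```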

RegEx Groups and Capturing

Groups let you isolate, reuse, and reference parts of a matched pattern.

Capturing Groups

Wrap any pattern in () to capture it. The engine stores the match so you can reference it later.

(\d{4})-(\d{2})-(\d{2})
→ Matches "2024-01-15", captures year, month, day in groups 1, 2, 3

Non-Capturing Groups

(?:...) groups without storing the match. Use when you need grouping for structure but don't need the captured value.

(?:https?|ftp)://   → groups "https", "http", or "ftp" without capturing

Named Capturing Groups

Assign a name instead of a number. Cleaner to reference, especially in long patterns.

(?P<year>\d{4})-(?P<month>\d{2})    # Python
(?<year>\d{4})-(?<month>\d{2})      # JavaScript, .NET, Java

Access via match.group('year') in Python or match.groups.year in JavaScript.
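
A short Python sketch of the access styles; the input string is illustrative:

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = pattern.search("Released on 2024-01-15.")

year = m.group("year")    # access by name
first = m.group(1)        # named groups are still numbered
parts = m.groupdict()     # all named groups as a dict

print(year, first, parts)
```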

Backreferences

Reference a previously captured group inside the same pattern using \1, \2, etc.

(\w+)\s\1       → matches repeated words like "the the" or "go go"
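
A minimal Python sketch of finding and collapsing a repeated word. It adds \b on both sides, an adjustment not in the pattern above, so that inputs like "this is" do not match on the shared "is":

```python
import re

text = "it was the the best"

# \1 must repeat exactly what group 1 captured.
repeated = re.compile(r"\b(\w+)\s+\1\b")

found = repeated.search(text).group()

# In the replacement string, \1 refers to the same captured word.
cleaned = repeated.sub(r"\1", text)

print(found)
print(cleaned)
```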

RegEx Lookahead and Lookbehind

Lookarounds check what surrounds a match without including that context in the result.

They are zero-width assertions: they test the surrounding text but consume no characters.

Positive Lookahead

(?=...) matches if what follows fits the pattern.

\d+(?= dollars)   → matches "100" in "100 dollars" but not in "100 euros"

Negative Lookahead

(?!...) matches if what follows does NOT fit the pattern.

\d+(?! dollars)   → matches "100" in "100 euros"; in "100 dollars" it backtracks and still matches "10", so use \b\d+\b(?! dollars) to skip that number entirely

Positive Lookbehind

(?<=...) matches if what precedes fits the pattern.

(?<=\$)\d+    → matches "99" in "$99" but not in "99 USD"

Negative Lookbehind

(?<!...) matches if what precedes does NOT fit the pattern.

(?<!\$)\d+    → matches "99" in "99 USD"; in "$99" it still matches the second "9", so use (?<![$\d])\d+ to skip that number entirely

Note: JavaScript (pre-ES2018) does not support lookbehind. Most other major engines do.
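
Negative lookarounds are easy to get subtly wrong, because the engine backtracks into the digits to make the assertion pass. A Python sketch on one made-up string, with the naive and the bounded versions side by side:

```python
import re

prices = "100 dollars, 200 euros, $99"

# Positive lookahead: digits followed by " dollars".
followed = re.findall(r"\d+(?= dollars)", prices)

# Naive negative lookahead: backtracks to "10" inside "100 dollars".
naive = re.findall(r"\d+(?! dollars)", prices)

# Bounded: \b forces whole numbers, so "100" is skipped entirely.
bounded = re.findall(r"\b\d+\b(?! dollars)", prices)

# Positive lookbehind: digits preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind that also rejects a preceding digit,
# so it cannot restart in the middle of "$99".
not_after_dollar = re.findall(r"(?<![$\d])\d+", prices)

print(followed, naive, bounded, after_dollar, not_after_dollar)
```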


RegEx Flags and Modifiers

Flags change how the entire pattern behaves. They are placed after the closing delimiter or passed as a second argument.

Flag   Name               Effect                                       Example
i      Case insensitive   a matches A and a                            /cat/i matches "Cat", "CAT"
g      Global             Find all matches, not just the first         /\d+/g returns every number
m      Multiline          ^ and $ match per line                       /^\w+/gm matches first word per line
s      Dotall             . matches newlines too                       /a.b/s matches "a\nb"
x      Extended           Allows whitespace and comments in patterns   Supported in Python, PHP, Ruby
u      Unicode            Enables full Unicode support                 /\p{L}+/u matches Unicode letters

/hello/gi     → matches "hello", "Hello", "HELLO" anywhere in the string
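
Python has no /pattern/flags literal, so flags are passed as constants and combined with |. A minimal sketch of the equivalents, assuming the standard re module and a made-up test string:

```python
import re

text = "Hello\nhello world\nHELLO"

# There is no g flag in Python: findall already returns every match.
any_case = re.findall(r"hello", text, re.IGNORECASE)

# m flag: ^ matches at the start of each line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# s flag: "." only crosses the line break with re.DOTALL.
with_dotall = re.search(r"Hello.hello", text, re.DOTALL) is not None
without_dotall = re.search(r"Hello.hello", text) is not None

# x flag: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
ym = year_month.search("2024-01").groups()

# Combining flags, the counterpart of /^hello/gim in JavaScript.
combined = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)

print(any_case, line_starts, with_dotall, without_dotall, ym, combined)
```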

RegEx Syntax by Language

The core syntax is consistent. Delimiters, flags, and a few edge cases vary by language.

RegEx in JavaScript

Uses RegExp object or literal /pattern/flags syntax. Lookbehind requires ES2018+.

const pattern = /^\d{4}-\d{2}-\d{2}$/;
pattern.test("2024-01-15");   // true

const matches = "one 1, two 2".match(/\d+/g);  // ["1", "2"]

Key methods: .test(), .match(), .replace(), .split(), .exec()

RegEx in Python

Uses the built-in re module. Supports named groups, lookbehind, and verbose mode (re.X).

import re

pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
match = pattern.search("Date: 2024-01-15")
print(match.group(1))  # "2024"

Key functions: re.search(), re.match(), re.findall(), re.sub(), re.compile()

RegEx in PHP

Uses PCRE via preg_* functions. Patterns wrapped in delimiters (usually /).

preg_match('/(\d{4})-(\d{2})/', '2024-01', $matches);
echo $matches[1]; // "2024"

preg_replace('/\s+/', '-', 'hello world'); // "hello-world"

Key functions: preg_match(), preg_match_all(), preg_replace(), preg_split()

RegEx in Java

Uses java.util.regex package. Patterns compiled from strings - backslashes need escaping (\\d not \d).

import java.util.regex.*;

Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher m = p.matcher("2024-01-15");
if (m.find()) {
    System.out.println(m.group(1)); // "2024"
}

Key classes: Pattern, Matcher

RegEx in Ruby

Uses the Regexp class or literal /pattern/ syntax. Named groups work cleanly with symbol access.

str = "2024-01-15"
match = str.match(/(?<year>\d{4})-(?<month>\d{2})/)
puts match[:year]   # "2024"

Key methods: .match(), .scan(), .gsub(), .split()


Common RegEx Patterns

Patterns you'll actually use. Copy, test, adjust for your input.

Use Case            Pattern                                           Notes
Email               ^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$                   Basic validation; RFC 5322 is far more complex
URL                 https?://[\w./-]+                                 Simplified - use a library for production
Phone (US)          \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}               Handles common US formats
IP Address (IPv4)   \b(?:\d{1,3}\.){3}\d{1,3}\b                       Does not validate the 0–255 range
Date (YYYY-MM-DD)   \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])   Validates month and day ranges
HTML tag            <[^>]+>                                           Matches any tag - not for parsing full HTML
Whitespace (trim)   ^\s+|\s+$                                         Matches leading and trailing whitespace
Password            ^(?=.*[A-Z])(?=.*\d)(?=.*[\W_]).{8,}$             8+ chars, uppercase, digit, special char
Hex color           #(?:[0-9a-fA-F]{3}){1,2}\b                        Matches 3 or 6 digit hex codes (not 4 or 5)
Username            ^[a-zA-Z0-9_]{3,16}$                              3–16 alphanumeric characters or underscores
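
A hedged sketch of how a few of these patterns might be wrapped for validation in Python. The helper names (is_valid, is_ipv4) and the PATTERNS dict are hypothetical; re.fullmatch stands in for the ^...$ anchors, and re.ASCII keeps \d limited to 0-9:

```python
import re

# Date and username patterns taken from the table; anchors are
# dropped because fullmatch already requires the whole string.
PATTERNS = {
    "date": r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])",
    "username": r"[a-zA-Z0-9_]{3,16}",
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole value matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], value, flags=re.ASCII) is not None

def is_ipv4(value: str) -> bool:
    """Shape check with regex, then the 0-255 range check in code."""
    if re.fullmatch(r"(?:\d{1,3}\.){3}\d{1,3}", value, flags=re.ASCII) is None:
        return False
    return all(int(part) <= 255 for part in value.split("."))

print(is_valid("date", "2024-01-15"), is_valid("date", "2024-13-01"))
print(is_valid("username", "dev_42"), is_valid("username", "ab"))
print(is_ipv4("192.168.1.1"), is_ipv4("256.1.1.1"))
```

The date pattern still accepts impossible dates such as 2024-02-31. Checks that depend on arithmetic or calendars are usually simpler in code than in the pattern.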


RegEx Operators and Alternation

Pipe Operator

| acts as a logical OR between two expressions.

cat|dog       → matches "cat" or "dog"
jpg|jpeg|png  → matches any of these file extensions

Precedence Rules

Alternation has the lowest precedence in a regular expression.

cat|dog food matches "cat" OR "dog food", not "cat food" OR "dog food". Wrap in a group to control scope: (cat|dog) food.
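
The precedence rule is easy to confirm in Python. This sketch uses re.fullmatch so each pattern has to account for the whole input, and a non-capturing group for the scoped version:

```python
import re

loose = re.compile(r"cat|dog food")         # "cat" OR "dog food"
grouped = re.compile(r"(?:cat|dog) food")   # ("cat" OR "dog") then " food"

loose_cat_food = loose.fullmatch("cat food") is not None
loose_cat = loose.fullmatch("cat") is not None
grouped_cat_food = grouped.fullmatch("cat food") is not None
grouped_cat = grouped.fullmatch("cat") is not None

print(loose_cat_food, loose_cat, grouped_cat_food, grouped_cat)
```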

Grouping with Alternation

Combine alternation with groups to build clean branching patterns.

^(Mr|Mrs|Ms|Dr)\. [A-Z][a-z]+$
→ matches "Dr. Smith", "Ms. Jones"

How to Test RegEx

Online RegEx Testers

Online testers such as Regex101, Regexr, and Debuggex give real-time match highlighting, group inspection, and engine selection.

RegEx Debugging Tips


RegEx Performance and Backtracking

Backtracking happens when the engine tries a path, fails, then backs up and retries.

Minor backtracking is normal. Catastrophic backtracking is a bug - it can freeze an application on specific inputs.

What Causes It

Nested quantifiers on overlapping patterns. (a+)+b on a string like "aaaaaa" with no b at the end triggers an exponential number of retries.
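
One way to remove the ambiguity, sketched in Python: (a+)+b and a+b accept the same strings, but the flat version gives the engine only one way to split a run of a's. The sample inputs are illustrative and deliberately short, so the nested pattern stays cheap to run here:

```python
import re

nested = re.compile(r"(a+)+b")   # ambiguous: many ways to split "aaa..."
flat = re.compile(r"a+b")        # same matches, a single way to get there

def outcome(pattern, s):
    """Return the matched text, or None when there is no match."""
    m = pattern.search(s)
    return m.group() if m else None

samples = ["ab", "aaab", "b", "aaa", "xaab"]
results = [(outcome(nested, s), outcome(flat, s)) for s in samples]
agree = all(a == b for a, b in results)

print(results)
print(agree)
```

On a long run of a's with no b, the nested pattern's work grows exponentially, while the flat pattern stays far cheaper.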

Other common causes:

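How to Avoid It

Rewrite ambiguous patterns so each input can be matched in only one way, or use possessive quantifiers where the engine supports them.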

FAQ on RegEx

What is RegEx used for?

RegEx is used for pattern matching, input validation, text parsing, and string manipulation.

Common applications: validating email addresses, extracting URLs, finding duplicates, parsing log files, and running find-and-replace across large codebases.

What does .* mean in RegEx?

. matches any single character except a newline.

* is a greedy quantifier meaning zero or more repetitions. Together, .* matches any sequence of characters on a single line - as much as possible.

What is the difference between greedy and lazy quantifiers?

Greedy quantifiers match as much text as possible. Lazy quantifiers (add ?) match as little as possible.

<.+> grabs everything between the first < and the last >. <.+?> stops at the first > it finds.

What does ^ mean in a RegEx pattern?

Outside a character class, ^ is a start anchor - it matches the position at the beginning of a string.

Inside brackets like [^abc], it negates the set, matching any character that is NOT a, b, or c.

What are RegEx flags?

Flags modify how the regex engine processes a pattern.

Common ones: i (case-insensitive), g (global - find all matches), m (multiline - ^ and $ match per line), s (dotall - . matches newlines too).

What is a capturing group in RegEx?

Parentheses () create a capturing group, storing the matched text for later use via backreferences or in replacement strings.

Non-capturing groups (?:...) group without storing. Named groups (?<name>...) let you reference matches by name instead of index.

How do word boundaries work in RegEx?

\b matches the position between a word character (\w) and a non-word character (\W).

\bcat\b matches "cat" as a standalone word but skips "category" or "concatenate". Useful for precise string search without partial matches.

Does RegEx syntax differ between programming languages?

Yes. Core syntax is consistent, but engine-specific features vary.

JavaScript lacks lookbehind in older versions and has no possessive quantifiers. Python's re module writes named groups as (?P<name>...). PCRE (used in PHP) is one of the most feature-complete regex flavors available.

What is catastrophic backtracking in RegEx?

It happens when a pattern with nested quantifiers - like (a+)+ - tries an exponential number of combinations on a non-matching string.

The regex engine keeps retrying, causing serious performance slowdowns. Fix it by rewriting ambiguous patterns or using possessive quantifiers where supported.

How do I test a RegEx pattern?

Use an online regex tester like Regex101, Regexr, or Debuggex.

These tools show real-time match highlighting, break down each part of your pattern, and let you switch between regex flavors like PCRE, JavaScript, and Python.