JavaScript Regex Syntax Basics: Patterns, Flags, and the Methods That Use Them
Regular expressions are the most-feared, most-Googled, and most-useful piece of JavaScript syntax. They look intimidating (/^[a-z]+\b/gim), but the rules are surprisingly small. Once you know the basic character classes, anchors, quantifiers, and flags, you can read or write most patterns you will meet in the wild.
This lesson covers the regex syntax that most real code relies on, the JavaScript-specific quirks, and the four methods (test, match, replace, split) that consume regexes, plus matchAll.
Creating a regex #
Two equivalent syntaxes:
const a = /hello/; // regex literal
const b = new RegExp('hello'); // RegExp constructor
const c = new RegExp('hello', 'gi'); // with flags
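Use the literal form when the pattern is fixed at write time. Use the constructor when you need to build the pattern dynamically, for example from a search term known only at runtime (escape user input first, as described under "Escaping special characters"):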
const search = 'hello';
const pattern = new RegExp(search, 'i');
The constructor takes the pattern as a string — so backslashes need to be doubled: new RegExp('\\d+'). Easy to forget.
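A small sketch of the doubled-backslash rule in action. The variable names are illustrative only; String.raw is the standard template tag, used here as one way to avoid the doubling.

```javascript
// In an ordinary string literal, '\d' collapses to 'd' before RegExp sees it.
const wrong = new RegExp('\d+');         // behaves like /d+/
const right = new RegExp('\\d+');        // behaves like /\d+/
const raw = new RegExp(String.raw`\d+`); // String.raw keeps the backslash as typed

console.log(wrong.source); // d+
console.log(right.source); // \d+
console.log(wrong.test('123'), right.test('123'), raw.test('123')); // false true true
```

Checking .source like this is a quick way to see what pattern the constructor actually received.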
The four methods that use regexes #
const re = /\d+/;
re.test('abc 123'); // boolean — true if a match exists
'abc 123'.match(re); // ['123'] — first match (without /g) or all matches (with /g)
'abc 123'.replace(re, 'X'); // 'abc X' — replace first match
'abc 123 456'.split(re); // ['abc ', ' ', ''] — split by matches
Add the g (global) flag to act on all occurrences. Without g, match and replace act only on the first match. split is the exception: it always splits on every match, with or without g.
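To make the effect of g concrete, here is a minimal side-by-side sketch. The sample string and variable names are made up for illustration.

```javascript
const text = 'a1 b22 c333';

// Without g: match returns the first match, including its index.
const first = text.match(/\d+/);
console.log(first[0], first.index); // '1' 1

// With g: match returns every matched substring (no index, no groups).
const all = text.match(/\d+/g);
console.log(all); // ['1', '22', '333']

// replace follows the same rule.
const replacedFirst = text.replace(/\d+/, '#');
const replacedAll = text.replace(/\d+/g, '#');
console.log(replacedFirst); // 'a# b22 c333'
console.log(replacedAll);   // 'a# b# c#'

// split does not need g: it always splits on every match.
const parts = text.split(/\s/);
console.log(parts); // ['a1', 'b22', 'c333']
```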
matchAll for capture groups across matches #
const text = 'a1 b2 c3';
for (const m of text.matchAll(/(\w)(\d)/g)) {
console.log(m[1], m[2]); // 'a' '1', 'b' '2', 'c' '3'
}
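matchAll returns an iterator of match objects, each with its capture groups and index. The regex must carry the g flag; without it, matchAll throws a TypeError. It is the cleanest way to handle multiple matches with groups.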
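Because the iterator can be spread into an array, matchAll combines well with map. The sketch below assumes a made-up log format purely for illustration.

```javascript
const log = 'GET /home 200, POST /login 401, GET /about 200';

// matchAll needs the g flag; each match object carries groups and an index.
const requests = [...log.matchAll(/(GET|POST) (\/\w+) (\d{3})/g)].map((m) => ({
  method: m[1],
  path: m[2],
  status: Number(m[3]),
  index: m.index, // position of the match in the input string
}));

console.log(requests.length); // 3
console.log(requests[1]);     // { method: 'POST', path: '/login', status: 401, index: 15 }
```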
Basic character matching #
/abc/ // matches the literal string 'abc'
/./ // any single character except line terminators (\n, \r, \u2028, \u2029)
/\./ // a literal dot (backslash escapes)
Character classes #
A set of characters in square brackets — matches any one of them:
/[abc]/ // a, b, or c
/[a-z]/ // any lowercase letter (range)
/[a-zA-Z]/ // any letter
/[0-9]/ // any digit
/[^0-9]/ // any NON-digit (the ^ inside [ ] is negation)
/[a-z0-9]/ // alphanumeric lowercase
Shorthand classes #
/\d/ // any digit (same as [0-9])
/\D/ // any non-digit (same as [^0-9])
/\w/ // word char (ASCII letters, digits, underscore)
/\W/ // non-word
/\s/ // whitespace (space, tab, newline, etc.)
/\S/ // non-whitespace
The shorthand classes are the most-used building blocks. Memorize them.
Quantifiers — how many #
/a*/ // zero or more 'a'
/a+/ // one or more 'a'
/a?/ // zero or one 'a'
/a{3}/ // exactly 3 'a's
/a{2,4}/ // 2 to 4 'a's
/a{2,}/ // 2 or more
Greedy vs lazy #
Quantifiers are greedy by default — they match as much as possible:
'<b>hi</b>'.match(/<.+>/); // ['<b>hi</b>'] — greedy: grabs everything
Add ? after the quantifier to make it lazy — match as little as possible:
'<b>hi</b>'.match(/<.+?>/); // ['<b>'] — lazy: stops at first '>'
The greedy/lazy distinction is the single most useful insight when patterns aren't matching what you expect.
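A negated character class is a common alternative to a lazy quantifier: it states outright which characters may appear before the closing delimiter. The following is a sketch of that idea on the same input, not a general HTML parser.

```javascript
const html = '<b>hi</b>';

const greedy = html.match(/<.+>/)[0];     // '<b>hi</b>'
const lazy = html.match(/<.+?>/)[0];      // '<b>'
const negated = html.match(/<[^>]+>/)[0]; // '<b>' ("anything but >" cannot overrun)
const everyTag = html.match(/<[^>]+>/g);  // ['<b>', '</b>']

console.log(greedy, lazy, negated, everyTag);
```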
Anchors and boundaries #
/^abc/ // matches if 'abc' is at the START of the string
/abc$/ // matches if 'abc' is at the END
/\babc/ // word boundary — 'abc' starting at a word boundary
/abc\b/ // 'abc' ending at a word boundary
Anchors don't consume characters — they're "position" assertions.
/^\d+$/.test('12345'); // true — entire string is digits
/^\d+$/.test('a12'); // false — there's an 'a'
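The difference anchors and boundaries make is easiest to see by counting matches with and without them. The sample sentence below is invented for illustration.

```javascript
const sentence = 'concatenate the cat category';

// Without boundaries, 'cat' is also found inside longer words.
const anywhere = sentence.match(/cat/g).length;      // 3
// With \b on both sides, only the standalone word matches.
const wholeWord = sentence.match(/\bcat\b/g).length; // 1

// Without ^ and $, test() only asks "is there a match somewhere?".
const someDigits = /\d+/.test('a12');   // true
const onlyDigits = /^\d+$/.test('a12'); // false

console.log(anywhere, wholeWord, someDigits, onlyDigits);
```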
Groups and alternation #
Alternation: | #
/cat|dog/ // 'cat' OR 'dog'
/^(cat|dog)$/ // exactly 'cat' or exactly 'dog'
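The parentheses in the second pattern matter because | has the lowest precedence of any regex operator. A short sketch of what goes wrong without them:

```javascript
// Without a group, each anchor binds to only one branch.
const loose = /^cat|dog$/;    // "starts with cat" OR "ends with dog"
const strict = /^(cat|dog)$/; // exactly 'cat' or exactly 'dog'

console.log(loose.test('catalog'));  // true (starts with 'cat')
console.log(loose.test('hotdog'));   // true (ends with 'dog')
console.log(strict.test('catalog')); // false
console.log(strict.test('hotdog'));  // false
console.log(strict.test('dog'));     // true
```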
Grouping: (...) #
Parentheses group a sub-pattern AND capture the match:
const m = 'hello world'.match(/(hello) (world)/);
console.log(m[0]); // 'hello world' — full match
console.log(m[1]); // 'hello' — first capture group
console.log(m[2]); // 'world'
Groups are numbered left-to-right starting at 1 (group 0 is the whole match).
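Those group numbers are also available inside replace, either as $1, $2 in a replacement string or as arguments to a replacer function. A minimal sketch:

```javascript
// $1 and $2 in the replacement string refer to the capture groups.
const swapped = 'hello world'.replace(/(hello) (world)/, '$2 $1');
console.log(swapped); // 'world hello'

// A replacer function receives the full match first, then each group.
const shouted = 'hello world'.replace(
  /(\w+) (\w+)/,
  (match, first, second) => `${first.toUpperCase()} ${second}`
);
console.log(shouted); // 'HELLO world'
```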
Non-capturing groups: (?:...) #
If you need grouping but not capturing (useful in alternation):
const m = 'rainbow'.match(/(?:rain|sun)bow/);
console.log(m[0]); // 'rainbow' — no group 1
Clearer when you don't need the capture, and it leaves the numbering of your real capture groups undisturbed.
Named groups: (?<name>...) #
Gives groups readable names:
const m = '2026-05-21'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(m.groups.year); // '2026'
console.log(m.groups.month); // '05'
Much more maintainable than positional indices. We cover named groups in depth in Lesson 9.2.
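Named groups pair naturally with destructuring and with $<name> in replacement strings. In the sketch below, parseDate and DATE_RE are hypothetical names chosen for illustration.

```javascript
const DATE_RE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// match() returns null when nothing matches, so guard before reading .groups.
function parseDate(text) {
  const m = text.match(DATE_RE);
  if (m === null) return null;
  const { year, month, day } = m.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseDate('2026-05-21')); // { year: 2026, month: 5, day: 21 }
console.log(parseDate('not a date')); // null

// $<name> refers to a named group inside a replacement string.
const european = '2026-05-21'.replace(DATE_RE, '$<day>/$<month>/$<year>');
console.log(european); // '21/05/2026'
```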
Flags #
Letters after the closing /:
| Flag | Meaning |
|---|---|
| g | Global: find ALL matches, not just the first |
| i | Case-insensitive |
| m | Multi-line: ^ and $ match at line breaks too |
| s | DotAll: . matches newline too |
| u | Unicode: proper handling of code points (essential for emoji) |
| y | Sticky: match only at lastIndex position |
| d | Has indices: match returns position info per group (ES2022) |
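The m, s, and y flags are the least intuitive of the set, so here is a brief sketch of what each one changes. The sample strings are invented for illustration.

```javascript
const lines = 'first\nsecond';

// m: ^ and $ also match at line breaks.
const withoutM = /^second$/.test(lines); // false
const withM = /^second$/m.test(lines);   // true

// s: the dot also matches newlines.
const withoutS = /first.second/.test(lines); // false
const withS = /first.second/s.test(lines);   // true

// y: the match must start exactly at lastIndex.
const sticky = /\d/y;
sticky.lastIndex = 1;
const atDigit = sticky.test('a1');  // true: index 1 holds a digit
sticky.lastIndex = 0;
const atLetter = sticky.test('a1'); // false: index 0 holds 'a'

console.log(withoutM, withM, withoutS, withS, atDigit, atLetter);
```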
Most common combinations:
/.../g // find all
/.../gi // find all, case-insensitive
/.../u // Unicode-aware (always pass for any user text)
/.../gu // Unicode all-matches
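For any pattern that touches international text, always include the u flag: it makes . and character classes work on whole code points (so an emoji counts as one character) and enables \p{...} property escapes. It does not widen \w, \d, or \b, which remain ASCII-based; use property escapes such as \p{L} (letters) or \p{N} (numbers) to match other scripts.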
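A short sketch of what the u flag changes in practice. The strings are written with escapes so the example does not depend on file encoding; \p{L} is the standard Unicode property escape for "any letter".

```javascript
const emoji = '\u{1F600}'; // one code point, stored as two UTF-16 code units
const word = 'caf\u00e9';  // 'cafe' with a precomposed e-acute as the last letter

// Without u, the dot matches a single code unit and cannot span the emoji.
const dotWithoutU = /^.$/.test(emoji); // false
const dotWithU = /^.$/u.test(emoji);   // true

// With u, \p{L} matches letters from any script; an ASCII range does not.
const lettersWithProperty = /^\p{L}+$/u.test(word); // true
const lettersWithRange = /^[a-z]+$/.test(word);     // false

console.log(dotWithoutU, dotWithU, lettersWithProperty, lettersWithRange);
```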
Escaping special characters #
These characters have special meaning and must be escaped to match literally:
. ^ $ * + ? ( ) [ ] { } | \ /
Escape with \:
/3\.14/ // matches the literal '3.14'
/\$100/ // matches '$100'
/\\n/ // matches a literal backslash-n (not a newline)
For user input that becomes a pattern, escape automatically:
function escapeRegex(s) {
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const pattern = new RegExp(escapeRegex(userInput), 'gi');
Never interpolate raw user input into a regex constructor — it can break or be exploited (catastrophic backtracking).
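To see why the escaping step matters, compare a raw and an escaped pattern built from the same kind of input. escapeRegex is repeated here so the sketch runs on its own; the sample input is made up.

```javascript
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Raw interpolation: '+' is read as a quantifier, not a plus sign.
const rawPattern = new RegExp('1+1');
const rawResult = rawPattern.test('1+1'); // false: pattern means "one or more 1, then 1"

// Escaped: every metacharacter is matched literally.
const userInput = '1+1 (approx.)';
const safePattern = new RegExp(escapeRegex(userInput), 'i');
const safeResult = safePattern.test('Result: 1+1 (APPROX.)'); // true

console.log(rawResult, safePattern.source, safeResult);
```

Some raw inputs fail more loudly: an unbalanced '(' makes the constructor throw a SyntaxError.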
Practical examples #
Email validation (simple) #
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
emailRegex.test('ada@example.com'); // true
This is intentionally simple — real-world email validation is famously ambiguous. For production, send a verification email instead of trying to perfectly validate the local-part syntax.
Extract digits #
const digits = 'price: $1,234.56'.match(/\d+/g);
// ['1', '234', '56']
Replace tabs with spaces #
'a\tb\tc'.replace(/\t/g, ' ');
Split on multiple delimiters #
'a,b;c|d'.split(/[,;|]/); // ['a', 'b', 'c', 'd']
Trim multiple spaces #
'hello    world'.replace(/\s+/g, ' '); // 'hello world'
Find words #
const words = 'Hello, world!'.match(/\b\w+\b/g);
// ['Hello', 'world']
The lastIndex gotcha #
Regexes with the g flag have state — they remember where they left off:
const re = /\d/g;
re.test('a1'); // true, lastIndex now 2
re.test('a1'); // false (!) — searching from position 2 of 'a1', no digit
re.test('a1'); // true — wrapped back to 0
The single most surprising regex behavior. To avoid:
- Don't reuse /g regexes for test(). Create a fresh one or reset lastIndex = 0.
- Use String.prototype.matchAll() instead, which doesn't have this issue.
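The gotcha and two possible fixes, side by side. This is a minimal sketch with invented inputs; which fix suits you depends on whether the g flag is needed elsewhere.

```javascript
const inputs = ['a1', 'b2', 'c3'];

// Shared /g regex: lastIndex carries over between calls, so results alternate.
const stateful = /\d/g;
const statefulResults = inputs.map((s) => stateful.test(s));
console.log(statefulResults); // [true, false, true]

// Fix 1: drop g when you only need a yes/no answer.
const stateless = /\d/;
const statelessResults = inputs.map((s) => stateless.test(s));
console.log(statelessResults); // [true, true, true]

// Fix 2: keep g but reset lastIndex before each test.
const reset = /\d/g;
const resetResults = inputs.map((s) => {
  reset.lastIndex = 0;
  return reset.test(s);
});
console.log(resetResults); // [true, true, true]
```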
Performance pitfalls #
Catastrophic backtracking #
Some patterns can be very slow on certain inputs:
/^(a+)+b/.test('a'.repeat(30) + 'c'); // exponential backtracking: can freeze the tab
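The (a+)+ structure tries to match in exponentially many ways before failing. Avoid nested quantifiers. Other regex engines offer atomic groups or possessive quantifiers to cut off the backtracking, but JavaScript has neither, so the fix is to rewrite the pattern without the nesting (here, /^a+b/ accepts the same strings).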
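In practice: if a regex runs slowly, simplify. JavaScript has no built-in regex timeout, so if you need to validate untrusted input, cap the input length, run the match somewhere you can abort it (for example, a worker you terminate after a deadline), or use a regex engine like re2 (no backtracking) on the server.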
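As a sketch of the "rewrite without nesting" advice, the flattened pattern below accepts the same strings as the nested one on a handful of samples, yet fails quickly on a long near-miss. The sample list is illustrative, not a proof of equivalence.

```javascript
const nested = /^(a+)+b/; // nested quantifiers: exponential on near-misses
const flat = /^a+b/;      // only one way to match a run of 'a'

// Both patterns agree on these small inputs.
const samples = ['ab', 'aaab', 'b', 'aac', ''];
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// The flat version backtracks linearly, so a long near-miss is harmless.
const longNearMiss = 'a'.repeat(50000) + 'c';
const flatResult = flat.test(longNearMiss);
console.log(flatResult); // false
```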
Compile once, reuse #
const RE = /\d+/; // hoisted out of the hot path; no g flag, so test() keeps no lastIndex state
for (const item of items) {
if (RE.test(item)) { /* ... */ }
}
Each new RegExp(...) recompiles the pattern. For hot loops, hoist the regex out.
A summary #
- Two syntaxes: literal /pattern/flags (write-time) vs new RegExp(string, flags) (dynamic).
- Four consumer methods: test, match, replace, split (plus matchAll for groups + all matches).
- Shorthand classes: \d \w \s and their inversions are the building blocks.
- Quantifiers: * + ? {n,m}. Add ? for lazy.
- Anchors: ^ $ \b are position-only, don't consume.
- Groups: (...) captures, (?:...) doesn't, (?<name>...) names.
- Flags: g i m s u y d. Always pass u for user text.
- The g-flag lastIndex statefulness is the #1 gotcha. Prefer matchAll.
- Escape user input before putting it in a regex.
What's next #
Lesson 9.2 covers modern regex features — lookaheads/lookbehinds, Unicode property escapes (\p{...}), named groups in depth, indices flag, and the patterns these unlock.
Try it yourself #
The greedy-vs-lazy distinction is the single most useful one to feel:
const html = '<b>hello</b><i>world</i>';
console.log(html.match(/<.+>/));
console.log(html.match(/<.+?>/));
console.log(html.match(/<.+?>/g));
Output:
1. ['<b>hello</b><i>world</i>'] (greedy: matches from first < to last >, consuming everything in between)
2. ['<b>'] (lazy: stops at the first >)
3. ['<b>', '</b>', '<i>', '</i>'] (lazy + global gets every tag individually)
This single trick, adding ? to make a quantifier lazy, fixes more “my regex is matching too much” bugs than any other syntax change.
Knowing when to be greedy and when to be lazy is half of writing useful regexes.


