JavaScript Modern Regex Features: Lookarounds, Named Groups, Unicode Properties

JS Tutorial Module 9: Regex Lesson 9.2

JavaScript regular expressions gained a lot between ES2018 and ES2022: features that make patterns shorter, more readable, and capable of things that previously required external libraries or post-processing. Lookarounds check context without consuming characters (lookaheads are long-standing; lookbehinds arrived in ES2018). Named capture groups make match results readable. Unicode property escapes let you match every letter in every script with one expression. The s flag lets . match newlines, and the d flag gives you match indices.

This lesson covers everything regex picked up in the last few editions of JavaScript, with practical patterns for each.

Lookaheads and lookbehinds #

Lookarounds are zero-width assertions — they check what's before or after the current position without consuming characters.

Positive lookahead: (?=...) #

"Match X only if followed by Y":

'foobar foobaz'.match(/foo(?=bar)/);
// ['foo'] — matched only the 'foo' that has 'bar' after it

The bar is checked but not included in the match.
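One way to see the zero-width behaviour is to inspect the match object. This is only a small sketch; the variable names are illustrative and not part of the lesson's code.

```javascript
const m = 'foobar foobaz'.match(/foo(?=bar)/);
const matchedText = m[0];        // 'foo'
const matchStart = m.index;      // 0
const matchLength = m[0].length; // 3: 'bar' was inspected, not consumed

// Because the lookahead consumed nothing, the rest of the pattern
// can still match 'bar' right after it.
const full = 'foobar'.match(/foo(?=bar)bar/)[0]; // 'foobar'
```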

Negative lookahead: (?!...) #

"Match X only if NOT followed by Y":

'foobar foobaz'.match(/foo(?!bar)/);
// ['foo'] — the foo in foobaz, since the first foo is followed by bar

Positive lookbehind: (?<=...) #

"Match X only if preceded by Y":

'$5 €10 ¥100'.match(/(?<=€)\d+/);
// ['10'] — number preceded by €

Negative lookbehind: (?<!...) #

"Match X only if NOT preceded by Y":

'$5 €10 ¥100'.match(/(?<!€)\d+/g);
// ['5', '0', '100']
// '5' and '100' are not preceded by €. The '1' of '10' is rejected, but the '0' after it is preceded by '1', not €, so it matches on its own.

Lookbehinds were the latest of the four (ES2018). They unlock patterns that used to require post-processing.
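A negative lookbehind only constrains the character immediately before the match start, so the engine is free to start a match in the middle of a number. A hedged sketch of one common fix, which also rejects a preceding digit (the variable names are illustrative):

```javascript
const prices = '$5 €10 ¥100';

// Plain negative lookbehind: the engine retries at the '0' of '10',
// which is preceded by '1' rather than '€', so a stray '0' is matched.
const naive = prices.match(/(?<!€)\d+/g); // ['5', '0', '100']

// Also rejecting a preceding digit forces every match to start
// at the beginning of a number.
const wholeNumbers = prices.match(/(?<![€\d])\d+/g); // ['5', '100']
```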

Practical lookaround patterns #

Extract value after a label #

const log = 'user=ada role=admin lastSeen=2026-05-21';
const role = log.match(/(?<=role=)\w+/)?.[0];  // 'admin'

Match dollar amounts #

'I have $50 and $100'.match(/(?<=\$)\d+/g);  // ['50', '100']

Without lookbehind, you'd capture $50 and have to strip the $ afterwards.
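For comparison, here is a sketch of the three approaches side by side. All of them produce the same values; the difference is how much clean-up follows the match.

```javascript
const text = 'I have $50 and $100';

// Older approach: match the '$' too, then strip it in a second step.
const stripped = text.match(/\$\d+/g).map(s => s.slice(1)); // ['50', '100']

// Capture-group approach: matchAll exposes group 1 for every match.
const captured = [...text.matchAll(/\$(\d+)/g)].map(m => m[1]); // ['50', '100']

// Lookbehind approach: the match itself is already clean.
const lookbehind = text.match(/(?<=\$)\d+/g); // ['50', '100']
```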

Password validation: at least one of each #

const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

Four positive lookaheads, each asserting one requirement. The actual match is just .{8,} — 8+ characters. Each lookahead requires the string to contain something specific.

Reading the pattern:

  • ^(?=.*[a-z]) — at least one lowercase
  • (?=.*[A-Z]) — at least one uppercase
  • (?=.*\d) — at least one digit
  • (?=.*[!@#$%^&*]) — at least one symbol
  • .{8,}$ — 8 or more characters total

Stacked lookaheads like this are among the most common real-world uses of lookarounds.
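The single pattern only answers yes or no. If the caller needs to say which rule failed, the same requirements can be checked one at a time. The sketch below assumes a hypothetical helper named missingRequirements; it is an illustration, not part of the lesson's code.

```javascript
// Same requirements as the single pattern, checked individually
// so the caller can report exactly what is missing.
const requirements = [
  { label: 'lowercase letter', re: /[a-z]/ },
  { label: 'uppercase letter', re: /[A-Z]/ },
  { label: 'digit', re: /\d/ },
  { label: 'symbol', re: /[!@#$%^&*]/ },
  { label: 'at least 8 characters', re: /^.{8,}$/ },
];

// Hypothetical helper: returns the labels of all unmet requirements.
function missingRequirements(password) {
  return requirements.filter(r => !r.re.test(password)).map(r => r.label);
}

const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

const okStrong = strongPassword.test('Passw0rd!');   // true
const okWeak = strongPassword.test('password');      // false
const missing = missingRequirements('password');
// ['uppercase letter', 'digit', 'symbol']
```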

Replace a word but not when it's part of another #

// 'cat' but not 'catalog' or 'category'
'cat catalog category'.replace(/\bcat(?!\w)/g, 'dog');
// 'dog catalog category'

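The (?!\w) says: "not followed by another word character." The leading \b on its own wouldn't help, because catalog and category also start at a word boundary. For this particular pattern a trailing \b would behave the same as (?!\w); the lookahead form earns its keep when the excluded follower is something other than a word character.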
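A case where the lookahead and a plain word boundary differ is hyphenated text. The sketch below uses an illustrative input string to show both behaviours.

```javascript
const input = 'cat cat-like catalog';

// A trailing \b treats '-' as a boundary, so 'cat-like' is rewritten too.
const withBoundary = input.replace(/\bcat\b/g, 'dog');
// 'dog dog-like catalog'

// A negative lookahead can exclude the hyphen as well as word characters.
const withLookahead = input.replace(/\bcat(?![\w-])/g, 'dog');
// 'dog cat-like catalog'
```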

Named capture groups #

Covered briefly in Lesson 9.1. Worth a deeper look.

const m = '2026-05-21'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);

m.groups.year;   // '2026'
m.groups.month;  // '05'
m.groups.day;    // '21'

Named groups also work in replace:

'2026-05-21'.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  '$<day>/$<month>/$<year>'
);
// '21/05/2026'

In a callback, the named groups arrive as an object after the positional args:

'2026-05-21'.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  (match, ...args) => {
    const groups = args.at(-1);
    return `${groups.day}/${groups.month}/${groups.year}`;
  }
);

Use named groups for any pattern with more than two captures. Self-documenting.
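Named groups combine well with matchAll and destructuring when a string contains several matches. A minimal sketch, assuming an illustrative input string and the constant name DATE_RE:

```javascript
const DATE_RE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const text = 'Released 2026-05-21, patched 2026-06-03.';

// matchAll requires the g flag; every match carries its own groups object.
const dates = [...text.matchAll(DATE_RE)].map(({ groups }) => {
  const { year, month, day } = groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
});
// [{ year: 2026, month: 5, day: 21 }, { year: 2026, month: 6, day: 3 }]
```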

Backreferences within a pattern #

Reference a named group later in the same pattern:

// Match doubled words
/\b(?<word>\w+) \k<word>\b/.test('the the cat'); // true

For positional groups, use \1, \2, etc. For named groups, \k<name>.
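Two further uses of backreferences, sketched with illustrative inputs: matching a quoted string whose closing quote must equal its opening quote, and collapsing doubled words in a replacement.

```javascript
// \k<quote> must repeat whatever the 'quote' group captured,
// so the closing quote has to be the same character as the opening one.
const QUOTED = /(?<quote>['"])(?<body>.*?)\k<quote>/;

const a = `say "it's fine" now`.match(QUOTED).groups.body; // it's fine
const b = `say 'hello' now`.match(QUOTED).groups.body;     // hello

// Collapse doubled words: \1 inside the pattern, $1 in the replacement string.
const deduped = 'the the cat sat sat down'.replace(/\b(\w+) \1\b/g, '$1');
// 'the cat sat down'
```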

Unicode property escapes: \p{...} and \P{...} #

The biggest improvement in modern regex for international text.

// All Unicode letters, in any script
/\p{Letter}/u.test('Ω');  // true
/\p{L}/u.test('字');       // true
/\p{L}/u.test('A');        // true
/\p{L}/u.test('1');        // false

The u flag (or the newer v flag) is required for \p{...}. Without it, \p is just an escaped literal p, so /\p{L}/ matches the text p{L} rather than a letter.
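Forgetting the flag fails silently rather than throwing, which makes it easy to miss. A short sketch of the difference:

```javascript
// With u, \p{L} is a Unicode property escape.
const withFlag = /\p{L}/u.test('Ω');      // true

// Without u, the same source is read as the literal characters p{L}.
const withoutFlag = /\p{L}/.test('Ω');    // false
const literalText = /\p{L}/.test('p{L}'); // true
```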

Useful property classes:

Class                      Matches
\p{L} or \p{Letter}        Any letter, any script
\p{N} or \p{Number}        Any digit or other numeric character
\p{Lu}                     Uppercase letter
\p{Ll}                     Lowercase letter
\p{P}                      Punctuation
\p{S}                      Symbol (currency, math)
\p{Z}                      Separator (space etc.)
\p{M}                      Mark (combining accent)
\p{Diacritic}              Diacritical marks (combining and standalone, e.g. ^ and `)
\p{Emoji}                  Any character with the Emoji property (includes digits, # and *)
\p{Emoji_Presentation}     Emoji that default to graphic presentation
\p{Script=Greek}           Characters in the Greek script
\p{Script=Latin}           Characters in the Latin script

\P{...} (uppercase P) is the negation.
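A few of these classes applied to one illustrative string. The sample is written with escapes so the exact code points are unambiguous.

```javascript
// 'Ünïcode Ω 42!' written with explicit code points
const sample = '\u00DCn\u00EFcode \u03A9 42!';

const letterCount = sample.match(/\p{L}/gu).length;  // 8
const upper = sample.match(/\p{Lu}/gu);              // ['Ü', 'Ω']
const greek = sample.match(/\p{Script=Greek}/gu);    // ['Ω']
const digits = sample.match(/\p{N}/gu);              // ['4', '2']

// \P{L} is the negation: remove everything that is not a letter.
const lettersOnly = sample.replace(/\P{L}/gu, '');   // 'ÜnïcodeΩ'
```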

Practical patterns #

Strip diacritics:

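function stripAccents(s) {
  // NFD splits 'é' into 'e' plus a combining mark; \p{M} then removes the marks.
  // \p{Diacritic} would also remove standalone characters such as ^ and `,
  // and NFKD would additionally rewrite compatibility characters such as ligatures.
  return s.normalize('NFD').replace(/\p{M}/gu, '');
}
stripAccents('Café résumé');  // 'Cafe resume'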

Match emoji:

'Hello 🎉 world 👋'.match(/\p{Emoji}/gu);
// ['🎉', '👋']

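Capitalize every Unicode letter at the start of a word:

'hello world. café'.replace(/(?<![\p{L}\p{M}])\p{Ll}/gu, c => c.toUpperCase());
// 'Hello World. Café'

Note that \b is still based on ASCII \w even with the u flag: /\b\p{Ll}/gu sees a boundary between f and é and produces 'CafÉ'. A negative lookbehind on \p{L} avoids that.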

Before property escapes, you'd need a hand-maintained character class. Now: one pattern, every language.
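One caveat worth checking before relying on \p{Emoji}: the Unicode Emoji property also covers the ASCII digits, # and *, because they are bases for keycap sequences. The sketch below compares it with \p{Extended_Pictographic}, offered here as one possible stricter alternative rather than a universal answer.

```javascript
const msg = 'Room 42 \u{1F389}'; // 'Room 42 🎉'

// The digits carry the Emoji property, so they are matched too.
const emojiProp = msg.match(/\p{Emoji}/gu);
// ['4', '2', '🎉']

// Extended_Pictographic excludes the digits.
const pictographic = msg.match(/\p{Extended_Pictographic}/gu);
// ['🎉']
```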

The s flag — dotAll #

By default, . doesn't match newlines. With the s flag, it does:

'line1\nline2'.match(/line1.line2/);    // null — . doesn't match \n
'line1\nline2'.match(/line1.line2/s);   // matches

Useful for parsing multi-line strings where you don't want to write [\s\S] (the old hack to match anything including newlines).
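A typical case is pulling out a block that spans several lines. The sketch below extracts a block comment from an illustrative source string.

```javascript
const source = 'before\n/* first line\n   second line */\nafter';

// Without s, '.' stops at the first newline, so the closing */ is never reached.
const withoutS = source.match(/\/\*.*?\*\//); // null

// With s, the lazy '.*?' can cross line breaks.
const withS = source.match(/\/\*(.*?)\*\//s)[1];
// ' first line\n   second line '
```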

The d flag — hasIndices #

Added in ES2022. When set, match results include positional info for every capture group:

const m = 'hello world'.match(/(?<greet>hello) (?<name>world)/d);
console.log(m.indices);
// [[0, 11], [0, 5], [6, 11]]
//  full     greet    name
console.log(m.indices.groups);
// { greet: [0, 5], name: [6, 11] }

Useful for syntax highlighting, error reporting ("problem at characters 6-11"), and any tool that needs to know exactly where a match landed.
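As a sketch of the error-reporting use case, the hypothetical helper below (underline is not a built-in) draws carets under a named group using its indices.

```javascript
// Hypothetical helper: returns the input plus a caret line under the named group.
// The regex passed in must carry the d flag, otherwise m.indices is undefined.
function underline(input, re, groupName) {
  const m = re.exec(input);
  const span = m?.indices?.groups?.[groupName];
  if (!span) return null;
  const [start, end] = span;
  return `${input}\n${' '.repeat(start)}${'^'.repeat(end - start)}`;
}

const report = underline('port=80a0', /port=(?<value>\S+)/d, 'value');
// port=80a0
//      ^^^^
```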

The y flag — sticky #

Matches only at the regex's current lastIndex position. Used for stateful parsers:

const re = /\w+/y;
re.lastIndex = 6;
re.exec('hello world');  // ['world']: position 6 is the 'w' of 'world', so a match starts exactly there
re.lastIndex = 5;
re.exec('hello world');  // null: position 5 is the space, and sticky never skips ahead to find 'world'

Niche but indispensable for tokenizers. Most app code doesn't need it.
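To make the tokenizer use case concrete, here is a minimal sketch. The token set and the tokenize function are hypothetical, chosen only to show how lastIndex drives a sticky regex.

```javascript
// Hypothetical token rules for a tiny arithmetic language.
const TOKEN_TYPES = [
  ['number', /\d+/y],
  ['ident', /[A-Za-z_]\w*/y],
  ['op', /[+\-*\/=()]/y],
  ['space', /\s+/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of TOKEN_TYPES) {
      re.lastIndex = pos;          // try this rule exactly at pos
      const m = re.exec(input);
      if (m) {
        if (type !== 'space') tokens.push({ type, value: m[0], start: pos });
        pos = re.lastIndex;        // exec moved lastIndex past the match
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`Unexpected character at ${pos}`);
  }
  return tokens;
}

const tokens = tokenize('x = 42 + y1');
// ident 'x', op '=', number '42', op '+', ident 'y1'
```

Without the y flag each rule would search forward from pos and could silently skip over characters that no rule accepts.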

Putting it all together: a URL parser #

const URL_RE = /^(?<scheme>https?):\/\/(?<host>[^\/:?#]+)(?::(?<port>\d+))?(?<path>\/[^?#]*)?(?:\?(?<query>[^#]*))?(?:#(?<fragment>.*))?$/;

const m = 'https://api.example.com:8080/users/42?role=admin#section'.match(URL_RE);
console.log(m.groups);
// {
//   scheme: 'https',
//   host: 'api.example.com',
//   port: '8080',
//   path: '/users/42',
//   query: 'role=admin',
//   fragment: 'section'
// }

Named groups make the result usable. Non-capturing groups ((?:...)) keep the indices clean. Lookarounds aren't needed for this one — the anchors and structure carry it.

Production code would use new URL(...), but the regex demonstrates the power of named groups.
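For reference, a sketch of the same breakdown using the built-in URL class. The parts object simply mirrors the group names used in the regex version.

```javascript
const url = new URL('https://api.example.com:8080/users/42?role=admin#section');

const parts = {
  scheme: url.protocol.slice(0, -1), // 'https' (protocol includes the trailing ':')
  host: url.hostname,                // 'api.example.com'
  port: url.port,                    // '8080'
  path: url.pathname,                // '/users/42'
  query: url.search.slice(1),        // 'role=admin' (search includes the leading '?')
  fragment: url.hash.slice(1),       // 'section' (hash includes the leading '#')
};

const role = url.searchParams.get('role'); // 'admin'
```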

A summary #

  • Lookarounds check context without consuming. (?=...) (?!...) (?<=...) (?<!...).
  • Named groups (?<name>...) are usually preferable to positional indices once a pattern has more than a couple of captures.
  • Unicode property escapes \p{Letter}, \p{Emoji}, \p{Script=Greek} require the u flag.
  • Flags: s for dot-matches-newline, d for match indices, y for sticky.
  • Backreferences inside a pattern: \1 (positional) or \k<name> (named).

With these features, JavaScript regexes cover most of what PCRE-style engines (Perl, PHP) offer. Notable absences include atomic groups and possessive quantifiers. The engine still backtracks, so catastrophic backtracking remains possible; for untrusted patterns or input at scale, run the matching elsewhere, for example with a linear-time engine such as RE2 compiled to WebAssembly, or server-side with timeouts.
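A commonly used workaround for the missing atomic groups relies on the fact that a lookahead is never re-entered once it has succeeded: capture inside a lookahead, then consume the capture with a backreference. This is a sketch of the idiom, not a built-in feature, and it does not by itself make a pattern safe against every backtracking blow-up.

```javascript
// Ordinary group: after 'bc' fails to leave a final 'c',
// the engine backtracks and tries the 'b' alternative.
const ordinary = /a(?:bc|b)c/.test('abc');        // true

// Emulated atomic group: the lookahead commits to 'bc', \1 consumes it,
// and the engine cannot go back to try 'b' instead.
const atomicLike = /a(?=(bc|b))\1c/.test('abc');  // false
const atomicLike2 = /a(?=(bc|b))\1c/.test('abcc'); // true
```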

What's next #

Lesson 9.3 covers real-world regex patterns — URL parsing, email validation, find-and-replace recipes, validation patterns, and the libraries (and anti-patterns) you should know about.

Try it yourself #

The lookbehind pattern is the easiest new feature to feel. Predict the output:

const text = 'Price: $19.99, was $29.99';
console.log(text.match(/(?<=\$)\d+\.\d+/g));

const html = '<b>hello</b>';
console.log(html.match(/(?<=<b>).*(?=<\/b>)/));

Output:

['19.99', '29.99'], because the lookbehind requires a $ before each price without including it in the match.
['hello'], because the lookbehind checks for the opening <b> and the lookahead for the closing </b>, and neither is part of the match.

Lookarounds let you describe context without including it. That’s the whole magic — you get cleaner matches without post-processing.

Once you internalize lookarounds, you will often find yourself writing noticeably less string-processing code.
