JavaScript Regex Real-World Patterns: Validation, Parsing, and Find/Replace Recipes

Regex in theory is one thing; regex in production code is another. The patterns you'll actually write or read in a real codebase form a small list: validations, URL/email checks, find-and-replace transformations, log parsing, and a handful of string-manipulation idioms. Knowing the right pattern (and knowing which problems regex shouldn't solve) is worth more than memorizing the syntax.

This lesson is a recipe collection plus a guided tour of the anti-patterns, including when to reach for non-regex tools.

Validation patterns #

Email (pragmatic, not RFC-perfect) #

const EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
EMAIL.test('ada@example.com');     // true
EMAIL.test('ada@example');         // false
EMAIL.test('ada@@example.com');    // false

A truly RFC-5322-compliant email regex is hundreds of characters. Don't bother. Use this pragmatic version for client-side hint validation and send a verification email for actual correctness.

URL (basic) #

const URL_RE = /^https?:\/\/[^\s/$.?#].[^\s]*$/i;
URL_RE.test('https://code-js.in/posts');  // true

But for any non-trivial URL work, use the built-in URL class instead:

try {
  const u = new URL('https://api.example.com:8080/users?id=42');
  console.log(u.host);     // 'api.example.com:8080'
  console.log(u.pathname); // '/users'
  console.log(u.searchParams.get('id')); // '42'
} catch {
  // not a valid URL
}

The URL constructor handles every edge case the spec covers. Use it for parsing; only fall back to regex for quick "does this look URL-ish?" checks.
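One thing the constructor does not do is restrict the scheme, so a check on user-supplied links usually adds that step. A minimal sketch, assuming a hypothetical helper named parseHttpUrl (not part of the lesson) that returns a URL object or null:

```javascript
// Hypothetical helper: returns a URL object for http(s) input, otherwise null.
function parseHttpUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // the constructor throws a TypeError on unparseable input
  }
  // new URL() also accepts schemes such as mailto: or javascript:,
  // so the protocol has to be checked explicitly.
  return url.protocol === 'http:' || url.protocol === 'https:' ? url : null;
}

console.log(parseHttpUrl('https://api.example.com:8080/users?id=42')?.hostname); // 'api.example.com'
console.log(parseHttpUrl('javascript:alert(1)')); // null
console.log(parseHttpUrl('not a url'));           // null
```

Returning null instead of throwing keeps call sites to a single if check; whether that suits your codebase is a design choice.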

Phone number (locale-specific) #

// US: optional +1, then 10 digits, allowing spaces, dashes, parens
const US_PHONE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;
US_PHONE.test('+1 (212) 555-1234');  // true
US_PHONE.test('212-555-1234');        // true

For international phone validation, use the libphonenumber-js library. The rules vary wildly by country.
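For US-only input, validation is often followed by normalization so that every accepted format is stored the same way. A sketch, assuming a hypothetical normalizeUsPhone helper that returns the ten national digits or null:

```javascript
const US_PHONE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;

// Hypothetical helper: validate first, then strip everything but digits.
function normalizeUsPhone(input) {
  if (!US_PHONE.test(input)) return null;
  const digits = input.replace(/\D/g, ''); // drop '+', spaces, dashes, parens
  return digits.slice(-10);                // drop the leading country code, if any
}

console.log(normalizeUsPhone('+1 (212) 555-1234')); // '2125551234'
console.log(normalizeUsPhone('212-555-1234'));      // '2125551234'
console.log(normalizeUsPhone('555-1234'));          // null
```

Note that the pattern treats each parenthesis as independently optional, so '(212 555-1234' is accepted as well. That is usually tolerable for a hint-level check.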

UUID v4 #

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
UUID_V4.test('a1b2c3d4-e5f6-4789-abcd-1234567890ab');  // true

The 4 at the start of the third segment is the version number; [89ab] at the start of the fourth segment is the RFC 4122 variant marker.
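If other UUID versions should be accepted too, the version digit can be captured instead of hard-coded. A sketch, assuming a hypothetical uuidVersion helper; the [1-8] range reflects the versions defined in RFC 9562 and is an assumption to adjust as needed:

```javascript
// Same shape as UUID_V4, but the version digit is captured rather than fixed.
const UUID_ANY = /^[0-9a-f]{8}-[0-9a-f]{4}-([1-8])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns the version number, or null if the input is not a UUID.
function uuidVersion(input) {
  const m = UUID_ANY.exec(input);
  return m ? Number(m[1]) : null;
}

console.log(uuidVersion('a1b2c3d4-e5f6-4789-abcd-1234567890ab')); // 4
console.log(uuidVersion('6ba7b810-9dad-11d1-80b4-00c04fd430c8')); // 1
console.log(uuidVersion('not-a-uuid'));                           // null
```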

Hex color #

const HEX_COLOR = /^#([0-9a-f]{3}|[0-9a-f]{6}|[0-9a-f]{8})$/i;
HEX_COLOR.test('#fff');       // true
HEX_COLOR.test('#ffffff');    // true
HEX_COLOR.test('#ffffff80');  // true (8 digits: RGB plus alpha)

Strong password (one of each) #

const STRONG_PASSWORD = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

We walked through this lookaround pattern in Lesson 9.2.
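A single combined regex can only answer yes or no. When the UI needs to say which requirement is missing, the same checks can be split into one small regex per rule. A sketch, with the rule list and the missingPasswordRules name as illustrative assumptions:

```javascript
// One small regex per requirement, mirroring the lookaheads in STRONG_PASSWORD.
const PASSWORD_RULES = [
  { label: 'a lowercase letter', re: /[a-z]/ },
  { label: 'an uppercase letter', re: /[A-Z]/ },
  { label: 'a digit', re: /\d/ },
  { label: 'a symbol from !@#$%^&*', re: /[!@#$%^&*]/ },
  { label: 'at least 8 characters', re: /^.{8,}$/ },
];

// Hypothetical helper: returns the labels of every rule the password fails.
function missingPasswordRules(password) {
  return PASSWORD_RULES.filter(rule => !rule.re.test(password)).map(rule => rule.label);
}

console.log(missingPasswordRules('Abcdef1!')); // []
console.log(missingPasswordRules('abc'));
// [ 'an uppercase letter', 'a digit', 'a symbol from !@#$%^&*', 'at least 8 characters' ]
```

None of these regexes carries the g flag, so sharing them across calls is safe.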

Parsing patterns #

Date in ISO 8601 #

const ISO_DATE = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})(?:T(?<hour>\d{2}):(?<minute>\d{2})(?::(?<second>\d{2})(?:\.(?<ms>\d+))?)?(?<tz>Z|[+-]\d{2}:\d{2})?)?$/;

const { groups } = '2026-05-21T14:30:00.500Z'.match(ISO_DATE);
console.log(groups);
// { year: '2026', month: '05', day: '21', hour: '14', minute: '30', second: '00', ms: '500', tz: 'Z' }

Or just use new Date(string) and toISOString(). Lesson 8.3 covers the Date API.
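The regex only checks shape: '2026-13-45' has digits in all the right places and still matches. One way to reject impossible calendar days is to build a Date from the captured numbers and compare it back. A sketch for the date-only form, with ISO_DAY and parseIsoDay as hypothetical names:

```javascript
const ISO_DAY = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns a Date at UTC midnight, or null if the string
// is malformed or names a day that does not exist.
function parseIsoDay(input) {
  const m = ISO_DAY.exec(input);
  if (!m) return null;
  const year = Number(m.groups.year);
  const month = Number(m.groups.month);
  const day = Number(m.groups.day);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC silently rolls out-of-range values over (Feb 30 becomes Mar 2),
  // so compare the result with the numbers that were parsed.
  const sameDay =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return sameDay ? date : null;
}

console.log(parseIsoDay('2026-05-21')?.toISOString()); // '2026-05-21T00:00:00.000Z'
console.log(parseIsoDay('2026-02-30')); // null
console.log(parseIsoDay('2026-13-01')); // null
```

As a side effect of how Date.UTC maps two-digit years to 19xx, this sketch also rejects years below 0100.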

CSV row (simple — no quoted commas) #

const row = 'Ada,36,London';
const [name, age, city] = row.split(',');

For CSV with quoted fields and embedded commas, don't use regex. Use a real CSV parser (papaparse, csv-parse). Regex for CSV is a famous trap — the moment your data has "Smith, John" in a single cell, naive regex breaks.
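The failure is easy to reproduce with the same split call on a row that has one quoted cell:

```javascript
// Three fields, but the first one contains a comma inside quotes.
const line = '"Smith, John",36,London';
const cells = line.split(',');

console.log(cells.length); // 4, although the row has 3 fields
console.log(cells);        // [ '"Smith', ' John"', '36', 'London' ]
```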

Query string #

// Use URLSearchParams instead of a regex (strongly preferred)
const params = new URLSearchParams('?id=42&name=Ada%20Lovelace');
params.get('id');     // '42'
params.get('name');   // 'Ada Lovelace' — handles URL decoding

Never roll your own query-string regex. Always URLSearchParams.
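Repeated keys and plus-encoded spaces are two cases a hand-rolled pattern tends to get wrong. A short sketch of how URLSearchParams treats them (the query string is made up for illustration):

```javascript
const params = new URLSearchParams('tag=js&tag=regex&q=named+groups');

console.log(params.getAll('tag')); // [ 'js', 'regex' ]
console.log(params.get('q'));      // 'named groups' ('+' decodes to a space)
console.log(params.has('page'));   // false

// Converting to a plain object keeps only the last value of a repeated key.
const flat = Object.fromEntries(params);
console.log(flat); // { tag: 'regex', q: 'named groups' }
```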

Markdown link #

const MD_LINK = /\[([^\]]+)\]\(([^)]+)\)/g;
for (const m of 'See [docs](https://example.com) and [tutorials](/tutorials/)'.matchAll(MD_LINK)) {
  console.log(m[1], '→', m[2]);
}
// docs → https://example.com
// tutorials → /tutorials/

The [^\]]+ and [^)]+ are negated classes: "any character except ]" and "any character except )". This is pragmatic Markdown parsing (a URL that itself contains parentheses gets cut short, for example); for production, use a real Markdown library (marked, markdown-it).

Find-and-replace recipes #

Trim multiple spaces #

' hello   world '.replace(/\s+/g, ' ').trim();
// 'hello world'

Capitalize each word #

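// \b is defined by ASCII \w even with the u flag, so use a "not preceded by a letter" lookbehind
'hello world'.replace(/(?<!\p{L})\p{L}/gu, c => c.toUpperCase());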
// 'Hello World'

Remove non-numeric #

'$1,234.56'.replace(/[^\d.-]/g, '');
// '1234.56'

Convert camelCase to kebab-case #

'fooBarBaz'.replace(/([a-z])([A-Z])/g, '$1-$2').toLowerCase();
// 'foo-bar-baz'
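The one-liner treats a run of capitals as a single lump: 'parseHTTPResponse' comes out as 'parse-httpresponse'. A two-pass variant handles acronyms; toKebabCase is a hypothetical name, and the choice to keep an acronym together as one word is an assumption:

```javascript
function toKebabCase(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2')     // fooBar -> foo-Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1-$2')  // HTTPResponse -> HTTP-Response
    .toLowerCase();
}

console.log(toKebabCase('fooBarBaz'));         // 'foo-bar-baz'
console.log(toKebabCase('parseHTTPResponse')); // 'parse-http-response'
console.log(toKebabCase('userID'));            // 'user-id'
```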

Convert snake_case to camelCase #

'foo_bar_baz'.replace(/_([a-z])/g, (_, c) => c.toUpperCase());
// 'fooBarBaz'

Truncate to N words #

function truncateWords(s, n) {
  // trim first so leading whitespace does not produce an empty first word
  return s.trim().split(/\s+/).slice(0, n).join(' ');
}

Mask all but last 4 digits #

'1234567890123456'.replace(/\d(?=\d{4})/g, '*');
// '************3456'

The lookahead (?=\d{4}) says: "only replace digits that have at least 4 more digits after them." Leaves the last four alone.
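Card numbers often arrive grouped with spaces or dashes, and on '4111 1111 1111 1111' the pattern above masks nothing, because no digit is followed by four consecutive digits. A sketch that lets the lookahead skip non-digits (maskAllButLast4 is a hypothetical name):

```javascript
function maskAllButLast4(input) {
  // (?:\D*\d){4} means "four more digits somewhere ahead, separators allowed".
  return input.replace(/\d(?=(?:\D*\d){4})/g, '*');
}

console.log(maskAllButLast4('4111 1111 1111 1111')); // '**** **** **** 1111'
console.log(maskAllButLast4('4111-1111-1111-1111')); // '****-****-****-1111'
console.log(maskAllButLast4('1234567890123456'));    // '************3456'
```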

Strip HTML tags #

'<p>Hello <b>world</b></p>'.replace(/<[^>]+>/g, '');
// 'Hello world'

For real HTML sanitization, use DOMPurify. This regex is for trusted content where you just want plain text. Untrusted HTML through this regex is a classic XSS path.

Log parsing #

const LOG_LINE = /^(?<time>\S+)\s+(?<level>\w+)\s+(?<msg>.+)$/;

const lines = [
  '2026-05-21T14:30:00Z INFO  User logged in',
  '2026-05-21T14:30:05Z ERROR Connection refused',
];

for (const line of lines) {
  const m = line.match(LOG_LINE);
  if (m) console.log(m.groups);
}
// { time: '2026-05-21T14:30:00Z', level: 'INFO',  msg: 'User logged in' }
// { time: '2026-05-21T14:30:05Z', level: 'ERROR', msg: 'Connection refused' }

With named groups, log-parsing regexes become self-documenting. Combined with \s+ for whitespace and the ^ and $ anchors, this approach scales to most structured-log formats.
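Once each line is an object, aggregation is ordinary JavaScript. A sketch that counts lines per level, assuming a hypothetical countByLevel helper and an 'UNPARSED' bucket for lines that do not match:

```javascript
const LOG_LINE = /^(?<time>\S+)\s+(?<level>\w+)\s+(?<msg>.+)$/;

// Hypothetical helper: tallies log lines by level.
function countByLevel(lines) {
  const counts = {};
  for (const line of lines) {
    // exec returns null on no match; ?. then short-circuits to undefined.
    const level = LOG_LINE.exec(line)?.groups.level ?? 'UNPARSED';
    counts[level] = (counts[level] ?? 0) + 1;
  }
  return counts;
}

console.log(countByLevel([
  '2026-05-21T14:30:00Z INFO  User logged in',
  '2026-05-21T14:30:05Z ERROR Connection refused',
  '2026-05-21T14:30:09Z INFO  User logged out',
  '',
]));
// { INFO: 2, ERROR: 1, UNPARSED: 1 }
```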

Splitting on multiple delimiters #

'apple,banana;cherry|date'.split(/[,;|]/);
// ['apple', 'banana', 'cherry', 'date']

Extracting all matches #

The pattern of the decade — matchAll:

const text = '@ada said hi to @grace and @hopper';
const mentions = [...text.matchAll(/@(\w+)/g)].map(m => m[1]);
// ['ada', 'grace', 'hopper']

matchAll returns an iterator of match objects, each with its capture groups, and requires the g flag. It replaces the older while ((m = re.exec(str))) {...} dance.

Anti-patterns: when NOT to regex #

Regex is the wrong tool for:

HTML / XML parsing #

// Famous wrong answer to 'how do I parse HTML with regex'

Use DOMParser (browser) or parse5/cheerio (Node). HTML is not a regular language — the grammar nests arbitrarily, has whitespace rules regex can't express, and CDATA/script blocks need special handling.

CSV parsing #

Use papaparse or csv-parse. Quoted commas and embedded newlines break naive regex.

JSON parsing #

Use JSON.parse. Don't try to extract values from JSON with regex.

Source code analysis #

Regex finds function foo. It cannot tell whether foo is in a comment, a string literal, or a different scope. Use a real parser (@babel/parser, @typescript-eslint/parser, acorn).

Free-form natural-language matching #

For sentiment, entity extraction, or anything that needs to understand language, regex is the wrong layer. Use NLP libraries or an LLM call.

Performance tips #

Hoist regexes out of loops #

const RE = /\d+/;  // no g flag: test() on a /g regex carries lastIndex between calls
for (const item of items) {
  if (RE.test(item.name)) { /* ... */ }
}

Each evaluation of a /foo/ literal creates a new RegExp object. Engines often cache the compiled pattern, so the cost is mainly allocation, but inside a hot loop that still adds up.

Watch for `lastIndex` with `/g` #

From Lesson 9.1: regex objects with g keep state. For repeated test() calls on the same regex, reset lastIndex = 0 or use matchAll/match instead.
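The state carried by a g-flag regex is easy to see in a few lines. The sketch below runs the same test() over three strings that all contain digits:

```javascript
const stateful = /\d+/g;
const inputs = ['abc123', '123', '123'];

// First call matches and leaves lastIndex at 6; the second call starts
// searching at index 6 of a 3-character string, fails, and resets lastIndex.
console.log(inputs.map(s => stateful.test(s))); // [ true, false, true ]

// Fix 1: drop the g flag when all you need is a yes/no answer.
const stateless = /\d+/;
console.log(inputs.map(s => stateless.test(s))); // [ true, true, true ]

// Fix 2: keep the flag but reset the cursor before each call.
const viaReset = inputs.map(s => {
  stateful.lastIndex = 0;
  return stateful.test(s);
});
console.log(viaReset); // [ true, true, true ]
```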

Avoid catastrophic backtracking #

Nested quantifiers like (a+)+, (a*)*, (a|a)+ can take exponential time on adversarial inputs. If you're validating user-controlled strings on a server:

Use re2 (a non-backtracking engine, available via WebAssembly or as a Node binding)
Cap input length
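Use a timeout: run the match in a worker thread and terminate the worker if it overruns (a setTimeout on the same thread cannot interrupt a synchronous match)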
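Capping input length is the cheapest of these to apply. A minimal sketch, assuming a hypothetical boundedTest wrapper and an arbitrary limit of 1000 characters:

```javascript
const MAX_INPUT_LENGTH = 1000; // assumed limit; pick one that fits the field

// Hypothetical wrapper: rejects non-strings and oversized input before matching.
// Pass a regex without the g flag so no lastIndex state leaks between calls.
function boundedTest(re, input, maxLength = MAX_INPUT_LENGTH) {
  if (typeof input !== 'string' || input.length > maxLength) return false;
  return re.test(input);
}

// /^(a+)+$/ and /^a+$/ accept exactly the same strings; only the second
// avoids the nested quantifier.
const LINEAR = /^a+$/;

console.log(boundedTest(LINEAR, 'aaa'));            // true
console.log(boundedTest(LINEAR, 'aaab'));           // false
console.log(boundedTest(LINEAR, 'a'.repeat(5000))); // false (over the cap)
```

A cap only bounds the damage when the pattern itself is well behaved. An exponential pattern can stall on a few dozen characters, so rewriting the nested quantifier matters more than the limit.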

When to use a library #

Task	Use
Phone numbers	`libphonenumber-js`
Markdown	`marked`, `markdown-it`
HTML	`DOMParser`, `parse5`, `cheerio`
CSV	`papaparse`, `csv-parse`
URL parsing	`URL` (built in)
Query strings	`URLSearchParams` (built in)
Dates	`Date`, `Intl`, `date-fns`, or `Temporal`
Email validation	Send a verification email
Sanitizing HTML	`DOMPurify`

If the task is on this list, default to the library. Regex is your tool of choice when there isn't a purpose-built one.

A summary #

Common validations — email, URL, password, UUID, hex color — small, well-known regexes worth keeping handy.
Common transformations — case conversion, masking, trim multiple spaces — short patterns that come up daily.
Use matchAll for any "find all matches with groups" task.
Hoist regexes out of hot loops.
Avoid nested quantifiers to prevent catastrophic backtracking.
Skip regex for HTML, CSV, JSON, source code, NLP. Use real parsers.
Built-in alternatives (URL, URLSearchParams, Date) beat hand-rolled regexes for everything they cover.

What's next #

That completes Module 9 — Regular Expressions. Module 10 dives into advanced JavaScript features: Proxy, Reflect, async iteration, streams, and Web Workers.

Try it yourself #

The matchAll + named group pattern is the most useful real-world regex idiom. Predict the output:

const text = 'Order #123 by Ada, #456 by Grace';
const orders = [...text.matchAll(/#(?<id>\d+) by (?<name>\w+)/g)]
  .map(m => m.groups);
console.log(orders);

Output: [{ id: '123', name: 'Ada' }, { id: '456', name: 'Grace' }].

One pattern, two matches, two clean little objects. matchAll + groups + map is the modern “parse this structured text” toolkit. Once you internalize this, half the string-processing in your codebase becomes one-liners.

A clean regex with named groups and matchAll is the difference between code you can come back to in 6 months and code you have to rewrite.