JavaScript Regex Real-World Patterns: Validation, Parsing, and Find/Replace Recipes

JS Tutorial Module 9: Regex Lesson 9.3

Regex in theory is one thing; regex in production code is another. The patterns you'll actually write or read in a real codebase form a small list: validations, URL/email checks, find-and-replace transformations, log parsing, and a handful of string-manipulation idioms. Knowing the right pattern (and knowing which problems regex shouldn't solve) is worth more than memorizing the syntax.

This lesson is a recipe collection plus a guided tour of the anti-patterns and when to reach for non-regex tools.

Validation patterns #

Email (pragmatic, not RFC-perfect) #

const EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
EMAIL.test('ada@example.com');     // true
EMAIL.test('ada@example');         // false
EMAIL.test('ada@@example.com');    // false

A truly RFC-5322-compliant email regex is hundreds of characters. Don't bother. Use this pragmatic version for client-side hint validation and send a verification email for actual correctness.

URL (basic) #

const URL_RE = /^https?:\/\/[^\s/$.?#].[^\s]*$/i;
URL_RE.test('https://code-js.in/posts');  // true

But for any non-trivial URL work, use the built-in URL class instead:

try {
  const u = new URL('https://api.example.com:8080/users?id=42');
  console.log(u.host);     // 'api.example.com:8080'
  console.log(u.pathname); // '/users'
  console.log(u.searchParams.get('id')); // '42'
} catch {
  // not a valid URL
}

The URL constructor handles every edge case the spec covers. Use it for parsing; only fall back to regex for quick "does this look URL-ish?" checks.
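If you only need a yes/no answer, you can lean on the same constructor. The sketch below uses a hypothetical helper, isHttpUrl, that also restricts the scheme, because new URL() happily accepts schemes such as javascript: or mailto:.

```javascript
// Hypothetical helper: true only for absolute http(s) URLs that the URL parser accepts.
function isHttpUrl(value) {
  try {
    const u = new URL(value);
    // new URL() accepts any scheme, so check the protocol explicitly.
    return u.protocol === 'http:' || u.protocol === 'https:';
  } catch {
    // The constructor throws a TypeError for strings it cannot parse.
    return false;
  }
}

isHttpUrl('https://code-js.in/posts'); // true
isHttpUrl('javascript:alert(1)');      // false
isHttpUrl('not a url');                // false
```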

Phone number (locale-specific) #

// US: optional +1 country code, then 10 digits; spaces, dashes, and parentheses allowed
const US_PHONE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;
US_PHONE.test('+1 (212) 555-1234');  // true
US_PHONE.test('212-555-1234');        // true

For international phone validation, use the libphonenumber-js library. The rules vary wildly by country.
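One quirk worth knowing: because \(? and \)? are independent, US_PHONE also accepts unbalanced input such as (212 555-1234. The sketch below is one possible stricter variant, not a standard. It uses alternation so the parentheses come as a pair, then keeps only the digits for storage. US_PHONE_STRICT and normalizeUsPhone are hypothetical names.

```javascript
// Original pattern from above: the optional parens are independent of each other.
const US_PHONE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;

// Hypothetical stricter version: the area code is "(212)" or "212", never half a pair.
const US_PHONE_STRICT = /^(\+1[\s-]?)?(\(\d{3}\)|\d{3})[\s-]?\d{3}[\s-]?\d{4}$/;

// Hypothetical normalizer: validate, keep digits only, drop the leading country code "1".
function normalizeUsPhone(input) {
  if (!US_PHONE_STRICT.test(input)) return null;
  const digits = input.replace(/\D/g, '');
  return digits.length === 11 ? digits.slice(1) : digits;
}

US_PHONE.test('(212 555-1234');        // true  (unbalanced, but accepted)
US_PHONE_STRICT.test('(212 555-1234'); // false
normalizeUsPhone('+1 (212) 555-1234'); // '2125551234'
```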

UUID v4 #

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
UUID_V4.test('a1b2c3d4-e5f6-4789-abcd-1234567890ab');  // true

The 4 at the start of the third segment is the version digit, and [89ab] at the start of the fourth segment encodes the RFC 4122 variant.
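To see the version and variant digits at work, here is a sketch that captures the version digit instead of requiring 4. The UUID_ANY pattern and the uuidVersion helper are illustrative assumptions (versions 1 to 8 are the ones RFC 9562 defines); the sample version-1 value is the DNS namespace UUID from RFC 4122.

```javascript
// The v4 pattern from above.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical generalization: capture the version digit (1-8), keep the [89ab] variant check.
const UUID_ANY = /^[0-9a-f]{8}-[0-9a-f]{4}-([1-8])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function uuidVersion(s) {
  const m = s.match(UUID_ANY);
  return m ? Number(m[1]) : null;
}

const dnsNamespace = '6ba7b810-9dad-11d1-80b4-00c04fd430c8'; // a version-1 UUID
UUID_V4.test(dnsNamespace);                          // false: third segment starts with 1
uuidVersion(dnsNamespace);                           // 1
uuidVersion('a1b2c3d4-e5f6-4789-abcd-1234567890ab'); // 4
```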

Hex color #

const HEX_COLOR = /^#([0-9a-f]{3}|[0-9a-f]{6}|[0-9a-f]{8})$/i;
HEX_COLOR.test('#fff');        // true
HEX_COLOR.test('#ffffff');     // true
HEX_COLOR.test('#ffffff80');   // true (8 digits: with alpha)
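CSS also accepts a 4-digit #rgba shorthand, which HEX_COLOR rejects. If you need it, here is a sketch of a widened pattern (HEX_COLOR_CSS is a hypothetical name):

```javascript
// Hypothetical widened pattern: 3, 4, 6, or 8 hex digits (#rgb, #rgba, #rrggbb, #rrggbbaa).
const HEX_COLOR_CSS = /^#(?:[0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$/i;

HEX_COLOR_CSS.test('#fff8');     // true  (4-digit shorthand with alpha)
HEX_COLOR_CSS.test('#ffffff80'); // true
HEX_COLOR_CSS.test('#fffff');    // false (5 digits is not a CSS form)
```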

Strong password (one of each) #

const STRONG_PASSWORD = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

We walked through this lookaround pattern in Lesson 9.2.
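A single regex can only say pass or fail. For user-facing forms, a sketch that checks each rule separately lets you tell the user exactly what is missing. The RULES list and the missingRules helper are hypothetical; they encode the same requirements as STRONG_PASSWORD.

```javascript
const STRONG_PASSWORD = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

// Hypothetical per-rule breakdown of the same requirements (no g flags, so test() is stateless).
const RULES = [
  [/[a-z]/, 'a lowercase letter'],
  [/[A-Z]/, 'an uppercase letter'],
  [/\d/, 'a digit'],
  [/[!@#$%^&*]/, 'one of !@#$%^&*'],
  [/^.{8,}$/, 'at least 8 characters'],
];

function missingRules(password) {
  return RULES.filter(([re]) => !re.test(password)).map(([, label]) => label);
}

STRONG_PASSWORD.test('Passw0rd!'); // true
missingRules('Passw0rd!');         // []
missingRules('password');          // ['an uppercase letter', 'a digit', 'one of !@#$%^&*']
```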

Parsing patterns #

Date in ISO 8601 #

const ISO_DATE = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})(?:T(?<hour>\d{2}):(?<minute>\d{2})(?::(?<second>\d{2})(?:\.(?<ms>\d+))?)?(?<tz>Z|[+-]\d{2}:\d{2})?)?$/;

const { groups } = '2026-05-21T14:30:00.500Z'.match(ISO_DATE);
console.log(groups);
// { year: '2026', month: '05', day: '21', hour: '14', minute: '30', second: '00', ms: '500', tz: 'Z' }

Or just use new Date(string) and toISOString(). Lesson 8.3 covers the Date API.
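Note that ISO_DATE checks shape, not calendar validity: it happily matches 2026-02-31. A minimal sketch, limited to plain YYYY-MM-DD strings, lets Date do the calendar math and checks that nothing rolled over. isRealDate is a hypothetical helper.

```javascript
// Hypothetical helper: true only for a real calendar date in YYYY-MM-DD form.
function isRealDate(s) {
  const m = s.match(/^(\d{4})-(\d{2})-(\d{2})$/);
  if (!m) return false;
  const [y, mo, d] = m.slice(1).map(Number);
  // Date.UTC silently rolls Feb 31 over into March, so compare the parts back.
  // (Date.UTC also maps years 0-99 to 1900-1999, so such years are rejected here.)
  const dt = new Date(Date.UTC(y, mo - 1, d));
  return dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d;
}

isRealDate('2026-05-21'); // true
isRealDate('2024-02-29'); // true  (leap year)
isRealDate('2026-02-31'); // false (rolls over to March 3)
```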

CSV row (simple — no quoted commas) #

const row = 'Ada,36,London';
const [name, age, city] = row.split(',');

For CSV with quoted fields and embedded commas, don't use regex. Use a real CSV parser (papaparse, csv-parse). Regex for CSV is a famous trap — the moment your data has "Smith, John" in a single cell, naive regex breaks.
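The failure is easy to reproduce with a single quoted cell:

```javascript
const row = 'Ada,"Smith, John",London';

// A naive split breaks the quoted cell in two.
const cells = row.split(',');
console.log(cells);        // [ 'Ada', '"Smith', ' John"', 'London' ]
console.log(cells.length); // 4, not 3
```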

Query string #

// Use URLSearchParams: strongly preferred over any hand-rolled regex
const params = new URLSearchParams('?id=42&name=Ada%20Lovelace');
params.get('id');     // '42'
params.get('name');   // 'Ada Lovelace' — handles URL decoding

Never roll your own query-string regex. Always URLSearchParams.
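URLSearchParams also covers the other direction. A short sketch: convert parsed parameters to a plain object, and build a query string with the encoding handled for you.

```javascript
const params = new URLSearchParams('?id=42&name=Ada%20Lovelace');

// Plain object (for repeated keys, the last value wins).
const obj = Object.fromEntries(params);
// { id: '42', name: 'Ada Lovelace' }

// Building a query string: reserved characters are percent-encoded, spaces become '+'.
const qs = new URLSearchParams({ q: 'a&b c', page: '2' }).toString();
// 'q=a%26b+c&page=2'
```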

Markdown links (simple) #

const MD_LINK = /\[([^\]]+)\]\(([^)]+)\)/g;
for (const m of 'See [docs](https://example.com) and [tutorials](/tutorials/)'.matchAll(MD_LINK)) {
  console.log(m[1], '→', m[2]);
}
// docs → https://example.com
// tutorials → /tutorials/

Both are negated classes: [^\]]+ means "one or more characters other than ]", and [^)]+ means "one or more characters other than )". This is pragmatic Markdown parsing; for production, use a real Markdown library (marked, markdown-it).
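One concrete limit of the [^)]+ class: URLs that contain parentheses, which are common on Wikipedia, get cut off at the first ). A quick demonstration with the same MD_LINK pattern:

```javascript
const MD_LINK = /\[([^\]]+)\]\(([^)]+)\)/g;
const md = 'See [regex](https://en.wikipedia.org/wiki/Regex_(disambiguation)).';

// Take the first match from the iterator.
const [m] = md.matchAll(MD_LINK);
m[1]; // 'regex'
m[2]; // 'https://en.wikipedia.org/wiki/Regex_(disambiguation'  (final ')' lost)
```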

Find-and-replace recipes #

Collapse runs of whitespace #

' hello   world '.replace(/\s+/g, ' ').trim();
// 'hello world'

Capitalize each word #

'hello world'.replace(/\b\p{L}/gu, c => c.toUpperCase());
// 'Hello World'
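A caveat: with the flags used here, \b only recognizes ASCII word characters ([A-Za-z0-9_]), even though \p{L} is Unicode-aware. Words that start with an accented letter get the wrong letter capitalized, and an apostrophe counts as a boundary. The sketch below swaps in a Unicode-aware lookbehind; which preceding characters to exclude (here letters, digits, and the ASCII apostrophe) is a judgment call.

```javascript
// Original approach: \b only knows ASCII word characters.
const naive = s => s.replace(/\b\p{L}/gu, c => c.toUpperCase());

// Sketch: uppercase a letter only when it is not preceded by a letter, digit, or apostrophe.
const titleCase = s => s.replace(/(?<![\p{L}\p{N}'])\p{L}/gu, c => c.toUpperCase());

naive("don't stop");     // "Don'T Stop"
naive('élan vital');     // 'éLan Vital'
titleCase("don't stop"); // "Don't Stop"
titleCase('élan vital'); // 'Élan Vital'
```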

Remove non-numeric #

'$1,234.56'.replace(/[^\d.-]/g, '');
// '1234.56'

Convert camelCase to kebab-case #

'fooBarBaz'.replace(/([a-z])([A-Z])/g, '$1-$2').toLowerCase();
// 'foo-bar-baz'
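The single-step pattern flattens acronyms: 'parseHTMLString' becomes 'parse-htmlstring'. A two-pass sketch handles that case; splitting an acronym off the following capitalized word is an assumption about what you want, not a standard rule.

```javascript
function toKebab(s) {
  return s
    // Split an acronym from a following capitalized word: "HTMLString" -> "HTML-String"
    .replace(/([A-Z])([A-Z][a-z])/g, '$1-$2')
    // Split a lowercase letter or digit from a following capital: "parseHTML" -> "parse-HTML"
    .replace(/([a-z\d])([A-Z])/g, '$1-$2')
    .toLowerCase();
}

toKebab('fooBarBaz');       // 'foo-bar-baz'
toKebab('parseHTMLString'); // 'parse-html-string'
```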

Convert snake_case to camelCase #

'foo_bar_baz'.replace(/_([a-z])/g, (_, c) => c.toUpperCase());
// 'fooBarBaz'

Truncate to N words #

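function truncateWords(s, n) {
  // trim first so leading whitespace doesn't produce an empty first "word"
  return s.trim().split(/\s+/).slice(0, n).join(' ');
}
truncateWords('  The quick brown fox jumps', 3);  // 'The quick brown'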

Mask all but last 4 digits #

'1234567890123456'.replace(/\d(?=\d{4})/g, '*');
// '************3456'

The lookahead (?=\d{4}) says: "only replace a digit that is immediately followed by four more digits." That leaves the last four alone.
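Because the lookahead counts only contiguous digits, a formatted number such as 1234-5678-9012-3456 comes back unmasked: no digit in it is followed by four more digits in a row. Letting non-digits sit between the counted digits fixes that. A sketch:

```javascript
const card = '1234-5678-9012-3456';

// Contiguous lookahead: no digit is followed by 4 digits in a row, so nothing is masked.
const contiguous = card.replace(/\d(?=\d{4})/g, '*');   // '1234-5678-9012-3456'

// Count 4 more digits anywhere ahead, skipping separators.
const spanning = card.replace(/\d(?=(?:\D*\d){4})/g, '*'); // '****-****-****-3456'
```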

Strip HTML tags #

'<p>Hello <b>world</b></p>'.replace(/<[^>]+>/g, '');
// 'Hello world'

For real HTML sanitization, use DOMPurify. This regex is for trusted content where you just want plain text. Untrusted HTML through this regex is a classic XSS path.

Log parsing #

const LOG_LINE = /^(?<time>\S+)\s+(?<level>\w+)\s+(?<msg>.+)$/;

const lines = [
  '2026-05-21T14:30:00Z INFO  User logged in',
  '2026-05-21T14:30:05Z ERROR Connection refused',
];

for (const line of lines) {
  const m = line.match(LOG_LINE);
  if (m) console.log(m.groups);
}
// { time: '2026-05-21T14:30:00Z', level: 'INFO',  msg: 'User logged in' }
// { time: '2026-05-21T14:30:05Z', level: 'ERROR', msg: 'Connection refused' }

Named groups make log-parsing regexes self-documenting. Combined with \s+ for flexible whitespace and the ^ and $ anchors, this approach covers most line-oriented, space-delimited log formats.

Splitting on multiple delimiters #

'apple,banana;cherry|date'.split(/[,;|]/);
// ['apple', 'banana', 'cherry', 'date']
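Two variations come up often: absorbing whitespace around the delimiters, and keeping the delimiters themselves. When the regex passed to split() contains a capturing group, the captured text is included in the result array.

```javascript
// Tolerate spaces around any delimiter.
const fields = 'apple, banana ;cherry | date'.split(/\s*[,;|]\s*/);
// ['apple', 'banana', 'cherry', 'date']

// A capturing group keeps the delimiters in the output.
const withDelims = 'a,b;c'.split(/([,;])/);
// ['a', ',', 'b', ';', 'c']
```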

Extracting all matches #

The pattern of the decade — matchAll:

const text = '@ada said hi to @grace and @hopper';
const mentions = [...text.matchAll(/@(\w+)/g)].map(m => m[1]);
// ['ada', 'grace', 'hopper']

Returns an iterator of match objects, each with capture groups. Replaces the older while ((m = re.exec(str))) {...} dance.
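For comparison, here is the older exec() loop next to matchAll. The sketch also shows one constraint: matchAll requires the g flag and throws a TypeError when given a non-global regex.

```javascript
const text = '@ada said hi to @grace and @hopper';
const re = /@(\w+)/g;

// Older style: exec() advances re.lastIndex on each call and returns null at the end.
const viaExec = [];
let m;
while ((m = re.exec(text)) !== null) {
  viaExec.push(m[1]);
}

// Modern style: matchAll returns an iterator of match objects.
const viaMatchAll = [...text.matchAll(re)].map(match => match[1]);
// both: ['ada', 'grace', 'hopper']

// matchAll insists on the g flag.
let threw = false;
try {
  text.matchAll(/@(\w+)/);
} catch (err) {
  threw = err instanceof TypeError; // true
}
```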

Anti-patterns: when NOT to regex #

Regex is the wrong tool for:

HTML / XML parsing #

Writing a regex to pull elements out of HTML is the famous wrong answer to "how do I parse HTML?"

Use DOMParser (browser) or parse5/cheerio (Node). HTML is not a regular language: elements nest arbitrarily, the parsing spec has extensive error-recovery rules, and comments, CDATA sections, and script/style blocks need special handling.
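A quick illustration of the nesting problem: lazy matching breaks on nested elements, and greedy matching breaks on sibling elements.

```javascript
const lazy = /<div>(.*?)<\/div>/;
const greedy = /<div>(.*)<\/div>/;

// Nested elements: the lazy pattern stops at the inner closing tag.
const nested = '<div>a<div>b</div>c</div>'.match(lazy)[1];    // 'a<div>b'

// Sibling elements: the greedy pattern runs on to the last closing tag.
const siblings = '<div>a</div><div>b</div>'.match(greedy)[1]; // 'a</div><div>b'
```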

CSV parsing #

Use papaparse or csv-parse. Quoted commas and embedded newlines break naive regex.

JSON parsing #

Use JSON.parse. Don't try to extract values from JSON with regex.
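For example, a single escaped quote inside a string value is enough to break the usual "[^"]*" extraction:

```javascript
// The raw text is: {"name":"Ada \"Countess\" Lovelace","age":36}
const json = '{"name":"Ada \\"Countess\\" Lovelace","age":36}';

// The regex stops at the first quote, escaped or not.
const viaRegex = json.match(/"name":"([^"]*)"/)[1]; // 'Ada \\' (stray backslash, value cut short)

// JSON.parse understands the escapes.
const viaParse = JSON.parse(json).name;              // 'Ada "Countess" Lovelace'
```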

Source code analysis #

A regex can find the text function foo, but it cannot tell whether foo is in a comment, a string literal, or a different scope. Use a real parser (@babel/parser, @typescript-eslint/parser, acorn).
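To see the problem, run a typical "find function names" regex over a few lines where the name also appears in a comment and in a string literal:

```javascript
const src = [
  '// function foo() {} was removed',
  'const label = "function foo";',
  'function bar() {}',
].join('\n');

// Three hits, but only one is a real function declaration.
const hits = src.match(/function\s+(\w+)/g); // ['function foo', 'function foo', 'function bar']
```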

Free-form natural-language matching #

For sentiment, entity extraction, or anything that needs to understand language, regex is the wrong layer. Use NLP libraries or an LLM call.

Performance tips #

Hoist regexes out of loops #

const RE = /\d+/;  // no g flag: test() on a global regex carries lastIndex between calls
for (const item of items) {
  if (RE.test(item.name)) { /* ... */ }
}

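A regex literal inside the loop body creates a new RegExp object every time it is evaluated. Engines typically cache the compiled pattern, but in a hot loop the repeated allocation still adds up, so hoisting is a cheap win.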

Watch for lastIndex with /g #

From Lesson 9.1: regex objects with the g (or y) flag keep state in lastIndex. For repeated test() calls on the same regex, drop the g flag, reset lastIndex = 0 before each call, or use match/matchAll instead.
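The bug is easy to trigger, and both fixes are one-liners:

```javascript
const re = /a/g;
const results = [re.test('a'), re.test('a'), re.test('a')];
// [true, false, true]: the second call starts searching at lastIndex 1, fails, and resets to 0

// Fix 1: no g flag when you only need a yes/no answer.
const plain = /a/;
const plainResults = [plain.test('a'), plain.test('a')]; // [true, true]

// Fix 2: reset lastIndex before reusing the global regex.
re.lastIndex = 0;
const afterReset = re.test('a'); // true
```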

Avoid catastrophic backtracking #

Nested quantifiers like (a+)+, (a*)*, (a|a)+ can explode on malicious or adversarial inputs. If you're validating user-controlled strings on a server:

  • Use re2 (a non-backtracking engine, available via WebAssembly or as a Node binding)
  • Cap input length
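  • Enforce a timeout by running the match in a worker thread and terminating it if it takes too long (a setTimeout on the same thread cannot interrupt a regex that is already running)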
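Often the nested quantifier can simply be flattened: (a+)+ and a+ match exactly the same strings, but only the nested form has exponentially many ways to split a failing input. A minimal sketch of a guarded check follows; the helper name isAllAs and the length limit are assumptions.

```javascript
const NESTED = /^(a+)+$/; // vulnerable shape; do not run it on long, hostile input
const FLAT = /^a+$/;      // same language, no nested quantifier

const MAX_INPUT_LENGTH = 256; // hypothetical limit; pick one that fits the field

// Hypothetical guarded validator: cap length first, then use the flat pattern.
function isAllAs(input) {
  if (input.length > MAX_INPUT_LENGTH) return false;
  return FLAT.test(input);
}

// On short inputs the two patterns agree.
for (const s of ['a', 'aaaa', 'aaab', '']) {
  console.log(JSON.stringify(s), NESTED.test(s), FLAT.test(s));
}
```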

When to use a library #

| Task | Use |
| --- | --- |
| Phone numbers | libphonenumber-js |
| Markdown | marked, markdown-it |
| HTML | DOMParser, parse5, cheerio |
| CSV | papaparse, csv-parse |
| URL parsing | URL (built in) |
| Query strings | URLSearchParams (built in) |
| Dates | Date, Intl, date-fns, or Temporal |
| Email validation | Send a verification email |
| Sanitizing HTML | DOMPurify |

If the task is on this list, default to the library. Regex is your tool of choice when there isn't a purpose-built one.

A summary #

  • Common validations — email, URL, password, UUID, hex color — small, well-known regexes worth keeping handy.
  • Common transformations — case conversion, masking, trim multiple spaces — short patterns that come up daily.
  • Use matchAll for any "find all matches with groups" task.
  • Hoist regexes out of hot loops.
  • Avoid nested quantifiers to prevent catastrophic backtracking.
  • Skip regex for HTML, CSV, JSON, source code, NLP. Use real parsers.
  • Built-in alternatives (URL, URLSearchParams, Date) beat hand-rolled regexes for everything they cover.

What's next #

That completes Module 9 — Regular Expressions. Module 10 dives into advanced JavaScript features: Proxy, Reflect, async iteration, streams, and Web Workers.

Try it yourself #

The matchAll + named group pattern is the most useful real-world regex idiom. Predict the output:

const text = 'Order #123 by Ada, #456 by Grace';
const orders = [...text.matchAll(/#(?<id>\d+) by (?<name>\w+)/g)]
  .map(m => m.groups);
console.log(orders);

Output: [{ id: '123', name: 'Ada' }, { id: '456', name: 'Grace' }].

One pattern, two matches, two clean little objects. matchAll + groups + map is the modern “parse this structured text” toolkit. Once you internalize this, half the string-processing in your codebase becomes one-liners.

A clean regex with named groups and matchAll is the difference between code you can come back to in 6 months and code you have to rewrite.
