JavaScript Regex Real-World Patterns: Validation, Parsing, and Find/Replace Recipes
Regex in theory is one thing; regex in production code is another. The patterns you'll actually write or read in a real codebase form a small list: validations, URL/email checks, find-and-replace transformations, log parsing, and a handful of string-manipulation idioms. Knowing the right pattern (and knowing which problems regex shouldn't solve) is worth more than memorizing the syntax.
This lesson is a recipe collection plus a guided tour of the anti-patterns and when to reach for non-regex tools.
Validation patterns #
Email (pragmatic, not RFC-perfect) #
const EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
EMAIL.test('ada@example.com'); // true
EMAIL.test('ada@example'); // false
EMAIL.test('ada@@example.com'); // false
A truly RFC-5322-compliant email regex is hundreds of characters. Don't bother. Use this pragmatic version for client-side hint validation and send a verification email for actual correctness.
URL (basic) #
const URL_RE = /^https?:\/\/[^\s/$.?#].[^\s]*$/i;
URL_RE.test('https://code-js.in/posts'); // true
But for any non-trivial URL work, use the built-in URL class instead:
try {
const u = new URL('https://api.example.com:8080/users?id=42');
console.log(u.host); // 'api.example.com:8080'
console.log(u.pathname); // '/users'
console.log(u.searchParams.get('id')); // '42'
} catch {
// not a valid URL
}
The URL constructor handles every edge case the spec covers. Use it for parsing; only fall back to regex for quick "does this look URL-ish?" checks.
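A minimal sketch of wrapping the constructor for input checking. The helper name parseHttpUrl and the choice to accept only http: and https: are assumptions made for illustration, not part of the lesson's code.

```javascript
// Hypothetical helper: returns a URL object for http(s) input, or null otherwise.
function parseHttpUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // the constructor throws a TypeError on unparseable input
  }
  // new URL() also accepts schemes such as mailto: or javascript:, so check explicitly.
  return url.protocol === 'http:' || url.protocol === 'https:' ? url : null;
}

console.log(parseHttpUrl('https://code-js.in/posts')?.hostname); // 'code-js.in'
console.log(parseHttpUrl('javascript:alert(1)'));                // null
console.log(parseHttpUrl('not a url'));                          // null
```

Checking the protocol matters because "parses as a URL" and "is a web address" are different questions.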
Phone number (locale-specific) #
// US: optional +1 country code, then 10 digits; spaces, dashes, and parentheses allowed
const US_PHONE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;
US_PHONE.test('+1 (212) 555-1234'); // true
US_PHONE.test('212-555-1234'); // true
For international phone validation, use the libphonenumber-js library. The rules vary wildly by country.
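Once a US number passes the shape check, it is often useful to store it in one canonical form. The sketch below assumes that "canonical" means the 10 national digits; the helper name normalizeUsPhone is hypothetical.

```javascript
const US_PHONE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;

// Hypothetical helper: returns the 10 national digits, or null if the shape is wrong.
function normalizeUsPhone(input) {
  if (!US_PHONE.test(input)) return null;
  // Strip everything that is not a digit, then drop the optional leading country code.
  return input.replace(/\D/g, '').slice(-10);
}

console.log(normalizeUsPhone('+1 (212) 555-1234')); // '2125551234'
console.log(normalizeUsPhone('212-555-1234'));      // '2125551234'
console.log(normalizeUsPhone('555-1234'));          // null
```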
UUID v4 #
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
UUID_V4.test('a1b2c3d4-e5f6-4789-abcd-1234567890ab'); // true
The 4 at the start of the third segment marks version 4; the [89ab] at the start of the fourth segment marks the standard (RFC 4122) variant.
Hex color #
const HEX_COLOR = /^#([0-9a-f]{3}|[0-9a-f]{6}|[0-9a-f]{8})$/i;
HEX_COLOR.test('#fff');
HEX_COLOR.test('#ffffff');
HEX_COLOR.test('#ffffff80'); // 8-digit = with alpha
Strong password (one of each) #
const STRONG_PASSWORD = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
We walked through this lookaround pattern in Lesson 9.2.
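A single lookahead regex answers only "pass or fail". If the form needs to say which rule failed, one option is to split the same checks into separate regexes. This is a sketch; the PASSWORD_RULES table and the missingPasswordRules helper are illustrative names, not part of the lesson.

```javascript
// Each rule is a separate, simple regex, so failures can be reported individually.
const PASSWORD_RULES = [
  { label: 'a lowercase letter', re: /[a-z]/ },
  { label: 'an uppercase letter', re: /[A-Z]/ },
  { label: 'a digit', re: /\d/ },
  { label: 'one of !@#$%^&*', re: /[!@#$%^&*]/ },
  { label: 'at least 8 characters', re: /^.{8,}$/ },
];

// Hypothetical helper: returns the labels of the rules the password fails.
function missingPasswordRules(password) {
  return PASSWORD_RULES
    .filter(rule => !rule.re.test(password))
    .map(rule => rule.label);
}

console.log(missingPasswordRules('Passw0rd!')); // []
console.log(missingPasswordRules('password'));
// ['an uppercase letter', 'a digit', 'one of !@#$%^&*']
```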
Parsing patterns #
Date in ISO 8601 #
const ISO_DATE = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})(?:T(?<hour>\d{2}):(?<minute>\d{2})(?::(?<second>\d{2})(?:\.(?<ms>\d+))?)?(?<tz>Z|[+-]\d{2}:\d{2})?)?$/;
const { groups } = '2026-05-21T14:30:00.500Z'.match(ISO_DATE);
console.log(groups);
// { year: '2026', month: '05', day: '21', hour: '14', minute: '30', second: '00', ms: '500', tz: 'Z' }
Or just use new Date(string) and toISOString(). Lesson 8.3 covers the Date API.
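A shape-only regex accepts strings such as 2026-13-01, because \d{2} knows nothing about calendars. The sketch below pairs a date-only pattern with a round-trip through Date.UTC; the names ISO_DAY and isRealIsoDay are assumptions for illustration.

```javascript
const ISO_DAY = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: true only if the string has the right shape AND names a real calendar day.
function isRealIsoDay(s) {
  const m = ISO_DAY.exec(s);
  if (!m) return false;
  const year = Number(m.groups.year);
  const month = Number(m.groups.month);
  const day = Number(m.groups.day);
  // Date.UTC rolls overflow forward (Feb 30 becomes Mar 2), so a round-trip exposes bad values.
  const d = new Date(Date.UTC(year, month - 1, day));
  return d.getUTCFullYear() === year
    && d.getUTCMonth() === month - 1
    && d.getUTCDate() === day;
}

console.log(ISO_DAY.test('2026-13-01'));  // true: the shape is fine
console.log(isRealIsoDay('2026-13-01'));  // false: there is no month 13
console.log(isRealIsoDay('2026-02-30'));  // false
console.log(isRealIsoDay('2026-05-21'));  // true
```

One caveat of this sketch: Date.UTC treats years 0 to 99 as 1900 to 1999, so such years fail the round-trip even when the day is valid.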
CSV row (simple — no quoted commas) #
const row = 'Ada,36,London';
const [name, age, city] = row.split(',');
For CSV with quoted fields and embedded commas, don't use regex. Use a real CSV parser (papaparse, csv-parse). Regex for CSV is a famous trap — the moment your data has "Smith, John" in a single cell, naive regex breaks.
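The failure is easy to reproduce. The row below is made-up sample data with one quoted cell:

```javascript
// A row with three cells, the first of which contains a comma inside quotes.
const quotedRow = '"Smith, John",36,London';
const naiveFields = quotedRow.split(',');

console.log(naiveFields.length); // 4, although the row has 3 cells
console.log(naiveFields);        // ['"Smith', ' John"', '36', 'London']
```

The split has no notion of "inside quotes", which is exactly the state a CSV parser tracks.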
Query string #
// No regex needed: URLSearchParams is strongly preferred
const params = new URLSearchParams('?id=42&name=Ada%20Lovelace');
params.get('id'); // '42'
params.get('name'); // 'Ada Lovelace' — handles URL decoding
Never roll your own query-string regex. Always URLSearchParams.
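Repeated keys are one of the cases a hand-rolled regex tends to get wrong. A short sketch with a made-up query string:

```javascript
const query = new URLSearchParams('?tag=js&tag=regex&name=Ada%20Lovelace');

const allTags = query.getAll('tag');        // ['js', 'regex']
const firstTag = query.get('tag');          // 'js' (first value only)
const missing = query.get('page');          // null for absent keys
const asObject = Object.fromEntries(query); // later duplicates win: tag is 'regex'

console.log(allTags, firstTag, missing, asObject.name);
```

Object.fromEntries is convenient but lossy when keys repeat, so use getAll wherever a parameter may appear more than once.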
Markdown link #
const MD_LINK = /\[([^\]]+)\]\(([^)]+)\)/g;
for (const m of 'See [docs](https://example.com) and [tutorials](/tutorials/)'.matchAll(MD_LINK)) {
console.log(m[1], '→', m[2]);
}
// docs → https://example.com
// tutorials → /tutorials/
The [^\]]+ and [^)]+ are negated classes: "any character except ]" and "any character except )". This is pragmatic Markdown parsing; for production, use a real Markdown library (marked, markdown-it).
Find-and-replace recipes #
Trim multiple spaces #
'  hello    world  '.replace(/\s+/g, ' ').trim();
// 'hello world'
Capitalize each word #
'hello world'.replace(/\b\p{L}/gu, c => c.toUpperCase());
// 'Hello World'
Remove non-numeric #
'$1,234.56'.replace(/[^\d.-]/g, '');
// '1234.56'
Convert camelCase to kebab-case #
'fooBarBaz'.replace(/([a-z])([A-Z])/g, '$1-$2').toLowerCase();
// 'foo-bar-baz'
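The one-step version only splits where a lowercase letter meets an uppercase one, so a run of capitals such as an acronym stays glued to the next word. A possible two-step variant is sketched below; the helper name toKebabCase is hypothetical.

```javascript
// Hypothetical helper: camelCase to kebab-case, including acronym runs.
function toKebabCase(s) {
  return s
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2')    // fooBar -> foo-Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1-$2') // HTMLString -> HTML-String
    .toLowerCase();
}

// The one-step pattern on the same input, for comparison.
const oneStep = 'parseHTMLString'.replace(/([a-z])([A-Z])/g, '$1-$2').toLowerCase();

console.log(oneStep);                        // 'parse-htmlstring'
console.log(toKebabCase('parseHTMLString')); // 'parse-html-string'
console.log(toKebabCase('fooBarBaz'));       // 'foo-bar-baz'
```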
Convert snake_case to camelCase #
'foo_bar_baz'.replace(/_([a-z])/g, (_, c) => c.toUpperCase());
// 'fooBarBaz'
Truncate to N words #
function truncateWords(s, n) {
return s.split(/\s+/).slice(0, n).join(' ');
}
Mask all but last 4 digits #
'1234567890123456'.replace(/\d(?=\d{4})/g, '*');
// '************3456'
The lookahead (?=\d{4}) says: "only replace digits that have at least 4 more digits after them." Leaves the last four alone.
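One thing to watch: the lookahead needs four consecutive digits, so a number formatted with dashes or spaces comes back unmasked. A sketch of a helper that strips separators first (maskDigits and its visible parameter are illustrative names):

```javascript
// Hypothetical helper: keeps the last `visible` digits and masks the rest.
function maskDigits(input, visible = 4) {
  if (!Number.isInteger(visible) || visible < 0) {
    throw new RangeError('visible must be a non-negative integer');
  }
  const digits = input.replace(/\D/g, ''); // drop spaces, dashes, and other separators first
  const pattern = new RegExp(`\\d(?=\\d{${visible}})`, 'g');
  return digits.replace(pattern, '*');
}

// The bare pattern leaves a dashed number untouched: no digit has 4 digits right after it.
const unmasked = '1234-5678-9012-3456'.replace(/\d(?=\d{4})/g, '*');

console.log(unmasked);                          // '1234-5678-9012-3456'
console.log(maskDigits('1234-5678-9012-3456')); // '************3456'
```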
Strip HTML tags #
'<p>Hello <b>world</b></p>'.replace(/<[^>]+>/g, '');
// 'Hello world'
For real HTML sanitization, use DOMPurify. This regex is for trusted content where you just want plain text. Untrusted HTML through this regex is a classic XSS path.
Log parsing #
const LOG_LINE = /^(?<time>\S+)\s+(?<level>\w+)\s+(?<msg>.+)$/;
const lines = [
'2026-05-21T14:30:00Z INFO User logged in',
'2026-05-21T14:30:05Z ERROR Connection refused',
];
for (const line of lines) {
const m = line.match(LOG_LINE);
if (m) console.log(m.groups);
}
// { time: '2026-05-21T14:30:00Z', level: 'INFO', msg: 'User logged in' }
// { time: '2026-05-21T14:30:05Z', level: 'ERROR', msg: 'Connection refused' }
With named groups, log-parsing regexes become self-documenting. Combined with \s+ for whitespace and the anchors ^ and $, the same approach extends to most line-oriented structured-log formats.
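Parsed groups are usually a means to an end, such as counting lines per level. The sketch below also keeps the lines that fail to match instead of dropping them silently; summarizeLog and the sample lines are assumptions for illustration.

```javascript
const LOG_LINE = /^(?<time>\S+)\s+(?<level>\w+)\s+(?<msg>.+)$/;

const sampleLog = [
  '2026-05-21T14:30:00Z INFO User logged in',
  '2026-05-21T14:30:05Z ERROR Connection refused',
  'malformed',
  '2026-05-21T14:30:09Z ERROR Timeout',
];

// Hypothetical helper: tallies lines per level and collects lines that don't match.
function summarizeLog(lines) {
  const counts = {};
  const unparsed = [];
  for (const line of lines) {
    const m = LOG_LINE.exec(line);
    if (!m) {
      unparsed.push(line);
      continue;
    }
    const { level } = m.groups;
    counts[level] = (counts[level] ?? 0) + 1;
  }
  return { counts, unparsed };
}

const summary = summarizeLog(sampleLog);
console.log(summary.counts);   // { INFO: 1, ERROR: 2 }
console.log(summary.unparsed); // ['malformed']
```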
Splitting on multiple delimiters #
'apple,banana;cherry|date'.split(/[,;|]/);
// ['apple', 'banana', 'cherry', 'date']
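Real input often has stray whitespace around the delimiters and a trailing delimiter at the end. One way to absorb both, shown on a made-up string:

```javascript
const messy = ' apple , banana;cherry | date ,';

const parts = messy
  .split(/\s*[,;|]\s*/) // swallow whitespace on both sides of each delimiter
  .map(s => s.trim())   // trim the outer edges of the first and last items
  .filter(Boolean);     // drop the empty string left by the trailing comma

console.log(parts); // ['apple', 'banana', 'cherry', 'date']
```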
Extracting all matches #
The pattern of the decade — matchAll:
const text = '@ada said hi to @grace and @hopper';
const mentions = [...text.matchAll(/@(\w+)/g)].map(m => m[1]);
// ['ada', 'grace', 'hopper']
matchAll returns an iterator of match objects, each with its capture groups. It replaces the older while ((m = re.exec(str))) {...} dance.
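Each match object also carries the position of the match, and the regex passed to matchAll must have the g flag. A short sketch of both points (the variable names are illustrative):

```javascript
const sentence = '@ada said hi to @grace and @hopper';

// Array.from takes a mapping function, so the iterator never needs spreading.
const found = Array.from(
  sentence.matchAll(/@(?<user>\w+)/g),
  m => ({ user: m.groups.user, index: m.index })
);
console.log(found);
// [{ user: 'ada', index: 0 }, { user: 'grace', index: 16 }, { user: 'hopper', index: 27 }]

// Without the g flag, matchAll throws a TypeError instead of returning one match.
let threwTypeError = false;
try {
  sentence.matchAll(/@\w+/);
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError); // true
```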
Anti-patterns: when NOT to regex #
Regex is the wrong tool for:
HTML / XML parsing #
// Famous wrong answer to 'how do I parse HTML with regex'
Use DOMParser (browser) or parse5/cheerio (Node). HTML is not a regular language — the grammar nests arbitrarily, has whitespace rules regex can't express, and CDATA/script blocks need special handling.
CSV parsing #
Use papaparse or csv-parse. Quoted commas and embedded newlines break naive regex.
JSON parsing #
Use JSON.parse. Don't try to extract values from JSON with regex.
Source code analysis #
A regex can find the text function foo, but it cannot tell whether that text sits in a comment, a string literal, or a different scope. Use a real parser (@babel/parser, @typescript-eslint/parser, acorn).
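A small demonstration on a made-up source snippet, where only one of the three hits is a real declaration:

```javascript
const source = [
  '// TODO: remove function foo once callers migrate',
  'const label = "function foo";',
  'function foo() { return 1; }',
].join('\n');

// The regex sees text, not syntax: the comment and the string literal both count.
const hits = [...source.matchAll(/function\s+foo\b/g)].length;
console.log(hits); // 3
```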
Free-form natural-language matching #
For sentiment, entity extraction, or anything that needs to understand language, regex is the wrong layer. Use NLP libraries or an LLM call.
Performance tips #
Hoist regexes out of loops #
const RE = /\d+/; // no g flag: test() on a /g regex carries lastIndex from one call to the next
for (const item of items) {
if (RE.test(item.name)) { /* ... */ }
}
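A regex literal written inside the loop body creates a new RegExp object on every iteration. Engines typically cache the compiled pattern, so the cost is mostly allocation, but in a hot loop it still adds up, and a hoisted, named constant is easier to read.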
Watch for lastIndex with /g #
From Lesson 9.1: regex objects with g keep state. For repeated test() calls on the same regex, reset lastIndex = 0 or use matchAll/match instead.
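The effect is easy to see by calling test() three times on the same input. The constant names below are illustrative:

```javascript
const STATEFUL = /\d+/g;

// 1st call matches and leaves lastIndex at 2; 2nd call starts there, fails, and resets to 0.
const flaky = [STATEFUL.test('a1'), STATEFUL.test('a1'), STATEFUL.test('a1')];
console.log(flaky); // [true, false, true]

// Without g, test() always searches from the start of the string.
const STATELESS = /\d+/;
const steady = [STATELESS.test('a1'), STATELESS.test('a1'), STATELESS.test('a1')];
console.log(steady); // [true, true, true]
```

For a pure yes/no check, dropping the g flag is usually simpler than resetting lastIndex by hand.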
Avoid catastrophic backtracking #
Nested quantifiers like (a+)+, (a*)*, (a|a)+ can explode on malicious or adversarial inputs. If you're validating user-controlled strings on a server:
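- Use re2 (a non-backtracking engine, available via WebAssembly or as a Node binding)
- Cap input length
- Use a timeout: run the match in a worker thread and terminate the worker if it overruns. A setTimeout on the same thread cannot interrupt a regex that is already running.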
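Two of these defenses need no extra dependencies: rewrite the nested quantifier and cap the input length before matching. The sketch below assumes a limit of 256 characters, which is an arbitrary illustrative value; safeTest is a hypothetical helper.

```javascript
const MAX_INPUT_LENGTH = 256; // assumed limit; pick one that fits your data

// Matches the same strings as /^(a+)+$/, but a single quantifier leaves only one way to match.
const LINEAR = /^a+$/;

// Hypothetical helper: rejects oversized or non-string input before the regex runs.
function safeTest(re, input, maxLength = MAX_INPUT_LENGTH) {
  if (typeof input !== 'string' || input.length > maxLength) return false;
  return re.test(input);
}

console.log(safeTest(LINEAR, 'aaaa'));                // true
console.log(safeTest(LINEAR, 'a'.repeat(30) + '!'));  // false, and it fails fast
console.log(safeTest(LINEAR, 'a'.repeat(1000)));      // false: over the length cap
```

A length cap does not make a bad pattern safe, since a vulnerable regex can blow up on a few dozen characters, so treat it as a second line of defense behind the rewrite.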
When to use a library #
| Task | Use |
|---|---|
| Phone numbers | libphonenumber-js |
| Markdown | marked, markdown-it |
| HTML | DOMParser, parse5, cheerio |
| CSV | papaparse, csv-parse |
| URL parsing | URL (built in) |
| Query strings | URLSearchParams (built in) |
| Dates | Date, Intl, date-fns, or Temporal |
| Email validation | Send a verification email |
| Sanitizing HTML | DOMPurify |
If the task is on this list, default to the library. Regex is your tool of choice when there isn't a purpose-built one.
A summary #
- Common validations — email, URL, password, UUID, hex color — small, well-known regexes worth keeping handy.
- Common transformations — case conversion, masking, trim multiple spaces — short patterns that come up daily.
- Use matchAll for any "find all matches with groups" task.
- Hoist regexes out of hot loops.
- Avoid nested quantifiers to prevent catastrophic backtracking.
- Skip regex for HTML, CSV, JSON, source code, NLP. Use real parsers.
- Built-in alternatives (URL, URLSearchParams, Date) beat hand-rolled regexes for everything they cover.
What's next #
That completes Module 9 — Regular Expressions. Module 10 dives into advanced JavaScript features: Proxy, Reflect, async iteration, streams, and Web Workers.
Try it yourself #
The matchAll + named group pattern is the most useful real-world regex idiom. Predict the output:
const text = 'Order #123 by Ada, #456 by Grace';
const orders = [...text.matchAll(/#(?<id>\d+) by (?<name>\w+)/g)]
.map(m => m.groups);
console.log(orders);
Output: [{ id: '123', name: 'Ada' }, { id: '456', name: 'Grace' }]. One pattern, two matches, two clean little objects.
matchAll + groups + map is the modern “parse this structured text” toolkit. Once you internalize this, half the string-processing in your codebase becomes one-liners.
A clean regex with named groups and matchAll is the difference between code you can come back to in 6 months and code you have to rewrite.