JavaScript Strings Deep Dive: Methods, Templates, Unicode, and the Methods You Actually Need
Strings are the most-used non-trivial data type in JavaScript. Every form field, every log line, every URL, every API response touches strings somewhere. The language ships thirty-plus built-in string methods. Most developers use about ten of them daily and Google the rest. This lesson walks through every method worth knowing, the template-literal features that replace a decade of string concatenation, and the Unicode quirks that bite once a year.
Strings are primitives — and immutable #
From Lesson 1.3: strings are one of the seven primitives. They're immutable — every operation that "changes" a string actually returns a new one.
const s = 'hello';
s.toUpperCase(); // 'HELLO'
console.log(s); // 'hello' — unchanged
This is the source of a common bug — calling .toUpperCase() and expecting the original string to change. It doesn't. Always assign the result.
Creating strings #
Three quote styles, all equivalent:
const a = 'hello';
const b = "hello";
const c = `hello`; // backticks — also supports interpolation
Backtick strings (template literals) are the modern default — they support ${expression} interpolation and span multiple lines naturally.
const name = 'Ada';
const greeting = `Hello, ${name}!`;
const multiline = `
line 1
line 2
`;
Tagged templates #
A function call right before a template literal turns into a tagged template — the function receives the static parts and the interpolated values separately.
// Assumes an escapeHtml(value) helper is defined elsewhere; it is not built in.
function safe(strings, ...values) {
  return strings.reduce((acc, str, i) =>
    acc + str + (values[i] !== undefined ? escapeHtml(values[i]) : ''),
  '');
}
const userInput = '<script>alert(1)</script>';
const html = safe`<p>Hello, ${userInput}</p>`;
// <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
This is how libraries like lit-html, styled-components, and graphql-tag build custom DSLs. Niche but powerful.
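The safe example depends on an escapeHtml helper that JavaScript does not ship. As a minimal sketch, assuming a hand-rolled helper (escapeHtml and inspect are illustrative names, not part of any library), this shows what a tag function actually receives and what escaped output looks like:

```javascript
// Hypothetical helper: escapes the characters that are unsafe in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replaceAll('&', '&amp;') // must run first so later entities are not double-escaped
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;')
    .replaceAll("'", '&#39;');
}

// Hypothetical tag that just reports its arguments.
function inspect(strings, ...values) {
  return { strings: [...strings], values };
}

const who = 'Ada';
const parts = inspect`Hello, ${who}! You have ${3} messages.`;
// parts.strings → ['Hello, ', '! You have ', ' messages.']
// parts.values  → ['Ada', 3]

const escaped = escapeHtml('<script>alert(1)</script>');
// → '&lt;script&gt;alert(1)&lt;/script&gt;'
```

There is always exactly one more static string than there are values, which is why a reduce over strings works as a way to stitch the two back together.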
The methods worth knowing #
Searching #
s.includes(sub); // boolean — modern, prefer over indexOf
s.startsWith(sub); // boolean
s.endsWith(sub); // boolean
s.indexOf(sub); // index or -1
s.lastIndexOf(sub); // last occurrence
For case-insensitive search, normalize first:
s.toLowerCase().includes(sub.toLowerCase());
Extracting #
s.slice(start, end); // extract substring — supports negative indices
s.substring(start, end); // similar but no negatives, swaps args if out of order
s.charAt(i); // single character at index
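s[i]; // same character; returns undefined instead of '' when i is out of range
s.at(-1); // last character, supports negative indices
Use slice and at. substring still works, but it ignores negative indices and silently swaps its arguments, so slice is the more predictable choice.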
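To make the differences concrete, here is a small sketch comparing the extraction methods on one sample string (the variable names are illustrative):

```javascript
const word = 'JavaScript';

const lastSix = word.slice(-6);        // 'Script': negative index counts from the end
const clamped = word.substring(-6);    // 'JavaScript': negative is treated as 0
const swapped = word.substring(4, 0);  // 'Java': arguments are swapped
const emptySlice = word.slice(4, 0);   // '': slice never swaps
const lastChar = word.at(-1);          // 't'

// Out-of-range access differs between the three single-character forms.
const outOfRange = [word.charAt(99), word[99], word.at(99)];
// → ['', undefined, undefined]
```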
Replacing #
s.replace('hello', 'hi'); // replaces FIRST match (with a string)
s.replaceAll('hello', 'hi'); // replaces ALL matches
s.replace(/hello/g, 'hi'); // also all matches (regex with /g flag)
The replaceAll method (ES2021) makes string-based all-replace trivial. Before, you had to use a regex with /g.
Both replace and replaceAll accept a function as the replacement — useful when each match needs different output:
'2 + 3 = 5'.replaceAll(/\d+/g, n => Number(n) * 10);
// '20 + 30 = 50'
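The replacement function also receives any capture groups after the full match. A short sketch, using a made-up date string, that reorders groups and shows the one trap with replaceAll: it throws if given a regex without the /g flag.

```javascript
const isoDate = '2024-03-09';

// Arguments: full match, then one argument per capture group.
const usDate = isoDate.replace(
  /(\d{4})-(\d{2})-(\d{2})/,
  (match, year, month, day) => `${month}/${day}/${year}`
);
// → '03/09/2024'

// replaceAll insists on a global regex; a non-global one is a TypeError.
let threwTypeError = false;
try {
  'a-b-c'.replaceAll(/-/, '+');
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
// threwTypeError → true
```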
Splitting and joining #
'a,b,c'.split(','); // ['a', 'b', 'c']
'a,b,c'.split(',', 2); // ['a', 'b'] — limit
['a', 'b', 'c'].join('-'); // 'a-b-c'
Split/join are the pair you reach for to parse and rebuild simple delimited data.
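As a rough sketch of that parse-and-rebuild round trip, assuming a simple "key=value; key=value" format with no escaping (the sample line is invented for illustration):

```javascript
const line = 'name=Ada; role=admin; team=core';

// Parse: split into pairs, trim stray spaces, split each pair on '='.
const pairs = line.split(';').map(part => part.trim().split('='));
const record = Object.fromEntries(pairs);
// → { name: 'Ada', role: 'admin', team: 'core' }

// Rebuild: the inverse, using join.
const rebuilt = Object.entries(record)
  .map(([key, value]) => `${key}=${value}`)
  .join('; ');
// → 'name=Ada; role=admin; team=core'
```

This only holds up while values never contain the delimiters themselves; anything richer (quoted CSV, URLs) is better served by a real parser.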
Trimming and padding #
' hello '.trim(); // 'hello'
' hello '.trimStart(); // 'hello '
' hello '.trimEnd(); // ' hello'
'5'.padStart(3, '0'); // '005' — leading zeros
'12'.padEnd(5, '.'); // '12...'
Padding is the modern way to align numbers, format IDs, etc.
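Two typical uses, sketched with illustrative names: zero-padding clock fields and lining up a label with a right-aligned number.

```javascript
// padStart works on strings, so convert the number first.
const pad2 = n => String(n).padStart(2, '0');
const clock = `${pad2(7)}:${pad2(5)}`;
// → '07:05'

// padEnd fills the label column; padStart (default fill is a space) right-aligns the value.
const row = 'Total'.padEnd(10, '.') + '42'.padStart(6);
// → 'Total.....    42'
```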
Case #
'Hello'.toUpperCase(); // 'HELLO'
'Hello'.toLowerCase(); // 'hello'
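'istanbul'.toLocaleUpperCase('tr-TR'); // 'İSTANBUL': locale-aware (Turkish 'i' → 'İ')
For international apps, prefer the toLocaleXxx variants when you know the text's language: the Turkish and Azerbaijani dotted and dotless i only come out right with them. Mappings that are the same in every locale, such as German ß → 'SS', already work with plain toUpperCase.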
Repeating #
'-'.repeat(5); // '-----'
Concatenating #
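const joined = 'a' + 'b' + 'c'; // 'abc'
const viaTemplate = `${a}${b}${c}`; // concatenates the variables a, b, and c; often clearer than a chain of +
Don't use String.prototype.concat(): it's vestigial, offers nothing over + or template literals, and is generally no faster.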
Unicode and the surrogate pair gotcha #
JavaScript strings are sequences of UTF-16 code units, not characters. For ASCII, and for everything else in the Basic Multilingual Plane (most Latin, Cyrillic, Arabic, and common CJK text), one character is one code unit. Characters outside it (emoji, rarer CJK ideographs, historic scripts) occupy two code units, a "surrogate pair".
const emoji = '🎉';
console.log(emoji.length); // 2 — two code units, not one character
console.log(emoji[0]); // mangled — half a surrogate pair
console.log([...emoji]); // ['🎉'] — iteration is code-point-aware
For character-accurate work, spread or use the iterator:
[...emoji].length; // 1 — correct count
Array.from(emoji).length; // 1
for (const ch of '🎉👋🌍') {
  console.log(ch); // each emoji once
}
For truly correct grapheme handling ("family emoji", combining marks, flag sequences), use Intl.Segmenter:
const seg = new Intl.Segmenter('en', { granularity: 'grapheme' });
const chars = [...seg.segment('👨\u200D👩\u200D👧\u200D👦')]; // \u200D is the zero-width joiner that fuses the four emoji into one
chars.length; // 1 — the family emoji is one grapheme cluster
This matters anywhere you need to truncate text, count characters for a UI limit, or render strings with cursors.
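The three ways of counting can disagree on the same string. A small sketch, assuming a runtime that ships Intl.Segmenter (countGraphemes is a hypothetical helper name):

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function countGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return [...segmenter.segment(text)].length;
}

// Family emoji: four emoji joined by three zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

const codeUnits = family.length;            // 11 (4 surrogate pairs + 3 joiners)
const codePoints = [...family].length;      // 7  (4 emoji + 3 joiners)
const graphemes = countGraphemes(family);   // 1

// Combining marks behave the same way: 'e' + U+0301 renders as one 'é'.
const accented = countGraphemes('e\u0301'); // 1, although its length is 2
```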
Codepoint methods #
'A'.codePointAt(0); // 65
'🎉'.codePointAt(0); // 127881 — the full emoji code point
String.fromCodePoint(127881); // '🎉'
Use codePointAt/fromCodePoint over the older charCodeAt/fromCharCode — the latter only handle 16-bit values and break on emoji.
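To see why the older pair breaks, it helps to look at what each method returns for the same emoji. A brief sketch:

```javascript
const party = '🎉'; // U+1F389

// charCodeAt sees the two halves of the surrogate pair separately.
const high = party.charCodeAt(0);      // 0xD83C
const low = party.charCodeAt(1);       // 0xDF89

// codePointAt combines them into the real code point.
const full = party.codePointAt(0);     // 0x1F389 (127881)

// fromCharCode truncates its argument to 16 bits, so it cannot rebuild the emoji.
const truncated = String.fromCharCode(127881);
const roundTrip = String.fromCodePoint(127881);
// truncated === party → false
// roundTrip === party → true
```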
Template literal patterns #
Multi-line with consistent indentation #
function dedent(strs, ...vals) {
  const raw = strs.reduce((a, s, i) => a + s + (vals[i] ?? ''), '');
  const min = Math.min(...raw.match(/^[ \t]*(?=\S)/gm).map(s => s.length));
  return raw.replace(new RegExp(`^[ \\t]{${min}}`, 'gm'), '');
}
const sql = dedent`
  SELECT *
  FROM users
  WHERE id = ${userId}
`;
Most teams use a library (dedent, strip-indent) or skip the indentation. But the pattern is good to know.
Conditional content #
const msg = `Hello${name ? `, ${name}` : ''}!`;
Readable for one or two conditions; once it gets more complex, build the pieces in separate steps and join them.
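One common way to handle several optional pieces, sketched here with made-up flags and class names, is to collect them in an array, drop the falsy entries, and join:

```javascript
// Hypothetical flags for illustration.
const isPrimary = true;
const isDisabled = false;

// Each optional piece is either a string or false.
const pieces = ['btn', isPrimary && 'btn-primary', isDisabled && 'btn-disabled'];

// filter(Boolean) removes false, '', null, and undefined before joining.
const className = pieces.filter(Boolean).join(' ');
// → 'btn btn-primary'
```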
Joining arrays #
const items = ['a', 'b', 'c'];
const list = `Items: ${items.join(', ')}`;
Array.prototype.join() inside template literals is the idiom for inline lists.
Comparing strings #
'a' === 'a'; // true — works
'a' < 'b'; // true — lexicographic
'10' < '9'; // true — STRING comparison, '1' < '9'
For user-facing sorts, use localeCompare:
['banana', 'apple', 'Cherry'].sort((a, b) => a.localeCompare(b));
// ['apple', 'banana', 'Cherry']: case is only a tie-breaker, whereas a plain sort() would put 'Cherry' first
['á', 'b', 'a'].sort((a, b) => a.localeCompare(b, 'en'));
// ['a', 'á', 'b'] — accent-aware
localeCompare(other, locale, options) is the function for any sort involving real names or text.
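The options argument is where most of the practical value sits. A short sketch with invented sample data, showing the numeric and sensitivity options:

```javascript
const files = ['file10.txt', 'file2.txt', 'file1.txt'];

// Default sort compares code units, so '10' lands before '2'.
const byCodeUnit = [...files].sort();
// → ['file1.txt', 'file10.txt', 'file2.txt']

// numeric: true compares runs of digits as numbers.
const natural = [...files].sort((a, b) =>
  a.localeCompare(b, 'en', { numeric: true })
);
// → ['file1.txt', 'file2.txt', 'file10.txt']

// sensitivity: 'base' ignores both case and accents; 0 means "equal".
const sameWord = 'résumé'.localeCompare('RESUME', 'en', { sensitivity: 'base' }) === 0;
// → true
```

For large arrays, creating one Intl.Collator with the same options and passing its compare method to sort avoids re-resolving the locale on every comparison.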
String to/from other types #
String(42); // '42'
String(null); // 'null'
String(undefined); // 'undefined'
(42).toString(); // '42'
(42).toString(2); // '101010' — binary
(255).toString(16); // 'ff' — hex
Number('42'); // 42
parseInt('42px', 10); // 42 — stops at 'p'
parseFloat('3.14'); // 3.14
We covered coercion in Lesson 6.3. The rule of thumb: convert explicitly. Number(), String(), Boolean() make intent clear.
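Explicit does not mean interchangeable: Number and parseInt disagree on partial and empty input. A quick sketch of the edge cases worth remembering:

```javascript
// Number requires the whole string to be numeric.
const strict = Number('42px');          // NaN
// parseInt reads as far as it can and stops.
const lenient = parseInt('42px', 10);   // 42

// Empty strings go in opposite directions.
const emptyNumber = Number('');         // 0
const emptyParsed = parseInt('', 10);   // NaN

// parseInt drops the fraction; parseFloat keeps it.
const whole = parseInt('3.99', 10);     // 3
const fractional = parseFloat('3.99');  // 3.99
```

Number is usually the better fit for validating input, since anything malformed becomes NaN, with the empty string as the one case to check separately.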
Common patterns #
Truncate with ellipsis #
function truncate(s, n) {
  return s.length > n ? s.slice(0, n - 1) + '…' : s;
}
For Unicode-correct truncation, use Intl.Segmenter (as above).
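A grapheme-aware version might look like the following sketch. It assumes Intl.Segmenter is available in the runtime, and truncateGraphemes is a hypothetical name rather than a built-in:

```javascript
// Hypothetical helper: truncates by user-perceived characters, not code units.
function truncateGraphemes(text, max, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // Each item yielded by segment() has a .segment property holding the text.
  const graphemes = Array.from(segmenter.segment(text), part => part.segment);
  if (graphemes.length <= max) return text;
  return graphemes.slice(0, max - 1).join('') + '…';
}

const sample = 'hi 🎉🎉🎉';

const safeCut = truncateGraphemes(sample, 5);
// → 'hi 🎉…'

// A raw code-unit slice at the same position cuts the first emoji in half.
const rawCut = sample.slice(0, 4);
const endsWithLoneSurrogate = rawCut.charCodeAt(3) === 0xD83C;
// → true
```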
Slugify a string #
function slugify(s) {
  return s
    .normalize('NFKD')
    .replace(/\p{Diacritic}/gu, '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}
slugify('Café résumé!'); // 'cafe-resume'
normalize('NFKD') decomposes accented characters; then a regex strips the diacritics. Production slugifiers (the slugify npm package) do more, but the core is two lines.
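What the decomposition step actually does is easier to see in isolation. A brief sketch using escape sequences so the code points are unambiguous:

```javascript
const composed = '\u00E9';                     // 'é' as a single code point
const decomposed = composed.normalize('NFD');  // 'e' followed by U+0301 (combining acute)

const lengths = [composed.length, decomposed.length];
// → [1, 2]

// The two forms look identical on screen but are not equal as strings.
const equalRaw = composed === decomposed;                          // false
const equalNormalized = composed === decomposed.normalize('NFC');  // true

// The K (compatibility) forms also unfold ligatures and similar characters.
const ligature = '\uFB01'.normalize('NFKD');   // 'fi' as two plain letters
```

The same mismatch is why user-entered text is often normalized to one form before comparing or storing it.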
Strip HTML #
function stripHtml(s) {
  return s.replace(/<[^>]+>/g, '');
}
For real HTML sanitization, use a library (DOMPurify). This is for known-safe content where you just want plain text.
A summary #
- Strings are immutable primitives. Methods return new strings.
- Template literals with ${} are the modern default. Tagged templates enable DSLs.
- includes, startsWith, endsWith beat indexOf for membership checks.
- replaceAll for string-based all-replaces. Use regex /g only when you need a pattern.
- slice and at for extraction. at(-1) is the modern "last character".
- Unicode is code units, not characters. Spread or Intl.Segmenter for correct character handling.
- localeCompare for any user-facing sort.
- Explicit conversion with String()/Number() instead of implicit coercion.
What's next #
Lesson 8.2 covers numbers, Math, and BigInt — IEEE 754 mechanics, the precision pitfalls, and when to reach for BigInt instead of plain numbers.
Try it yourself #
The UTF-16 length surprise is the easiest one to feel. Predict the output:
const s = 'hi 🎉';
console.log(s.length);
console.log([...s].length);
console.log(s.slice(-1));
console.log([...s].at(-1));
Output:
5 (UTF-16 length: the 🎉 takes two code units)
4 (spread iterates code points: 'h', 'i', ' ', '🎉')
'�' (broken: half a surrogate pair)
'🎉' (correct: the spread array has the full emoji as one element)
This is why "truncate to N characters" with raw .slice(0, n) can break emoji in the middle. Spread (or Intl.Segmenter) before truncating Unicode-heavy text.
String length isn't character count. Once you've internalized that, the rest of Unicode stops surprising you.


