JavaScript Strings Deep Dive: Methods, Templates, Unicode, and the Methods You Actually Need


Strings are the most-used non-trivial data type in JavaScript. Every form field, every log line, every URL, every API response touches strings somewhere. The language ships thirty-plus built-in string methods. Most developers use about ten of them daily and Google the rest. This lesson walks through every method worth knowing, the template-literal features that replace a decade of string concatenation, and the Unicode quirks that bite once a year.

Strings are primitives — and immutable #

From Lesson 1.3: strings are one of the seven primitives. They're immutable — every operation that "changes" a string actually returns a new one.

const s = 'hello';
s.toUpperCase();  // 'HELLO'
console.log(s);   // 'hello' — unchanged

This is the source of a common bug — calling .toUpperCase() and expecting the original string to change. It doesn't. Always assign the result.
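
A minimal sketch of the fix, using hypothetical variable names: either reassign the variable or store the result under a new name.

```javascript
// Strings never change in place, so keep what the method returns.
let title = 'hello';
title.toUpperCase();            // result discarded; title is still 'hello'
const stillLower = title;

title = title.toUpperCase();    // reassign to keep the new string
const shout = `${title}!`;      // or build a new value from it

console.log(stillLower, title, shout); // hello HELLO HELLO!
```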

Creating strings #

Three quote styles, all producing the same kind of string:

const a = 'hello';
const b = "hello";
const c = `hello`;          // backticks — also supports interpolation

Backtick strings (template literals) are the modern default — they support ${expression} interpolation and span multiple lines naturally.

const name = 'Ada';
const greeting = `Hello, ${name}!`;
const multiline = `
  line 1
  line 2
`;

Tagged templates #

Placing a function name directly before a template literal makes it a tagged template: the function receives the static parts and the interpolated values separately.

// escapeHtml is assumed to be defined elsewhere: any function that HTML-escapes a value.
function safe(strings, ...values) {
  return strings.reduce((acc, str, i) =>
    acc + str + (values[i] !== undefined ? escapeHtml(values[i]) : ''),
    '');
}

const userInput = '<script>alert(1)</script>';
const html = safe`<p>Hello, ${userInput}</p>`;
// <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>

This is how libraries like lit-html, styled-components, and graphql-tag build custom DSLs. Niche but powerful.
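
The safe tag above relies on an escapeHtml function that the lesson does not define. Here is a self-contained sketch with a hypothetical, deliberately small escapeHtml. It covers only the five characters that matter in HTML text and attribute values, so treat it as an illustration rather than a vetted sanitizer.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML.
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, ch => entities[ch]);
}

// Tag function: static parts pass through, interpolated values get escaped.
function safe(strings, ...values) {
  return strings.reduce(
    (acc, str, i) => acc + str + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const userInput = '<script>alert(1)</script>';
const html = safe`<p>Hello, ${userInput}</p>`;
console.log(html); // <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Checking i < values.length instead of comparing against undefined means an interpolated undefined is rendered as the text 'undefined' rather than silently dropped. Either behaviour is defensible, so pick one on purpose.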

The methods worth knowing #

Searching #

s.includes(sub);            // boolean — modern, prefer over indexOf
s.startsWith(sub);          // boolean
s.endsWith(sub);            // boolean
s.indexOf(sub);             // index or -1
s.lastIndexOf(sub);         // last occurrence

For case-insensitive search, normalize first:

s.toLowerCase().includes(sub.toLowerCase());
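
A short sketch of these methods on a made-up URL, to show where each one fits:

```javascript
const url = 'https://example.com/Docs/Strings.html';

const isSecure = url.startsWith('https://');            // true
const isHtml = url.endsWith('.html');                   // true
const exactMatch = url.includes('docs');                // false: the search is case-sensitive
const looseMatch = url.toLowerCase().includes('docs');  // true

// indexOf is still the right tool when you need the position itself.
const pathStart = url.indexOf('/', 'https://'.length);  // 19
const path = url.slice(pathStart);                      // '/Docs/Strings.html'

console.log({ isSecure, isHtml, exactMatch, looseMatch, path });
```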

Extracting #

s.slice(start, end);        // extract substring — supports negative indices
s.substring(start, end);    // similar but no negatives, swaps args if out of order
s.charAt(i);                // single character at index
s[i];                       // same for valid indices; out of range gives undefined, charAt gives ''
s.at(-1);                   // last character — supports negative indices

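Use slice and at. substring still works, but its argument swapping and its treatment of negatives as 0 tend to hide bugs. The method that is actually deprecated is substr.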
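
The differences are easiest to see side by side. A small sketch on an arbitrary sample word:

```javascript
const word = 'JavaScript';

const head = word.slice(0, 4);          // 'Java'
const tail = word.slice(-6);            // 'Script': negative counts from the end
const swapped = word.substring(4, 0);   // 'Java': arguments silently swapped
const empty = word.slice(4, 0);         // '': slice takes the arguments literally
const clamped = word.substring(-6);     // 'JavaScript': negative treated as 0

const last = word.at(-1);               // 't'
const missingChar = word.charAt(99);    // '' for an out-of-range index
const missingIndex = word[99];          // undefined for the same index

console.log({ head, tail, swapped, empty, clamped, last, missingChar, missingIndex });
```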

Replacing #

s.replace('hello', 'hi');          // replaces FIRST match (with a string)
s.replaceAll('hello', 'hi');       // replaces ALL matches
s.replace(/hello/g, 'hi');         // also all matches (regex with /g flag)

The replaceAll method (ES2021) makes string-based all-replace trivial. Before, you had to use a regex with /g.

Both replace and replaceAll accept a function as the replacement — useful when each match needs different output:

'2 + 3 = 5'.replaceAll(/\d+/g, n => Number(n) * 10);
// '20 + 30 = 50'
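
The replacement function also receives capture groups, which is enough for a tiny placeholder filler. The sketch below uses a made-up template and data object, and it also shows one trap: replaceAll throws a TypeError when given a regex without the g flag.

```javascript
const template = 'Hi {name}, order {id} has shipped to {city}.';
const data = { name: 'Ada', id: 'A-42' };

// The callback receives the full match first, then each capture group.
const filled = template.replace(/\{(\w+)\}/g, (match, key) =>
  Object.prototype.hasOwnProperty.call(data, key) ? data[key] : match
);
console.log(filled); // Hi Ada, order A-42 has shipped to {city}.

// replaceAll insists on the g flag when the pattern is a regex.
let rejected = false;
try {
  'a-b-c'.replaceAll(/-/, '+');
} catch (err) {
  rejected = err instanceof TypeError;
}
console.log(rejected); // true
```

Unknown placeholders such as {city} are left untouched here. Throwing or substituting an empty string would be equally reasonable choices.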

Splitting and joining #

'a,b,c'.split(',');             // ['a', 'b', 'c']
'a,b,c'.split(',', 2);          // ['a', 'b'] — limit
['a', 'b', 'c'].join('-');      // 'a-b-c'

Split/join are the pair you reach for to parse and rebuild simple delimited data.
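
As a sketch of that parse-and-rebuild round trip, here is a made-up settings string turned into an object and back. It assumes values never contain ';' or '=', which real formats rarely guarantee.

```javascript
const header = 'theme=dark; lang=en; compact=true';

// Parse: split into pairs, trim stray spaces, split each pair into [key, value].
const settings = Object.fromEntries(
  header.split(';').map(pair => pair.trim().split('='))
);
console.log(settings); // { theme: 'dark', lang: 'en', compact: 'true' }

// Rebuild: the reverse trip with map + join.
const rebuilt = Object.entries(settings)
  .map(([key, value]) => `${key}=${value}`)
  .join('; ');
console.log(rebuilt === header); // true
```

Note that every value comes back as a string ('true', not true). Converting types is a separate, explicit step.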

Trimming and padding #

'  hello  '.trim();             // 'hello'
'  hello  '.trimStart();        // 'hello  '
'  hello  '.trimEnd();          // '  hello'
'5'.padStart(3, '0');           // '005' — leading zeros
'12'.padEnd(5, '.');            // '12...'

Padding is the modern way to align numbers, format IDs, etc.
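
Two small sketches of that, with hypothetical helper and variable names: a mm:ss formatter and a fixed-width text table.

```javascript
// Zero-pad minutes and seconds to two digits each.
function formatDuration(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}
console.log(formatDuration(75)); // 01:15

// padEnd for the label column, padStart to right-align the numbers.
const rows = [['apples', 3], ['kiwis', 12]];
const table = rows
  .map(([label, count]) => label.padEnd(8, '.') + String(count).padStart(3))
  .join('\n');
console.log(table);
// apples..  3
// kiwis... 12
```

Both methods need a string receiver, hence the String(...) calls around the numbers.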

Case #

'Hello'.toUpperCase();          // 'HELLO'
'Hello'.toLowerCase();          // 'hello'
'istanbul'.toLocaleUpperCase('tr-TR'); // 'İSTANBUL', locale-aware (Turkish 'i' → 'İ')

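For international apps, use the toLocaleXxx variants when you know the user's locale: they apply language-specific rules such as the Turkish dotted and dotless i. Locale-independent mappings (German ß uppercases to 'SS') come out the same from both families.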
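
A quick comparison of the two families. It assumes a runtime with full ICU locale data, as current browsers and official Node builds have. The sample words are arbitrary.

```javascript
const plainUpper = 'istanbul'.toUpperCase();                 // 'ISTANBUL'
const turkishUpper = 'istanbul'.toLocaleUpperCase('tr-TR');  // 'İSTANBUL' (dotted capital İ, U+0130)
const turkishLower = 'ISPARTA'.toLocaleLowerCase('tr-TR');   // 'ısparta' (dotless ı, U+0131)

// Not locale-specific: ß has no single-letter uppercase in the default mapping.
const german = 'straße'.toUpperCase();                       // 'STRASSE'

console.log(plainUpper, turkishUpper, turkishLower, german);
```

Note that the uppercase result can be longer than the input ('straße' has 6 letters, 'STRASSE' has 7), so don't assume case conversion preserves length.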

Repeating #

'-'.repeat(5);    // '-----'

Concatenating #

const x = 'a', y = 'b', z = 'c';
const plus = x + y + z;              // 'abc'
const templated = `${x}${y}${z}`;    // 'abc', often clearer

Skip 'a'.concat('b') (String.prototype.concat): it's vestigial and offers nothing over + or template literals.

Unicode and the surrogate pair gotcha #

JavaScript strings are sequences of UTF-16 code units, not characters. For everything in the Basic Multilingual Plane (ASCII, accented Latin, Cyrillic, most everyday CJK), one code point is one code unit. Code points above U+FFFF (emoji, rarer CJK ideographs, historic scripts) occupy two code units (a "surrogate pair").

const emoji = '🎉';
console.log(emoji.length);     // 2 — two code units, not one character
console.log(emoji[0]);          // mangled — half a surrogate pair
console.log([...emoji]);        // ['🎉'] — iteration is code-point-aware

For character-accurate work, spread or use the iterator:

[...emoji].length;                // 1 — correct count
Array.from(emoji).length;          // 1

for (const ch of '🎉👋🌍') {
  console.log(ch);                 // each emoji once
}

For truly correct grapheme handling ("family emoji", combining marks, flag sequences), use Intl.Segmenter:

const seg = new Intl.Segmenter('en', { granularity: 'grapheme' });
const chars = [...seg.segment('👨‍👩‍👧‍👦')];
chars.length;  // 1 — the family emoji is one grapheme cluster

This matters anywhere you need to truncate text, count characters for a UI limit, or render strings with cursors.
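
To make the three levels concrete, here is a sketch of a hypothetical measure helper that reports code units, code points, and grapheme clusters for the same text. It assumes Intl.Segmenter is available (current browsers, Node 16+).

```javascript
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

function measure(text) {
  return {
    codeUnits: text.length,                                  // UTF-16 code units
    codePoints: [...text].length,                            // Unicode code points
    graphemes: [...graphemeSegmenter.segment(text)].length,  // user-perceived characters
  };
}

const plain = measure('hello');
// { codeUnits: 5, codePoints: 5, graphemes: 5 }

const accented = measure('cafe\u0301');            // 'e' followed by a combining acute accent
// { codeUnits: 5, codePoints: 5, graphemes: 4 }

const flag = measure('\u{1F1EF}\u{1F1F5}');        // 🇯🇵 is two regional-indicator code points
// { codeUnits: 4, codePoints: 2, graphemes: 1 }

console.log(plain, accented, flag);
```

Spreading fixes surrogate pairs but not combining marks or multi-code-point emoji. Only the segmenter matches what a reader would count.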

Codepoint methods #

'A'.codePointAt(0);          // 65
'🎉'.codePointAt(0);          // 127881 — the full emoji code point
String.fromCodePoint(127881); // '🎉'

Use codePointAt/fromCodePoint over the older charCodeAt/fromCharCode — the latter only handle 16-bit values and break on emoji.
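
A sketch of the difference, plus a hypothetical toCodePoints helper that prints a string in the usual U+XXXX notation:

```javascript
const party = '\u{1F389}';                   // 🎉

const highSurrogate = party.charCodeAt(0);   // 55356 (0xD83C): only half the pair
const lowSurrogate = party.charCodeAt(1);    // 57225 (0xDF89): the other half
const fullPoint = party.codePointAt(0);      // 127881 (0x1F389): the whole code point

// Iterate by code point, then format each as U+XXXX.
function toCodePoints(text) {
  return [...text].map(
    ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(toCodePoints('A' + party)); // [ 'U+0041', 'U+1F389' ]
```

Spreading first matters: calling codePointAt at every index of the raw string would also visit the second half of each surrogate pair.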

Template literal patterns #

Multi-line with consistent indentation #

function dedent(strs, ...vals) {
  const raw = strs.reduce((a, s, i) => a + s + (vals[i] ?? ''), '');
  const min = Math.min(...raw.match(/^[ \t]*(?=\S)/gm).map(s => s.length));
  return raw.replace(new RegExp(`^[ \\t]{${min}}`, 'gm'), '');
}

const sql = dedent`
  SELECT *
  FROM users
  WHERE id = ${userId}
`;

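Most teams use a library (dedent, strip-indent) or skip the indentation. But the pattern is good to know. One caveat for the example: in real code, pass values like userId as query parameters instead of interpolating them into SQL text.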

Conditional content #

const msg = `Hello${name ? `, ${name}` : ''}!`;

Readable for one or two conditions; build the pieces separately once it gets more complex.
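
One way to keep several optional pieces readable is to collect them in an array and drop the empty ones. This is a sketch with a hypothetical describeUser function, not the only reasonable shape.

```javascript
function describeUser({ name, role, city }) {
  const parts = [
    name,
    role && `(${role})`,      // undefined when role is missing
    city && `from ${city}`,   // undefined when city is missing
  ];
  // filter(Boolean) removes undefined and empty strings before joining.
  return parts.filter(Boolean).join(' ');
}

console.log(describeUser({ name: 'Ada', role: 'admin', city: 'London' })); // Ada (admin) from London
console.log(describeUser({ name: 'Ada', city: 'London' }));                // Ada from London
console.log(describeUser({ name: 'Ada' }));                                // Ada
```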

Joining arrays #

const items = ['a', 'b', 'c'];
const list = `Items: ${items.join(', ')}`;

Array.prototype.join() inside template literals is the idiom for inline lists.

Comparing strings #

'a' === 'a';                  // true — works
'a' < 'b';                     // true — lexicographic
'10' < '9';                    // true — STRING comparison, '1' < '9'

For user-facing sorts, use localeCompare:

['banana', 'Apple', 'cherry'].sort((a, b) => a.localeCompare(b));
// ['Apple', 'banana', 'cherry']: base letters decide first; case only breaks ties

['á', 'b', 'a'].sort((a, b) => a.localeCompare(b, 'en'));
// ['a', 'á', 'b'] — accent-aware

localeCompare(other, locale, options) is the function for any sort involving real names or text.
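
The options argument is where most of the practical value sits. A sketch with made-up file names, using Intl.Collator, which takes the same locale and options as localeCompare and lets you reuse one comparator across a whole sort:

```javascript
const files = ['file10.txt', 'file2.txt', 'File1.txt'];

// numeric: compare digit runs as numbers; sensitivity 'base': ignore case and accents.
const byName = new Intl.Collator('en', { numeric: true, sensitivity: 'base' });
const sorted = [...files].sort(byName.compare);
console.log(sorted); // [ 'File1.txt', 'file2.txt', 'file10.txt' ]

// The same options work directly on localeCompare for one-off comparisons.
const sameWord = 'résumé'.localeCompare('RESUME', 'en', { sensitivity: 'base' });
console.log(sameWord); // 0
```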

String to/from other types #

String(42);                  // '42'
String(null);                // 'null'
String(undefined);           // 'undefined'
(42).toString();             // '42'
(42).toString(2);            // '101010' — binary
(255).toString(16);          // 'ff' — hex

Number('42');                // 42
parseInt('42px', 10);        // 42 — stops at 'p'
parseFloat('3.14');          // 3.14

We covered coercion in Lesson 6.3. The rule of thumb: convert explicitly. Number(), String(), Boolean() make intent clear.
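
The explicit converters still disagree with each other on messy input, so it helps to know which one you want. Here is a sketch of the main differences, followed by a hypothetical strict toInteger helper for form-style input:

```javascript
const strict = Number('42px');        // NaN: the whole string must be numeric
const lenient = parseInt('42px', 10); // 42: parses the leading digits, ignores the rest
const emptyNumber = Number('');       // 0: empty string converts to zero
const emptyParsed = parseInt('', 10); // NaN: nothing to parse
const hex = parseInt('ff', 16);       // 255

// Hypothetical helper: accept only optional sign plus digits, otherwise null.
function toInteger(input) {
  const text = String(input).trim();
  return /^[+-]?\d+$/.test(text) ? Number(text) : null;
}

console.log(toInteger(' 42 '), toInteger('42px'), toInteger('')); // 42 null null
```

Returning null instead of NaN is a design choice for the sketch. It makes "not a valid integer" easy to distinguish with a plain equality check.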

Common patterns #

Truncate with ellipsis #

function truncate(s, n) {
  return s.length > n ? s.slice(0, n - 1) + '…' : s;
}

For Unicode-correct truncation, use Intl.Segmenter (as above).
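
A minimal sketch of that grapheme-aware version. The function name and the default locale are assumptions, and it expects max to be at least 1.

```javascript
function truncateGraphemes(text, max, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // Each segment object carries the grapheme cluster in its .segment property.
  const graphemes = Array.from(segmenter.segment(text), part => part.segment);
  if (graphemes.length <= max) return text;
  return graphemes.slice(0, max - 1).join('') + '…';
}

const sample = 'hi \u{1F389}\u{1F389}\u{1F389}';   // 'hi 🎉🎉🎉'

const clean = truncateGraphemes(sample, 5);        // 'hi 🎉…'
const broken = sample.slice(0, 4);                 // 'hi ' plus half a surrogate pair

console.log(clean, broken.charCodeAt(3).toString(16)); // hi 🎉… d83c
```

Segmenting the whole string is wasteful for very long inputs. Iterating the segments and stopping after max of them is the obvious refinement.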

Slugify a string #

function slugify(s) {
  return s
    .normalize('NFKD')
    .replace(/\p{Diacritic}/gu, '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}
slugify('Café résumé!'); // 'cafe-resume'

normalize('NFKD') decomposes accented characters into a base letter plus combining marks; then a regex strips the marks. Production slugifiers (the slugify npm package) do more, but the core is this short chain.

Strip HTML #

function stripHtml(s) {
  return s.replace(/<[^>]+>/g, '');
}

For real HTML sanitization, use a library (DOMPurify). This is for known-safe content where you just want plain text.

A summary #

  • Strings are immutable primitives. Methods return new strings.
  • Template literals with ${} are the modern default. Tagged templates enable DSLs.
  • includes, startsWith, endsWith beat indexOf for membership checks.
  • replaceAll for string-based all-replaces. Use regex /g only when you need a pattern.
  • slice and at for extraction. at(-1) is the modern "last character".
  • Strings are UTF-16 code units, not characters. Spread or Intl.Segmenter for correct character handling.
  • localeCompare for any user-facing sort.
  • Explicit conversion with String() / Number() instead of implicit coercion.

What's next #

Lesson 8.2 covers numbers, Math, and BigInt — IEEE 754 mechanics, the precision pitfalls, and when to reach for BigInt instead of plain numbers.

Try it yourself #

The UTF-16 length surprise is the easiest one to feel. Predict the output:

const s = 'hi 🎉';
console.log(s.length);
console.log([...s].length);
console.log(s.slice(-1));
console.log([...s].at(-1));

Output:

5 (UTF-16 length — the 🎉 takes two code units)
4 (spread iterates code points: 'h', 'i', ' ', '🎉')
'�' (broken — half a surrogate pair)
'🎉' (correct — the spread array has the full emoji as one element)

This is why “truncate to N characters” with raw .slice(0, n) can break emoji in the middle. Spread (or Intl.Segmenter) before truncating Unicode-heavy text.

String length isn't character count — once you've internalized that, the rest of Unicode stops surprising you.
