
Internationalization Patterns

Internationalization (i18n) is the process of designing software so it can be adapted to various languages and regions without engineering changes. Done well, i18n is invisible to the developer — done poorly, it results in garbled text, broken layouts, and confused users worldwide.


Unicode and Character Encoding

The Problem

Before Unicode, dozens of incompatible encoding systems existed. A file encoded in Windows-1252 (Western European) would display garbled text on a system using Shift_JIS (Japanese). Unicode solved this by assigning a unique number (code point) to every character in every writing system.

Unicode Basics

Unicode code points:
U+0041 → A (Latin capital letter A)
U+00E9 → é (Latin small letter e with acute)
U+4E16 → 世 (CJK character: world)
U+1F600 → 😀 (Grinning face emoji)
U+0627 → ا (Arabic letter alef)
Unicode planes:
BMP (Basic Multilingual Plane): U+0000 to U+FFFF
Most common characters, including Latin, CJK, Arabic, etc.
Supplementary planes: U+10000 to U+10FFFF
Emoji, historic scripts, musical symbols, etc.
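In JavaScript, `codePointAt` and `String.fromCodePoint` convert between characters and the code points listed above. A minimal sketch; the helper names `toCodePointLabel` and `planeOf` are hypothetical:

```javascript
// Format a character as its Unicode code point label, e.g. "U+00E9".
function toCodePointLabel(char) {
  const cp = char.codePointAt(0); // full code point, even outside the BMP
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// BMP code points fit in 16 bits; anything larger is in a supplementary plane.
function planeOf(char) {
  return char.codePointAt(0) <= 0xffff ? "BMP" : "supplementary";
}

// Build a character from a numeric code point.
const grinning = String.fromCodePoint(0x1f600);

console.log(toCodePointLabel("A"));      // "U+0041"
console.log(toCodePointLabel("世"));     // "U+4E16"
console.log(toCodePointLabel(grinning)); // "U+1F600"
console.log(planeOf("ا"));               // "BMP"
console.log(planeOf(grinning));          // "supplementary"
```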

UTF-8 Encoding

UTF-8 is the dominant encoding on the web (used by over 98 percent of websites). It is a variable-width encoding that uses 1 to 4 bytes per code point:

| Code Point Range | Bytes | Binary Format | Example |
|---|---|---|---|
| U+0000 to U+007F | 1 | 0xxxxxxx | A → 41 |
| U+0080 to U+07FF | 2 | 110xxxxx 10xxxxxx | é → C3 A9 |
| U+0800 to U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx | 世 → E4 B8 96 |
| U+10000 to U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | 😀 → F0 9F 98 80 |
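The bit patterns in the table translate directly into code. The sketch below encodes one code point by hand to show the mechanics; the name `encodeUtf8` is illustrative, and real code should use the built-in `TextEncoder`, which the sketch is cross-checked against:

```javascript
// Encode a single Unicode code point to UTF-8 bytes, following the table above.
function encodeUtf8(cp) {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("not a Unicode scalar value"); // surrogates are not encodable
  }
  if (cp <= 0x7f) return [cp];                          // 0xxxxxxx
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6),                           // 110xxxxx
            0x80 | (cp & 0x3f)];                        // 10xxxxxx
  }
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12),                          // 1110xxxx
            0x80 | ((cp >> 6) & 0x3f),
            0x80 | (cp & 0x3f)];
  }
  return [0xf0 | (cp >> 18),                            // 11110xxx
          0x80 | ((cp >> 12) & 0x3f),
          0x80 | ((cp >> 6) & 0x3f),
          0x80 | (cp & 0x3f)];
}

// Render bytes as space-separated uppercase hex, e.g. "C3 A9".
const hex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(encodeUtf8(0x00e9)));  // "C3 A9"
console.log(hex(encodeUtf8(0x4e16)));  // "E4 B8 96"
console.log(hex(encodeUtf8(0x1f600))); // "F0 9F 98 80"
// Cross-check against the built-in encoder
console.log(hex([...new TextEncoder().encode("世")])); // "E4 B8 96"
```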

Why UTF-8 wins:

  • ASCII-compatible (all ASCII bytes are valid UTF-8)
  • No byte-order issues (unlike UTF-16)
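  • Self-synchronizing (from any byte offset, you can find the start of the nearest character by skipping continuation bytes)
  • Space-efficient for ASCII-heavy text (1 byte per character), though most CJK characters take 3 bytes versus 2 in UTF-16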
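Self-synchronization follows from the bit patterns: continuation bytes always match `10xxxxxx`, so from an arbitrary offset in valid UTF-8 you step back over at most three of them to reach a character boundary. A sketch, including a byte-limited truncation that never splits a character (`codePointStart` and `truncateUtf8` are illustrative names):

```javascript
// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
const isContinuation = (byte) => (byte & 0xc0) === 0x80;

// Return the offset of the first byte of the code point containing `offset`.
function codePointStart(bytes, offset) {
  let i = offset;
  while (i > 0 && isContinuation(bytes[i])) i--;
  return i;
}

// Cut to at most maxBytes without leaving a partial character at the end.
function truncateUtf8(bytes, maxBytes) {
  if (bytes.length <= maxBytes) return bytes;
  return bytes.subarray(0, codePointStart(bytes, maxBytes));
}

const bytes = new TextEncoder().encode("a世b"); // 61 E4 B8 96 62
console.log(codePointStart(bytes, 3));                          // 1 (start of 世)
console.log(new TextDecoder().decode(truncateUtf8(bytes, 3)));  // "a"
console.log(new TextDecoder().decode(truncateUtf8(bytes, 4)));  // "a世"
```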

String Length Gotchas

// JavaScript strings are UTF-16 internally
const emoji = "😀";
console.log(emoji.length); // 2 (two UTF-16 code units!)
console.log([...emoji].length); // 1 (spread iterates code points)
// Use Array.from or spread to count code points (not user-perceived characters)
const text = "café";
console.log(text.length); // 4 (assuming precomposed é, U+00E9; decomposed e + U+0301 gives 5)
const mixed = "Hello 世界 😀";
console.log(mixed.length); // 11 (wrong: the emoji counts as 2 units)
console.log([...mixed].length); // 10 (correct code point count)
console.log([...new Intl.Segmenter().segment(mixed)].length); // 10 (grapheme count)
// Grapheme clusters (user-perceived characters)
const flag = "🇺🇸";
console.log(flag.length); // 4 (two surrogate pairs)
console.log([...flag].length); // 2 (two code points)
// Actual visual characters: 1 (one flag emoji)
// Use Intl.Segmenter for accurate grapheme counting
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const segments = [...segmenter.segment(flag)];
console.log(segments.length); // 1 (correct!)
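A practical consequence: truncating with `slice` counts UTF-16 code units, so it can cut an emoji or flag in half. A minimal sketch of grapheme-aware truncation with `Intl.Segmenter` (the function name `truncateGraphemes` is hypothetical):

```javascript
// Keep at most `max` user-perceived characters (grapheme clusters).
function truncateGraphemes(text, max, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let result = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count === max) break;
    result += segment;
    count++;
  }
  return result;
}

const flags = "🇺🇸🇫🇷";
console.log(flags.slice(0, 1));           // half of a surrogate pair: renders as garbage
console.log(truncateGraphemes(flags, 1)); // "🇺🇸"
```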

Locale Handling

Detecting User Locale

// Browser locale detection
const userLocale = navigator.language; // "en-US"
const allLocales = navigator.languages; // ["en-US", "en", "fr"]
// Accept-Language header (server-side)
// Accept-Language: en-US,en;q=0.9,fr;q=0.8
// URL-based locale
// example.com/en/products
// example.com/fr/products
// Cookie-based locale
document.cookie = "locale=fr-FR; path=/; max-age=31536000";
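On the server, the `Accept-Language` header has to be parsed before it can drive locale selection. A simplified sketch: it orders language ranges by their `q` weight, drops the wildcard `*` and ranges with `q=0`, and keeps header order for ties. The function name `parseAcceptLanguage` is hypothetical; production code may prefer an established content-negotiation library.

```javascript
// Parse "en-US,en;q=0.9,fr;q=0.8" into language tags ordered by preference.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? Number(qParam.trim().slice(2)) : 1; // default weight is 1
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter(({ tag, q }) => tag !== "" && tag !== "*" && q > 0)
    // Higher q first; equal weights keep their header order.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map(({ tag }) => tag);
}

console.log(parseAcceptLanguage("en-US,en;q=0.9,fr;q=0.8")); // ["en-US", "en", "fr"]
```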

Locale Fallback Chain

User requests: fr-CA (French Canadian)
Lookup order:
1. fr-CA → found? Use it
2. fr → found? Use it (French generic)
3. en → found? Use it (default fallback)
4. keys → show translation keys (last resort)
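The lookup order above can be sketched as a small resolver. The translation table shape and the names `localeChain` and `translate` are assumptions for illustration, not any particular library's API; it also trims subtags naively, whereas CLDR defines special parent locales (such as `es-419`) that a full implementation would honor.

```javascript
// Build the lookup chain: "fr-CA" -> ["fr-CA", "fr", "en"].
function localeChain(requested, defaultLocale = "en") {
  const parts = requested.split("-");
  const chain = [];
  // Drop subtags from the right: fr-CA, then fr
  for (let i = parts.length; i > 0; i--) chain.push(parts.slice(0, i).join("-"));
  if (!chain.includes(defaultLocale)) chain.push(defaultLocale);
  return chain;
}

// Return the first translation found along the chain, else the key itself.
function translate(translations, requested, key) {
  for (const locale of localeChain(requested)) {
    const value = translations[locale]?.[key];
    if (value !== undefined) return value;
  }
  return key; // last resort: show the translation key
}

const translations = {
  fr: { save: "Enregistrer" },
  en: { save: "Save", cancel: "Cancel" },
};
console.log(translate(translations, "fr-CA", "save"));   // "Enregistrer" (from fr)
console.log(translate(translations, "fr-CA", "cancel")); // "Cancel" (from en)
console.log(translate(translations, "fr-CA", "delete")); // "delete" (key shown)
```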

Date, Time, and Number Formatting

Never format dates, times, or numbers manually for i18n — always use locale-aware formatting APIs.

The JavaScript Intl API

// Number formatting
const num = 1234567.89;
new Intl.NumberFormat('en-US').format(num); // "1,234,567.89"
new Intl.NumberFormat('de-DE').format(num); // "1.234.567,89"
new Intl.NumberFormat('ja-JP').format(num); // "1,234,567.89"
new Intl.NumberFormat('ar-SA').format(num); // "١٬٢٣٤٬٥٦٧٫٨٩"
new Intl.NumberFormat('hi-IN').format(num); // "12,34,567.89"
// (Indian grouping!)
// Currency formatting
new Intl.NumberFormat('en-US', {
  style: 'currency', currency: 'USD'
}).format(42.5); // "$42.50"
new Intl.NumberFormat('ja-JP', {
  style: 'currency', currency: 'JPY'
}).format(4250); // "￥4,250" (the ja locale uses the fullwidth yen sign)
new Intl.NumberFormat('de-DE', {
  style: 'currency', currency: 'EUR'
}).format(42.5); // "42,50 €"
// Percentage
new Intl.NumberFormat('en-US', {
  style: 'percent', minimumFractionDigits: 1
}).format(0.856); // "85.6%"
// Compact notation
new Intl.NumberFormat('en-US', {
  notation: 'compact'
}).format(1500000); // "1.5M"
// Units
new Intl.NumberFormat('en-US', {
  style: 'unit', unit: 'kilometer-per-hour'
}).format(120); // "120 km/h"
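When a UI needs to style part of a formatted number, such as the currency symbol, `formatToParts` returns typed pieces instead of one string, so there is no need to parse localized output by hand. A short example; the `symbol` CSS class is a hypothetical name:

```javascript
const price = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });

// Each part is { type, value }; types include "currency", "integer",
// "group", "decimal", and "fraction".
const parts = price.formatToParts(1234.5);
console.log(parts.map((p) => `${p.type}:${p.value}`).join(" | "));
// "currency:$ | integer:1 | group:, | integer:234 | decimal:. | fraction:50"

// Wrap only the currency symbol; the locale still decides where it goes.
const html = parts
  .map((p) => (p.type === "currency" ? `<span class="symbol">${p.value}</span>` : p.value))
  .join("");
console.log(html); // '<span class="symbol">$</span>1,234.50'
```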

Date and Time Formatting

const date = new Date('2025-06-15T14:30:00Z');
// Formatters use the runtime's local time zone unless timeZone is set.
// Outputs below assume UTC; in zones such as Sydney (UTC+10) the date is already June 16.
// Short date
new Intl.DateTimeFormat('en-US').format(date); // "6/15/2025"
new Intl.DateTimeFormat('en-GB').format(date); // "15/06/2025"
new Intl.DateTimeFormat('ja-JP').format(date); // "2025/6/15"
new Intl.DateTimeFormat('de-DE').format(date); // "15.6.2025"
// Long date
new Intl.DateTimeFormat('en-US', {
  dateStyle: 'long'
}).format(date); // "June 15, 2025"
new Intl.DateTimeFormat('fr-FR', {
  dateStyle: 'long'
}).format(date); // "15 juin 2025"
// Custom formatting (timeZone pins the output to UTC regardless of the runtime)
new Intl.DateTimeFormat('en-US', {
  weekday: 'long',
  year: 'numeric',
  month: 'long',
  day: 'numeric',
  hour: 'numeric',
  minute: '2-digit',
  timeZone: 'UTC',
  timeZoneName: 'short',
}).format(date);
// "Sunday, June 15, 2025 at 2:30 PM UTC"
// Relative time
const rtf = new Intl.RelativeTimeFormat('en', { numeric: 'auto' });
rtf.format(-1, 'day'); // "yesterday"
rtf.format(3, 'hour'); // "in 3 hours"
rtf.format(-2, 'week'); // "2 weeks ago"
const rtfFR = new Intl.RelativeTimeFormat('fr', { numeric: 'auto' });
rtfFR.format(-1, 'day'); // "hier"
rtfFR.format(3, 'hour'); // "dans 3 heures"
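`Intl.RelativeTimeFormat` formats a value and unit you supply; picking the unit is up to you. A sketch that chooses the largest unit the difference reaches. The 30-day month and 365-day year are simplifying assumptions, and `formatRelative` is a hypothetical helper:

```javascript
// Largest unit first; month and year lengths are approximations.
const UNITS = [
  ["year", 365 * 24 * 60 * 60],
  ["month", 30 * 24 * 60 * 60],
  ["week", 7 * 24 * 60 * 60],
  ["day", 24 * 60 * 60],
  ["hour", 60 * 60],
  ["minute", 60],
  ["second", 1],
];

function formatRelative(target, now = new Date(), locale = "en") {
  const rtf = new Intl.RelativeTimeFormat(locale, { numeric: "auto" });
  const diffSeconds = (target.getTime() - now.getTime()) / 1000;
  for (const [unit, secondsPerUnit] of UNITS) {
    // Use the first unit the difference reaches, falling back to seconds.
    if (Math.abs(diffSeconds) >= secondsPerUnit || unit === "second") {
      return rtf.format(Math.round(diffSeconds / secondsPerUnit), unit);
    }
  }
}

const now = new Date("2025-06-15T14:30:00Z");
console.log(formatRelative(new Date("2025-06-14T14:30:00Z"), now));       // "yesterday"
console.log(formatRelative(new Date("2025-06-15T17:30:00Z"), now));       // "in 3 hours"
console.log(formatRelative(new Date("2025-06-14T14:30:00Z"), now, "fr")); // "hier"
```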

Comparison and Sorting

// Locale-aware string sorting
const names = ['Ångström', 'Zebra', 'apple', 'Über'];
// Wrong: the default sort compares UTF-16 code units
names.sort();
// ['Zebra', 'apple', 'Ångström', 'Über'] (uppercase ASCII before lowercase, accented letters last)
// Correct: locale-aware sorting
names.sort(new Intl.Collator('en').compare);
// ['Ångström', 'apple', 'Über', 'Zebra']
names.sort(new Intl.Collator('sv').compare); // Swedish
// ['apple', 'Über', 'Zebra', 'Ångström'] (Å sorts last in Swedish!)
// Case- and accent-insensitive comparison
const collator = new Intl.Collator('en', { sensitivity: 'base' });
collator.compare('a', 'A'); // 0 (equal)
collator.compare('a', 'á'); // 0 (equal with sensitivity: 'base')
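Two further collator habits are common: the `numeric: true` option compares digit runs by value, so "file10" sorts after "file2", and one `Intl.Collator` instance reused across a sort avoids re-creating locale data on every comparison. A short example; the file names are made up:

```javascript
// One collator, reused for every comparison in the sort.
const fileCollator = new Intl.Collator("en", { numeric: true, sensitivity: "base" });

const files = ["file10.txt", "File2.txt", "file1.txt"];
console.log([...files].sort());                     // ["File2.txt", "file1.txt", "file10.txt"]
console.log([...files].sort(fileCollator.compare)); // ["file1.txt", "File2.txt", "file10.txt"]
```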

Pluralization Rules

English has two plural forms (singular and plural), but many languages have more complex rules:

English: 1 item, 2 items (2 forms)
French: 0 article, 1 article, 2 articles (2 forms; 0 is singular)
Russian: 1 товар, 2 товара, 5 товаров (3 forms)
Arabic: 0 عناصر, 1 عنصر, 2 عنصران, 3 عناصر (6 forms: zero, one, two, few, many, other)
Polish: 1 plik, 2 pliki, 5 plików (3 forms)
Japanese: 1つのアイテム (1 form — no plurals)
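You can check a locale's categories at runtime through `Intl.PluralRules` and `resolvedOptions().pluralCategories`. Note that CLDR also counts the category used for fractional numbers, so Russian and Polish report a fourth one (`other`) beyond the three integer forms listed above. A short sketch (`pluralCategories` is a hypothetical helper; the list is sorted because engines may return it in different orders):

```javascript
function pluralCategories(locale) {
  return [...new Intl.PluralRules(locale).resolvedOptions().pluralCategories].sort();
}

console.log(pluralCategories("en")); // ["one", "other"]
console.log(pluralCategories("ja")); // ["other"]
console.log(pluralCategories("ru")); // ["few", "many", "one", "other"]
console.log(pluralCategories("ar")); // ["few", "many", "one", "other", "two", "zero"]
```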

ICU MessageFormat

ICU MessageFormat is the de facto standard syntax for pluralization and gender selection in translated strings:

{count, plural,
=0 {No items in your cart}
one {1 item in your cart}
other {{count} items in your cart}
}
{gender, select,
female {She liked your post}
male {He liked your post}
other {They liked your post}
}
{count, plural,
=0 {No messages}
one {You have 1 new message}
other {You have {count} new messages}
}
// Intl.PluralRules determines the plural category
const pr = new Intl.PluralRules('en-US');
pr.select(0); // "other"
pr.select(1); // "one"
pr.select(2); // "other"
const prRU = new Intl.PluralRules('ru');
prRU.select(1); // "one"
prRU.select(2); // "few"
prRU.select(5); // "many"
prRU.select(21); // "one" (Russian: 21 is singular!)
prRU.select(22); // "few"
// Using with a translation map
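function pluralize(locale, count, messages) {
  const rule = new Intl.PluralRules(locale).select(count);
  // Fall back to "other" if the map has no entry for this category
  const template = messages[rule] ?? messages.other;
  return template.replaceAll('{count}', new Intl.NumberFormat(locale).format(count));
}
const messages = {
  one: '{count} file deleted',
  other: '{count} files deleted'
};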
pluralize('en', 1, messages); // "1 file deleted"
pluralize('en', 5, messages); // "5 files deleted"
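In the ICU examples above, an exact-value branch such as `=0` takes precedence over the plural category; that is how 0 gets its own wording even though English classifies it as `other`. A minimal sketch of that selection order, assuming a plain object of branches keyed like ICU selectors. `selectPluralBranch` and `cartMessage` are hypothetical helpers, not an ICU API; full MessageFormat parsing is best left to a library.

```javascript
// Select a branch the way ICU plural arguments do:
// exact match ("=0") first, then the CLDR category, then "other".
function selectPluralBranch(locale, count, branches) {
  const exact = branches[`=${count}`];
  if (exact !== undefined) return exact;
  const category = new Intl.PluralRules(locale).select(count);
  return branches[category] ?? branches.other;
}

const cart = {
  "=0": "No items in your cart",
  one: "1 item in your cart",
  other: "{count} items in your cart",
};

function cartMessage(count) {
  return selectPluralBranch("en", count, cart).replace("{count}", String(count));
}

console.log(cartMessage(0)); // "No items in your cart"
console.log(cartMessage(1)); // "1 item in your cart"
console.log(cartMessage(5)); // "5 items in your cart"
```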

RTL (Right-to-Left) Support

Languages like Arabic, Hebrew, Persian, and Urdu are written right-to-left. Supporting RTL requires changes at multiple levels.

HTML Direction

<!-- Set document direction -->
<html lang="ar" dir="rtl">
<!-- Override direction for specific elements -->
<p dir="ltr">This paragraph is left-to-right</p>
<!-- Auto-detect direction based on content -->
<p dir="auto">مرحبا</p> <!-- Browser detects Arabic → RTL -->
<!-- Bidirectional text isolation -->
<p>The title is <bdi>مرحبا بالعالم</bdi> in Arabic.</p>
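When the locale is chosen at runtime, the `dir` attribute has to follow it. A sketch using `Intl.Locale` to extract the language subtag; the RTL set below covers only the four languages named above and is an assumption to extend for your supported locales. `directionFor` and `applyLocale` are hypothetical helpers.

```javascript
// Languages written right-to-left (illustrative, not exhaustive).
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

function directionFor(locale) {
  const { language } = new Intl.Locale(locale); // "ar-SA" -> "ar"
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

// Set lang and dir together on a root element (e.g. document.documentElement).
function applyLocale(root, locale) {
  root.lang = locale;
  root.dir = directionFor(locale);
}

console.log(directionFor("ar-SA")); // "rtl"
console.log(directionFor("en-US")); // "ltr"
```

In a browser, `applyLocale(document.documentElement, "he")` produces the equivalent of `<html lang="he" dir="rtl">`.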

CSS for RTL

/* Use logical properties instead of physical */
.card {
  /* Instead of margin-left, use margin-inline-start */
  margin-inline-start: 16px;
  margin-inline-end: 8px;
  /* Instead of padding-left/right */
  padding-inline: 16px;
  /* Instead of text-align: left */
  text-align: start;
  /* Instead of border-left */
  border-inline-start: 3px solid blue;
  /* Instead of float: left */
  float: inline-start;
}
/* Flip icons and images that have directional meaning */
[dir="rtl"] .icon-arrow {
  transform: scaleX(-1);
}
/* Some things should NOT flip:
- Phone numbers
- Clocks/timelines
- Media playback controls
- Logos
- Code
*/

RTL Checklist

| Category | What to Check |
|---|---|
| Text alignment | Starts from the correct side |
| Navigation | Menus flow right-to-left |
| Icons | Directional icons are flipped (arrows, “back” buttons) |
| Forms | Labels and inputs are properly aligned |
| Tables | Column order is reversed |
| Images | Directional images are mirrored where appropriate |
| Scrollbars | Appear on the correct side |
| Numbers | Digits follow the locale's convention (Western or Eastern Arabic numerals) consistently |

Translation Workflows

i18next (JavaScript)

import i18next from 'i18next';
// init returns a Promise; await it before calling t() (top-level await in an ES module)
await i18next.init({
  lng: 'en',
  fallbackLng: 'en',
  resources: {
    en: {
      translation: {
        greeting: 'Hello, {{name}}!',
        // i18next v21+ plural keys use CLDR category suffixes (_one, _other, ...)
        items_one: '{{count}} item',
        items_other: '{{count}} items',
        nav: {
          home: 'Home',
          about: 'About',
          contact: 'Contact'
        }
      }
    },
    fr: {
      translation: {
        greeting: 'Bonjour, {{name}} !',
        items_one: '{{count}} article',
        items_other: '{{count}} articles',
        nav: {
          home: 'Accueil',
          about: 'À propos',
          contact: 'Contact'
        }
      }
    }
  }
});
// Usage
i18next.t('greeting', { name: 'Alice' }); // "Hello, Alice!"
i18next.t('items', { count: 5 }); // "5 items"
i18next.t('nav.home'); // "Home"
// Change language (also returns a Promise)
await i18next.changeLanguage('fr');
i18next.t('greeting', { name: 'Alice' }); // "Bonjour, Alice !"

String Externalization Best Practices

| Practice | Description |
|---|---|
| Never hardcode user-facing strings | Always use translation keys |
| Use meaningful key names | nav.home, not string_42 |
| Do not concatenate translated strings | "Hello" + name + "!" breaks in languages with a different word order |
| Provide context for translators | Comments explaining where the string appears |
| Avoid string reuse | The same English word may translate differently in different contexts |
| Handle zero, one, many | Use proper pluralization, not if/else |
| Externalize error messages | Users should see errors in their language |
| Do not embed HTML in translations | Use ICU MessageFormat or interpolation |
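Concatenation hard-codes English word order; a named placeholder lets each translation put the value wherever its grammar requires. A minimal sketch of `{{name}}`-style interpolation, matching the placeholder syntax of the i18next example above but not i18next's implementation. `interpolate` is a hypothetical helper and does no HTML escaping.

```javascript
// Replace {{key}} placeholders with values; leave unknown keys visible.
function interpolate(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// The template, not the code, decides where the name goes.
console.log(interpolate("Hello, {{name}}!", { name: "Alice" }));          // "Hello, Alice!"
console.log(interpolate("{{name}}さん、こんにちは！", { name: "Alice" })); // "Aliceさん、こんにちは！"
console.log(interpolate("Hello, {{name}}!", {}));                         // "Hello, {{name}}!"
```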

Translation File Organization

locales/
├── en/
│ ├── common.json (shared strings: buttons, labels, errors)
│ ├── home.json (home page strings)
│ ├── dashboard.json (dashboard strings)
│ └── errors.json (error messages)
├── fr/
│ ├── common.json
│ ├── home.json
│ ├── dashboard.json
│ └── errors.json
├── de/
│ └── ...
└── ja/
    └── ...
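With one file per namespace and locale, a small check can flag keys that exist in the source language but are missing from a translation. A sketch operating on already-parsed JSON objects (reading the files from `locales/` is omitted; `flattenKeys` and `missingKeys` are hypothetical helpers):

```javascript
// Flatten nested objects into dotted keys: { nav: { home } } -> "nav.home".
function flattenKeys(obj, prefix = "") {
  return Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === "object" ? flattenKeys(value, path) : [path];
  });
}

// Keys present in the source locale but absent from the target locale.
function missingKeys(source, target) {
  const present = new Set(flattenKeys(target));
  return flattenKeys(source).filter((key) => !present.has(key));
}

const en = { greeting: "Hello, {{name}}!", nav: { home: "Home", about: "About" } };
const fr = { greeting: "Bonjour, {{name}} !", nav: { home: "Accueil" } };
console.log(missingKeys(en, fr)); // ["nav.about"]
```

Plural-suffixed keys need extra care: a Japanese file legitimately has only `_other` variants, so a real check would compare plural groups by the target locale's categories rather than key by key.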