Unicode Escape / Unescape

Convert text to \uXXXX or \u{XXXXX} escapes and back. Handles surrogate pairs for emoji. Output JSON-safe or ES2015 syntax.

Runs entirely in your browser. Your input never leaves your device.

How it works

What Unicode escaping is

Unicode escaping converts characters to their \uXXXX (or \u{XXXXX}) escape sequence representation, and back. The character é becomes \u00e9; the emoji 👀 becomes \u{1f440} in ES6 mode or the surrogate pair \ud83d\udc40 in JSON/ES5 mode.

This is not encoding for compression or security. It's a text-safe representation that lets you embed any Unicode character in a context that only accepts ASCII: JSON string values, JavaScript source files, C string literals, Java .properties files, and similar contexts where arbitrary Unicode may not survive copy-paste, terminal rendering, or protocol transit.

The two escape syntaxes

\uXXXX: four hex digits (BMP only)

The original Unicode escape syntax, defined in the ECMAScript 1 spec (1997) and required by the JSON specification (RFC 8259 §7). Works for codepoints U+0000 through U+FFFF, the Basic Multilingual Plane (BMP). For these 65,536 codepoints, \u00e9 and é are exactly equivalent in any compliant parser.

\u{XXXXX}: variable hex digits in braces (full Unicode)

Introduced in ES6 (ECMAScript 2015). Supports the full Unicode range U+0000 through U+10FFFF. This is the most direct way to escape astral plane characters: codepoints above U+FFFF, which include most emoji, historic scripts, and mathematical symbols. \u{1f600} is shorter and easier to read than the equivalent surrogate pair \ud83d\ude00; the brace syntax also allows leading zeros to be omitted.

JSON does not support \u{}: JSON parsers expect exactly four hex digits after \u. To write an astral codepoint as an escape in JSON, you must use a surrogate pair.
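As a rough illustration, the sketch below compares the two syntaxes in an ES2015+ JavaScript engine and shows a JSON parser rejecting the brace form. The variable names are illustrative only.

```javascript
// Both escape syntaxes denote the same string in ES2015+ source code.
const fourDigit = "\u00e9";   // BMP character, exactly four hex digits
const braced = "\u{e9}";      // same character, brace form, leading zeros omitted
const astral = "\u{1f600}";   // astral codepoint written with the brace form

// JSON accepts the four-digit form.
const parsedOk = JSON.parse('"\\u00e9"');

// JSON rejects the brace form: the parser wants four hex digits after \u.
let braceRejected = false;
try {
  JSON.parse('"\\u{1f600}"');
} catch (err) {
  braceRejected = err instanceof SyntaxError;
}
```

Here fourDigit and braced are the same one-character string, while braceRejected ends up true because the JSON text is not valid.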

The astral plane and surrogate pairs

Unicode's Basic Multilingual Plane holds codepoints U+0000–U+FFFF. Everything above, called the astral planes or supplementary characters, requires a different strategy in systems built on UTF-16.

JavaScript strings are internally UTF-16. To represent a codepoint above U+FFFF in UTF-16, two 16-bit code units called a surrogate pair are used: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF). The formula:

codepoint = 0x10000 + (high - 0xD800) × 0x400 + (low - 0xDC00)

The emoji 😀 (U+1F600) becomes the pair \uD83D\uDE00. In JSON, this is the only way to write it as an escape, since JSON's \u escape only takes four hex digits. In ES6 JavaScript, you can use \u{1F600} directly.
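The formula can be applied in both directions. A minimal sketch, using the hypothetical helper names toSurrogatePair and fromSurrogatePair (neither is a built-in):

```javascript
// Split an astral codepoint (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("codepoint is not in the astral range");
  }
  const offset = codePoint - 0x10000;       // 20-bit value
  const high = 0xd800 + (offset >> 10);     // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);    // bottom 10 bits
  return [high, low];
}

// Recombine a surrogate pair into a codepoint, following the formula above.
function fromSurrogatePair(high, low) {
  return 0x10000 + (high - 0xd800) * 0x400 + (low - 0xdc00);
}

const [high, low] = toSurrogatePair(0x1f600); // 0xD83D, 0xDE00
const roundTrip = fromSurrogatePair(high, low); // 0x1F600
```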

The gotcha: because a surrogate pair is two UTF-16 code units, naïve string operations that work unit-by-unit break. "😀".length returns 2 in JavaScript, not 1. "😀"[0] returns the dangling high surrogate \uD83D, not the emoji. Iterating by codepoint requires [...str] (spread) or Array.from(str); splitting on grapheme clusters requires Intl.Segmenter, which is part of the ECMAScript Internationalization API.
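The three ways of counting can be compared side by side. This is a sketch that assumes a runtime with Intl.Segmenter and reasonably current Unicode data; countGraphemes is a hypothetical helper name.

```javascript
const grin = "\u{1F600}";              // one astral codepoint
const codeUnits = grin.length;         // 2: length counts UTF-16 code units
const firstUnit = grin[0];             // "\uD83D": a lone high surrogate
const codePoints = [...grin].length;   // 1: the string iterator walks codepoints

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

// Family emoji: three people joined by zero-width joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
const familyUnits = family.length;            // 8 code units
const familyCodePoints = [...family].length;  // 5 codepoints
const familyGraphemes = countGraphemes(family);
```

The family emoji is a case where the codepoint count is also not what a user would call the "number of characters"; only the grapheme count treats it as one.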

JSON mode vs ES6 mode vs "all" mode

JSON mode: escapes every non-ASCII character using \uXXXX only, with surrogate pairs for codepoints above U+FFFF. The output is always valid inside a JSON string literal. Safe for embedding in any JSON-consuming system regardless of parser age.

ES6 mode: uses \u{XXXXX} for astral characters and \uXXXX for BMP characters. Produces shorter, more readable output for emoji-heavy strings. Only valid in ES6+ JavaScript contexts, not in JSON and not in older ES5 environments.

All mode: escapes every character, including ASCII. Useful when you need a pure ASCII representation of a string with no raw Unicode at all, for embedding in C strings, Python source files, or debugging character-by-character.
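One way the three modes could be implemented is sketched below. This is not the tool's actual source: the function name escapeUnicode and the mode strings "json", "es6", and "all" are assumptions chosen to mirror the descriptions above. The sketch also leaves quotes, backslashes, and ASCII control characters untouched in the "json" and "es6" modes, so a real JSON encoder would need to handle those separately.

```javascript
// Hypothetical escaper mirroring the three modes described above.
function escapeUnicode(text, mode = "json") {
  let out = "";
  for (const ch of text) {                  // for...of iterates by codepoint
    const cp = ch.codePointAt(0);
    if (mode !== "all" && cp < 0x80) {
      out += ch;                            // ASCII passes through unchanged
      continue;
    }
    if (mode === "es6" && cp > 0xffff) {
      out += "\\u{" + cp.toString(16) + "}"; // brace form for astral codepoints
      continue;
    }
    // One \uXXXX per UTF-16 code unit; astral characters yield a surrogate pair.
    for (let i = 0; i < ch.length; i++) {
      out += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
    }
  }
  return out;
}

const jsonForm = escapeUnicode("\u{1f440}", "json"); // \ud83d\udc40
const es6Form = escapeUnicode("\u{1f440}", "es6");   // \u{1f440}
const allForm = escapeUnicode("A\u00e9", "all");     // \u0041\u00e9
```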

String.fromCodePoint vs String.fromCharCode

These two JavaScript functions reveal the surrogate pair complexity directly:

String.fromCharCode(0x1F600)   // "\uF600" (wrong: truncated to U+F600)
String.fromCodePoint(0x1F600)  // "😀" (correct)

String.fromCharCode(0xD83D, 0xDE00)  // "😀" (correct, via surrogate pair)
String.fromCodePoint(0x1F600)        // "😀" (correct, via codepoint)

fromCharCode dates from ES1 and operates on raw UTF-16 code units. If you pass it a value above 0xFFFF, it silently truncates to the low 16 bits. fromCodePoint (ES6) handles the full range. When unescaping \u{1F600}, always use fromCodePoint or the equivalent in your runtime.
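A minimal unescaping sketch along these lines follows. The name unescapeUnicode is hypothetical, and the sketch makes simplifying assumptions: it does not treat an escaped backslash (\\u0041) specially, and it leaves brace escapes above U+10FFFF untouched rather than reporting an error.

```javascript
// Hypothetical unescaper for both \uXXXX and \u{X...} forms.
function unescapeUnicode(text) {
  const pattern = /\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})/g;
  return text.replace(pattern, (match, braced, fourDigit) => {
    if (braced !== undefined) {
      const cp = parseInt(braced, 16);
      // fromCodePoint handles astral values; out-of-range input is left as-is.
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    }
    // A four-digit escape is a single code unit, so fromCharCode is enough.
    // Adjacent high and low surrogates recombine when the results are joined.
    return String.fromCharCode(parseInt(fourDigit, 16));
  });
}

const fromBrace = unescapeUnicode("\\u{1F600}");
const fromPair = unescapeUnicode("\\ud83d\\ude00");
```

Both fromBrace and fromPair hold the same two-code-unit string for U+1F600.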

When does JSON force you to escape?

The JSON spec (RFC 8259) requires that control characters U+0000–U+001F be escaped, along with the quotation mark and the backslash. All other characters may be included raw as UTF-8, but many serialisers escape the full non-ASCII range for safety: to survive ASCII-only transport, avoid terminal rendering issues, or prevent injection in HTML contexts (<, >, &, " are often escaped as \u003c, \u003e, \u0026, \u0022).

If you're receiving a JSON payload and see \u00e0 where you expected à, the JSON was produced with a "fully escape non-ASCII" setting. Both representations are identical to a conforming JSON parser.
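The equivalence is easy to check, and an ASCII-only serialiser can be approximated by post-processing JSON.stringify. The helper name stringifyAscii below is hypothetical; JSON.stringify has no built-in option for this.

```javascript
// Hypothetical helper: JSON.stringify, then escape every non-ASCII code unit.
function stringifyAscii(value) {
  // Without the "u" flag the regex matches one UTF-16 code unit at a time,
  // so astral characters come out as surrogate-pair escapes.
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const escapedJson = stringifyAscii({ name: "\u00e0" });

// Escaped and raw forms parse to the same string.
const sameAfterParse = JSON.parse('"\\u00e0"') === JSON.parse('"\u00e0"');
```

Here escapedJson contains only ASCII characters, and parsing it yields the original value.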

Template literals and when you don't need escaping

In ES6 template literals and modern JavaScript string literals, you can embed Unicode directly:

const greeting = `こんにちは`;      // no escaping needed
const emoji = "😀";               // works in modern JS files
const escaped = "\u{1F600}";      // identical to above

You need escaping when:

  • The string will be embedded in JSON programmatically
  • The file must be ASCII-only (older toolchains, some .properties files)
  • You're debugging a character and want to see its codepoint explicitly
  • You're passing the string through a channel that mangles raw Unicode

Privacy

All processing runs in your browser. Your text is never sent to a server.

Related tools

  • URL Encode/Decode: percent-encoding for URI components, a different escaping scheme.
  • HTML Entity Encode/Decode: &amp;, &#x00E9;, and similar HTML character references.

FAQ

Why does my emoji appear as two \u sequences in JSON mode?

Because JSON's \u escape only accepts exactly four hex digits, which covers codepoints up to U+FFFF. Most emoji live in the astral planes (above U+FFFF) and must be represented as UTF-16 surrogate pairs: two consecutive \u sequences. For example, 😀 (U+1F600) becomes \uD83D\uDE00. Use ES6 mode to get the cleaner \u{1F600} form, but note that JSON parsers do not accept the brace syntax.

What is the difference between \uXXXX and \u{XXXXX}?

\uXXXX is the original four-hex-digit syntax from ECMAScript 1 (1997) and the JSON spec (RFC 8259). It only covers the Basic Multilingual Plane (U+0000–U+FFFF). \u{XXXXX} is the ES6 brace syntax that accepts a variable number of hex digits and covers the full Unicode range up to U+10FFFF. The brace form is valid in ES6+ JavaScript and many modern languages, but not in JSON: the JSON spec has never been updated to include it.

Why does "😀".length return 2 in JavaScript?

JavaScript strings are internally UTF-16. Codepoints above U+FFFF are stored as two 16-bit code units (a surrogate pair), so String.prototype.length counts code units, not characters. Use [...str].length (spread uses the string iterator, which is codepoint-aware) or Array.from(str).length to get the character count. Intl.Segmenter gives grapheme cluster counts, which is what users perceive as "number of characters."

When do I actually need to escape Unicode in JavaScript?

Rarely in modern code. ES6+ source files saved as UTF-8 can contain raw Unicode characters directly. You need \u escapes when: (1) the file must be ASCII-only for a legacy toolchain, (2) you're building a JSON string programmatically and need to embed non-ASCII content safely, (3) you're targeting an environment with incomplete Unicode support, or (4) you're debugging a specific codepoint and want to make it visually explicit in source.

Does JSON require non-ASCII characters to be escaped?

No. RFC 8259 allows any Unicode character encoded as UTF-8 to appear unescaped in a JSON string, except for control characters (U+0000–U+001F), the quotation mark, and the backslash. Many JSON serialisers escape all non-ASCII characters anyway, for safety across ASCII-only transports or to avoid HTML-injection risks (<, >, & may be escaped as \u003c, \u003e, \u0026). Both representations are identical to a conforming JSON parser.

What is String.fromCodePoint and why should I prefer it over String.fromCharCode?

String.fromCharCode (ES1) operates on raw UTF-16 code units and silently truncates values above 0xFFFF. String.fromCodePoint (ES6) handles the full Unicode range: pass 0x1F600 and you get 😀, not garbage. Always use fromCodePoint when converting a codepoint number to a character, especially if the codepoint may be in the astral plane.

What does "all" mode do that JSON and ES6 modes don't?

All mode escapes every character, including plain ASCII letters and digits, not just non-ASCII or non-printable ones. The result is a pure-ASCII string where every character is represented as \uXXXX. This is useful for embedding in C string literals that must be ASCII-only, for certain .properties file formats (Java's native2ascii tool emits the same \uXXXX syntax, though only for non-ASCII characters), or for debugging character sequences where you want to see every codepoint explicitly.

Can splitting a string on emoji break it?

Yes. Because many emoji are surrogate pairs (two UTF-16 code units), operations that work at the code-unit level, such as str[i], str.substring(), and str.split(''), can cut between the two halves of a pair, producing dangling surrogates and garbled output. Safe alternatives: the spread operator [...str] iterates by codepoint, and Intl.Segmenter iterates by grapheme cluster (accounting for emoji modifiers and ZWJ sequences like family emoji composed from multiple codepoints).
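As an illustration, the sketch below contrasts a code-unit cut with a grapheme-aware cut. The helper truncateGraphemes is a hypothetical name, and the sketch assumes a runtime that provides Intl.Segmenter.

```javascript
// Hypothetical helper: keep at most maxGraphemes user-perceived characters.
function truncateGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count >= maxGraphemes) break;
    out += segment;   // each segment is a whole grapheme cluster
    count += 1;
  }
  return out;
}

const sample = "a\u{1F600}b";
const brokenCut = sample.substring(0, 2);      // "a" plus a lone high surrogate
const safeCut = truncateGraphemes(sample, 2);  // "a" plus the whole emoji
```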