4.8 char and a first look at Unicode

Last updated June 11, 2026

Lesson 4.1 said a type is an interpretation of bits, with the byte 0100 0001 readable as the number 65 or the letter 'A'. Time to meet the type doing the letter-reading.

A char holds a single character, written in single quotes: 'A', '5', '?'. Under the hood it's a number: every character has an agreed code, and A's is 65, exactly as teased. The agreement in question is Unicode, the standard that assigns a number to essentially every character humanity uses. That ambition is why this program compiles:

fn main() {
    let letter = 'A';
    let accented = 'é';
    let hangul = '한';
    let crab = '🦀';
    println!("{letter} {accented} {hangul} {crab}");
}
A é 한 🦀

(The crab is Rust's unofficial mascot, Ferris. Yes, the emoji is a char. No, this isn't a parlor trick; your programs will meet real text from real humans, and real text looks like this.)

Unicode's catalog is far bigger than one byte's 256 slots, which explains a fact from lesson 4.1's size table that may have raised an eyebrow: a Rust char is 4 bytes, room for any of the million-plus possible code values. C and C++'s char is 1 byte, an ASCII-era design that forces their Unicode handling through add-on types and library folklore; Rust, born in 2006 rather than 1972, made the default character type Unicode-sized from day one.
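If you'd rather measure than take the size table's word for it, here is a minimal sketch. The helper max_char_code is a made-up name for illustration; size_of and char::MAX come from the standard library.

```rust
use std::mem::size_of;

// Hypothetical helper (not from the lesson): the largest code a char can hold.
fn max_char_code() -> u32 {
    char::MAX as u32
}

fn main() {
    // A char is always 4 bytes, whether it holds 'A' or an emoji.
    println!("char is {} bytes", size_of::<char>());
    // For comparison, a u8 is the single byte that C's char was built around.
    println!("u8 is {} byte", size_of::<u8>());
    // The highest valid code is 0x10FFFF, a little over 1.1 million.
    println!("largest char code: {}", max_char_code());
}
```

This should report 4 bytes for char, 1 for u8, and 1114111 as the largest code.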

For advanced readers

Precisely: a char is a Unicode scalar value, any code point except a reserved range called surrogates, valid by construction. And precision demands an asterisk on "a character": some things humans perceive as one character (certain emoji, some accented combinations) are actually sequences of scalar values. char holds exactly one scalar value, which is almost always what you want and occasionally a fascinating rabbit hole; chapter 18's string-internals lesson stands at the entrance.
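For the curious, a small sketch of that asterisk. It borrows strings and the \u{...} escape a little ahead of schedule, and scalar_count is a hypothetical helper named for this example only. Both strings below render as an accented e, but one is built from a single scalar value and the other from two.

```rust
// Hypothetical helper: counts the scalar values (chars) in a piece of text.
fn scalar_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    // Precomposed form: one scalar value, U+00E9.
    let one_piece = "\u{e9}";
    // Decomposed form: 'e' followed by U+0301, a combining acute accent.
    let two_pieces = "e\u{301}";

    println!("{one_piece} is made of {} char(s)", scalar_count(one_piece));
    println!("{two_pieces} is made of {} char(s)", scalar_count(two_pieces));
}
```

The first count is 1 and the second is 2, even though a reader sees the same letter twice. Only the first form fits in a single char.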

Single quotes are not double quotes

The quote marks are type information, and mixing them up produces two different errors worth previewing. 'A' is a char; "A" is a string literal (a &str, chapter 5's subject) that happens to contain one character. They are different types, not interchangeable, and the compiler's correction is unusually charming:

fn main() {
    let initial: char = "A";
    println!("{initial}");
}
error[E0308]: mismatched types
 --> src/main.rs:2:25
  |
2 |     let initial: char = "A";
  |                  ----   ^^^ expected `char`, found `&str`
  |                  |
  |                  expected due to this
  |
help: if you meant to write a `char` literal, use single quotes
  |
2 -     let initial: char = "A";
2 +     let initial: char = 'A';
  |

In the other direction, multiple characters in single quotes ('ab') is a straight syntax error; C++'s "multicharacter literal" (legal there, value implementation-defined, a beloved trivia question) doesn't exist here.
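When you genuinely need to compare across the two types, you convert on purpose. A rough sketch, assuming a hypothetical helper same_text and leaning on string methods that chapter 5 introduces properly:

```rust
// Hypothetical helper: does the string consist of exactly this one character?
fn same_text(c: char, s: &str) -> bool {
    // to_string turns the char into an owned String, which can be compared to a &str.
    c.to_string() == s
}

fn main() {
    let as_char: char = 'A';
    let as_str: &str = "A";

    // Writing `as_char == as_str` would not compile: char and &str are different types.
    println!("{}", same_text(as_char, as_str));
    // A char can also be used to ask questions about a string.
    println!("{}", as_str.starts_with(as_char));
}
```

Both lines print true. The comparison works only because the conversion is spelled out.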

'5' is not 5

The classic char trap, worth its own section because every beginner falls in once. The character '5' and the number 5 are unrelated values: the character is Unicode's catalog entry for the digit-shaped symbol, which lives at code 53. You can see a char's code with a cast (using the as keyword, which you'll formally meet in lesson 4.10):

fn main() {
    println!("{}", '5' as u32);
    println!("{}", 'A' as u32);
    println!("{}", '🦀' as u32);
}
53
65
129408

So '5' as u32 is 53, not 5, and arithmetic on digit characters without conversion is a bug generator in every language. When you need a digit character's numeric value, ask properly: chars have a method for it ('5'.to_digit(10) exists; its return type involves chapter 11 machinery, so we mention it and move on). For whole strings of digits, you've been converting correctly since lesson 1.12: that's what parse is for.
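For a peek ahead, here is a sketch of both conversions side by side. The helper digit_value is a hypothetical wrapper for illustration, and the Some/None printed by {:?} is the chapter 11 machinery mentioned above, so treat it as a preview rather than something to memorize.

```rust
// Hypothetical helper: the numeric value of a decimal digit character,
// or None when the character is not a decimal digit.
fn digit_value(c: char) -> Option<u32> {
    c.to_digit(10)
}

fn main() {
    println!("{}", '5' as u32);         // 53: the character's code
    println!("{:?}", digit_value('5')); // Some(5): the digit's value
    println!("{:?}", digit_value('x')); // None: 'x' is not a decimal digit

    // Whole strings of digits go through parse instead.
    let n: u32 = "57".parse().expect("not a number");
    println!("{}", n + 1);              // 58
}
```

The cast answers "which catalog entry is this?", while to_digit and parse answer "which number does this spell?".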

One last connection to close the loop with lesson 1.5: escape sequences work in char literals too ('\n', '\t', '\'' for a literal single quote), each denoting exactly one character.
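A quick sketch to confirm that each escape really is one character with one code. The helper code_of is a hypothetical name, and the \u{...} form on the last line is an extra the lesson hasn't covered: it writes a char by its hexadecimal code.

```rust
// Hypothetical helper: the Unicode code of a single char.
fn code_of(c: char) -> u32 {
    c as u32
}

fn main() {
    println!("{}", code_of('\n')); // 10: newline
    println!("{}", code_of('\t')); // 9: tab
    println!("{}", code_of('\'')); // 39: single quote
    println!("{}", code_of('\\')); // 92: backslash

    // Not covered in this lesson: \u{...} names a char by its hex code.
    println!("{}", '\u{1F980}');   // the crab again
}
```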

Quiz time

Question #1

Which of these compile, and what's the type of each that does? 'x', "x", 'xy', '\n'

Show solution

'x' compiles: char. "x" compiles: &str (a one-character string). 'xy' does not compile: single quotes hold exactly one character. '\n' compiles: char (the escape denotes one character, the newline).

Question #2

What does this print, and what's the lesson in it?

fn main() {
    let digit = '7';
    println!("{}", digit as u32);
}
Show solution

55: the Unicode code for the character '7', not the number 7. A digit character is a catalog entry, and its code says nothing about the quantity it depicts; converting between digit characters and numbers must be done deliberately (with parse for strings, or char's own methods).

Question #3

A char is 4 bytes and a u32 is 4 bytes. A teammate concludes they're basically the same type. Name one way Rust disagrees.

Show solution

They don't mix: a char isn't accepted where a u32 is expected, or vice versa. Going from char to u32 takes an explicit cast ('A' as u32). Going the other way isn't even a cast: as refuses to turn a u32 into a char, so you use a checked conversion, char::from_u32, which can fail. (Deeper: every char is a valid Unicode scalar value by construction, while a u32 can hold any 4-byte pattern, including ones that aren't valid scalar values, which is exactly why the types are kept distinct.)
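To watch the "valid by construction" rule at work, here is a sketch using the standard library's char::from_u32. The wrapper to_char is a hypothetical name for this example, and the Some/None results are chapter 11 material shown early.

```rust
// Hypothetical helper: try to turn a raw number into a char.
fn to_char(code: u32) -> Option<char> {
    char::from_u32(code)
}

fn main() {
    println!("{:?}", to_char(65));       // Some('A')
    println!("{:?}", to_char(0xD800));   // None: a surrogate, not a scalar value
    println!("{:?}", to_char(0x110000)); // None: past the end of Unicode's range

    // `65u32 as char` would not compile; only a u8 can be cast to char with `as`.
    println!("{}", 65u8 as char);        // A
}
```

A u32 happily stores 0xD800 or 0x110000, but no char can, so the conversion has to be able to say no.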

That's the last of the simple scalar types. Before the chapter's two remaining tricks (tuples, conversions), a short lesson on the machinery that's been quietly choosing half your types for you.