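String encoding
Strings in Rust are always UTF-8 encoded. Rust source files must be UTF-8 encoded as well; the compiler reports an error for a source file that is not valid UTF-8.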
```rust
use std::str;

fn main() {
    // Decode the UTF-8 byte sequence E9 81 93 into a &str.
    let tao = str::from_utf8(&[0xE9u8, 0x81u8, 0x93u8]).unwrap();
    assert_eq!("道", tao);
    // The same character written as a Unicode escape.
    assert_eq!("道", String::from("\u{9053}"));

    let unicode_x = 0x9053; // Unicode code point
    let utf_x_hex = 0xe98193; // UTF-8 encoding, written in hex
    let utf_x_bin = 0b111010011000000110010011; // UTF-8 encoding, written in binary
    println!("unicode_x: {:b}", unicode_x);
    println!("utf_x_hex: {:b}", utf_x_hex);
    // `{:b}` prints binary digits, so the prefix is 0b rather than 0x.
    println!("utf_x_bin: 0b{:b}", utf_x_bin);
}
```

Output:

```
unicode_x: 1001000001010011
utf_x_hex: 111010011000000110010011
utf_x_bin: 0b111010011000000110010011
```
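The two bit patterns printed above differ because UTF-8 spreads the code point's bits across several bytes, each with a fixed prefix. The following is a minimal sketch of that bit layout, written by hand for illustration. `encode_utf8_manual` is a hypothetical helper, not a standard library function; in real code `char::encode_utf8` does this work.

```rust
/// Encode a `char` as UTF-8 by hand (illustrative helper, not a std API).
fn encode_utf8_manual(c: char) -> Vec<u8> {
    let cp = c as u32; // the Unicode scalar value
    match cp {
        // 1 byte: 0xxxxxxx
        0x0000..=0x007F => vec![cp as u8],
        // 2 bytes: 110xxxxx 10xxxxxx
        0x0080..=0x07FF => vec![0xC0 | (cp >> 6) as u8, 0x80 | (cp & 0x3F) as u8],
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        0x0800..=0xFFFF => vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        _ => vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
    }
}

fn main() {
    // U+9053 falls in the 3-byte range and becomes E9 81 93.
    let bytes = encode_utf8_manual('道');
    assert_eq!(bytes, vec![0xE9u8, 0x81, 0x93]);
    // The hand-written result matches what the standard library stores.
    assert_eq!(bytes.as_slice(), "道".as_bytes());
}
```

Taking a `char` rather than a raw `u32` means the helper never sees surrogates or values above U+10FFFF, so it needs no validity checks of its own.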
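Characters
Rust uses `char` to represent a single character. The integer value of a `char` corresponds one-to-one with a Unicode scalar value, that is, a code point in the range U+0000 to U+D7FF or U+E000 to U+10FFFF. So that it can store any Unicode scalar value, every `char` occupies 4 bytes.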
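As a quick check of these two properties, the sketch below asserts the size of `char` and probes which integers are accepted as scalar values. `is_scalar_value` is a hypothetical helper introduced only for this example; it simply wraps `std::char::from_u32`.

```rust
use std::mem::size_of;

/// Returns true if `n` is a valid Unicode scalar value
/// (illustrative helper built on `std::char::from_u32`).
fn is_scalar_value(n: u32) -> bool {
    std::char::from_u32(n).is_some()
}

fn main() {
    // Every `char` is 4 bytes wide, whichever character it holds.
    assert_eq!(size_of::<char>(), 4);

    // Ordinary code points are valid.
    assert!(is_scalar_value(0x9053));
    // Surrogates (U+D800 to U+DFFF) are code points but not scalar values.
    assert!(!is_scalar_value(0xD800));
    assert!(!is_scalar_value(0xDFFF));
    // U+10FFFF is the largest scalar value; anything above it is rejected.
    assert!(is_scalar_value(0x10FFFF));
    assert!(!is_scalar_value(0x110000));
}
```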
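```rust
fn main() {
    let tao = '道';
    // Casting a char to u32 yields its Unicode scalar value.
    let tao_u32 = tao as u32;
    assert_eq!(36947, tao_u32);
    println!("U+{:x}", tao_u32); // U+9053
    println!("{}", tao.escape_unicode()); // \u{9053}
    // Any u8 converts to a char infallibly.
    assert_eq!(char::from(65u8), 'A');
    // A u32 may not be a valid scalar value, so from_u32 returns an Option.
    assert_eq!(std::char::from_u32(0x9053), Some('道'));
    assert_eq!(std::char::from_u32(36947), Some('道'));
}
```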
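```rust
fn main() {
    // '道' needs 3 bytes in UTF-8, so a 3-byte buffer is large enough.
    let mut b = [0; 3];
    let tao = '道';
    // encode_utf8 writes into the buffer and returns the written part as a &mut str.
    let tao_str = tao.encode_utf8(&mut b);
    assert_eq!("道", tao_str);
    assert_eq!(3, tao.len_utf8());
}
```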
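The 3-byte buffer above works only because '道' happens to need 3 bytes. A `char` can take anywhere from 1 to 4 bytes in UTF-8, which the following sketch illustrates with a buffer sized for the worst case. `utf8_bytes` is a hypothetical helper named for this example only.

```rust
/// Return the UTF-8 bytes of `c` (illustrative helper, not a std API).
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // 4 bytes is enough for any char
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // One character from each UTF-8 length class: 1, 2, 3 and 4 bytes.
    for c in "A\u{e9}道\u{1F600}".chars() {
        let bytes = utf8_bytes(c);
        // len_utf8 reports the same length that encode_utf8 actually writes.
        assert_eq!(bytes.len(), c.len_utf8());
        println!("{} U+{:04X} {:02X?}", c, c as u32, bytes);
    }
}
```

`encode_utf8` panics if the buffer is too small for the character, so when the input is not known in advance, a 4-byte buffer (or a check against `len_utf8`) is the safer choice.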