v0 Symbol Format
The v0 mangling format was introduced in RFC 2603. It has the following properties:
- It provides an unambiguous string encoding for everything that can end up in a binary's symbol table.
- It encodes information about generic parameters in a reversible way.
- The mangled symbols are decodable such that the demangled form should be easily identifiable as some concrete instance of e.g. a polymorphic function.
- It has a consistent definition that does not rely on pretty-printing certain language constructs.
- Symbols can be restricted to only consist of the characters
A-Z
,a-z
,0-9
, and_
. This helps ensure that it is platform-independent, where other characters might have special meaning in some context (e.g..
for MSVCDEF
files). Unicode symbols are optionally supported. - It tries to stay efficient, avoiding unnecessarily long names, and avoiding computationally expensive operations to demangle.
The v0 format is not intended to be compatible with other mangling schemes (such as C++).
The v0 format is not presented as a stable ABI for Rust. This format is currently intended to be well-defined enough that a demangler can produce a reasonable human-readable form of the symbol. There are several implementation-defined portions that result in it not being possible to entirely predict how a given Rust entity will be encoded.
The sections below define the encoding of a v0 symbol. There is no standardized demangled form of the symbols, though suggestions are provided for how to demangle a symbol. Implementers may choose to demangle in different ways.
Extensions
This format may be extended in the future to add new tags as Rust is extended with new language items. To be forward compatible, demanglers should gracefully handle symbols that have encodings where it encounters a tag character not described in this document. For example, they may fall back to displaying the mangled symbol. The format may be extended anywhere there is a tag character, such as the type rule. The meaning of existing tags and encodings will not be changed.
Grammar notation
The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax. A consolidated summary can be found in the Symbol grammar summary.
Name | Syntax | Example | Description |
---|---|---|---|
Rule | → | A production. | |
Concatenation | whitespace | Individual elements in sequence left-to-right. | |
Alternative | | | Matches either one or the other. | |
Grouping | () | Groups multiple elements as one. | |
Repetition | {} | Repeats the enclosed zero or more times. | |
Option | opt | An optional element. | |
Literal | monospace | G | A terminal matching the exact characters case-sensitive. |
Symbol name
symbol-name →
_R
decimal-numberopt path instantiating-crateopt vendor-specific-suffixopt
A mangled symbol starts with the two characters _R
which is a prefix to identify the symbol as a Rust symbol.
The prefix can optionally be followed by a decimal-number which specifies the encoding version.
This number is currently not used, and is never present in the current encoding.
Following that is a path which encodes the path to an entity.
The path is followed by an optional instantiating-crate which helps to disambiguate entities which may be instantiated multiple times in separate crates.
The final part is an optional vendor-specific-suffix.
Recommended Demangling
A symbol-name should be displayed as the path. The instantiating-crate and the vendor-specific-suffix usually need not be displayed.
Example:
std::path::PathBuf::new();
The symbol for
PathBuf::new
in cratemycrate
is:_RNvMsr_NtCs3ssYzQotkvD_3std4pathNtB5_7PathBuf3newCs15kBYyAo9fc_7mycrate ├┘└───────────────────────┬──────────────────────┘└──────────┬─────────┘ │ │ │ │ │ └── instantiating-crate path "mycrate" │ └───────────────────────────────────── path to std::path::PathBuf::new └─────────────────────────────────────────────────────────────── `_R` symbol prefix
Recommended demangling:
<std::path::PathBuf>::new
Symbol path
path →
crate-root
| inherent-impl
| trait-impl
| trait-definition
| nested-path
| generic-args
| backref
A path represents a variant of a Rust path to some entity.
In addition to typical Rust path segments using identifiers,
it uses extra elements to represent unnameable entities (like an impl
) or generic arguments for monomorphized items.
The initial tag character can be used to determine which kind of path it represents:
Tag | Rule | Description |
---|---|---|
C | crate-root | The root of a crate path. |
M | inherent-impl | An inherent implementation. |
X | trait-impl | A trait implementation. |
Y | trait-definition | A trait definition. |
N | nested-path | A nested path. |
I | generic-args | Generic arguments. |
B | backref | A back reference. |
Path: Crate root
crate-root →
C
identifier
A crate-root indicates a path referring to the root of a crate's module tree.
It consists of the character C
followed by the crate name as an identifier.
The crate name is the name as seen from the defining crate. Since Rust supports linking multiple crates with the same name, the disambiguator is used to make the name unique across the crate graph.
Recommended Demangling
A crate-root can be displayed as the identifier such as
mycrate
.Usually the disambiguator in the identifier need not be displayed, but as an alternate form the disambiguator can be shown in hex such as
mycrate[ca63f166dbe9294]
.
Example:
fn example() {}
The symbol for
example
in cratemycrate
is:_RNvCs15kBYyAo9fc_7mycrate7example │└────┬─────┘││└──┬──┘ │ │ ││ │ │ │ ││ └── crate-root identifier "mycrate" │ │ │└────── length 7 of "mycrate" │ │ └─────── end of base-62-number │ └────────────── disambiguator for crate-root "mycrate" 0xca63f166dbe9293 + 1 └──────────────────── crate-root
Recommended demangling:
mycrate::example
Note: The compiler may re-use the crate-root form to express arbitrary unscoped, undisambiguated identifiers, such as for new basic types that have not been added to the grammar yet. To achieve that, it will emit a crate-root without an explicit disambiguator, relying on the fact that such an undisambiguated crate name cannot occur in practice. For example, the basic type
f128
would be encode asC4f128
. For this to have the desired effect, demanglers are expected to never render zero disambiguators of crate roots. I.e.C4f128
is expected to be displayed asf128
and notf128[0]
.
Path: Inherent impl
An inherent-impl indicates a path to an inherent implementation.
It consists of the character M
followed by an impl-path, which uniquely identifies the impl block the item is defined in.
Following that is a type representing the Self
type of the impl.
Recommended Demangling
An inherent-impl can be displayed as a qualified path segment to the type within angled brackets. The impl-path usually need not be displayed.
Example:
struct Example; impl Example { fn foo() {} }
The symbol for
foo
in the impl forExample
is:_RNvMs_Cs4Cv8Wi1oAIB_7mycrateNtB4_7Example3foo │├┘└─────────┬──────────┘└────┬──────┘ ││ │ │ ││ │ └── Self type "Example" ││ └─────────────────── path to the impl's parent "mycrate" │└─────────────────────────────── disambiguator 1 └──────────────────────────────── inherent-impl
Recommended demangling:
<mycrate::Example>::foo
Path: Trait impl
A trait-impl indicates a path to a trait implementation.
It consists of the character X
followed by an impl-path to the impl's parent followed by the type representing the Self
type of the impl followed by a path to the trait.
Recommended Demangling
A trait-impl can be displayed as a qualified path segment using the
<
typeas
path>
syntax. The impl-path usually need not be displayed.
Example:
struct Example; trait Trait { fn foo(); } impl Trait for Example { fn foo() {} }
The symbol for
foo
in the trait impl forExample
is:_RNvXCs15kBYyAo9fc_7mycrateNtB2_7ExampleNtB2_5Trait3foo │└─────────┬──────────┘└─────┬─────┘└────┬────┘ │ │ │ │ │ │ │ └── path to the trait "Trait" │ │ └────────────── Self type "Example" │ └──────────────────────────────── path to the impl's parent "mycrate" └─────────────────────────────────────────── trait-impl
Recommended demangling:
<mycrate::Example as mycrate::Trait>::foo
Path: Impl
impl-path → disambiguatoropt path
An impl-path is a path used for inherent-impl and trait-impl to indicate the path to parent of an implementation. It consists of an optional disambiguator followed by a path. The path is the path to the parent that contains the impl. The disambiguator can be used to distinguish between multiple impls within the same parent.
Recommended Demangling
An impl-path usually need not be displayed (unless the location of the impl is desired).
Example:
struct Example; impl Example { fn foo() {} } impl Example { fn bar() {} }
The symbol for
foo
in the impl forExample
is:_RNvMCs7qp2U7fqm6G_7mycrateNtB2_7Example3foo └─────────┬──────────┘ │ └── path to the impl's parent crate-root "mycrate"
The symbol for
bar
is similar, though it has a disambiguator to indicate it is in a different impl block._RNvMs_Cs7qp2U7fqm6G_7mycrateNtB4_7Example3bar ├┘└─────────┬──────────┘ │ │ │ └── path to the impl's parent crate-root "mycrate" └────────────── disambiguator 1
Recommended demangling:
foo
:<mycrate::Example>::foo
bar
:<mycrate::Example>::bar
Path: Trait definition
A trait-definition is a path to a trait definition.
It consists of the character Y
followed by the type which is the Self
type of the referrer, followed by the path to the trait definition.
Recommended Demangling
A trait-definition can be displayed as a qualified path segment using the
<
typeas
path>
syntax.
Example:
trait Trait { fn example() {} } struct Example; impl Trait for Example {}
The symbol for
example
in the traitTrait
implemented forExample
is:_RNvYNtCs15kBYyAo9fc_7mycrate7ExampleNtB4_5Trait7exampleB4_ │└──────────────┬───────────────┘└────┬────┘ │ │ │ │ │ └── path to the trait "Trait" │ └──────────────────────── path to the implementing type "mycrate::Example" └──────────────────────────────────────── trait-definition
Recommended demangling:
<mycrate::Example as mycrate::Trait>::example
Path: Nested path
nested-path →
N
namespace path identifier
A nested-path is a path representing an optionally named entity.
It consists of the character N
followed by a namespace indicating the namespace of the entity,
followed by a path which is a path representing the parent of the entity,
followed by an identifier of the entity.
The identifier of the entity may have a length of 0 when the entity is not named. For example, entities like closures, tuple-like struct constructors, and anonymous constants may not have a name. The identifier may still have a disambiguator unless the disambiguator is 0.
Recommended Demangling
A nested-path can be displayed by first displaying the path followed by a
::
separator followed by the identifier. If the identifier is empty, then the separating::
should not be displayed.If a namespace is specified, then extra context may be added such as:
path::{
namespace (:
identifier)opt#
disambiguatoras base-10 number}
Here the namespace
C
may be printed asclosure
andS
asshim
. Others may be printed by their character tag. The:
name portion may be skipped if the name is empty.The disambiguator in the identifier may be displayed if a namespace is specified. In other situations, it is usually not necessary to display the disambiguator. If it is displayed, it is recommended to place it in brackets, for example
[284a76a8b41a7fd3]
. If the disambiguator is not present, then its value is 0 and it can always be omitted from display.
Example:
fn main() { let x = || {}; let y = || {}; x(); y(); }
The symbol for the closure
x
in cratemycrate
is:_RNCNvCsgStHSCytQ6I_7mycrate4main0B3_ ││└─────────────┬─────────────┘│ ││ │ │ ││ │ └── identifier with length 0 ││ └───────────────── path to "mycrate::main" │└──────────────────────────────── closure namespace └───────────────────────────────── nested-path
The symbol for the closure
y
is similar, with a disambiguator:_RNCNvCsgStHSCytQ6I_7mycrate4mains_0B3_ ││ │└── base-62-number 0 └─── disambiguator 1 (base-62-number+1)
Recommended demangling:
x
:mycrate::main::{closure#0}
y
:mycrate::main::{closure#1}
Path: Generic arguments
generic-args →
I
path {generic-arg}E
A generic-args is a path representing a list of generic arguments.
It consists of the character I
followed by a path to the defining entity, followed by zero or more generic-args terminated by the character E
.
Each generic-arg is either a lifetime (starting with the character L
), a type, or the character K
followed by a const representing a const argument.
Recommended Demangling
A generic-args may be printed as: path
::
opt<
comma-separated list of args>
The::
separator may be elided for type paths (similar to Rust's rules).
Example:
fn main() { example([123]); } fn example<T, const N: usize>(x: [T; N]) {}
The symbol for the function
example
is:_RINvCsgStHSCytQ6I_7mycrate7examplelKj1_EB2_ │└──────────────┬───────────────┘││││││ │ │ │││││└── end of generic-args │ │ ││││└─── end of const-data │ │ │││└──── const value `1` │ │ ││└───── const type `usize` │ │ │└────── const generic │ │ └─────── generic type i32 │ └──────────────────────── path to "mycrate::example" └──────────────────────────────────────── generic-args
Recommended demangling:
mycrate::example::<i32, 1>
Namespace
A namespace is used to segregate names into separate logical groups, allowing identical names to otherwise avoid collisions. It consists of a single character of an upper or lowercase ASCII letter. Lowercase letters are reserved for implementation-internal disambiguation categories (and demanglers should never show them). Uppercase letters are used for special namespaces which demanglers may display in a special way.
Uppercase namespaces are:
C
— A closure.S
— A shim. Shims are added by the compiler in some situations where an intermediate is needed. For example, afn()
pointer to a function with the#[track_caller]
attribute needs a shim to deal with the implicit caller location.
Recommended Demangling
See nested-path for recommended demangling.
Identifier
identifier → disambiguatoropt undisambiguated-identifier
undisambiguated-identifier →
u
opt decimal-number_
opt bytesbytes → {UTF-8 bytes}
An identifier is a named label used in a path to refer to an entity. It consists of an optional disambiguator followed by an undisambiguated-identifier.
The disambiguator is used to disambiguate identical identifiers that should not otherwise be considered the same. For example, closures have no name, so the disambiguator is the only differentiating element between two different closures in the same parent path.
The undisambiguated-identifier starts with an optional u
character,
which indicates that the identifier is encoded in Punycode.
The next part is a decimal-number which indicates the length of the bytes.
Following the identifier size is an optional _
character which is used to separate the length value from the identifier itself.
The _
is mandatory if the bytes starts with a decimal digit or _
in order to keep it unambiguous where the decimal-number ends and the bytes starts.
bytes is the identifier itself encoded in UTF-8.
Recommended Demangling
The display of an identifier can depend on its context. If it is Punycode-encoded, then it may first be decoded before being displayed.
The disambiguator may or may not be displayed; see recommendations for rules that use identifier.
Punycode identifiers
Because some environments are restricted to ASCII alphanumerics and _
,
Rust's Unicode identifiers may be encoded using a modified version of Punycode.
For example, the function:
mod gödel {
mod escher {
fn bach() {}
}
}
would be mangled as:
_RNvNtNtCsgOH4LzxkuMq_7mycrateu8gdel_5qa6escher4bach
││└───┬──┘
││ │
││ └── gdel_5qa translates to gödel
│└─────── 8 is the length
└──────── `u` indicates it is a Unicode identifier
Standard Punycode generates strings of the form ([[:ascii:]]+-)?[[:alnum:]]+
.
This is problematic because the -
character
(which is used to separate the ASCII part from the base-36 encoding)
is not in the supported character set for symbols.
For this reason, -
characters in the Punycode encoding are replaced with _
.
Here are some examples:
Original | Punycode | Punycode + Encoding |
---|---|---|
føø | f-5gaa | f_5gaa |
α_ω | _-ylb7e | __ylb7e |
铁锈 | n84amf | n84amf |
🤦 | fq9h | fq9h |
ρυστ | 2xaedc | 2xaedc |
Note: It is up to the compiler to decide whether or not to encode identifiers using Punycode or not. Some platforms may have native support for UTF-8 symbols, and the compiler may decide to use the UTF-8 encoding directly. Demanglers should be prepared to support either form.
Disambiguator
disambiguator →
s
base-62-number
A disambiguator is used in various parts of a symbol path to uniquely identify path elements that would otherwise be identical but should not be considered the same.
It starts with the character s
and is followed by a base-62-number.
If the disambiguator is not specified, then its value can be assumed to be zero.
Otherwise, when demangling, the value 1 should be added to the base-62-number
(thus a base-62-number of zero encoded as _
has a value of 1).
This allows disambiguators that are encoded sequentially to use minimal bytes.
Recommended Demangling
The disambiguator may or may not be displayed; see recommendations for rules that use disambiguator. Generally, it is recommended that zero disambiguators are never displayed unless their accompanying identifier is empty (like is the case for unnamed items such as closures). When rendering a disambiguator, it can be shortened to a length reasonable for the context, similar to how git commit hashes are rarely displayed in full.
Lifetime
lifetime →
L
base-62-number
A lifetime is used to encode an anonymous (numbered) lifetime, either erased or higher-ranked.
It starts with the character L
and is followed by a base-62-number.
Index 0 is always erased.
Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime bound by one of the enclosing binders.
Recommended Demangling
A lifetime may be displayed like a Rust lifetime using a single quote.
Index 0 should be displayed as
'_
. Index 0 should not be displayed for lifetimes in a ref-type, mut-ref-type, or dyn-trait-type.A lifetime can be displayed by converting the De Bruijn index to a De Bruijn level (level = number of bound lifetimes - index) and selecting a unique name for each level. For example, starting with single lowercase letters such as
'a
for level 0. Levels over 25 may consider printing the numeric lifetime as in'_123
. See binder for more on lifetime indexes and ordering.
Example:
fn main() { example::<fn(&u8, &u16)>(); } pub fn example<T>() {}
The symbol for the function
example
is:_RINvCs7qp2U7fqm6G_7mycrate7exampleFG0_RL1_hRL0_tEuEB2_ │└┬┘│└┬┘││└┬┘││ │ │ │ │ ││ │ │└── end of input types │ │ │ │ ││ │ └─── type u16 │ │ │ │ ││ └───── lifetime #1 'b │ │ │ │ │└─────── reference type │ │ │ │ └──────── type u8 │ │ │ └────────── lifetime #2 'a │ │ └──────────── reference type │ └────────────── binder with 2 lifetimes └──────────────── function type
Recommended demangling:
mycrate::example::<for<'a, 'b> fn(&'a u8, &'b u16)>
Const
const →
type const-data
|p
| backrefconst-data →
n
opt {hex-digit}_
A const is used to encode a const value used in generics and types. It has the following forms:
- A constant value encoded as a type which represents the type of the constant and const-data which is the constant value, followed by
_
to terminate the const. - The character
p
which represents a placeholder. - A backref to a previously encoded const of the same value.
The encoding of the const-data depends on the type:
bool
— The valuefalse
is encoded as0_
, the value true is encoded as1_
.char
— The Unicode scalar value of the character is encoded in hexadecimal.- Unsigned integers — The value is encoded in hexadecimal.
- Signed integers — The character
n
is a prefix to indicate that it is negative, followed by the absolute value encoded in hexadecimal.
Recommended Demangling
A const may be displayed by the const value depending on the type.
The
p
placeholder should be displayed as the_
character.For specific types:
b
(bool) — Display astrue
orfalse
.c
(char) — Display the character in as a Rust character (such as'A'
or'\n'
).- integers — Display the integer (either in decimal or hex).
Example:
fn main() { example::<0x12345678>(); } pub fn example<const N: u64>() {}
The symbol for function
example
is:_RINvCs7qp2U7fqm6G_7mycrate7exampleKy12345678_EB2_ ││└───┬───┘ ││ │ ││ └── const-data 0x12345678 │└─────── const type u64 └──────── const generic arg
Recommended demangling:
mycrate::example::<305419896>
Placeholders
A placeholder may occur in circumstances where a type or const value is not relevant.
Example:
pub struct Example<T, const N: usize>([T; N]); impl<T, const N: usize> Example<T, N> { pub fn foo() -> &'static () { static EXAMPLE_STATIC: () = (); &EXAMPLE_STATIC } }
In this example, the static
EXAMPLE_STATIC
would not be monomorphized by the type or const parametersT
andN
. Those will use the placeholder for those generic arguments. Its symbol is:_RNvNvMCsd9PVOYlP1UU_7mycrateINtB4_7ExamplepKpE3foo14EXAMPLE_STATIC │ │││ │ ││└── const placeholder │ │└─── const generic argument │ └──── type placeholder └────────────────── generic-args
Recommended demangling:
<mycrate::Example<_, _>>::foo::EXAMPLE_STATIC
Type
type →
basic-type
| array-type
| slice-type
| tuple-type
| ref-type
| mut-ref-type
| const-ptr-type
| mut-ptr-type
| fn-type
| dyn-trait-type
| path
| backref
A type represents a Rust type. The initial character can be used to distinguish which type is encoded. The type encodings based on the initial tag character are:
- A basic-type is encoded as a single character:
a
—i8
b
—bool
c
—char
d
—f64
e
—str
f
—f32
h
—u8
i
—isize
j
—usize
l
—i32
m
—u32
n
—i128
o
—u128
s
—i16
t
—u16
u
— unit()
v
— variadic...
x
—i64
y
—u64
z
—!
p
— placeholder_
Remaining primitives are encoded as a crate production, e.g. C4f128
.
-
A
— An array[T; N]
.The tag
A
is followed by the type of the array followed by a const for the array size. -
S
— A slice[T]
.slice-type →
S
typeThe tag
S
is followed by the type of the slice. -
T
— A tuple(T1, T2, T3, ...)
.tuple-type →
T
{type}E
The tag
T
is followed by one or more types indicating the type of each field, followed by a terminatingE
character.Note that a zero-length tuple (unit) is encoded with the
u
basic-type. -
R
— A reference&T
.The tag
R
is followed by an optional lifetime followed by the type of the reference. The lifetime is not included if it has been erased. -
Q
— A mutable reference&mut T
.The tag
Q
is followed by an optional lifetime followed by the type of the mutable reference. The lifetime is not included if it has been erased. -
P
— A constant raw pointer*const T
.The tag
P
is followed by the type of the pointer.const-ptr-type →
P
type -
O
— A mutable raw pointer*mut T
.mut-ptr-type →
O
typeThe tag
O
is followed by the type of the pointer. -
F
— A function pointerfn(…) -> …
.fn-type →
F
fn-sigfn-sig → binderopt
U
opt (K
abi)opt {type}E
typeabi →
C
| undisambiguated-identifierThe tag
F
is followed by a fn-sig of the function signature. A fn-sig is the signature for a function pointer.It starts with an optional binder which represents the higher-ranked trait bounds (
for<…>
).Following that is an optional
U
character which is present for anunsafe
function.Following that is an optional
K
character which indicates that an abi is specified. If the ABI is not specified, it is assumed to be the"Rust"
ABI.The abi can be the letter
C
to indicate it is the"C"
ABI. Otherwise it is an undisambiguated-identifier of the ABI string with dashes converted to underscores.Following that is zero or more types which indicate the input parameters of the function.
Following that is the character
E
and then the type of the return value.
-
D
— A trait objectdyn Trait<Assoc=X> + Send + 'a
.dyn-trait-type →
D
dyn-bounds lifetimedyn-bounds → binderopt {dyn-trait}
E
dyn-trait → path {dyn-trait-assoc-binding}
dyn-trait-assoc-binding →
p
undisambiguated-identifier typeThe tag
D
is followed by a dyn-bounds which encodes the trait bounds, followed by a lifetime of the trait object lifetime bound.A dyn-bounds starts with an optional binder which represents the higher-ranked trait bounds (
for<…>
). Following that is a sequence of dyn-trait terminated by the characterE
.Each dyn-trait represents a trait bound, which consists of a path to the trait followed by zero or more dyn-trait-assoc-binding which list the associated types.
Each dyn-trait-assoc-binding consists of a character
p
followed a undisambiguated-identifier representing the associated binding name, and finally a type.
Recommended Demangling
A type may be displayed as the type it represents, using typical Rust syntax to represent the type.
Example:
fn main() { example::<[u16; 8]>(); } pub fn example<T>() {}
The symbol for function
example
is:_RINvCs7qp2U7fqm6G_7mycrate7exampleAtj8_EB2_ │││├┘│ ││││ └─── end of generic args │││└───── const data 8 ││└────── const type usize │└─────── array element type u16 └──────── array type
Recommended demangling:
mycrate::example::<[u16; 8]>
Binder
binder →
G
base-62-number
A binder represents the number of higher-ranked trait bound lifetimes to bind.
It consists of the character G
followed by a base-62-number.
The value 1 should be added to the base-62-number when decoding
(such that the base-62-number encoding of _
is interpreted as having 1 binder).
A lifetime rule can then refer to these numbered lifetimes. The lowest indices represent the innermost lifetimes. The number of bound lifetimes is the value of base-62-number plus one.
For example, in for<'a, 'b> fn(for<'c> fn (...))
, any lifetimes in ...
(but not inside more binders) will observe the indices 1, 2, and 3 to refer to 'c
, 'b
, and 'a
, respectively.
Recommended Demangling
A binder may be printed using
for<…>
syntax listing the lifetimes as recommended in lifetime. See lifetime for an example.
Backref
backref →
B
base-62-number
A backref is used to refer to a previous part of the mangled symbol. This provides a simple form of compression to reduce the length of the mangled symbol. This can help reduce the amount of work and resources needed by the compiler, linker, and loader.
It consists of the character B
followed by a base-62-number.
The number indicates the 0-based offset in bytes starting from just after the _R
prefix of the symbol.
The backref represents the corresponding element starting at that position.
backrefs always refer to a position before the backref itself.
The backref compression relies on the fact that all substitutable symbol elements have a self-terminating mangled form. Given the start position of the encoded node, the grammar guarantees that it is always unambiguous where the node ends. This is ensured by not allowing optional or repeating elements at the end of substitutable productions.
Recommended Demangling
A backref should be demangled by rendering the element that it points to. Care should be considered when handling deeply nested backrefs to avoid using too much stack.
Example:
fn main() { example::<Example, Example>(); } struct Example; pub fn example<T, U>() {}
The symbol for function
example
is:_RINvCs7qp2U7fqm6G_7mycrate7exampleNtB2_7ExampleBw_EB2_ │├┘ │├┘ │├┘ ││ ││ ││ ││ ││ │└── backref to offset 3 (crate-root) ││ ││ └─── backref for instantiating-crate path ││ │└────── backref to offset 33 (path to Example) ││ └─────── backref for second generic-arg │└───────────────── backref to offset 3 (crate-root) └────────────────── backref for first generic-arg (first segment of Example path)
Recommended demangling:
mycrate::example::<mycrate::Example, mycrate::Example>
Instantiating crate
instantiating-crate → path
The instantiating-crate is an optional element of the symbol-name which can be used to indicate which crate is instantiating the symbol. It consists of a single path.
This helps differentiate symbols that would otherwise be identical, for example the monomorphization of a function from an external crate may result in a duplicate if another crate is also instantiating the same generic function with the same types.
In practice, the instantiating crate is also often the crate where the symbol is defined, so it is usually encoded as a backref to the crate-root encoded elsewhere in the symbol.
Recommended Demangling
The instantiating-crate usually need not be displayed.
Example:
std::path::Path::new("example");
The symbol for
Path::new::<str>
instantiated from themycrate
crate is:_RINvMsY_NtCseXNvpPnDBDp_3std4pathNtB6_4Path3neweECs7qp2U7fqm6G_7mycrate └──┬───┘ │ └── instantiating crate identifier `mycrate`
Recommended demangling:
<std::path::Path>::new::<str>
Vendor-specific suffix
vendor-specific-suffix → (
.
|$
) suffixsuffix → {byte}
The vendor-specific-suffix is an optional element at the end of the symbol-name.
It consists of either a .
or $
character followed by zero or more bytes.
There are no restrictions on the characters following the period or dollar sign.
This suffix is added as needed by the implementation.
One example where this can happen is when locally unique names need to become globally unique.
LLVM can append a .llvm.<numbers>
suffix during LTO to ensure a unique name,
and $
can be used for thread-local data on Mach-O.
In these situations it's generally fine to ignore the suffix;
the suffixed name has the same semantics as the original.
Recommended Demangling
The vendor-specific-suffix usually need not be displayed.
Example:
use std::cell::RefCell; thread_local! { pub static EXAMPLE: RefCell<u32> = RefCell::new(1); }
The symbol for
EXAMPLE
on macOS may have the following for thread-local data:_RNvNvNvCs7qp2U7fqm6G_7mycrate7EXAMPLE7___getit5___KEY$tlv$init └───┬───┘ │ └── vendor-specific-suffix
Recommended demangling:
mycrate::EXAMPLE::__getit::__KEY
Common rules
decimal-number →
0
| non-zero-digit {digit}non-zero-digit →
1
|2
|3
|4
|5
|6
|7
|8
|9
digit →0
| non-zero-digitlower →
a
|b
|c
|d
|e
|f
|g
|h
|i
|j
|k
|l
|m
|n
|o
|p
|q
|r
|s
|t
|u
|v
|w
|x
|y
|z
upper →
A
|B
|C
|D
|E
|F
|G
|H
|I
|J
|K
|L
|M
|N
|O
|P
|Q
|R
|S
|T
|U
|V
|W
|X
|Y
|Z
A decimal-number is encoded as one or more digits indicating a numeric value in decimal.
The value zero is encoded as a single byte 0
.
Beware that there are situations where 0
may be followed by another digit that should not be decoded as part of the decimal-number.
For example, a zero-length identifier within a nested-path which is in turn inside another nested-path will result in two identifiers in a row, where the first one only has the encoding of 0
.
A digit is an ASCII number.
A lower and upper is an ASCII lower and uppercase letter respectively.
base-62-number
base-62-number → { digit | lower | upper }
_
A base-62-number is an encoding of a numeric value.
It uses ASCII numbers and lowercase and uppercase letters.
The value is terminated with the _
character.
If the value is 0, then the encoding is the _
character without any digits.
Otherwise, one is subtracted from the value, and it is encoded with the mapping:
0
-9
maps to 0-9a
-z
maps to 10 to 35A
-Z
maps to 36 to 61
The number is repeatedly divided by 62 (with integer division round towards zero) to choose the next character in the sequence. The remainder of each division is used in the mapping to choose the next character. This is repeated until the number is 0. The final sequence of characters is then reversed.
Decoding is a similar process in reverse.
Examples:
Value | Encoding |
---|---|
0 | _ |
1 | 0_ |
11 | a_ |
62 | Z_ |
63 | 10_ |
1000 | g7_ |
Symbol grammar summary
The following is a summary of all of the productions of the symbol grammar.
symbol-name →
_R
decimal-numberopt path instantiating-crateopt vendor-specific-suffixoptpath →
crate-root
| inherent-impl
| trait-impl
| trait-definition
| nested-path
| generic-args
| backrefcrate-root →
C
identifier
inherent-impl →M
impl-path type
trait-impl →X
impl-path type path
trait-definition →Y
type path
nested-path →N
namespace path identifier
generic-args →I
path {generic-arg}E
identifier → disambiguatoropt undisambiguated-identifier
undisambiguated-identifier →u
opt decimal-number_
opt bytes
bytes → {UTF-8 bytes}disambiguator →
s
base-62-numberimpl-path → disambiguatoropt path
type →
basic-type
| array-type
| slice-type
| tuple-type
| ref-type
| mut-ref-type
| const-ptr-type
| mut-ptr-type
| fn-type
| dyn-trait-type
| path
| backrefbasic-type → lower
array-type →A
type const
slice-type →S
type
tuple-type →T
{type}E
ref-type →R
lifetimeopt type
mut-ref-type →Q
lifetimeopt type
const-ptr-type →P
type
mut-ptr-type →O
type
fn-type →F
fn-sig
dyn-trait-type →D
dyn-bounds lifetimegeneric-arg →
lifetime
| type
|K
constconst →
type const-data
|p
| backrefconst-data →
n
opt {hex-digit}_
hex-digit → digit |
a
|b
|c
|d
|e
|f
fn-sig → binderopt
U
opt (K
abi)opt {type}E
typeabi →
C
| undisambiguated-identifierdyn-bounds → binderopt {dyn-trait}
E
dyn-trait → path {dyn-trait-assoc-binding}
dyn-trait-assoc-binding →p
undisambiguated-identifier typebinder →
G
base-62-numbervendor-specific-suffix → (
.
|$
) suffix
suffix → {byte}decimal-number →
0
| non-zero-digit {digit}base-62-number → { digit | lower | upper }
_
non-zero-digit →
1
|2
|3
|4
|5
|6
|7
|8
|9
digit →0
| non-zero-digit
lower →a
|b
|c
|d
|e
|f
|g
|h
|i
|j
|k
|l
|m
|n
|o
|p
|q
|r
|s
|t
|u
|v
|w
|x
|y
|z
upper →A
|B
|C
|D
|E
|F
|G
|H
|I
|J
|K
|L
|M
|N
|O
|P
|Q
|R
|S
|T
|U
|V
|W
|X
|Y
|Z
Encoding of Rust entities
The following are guidelines for how Rust entities are encoded in a symbol. The compiler has some latitude in how an entity is encoded as long as the symbol is unambiguous.
-
Named functions, methods, and statics shall be represented by a path production.
-
Paths should be rooted at the inner-most entity that can act as a path root. Roots can be crate-ids, inherent impls, trait impls, and (for items within default methods) trait definitions.
-
The compiler is free to choose disambiguation indices and namespace tags from the reserved ranges as long as it ascertains identifier unambiguity.
-
Generic arguments that are equal to the default should not be encoded in order to save space.