Using Box<T> to Point to Data on the Heap
The most straightforward smart pointer is a box, whose type is written Box
- Boxes allow you to store data on the heap rather than the stack.
- What remains on the stack is the pointer to the heap data.
- Refer to Chapter 4 to review the difference between the stack and the heap.
You’ll use them most often in these situations:
- When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
- When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
- When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type
- Using Box<T> to Point to Data on the Heap
- Using a Box<T> to Store Data on the Heap
- Enabling Recursive Types with Boxes
The most straightforward smart pointer is a box, whose type is written
Box<T>:
- Boxes allow you to store data on the heap rather than the stack.
- What remains on the stack is the pointer to the heap data.
- Refer to Chapter 4 to review the difference between the stack and the heap.
Boxes don’t have performance overhead, other than storing their data on the heap instead of on the stack. But they don’t have many extra capabilities either.
You’ll use them most often in these situations:
- When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
- When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
- When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type
- We’ll demonstrate the first situation in the “Enabling Recursive Types with Boxes” section.
- In the second case, transferring ownership of a large amount of data can take a long time because the data is copied around on the stack. To improve performance in this situation, we can store the large amount of data on the heap in a box. Then, only the small amount of pointer data is copied around on the stack, while the data it references stays in one place on the heap.
- The third case is known as a trait object, and Chapter 17 devotes an entire section, “Using Trait Objects That Allow for Values of Different Types,” just to that topic. So what you learn here you’ll apply again in Chapter 17!
Using a Box<T> to Store Data on the Heap
Before we discuss the heap storage use case for Box<T>, we’ll cover the
syntax and how to interact with values stored within a Box<T>.
Listing 15-1 shows how to use a box to store an i32 value on the heap:
Filename: src/main.rs
fn main() {
let b = Box::new(5);
println!("b = {}", b);
}
Listing 15-1: Storing an i32 value on the heap using a
box
- We define the variable
bto have the value of aBoxthat points to the value5, which is allocated on the heap. - This program will print
b = 5; in this case, we can access the data in the box similar to how we would if this data were on the stack. - Just like any owned value, when a box goes out of
scope, as
bdoes at the end ofmain, it will be deallocated. - The deallocation happens both for the box (stored on the stack) and the data it points to (stored on the heap).
Putting a single value on the heap isn’t very useful, so you won’t use boxes by
themselves in this way very often. Having values like a single i32 on the
stack, where they’re stored by default, is more appropriate in the majority of
situations.
Let’s look at a case where boxes allow us to define types that we wouldn’t be allowed to if we didn’t have boxes.
Enabling Recursive Types with Boxes
A value of recursive type can have another value of the same type as part of itself.
Recursive types pose an issue because at compile time Rust needs to know how much space a type takes up. However, the nesting of values of recursive types could theoretically continue infinitely, so Rust can’t know how much space the value needs.
Because boxes have a known size, we can enable recursive types by inserting a box in the recursive type definition.
As an example of a recursive type, let’s explore the cons list. This is a data type commonly found in functional programming languages:
- The cons list type we’ll define is straightforward except for the recursion;
- therefore, the concepts in the example we’ll work with will be useful any time you get into more complex situations involving recursive types.
More Information About the Cons List
A cons list is a data structure that comes from the Lisp programming language and its dialects and is made up of nested pairs, and is the Lisp version of a linked list.
Its name comes from the
consfunction (short for “construct function”) in Lisp that constructs a new pair from its two arguments. By callingconson a pair consisting of a value and another pair, we can construct cons lists made up of recursive pairs.
For example, here’s a pseudocode representation of a cons list containing the list 1, 2, 3 with each pair in parentheses:
(1, (2, (3, Nil)))
Each item in a cons list contains two elements: the value of the current item and the next item.
- The last item in the list contains only a value called
Nilwithout a next item. - A cons list is produced by recursively calling the
consfunction. - The canonical name to denote the base case of the recursion is
Nil. - Note that this is not the same as the “null” or “nil” concept in Chapter 6, which is an invalid or absent value.
The cons list isn’t a commonly used data structure in Rust.
-
Most of the time when you have a list of items in Rust,
Vec<T>is a better choice to use. -
Other, more complex recursive data types are useful in various situations,
but by starting with the cons list in this chapter, we can explore how boxes let us define a recursive data type without much distraction.
Listing 15-2 contains an enum definition for a cons list. Note that this code
won’t compile yet because the List type doesn’t have a known size, which
we’ll demonstrate.
Filename: src/main.rs
enum List {
Cons(i32, List),
Nil,
}
fn main() {}
Listing 15-2: The first attempt at defining an enum to
represent a cons list data structure of i32 values
Note: We’re implementing a cons list that holds only
i32values for the purposes of this example. We could have implemented it using generics, as we discussed in Chapter 10, to define a cons list type that could store values of any type.
Using the List type to store the list 1, 2, 3 would look like the code in
Listing 15-3:
Filename: src/main.rs
enum List {
Cons(i32, List),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(1, Cons(2, Cons(3, Nil)));
}
Listing 15-3: Using the List enum to store the list 1, 2, 3
- The first
Consvalue holds1and anotherListvalue. - This
Listvalue is anotherConsvalue that holds2and anotherListvalue. - This
Listvalue is one moreConsvalue that holds3and aListvalue, which is finallyNil, the non-recursive variant that signals the end of the list.
If we try to compile the code in Listing 15-3, we get the error shown in Listing 15-4:
$ cargo run
Compiling cons-list v0.1.0 (file:///projects/cons-list)
error[E0072]: recursive type `List` has infinite size
--> src/main.rs:1:1
|
1 | enum List {
| ^^^^^^^^^ recursive type has infinite size
2 | Cons(i32, List),
| ---- recursive without indirection
|
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `List` representable
|
2 | Cons(i32, Box<List>),
| ++++ +
error[E0391]: cycle detected when computing drop-check constraints for `List`
--> src/main.rs:1:1
|
1 | enum List {
| ^^^^^^^^^
|
= note: ...which immediately requires computing drop-check constraints for `List` again
= note: cycle used when computing dropck types for `Canonical { max_universe: U0, variables: [], value: ParamEnvAnd { param_env: ParamEnv { caller_bounds: [], reveal: UserFacing, constness: NotConst }, value: List } }`
Some errors have detailed explanations: E0072, E0391.
For more information about an error, try `rustc --explain E0072`.
error: could not compile `cons-list` due to 2 previous errors
Listing 15-4: The error we get when attempting to define a recursive enum
- The error shows this type “has infinite size.”
- The reason is that we’ve defined
Listwith a variant that is recursive: it holds another value of itself directly. - As a result, Rust can’t figure out how much space it needs to store a
Listvalue.
Let’s break down why we get this error.
- First, we’ll look at how Rust decides how much space it needs to store a value of a non-recursive type.
Computing the Size of a Non-Recursive Type
Recall the Message enum we defined in Listing 6-2 when we discussed enum
definitions in Chapter 6:
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
fn main() {}
To determine how much space to allocate for a Message value, Rust goes
through each of the variants to see which variant needs the most space.
- Rust sees that
Message::Quitdoesn’t need any space,Message::Moveneeds enough space to store twoi32values, and so forth. - Because only one variant will be
used, the most space a
Messagevalue will need is the space it would take to store the largest of its variants.
Contrast this with what happens when Rust tries to determine how much space a recursive type like the
Listenum in Listing 15-2 needs.
- The compiler starts
by looking at the
Consvariant, which holds a value of typei32and a value of typeList. - Therefore,
Consneeds an amount of space equal to the size of ani32plus the size of aList. - To figure out how much memory the
Listtype needs, the compiler looks at the variants, starting with theConsvariant. - The
Consvariant holds a value of typei32and a value of typeList, and this process continues infinitely, as shown in Figure 15-1.
Figure 15-1: An infinite List consisting of infinite
Cons variants
Using Box<T> to Get a Recursive Type with a Known Size
Because Rust can’t figure out how much space to allocate for recursively defined types, the compiler gives an error with this helpful suggestion:
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `List` representable
|
2 | Cons(i32, Box<List>),
| ++++ +
In this suggestion, “indirection” means that instead of storing a value directly, we should change the data structure to store the value indirectly by storing a pointer to the value instead.
Because a
Box<T>is a pointer, Rust always knows how much space aBox<T>needs: a pointer’s size doesn’t change based on the amount of data it’s pointing to.
- This means we can put a
Box<T>inside theConsvariant instead of anotherListvalue directly. - The
Box<T>will point to the nextListvalue that will be on the heap rather than inside theConsvariant. - Conceptually, we still have a list, created with lists holding other lists, but this implementation is now more like placing the items next to one another rather than inside one another.
We can change the definition of the List enum in Listing 15-2 and the usage
of the List in Listing 15-3 to the code in Listing 15-5, which will compile:
Filename: src/main.rs
enum List {
Cons(i32, Box<List>),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}
Listing 15-5: Definition of List that uses Box<T> in
order to have a known size
- The
Consvariant needs the size of ani32plus the space to store the box’s pointer data. - The
Nilvariant stores no values, so it needs less space than theConsvariant. - We now know that any
Listvalue will take up the size of ani32plus the size of a box’s pointer data.
By using a box, we’ve broken the infinite, recursive chain, so the compiler can figure out the size it needs to store a
Listvalue.
Figure 15-2 shows what the Cons variant looks like now.
Figure 15-2: A List that is not infinitely sized
because Cons holds a Box
- Boxes provide only the indirection and heap allocation;
- they don’t have any other special capabilities, like those we’ll see with the other smart pointer types.
- They also don’t have the performance overhead that these special capabilities incur, so they can be useful in cases like the cons list where the indirection is the only feature we need.
- We’ll look at more use cases for boxes in Chapter 17, too.
The
Box<T>type is a smart pointer because it implements theDereftrait, which allowsBox<T>values to be treated like references.
When a
Box<T>value goes out of scope, the heap data that the box is pointing to is cleaned up as well because of theDroptrait implementation.
These two traits will be even more important to the functionality provided by the other smart pointer types we’ll discuss in the rest of this chapter. Let’s explore these two traits in more detail.