Announcing Better Support for Fuzzing with Structured Inputs in Rust
New releases of arbitrary, libfuzzer-sys, and cargo fuzz provide better support for fuzzing with custom, well-formed inputs.
Today, on behalf of the Rust Fuzzing Authority, I’d like to announce new releases of the arbitrary, libfuzzer-sys, and cargo fuzz crates. Collectively, these releases better support writing fuzz targets that take well-formed instances of custom input types. This enables us to combine powerful, coverage-guided fuzzers with smart test case generation.
Install or upgrade cargo fuzz with:
cargo install --force cargo-fuzz
To upgrade your fuzz targets, bump your libfuzzer-sys dependency to 0.2.0 on crates.io. That should be all that’s needed for most cases. However, if you were already using Arbitrary inputs for your fuzz target, some changes will be required. See the Upgrading Fuzz Targets section below for more details.
Fuzzing with Well-Formed, Custom Inputs
Imagine we are testing an ELF object file parser. In this scenario, it makes sense to fuzz the parser by feeding it raw byte buffers as input. Parsers for binary formats take raw byte buffers as input; there’s no impedance mismatch here. Additionally, coverage-guided fuzzers like libFuzzer naturally generate byte buffers as inputs for fuzz targets, so getting fuzzing up and running is easy.
Now instead imagine we are testing a color conversion library: converting RGB colors to HSL colors and back. Our color conversion functions don’t take raw byte buffers as inputs; they take Rgb or Hsl structures. And these structures are defined locally by our library; libFuzzer doesn’t have any knowledge of them and can’t generate instances of them on its own.
We could write a color string parser, parse colors from the fuzzer-provided, raw input bytes, and then pass the parsed results into our color conversion functions. But now the fuzzer is going to spend a bunch of time exploring and testing our color string parser before it can get “deeper” into the fuzz target, where our color conversions are. Small changes to the input can result in invalid color strings that bounce off the parser. Ultimately, our testing is less efficient than we want, because our real goal is to exercise the conversions, yet most of the fuzzer’s effort goes into the parser. On top of that, there’s a hump we have to get over to start fuzzing, since we have to write a bunch of parsing code if we didn’t already have it.
This is where the Arbitrary trait comes in. Arbitrary lets us create structured inputs from raw byte buffers with as thin a veneer as possible. As best it can, it preserves the property that small changes to the input bytes lead to small changes in the corresponding Arbitrary instance constructed from those input bytes. This helps coverage-guided fuzzers like libFuzzer efficiently explore the input space. The new 0.3.0 release of arbitrary contains an overhaul of the trait’s design to further these goals.
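To make the "thin veneer" idea concrete, here is a minimal, standard-library-only sketch. It is not how the arbitrary crate is implemented; the Rgb type and the from_bytes helper are hypothetical stand-ins that only illustrate the property we care about: each field comes straight from one input byte, so changing one byte changes one field.

```rust
/// Hypothetical stand-in for the library's color type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl Rgb {
    /// Build an `Rgb` from the first three fuzzer-provided bytes.
    /// Returns `None` when fewer than three bytes are available.
    /// (Illustrative helper only, not part of any crate's API.)
    pub fn from_bytes(bytes: &[u8]) -> Option<Rgb> {
        match bytes {
            [r, g, b, ..] => Some(Rgb { r: *r, g: *g, b: *b }),
            _ => None,
        }
    }
}
```

Because the mapping is so direct, a mutation that the fuzzer makes to a single byte shows up as a small, predictable change in the structured value, which is exactly what a coverage-guided fuzzer needs.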
Implementing Arbitrary for custom types is easy: 99% of the time, all we need to do is automatically derive it. So let’s do that for Rgb in our color conversion library by adding #[derive(Arbitrary)] to its definition.
The libfuzzer-sys crate lets us define fuzz targets where the input type is anything that implements Arbitrary, including our Rgb type.
Now we have a fuzz target that works with our custom input type directly, is the thinnest abstraction possible over the raw, fuzzer-provided input bytes, and we didn’t need to write any custom parsing logic ourselves!
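As a rough illustration of what such a fuzz target might check, here is a standard-library-only sketch of a round-trip property. The Rgb and Hsl types, the to_hsl and to_rgb conversions, and check_round_trip are all hypothetical stand-ins for the library under test. In a real fuzz target, the body of check_round_trip would sit inside libfuzzer-sys’s fuzz_target! macro with an Rgb parameter.

```rust
/// Hypothetical color types standing in for the library under test.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rgb { pub r: u8, pub g: u8, pub b: u8 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Hsl { pub h: f64, pub s: f64, pub l: f64 }

impl Rgb {
    /// Convert to HSL (hue in degrees, saturation and lightness in 0..=1).
    pub fn to_hsl(self) -> Hsl {
        let r = f64::from(self.r) / 255.0;
        let g = f64::from(self.g) / 255.0;
        let b = f64::from(self.b) / 255.0;
        let max = r.max(g).max(b);
        let min = r.min(g).min(b);
        let l = (max + min) / 2.0;
        let d = max - min;
        if d == 0.0 {
            // Gray: hue and saturation are zero by convention.
            return Hsl { h: 0.0, s: 0.0, l };
        }
        let s = d / (1.0 - (2.0 * l - 1.0).abs());
        // Position on the six-sector color wheel.
        let sector = if max == r {
            ((g - b) / d).rem_euclid(6.0)
        } else if max == g {
            (b - r) / d + 2.0
        } else {
            (r - g) / d + 4.0
        };
        Hsl { h: sector * 60.0, s, l }
    }
}

impl Hsl {
    /// Convert back to RGB, rounding each channel to the nearest u8.
    pub fn to_rgb(self) -> Rgb {
        let c = (1.0 - (2.0 * self.l - 1.0).abs()) * self.s;
        let hp = self.h / 60.0;
        let x = c * (1.0 - (hp % 2.0 - 1.0).abs());
        let m = self.l - c / 2.0;
        let (r, g, b) = match hp as u32 {
            0 => (c, x, 0.0),
            1 => (x, c, 0.0),
            2 => (0.0, c, x),
            3 => (0.0, x, c),
            4 => (x, 0.0, c),
            _ => (c, 0.0, x),
        };
        let to_u8 = |v: f64| ((v + m) * 255.0).round() as u8;
        Rgb { r: to_u8(r), g: to_u8(g), b: to_u8(b) }
    }
}

/// The property a fuzz target could assert for every generated color.
pub fn check_round_trip(color: Rgb) {
    assert_eq!(color, color.to_hsl().to_rgb());
}
```

The fuzzer’s job is then simply to hunt for an Rgb value that makes the assertion fail; no parsing code sits between the raw bytes and the conversions.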
This isn’t limited to simple structures like Rgb. The arbitrary crate provides Arbitrary implementations for everything in std that you’d expect it to: bool, u32, f64, String, Vec, HashMap, PathBuf, etc. Additionally, the custom derive works with all kinds of structs and enums, as long as each sub-field implements Arbitrary.
For more details check out the new Structure-Aware Fuzzing section of The Rust Fuzzing Book and the arbitrary crate. The book has another neat example, where we’re testing a custom allocator implementation and the fuzz target takes a sequence of malloc, realloc, and free commands.
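To give a feel for what a "sequence of commands" input looks like, here is a small standard-library-only sketch. It is not the book’s code: the AllocOp enum, its fields, and the decode_ops helper with its three-bytes-per-command encoding are all assumptions made for illustration. With the arbitrary crate, deriving Arbitrary on an enum like this would take the place of the hand-written decoder.

```rust
/// Hypothetical command type for driving an allocator under test.
#[derive(Debug, PartialEq, Eq)]
pub enum AllocOp {
    Malloc { size: usize },
    Realloc { index: usize, new_size: usize },
    Free { index: usize },
}

/// Decode raw bytes into commands, three bytes per command:
/// a tag byte selecting the variant, then two operand bytes.
/// Trailing bytes that do not fill a whole command are ignored.
pub fn decode_ops(bytes: &[u8]) -> Vec<AllocOp> {
    bytes
        .chunks_exact(3)
        .map(|chunk| {
            let tag = chunk[0];
            let a = usize::from(chunk[1]);
            let b = usize::from(chunk[2]);
            match tag % 3 {
                0 => AllocOp::Malloc { size: a },
                1 => AllocOp::Realloc { index: a, new_size: b },
                _ => AllocOp::Free { index: a },
            }
        })
        .collect()
}
```

A fuzz target would then replay the decoded commands against the allocator and assert its invariants after each step.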
Bonus: Improved UX in cargo fuzz
When cargo fuzz finds a failing input, it will display the Debug formatting of the failing input (particularly nice with custom Arbitrary inputs) and suggest common next tasks, like reproducing the failure or running test case minimization.
For example, if we write a fuzz target that panics when r < g < b for a given Rgb instance, libFuzzer will very quickly find an input that triggers the failure, and then cargo fuzz will give us this friendly output:
Failing input:
fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554
Output of `std::fmt::Debug`:
Rgb {
    r: 30,
    g: 40,
    b: 110,
}
Reproduce with:
cargo fuzz run my_fuzz_target fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554
Minimize test case with:
cargo fuzz tmin my_fuzz_target fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554
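For reference, the check inside such a deliberately failing fuzz target could look roughly like the sketch below. The Rgb type and the check_color function are hypothetical stand-ins. Note that Rust has no chained comparison, so r < g < b has to be spelled out as two comparisons.

```rust
/// Hypothetical stand-in for the library's color type.
#[derive(Clone, Copy, Debug)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Panics for any color whose channels are strictly increasing,
/// mimicking the deliberately buggy fuzz target described above.
pub fn check_color(color: Rgb) {
    if color.r < color.g && color.g < color.b {
        panic!("found r < g < b: {:?}", color);
    }
}
```

The color from the output above (r: 30, g: 40, b: 110) satisfies the condition, so it triggers the panic.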
Upgrading Fuzz Targets
First, make sure you’ve upgraded cargo fuzz:
cargo install --force cargo-fuzz
Next, in your Cargo.toml, switch libfuzzer-sys from a git dependency to the 0.2.0 version on crates.io.
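As a sketch, assuming the usual cargo fuzz layout where the fuzz targets have their own manifest at fuzz/Cargo.toml, the dependency entry would replace the old git-based one with a plain version requirement:

```toml
# fuzz/Cargo.toml (path assumed from the default cargo fuzz layout)
[dependencies]
libfuzzer-sys = "0.2.0"
```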
If your existing fuzz targets were not using custom Arbitrary inputs, and were taking &[u8] slices of raw bytes, then you’re done!
If you’re implementing Arbitrary for your own custom input types, you’ll need to bump your dependency on the arbitrary crate to version 0.3. Unless you have specialized logic in your Arbitrary implementation, we recommend that you use the custom derive.
Enable the custom derive by requiring the "derive" feature of the arbitrary crate in your Cargo.toml.
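A minimal sketch of that dependency entry, assuming it goes in the manifest of whichever crate defines your input types:

```toml
[dependencies]
arbitrary = { version = "0.3", features = ["derive"] }
```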
And then derive Arbitrary automatically by adding #[derive(Arbitrary)] to your input types.
Finally, if you do have specialized logic in your Arbitrary implementation, and can’t use the custom derive, your implementations will need to change slightly.
The trait has been simplified a little bit, and Unstructured is a concrete type now, rather than a trait you need to parameterize over. Check out the CHANGELOG for more details.
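To show the general shape of a hand-written implementation against a concrete Unstructured type, here is a self-contained sketch. Everything in it is a local stand-in defined for illustration: the Unstructured struct, its next_byte method, the NotEnoughData error, and the Arbitrary trait only mirror the shape described above and are not the arbitrary crate’s actual definitions. The Percent type is a hypothetical input with specialized logic (it must stay within 0 to 100).

```rust
/// Local stand-in for a concrete wrapper around fuzzer-provided bytes.
/// (NOT the arbitrary crate's type; defined here for illustration.)
pub struct Unstructured<'a> {
    data: &'a [u8],
}

/// Hypothetical error for running out of input bytes.
#[derive(Debug, PartialEq, Eq)]
pub struct NotEnoughData;

impl<'a> Unstructured<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Unstructured { data }
    }

    /// Hypothetical helper: consume and return the next input byte.
    pub fn next_byte(&mut self) -> Result<u8, NotEnoughData> {
        let (&first, rest) = self.data.split_first().ok_or(NotEnoughData)?;
        self.data = rest;
        Ok(first)
    }
}

/// Local stand-in trait: no type parameter for the byte source is needed,
/// because `Unstructured` is a concrete type.
pub trait Arbitrary: Sized {
    fn arbitrary(u: &mut Unstructured<'_>) -> Result<Self, NotEnoughData>;
}

/// Hypothetical input type whose value must stay in 0..=100.
#[derive(Debug, PartialEq, Eq)]
pub struct Percent(pub u8);

impl Arbitrary for Percent {
    fn arbitrary(u: &mut Unstructured<'_>) -> Result<Self, NotEnoughData> {
        // Specialized logic: fold any byte into the valid range.
        Ok(Percent(u.next_byte()? % 101))
    }
}
```

The point of the sketch is the signature: the implementation takes a mutable reference to one concrete byte-source type instead of being generic over a trait.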
CHANGELOGs
Thank You! 💖
Thanks to everyone who contributed to these releases!
- Alex Rebert
- koushiro
- Manish Goregaokar
- Nick Fitzgerald
- Simonas Kazlauskas
And a special shout out to Manish for fielding so many pull request reviews!
FAQ
What is fuzzing?
Fuzzing is a software testing technique used to find security, stability, and correctness issues by feeding pseudo-random data as input to the software.
Learn more about fuzzing Rust code in The Rust Fuzzing Book!
How is all this different from quickcheck and proptest?
If you’re familiar with quickcheck or proptest and their own versions of the Arbitrary trait, you might be wondering what the difference is between what’s presented here and those tools.
The primary goal of what’s been presented here is to have a super-thin, efficient layer on top of coverage-guided, mutation-based fuzzers like libFuzzer. That means the paradigm is a little different from quickcheck and proptest. For example, arbitrary::Arbitrary doesn’t take a random number generator like quickcheck::Arbitrary does. Instead it takes an Unstructured, which is a helpful wrapper around a raw byte buffer provided by the fuzzer. It’s similar to libFuzzer’s FuzzedDataProvider.
The goal is to preserve, as much as possible, the actual input given to us by the fuzzer, and make sure that small changes in the raw input lead to small changes in the value constructed via arbitrary::Arbitrary. Similarly, we don’t want different, customizable test case generation strategies like proptest supports, because we leverage the fuzzer’s insight into code coverage to efficiently explore the input space. We don’t want to get in the way of that or step on the fuzzer’s toes.
This isn’t to say that quickcheck and proptest are without value! On the contrary, if you are not using a coverage-guided fuzzer like libFuzzer or AFL (perhaps because you’re working on an unsupported platform) and are instead using a purely random, generation-based fuzzing setup, then both quickcheck and proptest are fantastic options to consider.