Program synthesis is the act of automatically constructing a program that
fulfills a given specification. Perhaps you are interested in sketching a
program, leaving parts of it incomplete, and then having a tool fill in
those missing bits for you? Or perhaps you are a compiler, and you have some
instruction sequence, but you want to find an equivalent-but-better instruction
sequence? Program synthesizers promise to help
you out in these situations!
I recently stumbled across Adrian Sampson’s Program Synthesis is
Possible blog post. Adrian describes and implements
minisynth, a toy program synthesizer that generates constants for holes in a
template program when given a specification. What fun! As a way to learn more
about program synthesis myself, I ported minisynth to Rust.
The Language
The input language is quite simple. The only type is the signed integer and our
operations are addition, subtraction, multiplication, division, negation, left-
and right-shift, and if-then-else conditionals.
Here is an example:
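Assuming minisynth's concrete syntax, multiplying a free variable by a constant looks like this (a stand-in snippet, since any arithmetic over free variables works):

```
x * 10
```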
And here is a conditional expression that evaluates to 27 if x is non-zero, and 42
otherwise:
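Assuming minisynth's C-style ternary syntax, that expression is:

```
x ? 27 : 42
```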
Abstract Syntax Tree
My representation of the AST uses an id-based arena and interns
identifier strings, which is a bit overkill for such a small program, but is a
pattern that has worked well for me in Rust. This pattern makes implementing
the petgraph crate’s traits easy, which gets you all the graph
traversals, dominator algorithms, etc. that a non-toy implementation will
eventually want.
The ast::Context structure contains the arena of AST nodes and the interned
strings.
The ast::Node definition is an enum with a variant for each type of
expression in the language.
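In outline, the shape is something like this (a sketch with guessed names, not the exact definitions from the repository):

```rust
// A sketch of the arena-and-interner pattern: ids are indices into the
// vectors owned by the context.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct NodeId(usize);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct StringId(usize);

/// One variant per kind of expression in the language.
#[derive(PartialEq, Eq, Debug)]
pub enum Node {
    Const(i64),
    Identifier(StringId),
    Addition(NodeId, NodeId),
    Subtraction(NodeId, NodeId),
    Multiplication(NodeId, NodeId),
    Division(NodeId, NodeId),
    Negation(NodeId),
    LeftShift(NodeId, NodeId),
    RightShift(NodeId, NodeId),
    Conditional(NodeId, NodeId, NodeId),
}

/// Owns the arena of AST nodes and the interned identifier strings.
#[derive(Default)]
pub struct Context {
    nodes: Vec<Node>,
    strings: Vec<String>,
}
```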
The ast::Context also has methods to allocate new ast::Nodes, get interned
strings, and access allocated nodes. These definitions are straightforward, so I
have elided them here. If you’re interested, you can look at the source on
GitHub.
To interpret expressions, we need a lookup function that maps identifiers to
values, the ast::Context that owns the AST nodes, and the id of the node we
are evaluating. We match on this node, and handle the following cases:
If the node represents a constant, we return that node’s associated constant
value.
If the node represents an identifier, we get a reference to its interned
identifier string from the context, and then query the lookup function for
its value.
If the node represents an operator, we recursively evaluate its operands and
then apply the operator to the operands’ values. For division, we also check
for divide-by-zero and return an error in that case.
Here is our initial interpreter function:
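Condensed so that it stands alone, a version of it looks like this (using a boxed `Expr` in place of the arena ids, and only a few of the operators):

```rust
use std::collections::HashMap;

// A simplified, boxed AST standing in for the arena-based ast::Node.
#[derive(Debug)]
enum Expr {
    Const(i64),
    Id(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    // ... subtraction, negation, shifts, and conditionals elided
}

/// Evaluate an expression, resolving identifiers in `env`.
fn eval(expr: &Expr, env: &HashMap<String, i64>) -> Result<i64, String> {
    match expr {
        // A constant evaluates to its associated value.
        Expr::Const(c) => Ok(*c),
        // An identifier is resolved via the lookup environment.
        Expr::Id(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("unbound identifier: {}", name)),
        // Operators recursively evaluate their operands first.
        Expr::Add(a, b) => Ok(eval(a, env)?.wrapping_add(eval(b, env)?)),
        Expr::Mul(a, b) => Ok(eval(a, env)?.wrapping_mul(eval(b, env)?)),
        Expr::Div(a, b) => {
            let divisor = eval(b, env)?;
            if divisor == 0 {
                return Err("division by zero".to_string());
            }
            Ok(eval(a, env)?.wrapping_div(divisor))
        }
    }
}
```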
From Interpreter to Synthesizer
Our synthesizer will take a specification program and a template program. The
template program may contain holes — in our system, these are variables
that start with the letter “h”. Our goal is to synthesize constant values for
these holes such that the template program implements the specification for all
values of the non-hole variables.
Let’s consider the example from the original blog post:
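In minisynth's syntax, the specification and the template look roughly like this:

```
spec:     x * 10
template: (x << h1) + (x << h2)
```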
Can we transform multiplication by ten into the sum of two constant left shifts?
If we can find constant values for h1 and h2, then the answer is yes. Our
synthesizer should answer that either h1 = 1 and h2 = 3, or that h1 = 3
and h2 = 1.
To implement synthesis, we will walk the AST and generate constraints for the
Z3 SMT solver that reflect the program’s semantics. We will do this for
both the specification and the template, and then constrain their results to be
equal for all values of the non-hole constant variables. Finally,
we ask Z3 if it can find a solution to all of the constraints. Any solution that
exists will provide definitions for the holes.
That is, we are asking the solver to find a solution for

∃h₁, h₂, …, hₘ. ∀c₁, c₂, …, cₙ. t = s

where t is the template’s constraints, s is the specification’s constraints,
hᵢ are the holes in the template, and cⱼ are the constants in
the template and specification programs.
Adrian’s original Python implementation of minisynth takes advantage of Python’s
dynamic nature and the Z3 Python library’s operator overloading to reuse the
interpreter for constraint generation without any changes to the interpreter
function. All you have to do is supply a lookup function that returns Z3
bitvector variables instead of signed integers. A neat trick!
For our Rust implementation, we want to reuse the interpreter as well, but Rust
is statically typed and the z3 crate for Rust doesn’t implement operator
overloading. So we will factor out an interpret function from our eval
function that is generic over some abstract interpreter.
An abstract interpreter must have an associated output type. For normal
evaluation, this will be an i64, and for constraints generation it will be a
Z3 constraint. The abstract interpreter must have a method for each
operation of the input language: it takes the operands as values of its
output type, applies the operation, and returns the result as its output
type. It must also provide a way to translate constants and identifiers
into its output type.
Next we make an interpretation function that takes an abstract interpreter and
uses it to interpret an expression of our input language. This looks almost the
same as our original eval function, but there is one tricky bit: encoding
conditional’s semantics into interpreter methods without using Rust’s control
flow, which would be invisible to the solver. To do this, we multiply the
activated conditional arm by one and the deactivated conditional arm by zero and
then sum the products. An alternative approach would be to add a method for
interpreting conditionals directly to the AbstractInterpret trait.
We refactor eval to apply an implementation of AbstractInterpret that has an
i64 associated output type and directly evaluates expressions:
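Here is a condensed sketch of the whole arrangement — the trait, the generic interpretation function, and the i64 evaluator — with guessed method names and the simplified boxed `Expr` standing in for the arena again:

```rust
use std::collections::HashMap;

// A simplified, boxed AST standing in for the arena-based ast::Node.
enum Expr {
    Const(i64),
    Id(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    // If the condition is non-zero, the first arm; otherwise the second.
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

// The abstract interpreter: `Output` is i64 for direct evaluation and a
// Z3 bitvector for constraint generation.
trait AbstractInterpret {
    type Output;

    fn constant(&mut self, c: i64) -> Self::Output;
    fn lookup(&mut self, id: &str) -> Self::Output;
    fn add(&mut self, a: &Self::Output, b: &Self::Output) -> Self::Output;
    fn mul(&mut self, a: &Self::Output, b: &Self::Output) -> Self::Output;
    /// One if the operand is non-zero, zero otherwise.
    fn neq_zero(&mut self, a: &Self::Output) -> Self::Output;
    /// One if the operand is zero, zero otherwise.
    fn eq_zero(&mut self, a: &Self::Output) -> Self::Output;
}

fn interpret<I: AbstractInterpret>(interp: &mut I, expr: &Expr) -> I::Output {
    match expr {
        Expr::Const(c) => interp.constant(*c),
        Expr::Id(id) => interp.lookup(id),
        Expr::Add(a, b) => {
            let (a, b) = (interpret(interp, a), interpret(interp, b));
            interp.add(&a, &b)
        }
        Expr::Mul(a, b) => {
            let (a, b) = (interpret(interp, a), interpret(interp, b));
            interp.mul(&a, &b)
        }
        Expr::If(cond, then, alt) => {
            // Encode the conditional without host-language control flow:
            // result = (cond != 0) * then + (cond == 0) * alt
            let cond = interpret(interp, cond);
            let is_true = interp.neq_zero(&cond);
            let is_false = interp.eq_zero(&cond);
            let then = interpret(interp, then);
            let alt = interpret(interp, alt);
            let then = interp.mul(&is_true, &then);
            let alt = interp.mul(&is_false, &alt);
            interp.add(&then, &alt)
        }
    }
}

// Direct evaluation: the abstract output type is just i64.
struct Eval {
    env: HashMap<String, i64>,
}

impl AbstractInterpret for Eval {
    type Output = i64;

    fn constant(&mut self, c: i64) -> i64 { c }
    fn lookup(&mut self, id: &str) -> i64 { self.env[id] }
    fn add(&mut self, a: &i64, b: &i64) -> i64 { a.wrapping_add(*b) }
    fn mul(&mut self, a: &i64, b: &i64) -> i64 { a.wrapping_mul(*b) }
    fn neq_zero(&mut self, a: &i64) -> i64 { (*a != 0) as i64 }
    fn eq_zero(&mut self, a: &i64) -> i64 { (*a == 0) as i64 }
}
```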
Finally, we are ready to start implementing synthesis!
First we create an implementation of AbstractInterpret that builds up Z3
constraints. Its lookup method keeps track of which variables have been used,
categorizes them by whether they are a hole or an unknown constant, and makes
sure that subsequent lookups of the same identifier return the same Z3
variable. All other methods map straightforwardly onto Z3 method calls.
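The bookkeeping half of lookup can be sketched on its own (with the Z3 bitvector type stubbed out as a plain `String` so the example stands alone):

```rust
use std::collections::HashMap;

// Only the variable bookkeeping from `lookup` is sketched here; the
// real implementation stores Z3 bitvector variables, not Strings.
#[derive(Default)]
struct Synthesize {
    holes: HashMap<String, String>,
    constants: HashMap<String, String>,
}

impl Synthesize {
    fn lookup(&mut self, id: &str) -> String {
        // Variables starting with "h" are holes; everything else is an
        // unknown constant.
        let vars = if id.starts_with('h') {
            &mut self.holes
        } else {
            &mut self.constants
        };
        // Subsequent lookups of the same identifier must return the
        // same variable.
        vars.entry(id.to_string())
            .or_insert_with(|| format!("bitvec({})", id))
            .clone()
    }
}
```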
Our synthesis function will take the specification program and the template
program, and then use the Synthesize abstract interpreter to generate
constraints for each of them.
Next, we extract the constant variables and create our goal, which is a for-all
constraint. The template must be equal to the specification for all possible
values these constants could take.
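For the x * 10 example, and assuming 64-bit bitvectors, the goal corresponds to something you could write by hand in SMT-LIB like this (an illustration of the query's shape, not literally what the z3 crate emits):

```
(declare-const h1 (_ BitVec 64))
(declare-const h2 (_ BitVec 64))
(assert (forall ((x (_ BitVec 64)))
  (= (bvadd (bvshl x h1) (bvshl x h2))
     (bvmul x (_ bv10 64)))))
(check-sat)
(get-model)
```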
Now that we have constructed our goal, we ask Z3 to solve it. If it can find an
answer, we extract the values it assigned to each of the holes and return the
results as a hash map.
And now we have a synthesizer!
For example, when given the x * 10 specification and the two-left-shifts
template from earlier, our synthesizer gives an answer such as h1 = 1 and
h2 = 3.
Conclusion
This was quite fun!
Thanks to Adrian Sampson for writing the original blog post and minisynth Python
implementation.
If you would like to learn more, here are a few resources:
Synthesizing Constants by John Regehr. A blog post about
synthesizing constants in practice, informed by the experience of working on
Souper. Lots of links to more papers to read.
Rosette is a Racket-based language by Emina Torlak that leverages
the similarity between interpreters and constraint generation for synthesizers
to make developing synthesis and verification tooling easy and fun.