Patrice Godefroid's
Research Overview
November 2024
My main research area over
the last 35+ years has been software model
checking in a broad sense. Here are some of its main themes. (I also worked on many
smaller projects on other topics that are not summarized here -- please
see my list of publications for more information on those.)
Partial-Order Reduction
In 1988, I started
doing research under the supervision of Professor Pierre Wolper on what is now known as partial-order
reduction. Partial-order reduction (POR) denotes a family of
algorithmic techniques for pruning the state spaces of concurrent
reactive programs in such a way that parts of the state space that are
pruned away are guaranteed not to contain error states of a specific
type, such as deadlocks. POR can considerably speed up verification by
model checking, and is nowadays implemented in many model
checkers. With my advisor Pierre Wolper and
collaborators Gerard Holzmann and Didier
Pirottin, we pioneered the development of
POR, together with other prominent contributors such as Doron Peled and Antti Valmari. My own
work on POR culminated in January 1996 with the publication of my PhD
thesis in Springer's Lecture
Notes in Computer Science series. This work was the co-recipient
of the CAV 2014 Award.
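To give a flavor of POR, here is a minimal sketch (in Python, with illustrative names) of one technique from this line of work, sleep sets, on a toy system of two independent transitions: a transition explored at a state is put to "sleep" in the sibling subtrees of transitions independent of it, which prunes redundant interleavings without missing any deadlock.

    from itertools import count

    # A state is a pair (x, y); each transition fires at most once.
    def enabled(s):
        ts = set()
        if s[0] == 0: ts.add("a")   # a: x += 1
        if s[1] == 0: ts.add("b")   # b: y += 1
        return ts

    def apply_t(t, s):
        return (s[0] + 1, s[1]) if t == "a" else (s[0], s[1] + 1)

    def independent(t, u):
        return t != u               # a and b touch disjoint variables

    visits = count()

    def explore(s, sleep):
        next(visits)                # count state visits to measure pruning
        done = set()
        for t in sorted(enabled(s) - sleep):
            # a transition already explored at s (or inherited in `sleep`)
            # need not be re-explored below t if it is independent of t
            child_sleep = {u for u in (sleep | done) if independent(t, u)}
            explore(apply_t(t, s), child_sleep)
            done.add(t)

    explore((0, 0), set())
    print(next(visits))             # 4 state visits, vs. 5 for the full search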
Software Model Checking via
Systematic Testing: VeriSoft
A year or so after joining Bell
Labs in 1994, I started working on software model checking, that is, how to
broaden the scope of model checking from abstract software systems (specified
in modeling languages) to concrete software systems (described using
programming languages). In 1996, I designed and implemented a first version of VeriSoft, the first model checker for analyzing software
written in mainstream programming languages such as C and C++. The main idea
behind VeriSoft is simple: just as a traditional
model checker computes the product of finite-state machines described in some
modeling language, VeriSoft dynamically computes the product of
operating-system (Unix-like) processes described in any programming language.
Several technical innovations made this possible: the use of a run-time
scheduler to systematically drive process executions, a construct to simulate nondeterminism at run-time, and the use of partial-order
reduction to make a search in the state space of concurrent OS processes
tractable even without storing any intermediate states in memory. Many of these
features have now been adopted in other software model checkers (such as Java PathFinder, CMC, and CHESS).
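The following toy sketch (written for this overview; it is not VeriSoft's actual API) illustrates the stateless-search idea: a run-time scheduler re-executes deterministic processes from their initial state for every schedule it tries, so it never needs to store or restore program states.

    def make_processes():
        shared = {"x": 0, "seen": None}
        p0 = [lambda: shared.update(x=1)]               # one atomic write
        p1 = [lambda: shared.update(seen=shared["x"])]  # one atomic read
        return shared, [p0, p1]

    def execute(schedule):
        # Replay one schedule from the initial state; return the shared
        # memory and the ids of processes with steps remaining.
        shared, procs = make_processes()
        pcs = [0] * len(procs)        # per-process program counters
        for pid in schedule:
            procs[pid][pcs[pid]]()    # run one visible operation
            pcs[pid] += 1
        return shared, [i for i in range(len(procs)) if pcs[i] < len(procs[i])]

    def explore(prefix=()):
        shared, enabled = execute(prefix)   # re-execute, never save states
        if not enabled:                     # a complete interleaving
            print(prefix, "-> p1 read x =", shared["seen"])
        for pid in enabled:
            explore(prefix + (pid,))

    explore()   # explores both interleavings: p1 reads 0 or 1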
From 1996 to 2001, I worked
mostly on developing VeriSoft further. With several
Bell Labs colleagues, we investigated many extensions, described in several
published papers. We also started applying VeriSoft
to check the correctness of various software applications inside Lucent
Technologies. After several successful applications, VeriSoft
was also considered in 1999 for a Lucent-funded start-up. Working with Lucent's
venture-capitalist team was an enlightening experience. That same year, I
taught a course (mostly on VeriSoft).
Software Model Checking via
Abstraction: May/Must Abstractions, 3-Valued Temporal Logics, and Generalized
Model Checking
Around 2000, another approach
to software model checking started to emerge: the static approach. Unlike VeriSoft, static software model checkers parse the source
code of the software to be checked, compute a conservative abstraction of the
code, and then perform model checking on this abstraction. One of the
pioneering projects in that area is SLAM, a static software model checker for C
code, developed at Microsoft Research originally by Tom Ball and Sriram Rajamani. From 2001 to
2004, with Glenn Bruns, Radha
Jagadeesan and Michael Huth,
we developed a new framework for static software model checking that uses
may/must abstractions instead of may-only conservative abstractions. With such
abstractions, both proofs and counterexamples (bugs) are now guaranteed to be
sound, by construction. We also showed that verification results can sometimes
be more precise with generalized model
checking, which checks whether there exists a concretization of an
abstraction satisfying a temporal property. From a theoretical point of view,
generalized model checking is an interesting problem since it generalizes both
model checking (when the abstraction is complete) and satisfiability
(when the abstraction is completely unknown), probably the two most studied
problems related to temporal logic and verification. From a practical point of
view, our work in this area helps explain the foundation of static software
model checking.
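As a small illustration of the 3-valued flavor of this framework (the encoding below is illustrative, not the formalism of our papers), predicates over an abstract state can evaluate to true, false, or unknown, and Kleene logic propagates the unknowns; with may/must abstractions, both definite verdicts are sound, and only an unknown verdict calls for refining the abstraction or for generalized model checking.

    MAYBE = None   # the third truth value: "don't know"

    def and3(a, b):                  # Kleene 3-valued conjunction
        if a is False or b is False:
            return False             # definite refutation: sound
        if a is True and b is True:
            return True              # definite proof: sound
        return MAYBE                 # the abstraction is too coarse

    def not3(a):
        return MAYBE if a is MAYBE else (not a)

    # An abstract state: predicate p is known to hold, q is unknown.
    state = {"p": True, "q": MAYBE}
    print(and3(state["p"], state["q"]))  # None: refine, or ask generalized
                                         # model checking whether some
                                         # concretization satisfies it
    print(and3(False, not3(state["q"]))) # False, for all concretizations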
Automating Software Testing
using Program Analysis: DART and SMART
In 2004, I started working with
renewed energy on how to extend the VeriSoft approach
(aka software model checking via systematic testing) to deal with data-driven
applications (after a first attempt described in a PLDI 1998 paper). With Nils Klarlund and Koushik Sen (both funded by my NSF grant with my Bell Labs
colleagues Dennis Dams and Kedar Namjoshi),
we implemented a first version of Directed
Automated Random Testing, or DART for short, a new approach to automate
testing that combines three main techniques: (1) automated interface extraction from source code, (2) random testing at that interface, and
(3) dynamic test generation to direct
executions along alternative program paths. The main strength of DART is that
testing can be performed completely automatically on any program that compiles,
as there is no need to manually write any test driver or harness code. Also,
whenever a symbolic expression cannot be generated for an expression involving
some input, the concrete value of that input can be used to simplify this
expression, which allows dynamic test generation to drive executions through
program statements that purely-static test generation cannot handle.
A DART directed search attempts
to sweep through all the feasible execution paths of a program using dynamic
test generation: the program under
test is first executed on some random or well-formed input, symbolic
constraints on inputs are gathered at conditional branches during that run, and
then a constraint solver is used to generate variants of the previous inputs in
order to steer the next execution of the program towards an alternative program
branch. This process is repeated until all (in
practice, many) feasible paths of the program are executed, while
detecting various types of errors using run-time checking tools such as
Purify. DART can thus be viewed as one way of combining static (interface
extraction, symbolic execution) and dynamic (testing, run-time checking)
program analysis with model-checking techniques (systematic state-space
exploration).
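Here is a toy concolic-testing loop in the spirit of DART (the program, names, and "solver" are illustrative simplifications): each run records the path constraint on the input, the last branch of the path is negated, and the solver produces an input taking the other side. A real implementation uses an SMT solver and a systematic search over all branches; this sketch only flips the final branch and only solves the equality constraints this particular toy program emits.

    def program(x):
        # Instrumented program under test: returns (constraints, crashed).
        # Each recorded constraint (op, k) means "x op k" held on this run.
        cs = []
        if x == 1234:
            cs.append(("==", 1234))
            if x % 2 == 0:
                cs.append(("% 2 ==", 0))
                return cs, True      # the buggy statement
            cs.append(("% 2 !=", 0))
        else:
            cs.append(("!=", 1234))
        return cs, False

    def solve(negated):
        # Trivial stand-in for a constraint solver: handles only equality
        # with a constant, and gives up (None) otherwise -- in DART, the
        # fallback is to keep using concrete input values.
        op, k = negated
        if op == "!=": return k      # satisfy x == k instead
        if op == "==": return k + 1  # satisfy x != k instead
        return None

    def dart(x=0, budget=10):
        seen = {x}
        for _ in range(budget):
            cs, crashed = program(x)   # concrete run + symbolic trace
            print("input", x, "-> path", cs, "-> crash" if crashed else "")
            if crashed:
                return x
            nxt = solve(cs[-1])        # negate the last branch taken
            if nxt is None or nxt in seen:
                return None            # this toy search simply gives up
            seen.add(nxt)
            x = nxt

    dart()   # random testing rarely hits x == 1234; this finds it in 2 runs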
Obviously, systematically executing all feasible program paths does not scale to large, realistic programs. In 2006, I developed a variant of the DART search algorithm that performs dynamic test generation compositionally. This new algorithm, dubbed SMART, eliminates path explosion due to interprocedural (interblock) paths: the number of whole execution paths becomes linear in the number of intraprocedural paths, instead of possibly exponential. Moreover, for programs whose conditional statements can all be driven using symbolic execution, this efficiency gain is obtained without losing any precision. A SMART search is key to making the DART approach (aka systematic dynamic test generation) scalable to large programs if the goal is to achieve full path coverage (i.e., verification).
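The efficiency argument can be made concrete with a back-of-the-envelope count (a toy illustration, not SMART itself): a program that sequentially calls a function f with two intraprocedural paths n times has 2^n whole-program paths, but once each path of f is recorded as a reusable (path constraint, effect) summary, a compositional search needs only 2n intraprocedural path explorations.

    n = 20   # number of sequential calls to a 2-path function f

    # Non-compositional search: every combination of f's two branches
    # across the n call sites is a distinct whole-program path.
    print(2 ** n)   # 1048576 paths

    # Compositional (SMART-style) search: explore f's two paths once,
    # summarize them, and reuse the summaries at every call site.
    summary_f = [("x <= 0", "ret == 0"), ("x > 0", "ret == 1")]  # illustrative
    print(2 * n)    # 40 path explorations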
The DART technique, also called dynamic test generation, execution-generated tests, or concolic testing, has revolutionized automatic test generation, with thousands of citations to our work and dozens of academic and industrial tools implementing this approach. This work was the recipient of the HVC 2009 Award.
Whitebox Fuzzing for Security Testing: SAGE
In 2006, I joined Microsoft Research and started working on the "killer app" for DART, namely fuzzing. Fuzzing, or fuzz testing, is the process of finding security vulnerabilities in input-parsing code by repeatedly testing the parser with modified, or fuzzed, inputs. With Michael Levin and several other Microsoft colleagues including (then-intern) David Molnar, we started developing SAGE, the first whitebox fuzzer for security testing. Whitebox fuzzing extends DART from unit testing to security testing of large programs. SAGE performs dynamic symbolic execution at the x86 binary level, and implements several optimizations that are crucial for dealing with huge execution traces with hundreds of millions of machine instructions, in order to scale to large file parsers embedded in applications with millions of lines of code, like Microsoft Excel or PowerPoint. SAGE also pioneered the use of search heuristics based on code coverage for fuzzing purposes.
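The coverage-guided search heuristic can be sketched as follows (a simplified generational search, with a toy program standing in for symbolic execution of an x86 trace): one expensive symbolic execution of a test yields up to one new test per branch constraint in its trace, and children are prioritized by how much new code they cover.

    import heapq

    # Toy stand-in for the program under test: the trace of input x takes
    # branch i with outcome bit i of x, covering these (branch, side) pairs.
    def run(x):
        return {(i, (x >> i) & 1) for i in range(4)}

    def expand(x):
        # One symbolic execution of x's trace: negating the i-th branch
        # constraint amounts here to flipping bit i of the input.
        return [x ^ (1 << i) for i in range(4)]

    covered = run(0)                    # coverage of the seed input
    frontier = [(0, 0)]                 # (score, test), best first
    while frontier:
        _, x = heapq.heappop(frontier)
        for child in expand(x):         # one generation of new tests
            new = run(child) - covered  # coverage-based scoring
            if new:
                covered |= new
                heapq.heappush(frontier, (-len(new), child))
    print(len(covered), "branch sides covered")   # all 8 reached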
Since 2008, SAGE has been running in production for over 1,000 machine-years, automatically fuzzing hundreds of applications. This is the largest computational usage ever for any Satisfiability-Modulo-Theories (SMT) solver according to the authors of the Z3 SMT solver (also from Microsoft Research), with around 10 billion constraints processed to date. During all this fuzzing, SAGE found many new security vulnerabilities (buffer overflows) in hundreds of Windows parsers and Office applications, including image processors, media players, file decoders, and document parsers. Notably, SAGE found roughly one third of all the bugs discovered by file fuzzing during the development of Microsoft's Windows 7, saving (many) millions of dollars by avoiding expensive security patches for nearly a billion PCs worldwide. In 2012, I was promoted to "Microsoft Partner" essentially for my work on SAGE and its successful applications in Microsoft.
Today, whitebox fuzzing has been adopted in many other security-testing tools, and has inspired numerous variants (such as greybox fuzzing and hybrid fuzzing) and extensions. Our seminal work on whitebox fuzzing (first published in 2008, recipient of the NDSS 2022 Test-of-Time Award) was credited with introducing the "fuzzing" problem to the program analysis, software engineering, and security academic communities, with thousands of citations to our work.
Fuzzing in the Cloud: Project Springfield
In 2015, with my Microsoft Research colleague David Molnar, I co-founded Project Springfield, the first commercial cloud fuzzing service. Customers who subscribe to this cloud service can submit fuzzing jobs targeting their own software. Fuzzing jobs are processed by creating many virtual machines in the cloud and by running different fuzzing tools (including SAGE) and configurations on each of these machines. Fuzzing results (bugs) are continually collected by the service and post-processed for analysis, triage and prioritization, with final results available directly to customers on a secured website.
Project Springfield operated as a "virtual start-up" (or "special project") inside Microsoft Research. I served as its CTO for 2 years. Project Springfield was renamed Microsoft Security Risk Detection in 2017. Later, the project gradually refocused on its core technical contributions rather than its initial business aspirations, and evolved into a cloud fuzzing platform called OneFuzz, which became open-source in 2020.
Fuzzing the Cloud: RESTler
In 2017, with my Microsoft Research colleague Marina Polishchuk and intern Vaggelis Atlidakis, we started developing RESTler, the first stateful REST API fuzzing tool for automatically testing cloud services through their REST APIs and finding security and reliability bugs in these services. For a given cloud service with an OpenAPI/Swagger specification, RESTler analyzes its entire specification, and then generates and executes tests that exercise the service through its REST API. RESTler intelligently infers producer-consumer dependencies among request types from the API specification. During testing, it checks for specific classes of bugs and dynamically learns how the service behaves from prior service responses. This intelligence allows RESTler to explore deeper service states reachable only through specific request sequences (hence the term "stateful") and to find more bugs.
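A toy sketch of the dependency-inference idea (the spec and names below are invented for this overview): from a Swagger-like description, infer that the user_id produced by POST /users is consumed by the requests on /users/{user_id}, and only generate request sequences in which producers precede consumers; a real tool then executes each sequence against the live service and checks the responses.

    # Toy spec with producer/consumer annotations, as RESTler would infer
    # them from request paths and response schemas.
    spec = [
        {"method": "POST",   "path": "/users",           "produces": {"user_id"}, "consumes": set()},
        {"method": "GET",    "path": "/users/{user_id}", "produces": set(),       "consumes": {"user_id"}},
        {"method": "DELETE", "path": "/users/{user_id}", "produces": set(),       "consumes": {"user_id"}},
    ]

    def extensions(produced):
        # Only requests whose consumed resources were already produced.
        return [r for r in spec if r["consumes"] <= produced]

    def sequences(max_len):
        frontier = [((), frozenset())]
        for _ in range(max_len):
            frontier = [(seq + (r,), produced | r["produces"])
                        for seq, produced in frontier
                        for r in extensions(produced)]
        for seq, _ in frontier:
            # a real tool would execute the sequence and check responses
            print(" -> ".join(f'{r["method"]} {r["path"]}' for r in seq))

    sequences(2)   # POST first, then POST, GET, or DELETE; never GET alone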
In 2020, RESTler became open-source, and its usage has been steadily growing since, both inside and outside Microsoft. Inside Microsoft, RESTler has found hundreds of new bugs in Microsoft Azure, Office365 and Bing services, including critical-severity bugs. At the time of this writing, RESTler is still under active development.
Code Security at Lacework
In 2022, I joined Lacework, a cloud security start-up, to co-found its Code Security team. In March 2023, my first product at Lacework was released: we developed new runtime-monitoring technology, dubbed Code-Aware Agents (CAA), to determine what code is executed (and not executed) in the cloud, at very low cost so that this monitoring can run everywhere, all the time, in production, and at cloud scale. CAA can detect whether vulnerable packages in cloud workloads are ever executed. It turns out that most open-source packages pulled into cloud workloads through dependencies are never executed, anywhere, at any time. For the first time, CAA can prove when that is the case. This is a game changer for anyone who has ever had to prioritize which vulnerable packages to fix next.
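At its core, the prioritization this enables is a simple join (toy data and names, not Lacework's actual agent or schema): intersect the packages the runtime agent has observed executing with the vulnerable packages flagged by composition analysis.

    # Packages flagged as vulnerable by a software-composition-analysis scan.
    vulnerable = {"libfoo", "libbar", "libbaz"}

    # Packages the low-overhead runtime agent observed executing, across
    # all monitored workloads, all the time, in production.
    executed = {"libbar", "openssl", "glibc"}

    print("executed and vulnerable, fix first:", sorted(vulnerable & executed))
    print("never executed, safe to deprioritize:", sorted(vulnerable - executed))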
In November 2023, our Code Security team came out of stealth with the public announcement of several new static-analysis products for Software Composition Analysis (i.e., identifying all vulnerable third-party packages in code repositories) and for finding security-related bugs in first-party code. Our combined static and dynamic analyses extend the Lacework platform to cover the entire development cycle of cloud services, from code to production.
Lacework was acquired by Fortinet in 2024.