Patrice Godefroid's Research Overview

September 2021


My main research area during the last 32 years has been software model checking in a broad sense. Here are some of its main themes. (I have also worked on many smaller projects on other topics that cannot be summarized here -- please see my list of publications for more information on those.)

Partial-Order Reduction

In 1988, I started doing research under the supervision of Professor Pierre Wolper on what is now known as partial-order reduction. Partial-order reduction (POR) denotes a family of algorithmic techniques for pruning the state spaces of concurrent reactive programs in such a way that the parts of the state space that are pruned away are guaranteed not to contain error states of a specific type, such as deadlocks. POR can considerably speed up verification by model checking and is nowadays implemented in many model checkers. Together with my advisor Pierre Wolper and collaborators Gerard Holzmann and Didier Pirottin, I helped pioneer the development of POR, alongside other prominent contributors such as Doron Peled and Antti Valmari. My own work on POR culminated in January 1996 with the publication of my PhD thesis in Springer's Lecture Notes in Computer Science series. This work was a co-recipient of the 2014 CAV Award.
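
To give a flavor of the idea, here is a tiny illustrative sketch (in Python, a toy example of my own, not code from any actual POR implementation): two transitions that access disjoint variables commute, so exploring a single interleaving of them is enough to cover all reachable final states.

    from itertools import permutations

    # Two "processes", each with a single transition touching its own variable.
    def t1(state): state["x"] += 1     # independent of t2: disjoint variables
    def t2(state): state["y"] += 1

    finals = set()
    for order in permutations([t1, t2]):       # naive search: every interleaving
        state = {"x": 0, "y": 0}
        for t in order:
            t(state)
        finals.add(tuple(sorted(state.items())))

    # t1 and t2 commute, so both interleavings lead to the same state: a partial-order
    # reduction explores only one of them without missing any deadlock or error state.
    print("interleavings:", 2, "distinct final states:", len(finals))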

Software Model Checking via Systematic Testing: VeriSoft

A year or so after joining Bell Labs in 1994, I started working on software model checking, that is, how to broaden the scope of model checking from abstract software systems (specified in modeling languages) to concrete software systems (described in programming languages). In 1996, I designed and implemented a first version of VeriSoft, the first model checker for analyzing software written in mainstream programming languages such as C and C++. The main idea behind VeriSoft is simple: just as a traditional model checker computes the product of finite-state machines described in some modeling language, VeriSoft dynamically computes the product of operating-system (Unix-like) processes described in any programming language. Several technical innovations made this possible: the use of a run-time scheduler to systematically drive process executions, a construct to simulate nondeterminism at run time, and the use of partial-order reduction to make a search in the state space of concurrent OS processes tractable even without storing any intermediate states in memory. Many of these features have since been adopted in other software model checkers (such as Java PathFinder, CMC, and CHESS).
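
The following sketch (in Python, with hypothetical names; VeriSoft itself drives C/C++ Unix processes) illustrates the core mechanism: a toss(n) construct exposes nondeterministic choices to a run-time scheduler, and the search systematically re-executes the program under every possible sequence of choices, without storing any intermediate states.

    # Stateless systematic search over nondeterministic choices, in the spirit of
    # VeriSoft (an illustrative sketch with hypothetical names, not the real API).

    class Explorer:
        def __init__(self):
            self.trace = []   # [choice, bound] for each choice point of the current run
            self.pos = 0

        def toss(self, n):
            # Nondeterministic choice in 0..n-1, resolved by the run-time scheduler.
            if self.pos == len(self.trace):
                self.trace.append([0, n])          # first unexplored alternative
            choice = self.trace[self.pos][0]
            self.pos += 1
            return choice

        def explore(self, program):
            # Stateless search: re-execute 'program' from scratch for every possible
            # sequence of choices, storing no intermediate program states.
            runs = 0
            while True:
                self.pos = 0
                program(self)                      # one complete execution
                runs += 1
                # Backtrack to the deepest choice point with an untried alternative.
                while self.trace and self.trace[-1][0] + 1 == self.trace[-1][1]:
                    self.trace.pop()
                if not self.trace:
                    return runs
                self.trace[-1][0] += 1

    def example(sched):
        a = sched.toss(2)                          # e.g., which process is scheduled first
        b = sched.toss(3)                          # e.g., a nondeterministic environment input
        if a == 1 and b == 2:
            print("error state reached with choices", (a, b))

    print(Explorer().explore(example), "executions explored")   # 2 * 3 = 6 runs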

From 1996 to 2001, I worked mostly on developing VeriSoft further. With several Bell Labs colleagues, we investigated many extensions, described in several published papers. We also started applying VeriSoft to check the correctness of various software applications inside Lucent Technologies. After several successful applications, VeriSoft was also considered in 1999 for a Lucent-funded start-up. Working with Lucent's venture-capitalist team was an enlightening experience. That same year, I taught a course (mostly on VeriSoft) at Stanford University hosted by Professor David Dill, another valuable experience (teaching a new course takes a lot of time!). Meanwhile, VeriSoft was gaining traction in some development and testing groups, where its use contributed to finding several expensive, customer-visible bugs in various Lucent products. Since 1999, VeriSoft has also been publicly available outside Lucent and has been licensed to hundreds of users in industry and academia in more than 25 countries. In 2001, I was promoted to "distinguished member of the technical staff" of Bell Labs, essentially for my work on VeriSoft and its successful applications in Lucent.

Software Model Checking via Abstraction: May/Must Abstractions, 3-Valued Temporal Logics, and Generalized Model Checking

Around 2000, another approach to software model checking started to emerge: the static approach. Unlike VeriSoft, static software model checkers parse the source code of the software to be checked, compute a conservative abstraction of the code, and then perform model checking on this abstraction. One of the pioneering projects in that area is SLAM, a static software model checker for C code, developed at Microsoft Research originally by Tom Ball and Sriram Rajamani. From 2001 to 2004, with Glenn Bruns, Radha Jagadeesan, and Michael Huth, we developed a new framework for static software model checking that uses may/must abstractions instead of may-only conservative abstractions. With such abstractions, both proofs and counterexamples (bugs) are guaranteed to be sound by construction. We also showed that verification results can sometimes be more precise with generalized model checking, which checks whether there exists a concretization of an abstraction satisfying a temporal property. From a theoretical point of view, generalized model checking is an interesting problem since it generalizes both model checking (when the abstraction is complete) and satisfiability (when the abstraction is completely unknown), probably the two most studied problems related to temporal logic and verification. From a practical point of view, our work in this area helps explain the foundations of static software model checking.
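
A tiny worked example (my own illustration, in Python, not code from that work) conveys the three-valued idea: must transitions under-approximate the concrete program and may transitions over-approximate it, so reaching a bad state along must edges yields a sound counterexample, failing to reach it even along may edges yields a sound proof, and everything in between is "unknown" (which generalized model checking can sometimes resolve by reasoning about the set of concrete programs the abstraction represents).

    # Three-valued reachability on a modal transition system: 'must' edges exist in
    # every concretization of the abstraction, 'may' edges in at least one.

    must = {"s0": {"s1"}, "s1": set(), "bad": set()}
    may  = {"s0": {"s1", "bad"}, "s1": set(), "bad": set()}   # may-edges include must-edges

    def reachable(edges, source, target):
        seen, stack = set(), [source]
        while stack:
            s = stack.pop()
            if s == target:
                return True
            if s not in seen:
                seen.add(s)
                stack.extend(edges[s])
        return False

    def check_never_reaches(source, target):
        if reachable(must, source, target):
            return False        # sound counterexample: every concretization reaches 'target'
        if not reachable(may, source, target):
            return True         # sound proof: no concretization can reach 'target'
        return "unknown"        # abstraction too coarse: refine it, or apply
                                # generalized model checking to its concretizations

    print(check_never_reaches("s0", "bad"))   # -> "unknown" for this particular abstraction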

Automating Software Testing using Program Analysis: DART and SMART

In 2004, I started working with renewed energy on how to extend the VeriSoft approach (aka software model checking via systematic testing) to deal with data-driven applications (after a first attempt described in a PLDI 1998 paper). With Nils Klarlund and Koushik Sen (both funded by my NSF grant with my Bell Labs colleagues Dennis Dams and Kedar Namjoshi), we implemented a first version of Directed Automated Random Testing, or DART for short, a new approach to automate testing that combines three main techniques: (1) automated interface extraction from source code, (2) random testing at that interface, and (3) dynamic test generation to direct executions along alternative program paths. The main strength of DART is that testing can be performed completely automatically on any program that compiles, as there is no need to manually write any test driver or harness code. Also, whenever a symbolic expression cannot be generated for a program expression that depends on some input, the concrete run-time value of that expression can be used instead to simplify it, which allows dynamic test generation to drive executions through program statements that purely static test generation cannot handle.
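
For instance (a hypothetical example, not taken from the DART paper), if an input flows through a hash function that no constraint solver can invert, the concrete value of the hashed expression observed at run time can be substituted into the path constraint, leaving only the solvable part for the solver. A sketch in Python:

    import hashlib

    def opaque_hash(x: int) -> int:
        # Stand-in for a computation the constraint solver cannot model (e.g. a real hash).
        return int.from_bytes(hashlib.sha256(x.to_bytes(4, "big")).digest()[:2], "big")

    def program(x: int, y: int) -> str:
        if opaque_hash(x) == y:        # no solver can invert opaque_hash symbolically
            return "rare branch"
        return "common branch"

    # First run with arbitrary concrete inputs:
    x0, y0 = 3, 0
    print(program(x0, y0))                     # "common branch"

    # Concretization: keep x0 fixed, replace opaque_hash(x) by its observed concrete
    # value, and solve the now-trivial constraint y == opaque_hash(x0) for the next run.
    y1 = opaque_hash(x0)
    print(program(x0, y1))                     # "rare branch"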

A DART directed search attempts to sweep through all the feasible execution paths of a program using dynamic test generation: the program under test is first executed on some random or well-formed input, symbolic constraints on inputs are gathered at conditional branches during that run, and then a constraint solver is used to generate variants of the previous inputs in order to steer the next execution of the program towards an alternative program branch. This process is repeated until all (in practice, many) feasible paths of the program are executed, while detecting various types of errors using run-time checking tools such as Purify. DART can thus be viewed as one way of combining static (interface extraction, symbolic execution) and dynamic (testing, run-time checking) program analysis with model-checking techniques (systematic state-space exploration).
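
In outline, the directed search can be sketched as follows (a simplified Python sketch of my own: the "solver" is a toy bounded enumeration standing in for a real constraint solver, and every branch of each explored path is flipped via a worklist rather than the depth-first backtracking used in DART itself):

    import itertools

    def branch(trace, inp, pred):
        # Record the branch condition and the direction actually taken.
        taken = pred(inp)
        trace.append((pred, taken))
        return taken

    def program(inp, trace):
        # Program under test, instrumented to collect its path constraint.
        if branch(trace, inp, lambda i: i["x"] > 5):
            if branch(trace, inp, lambda i: i["y"] == i["x"] + 1):
                raise AssertionError("bug reached")

    def solve(constraints):
        # Toy stand-in for a real constraint solver: bounded enumeration over small ints.
        for x, y in itertools.product(range(-10, 11), repeat=2):
            cand = {"x": x, "y": y}
            if all(pred(cand) == want for pred, want in constraints):
                return cand
        return None

    def directed_search(initial_input):
        worklist, explored = [initial_input], set()
        while worklist:
            inp = worklist.pop()
            trace = []
            try:
                program(inp, trace)                 # execute and gather constraints
            except AssertionError as err:
                print("bug found with input", inp, "-", err)
                return inp
            path = tuple(taken for _, taken in trace)
            if path in explored:
                continue
            explored.add(path)
            # Negate each branch of the path (keeping the prefix before it) and ask
            # the solver for an input driving the next execution down the other side.
            for k in range(len(trace)):
                flipped = list(trace[:k]) + [(trace[k][0], not trace[k][1])]
                new_inp = solve(flipped)
                if new_inp is not None:
                    worklist.append(new_inp)
        print("all feasible paths explored")
        return None

    directed_search({"x": 0, "y": 0})    # reaches the buggy branch in a few iterations

Each solver call targets exactly one yet-unexplored side of a branch, which is what steers successive executions toward new feasible paths.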

Obviously, systematically executing all feasible program paths does not scale to large, realistic programs. In 2006, I developed a variant of the DART search algorithm that performs dynamic test generation compositionally. This new algorithm, dubbed SMART, eliminates path explosion due to interprocedural (interblock) paths: the number of whole execution paths becomes linear in the number of intraprocedural paths, instead of being possibly exponential. Moreover, for programs whose conditional statements can all be driven using symbolic execution, this efficiency gain is obtained without losing any precision. A SMART search is key to making the DART approach (aka systematic dynamic test generation) scalable to large programs if the goal is to achieve full path coverage (i.e., verification).
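
Concretely (again a simplified illustration rather than the actual SMART algorithm, which encodes summaries as logic formulas for a constraint solver), each function is explored once, its behavior is recorded as a disjunction of (precondition, result) cases, and callers reuse those cases instead of re-exploring the callee's paths:

    # Callee with two intraprocedural paths.
    def is_positive(x):
        if x > 0:
            return 1
        return 0

    # Summary written by hand here; SMART computes it once by symbolic execution:
    # a disjunction of (precondition, result) cases covering all paths of the callee.
    IS_POSITIVE_SUMMARY = [
        (lambda x: x > 0,  1),
        (lambda x: x <= 0, 0),
    ]

    def caller(a, b):
        # Two call sites: a non-compositional path-wise search explores
        # 2 x 2 = 4 whole-program paths through caller().
        return is_positive(a) + is_positive(b)

    # A compositional search explores is_positive once (2 cases) and reuses the
    # summary at each call site, so the effort grows with the sum, not the product,
    # of the per-procedure path counts.
    for precondition, result in IS_POSITIVE_SUMMARY:
        witness = 1 if precondition(1) else -1     # a concrete input satisfying the case
        assert is_positive(witness) == result      # the summary case matches the code
    print("summary cases:", len(IS_POSITIVE_SUMMARY),
          "- naive interprocedural paths in caller():", 2 * 2)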

The DART technique, also called dynamic test generation, execution-generated tests, or concolic testing, has revolutionized automatic test generation, with thousands of citations to our work and dozens of academic and industrial tools implementing this approach. This work was the recipient of the HVC 2009 Award.

Whitebox Fuzzing for Security Testing: SAGE

In 2006, I joined Microsoft Research and started working on the "killer app" for DART, namely fuzzing. Fuzzing, or fuzz testing, is the process of finding security vulnerabilities in input-parsing code by repeatedly testing the parser with modified, or fuzzed, inputs. With Michael Levin and several other Microsoft colleagues including (then-intern) David Molnar, we started developing SAGE, the first whitebox fuzzer for security testing. Whitebox fuzzing extends DART from unit testing to security testing of large programs. SAGE performs dynamic symbolic execution at the x86 binary level, and implements several optimizations that are crucial for dealing with huge execution traces with hundreds of millions of machine instructions, in order to scale to large file parsers embedded in applications with millions of lines of code, like Microsoft Excel or PowerPoint. SAGE also pioneered the use of search heuristics based on code coverage for fuzzing purposes.
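
The flavor of such a coverage-guided search is sketched below (a simplified Python sketch of my own: the toy parser, the mutation-based "children" standing in for inputs obtained by negating branch constraints, and the scoring are all illustrative stand-ins, not SAGE's actual generational-search implementation):

    import heapq

    def toy_parser(data: bytes):
        # Stand-in for a large binary file parser; returns the set of code blocks executed.
        blocks = {"start"}
        if data[:2] == b"PK":
            blocks.add("magic_ok")
            if len(data) >= 4 and data[2] == 3:
                blocks.add("version_ok")
                if data[3] == 0xFF:
                    blocks.add("overflow")   # stand-in for a buffer-overflow site
        return blocks

    def children(data: bytes):
        # Stand-in for new inputs obtained by negating branch constraints of the run.
        for i in range(min(len(data), 4)):
            for b in (0x03, 0xFF, ord("P"), ord("K")):
                child = bytearray(data)
                child[i] = b
                yield bytes(child)

    covered, seen, frontier, n = set(), set(), [], 0

    def enqueue(data: bytes):
        global n
        if data in seen:
            return
        seen.add(data)
        n += 1
        new = toy_parser(data) - covered
        heapq.heappush(frontier, (-len(new), n, data))   # more new coverage = higher priority

    enqueue(b"\x00\x00\x00\x00")                         # initial seed input
    while frontier:
        _, _, data = heapq.heappop(frontier)
        blocks = toy_parser(data)
        covered |= blocks
        if "overflow" in blocks:
            print("crashing input found after", n, "tests:", data)
            break
        for child in children(data):
            enqueue(child)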

Since 2008, SAGE has been running in production for over 1,000 machine-years, automatically fuzzing hundreds of applications. This is the largest computational usage ever for any Satisfiability-Modulo-Theories (SMT) solver according to the authors of the Z3 SMT solver (also from Microsoft Research), with around 10 billion constraints processed to date. During all this fuzzing, SAGE found many new security vulnerabilities (buffer overflows) in hundreds of Windows parsers and Office applications, including image processors, media players, file decoders, and document parsers. Notably, SAGE found roughly one third of all the bugs discovered by file fuzzing during the development of Microsoft's Windows 7, saving (many) millions of dollars by avoiding expensive security patches for nearly a billion PCs worldwide. In 2012, I was promoted to "Microsoft Partner", essentially for my work on SAGE and its successful applications in Microsoft.

Today, whitebox fuzzing has been adopted in many other security-testing tools and has inspired numerous variants (such as greybox fuzzing and hybrid fuzzing) and extensions. Our seminal work on whitebox fuzzing (first published in 2008, recipient of the NDSS 2022 Test-of-Time Award) was credited with introducing the "fuzzing" problem to the program analysis, software engineering, and security academic communities, with thousands of citations to our work.

Fuzzing in the Cloud: Project Springfield

In 2015, with my Microsoft Research colleague David Molnar, I co-founded Project Springfield, the first commercial cloud fuzzing service. Customers who subscribe to this cloud service can submit fuzzing jobs targeting their own software. Fuzzing jobs are processed by creating many virtual machines in the cloud and by running different fuzzing tools (including SAGE) and configurations on each of these machines. Fuzzing results (bugs) are continually collected by the service and post-processed for analysis, triage, and prioritization, with final results made available directly to customers on a secure website.

Project Springfield operated as a "virtual start-up" (or "special project") inside Microsoft Research, and I served as its CTO for two years. Project Springfield was renamed Microsoft Security Risk Detection in 2017. Later, the project gradually re-focused on its core technical contributions rather than its initial business aspirations, and evolved into a cloud fuzzing platform called OneFuzz, which became open-source in 2020.

Fuzzing the Cloud: RESTler

In 2017, with my Microsoft Research colleague Marina Polishchuk and intern Vaggelis Atlidakis, we started developing RESTler, the first stateful REST API fuzzing tool for automatically testing cloud services through their REST APIs and finding security and reliability bugs in these services. For a given cloud service with an OpenAPI/Swagger specification, RESTler analyzes its entire specification, and then generates and executes tests that exercise the service through its REST API. RESTler intelligently infers producer-consumer dependencies among request types from the API specification. During testing, it checks for specific classes of bugs and dynamically learns how the service behaves from prior service responses. This intelligence allows RESTler to explore deeper service states reachable only through specific request sequences (hence the term "stateful") and to find more bugs.
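
For example (with hypothetical endpoint names, not a real Azure API), if the specification declares that POST /accounts returns an account "id" which GET, PATCH, and DELETE requests on /accounts/{id} consume, this producer-consumer dependency forces the fuzzer to generate sequences in which the POST precedes the requests that reuse its result. A minimal sketch in Python:

    import requests   # standard HTTP client; the service and endpoints below are hypothetical

    BASE = "https://api.example.test"          # hypothetical service under test

    def check_response(r):
        # Simple bug oracle in the spirit of RESTler's checkers:
        # any 5xx status is flagged as a reliability/security bug.
        if 500 <= r.status_code < 600:
            print("possible bug: unhandled error", r.status_code,
                  "for", r.request.method, r.request.url)

    def create_account(session):
        # Producer: the response body contains the dynamic object 'id'
        # that later requests depend on (a dependency inferred from the spec).
        r = session.post(f"{BASE}/accounts", json={"name": "fuzz-0"})
        check_response(r)
        return r.json()["id"]

    def fuzz_sequence(session):
        # A stateful request sequence: the consumer requests reuse the id
        # produced by the first request instead of a random value.
        account_id = create_account(session)
        for payload in ({"name": ""}, {"name": "x" * 10000}, {"unexpected": None}):
            check_response(session.patch(f"{BASE}/accounts/{account_id}", json=payload))
        check_response(session.get(f"{BASE}/accounts/{account_id}"))
        check_response(session.delete(f"{BASE}/accounts/{account_id}"))

    fuzz_sequence(requests.Session())          # requires the (hypothetical) service to be running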

In 2020, RESTler became open-source, and its usage has been growing steadily since, both inside and outside Microsoft. Inside Microsoft, RESTler has found hundreds of new bugs in Microsoft Azure, Office365, and Bing services, including critical-severity bugs. At the time of this writing, RESTler is still under active development.