Concurrency Analysis Platform and Tools for Finding Concurrency Bugs
This session describes the Concurrency Analysis Platform (CAP) from Microsoft Research and how CAP enables the use of various concurrency bug-finding tools, including CHESS, FeatherLite, and Sober.
The speaker opens this session with a discussion of the challenges of concurrent programming. In concurrent programs, rare thread interleavings can result in bugs that are hard to find, reproduce, and debug. These "Heisenbugs" occasionally surface in systems that have otherwise been running reliably for months. Heisenbugs present a huge productivity problem because developers and testers can spend weeks chasing a single bug.
The speaker then explains that the Concurrency Analysis Platform (CAP) gives users the ability to explore all interleavings. Exploring interleavings is typically difficult because the programmer needs to understand the complexities of concurrency. CAP, however, introduces an API that a typical user can understand.
Next, the speaker describes CHESS, a tool built on CAP that allows you to find and reproduce Heisenbugs. CHESS repeatedly runs a concurrent test, ensuring that every run takes a different interleaving. If an interleaving results in an error, CHESS can reproduce that interleaving to make debugging easier. CHESS is available for both managed and native programs, and it is used extensively at Microsoft on projects such as the Parallel Computing Platform (PCP), the Singularity operating system, Dryad, and Cosmos. CHESS will be released via DevLabs and is designed to integrate well with Microsoft Visual Studio.
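The idea of running a test once per interleaving and replaying a failing one can be illustrated with a toy model. The following is a minimal sketch, not the CHESS or CAP API: the names `interleavings`, `run`, and `thread_steps` are hypothetical, and each thread is modeled as an explicit list of steps rather than real code. Two threads each perform a non-atomic increment (a read followed by a write) on a shared counter.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def thread_steps(tid):
    """A non-atomic increment: read the counter, then write back the value plus one."""
    return [(tid, "read"), (tid, "write")]


def run(schedule):
    """Execute one schedule deterministically and return the final counter value."""
    counter = 0
    local = {}  # per-thread copy of the value it last read
    for tid, op in schedule:
        if op == "read":
            local[tid] = counter
        else:  # "write"
            counter = local[tid] + 1
    return counter


# Explore every interleaving of the two threads (hypothetical test harness).
schedules = list(interleavings(thread_steps("T1"), thread_steps("T2")))
failing = [s for s in schedules if run(s) != 2]

print(len(schedules), "schedules,", len(failing), "lose an update")
print("replayable failing schedule:", failing[0])
```

Because a schedule is just data, rerunning `run(failing[0])` reproduces the lost update every time, which is the property that makes a failing interleaving debuggable. A real tool has to control the scheduling of actual threads instead of enumerating hand-written step lists.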
The speaker then discusses two debugging tools built on CAP that are in development. The first, FeatherLite, is designed for lightweight data-race detection. A data race occurs in a multithreaded program when two threads access the same memory location without any intervening synchronization operations and at least one of the accesses is a write. Data races mean that data is accessed with insufficient synchronization, and they are a common source of concurrency errors. Existing data-race detection tools process every memory access and therefore have a large runtime overhead, increasing processing time more than tenfold. FeatherLite introduces sampling to reduce this overhead. With intelligent sampling algorithms, FeatherLite processes less than five percent of the memory accesses, resulting in less than thirty percent runtime overhead. FeatherLite plus CAP creates active data-race detection.
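The data-race definition and the effect of sampling can be sketched over a recorded trace of memory accesses. This is an illustration under stated assumptions, not FeatherLite's algorithm: the `Access` record and `find_races` function are hypothetical, the check is a simple lockset-style approximation (two conflicting accesses that share no lock), and uniform random sampling stands in for the sampling algorithms the session mentions but does not describe.

```python
import random
from collections import namedtuple

# Hypothetical trace record: which thread touched which address, whether it
# wrote, and which locks it held at that moment.
Access = namedtuple("Access", "thread addr is_write locks")


def find_races(trace, sample_rate=1.0, seed=0):
    """Return pairs of sampled accesses that conflict and share no lock.

    Two accesses conflict when they come from different threads, touch the
    same address, and at least one of them is a write.
    """
    rng = random.Random(seed)
    seen = {}   # addr -> sampled accesses recorded so far
    races = []
    for acc in trace:
        if rng.random() >= sample_rate:
            continue  # unsampled access: skipped to save work
        for prev in seen.get(acc.addr, []):
            if (prev.thread != acc.thread
                    and (prev.is_write or acc.is_write)
                    and not (prev.locks & acc.locks)):
                races.append((prev, acc))
        seen.setdefault(acc.addr, []).append(acc)
    return races


trace = [
    Access("T1", "x", True, frozenset({"m"})),
    Access("T2", "x", True, frozenset({"m"})),  # same lock held: no race
    Access("T1", "y", True, frozenset()),
    Access("T2", "y", False, frozenset()),      # unsynchronized read: race on y
]

full = find_races(trace)                       # processes every access
sampled = find_races(trace, sample_rate=0.5, seed=1)
print(len(full), "race(s) found with full processing")
```

Sampling trades coverage for cost: a sampled run can only report a subset of what the full run reports, so a race may be missed on any single run. That is why the choice of which accesses to sample matters.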
The second future CAP tool the speaker explores is Sober, a tool for finding memory-model errors. Expert programmers use "lock-free" techniques that rely on low-level synchronization and volatile variables, which expose programs to memory-model issues. For instance, the compiler can reorder instructions, and the hardware can reorder or delay memory accesses, resulting in hidden bugs that are difficult to understand.
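A classic illustration of delayed memory accesses is the "store buffering" pattern: thread T1 writes x and then reads y, while thread T2 writes y and then reads x. The sketch below is a toy model, not how Sober works. The helper names are hypothetical, and a store is modeled as a buffered write that reaches shared memory only at an explicit "flush" step. Placing the flush after the load models hardware that delays the store.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def thread(tid, store_var, load_var, delay_store):
    """Steps for one thread: store 1 to store_var, then load load_var."""
    store = (tid, "store", store_var)
    flush = (tid, "flush", store_var)   # the store becomes visible to others here
    load = (tid, "load", load_var)
    return [store, load, flush] if delay_store else [store, flush, load]


def run(schedule):
    """Execute one schedule and return the values (r1, r2) read by T1 and T2."""
    mem = {"x": 0, "y": 0}
    buf = {"T1": {}, "T2": {}}   # per-thread store buffers
    regs = {}
    for tid, op, var in schedule:
        if op == "store":
            buf[tid][var] = 1
        elif op == "flush":
            mem.update(buf[tid])
            buf[tid].clear()
        else:  # "load": a thread sees its own buffered store first
            regs[tid] = buf[tid].get(var, mem[var])
    return regs["T1"], regs["T2"]


def outcomes(delay_store):
    """Collect every (r1, r2) result over all interleavings of the two threads."""
    t1 = thread("T1", "x", "y", delay_store)
    t2 = thread("T2", "y", "x", delay_store)
    return {run(s) for s in interleavings(t1, t2)}


in_order = outcomes(delay_store=False)
delayed = outcomes(delay_store=True)
print("(0, 0) possible without delay:", (0, 0) in in_order)
print("(0, 0) possible with delayed stores:", (0, 0) in delayed)
```

When every store is visible before the following load, no interleaving lets both threads read 0. Once stores can be delayed, that outcome appears, and lock-free code written with the in-order intuition can break on it.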
This session includes several demonstrations covering why concurrency is difficult, taming concurrency, CHESS in Visual Studio, and Heisenbugs in the Concurrency and Coordination Runtime (CCR).



