Program analysis of binary (i.e., executable) code has become an important and recurring goal in software analysis research and practice. Binary code analysis is attractive because it offers high fidelity reasoning of the code that will actually execute, and because not requiring source code makes such techniques more widely applicable. BAP, the Binary Analysis Platform, is the third incarnation of our infrastructure for performing analysis on binary code. Like other platforms such as CodeSurfer/x86 [3], McVeto [15], Phoenix [11], and Jakstab [9], BAP first disassembles binary code into assembly instructions, lifts the instructions to an intermediate language (IL), and then performs analysis at the IL level. BAP provides the following salient features: – BAP makes all side effects of assembly instructions explicit in the IL. This enables all subsequent analyses to be written in a syntax-directed fashion. For example, the core code of our symbolic executor for assembly is only 250 lines long due to the simplicity of the IL. The operational semantics of the IL are formally defined and available in the BAP manual [7]. – Common code representations such as CFGs, static single assignment/threeaddress code form, program dependence graphs, a dataflow framework with constant folding, dead code elimination, value set analysis [3], and strongly connected component (SCC) based value numbering. – Verification capabilities via Dijkstra and Flanagan-Saxe style weakest preconditions and interfaces with several SMT solvers. The verification can be performed on dynamically executed traces (e.g., via an interface with Intel’s Pin Framework), as well as on static code sequences. – BAP is publicly available with source code at http://bap.ece.cmu.edu/. BAP currently supports x86 and ARM. We have leveraged BAP and its predecessors in dozens of security research applications ranging from automatically generating exploits for buffer overflows to inferring types on assembly. A recurring task in our research is to generate Published in the Proceedings of 2011 Conference on Computer Aided Verification 2 BAP logical verification conditions (VCs) from code, usually so that satisfying answers are inputs that drive execution down particular code paths. Generating VCs that are actually solvable in practice is important; we routinely solve VCs hundreds of megabytes in size that capture the semantics of 100,000s of assembly instructions using BAP. In the rest of this paper we discuss these features, how they evolved, compare them to other platforms where possible, and provide examples of how we have used them in various projects. 2 BAP Goals and Related Work Fully representing the semantics of assembly is more challenging than it would seem. In order to appreciate the difficulty, consider the three line assembly program below. Suppose we want to create a verification condition (VC) that is satisfied only by inputs that take the conditional jump (e.g., to find inputs that take the jump). The challenge is that arithmetic operations set up to 6 status flags, and control flow in assembly depends upon the values of those flags. Simply lifting line 1 to something like ebx = eax + ebx does not expose those side effects. 1 add %eax , %ebx # ebx=eax+ebx ( s e t s OF, SF , ZF , AF, CF , PF) 2 s h l %c l , %ebx # ebx=ebx<<c l ( s e t s OF, SF , ZF , AF, CF , PF) 3 j c t a r g e t # jump t o t a r g e t i f c a r r y f l a g i s s e t The first generation of our binary analysis tools, asm2c, attempted to directly decompile x86 assembly to C, and then perform all software analysis on the resulting C code. asm2c left instruction side effects implicit, which made it difficult to analyze control flow. Other binary tools such as instrumentors, disassemblers, and editors (e.g., DynInst [13], Valgrind [12], and Microsoft Phoenix [11]) also did not represent these side effects explicitly. Our next incarnation, Vine, was designed to address the problem by explicitly encoding side-effects in the IL. The result is that subsequent analyses and verification could rely upon the IL syntax alone. Vine is significantly more successful than asm2c, and has been used in dozens of research projects (see [4]).1 Vine used VEX [12] to provide a rough IL for each instruction, which was then augmented by Vine to expose all otherwise-implicit side effects. An important implementation decision was to implement the Vine back-end in OCaml (asm2c was in C++). We found OCaml’s language features to be a much better match for program analysis and verification. However, the Vine IL grew over time, lacked a formal semantics for the IL itself, and did not handle bi-endian architectures such as ARM correctly.
|
Good coders code.
|
the_opportunity_analysis_canvas_-_third_edition.pdf | |
File Size: | 1685 kb |
File Type: |