Accrue Manual

Accrue Version 0.3

Table of Contents

Overview

The Accrue Object Analysis Framework (Accrue) is a framework for interprocedural analysis of Java programs, implemented as a Polyglot compiler extension (see http://www.cs.cornell.edu/projects/polyglot/). Accrue contains some common and useful analyses, such as a non-null analysis and a precise exception tracker. More importantly, it contains sufficient building blocks to make it easy to write new interprocedural analyses.

Accrue is part of the Accrue project at Harvard University, which aims to provide security guarantees that are proportional to programmer effort. Accrue uses the Accrue framework to implement novel program analyses to infer information security guarantees that Java programs offer.

This manual assumes familiarity with the Polyglot extensible compiler framework. Anyone wishing to implement an interprocedural analysis with Accrue will need to use the underlying Polyglot representations of ASTs and types in order to implement the analysis. The Polyglot website contains resources for learning how to use Polyglot.

This manual provides high-level information on how to understand and use the Accrue framework. This manual is intended to be used in conjunction with the Accrue source code. More specific information is included in the source code documentation.

There are two key components for interprocedural analysis in Accrue:

  1. Configurable pointer analysis. A subset-based flow-insensitive pointer analysis with easily configurable abstract objects and code contexts.
  2. Workqueue for managing interprocedural analysis. Classes and patterns to facilitate analysis of code until a fixed point it reached.

Using Accrue

To create a Polyglot extension that uses Accrue, create an ExtensionInfo that extends accrue.AccrueExtensionInfo. Similarly, the new extension's Scheduler should override accrue.analysis.AnalysisScheduler. (See the accrue.AccrueExtensionInfo class for information on the NodeFactory and Option classes that the new extension should extend if desired.)

By default, for each Job the AnalysisScheduler attempts to achieve the AnalysisScheduler.AnalysesDone(Job) goal. This goal should have new analyses added as prerequisites, to ensure that interprocedural analyses are run. If the new extension wants to produce output code, then AccrueExtensionInfo.getCompileGoal(Job) should be overridden.

See the Writing Interprocedural Analyses section for how to write an interprocedural analysis.

Limitations

Accrue does not currently support reflection or dynamic class loading. That is, analysis of code that uses Java reflection, or usees custom class loaders will be unsound and imprecise.

Accrue is able to handle Java 5 language features. See the Java 5 Language Constructs section.

Pointer Analysis

Accrue provides an easily modifiable flow-insensitive pointer analysis, based on the framework described in the paper "Pick Your Contexts Well: Understanding Object-Sensitivity" by Smaragdakis, Bravenboer, and Lhotak, which appeared in POPL 2011.

The code for pointer analysis is contained in package accrue.analysis.pointer. The pointer analysis works by first making a pass over the whole program, registering pointer statements (via the visitor RegisterPointerStmtsVisitor). That is, for each statement that may affect points-to information, a constraint is generated and added to a global set. Once all pointer statements have been registered, the PointerAnalysisPass can be run, which invokes a PointsToEngine to solve the set of pointer constraints, and thus produce a PointsToGraph, which represents the result of the pointer analysis.

In the implementation of the pointer analysis, abstract objects are represented by instances of the HContext interface, analogous to the HContext domain described in Smaragdakis et al. The interface CContext corresponds to the Context domain described in Smaragdakis et al. Thus, a variety of analyses can be easily described by defining appropriate functions record:Context \times AllocationSite \rightarrow HContext and merge:CallSite \times HContext \times Context \rightarrow Context, which, respectively, describe how an abstract object should be described when created at an allocation site in a given context, and how a new context should be created when a method is invoked in a given context, with a given receiver.

Several pointer analyses are already implemented, including a full object sensitive analysis, a type-sensitive analysis, and a call-site sensitive analysis. The default pointer analysis is a type sensitive analysis (T2-1H).

Different pointer analyses can be specified by implementing the HeapAbstractionFactory class (which has abstract methods corresponding to the record andmerge functions). A specific HeapAbstractionFactory can be specified using the command line argument -heapabstraction haf. The format for haf is

   haf  ::= fact
         |  "[" fact "|" fact "|" ... "]"
   fact ::= classname(arg1, arg2, ...) 
where classname is a fully qualified classname of a subclass of HeapAbstractionFactory, and the optional "(arg1, arg2, ...)" are zero or more integer or string arguments, which will be passed to the constructor for the heap abstraction.

For convenience, classname may be abbreviated for several of the pointer analyses that come with Accrue, described as follows.
AbbreviationFully qualified class name/factory
fullaccrue.analysis.pointer.analyses.FullObjSensitiveAnalysis
typeaccrue.analysis.pointer.analyses.TypeSensitive_nTmH_Analysis
2t1haccrue.analysis.pointer.analyses.TypeSensitive_nTmH_Analysis(2,1)
scsaccrue.analysis.pointer.analyses.StaticCallSiteContextSensitiveAnalaysis
csaccrue.analysis.pointer.analyses.CallSiteContextSensitiveAnalaysis
distinguishaccrue.analysis.pointer.analyses.DistinguishClass

The following are examples of strings that are acceptable for the heap abstraction argument.

  • full
  • full(5)
  • type(2,1)
  • 2t1h
  • accrue.analysis.pointer.analyses.FullObjSensitiveAnalysis(2)
The following strings will create Cross product analyses (see accrue.analysis.pointer.analyses.CrossProductAnalysis)
  • [2t1h | scs(2)]
  • [2t1h | scs(2) | distinguish(mypackage.Foo)]

Writing Interprocedural Analyses

Accrue provides several classes that simplify the process of implementing top-down interprocedural analyses. Specifically, Accrue provides infrastructure to analyze code declarations in zero or more contexts, and, if necessary, to iterate analysis until a fixed point is reached.

Class AnalysisUnit (in package accrue.analysis.interprocanalysis) represents the unit of an interprocedural analysis: it is a code declaration (i.e., a method, a constructor, a static initializer, or an instance initializer) and an AnalysisContext. An AnalysisContext is (mostly) a CContext, i.e., the context used in the pointer analysis. (See the JavaDoc for AnalysisContext to understand when it is more than just a CContext.)

Class WorkQueue is responsible for managing the analysis of AnalysisUnits. It starts by analyzing the unique start procedure (either the only main method found in the source code, or the main method of the class specified using the command line argument -start) in the initial CContext. How exactly to analyze a given AnalysisUnit will be dependent on the specific analysis being implemented.

The WorkQueue requires that the analysis of a unit takes as input some item of type T and returns as output an ExitMap<T>, that is, a map that provides a T for each possible way that control flow can leave the analysis unit. Type T will depend on the analysis. For example, in a flow-sensitive analysis, T may be a variable context, mapping variables and (abstract) heap locations to abstract values. (See below for additional support for writing interprocedural analyses that use variable contexts.) Alternatively, if the analysis simply needs to visit each analysis unit once (say, to generate constraints), then type T may be instantiated to accrue.analysis.interprocanalysis.Unit, which has only a single non-null value. Type T must implement interface Ordered<T> to allow the WorkQueue to combine analysis results, and determine when analysis of a analysis unit must be re-run (see below).

During the analysis of an AnalysisUnit, the analysis may request the analysis result for another AnalysisUnit (e.g., such as when a method is called). The WorkQueue will either immediately start analysis of the requested analysis unit, or, if the previously computed result for the analysis unit is suitable, will return the previous result. Specifically, if the new input for the analysis unit is less than or equal to the existing input for that unit (according to the Ordered<T>.leq(T) method), then the existing analysis result is reused. For a recursive call (i.e., the analysis result for an analysis unit was requested while we are currently analyzing that analysis unit), a suitable default analysis result is returned. If an analysis result for an analysis unit changes, then all analysis units that requested the previous result will be re-analyzed. This will continue until a fix-point is reached.

Static initializers (i.e., initializers that are run when a class is loaded) are analyzed immediately prior to the analysis of the unique main method. This decision is a compromise between soundness (i.e., we want to analyze static initializers) and the difficulty in determining when in a program's execution a given static initializer will be run. Analysis static initializers before the invocation of main is analogous to all classes that will be required during execution being loaded at the beginning of the main method. Note that initializers of static fields are converted into static initializers by the SimplifyExpressions pass. See the Simplified Expressions section for more details.

Interprocedural Analysis

To create an interprocedural analysis, you must first create an appropriate Goal. Let's call this goal G. To make sure that the new analysis is run, G must be made a prerequisite of the AnalysisScheduler.AnalysesDone(Job) goal. Moreover, G will be compiler-wide goal (as opposed to a job-specific goal).

Goal G must have as prerequisites the goals AnalysisScheduler.RegisterProceduresDone() and AnalysisScheduler.PointerAnalysis(). The first prerequisite goal (RegisterProcedures) will ensure that we know about all procedures (methods, constructors) that we have source code for, and the second prerequisite goal (PointerAnalysis) will ensure that the pointer analysis has been run before G. In addition to computing points-to sets, the pointer analysis computes the call graph, which is needed for interprocedural analysis.

All goals in Polyglot create a Pass to accomplish the goal, in method createPass(ExtensionInfo) method. Goal G should return a subclass of accrue.analysis.interprocanalysis.InterProcAnalysisPass. Goal G will also need to create a subclass of AnalysisFactory to pass to the constructor of InterProcAnalysisPass. Both InterProcAnalysisPass and AnalysisFactory provide extension points to allow interprocedural analyses to customize the behavior of the interprocedural analysis framework.

Class InterProcAnalysisPass provides a method postSuccessfulProcess(WorkQueue) that is invoked upon the successful completion of the interprocedural analysis. This method can be an appropriate place for analyses to emit results to file or the screen.

Class AnalysisFactory has several abstract methods that must be defined in an analysis-specific subclass:

  • analysisName() returns a human-readable name for the analysis.
  • analysisReportName() returns a string that can be used to enable reporting using Polyglot's reporting framework.
  • analysisUtil(WorkQueue,AnalysisUnit) creates an AnalysisUtil for the analysis unit passed in as an argument. Class AnalysisUtil provides utility methods for the analysis of code, and is described below.
  • startItem(AnalysisUtil autil, ProcedureDecl startProc) creates the initial analysis item for the analysis of the start procedure startProc.
There is one AnalysisFactory object created per analysis.

Class AnalysisUtil contains utility methods for analyzing analysis units, including methods to access the points-to sets of expressions, and the abstract locations that a field access may refer to. One AnalysisUtil is created each time an analysis unit is analyzed. The key method of this class that needs to be overridden is ExitMap<T> analyze(CodeDecl n, T before) which is responsible for implementing the analysis of code n. (Note that the AnalysisUtil contains methods to access the analysis unit and the analysis context in which this code is being analyzed.) Argument before is the analysis item immediately before the invocation of the code, and the method must return a map that provides an analysis item T for each possible way that control flow can leave the analysis unit.

The AnalysisUtil class contains a method analyzeInitializers which will analyze instance initializers of a class. This method must be called by the analysis immediately after analysis of a constructor call to a super class (i.e., a constructor call of the form super(...)). Failure to call analyzeInitializers during the analysis of a constructor declaration will result in a runtime error during analysis.

Interprocedural Analysis with Variable Contexts

Accrue provides additional classes to facilitate interprocedural analyses that use variable contexts (i.e., where local variables and abstract locations contain abstract values). These classes are in the package accrue.analysis.interprocvarcontext.

Class VarContext maps local variables and abstract locations to abstract values. Objects of this class are the analysis item passed in to analysis units, and analysis of units produces a VarContext object for each possible way that control flow can exit an analysis unit. VarContext also has an "expression stack" that tracks the abstract values for expressions that are currently being evaluated, and a "return result" abstract value that is used to track the abstract value for exceptions and for values being returned by methods. The class also contains utility methods to help with the management of local variables, such as being able to restore a caller's local variables following the return from a callee. Class FinalVarContext extends ExitMap, and records a VarContext for each possible way that control flow can leave code.

Class InterProcVarContextPass extends InterProcAnalysisPass to provide a WorkQueue that treats VarContexts specially. Analysis passes that want to use variable contexts should extend InterProcVarContextPass. The WorkQueue tracks for each analysis unit which abstract locations the analysis unit reads and writes, and uses this information to improve the efficiency of the analysis. In particular, analysis of an analysis unit will need to be repeated only if the abstract value of any read abstract location (or any procedure argument) changes; analysis will not be repeated if the only change to the input VarContext of the analysis unit is to abstract locations that are not read by the analysis unit.

Interprocedural Dataflow Analysis

Accrue provides additional classes to facilitate interprocedural dataflow analyses. These classes are in package accrue.analysis.interprocvarcontext.

Abstract class PreciseDataFlow extends the Polyglot class polyglot.visit.DataFlow and provides a more precise control-flow graph on which to perform dataflow. Any analysis that uses PreciseDataFlow to perform its analysis of analysis units should have as a pre-requisite the goal AnalysisScheduler.PreciseExceptionAnalysis(2). To see an example of an analysis that uses PreciseDataFlow, see the domination/post-domination analysis in package accrue.analysis.domination.

Abstract class InterProcDataFlow extends PreciseDataFlow for dataflow analyses that use variable contexts, i.e., where the dataflow analysis associates dataflow facts with variables and abstract locations. To simplify implementation of dataflow analyses, Accrue extension objects implement a flowDispatch method (see accrue.analysis.ext.ExtTerm) that will call an appropriate method for a specific AST node. See NotNullDataFlow (in package accrue.analysis.notnulldataflow) for an example of an InterProcDataFlow.

Finally, class AnalysisUtilDataFlow extends AnalysisUtil and provides additional utility methods for interprocedural dataflow analyses.

Saving and Retrieving Analysis Results

Analyses may save and retrieve their analysis results using utility methods in the AST extensions accrue.analysis.ext.AccrueExt. Each AST node has an AccrueExt object which may be retrieved using the static method AccrueExt_c.ext(Node).

Analysis results can be stored using one of the recordAnalysisResult methods, retrieved using one of the getAnalysisResult methods, and removed using one of the removeAnalysisResult methods. These methods all take as their first argument an analysisKey which is an object that uniquely identifies the analysis. For example, a string that contains the fully-qualified name of the analysis pass would be an appropriate object to use for the analysisKey.

Different methods are provided so that analyses can store results at the appropriate granularity. For example, method recordAnalysisResult(Object analysisKey, T result) should be used in each AST nodes has just a single result (regardless of which analysis contexts the node is analyzed within); method recordAnalysisResult(Object analysisKey, AnalysisContext context, T result) should be used when a node may have different results for each analysis context; and method recordAnalysisResult(Object analysisKey, AnalysisContext context, PeerKey peerKey, T result) should be used when the analysis results arise from a dataflow analysis (where one AST node may have multiple peers within an analysis context).

Note that analyses should use only one granularity; granularities cannot be mixed for the same analysis.

Simplified Expressions

To reduce the number of language constructs that writers of new analyses must handle, the SimplyExpressionsGoal goal (in package accrue.analysis.goals) reduces several language constructs to simpler ones. This goal is run before the pointer analysis pass, meaning that any interprocedural analysis can assume that the program consists just of the simpler constructs. The list below describes how SimplyExpressionsGoal simplifies expressions.
  • Anonymous classes and inner classes are made (static) nested classes.
  • Targets of break and continue statements are made explicit.
    Every loop and switch statement is given an explicit label, and all breaks and continues are given the appropriate label.
  • Assignments of the form a op= e are simplified to a = a op e. For example x += 42 becomes x = x + 42. If expression a may have side-effects, then the transformation will fail.
  • For loops are converted into while loops.
  • Switch statements are converted into cascading conditional statements.
  • Implicit coercions to String are made explicit.
    For example, expression "Hello " + w (where w is not of type String) is converted to "Hello " + String.valueOf(w).
  • Increment and decrement are converted into explicit assignments.
    For example, ++x is converted to x=x+1 and x-- is converted to (x=x-1)+1.

Java 5 Language Constructs

The Accrue framework currently has partial support for language features added to Java 1.5, but does not currently support language features added in later versions.

To write an analysis for JL5 programs, extend the class accrue.jl5.AccrueJL5ExtensionInfo. Currently the pointer analysis and the non-null analysis explicitly support Java 5 language features. Other analyses (e.g., the precise exception analysis) work without modification. Note that this extension info translates away extended-for loops and provides explicit boxing and unboxing of primitive values to objects of wrapper classes. These translations simplify subsequent analyses without changing the meaning of programs, or type signatures of classes and methods.

An alternative approach to implement analyses for Java 1.5 programs is to use the Polyglot compiler support for translation of Java 1.5 (and later) programs to Java 1.4. This can be achieved on the command line using the $POLYGLOT/bin/jl5c command with the -removeJava5isms command-line flag to produce Java 1.4 files, on which the Accrue framework can be run.

Alternatively, the class accrue.jl5.AccrueJL5TranslationExtensionInfo can be used to automatically invoke the Java 1.5 to Java 1.4 translator before passing the resulting AST to the Accrue framework. Implementors should subclass AccrueJL5TranslationExtensionInfo, passing in to the constructor the AccrueExtensionInfo to run on the Java 1.4 classes. Let the subclass of AccrueJL5TranslationExtensionInfo be called T. Polyglot can be invoked with the subclass T as the specified ExtensionInfo, and will automatically translate Java 1.5 classes to 1.4 and then invoke the specified AccrueExtensionInfo analysis.

In a future version of Accrue, Java 7 language features will be supported directly, without need for translation to Java 5 or to 1.4.