
Accrue Manual
Accrue Version 0.3
Table of Contents
- Overview
- Pointer analysis
- Writing interprocedural analyses
- Interprocedural Analysis
- Interprocedural Analysis with Variable Contexts
- Interprocedural Dataflow Analysis
- Saving and Retrieving Analysis Results
- Simplified Expressions
- Java 5 Language Constructs
Overview
The Accrue Object Analysis Framework (Accrue) is a framework for interprocedural analysis of Java programs, implemented as a Polyglot compiler extension (see http://www.cs.cornell.edu/projects/polyglot/). Accrue contains some common and useful analyses, such as a non-null analysis and a precise exception tracker. More importantly, it contains sufficient building blocks to make it easy to write new interprocedural analyses.
Accrue is part of the Accrue project at Harvard University, which aims to provide security guarantees that are proportional to programmer effort. Accrue uses the Accrue framework to implement novel program analyses to infer information security guarantees that Java programs offer.
This manual assumes familiarity with the Polyglot extensible compiler framework. Anyone wishing to implement an interprocedural analysis with Accrue will need to use the underlying Polyglot representations of ASTs and types in order to implement the analysis. The Polyglot website contains resources for learning how to use Polyglot.
This manual provides high-level information on how to understand and use the Accrue framework. This manual is intended to be used in conjunction with the Accrue source code. More specific information is included in the source code documentation.
There are two key components for interprocedural analysis in Accrue:
- Configurable pointer analysis. A subset-based flow-insensitive pointer analysis with easily configurable abstract objects and code contexts.
- Workqueue for managing interprocedural analysis. Classes and patterns to facilitate analysis of code until a fixed point it reached.
Using Accrue
To create a Polyglot extension that uses Accrue, create an
ExtensionInfo
that extends
accrue.AccrueExtensionInfo
. Similarly, the new extension's
Scheduler
should override
accrue.analysis.AnalysisScheduler
. (See the
accrue.AccrueExtensionInfo
class for information on
the NodeFactory
and Option
classes that
the new extension should extend if desired.)
By default, for each Job
the AnalysisScheduler
attempts to
achieve the AnalysisScheduler.AnalysesDone(Job)
goal. This goal should have new analyses added as prerequisites, to
ensure that interprocedural analyses are run. If the new extension
wants to produce output code, then
AccrueExtensionInfo.getCompileGoal(Job)
should be
overridden.
See the Writing Interprocedural Analyses section for how to write an interprocedural analysis.
Limitations
Accrue does not currently support reflection or dynamic class loading. That is, analysis of code that uses Java reflection, or usees custom class loaders will be unsound and imprecise.
Accrue is able to handle Java 5 language features. See the Java 5 Language Constructs section.
Pointer Analysis
Accrue provides an easily modifiable flow-insensitive pointer analysis, based on the framework described in the paper "Pick Your Contexts Well: Understanding Object-Sensitivity" by Smaragdakis, Bravenboer, and Lhotak, which appeared in POPL 2011.
The code for pointer analysis is contained in package
accrue.analysis.pointer
. The pointer analysis works by
first making a pass over the whole program, registering pointer
statements (via the visitor
RegisterPointerStmtsVisitor
). That is, for each statement
that may affect points-to information, a constraint is generated and
added to a global set. Once all pointer statements have been
registered, the PointerAnalysisPass
can be run, which
invokes a PointsToEngine
to solve the set of pointer
constraints, and thus produce a PointsToGraph
, which
represents the result of the pointer analysis.
In the implementation of the pointer analysis, abstract objects are
represented by instances of the HContext
interface,
analogous to the HContext domain described in Smaragdakis et
al. The interface CContext
corresponds to the
Context domain described in Smaragdakis et al. Thus, a variety
of analyses
can be easily described by defining appropriate functions
record:Context \times AllocationSite \rightarrow HContext and
merge:CallSite \times HContext \times Context \rightarrow
Context, which, respectively, describe how an abstract object
should be described when created at an allocation site in a given
context, and how a new context should be created when a method is
invoked in a given context, with a given receiver.
Several pointer analyses are already implemented, including a full object sensitive analysis, a type-sensitive analysis, and a call-site sensitive analysis. The default pointer analysis is a type sensitive analysis (T2-1H).
Different pointer analyses can be specified by implementing the
HeapAbstractionFactory
class (which has abstract methods
corresponding to the record andmerge functions). A
specific HeapAbstractionFactory
can be specified using
the command line argument -heapabstraction
haf
. The format for haf
is
haf ::= fact | "[" fact "|" fact "|" ... "]" fact ::= classname(arg1, arg2, ...)where classname is a fully qualified classname of a subclass of
HeapAbstractionFactory
, and the optional "(arg1, arg2,
...)" are zero or more integer or string
arguments, which will be passed to the constructor for the heap abstraction.
For convenience, classname may be abbreviated for several of the pointer analyses that come with Accrue, described as follows.
Abbreviation | Fully qualified class name/factory |
---|---|
full | accrue.analysis.pointer.analyses.FullObjSensitiveAnalysis |
type | accrue.analysis.pointer.analyses.TypeSensitive_nTmH_Analysis |
2t1h | accrue.analysis.pointer.analyses.TypeSensitive_nTmH_Analysis(2,1) |
scs | accrue.analysis.pointer.analyses.StaticCallSiteContextSensitiveAnalaysis |
cs | accrue.analysis.pointer.analyses.CallSiteContextSensitiveAnalaysis |
distinguish | accrue.analysis.pointer.analyses.DistinguishClass |
The following are examples of strings that are acceptable for the heap abstraction argument.
full
full(5)
type(2,1)
2t1h
accrue.analysis.pointer.analyses.FullObjSensitiveAnalysis(2)
accrue.analysis.pointer.analyses.CrossProductAnalysis
)
[2t1h | scs(2)]
[2t1h | scs(2) | distinguish(mypackage.Foo)]
Writing Interprocedural Analyses
Accrue provides several classes that simplify the process of implementing top-down interprocedural analyses. Specifically, Accrue provides infrastructure to analyze code declarations in zero or more contexts, and, if necessary, to iterate analysis until a fixed point is reached.
Class AnalysisUnit
(in package
accrue.analysis.interprocanalysis
) represents the unit of
an interprocedural analysis: it is a code declaration (i.e., a method,
a constructor, a static initializer, or an instance initializer) and
an AnalysisContext
. An AnalysisContext
is
(mostly) a CContext
, i.e., the context used in the
pointer analysis. (See the JavaDoc for AnalysisContext
to
understand when it is more than just a CContext
.)
Class WorkQueue
is responsible for managing the
analysis of AnalysisUnit
s. It starts by analyzing the
unique start procedure (either the only main
method found
in the source code, or the main
method of the class
specified using the command line argument -start
) in the
initial CContext
. How exactly to analyze a given
AnalysisUnit will be dependent on the specific analysis being
implemented.
The WorkQueue requires that the analysis of a
unit takes as input some item of type T and returns as output
an ExitMap<T>
, that is, a map that provides
a T for each possible way that control flow can leave the
analysis unit. Type T will depend on the analysis. For example,
in a flow-sensitive analysis, T may be a variable context,
mapping variables and (abstract) heap locations to abstract
values. (See below for additional support for writing interprocedural
analyses that use variable contexts.) Alternatively, if the analysis
simply needs to visit each analysis unit once (say, to generate
constraints), then type T may be instantiated to
accrue.analysis.interprocanalysis.Unit
, which has only a
single non-null value. Type T must implement interface
Ordered<T>
to allow the WorkQueue to combine
analysis results, and determine when analysis of a analysis unit must
be re-run (see below).
During the analysis of an AnalysisUnit
, the
analysis may request the analysis result for another
AnalysisUnit
(e.g., such as when a method is called). The
WorkQueue will either immediately start analysis of the requested
analysis unit, or, if the previously computed result for
the analysis unit is suitable, will return the previous
result. Specifically, if the new input for the analysis unit is less
than or equal to the existing input for that unit (according to the
Ordered<T>.leq(T)
method), then the
existing analysis result is reused. For a recursive call (i.e., the analysis result for an
analysis unit was requested while we are currently analyzing that
analysis unit), a suitable default analysis result is returned. If an
analysis result for an analysis unit changes, then all
analysis units that requested the previous result will be
re-analyzed. This will continue until a fix-point is reached.
Static initializers (i.e., initializers that are run when a class
is loaded) are analyzed immediately prior to the analysis of the
unique main
method. This decision is a compromise between
soundness (i.e., we want to analyze static initializers) and the
difficulty in determining when in a program's execution a given static
initializer will be run. Analysis static initializers before the
invocation of main
is analogous to all classes that will
be required during execution being loaded at the beginning of the
main
method. Note that initializers of static fields are
converted into static initializers by the
SimplifyExpressions
pass. See the Simplified Expressions
section for more details.
Interprocedural Analysis
To create an interprocedural analysis, you must first create an appropriate
Goal
. Let's call this goal G. To
make sure that the new analysis is run, G must be made a
prerequisite of the AnalysisScheduler.AnalysesDone(Job)
goal. Moreover, G
will be compiler-wide goal (as opposed to a job-specific goal).
Goal G must have as
prerequisites the goals
AnalysisScheduler.RegisterProceduresDone()
and
AnalysisScheduler.PointerAnalysis()
. The first
prerequisite goal (RegisterProcedures) will
ensure that we know about all procedures (methods, constructors) that we have source code for, and the second
prerequisite goal (PointerAnalysis) will ensure that the pointer
analysis has been run before G. In addition to computing
points-to sets, the pointer analysis computes the call graph, which is
needed for interprocedural analysis.
All goals in Polyglot create a Pass
to accomplish the
goal, in method
createPass(ExtensionInfo)
method. Goal G should return a subclass of
accrue.analysis.interprocanalysis.InterProcAnalysisPass
. Goal
G will also need to create a subclass of
AnalysisFactory
to pass to the constructor of
InterProcAnalysisPass
. Both
InterProcAnalysisPass
and AnalysisFactory
provide
extension points to allow interprocedural analyses to customize the
behavior of the interprocedural analysis framework.
Class InterProcAnalysisPass
provides a method
postSuccessfulProcess(WorkQueue)
that is invoked upon the
successful completion of the interprocedural analysis. This method can
be an appropriate place for analyses to emit results to file or the screen.
Class AnalysisFactory
has several abstract methods
that must be defined in an analysis-specific subclass:
analysisName()
returns a human-readable name for the analysis.analysisReportName()
returns a string that can be used to enable reporting using Polyglot's reporting framework.analysisUtil(WorkQueue,AnalysisUnit)
creates anAnalysisUtil
for the analysis unit passed in as an argument. ClassAnalysisUtil
provides utility methods for the analysis of code, and is described below.startItem(AnalysisUtil autil, ProcedureDecl startProc)
creates the initial analysis item for the analysis of the start procedurestartProc
.
AnalysisFactory
object created per analysis.
Class AnalysisUtil
contains utility methods for
analyzing analysis units, including methods to access the points-to
sets of expressions, and the abstract locations that a field access
may refer to. One AnalysisUtil
is created
each time an analysis unit is analyzed. The key method of this class
that needs to be overridden is ExitMap<T>
analyze(CodeDecl n, T before)
which is responsible for
implementing the analysis of code n
. (Note that the
AnalysisUtil
contains methods to access the analysis
unit and the analysis context in which this code is being analyzed.)
Argument before
is the analysis item immediately before
the invocation of the code, and the method must return a map that provides
an analysis item T for each possible way that control flow can leave the
analysis unit.
The AnalysisUtil
class contains a method
analyzeInitializers
which will analyze instance
initializers of a class. This method must be called by the analysis
immediately after analysis of a constructor
call to a super class (i.e., a constructor call of the form
super(...)
). Failure to call
analyzeInitializers
during the analysis of a constructor
declaration will result in a runtime error during analysis.
Interprocedural Analysis with Variable Contexts
Accrue provides additional classes to facilitate interprocedural
analyses that use variable contexts (i.e., where local variables and
abstract locations contain abstract values). These classes are in the
package accrue.analysis.interprocvarcontext
.
Class VarContext
maps local variables and abstract
locations to abstract values. Objects of this class are the analysis
item passed in to analysis units, and analysis of units produces a
VarContext
object for each possible way that control flow
can exit an analysis unit. VarContext
also has an "expression stack" that
tracks the abstract values for expressions that are currently being
evaluated, and a "return result" abstract value that is used to track
the abstract value for exceptions and for values being returned by
methods. The class also contains utility methods to help with the
management of local variables, such as being able to restore a
caller's local variables following the return from a callee. Class
FinalVarContext
extends ExitMap
, and records
a VarContext
for each possible way that control flow can
leave code.
Class InterProcVarContextPass
extends
InterProcAnalysisPass
to provide a WorkQueue that treats
VarContext
s specially. Analysis passes that want to use variable
contexts should extend InterProcVarContextPass
. The
WorkQueue tracks for each analysis unit which abstract locations the
analysis unit reads and writes, and uses this information to improve
the efficiency of the analysis. In particular, analysis of an analysis
unit will need to be repeated only if the abstract value of any read
abstract location (or any procedure argument) changes; analysis will
not be repeated if the only change to the input VarContext of the
analysis unit is to abstract locations that are not read by the
analysis unit.
Interprocedural Dataflow Analysis
Accrue provides additional classes to facilitate
interprocedural dataflow analyses. These classes are in package
accrue.analysis.interprocvarcontext
.
Abstract class PreciseDataFlow
extends the Polyglot class
polyglot.visit.DataFlow
and provides a more precise
control-flow graph on which to perform dataflow. Any analysis that
uses PreciseDataFlow
to perform its analysis of analysis
units should have as a pre-requisite the goal
AnalysisScheduler.PreciseExceptionAnalysis(2)
. To see an
example of an analysis that uses PreciseDataFlow
, see the
domination/post-domination analysis in package accrue.analysis.domination
.
Abstract class InterProcDataFlow
extends
PreciseDataFlow
for dataflow analyses that use variable
contexts, i.e., where the dataflow analysis associates dataflow facts
with variables and abstract locations. To simplify implementation of
dataflow analyses, Accrue extension objects implement a
flowDispatch
method (see
accrue.analysis.ext.ExtTerm
) that will call an
appropriate method for a specific AST node.
See
NotNullDataFlow
(in package
accrue.analysis.notnulldataflow
) for an example of an
InterProcDataFlow
.
Finally, class AnalysisUtilDataFlow
extends
AnalysisUtil
and provides additional utility methods for
interprocedural dataflow analyses.
Saving and Retrieving Analysis Results
Analyses may save and retrieve their analysis
results using utility methods in the AST extensions
accrue.analysis.ext.AccrueExt
. Each AST node has an
AccrueExt
object which may be retrieved using the static
method AccrueExt_c.ext(Node)
.
Analysis results can be stored using one of the
recordAnalysisResult
methods, retrieved using one of the
getAnalysisResult
methods, and removed using one of the
removeAnalysisResult
methods. These methods all take as their
first argument an analysisKey
which is an object that uniquely
identifies the analysis. For example, a string that contains the
fully-qualified name of the analysis pass would be an appropriate
object to use for the analysisKey
.
Different methods are provided so that analyses can store results
at the appropriate granularity. For example, method
recordAnalysisResult(Object analysisKey, T result)
should be used in each AST nodes has just a single result (regardless
of which analysis contexts the node is analyzed within); method
recordAnalysisResult(Object analysisKey, AnalysisContext
context, T result)
should be used when a node may have
different results for each analysis context; and method
recordAnalysisResult(Object analysisKey, AnalysisContext
context, PeerKey peerKey, T result)
should be used when the analysis results arise from a dataflow
analysis (where one AST node may have multiple peers within an
analysis context).
Note that analyses should use only one granularity; granularities cannot be mixed for the same analysis.
Simplified Expressions
To reduce the number of language constructs that writers of new analyses must handle, theSimplyExpressionsGoal
goal (in
package accrue.analysis.goals
) reduces several language
constructs to simpler ones. This goal is run before the pointer
analysis pass, meaning that any interprocedural analysis can assume
that the program consists just of the simpler constructs. The list
below describes how SimplyExpressionsGoal
simplifies expressions.
- Anonymous classes and inner classes are made (static) nested classes.
- Targets of
break
andcontinue
statements are made explicit.
Every loop and switch statement is given an explicit label, and all breaks and continues are given the appropriate label. - Assignments of the form
a op= e
are simplified toa = a op e
. For examplex += 42
becomesx = x + 42
. If expressiona
may have side-effects, then the transformation will fail. - For loops are converted into while loops.
- Switch statements are converted into cascading conditional statements.
- Implicit coercions to
String
are made explicit.
For example, expression"Hello " + w
(wherew
is not of typeString
) is converted to"Hello " + String.valueOf(w)
. - Increment and decrement are converted into explicit
assignments.
For example,++x
is converted tox=x+1
andx--
is converted to(x=x-1)+1
.
Java 5 Language Constructs
The Accrue framework currently has partial support for language features added to Java 1.5, but does not currently support language features added in later versions.
To write an analysis for JL5 programs, extend the class
accrue.jl5.AccrueJL5ExtensionInfo
. Currently the pointer
analysis and the non-null analysis explicitly support Java 5 language
features. Other analyses (e.g., the precise exception analysis) work
without modification. Note that this extension info translates away
extended-for loops and provides explicit boxing and unboxing of
primitive values to objects of wrapper classes. These translations
simplify subsequent analyses without changing the meaning of
programs, or type signatures of classes and methods.
An alternative approach to implement analyses for Java 1.5 programs
is to use the Polyglot compiler support for
translation of Java 1.5 (and later) programs to Java 1.4. This can be
achieved on the command line using the $POLYGLOT/bin/jl5c
command with the -removeJava5isms
command-line flag to
produce Java 1.4 files, on which the Accrue framework can be run.
Alternatively, the class
accrue.jl5.AccrueJL5TranslationExtensionInfo
can be used to
automatically invoke the Java 1.5 to Java 1.4 translator before
passing the resulting AST to the Accrue framework. Implementors
should subclass AccrueJL5TranslationExtensionInfo
,
passing in to the constructor the AccrueExtensionInfo
to
run on the Java 1.4 classes. Let the subclass of
AccrueJL5TranslationExtensionInfo
be called
T. Polyglot can be invoked with the subclass T as the
specified ExtensionInfo
, and will automatically translate
Java 1.5 classes to 1.4 and then invoke the specified
AccrueExtensionInfo
analysis.
In a future version of Accrue, Java 7 language features will be supported directly, without need for translation to Java 5 or to 1.4.