-
Notifications
You must be signed in to change notification settings - Fork 142
Writing an IFDS Analysis
In order to solve a data-flow analysis using IFDS you have to create an instance of the IFDSTabulationProblem
interface.
For that you create a new class that inherits from IFDSTabulationProblem
and implements all its pure-virtual functions.
A guide on how to implement these functions you can find below.
First, though, we have to supply an AnalysisDomain
as type parameter to the IFDSTabulationProblem
template.
The AnalysisDomain
is a struct that aggregates a number of type aliases that are used by the interface, including the instruction-type n_t
and the data-flow fact type d_t
.
In most cases, it is a good idea to use the pre-defined LLVMIFDSAnalysisDomainDefault
that defines the type-aliases based on the most common uses on LLVM IR (n_t = const llvm::Instruction *
and d_t = const llvm::Value *
.
If you need to customize some of these types, just create a new struct inheriting from LLVMIFDSAnalysisDomainDefault
that shadows the customized types.
So, for example to customize d_t
:
struct MyAnalysisDomain : psr::LLVMIFDSAnalysisDomainDefault {
using d_t = MyDataflowFact;
};
class MyIFDSAnalysis : public psr::IFDSTabulationProblem<MyAnalysisDomain> {
...
};
In IFDS, data-flow facts are propagated through the program in a point-wise manner.
That is, the IFDSSolver
repeatedly invokes the flow-functions per instruction for each incoming data-flow fact in isolation.
Thus, a FlowFunction
in PhASAR has the signature d_t -> set<d_t>
, computing the data-flow facts that should hold after the current statement if the analysis has already inferred that the incoming "source" data-flow fact holds before the current statement.
Unconditionally generating data-flow facts is done by artificially generating them from the special zero fact.
For the IFDSSolver
to know, what flow function should be invoked at a certain instruction, we have to override the so-called flow-function factories of the IFDSTabulationProblem
.
For each reached instruction exactly one flow-function factory is called exactly once (except for the summary flow-function factory -- see below).
The resulting flow function is cached within the solver.
There are the following flow-function factories:
-
getNormalFlowFunction
:- Intra-procedural data flow. Gets invoked for each instruction that is neither a function-call nor a return/resume.
- Use the flow-function templates from
FlowFunctions.h
to simplify the process. If none of the pre-defined helper functions fits your needs, you may want to uselambdaFlow
and provide a C++-lambda expression as flow function.
-
getCallToRetFlowFunction
:- Intra-procedural data flow at function calls. Gets invoked for each function call instruction and models all data flows that are not affected by the call.
- Consider using the helper function
mapFactsAlongsideCallSite
-
getCallFlowFunction
:- Inter-procedural data flow at function calls. Gets invoked for each callsite-callee pair and is responsible for mapping arguments at the callsite to the corresponding parameters within the callee. Additionally, globals are mapped to the callee context.
- Consider using the helper function
mapFactsToCallee
-
getRetFlowFunction
:- Inter-procedural data flow at function returns. Gets invoked for each
ret
andresume
instruction and is responsible for mapping the return-value to the callsite and optionally mapping back parameters to their corresponding arguments at the callsite. - Consider using the helper function
mapFactsToCaller
- Inter-procedural data flow at function returns. Gets invoked for each
-
getSummaryFlowFunction
[optional]:- For special functions where you want to customize their effects on the analysis. Prevents the function's body being analyzed by the IFDS analysis.
- If this function returns non-nullptr, the
getCallFlowFunction
function will not be called. - Useful for modeling the effects of compiler-intrinsics or standard-library functions
- Defaults to unconditionally returning
nullptr
In addition to the flow-function factories, you have to implement a number of miscellaneous functions:
-
ctor
: For being able to create an instance of your IFDS analysis class, you (obviously) have to provide a constructor. As the base-classIFDSTabulationProblem
does not provide a default-constructor, we have to explicitly initialize the base as well. In fact you have to provide the following arguments to the base:-
IRDB
: An instance ofLLVMProjectIRDB
that owns the LLVM IR to analyze -
EntryPoints
: A vector of (mangled) function names that identify the functions within theIRDB
where the analysis should start -
ZeroValue
[optional]: A dummy data-flow fact that is assumed to hold at each instruction unconditionally. UsuallyLLVMZeroValue::getInstance()
.
Important: If not provided as ctor argument, you have to explicitly initialize the inheritedZeroValue
data-member to a non-nullopt value in your ctor.
-
-
psr::NToString(n_t)
[optional]: Customizes, how an instruction (n_t
) should be printed, for LLVM values making use ofllvmIRToString
. For custom types defaults to.str()
,to_string(...)
oroperator<<
in decreasing priority. -
psr::DToString(d_t)
[optional]: Customizes, how a data-flow fact (d_t
) should be printed, for LLVM values making use ofllvmIRToString
orllvmIRToShortString
. For custom types defaults to.str()
,to_string(...)
oroperator<<
in decreasing priority. -
psr::FToString(d_t)
[optional]: Customizes, how a function (d_t
) should be printed, for LLVM functions only printing the name. For custom types defaults to.str()
,to_string(...)
oroperator<<
in decreasing priority. -
isZeroValue
[optional]: Checks whether the given data-flow fact is the special zero value. Defaults to==
comparison withgetZeroValue()
. -
initialSeeds
: Creates a map from instruction (n_t
) to data-flow fact (d_t
) that defines, what facts should hold unconditionally at the points where the analysis should start. Should be based on theEntryPoints
ctor parameter. You may want to use the utilitiesaddSeedsForStartingPoints
orforallStartingPoints
for this. -
emitTextReport
[optional]: Customizes, how the analysis results should be printed to the terminal.
The flow-function factories use the custom type FlowFunctionPtrType
as a return type.
This type hides the memory management mechanism used from the actual analysis interface.
Currently, FlowFunctionPtrType
maps to std::shared_ptr<FlowFunction<D>>
, but this may change in the future.
- Home
- Reference Material
- Getting Started:
- Building PhASAR
- Using PhASAR with Docker
- A few uses of PhASAR
- Coding Conventions
- Contributing to PhASAR
- Errors and bug reporting
- Update to Newer LLVM Versions
- OS Support