Skip to content

Writing an IFDS Analysis

Fabian Schiebel edited this page Sep 6, 2023 · 6 revisions

In order to solve a data-flow analysis using IFDS you have to create an instance of the IFDSTabulationProblem interface. For that you create a new class that inherits from IFDSTabulationProblemand implements all its pure-virtual functions. A guide on how to implement these functions you can find below.

Analysis Domain

First, though, we have to supply an AnalysisDomain as type parameter to the IFDSTabulationProblem template. The AnalysisDomain is a struct that aggregates a number of type aliases that are used by the interface, including the instruction-type n_t and the data-flow fact type d_t. In most cases, it is a good idea to use the pre-defined LLVMIFDSAnalysisDomainDefault that defines the type-aliases based on the most common uses on LLVM IR (n_t = const llvm::Instruction * and d_t = const llvm::Value *. If you need to customize some of these types, just create a new struct inheriting from LLVMIFDSAnalysisDomainDefault that shadows the customized types. So, for example to customize d_t:

struct MyAnalysisDomain : psr::LLVMIFDSAnalysisDomainDefault {
  using d_t = MyDataflowFact;
};

class MyIFDSAnalysis : public psr::IFDSTabulationProblem<MyAnalysisDomain> {
  ...
};

Flow Functions

In IFDS, data-flow facts are propagated through the program in a point-wise manner. That is, the IFDSSolver repeatedly invokes the flow-functions per instruction for each incoming data-flow fact in isolation. Thus, a FlowFunction in PhASAR has the signature d_t -> set<d_t>, computing the data-flow facts that should hold after the current statement if the analysis has already inferred that the incoming "source" data-flow fact holds before the current statement. Unconditionally generating data-flow facts is done by artificially generating them from the special zero fact.

For the IFDSSolver to know, what flow function should be invoked at a certain instruction, we have to override the so-called flow-function factories of the IFDSTabulationProblem. For each reached instruction exactly one flow-function factory is called exactly once (except for the summary flow-function factory -- see below). The resulting flow function is cached within the solver.

There are the following flow-function factories:

  • getNormalFlowFunction:
    • Intra-procedural data flow. Gets invoked for each instruction that is neither a function-call nor a return/resume.
    • Use the flow-function templates from FlowFunctions.h to simplify the process. If none of the pre-defined helper functions fits your needs, you may want to use lambdaFlow and provide a C++-lambda expression as flow function.
  • getCallToRetFlowFunction:
    • Intra-procedural data flow at function calls. Gets invoked for each function call instruction and models all data flows that are not affected by the call.
    • Consider using the helper function mapFactsAlongsideCallSite
  • getCallFlowFunction:
    • Inter-procedural data flow at function calls. Gets invoked for each callsite-callee pair and is responsible for mapping arguments at the callsite to the corresponding parameters within the callee. Additionally, globals are mapped to the callee context.
    • Consider using the helper function mapFactsToCallee
  • getRetFlowFunction:
    • Inter-procedural data flow at function returns. Gets invoked for each ret and resume instruction and is responsible for mapping the return-value to the callsite and optionally mapping back parameters to their corresponding arguments at the callsite.
    • Consider using the helper function mapFactsToCaller
  • getSummaryFlowFunction [optional]:
    • For special functions where you want to customize their effects on the analysis. Prevents the function's body being analyzed by the IFDS analysis.
    • If this function returns non-nullptr, the getCallFlowFunction function will not be called.
    • Useful for modeling the effects of compiler-intrinsics or standard-library functions
    • Defaults to unconditionally returning nullptr

Misc Functions

In addition to the flow-function factories, you have to implement a number of miscellaneous functions:

  • ctor: For being able to create an instance of your IFDS analysis class, you (obviously) have to provide a constructor. As the base-class IFDSTabulationProblem does not provide a default-constructor, we have to explicitly initialize the base as well. In fact you have to provide the following arguments to the base:
    • IRDB: An instance of LLVMProjectIRDB that owns the LLVM IR to analyze
    • EntryPoints: A vector of (mangled) function names that identify the functions within the IRDB where the analysis should start
    • ZeroValue [optional]: A dummy data-flow fact that is assumed to hold at each instruction unconditionally. Usually LLVMZeroValue::getInstance().
      Important: If not provided as ctor argument, you have to explicitly initialize the inherited ZeroValue data-member to a non-nullopt value in your ctor.
  • psr::NToString(n_t) [optional]: Customizes, how an instruction (n_t) should be printed, for LLVM values making use of llvmIRToString. For custom types defaults to .str(), to_string(...) or operator<< in decreasing priority.
  • psr::DToString(d_t) [optional]: Customizes, how a data-flow fact (d_t) should be printed, for LLVM values making use of llvmIRToString or llvmIRToShortString. For custom types defaults to .str(), to_string(...) or operator<< in decreasing priority.
  • psr::FToString(d_t) [optional]: Customizes, how a function (d_t) should be printed, for LLVM functions only printing the name. For custom types defaults to .str(), to_string(...) or operator<< in decreasing priority.
  • isZeroValue [optional]: Checks whether the given data-flow fact is the special zero value. Defaults to == comparison with getZeroValue().
  • initialSeeds: Creates a map from instruction (n_t) to data-flow fact (d_t) that defines, what facts should hold unconditionally at the points where the analysis should start. Should be based on the EntryPoints ctor parameter. You may want to use the utilities addSeedsForStartingPoints or forallStartingPoints for this.
  • emitTextReport [optional]: Customizes, how the analysis results should be printed to the terminal.

Memory Management

The flow-function factories use the custom type FlowFunctionPtrType as a return type. This type hides the memory management mechanism used from the actual analysis interface. Currently, FlowFunctionPtrType maps to std::shared_ptr<FlowFunction<D>>, but this may change in the future.

Clone this wiki locally