Skip to content

Writing an IDE Analysis

Fabian Schiebel edited this page Aug 11, 2024 · 6 revisions

In order to solve a data-flow analysis using IDE you have to create an instance of the IDETabulationProblem interface. For that you create a new class that inherits from IDETabulationProblem and implements all its pure-virtual functions, similar to IFDS. In fact the IFDSTabulationProblem inherits from IDETabulationProblem and defaults all the IDE-related parts. So, in this guide, we will focus on the additional API functions that were not necessary to implement for IFDS. If you haven't read Writing an IFDS Analysis yet, we recommend to complete it before starting with this page.

In theory, IDE is a generalization of IFDS and adds more features to it, such as edge functions that are annotated to all computed data-flow edges. These edge functions are used to solve an additional value-computation problem while solving the data-flow problem.

Analysis Domain

As an IDE analysis -- in contrast to IFDS -- solves an additional value-computation problem, in the analysis domain we have to specify the type l_t of edge-values to compute:

struct MyAnalysisDomain : psr::LLVMAnalysisDomainDefault {
  using d_t = MyDataflowFact; // optional
  using l_t = MyEdgeValue;    // required
};

class MyIDEAnalysis : public psr::IDETabulationProblem<MyAnalysisDomain> {
  ...
};

Edge-Function Factories

Similar to IFDS flow functions, the IDETabulationProblem requires implementing factories to create edge functions. Each edge function is attached to a data-flow edge (n_t, d_t) -> (n_t, d_t). As each edge-function factory receives such a data-flow edge as parameters, the usual task to perform within such a EF factory is to identify the edge and select the right edge-function to attach. You can find some pre-defined edge functions in EdgeFunctionUtils.h.

In the following, we will consider a simplified version of the IDELinearConstantAnalysis as example. The linear constant analysis computes integer values for variables that always have a constant value based on linear arithmetic operations (were always at most one operand has a non-literal value). The task of the edge functions in this case is to transform the constant value of the one non-literal operand of an arithmetic instruction based on the performed operation.

There are the following edge-function factories:

  • getNormalEdgeFunction:
    • Returns edge functions for data-flow edges created by normal-flow-functions.
    • Example: Arithmetic instructions for linear constant analysis:
      ...
      if (llvm::isa<llvm::BinaryOperator>(Curr) // The source-instruction of the data-flow edge is a binary operation
          && SuccNode == Curr                   // We have generated the SSA value produced by this instruction as data-flow fact
          && CurrNode != SuccNode               // We have generated the new fact from an operand
          ) {
        unsigned Op = Curr->getOpcode();
        auto *Lop = Curr->getOperand(0);
        auto *Rop = Curr->getOperand(1);
        // For non linear constant computation we propagate bottom
        if ((CurrNode == Lop && !llvm::isa<llvm::ConstantInt>(Rop)) ||
            (CurrNode == Rop && !llvm::isa<llvm::ConstantInt>(Lop))) {
          return AllBottom<l_t>{};
        }
      
        // Attach the arithmetic transformer to this edge
        return lca::BinOp{Op, llvm::dyn_cast<llvm::ConstantInt>(Lop), llvm::dyn_cast<llvm::ConstantInt>(rop)}; 
      }
      ...
  • getCallToRetEdgeFunction:
    • Returns edge functions for data-flow edges created by call-to-return flow-functions.
    • Usually EdgeIdentity
  • getCallEdgeFunction:
    • Returns edge functions for data-flow edges created by call-flow-functions.
    • Usually EdgeIdentity, however, for facts that are generated from zero in the Call-FF we may need to attach something else. For example in the linear constant analysis, we need to attach a constant value to parameters that are passed with a literal:
      if (isZeroValue(SrcNode) && !isZeroValue(DestNode)) {                // We have generated the target fact from zero -- literal argument
        if (const auto *A = llvm::dyn_cast<llvm::Argument>(DestNode)) {    // It really was an argument (no global or sth else)
          const auto *CS = llvm::cast<llvm::CallBase>(CallSite);
          const auto *Actual = CS->getArgOperand(A->getArgNo());           // Get the corresponding argument value at the call-site
          if (const auto *CI = llvm::dyn_cast<llvm::ConstantInt>(Actual)) {// It was really a literal
            auto IntConst = CI->getSExtValue();
            return lca::GenConstant{IntConst};                             // The parameter within the callee should then have that very constant value
          }
        }
      }
      return EdgeIdentity<l_t>{};                                          // Otherwise, a call does not modify the parameter values
  • getReturnEdgeFunction:
    • Returns edge functions for data-flow edges created by return-flow-functions.
    • Usually EdgeIdentity, however if the return-value was generated from zero, we may want to attach something else. For the linear constant analysis, for example, we have a special case when a literal constant is returned:
      if (isZeroValue(ExitNode) && !isZeroValue(RetNode)) {                     // We have generated the target fact from zero -- literal return value
        const auto *Return = llvm::cast<llvm::ReturnInst>(ExitStmt);
        auto *ReturnValue = Return->getReturnValue();
        if (auto *CI = llvm::dyn_cast_or_null<llvm::ConstantInt>(ReturnValue)) {// The return-value is indeed a literal
          auto IntConst = CI->getSExtValue();
          return lca::GenConstant{IntConst};                                    // The SSA value at the call-site should then have that returned value
        }
      }
      return EdgeIdentity<l_t>{};                                               // Otherwise, returning from a function does not modify values
  • getSummaryEdgeFunction [optional]:
    • Returns edge functions for data-flow edges created by summary-flow-functions.
    • Attach special pre-defined value transformations to summary edges
    • For linear constant analysis, this may for example be used to model intrinsics like addWithOverflow, __builtin_clz, ...

Misc Functions

Similar to IFDS, for IDE we also have to implement some more miscellaneous functions (in addition to the ones for IFDS):

  • psr::LToString(l_t) [optional]: Customizes, how an edge-value(l_t) should be printed. Defaults to .str(), to_string(...) or operator<< in decreasing priority.
  • allTopFunction [optional]: An edge function that always returns the top-value of the value-domain. Defaults to AllTop<l_t>{} in case l_t implements the JoinLatticeTraits.

For the edge-values you have to either implement the JoinLatticeTraits (recommended) or implement three more functions on the analysis problem:

Inside the analysis problem:

  • topElement: The bottom value of the value lattice (Yes, we call the bottom element Top and vice versa!)
  • bottomElement: The top value of the value lattice
  • join: Merge two edge values together, going up the lattice (towards bottomElement)

Via JoinLatticeTraits you essentially implement the same functions, just tied to the edge-value type l_t.

In the development version of phasar, it is also possible to implement the following functions in the analysis problem:

  • extend [optional]: Edge-function composition. For details, see compose below. If provided, this functions takes precedence over compose of your edge-functions. Sometimes, it is easier to implement this functionality within the problem, e.g., if caching is involved.
  • combine [optional]: Edge-function join. For details, see join below. If provided, this functions takes precedence over join of your edge-functions. Sometimes, it is easier to implement this functionality within the problem, e.g., if caching is involved.

It is recommended to implement either both of extend and combine, or none.

Edge Functions

An edge function models a value transformation that is attached to a data-flow edge. The value transformation is represented as a function with the signature l_t -> l_t. However, an edge function additionally models a lattice and a bounded idempotent semiring (For a formal explanation, refer to Definition 5 within Synchronized Pushdown Systems for Pointer and Data-Flow Analysis. That is, we have to provide a few more member-functions to each edge function.

In the edge-function factories above, we have already seen the lca::BinOp edge function in action. Now, let's take a look at how it is implemented:

struct BinOp {
  using l_t = LatticeDomain<int64_t>;      // The type of the edge-values

  unsigned OpCode;                         // The kind of arithmetic operation we want to perform
  llvm::ConstantInt *LeftConst, *RightConst; // Necessary data that we want to work on (see "Edge-Function Factories" above for reference)

  l_t computeTarget(l_t Source) const {...}

  EdgeFunction<l_t> compose(EdgeFunctionRef<BinOp> This, const EdgeFunction<l_t> &SecondFunction){...}
  EdgeFunction<l_t> join(EdgeFunctionRef<BinOp> This, const EdgeFunction<l_t> &OtherFunction){...}

  bool isConstant() const noexcept{...}

  bool operator==(const BinOp &BOP) const noexcept {
    return BOP.Op == Op && BOP.LeftConst == LeftConst && BOP.RightConst == RightConst;
  }

  friend llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
                                       const BinOp &BOP) {
    ...
  }
};

In PhASAR an edge function consists of the following member functions:

  • computeTarget:
    • The actual edge function describing the value-transformation
    • Example lca::BinOp:
      l_t computeTarget(l_t Source) const {
        if (LeftConst && RightConst) { // Simple constant-folding
          return executeBinOperation(Op, LeftConst->getSExtValue(), RightConst->getSExtValue());
        }
        if (Source == Bottom{}) { 
          // Bottom is the top-value of our lattice. Whatever we do to it, it will always stay Bottom
          return Source;
        }
      
        // Now, perform the linear arithmetic.
        // First, we have to check, which of the both operands is the literal
        if (RightConst) { 
          // The right operand is the literal, so we plug in the incoming value as left operand
          return executeBinOperation(Op, Source, RightConst->getSExtValue());
        }
        if (LeftConst) { 
          // The left operand is the literal, so we plug in the incoming value as right operand
          return executeBinOperation(Op, LeftConst->getSExtValue(), Source);
        }
      
        llvm::report_fatal_error(
            "Only linear constant propagation can be specified!");
      }
  • compose:
    • Function composition. Although PhASAR provides a pre-defined EdgeFunctionComposer, it is recommend to immediately reduce edge-function composition to ensure computeTarget can be executed in constant time.
    • To ensure semantical correctness, please make sure that compose really has the semantics of function composition, that is: Given two edge functions f and g, for each edge-value x: l_t it must hold: compose(f, g).computeTarget(x) == g.computeTarget(f.computeTarget(x)).
    • Neutral element: EdgeIdentity
    • Annihilator: ConstantEdgeFunction
    • Example lca::BinOp:
      static EdgeFunction<l_t> compose(EdgeFunctionRef<BinOp> This,
                                       const EdgeFunction<l_t> &SecondFunction) {
        // Trivial compositions can be defaulted:
        if (auto Default = defaultComposeOrNull(This, SecondFunction)) {
          return Default;
        }
        
        // Here, we could for example add transformations like:
        // compose(BinOp{Add, ..., Const1}, BinOp{Add, ..., Const2}) --> BinOp{Add, ..., Const1 + Const2}
      
        // Fallback for when we don't know better:
        return LCAEdgeFunctionComposer{This, SecondFunction};
      }
  • join:
    • Lattice join. Merge two edge functions together, going up the lattice (towards AllBottom<l_t>).
    • To ensure semantical correctness, please make sure that going up the lattice on the edge-function domain also means going-up the value-lattice on the computed edge-values, that is: Given two edge functions f and g, for each edge-value x: l_t it must hold (with joinl = JoinLatticeTraits<l_t>::join): joinl(join(f, g).computeTarget(x)), joinl(f.computeTarget(x), g.computeTarget(x))) == join(f, g).computeTarget(x)
    • Neutral element: AllTop
    • Annihilator: AllBottom
    • Example lca::BinOp:
      static EdgeFunction<l_t> join(EdgeFunctionRef<BinOp> This,
                                    const EdgeFunction<l_t> &OtherFunction) {
        // Trivial joins can be defaulted:
        if (auto Default = defaultJoinOrNull(This, OtherFunction)) {
          return Default;
        }
      
        // Here we could, e.g., check whether the two edge functions are semantically equivalent, althouth different and then return one of them
      
        // Sound fallback in case, we don't know better
        return AllBottom<l_t>{};
      }
  • isConstant [optional]:
    • Hint to the solver that this edge function always computes the same value, no matter the input
    • Defaults to unconditionally returning false
    • Example lca::BinOp:
    bool isConstant() const noexcept {
      // If both operands of this binary operation are literal constant, this edge function always computes the same value
      return LeftConst && RightConst;
    }
  • operator==:
    • Compares two edge functions for equality.
    • Not needed if the edge function stores no data, i.e. std::is_empty_v holds for the edge function type
  • operator<< [optional]:
    • Customizes, how the edge function should be printed
    • Defaults to printing the fully-qualified type name

Memory Management

For Memory Management of flow functions please refer to "Writing an IFDS Analysis".

Our edge-function factories use the custom type EdgeFunction as a return type. This type provides a very efficient and low overhead type-erasing mechanism for edge functions. All the builtin edge-function implementations as well as you own edge-function classes are implicitly convertible to EdgeFunction<l_t> as long as they provide implementations for all required API functions (see Edge Functions). The EdgeFunction class then takes care about lifetime and ownership of the respective edge function. If your edge function types are large (> sizeof(void *)) you may want to check out the EdgeFunctionSingletonCache for caching; otherwise we use small-object-optimization.

Clone this wiki locally