Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Optimization Barriers #4763

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

georgerennie
Copy link
Collaborator

This adds the $barrier cell type which represents an optimization barrier that Yosys should not optimize through. Changes include:

  • Added the cell to the kernel
  • Added support in some of the common backends, treating $barrier as just a buffer. I also added $buf support while I was at it where it didn't exist.
  • Updated opt_merge to ignore $barrier
  • Added -barriers to flatten that will insert barriers to drive all public wires driven by the flattening
  • Added optbarriers command to insert/remove barriers everywhere that public wires are driven. This has the following options:
    • -nocells - don't add barriers for the outputs of cells
    • -noprocs - don't add barriers for the outputs of processes
    • -private - add barriers for private wires as well as public
    • -remove - convert selected barriers back to connections
  • Added -barriers flag to prep that calls optbarriers at the start and passes -barriers to flatten. This should be a drop in replacement for a regular prep call that has the benefit of maintaining all the structure among public wires.

This still needs tests and probably some discussion about what the flags/defaults should be as currently this is very conservative on the optimizations allowed. I wasn't sure exactly what optimizations people would want to opt into but attributes can be added to $barrier to do so.

* This acts as an optimization barrier, limiting the rewriting allowed
* For now this forbids all optimization, but attributes can be added to
  opt in to specific optimizations as required
* This uses $barrier optimization barriers to connect wires into the
  flattened module instead of connections
* This can optionally ignore rewriting the outputs of cells or processes
* This by default rewrites drivers of wires with public names but can
  also optionally rewrite drivers of wires with private names
* A -remove flag allows cleaning up the design by replacing barriers
  with connections
* This uses optbarriers and flatten -barriers to insert barriers on
  public wires in the design, limiting the optimization that can affect
  them
@whitequark
Copy link
Member

Is there a background discussion on this cell somewhere? I'm interested in it as it's similar to a $freeze cell (named by analogy to the proposed LLVM instruction) that I've suggested about two years ago.

@georgerennie
Copy link
Collaborator Author

georgerennie commented Nov 20, 2024

I brought it up at the Dev JF on monday but we didn't write down too much about the conclusions. The gist of it is that there are some applications for Yosys where it is more important to retain some amount of the structural information rather than just the semantic. This has caused issues like #3426 in the past and has given me similar issues recently where I want to structurally rewrite the drivers for some nodes which causes semantic changes to the design for formal verification. The alternative approach to something like optimization barriers is to not do the optimizations in the first place, but in my experience there tends to be lots of logic involved with private wires that cause a blowup in design size and can be cleaned away straightforwardly, the structure with respect to public wires is usually all that matters.

Am open to any suggestions/critiques about how to go about this, this is just a draft for one way it can be done.

@georgerennie
Copy link
Collaborator Author

I think I may need a bit more care with how procs get rewritten, there are some unhelpful interactions like the below

read_verilog <<EOT
module top(input wire clk, input wire rst, input wire [7:0] a, output reg [7:0] b);

always @(posedge clk or posedge rst) begin
	if (rst)
		;
	else
		b <= a;
end

endmodule
EOT

optbarriers
proc
clean
dump

without optbarriers this generated a $dff as expected, with it it generates an $aldff. This is because proc_arst rewrites this to

if (rst)
    barrier_in <= barrier_out;
else
    barrier_in <= a;

proc_dff can normally see that barrier_in and barrier_out are the same wire and so optimizes that case away into b <= rst ? b : a, but with the barrier it can't see that the two are the same.

I think the right solution to this is that as well as rewriting signals to come from a barrier when they appear on the lhs of assignments in processes, any times a signal appears on the rhs of an assignment in the same process in which it is assigned in an lhs we should rewrite the rhs to use the pre-barrier version of the signal. Maybe this should only be done for the default assignment at the top of the process though? I haven't quite worked this out yet (putting it here so I remember what the issue was when I come back to this).

@jix
Copy link
Member

jix commented Nov 23, 2024

Is there a background discussion on this cell somewhere? I'm interested in it as it's similar to a $freeze cell (named by analogy to the proposed LLVM instruction) that I've suggested about two years ago.

Since variants of this were brought up independently and for different use cases, I made an attempt now to summarize the motivating use cases that I'm aware of and how it would apply there:

  • Controlling 'x propagation during synthesis
    • Configured as a barrier that is transparent for values that are statically known to be always fully defined and opaque otherwise
    • Ensures that treating an 'x input-bit as a don't care and replacing it with an arbitrary value results in a consistent choice for all users as the barrier will inhibit any x-propagation or other optimization peeking across it until such a choice is fixed
    • @whitequark suggested this 5 years ago already: class Memory: add reset_less parameter that will not initialize memories with 0 value in generated RTL. amaranth-lang/amaranth#270 (comment) (2nd to last paragraph of the comment)
    • This could be done in such a way that a single potentially-'x bit makes the full barrier opaque or it could be treated as a bitwise barrier
    • It's not clear to me whether and how this would be generated automatically / implicitly, but I think the answer to that question wouldn't affect the requirements for the cell itself
      • The 1364.1 standard for verilog RTL synthesis (that was never updated for SV) restricts the use of x to instances where it is directly assigned to a signal. It also states "The value x shall not be used with any operators or mixed with other expressions.". It also separately states that for synthesis making an x assignment will result in that signal being treated as a don't care. I'd say 1364.1 doesn't clearly specify the intended overall behavior and is open to interpretation, but my interpretation would be that the idea was to allow direct x assignments to signals for don't cares but require that they are resolved at the point of assignment as that's the only way to ensure that x values do not end up being "used with any operators"
      • In practice people expect to be able to use x without such restriction, and, since the restriction is what motivated the interpretation that x has to be resolved at the point of assignment, I have no idea whether treating assignments as resolving don't cares is still reasonable
    • In this configuration it is the same idea as LLVM's freeze
      • That goes back to the 2017 paper "Taming Undefined Behavior in LLVM"
      • The language reference for LLVM's freeze instruction doesn't mention partially undef values, but my understanding is that it acts bitwise on partially undef values and there being no such thing as a partially poison scalar value (but I only have superficial knowledge of LLVM)
  • Provide consistent 'x-as-unconstrained behavior during 2-valued formal verification
    • The default behavior for FV is to use 2-valued logic with no x-propagation where any 'x value is treated as an unconstrained value (converted to $anyseq etc. during the FV preparation flow)
    • Currently the FV flow depends on prep calling opt -keepdc to prevent selecting arbitrary concrete values for don't care bits before the part of the flow that's run by SBY has a chance to convert x constants to $anyseq
      • This is quite fragile, I discovered multiple places in opt passes that were missing an explicit check for keepdc and I'm not super confident there aren't any left that are just harder to hit
      • It's much easier to miss a check for a -keepdc option by accident (especially if it was added as an afterthought) than accidentally traversing a barrier.
      • Operators that can produce 'x bits from fully defined values (division, etc) are not consistently treated as producing unconstrained values in that instance
      • It doesn't prevent cases where the frontend already does x-propagation
      • It also doesn't prevent duplicating x values as that happens with assign a = 'x; assign b = a and running clean already
    • The idea to properly fix this would be to
      • Insert barriers around any x constant as early as possible (would need frontend changes), configured for the same freeze-like behavior as in the previous synthesis use case
      • For cells that can produce x values from fully defined inputs add the same kind of barrier behind the output and eventually insert a mux to $anyseq for input values where the output is undefined. The barrier makes it safe to run other optimizations in between that might even get rid of the cell
      • Since elaboration of parameters requires evaluating potentially x containing values, I'd argue that the frontend should still do x-propagation for operators where all operands are constants. Importantly this does not include simplifying a + 'x to 'x when a is a signal even when there is an assign a = 'x nearby.
        • This would ensure consistent behavior when moving subexpressions between a signal vs a constant context (as an expression referring to a signal cannot be used in a constant context).
  • Preserving the module structure in a flattened design
    • For public signals we can already preserve the structure using hdlname and $scopeinfo, but for synthesis flow we might not want to or even be able to preserve all internal public signals
    • Flatten would insert barriers in place of flattened module ports
    • Different use cases for this might use different barrier configurations, depending on the amount of cross-module optimization that is wanted
      • Preventing all optimizations can be useful if the goal is to export to some backend that doesn't support submodules (e.g. BTOR2) while still being able to attribute every primitive to its containing module
      • Even when the module hierarchy is to be fully preserved in the sense that every post-synthesis cell being attributable to its source module, it would still be reasonable to allow removing unused logic, constant propagation or even optimizations that rewrite logic locally but make use of invariants derived from the input and/or output cone crossing a module boundary
      • More extreme but still precisely preserving the boundary would be to allow optimizations that re-encode signals crossing a boundary, similar to how FSM reencoding works
  • (Selectively) preserving the connectivity structure within a module
    • Manually with a (* dont_touch *) attribute that's stronger than (* keep *)
      • Whereas (* keep *) ensure that logic equivalent to the signal is kept, (* dont_touch *) ensures that all uses of the signal will use the kept logic
      • This requires making the signal fully opaque for all uses, i.e. corresponds to inserting a full optimization barrier
    • Automatically for analysis and/or verification methods that need to preserve the connectivity structure of the input design
      • Reasons for this that I can think of are methods that
        • report results back in terms of paths in the design or check a property specified in terms of that
        • derive useful heuristics from the connectivity structure of the design
        • perform some analysis using per-primitive approximations where performing logically valid optimizations that change the connectivity structure would still affect the quality of the results in ways that are hard to understand and/or control for the user
      • The appropriate kind of barrier to use might vary between different analyses
      • I believe @georgerennie's use case would also fit into this category

@whitequark
Copy link
Member

Thanks for such a thorough analysis, @jix!

@mattyoung101
Copy link

I will also chime in and add that, if I'm understanding this pull correctly, this would be an incredibly useful feature for my project. I'm working on a triple modular redundancy plugin, and one major unresolved problem is passes like opt_share identifying the redundant logic as redundant (which it technically is!) and optimising it out.

If I could instead annotate cells with $barrier and have Yosys avoid optimising them, this would be a major help and mean I can move the TMR pass much earlier in the design pipeline and have better control of granularity. Currently, it has to run very late, after all the opt passes have completed.

@georgerennie
Copy link
Collaborator Author

georgerennie commented Dec 2, 2024

Have fixed the process issue but it adds a reasonable bit more complexity. The idea is that a path through a set of assignments in a process from a variable to itself should see the pre-barrier version of itself, as this is needed in proc_dff to catch where an async rule doesn't change an FFs value. We do this by finding such cyclic paths through the assignment graph and duplicating them out to have a copy that is identical except its leaves see the pre-barrier version of the variable.

I have an example on a real design that matches the pattern I commented earlier in the thread where this more careful rewriting is needed to prevent an $aldff being inferred that fails check due to a comb path from the output back to the aload input. This is also helped by #4781, but obviously is primarily relevant for the verilog frontend.

On a reasonable sized design running proc; opt_clean I get the following stats running optbarriers first or not (only showing the cell counts that differ and flops:

   Number of wires:              76444
   Number of wire bits:         679737
   Number of public wires:       23528
   Number of public wire bits:  308522
   Number of ports:              14944
   Number of port bits:         172522
   Number of memories:               5
   Number of memory bits:          880
   Number of processes:              0
   Number of cells:              64408
     $adff                        1746
     $and                         2956
     $barrier                    16413
     $dff                            8
     $dlatch                       256
     $eq                          9056
     $logic_not                   2337
     $mux                        20295
     $not                         2788
     $or                          1928
     $pmux                        2431
   Number of wires:              62813
   Number of wire bits:         495929
   Number of public wires:       24379
   Number of public wire bits:  314651
   Number of ports:              14944
   Number of port bits:         172522
   Number of memories:               5
   Number of memory bits:          704
   Number of processes:              0
   Number of cells:              47385
     $adff                        1746
     $and                         2836
     $dff                            8
     $dlatch                       256
     $eq                          9044
     $logic_not                   2303
     $mux                        19912
     $not                         2736
     $or                          1920
     $pmux                        2430

The flop/latch inferences are the same (which is not true without the more complex process rewriting) and the rest of the numbers seem to be in the reasonable ranges for what id expect, although I haven't done a proper equivalence check or anything yet and this could probably do with tests. Not sure when I will get much time to add tests at the moment. Other than that I think the logic should be done so I will mark as ready for review for now.

@georgerennie georgerennie marked this pull request as ready for review December 2, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants