RFC: Emit sync-read mems intact, with readwrite ports if applicable #2092
I think this is a great idea, and I am generally in favor of emitting sync-read mems directly by default (unless there are significant behavioral differences, but I don't think there are, right?). Could you add an example to your PR that shows how the Verilog changes?
Here is a basic example of a 1R1W sync-read memory. The details around disabled behavior are easy to change, and there isn't really a single answer for the right level of conservative behavior. Though I consider that and random initialization to be details subject to change once it's decided whether native sync-read mem emission is a good idea, these aspects do tend to have a heavy influence on the visual difference in the code.
// Current Verilog:
module syncreadmem(
  input clk,
  input [5:0] addr,
  input we,
  input re,
  input [3:0] wdata,
  output [3:0] rdata
);
  reg [3:0] m [0:63];
  wire [3:0] m_r_data;
  wire [5:0] m_r_addr;
  wire [3:0] m_w_data;
  wire [5:0] m_w_addr;
  wire m_w_mask;
  wire m_w_en;
  reg [5:0] m_r_addr_pipe_0;
  assign m_r_addr = m_r_addr_pipe_0;
  assign m_r_data = m[m_r_addr];
  assign m_w_data = wdata;
  assign m_w_addr = addr;
  assign m_w_mask = we;
  assign m_w_en = we;
  assign rdata = m_r_data;
  always @(posedge clk) begin
    if(m_w_en & m_w_mask) begin
      m[m_w_addr] <= m_w_data;
    end
    m_r_addr_pipe_0 <= addr;
  end
endmodule

// New Verilog:
module syncreadmem(
  input clk,
  input [5:0] addr,
  input re,
  input we,
  input [3:0] wdata,
  output [3:0] rdata
);
  reg [3:0] m [0:63];
  wire m_r_en = re;
  // [Added comment: note that this is now a reg]
  reg [5:0] m_r_addr;
  wire [3:0] m_r_data;
  wire [3:0] m_w_data = wdata;
  wire [5:0] m_w_addr = addr;
  wire m_w_mask = we;
  wire m_w_en = we;
  assign m_r_data = m[m_r_addr];
  assign rdata = m_r_data;
  always @(posedge clk) begin
    if(m_r_en) begin
      m_r_addr <= addr;
    end
    if(m_w_en & m_w_mask) begin
      m[m_w_addr] <= m_w_data;
    end
  end
endmodule
I just ran the example through For the old style emission I get:
For the new emission:
New emission with
So for yosys's ice40 flow, the new emission style does not seem to make a difference. Sorry if these questions are naive, but I don't have a lot of experience with BRAM inference and I am trying to figure out in which situations the old emission style was sub-optimal.
That makes complete sense; I should have posted an example that more clearly motivates the problem. For reference, this is the solution to the issue presented in chipsalliance/chisel#1788. I will add a two-RW example in this thread. Unfortunately, I don't think it will be possible to highlight this issue with Yosys, since iCE40 Embedded Block RAMs (EBRs) support either 1R1W or single-RW port configurations, so splitting an RW port is not as meaningful a barrier to efficient synthesis in an iCE40-based flow as it would be in a modern Xilinx-based flow.
As an aside, there are many competing arguments for many different variants of "what to do with disabled ports." The "disable some synchronous element" variant might cost some small amount of logic, but it can often save power. It's completely possible to make either approach behave nearly any way; the current disabled-port behavior is more-or-less arbitrarily dictated by VerilogMemDelays and does not necessarily represent an optimum. There are three primary concerns: synthesis mapping, power, and verification pessimism. There are also three approaches to balancing these: compromise-striking emission patterns, specialized backends, and conditional compilation. In general, I would contend that retaining memories intact to the emitter (the real effect of this PR) gives us more semantic information to strike a better balance, but the potential for increased Scala implementation complexity creates risk.
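To make the readwrite-port point concrete, here is a minimal hand-written sketch of a sync-read memory with a single RW port kept intact, in the same style as the 1R1W example above. It is not output from this PR: the module name `rwmem` and the port names `en` and `wmode` are assumptions chosen for illustration, and the disabled-port behavior shown (hold the read address when not reading) is just one of the variants discussed above.

```verilog
// Hypothetical sketch, not emitted by the PR: one readwrite port kept intact.
module rwmem(
  input        clk,
  input  [5:0] addr,   // single address bus shared by reads and writes
  input        en,     // port enable
  input        wmode,  // 1 = write, 0 = read
  input  [3:0] wdata,
  output [3:0] rdata
);
  reg [3:0] m [0:63];
  reg [5:0] m_rw_addr; // registered read address, as in the 1R1W example

  assign rdata = m[m_rw_addr];

  always @(posedge clk) begin
    if (en & ~wmode) begin
      m_rw_addr <= addr; // read: capture the address only when enabled
    end
    if (en & wmode) begin
      m[addr] <= wdata;  // write: uses the same address bus
    end
  end
endmodule
```

Because reads and writes share one address and are mutually exclusive, a synthesis tool has the information it needs to consider a single-port block RAM. If the port were split into separate read and write ports before emission, that relationship would no longer be visible in the Verilog.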
Funny enough, I initially wrote the formal backend without using What was the initial reason for splitting read/write ports into two? Could we change The other big problem I know about wrt. memory emission is that we split up non-ground-type memories. This prevents synthesis tools from inferring write masks properly. |
I don't know what the original justification was, but I assume VerilogMemDelays owes its existence to Scala implementation simplicity. FIRRTL supports a very broad range of memory behaviors, and handling them all in VerilogMemDelays is simpler than emitting them.
Changing VerilogMemDelays to not do that is the core feature added by this PR. To elaborate: keeping RW ports intact prevents lowering to combinational-read memories, so it necessitates adding support for both sync-read-mem and RW-port emission to the emitter. This PR adds that support and modifies VerilogMemDelays to be less invasive in its modification of sync-read memories. The space of what memories to apply this to and what flags to export to the user is a downstream problem to the technical changes to the emitter to enable this at all.
The bigger problem is not just not inferring write masks properly, but the fact that it often prevents memories where the mask is not meaningfully used at all from being correctly inferred as a single memory. There currently exists a |
High level: this makes sense. The memory lowering in FIRRTL (without An immediate related concern is #856 ( Longer term, BRAM compatibility with a specific target vendor would be ideal, but may be tedious in practice. (I don't know about what differences, if any, exist between Xilinx, Altera, or Yosys tooling with respect to BRAM inference, but I expect there are things which are not compatible and I expect write masks are weird. Some Googling got me to: https://danstrother.com/2010/09/11/inferring-rams-in-fpgas/.) This is exacerbated by the fact that the Verilog Emitter is not particularly flexible... Towards a more flexible Verilog Emitter... I've been doing a lot of work on Thanks for hacking on this @albert-magyar. I'll add this to the agenda for Monday's dev meeting to get some broader feedback on this RFC. |
I don't believe that
I think this has mostly been a symptom of people being afraid to touch the Verilog emitter.
I believe that the C++ FIRRTL compiler (CFC? analogous to SFC?) will eventually replace the more researchy Scala implementation. Until then, however, there seem to be at least a bunch of deployment issues to be solved (like how do you ship a shared object with a Scala library?), so I think it makes sense to continue adding features to the SFC.
@ekiwi wrote:
The problem I see is that any non-trivial Verilog generation requires an actual Verilog IR to lower to. The current implementation doing, roughly, Adding Verilog IR or pushing more towards custom IR nodes is tractable (like what @ekiwi wrote:
Yeah, I don't know what to call this. I've been going with "MLIR FIRRTL Compiler" (MFC). No reason to stop dev here. I do expect that eventually Verilog emission hacking will become far easier with the MFC, though. (And I do have a published prototype that will let you do mixed lowering with the SFC to a point and then let the MFC take over, assuming that you have the MFC installed on your path: https://github.com/sifive/chisel-circt.) |
This is a much-appreciated initiative! Besides Vivado and yosys, we should also check Quartus. I can help with that. From my experience with FPGA memories, all three read-during-write behaviors occur: old data (seldom), new data (sometimes implemented by the synthesis tool with forwarding), and undefined (probably an arbitrary mix of old and new bits). AFAIK undefined is common, but it cannot really be expressed in Verilog or VHDL. In most cases, I add explicit forwarding to get the newly written data.
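For readers unfamiliar with the explicit forwarding mentioned above, a minimal sketch of the idea follows. It is an illustration written for this thread, not code from the PR or from any vendor template; the module and signal names (`fwdmem`, `collide_q`, `wdata_q`, `rdata_mem`) are made up. The memory itself is read synchronously, and a small bypass register supplies the new data whenever the previous cycle wrote the address that was being read.

```verilog
// Hypothetical sketch of explicit read-during-write forwarding ("new data").
module fwdmem(
  input        clk,
  input  [5:0] raddr,
  input  [5:0] waddr,
  input        we,
  input  [3:0] wdata,
  output [3:0] rdata
);
  reg [3:0] m [0:63];
  reg [3:0] rdata_mem; // raw synchronous read data; stale on a collision
  reg [3:0] wdata_q;   // data written in the previous cycle
  reg       collide_q; // previous cycle wrote the address being read

  always @(posedge clk) begin
    if (we) begin
      m[waddr] <= wdata;
    end
    rdata_mem <= m[raddr];
    wdata_q   <= wdata;
    collide_q <= we && (raddr == waddr);
  end

  // On a collision, bypass the memory output with the newly written data.
  assign rdata = collide_q ? wdata_q : rdata_mem;
endmodule
```

With the bypass in place, the visible result no longer depends on what the underlying RAM returns during a collision, so the design should behave the same whether the target memory gives old, new, or undefined data.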
@albert-magyar Could you rebase this on the latest master branch please? I fear that since I added a dependency on |
Superseded by #2111 |
The default handling of sync-read memories by the FIRRTL compiler has been mentioned in several issues, including recently in chipsalliance/chisel#1788. While --repl-seq-mem can be used along with Verilog memory-generation flows, the emission of native FIRRTL memories has been a point of surprise and an FPGA QoR hurdle for new Chisel/FIRRTL users.
I think arguments can be made either way about adding this feature, so I'd like to hear what anyone thinks about:
--repl-seq-mem
).
If there's strong sentiment either way, it's easy to tweak this for different disabled/address-out-of-range behavior.
Type of improvement: Verilog emission
API impact: No public API changes, but passes downstream of VerilogMemDelays may now see sync-read mems.
Backend code-generation impact: most sync-read mems will now (potentially optionally) be emitted directly from FIRRTL to Verilog without splitting readwrite ports or adding pipelining registers.