The internal EBR array must stall new operations while a read result is
pending and hasn't been retired yet. All that clever stuff with
non-blocking DelayLines? Yeah, spoiler alert: there's a reason guarded
FIFOs are the preferred API for this stuff. Play unsafe games, win
unsafe prizes.
Uses the serial debug module and currently only works with
hardware/ulx3s, probably only on my specific machine where the
USB serial port is mapped _just so_. But it does work. The code is very
WIP and unclean, but I'm checkpointing because it can hex view and hex
edit correctly.
When using data words wider than 1 bit, you have to use a subset of the
14 address bits that the primitive offers. I was feeding addresses into
the lower N bits of the primitive (e.g. the lower 12 bits for a 4-bit
memory), but it turns out you're supposed to use the _upper_ N address bits.
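A minimal sketch of that address mapping, in Python purely for illustration. It assumes the primitive has 14 address bits and that a memory with N word-address bits drives the upper N of them with the unused low bits held at zero. The function name and the zero-fill are assumptions, not taken from the actual code.

```python
PRIMITIVE_ADDR_BITS = 14  # address width offered by the primitive


def primitive_address(word_addr: int, word_addr_bits: int) -> int:
    """Place an N-bit word address in the upper N bits of the 14-bit bus."""
    if not 0 < word_addr_bits <= PRIMITIVE_ADDR_BITS:
        raise ValueError("word_addr_bits must be in 1..14")
    if not 0 <= word_addr < (1 << word_addr_bits):
        raise ValueError("word_addr out of range for the given width")
    # Assumption: the unused low bits are simply tied to zero.
    unused_low_bits = PRIMITIVE_ADDR_BITS - word_addr_bits
    return word_addr << unused_low_bits
```

For the 4-bit memory mentioned above (12 address bits), word address 1 would land on primitive address 4 rather than 1.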
Added as part of debugging why my writes seemed to get mirrored across a
stripe of 4 bytes. This test verifies that writing to two contiguous
addresses reads back the correct values when run against the simulated
ECP5 EBR model.
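A rough Python illustration of what such a test catches. This is a toy model, not the simulated ECP5 EBR model, and it assumes a 4-bit-wide memory in which only the upper 12 of the 14 address bits select a word. The class and function names are hypothetical.

```python
class ToyEbr:
    """Toy 4-bit memory: only the upper 12 of 14 address bits select a word."""

    def __init__(self) -> None:
        self.words = [0] * 4096

    def write(self, prim_addr: int, value: int) -> None:
        self.words[(prim_addr & 0x3FFF) >> 2] = value & 0xF

    def read(self, prim_addr: int) -> int:
        return self.words[(prim_addr & 0x3FFF) >> 2]


def contiguous_writes_ok(to_prim_addr) -> bool:
    """Write two adjacent addresses, then check both read back intact."""
    ram = ToyEbr()
    ram.write(to_prim_addr(0), 0xA)
    ram.write(to_prim_addr(1), 0x5)
    return ram.read(to_prim_addr(0)) == 0xA and ram.read(to_prim_addr(1)) == 0x5


# Address wired into the lower 12 bits: the second write clobbers the first.
print(contiguous_writes_ok(lambda a: a))       # False
# Address wired into the upper 12 bits: both values survive.
print(contiguous_writes_ok(lambda a: a << 2))  # True
```

In this toy model, wiring the address into the low bits makes addresses 0 through 3 alias to the same word, which is the kind of mirroring two contiguous writes are enough to expose.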
Bluespec uses active-low reset signals, whereas the ECP5 primitives
use active-high. So this was holding the EBRs in reset after the rest
of the design was running. Oops.
With this you can feed a stream of bytes in and get multi-byte structs
out, or vice versa. Handy for hooking up stuff like debuggers to
narrower serial busses.
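A small Python sketch of the idea, for illustration only. It models the conversion in software rather than hardware, and it assumes the first byte on the wire is the most significant one. The byte order and the function names are assumptions, not taken from the actual module.

```python
from typing import Iterable, Iterator


def bytes_to_words(stream: Iterable[int], word_bytes: int) -> Iterator[int]:
    """Group a byte stream into word_bytes-wide values, first byte most significant."""
    acc, count = 0, 0
    for b in stream:
        acc = (acc << 8) | (b & 0xFF)
        count += 1
        if count == word_bytes:
            yield acc
            acc, count = 0, 0
    # A trailing partial word is never emitted; it would wait for more bytes.


def words_to_bytes(words: Iterable[int], word_bytes: int) -> Iterator[int]:
    """Split each word back into bytes, most significant byte first."""
    for w in words:
        for shift in range(8 * (word_bytes - 1), -1, -8):
            yield (w >> shift) & 0xFF
```

Feeding `[0x12, 0x34, 0x56, 0x78]` through `bytes_to_words(..., 2)` yields `0x1234` and `0x5678`, and `words_to_bytes` reverses it.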
In practice the flow control is unusable on ULX3S dev boards because
the CTS line isn't hooked up (it's instead wired to JTAG_TDO, so that
the USB<>UART chip can double as a bitbanged JTAG programmer).
Still, support for flow control is nice to have for the future, and the
UART itself works regardless of flow control.
It doesn't matter hugely, but by default nextpnr targets 12MHz,
which doesn't force it to work too hard on the placement. By requesting
100MHz, it needs to try a bit harder on timing and gives results that
are a bit closer to the fully constrained outcomes.
The numeric types vs numeric value thing sucks, but there's a mild
workaround where you just recurse through numeric types until you
find one that matches the value you wanted. It's icky, but it ensures
registers are exactly the correct width instead of relying on later
synthesis to find and execute the width reduction.
I was mostly using a separate interface to be able to mark the methods
always_enabled and always_ready, but you can attach those annotations
to the module constructor instead.
VRAM sizes are powers of two, so if memory wiring is wrong and we end up
with RAM blocks mirrored at several points in the address space, we
want a write pattern that doesn't repeat cleanly on power-of-two
boundaries. That way, a mirrored memory block cannot contain values that
are valid for all of its locations.
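One way to get such a pattern, sketched in Python for illustration: write the address modulo an odd prime. The modulus 251 and the helper names are assumptions, not the pattern the actual test uses. Because no power of two is a multiple of 251, two addresses that differ by a power of two never expect the same value.

```python
PATTERN_MODULUS = 251  # odd prime below 256, so every value fits in a byte


def pattern(addr: int) -> int:
    """Expected value at addr; its period is not a power of two."""
    return addr % PATTERN_MODULUS


def find_mismatches(read, size: int) -> list:
    """Return the addresses whose readback differs from the expected pattern."""
    return [a for a in range(size) if read(a) != pattern(a)]


def make_mirrored_ram(size: int, physical_size: int):
    """Toy RAM whose address space wraps every physical_size locations."""
    cells = [0] * physical_size

    def write(addr: int, value: int) -> None:
        cells[addr % physical_size] = value

    def read(addr: int) -> int:
        return cells[addr % physical_size]

    for a in range(size):
        write(a, pattern(a))
    return read


# A 1024-entry space backed by one 256-entry block mirrored four times:
bad_read = make_mirrored_ram(1024, 256)
print(len(find_mismatches(bad_read, 1024)))  # 768
```

Only the last mirror written reads back correctly; the other three copies fail. With a power-of-two period, every copy could pass and hide the bug.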
To implement the mux tree that feeds into RAM ports, we need to know the
port index of the grantee to be able to wire it up. In theory we could
dispense with the per-port grant signal, but keeping it around allows
each client to deal with local concerns separate from the port routing.
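A hedged Python sketch of an arbiter that returns both pieces of information. The fixed-priority policy and every name here are assumptions made for illustration, since the text above doesn't say how the real arbiter picks a winner.

```python
from typing import List, Optional, Tuple


def arbitrate(requests: List[bool]) -> Tuple[List[bool], Optional[int]]:
    """Return (per-client grant flags, index of the grantee or None)."""
    # Assumption: fixed priority, lowest-numbered requester wins.
    for i, req in enumerate(requests):
        if req:
            grants = [j == i for j in range(len(requests))]
            return grants, i
    return [False] * len(requests), None


def ram_port_input(requests: List[bool], addresses: List[int]):
    """Mux the grantee's address onto the RAM port using the grant index."""
    grants, idx = arbitrate(requests)
    selected = None if idx is None else addresses[idx]
    return grants, selected
```

The index drives the mux, while each client only looks at its own grant flag to decide whether its request went through.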
Rather than hardcode one architecture for GARY, the arbiters
are now split and can be allocated per-port. The arbiter interface
includes plumbing so that one arbiter can propagate a write conflict
to another, so it can still implement multi-port arbitration as long
as every client is statically allocated to one port.