When using more than 1 bit data words, you have to use a subset of the
14 address bits that the primitive offers. I was feeding addresses into
the lower N bits of the primitive (e.g. lower 12 bits for a 4-bit memory),
but it turns out you're supposed to use the _upper_ N address bits.
Callers can still specify whacky cross-domain RAMs in the cfg, but the
default is what you usually want: a dual-port RAM with both ports in the
caller's clock/reset domain.