Using Linux, a PCI-MXI-2, and a VXI-MXI-2, I am trying to
maximize VXI throughput when writing register sets to VXI boards.
I have to write several different sets of registers for each board,
and the writes have to be timed correctly. So the values for the
boards are precomputed into local shadow register sets, which are dumped
in tight loops when the driving events occur. In particular, I write, per board:
1 set of 16 uint32 registers (contiguous)
1 set of 7 uint32 registers (contiguous)
8 sets of 32 uint32 registers (each set of 32 is contiguous)
With 6 boards, this works out to 1674 words to be transferred, in 60 separate transfers.
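As a quick check on those totals, here is a minimal sketch of the arithmetic in C. The names (set_sizes, words_per_board, and so on) are made up for illustration and are not from my actual code.

```c
#include <assert.h>

/* Counts taken from the list above; all names here are hypothetical. */
enum { NUM_BOARDS = 6, SETS_PER_BOARD = 10 };

/* Size of each contiguous register set, in 32-bit words:
 * one set of 16, one set of 7, eight sets of 32. */
static const unsigned set_sizes[SETS_PER_BOARD] = {
    16, 7, 32, 32, 32, 32, 32, 32, 32, 32
};

/* Words written to a single board in one pass. */
unsigned words_per_board(void)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < SETS_PER_BOARD; ++i) {
        assert(set_sizes[i] > 0);   /* every set carries at least one word */
        sum += set_sizes[i];
    }
    return sum;
}

/* Words written to all boards in one pass. */
unsigned total_words(void)
{
    return NUM_BOARDS * words_per_board();
}

/* Each contiguous set is one transfer. */
unsigned total_transfers(void)
{
    return NUM_BOARDS * SETS_PER_BOARD;
}
```

Called from any main(), words_per_board() gives 279, total_words() gives 1674, and total_transfers() gives 60.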
I'm timing something like this:
start timer
for each board // 6 boards
{
    load the 16-register set from shadow
    load the 7-register set from shadow
    load each of the eight 32-register sets from shadow
}
stop timer
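In C, a timing loop of that shape might look roughly like the sketch below. This is not my real harness: load_fn, noop_load, and time_full_pass_us are hypothetical names, and it assumes a POSIX system where clock_gettime(CLOCK_MONOTONIC, ...) is available. Passing noop_load plays the same role as commenting out the transfer line, so it measures only the loop and call overhead.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime on strict compilers */
#include <time.h>

enum { NUM_BOARDS = 6, SETS_PER_BOARD = 10 };

/* Hypothetical callback: load one register set on one board. */
typedef void (*load_fn)(unsigned board, unsigned set);

/* Counts calls made through noop_load. */
static unsigned noop_calls = 0;

/* Stand-in loader that does no bus access at all, so timing it
 * measures only the loop and function-call overhead. */
void noop_load(unsigned board, unsigned set)
{
    (void)board;
    (void)set;
    ++noop_calls;
}

/* Time one full pass over all boards and sets.
 * Returns elapsed wall-clock time in microseconds. */
double time_full_pass_us(load_fn load)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);              /* start timer */
    for (unsigned b = 0; b < NUM_BOARDS; ++b)
        for (unsigned s = 0; s < SETS_PER_BOARD; ++s)
            load(b, s);
    clock_gettime(CLOCK_MONOTONIC, &t1);              /* stop timer */

    return (double)(t1.tv_sec - t0.tv_sec) * 1e6
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```

A real loader with the same signature would look up the board and the shadow set and then call load_registers().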
Timing these transfers, which consist primarily of repeated calls to the
function below, I measure about 3.3ms when using the VXImove form, and
about 3.1ms when using the direct loop form. If I comment out the
VXImove() line or the assignment line in the direct loop, the timing
plummets to 35us, which is just the loop and call overhead.
This works out to about 1.95us per word, or 2.05MB/s, which seems
pretty horrible to me. How can I do better? In another version of this
system, which does not use a PCI-MXI-2 and a PC as the driver, we
configure a set of chained DMAs as allowed by some of our VME
hardware, which at least allows me to transfer the data while I'm doing
other things. I will go back to that hardware and take the same
timings to see if I'm just stuck with poor MXI throughput (we use
VME-MXI-2 and VXI-MXI-2 there).
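For anyone checking the numbers, the per-word time and the throughput come from simple arithmetic, taking a word as 4 bytes and a megabyte as 10^6 bytes. The helper names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>

/* Microseconds spent per word, given the total elapsed time in
 * milliseconds and the number of words transferred. */
double us_per_word(double elapsed_ms, unsigned words)
{
    assert(words > 0);
    return elapsed_ms * 1000.0 / (double)words;
}

/* Throughput in megabytes per second (10^6 bytes/s).
 * Bytes per microsecond is numerically the same as MB/s. */
double mbytes_per_sec(double us_per_word_value, unsigned bytes_per_word)
{
    assert(us_per_word_value > 0.0);
    return (double)bytes_per_word / us_per_word_value;
}
```

With 1674 words, us_per_word(3.3, 1674) is about 1.97 and us_per_word(3.1, 1674) is about 1.85, so the two forms bracket the rough 1.95us figure; mbytes_per_sec(1.95, 4) is about 2.05.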
Here's the load_registers() code with the two flavors I've been
timing. Note that if I change the dest_flags from 0x0003 to 0x0013 to try
to drive A32 nonpriv block transfers, I get a -1 (bus error) return code
from VXImove.
void load_registers(Board* board, unsigned int* from)
{
    // write the current "message" to the board;
    // the transfer length is from[0] + 1 words
#if 0
    // direct loop form
    // mapping was initialized with MapVXIAddress(board->offset)
    volatile unsigned int* to = board->mapping;
    unsigned int n = from[0] + 1;
    while (n--)
    {
        *to = *from;   // copy one shadow word to the board
        ++from;
        ++to;
    }
#else
    // VXImove form
    NIVXI_STATUS rc;
    UINT16 source_flags = 0;         // from local RAM
    UINT32 source_address = (UINT32)from;
    UINT16 dest_flags = 0x0003;      // A32 nonpriv
    UINT32 dest_address = board->offset;
    UINT32 length = from[0] + 1;
    UINT16 width = 4;
    rc = VXImove(source_flags, source_address,
                 dest_flags, dest_address,
                 length, width);
    if (rc < 0) fprintf(stderr, "VXImove error %d\n", rc);
    assert(rc == 0);
#endif
}
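If anyone wants to play with the direct loop form without the hardware, here is a minimal stand-alone sketch of it, written so the data flows from the shadow to the board. FakeBoard and copy_shadow are hypothetical names, and the mapping here points at ordinary RAM rather than at a window returned by MapVXIAddress(), so it says nothing about real bus timing. It keeps the same length convention as above: from[0] + 1 words are copied, the first word included.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real Board type. In this sketch the
 * mapping points at plain memory, not at a mapped VXI window. */
typedef struct {
    volatile unsigned int *mapping;
} FakeBoard;

/* Copy one shadow register set to the board, word by word.
 * The set is from[0] + 1 words long, counting from[0] itself.
 * Returns the number of words written. */
unsigned int copy_shadow(FakeBoard *board, const unsigned int *from)
{
    assert(board != NULL && board->mapping != NULL && from != NULL);

    volatile unsigned int *to = board->mapping;
    unsigned int n = from[0] + 1;
    unsigned int written = n;

    while (n--)
        *to++ = *from++;   /* shadow word -> board register */

    return written;
}
```

For example, with a shadow of {3, 0xA, 0xB, 0xC} and a four-word array as the fake window, copy_shadow() writes all four words and returns 4.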