Assuming you are talking about the PCI bus, that is about the performance I would expect. It will vary a bit from system to system depending on the hardware architecture, the number of PCI bridges, and the latency of your device. Even when you use the Move functions, the PCI bus is still reading or writing each byte/word one at a time, which can be very time consuming. I ran into a similar problem a while back. Thinking the overhead of NI-VISA was the problem, I spent a few weeks creating my own DLLs that made direct assembly calls. The performance increase was minimal to nothing.

To get the throughput you expect, you will need to use DMA transfers. This is very device specific (if your device supports DMA bus mastering at all). You can use the VISA functions to configure your device for DMA and then use VISA to check on its progress. This is much faster, but it is also much more difficult, and it requires that you know how to configure your device to use DMA.
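To put rough numbers on why word-at-a-time access is so much slower than a DMA burst, here is a back-of-the-envelope sketch in Python. It assumes a conventional 32-bit, 33 MHz PCI bus, and the per-access latency you pass in is a figure you would have to measure on your own system; the function names are made up for illustration and none of this comes from NI-VISA.

```python
# Rough model only: single-word accesses vs. a DMA burst on PCI.
# Assumes conventional 32-bit / 33 MHz PCI; your bus may differ.

PCI_CLOCK_HZ = 33_000_000   # assumed bus clock
BUS_WIDTH_BYTES = 4         # assumed 32-bit data path


def pio_throughput(bytes_per_access, access_latency_s):
    """Bytes/second when every word is its own bus transaction.

    access_latency_s is the full round-trip time of one access
    (arbitration, bridges, device response); measure it, don't guess.
    """
    return bytes_per_access / access_latency_s


def burst_throughput(efficiency=1.0):
    """Bytes/second for back-to-back data phases (the theoretical peak
    at efficiency=1.0; real bursts reach only a fraction of this)."""
    return PCI_CLOCK_HZ * BUS_WIDTH_BYTES * efficiency


def transfer_time(total_bytes, throughput_bytes_per_s):
    """Seconds needed to move total_bytes at the given throughput."""
    return total_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    one_mib = 1024 * 1024
    # Hypothetical 1 microsecond per 32-bit access, purely as an example.
    pio = pio_throughput(BUS_WIDTH_BYTES, 1e-6)
    dma = burst_throughput(efficiency=0.5)  # assumed 50% bus efficiency
    print(f"PIO: {pio / 1e6:.1f} MB/s, {transfer_time(one_mib, pio):.3f} s per MiB")
    print(f"DMA: {dma / 1e6:.1f} MB/s, {transfer_time(one_mib, dma):.3f} s per MiB")
```

Plug in your own measured access time in place of the hypothetical 1 microsecond. The point is only that the per-access latency, not the software layer, sets the ceiling for non-DMA transfers.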
I hope this helps and good luck.
-Josh