My experiences from the past.
In 1985 we built a digital I/O interface around a 3 MHz Intel 8080. It used a GPIB interface, and we were able to run 300 loops per second reading 32 analog inputs, 18 analog outputs, 8 digital inputs and 8 digital outputs. The results were transferred over the GPIB interface every 10 ms.
Now, almost 20 years later, using a 200 MHz 16-bit processor, it is still hard to do the same.
Where is the bottleneck?
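One way to find out is to time the acquisition step and the transfer step separately instead of only looking at the overall loop rate. Below is a minimal C++ timing sketch under that idea; read_io() and transfer_results() are hypothetical placeholders for the actual driver calls, not any real NI API, and the loop counts simply mirror the 300 Hz / 10 ms figures above.

```cpp
// Minimal sketch (not the original benchmark): measure how much time the
// loop body and the periodic transfer consume per iteration.
// read_io() and transfer_results() are hypothetical stand-ins for the
// real acquisition and GPIB/Ethernet transfer calls.
#include <chrono>
#include <cstdio>

static void read_io()          { /* read 32 AI, 18 AO, 8 DI, 8 DO here */ }
static void transfer_results() { /* send the result block here */ }

int main() {
    using Clock = std::chrono::steady_clock;
    constexpr int loops = 3000;          // ~10 s of run time at the old 300 Hz rate

    auto t0 = Clock::now();
    for (int i = 0; i < loops; ++i) {
        read_io();
        if (i % 3 == 0)                  // transfer every ~10 ms at 300 loops/s
            transfer_results();
    }
    auto t1 = Clock::now();

    double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    std::printf("%.1f us per loop (%.0f loops/s)\n",
                us / loops, loops / (us * 1e-6));
    return 0;
}
```

Running the same measurement once with the transfer call commented out usually makes it obvious whether the time is lost in the I/O reads or in getting the data off the board.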
I have programmed a real-time simulator for power plants in C++, and I translated that RT sim to NI components and software (LabVIEW).
My RT PXI turbine simulator for simulating grid incidents was successfully used in a nuclear plant in 2006.
Look at http://sine.ni.com/cs/app/doc/p/id/cs-755