I don't have a solution for the differing fixed-point types (but why/how do you have these? You said the two modules are the same, no?), but you don't have to acquire all channels: you can (I think) use array subsets.
The references can't be dynamic, though, so you can't, for example, specify them from the host PC (again, AFAIK). They must be specified as block diagram constants (you also can't select between two I/O references, which is annoying but, I'm told, a necessary restriction; see https://forums.ni.com/t5/LabVIEW-FPGA-Idea-Exchange/Selecting-between-two-I-O-refnums-not-allowed/id...).
If you have a For loop in a SCTL, the loop will be unrolled and all iterations will happen at once, within the same clock cycle. If the loop can't be unrolled, the compilation will fail. You don't need to add the P node.
Outside of a SCTL, I think it behaves like a normal For loop on e.g. Windows: the iterations run one after another.
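To make the difference concrete, here is a rough toy model in Python (not LabVIEW, and not how the compiler actually works). The function names and the "one tick per iteration" cost are assumptions for illustration only; real loops outside a SCTL will usually take more than one tick per iteration.

```python
# Toy model only: names and tick counts are illustrative assumptions,
# not LabVIEW FPGA APIs or real timing figures.

def for_loop_outside_sctl(samples):
    """Iterations run one after another; each is assumed to cost one tick."""
    total, ticks = 0, 0
    for s in samples:
        total += s
        ticks += 1
    return total, ticks


def for_loop_inside_sctl(samples):
    """Loop is unrolled: each iteration's logic sits side by side in
    hardware and the whole result is produced in a single tick."""
    s0, s1, s2, s3 = samples  # size assumed fixed and known up front
    total = s0 + s1 + s2 + s3
    return total, 1


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(for_loop_outside_sctl(data))  # (10, 4)
    print(for_loop_inside_sctl(data))   # (10, 1)
```

Both versions give the same result; the only thing that changes in this model is how many ticks it takes to get there.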
If I enable "Iteration Parallelism" on a For loop (i.e. add the P node), I get an error: "For Loop with iteration parallelism is not supported on this target." (That's on a cRIO-9045; presumably it's general?)
This is true both inside and outside of a SCTL (so you can never have the P node).