I have to process operations on a 64-element array of unsigned 32-bit integers (U32) in as few clock cycles as possible.
These operations typically increment or decrement each element, depending on a given condition.
I currently use a For Loop with auto-indexing on the array. The logic inside the For Loop is placed in an SCTL (single-cycle Timed Loop). The design works correctly in simulation, and I assumed that once compiled, all 64 operations would execute in parallel.
But incrementing the whole array takes about 200 ticks! I suppose this is roughly 64 × 3 = 192 ticks, which means all operations are executed sequentially.
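A rough software model of what the auto-indexed For Loop does may make the numbers clearer: one element is handled per iteration, so the tick count grows linearly with the array length. This is a Python sketch, not LabVIEW FPGA code. The 3 ticks per iteration is the estimate from the question, and `should_increment` is a hypothetical placeholder because the actual condition is not given.

```python
MASK_U32 = 0xFFFFFFFF     # U32 arithmetic wraps modulo 2**32
TICKS_PER_ITERATION = 3   # assumption: the ~3 ticks per element estimated above


def should_increment(value: int) -> bool:
    """Hypothetical condition: increment even values, decrement odd ones."""
    return value % 2 == 0


def update_sequential(array: list[int]) -> tuple[list[int], int]:
    """Model of an auto-indexed For Loop: one element per iteration."""
    result = []
    ticks = 0
    for value in array:
        if should_increment(value):
            result.append((value + 1) & MASK_U32)
        else:
            result.append((value - 1) & MASK_U32)
        ticks += TICKS_PER_ITERATION  # iterations run one after another
    return result, ticks


data = list(range(64))
new_data, ticks = update_sequential(data)
print(ticks)  # 192
```

The model only counts iterations; it says nothing about how the compiler actually schedules the logic.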
So how can I really process the operations on every element of an array in parallel, without having to place 64 copies of the same logic?
Many thanks for any help.
Unfortunately, you answered your own question. To process the entire array in parallel, you need 64 instances of your logic.
How fast do you really need to do it?
200 ticks at 40MHz is only 5 microseconds.
If you need it faster and your logic isn't too complex, you can clock that SCTL at a higher rate.
After compiling, you can find the maximum rate in the compilation report:
Clock Rates: (Requested rates are adjusted for jitter and accuracy)
Base clock: 40 MHz Onboard Clock
Requested Rate: 40.06872MHz
Theoretical Maximum: 66.626691MHz
However, any change to the FPGA code may reduce this rate.
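Because an SCTL runs one iteration per clock cycle of its clock, raising that clock should leave the tick count roughly unchanged and only shorten each tick. A small Python sketch of the conversion, using the 200 ticks measured above and the two rates from the report; it assumes the design still meets timing at the higher rate, which is not guaranteed once other SCTLs are added.

```python
def ticks_to_microseconds(ticks: int, clock_hz: float) -> float:
    """Duration of `ticks` clock cycles at `clock_hz`, in microseconds."""
    return ticks / clock_hz * 1e6


TICKS = 200  # measured for the whole array in the question

for label, clock_hz in [
    ("40 MHz onboard clock", 40e6),
    ("theoretical maximum from the report", 66.626691e6),
]:
    print(f"{label}: {ticks_to_microseconds(TICKS, clock_hz):.2f} us")
```

This prints 5.00 us for the 40 MHz clock and 3.00 us at the theoretical maximum, so a faster clock alone buys less than a factor of 2 here.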
If you need your result faster than 5 us, you could index the array in slices of some size and process the slices in parallel. Splitting it into 4 slices of 16 elements (or another size that suits you) brings your time down by a factor of 4.
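The slicing idea can be modelled the same way: each slice gets its own loop, the loops run side by side, so latency is set by the length of one slice rather than the whole array. This Python sketch reuses the assumed 3 ticks per iteration and a hypothetical `update_element` condition; `lanes` is an illustrative name for the number of parallel loops.

```python
MASK_U32 = 0xFFFFFFFF
TICKS_PER_ITERATION = 3  # assumption carried over from the ~3 ticks/element estimate


def update_element(value: int) -> int:
    """Hypothetical per-element logic: increment even values, decrement odd ones."""
    return (value + 1) & MASK_U32 if value % 2 == 0 else (value - 1) & MASK_U32


def update_in_slices(array: list[int], lanes: int) -> tuple[list[int], int]:
    """Split `array` into `lanes` equal slices, one loop per slice.

    The loops run in parallel, so latency is one slice's length times the
    per-iteration cost instead of the whole array's length.
    """
    if len(array) % lanes:
        raise ValueError("array length must be a multiple of the lane count")
    slice_len = len(array) // lanes
    slices = [array[i * slice_len:(i + 1) * slice_len] for i in range(lanes)]
    # Each slice is what one auto-indexed For Loop instance would iterate over.
    results = [[update_element(v) for v in s] for s in slices]
    merged = [v for s in results for v in s]  # rebuild the 64-element array
    ticks = slice_len * TICKS_PER_ITERATION
    return merged, ticks


data = list(range(64))
_, ticks_1 = update_in_slices(data, lanes=1)
_, ticks_4 = update_in_slices(data, lanes=4)
print(ticks_1, ticks_4)  # 192 48
```

The result array is identical whatever the lane count; only the modelled latency changes.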
How fast do you really need it?
How much logic are you doing in that SCTL?
Actually, my question was rather: how can I code 64 instances of the logic without placing 64 subVI instances on the LabVIEW block diagram?
I have other SCTLs on the diagram that may lower the maximum clock speed.
If the only way to instantiate the same logic 64 times is to insert 64 VIs, I'll try it, although processing the array in 4 slices may be an alternative.
The remaining critical point is FPGA logic utilization.
Anyway, thanks a lot for your answer.
If you want to increment and decrement in parallel, you cannot use a For Loop or a While Loop, because their iterations execute sequentially.
I have attached a picture below as an example.
I think WillD's suggestion is good: running 4 For Loops in parallel reduces the time by a factor of 4, and it saves space on the FPGA compared with 64 copies of the logic.
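To see the time/space trade-off behind that suggestion, here is a simple linear model in Python: N parallel loops need N copies of the logic and each handles 64/N elements. It assumes the ~3 ticks per iteration from the question and a 40 MHz clock, and it ignores loop overhead and routing effects, so treat the numbers as rough guidance only.

```python
ARRAY_LEN = 64
TICKS_PER_ITERATION = 3  # assumption: the ~3 ticks/element estimate from the question
CLOCK_HZ = 40e6          # 40 MHz onboard clock


def tradeoff(lanes: int) -> tuple[int, float]:
    """Return (ticks, microseconds) when `lanes` loops each process 64/lanes elements."""
    ticks = (ARRAY_LEN // lanes) * TICKS_PER_ITERATION
    return ticks, ticks / CLOCK_HZ * 1e6


for lanes in (1, 2, 4, 8, 16):
    ticks, micros = tradeoff(lanes)
    print(f"{lanes:>2} loop(s) = {lanes:>2} copies of the logic: "
          f"{ticks:>3} ticks, {micros:.1f} us")
```

Latency halves each time the number of copies doubles, so the choice comes down to how much FPGA fabric you can spare next to your other SCTLs.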