04-28-2021 12:58 AM
Hi,
I am new to FPGA. As I understand it, the High Throughput Math functions in LabVIEW FPGA use the CORDIC algorithm, which is iterative, whereas code inside an SCTL has to execute within 1 tick of the clock. Can anyone explain how CORDIC is implemented in an SCTL, given that it is iterative and each iteration depends on the result of the previous one?
I am guessing it uses pipelining and handshaking, which would increase latency but still accept inputs at high throughput. I am not sure about this though.
04-28-2021 02:20 AM - edited 04-28-2021 02:20 AM
You guessed right. By separating the algorithm into stages that are pipelined through internal flip-flop registers, each stage is performed within a clearly defined time that is known at design time, so the SCTL timing requirement can be verified. The consequence is that you need to run the loop multiple times to get a result: an input applied at iteration N appears at the output at iteration N + n, with n being the pipeline length.
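To make the N to N + n relationship concrete, here is a minimal Python model of a fully pipelined CORDIC (rotation mode, computing cosine and sine). It is only a sketch under assumptions: the depth of 16 stages, the names `PipelinedCordic`, `cordic_stage` and `tick`, and the use of floating point instead of fixed point are all made up for illustration and say nothing about how the NI function is actually built. Each call to `tick` stands for one SCTL iteration.

```python
import math

N_STAGES = 16  # assumed pipeline depth: one CORDIC micro-rotation per stage
ANGLES = [math.atan(2.0 ** -i) for i in range(N_STAGES)]
# Pre-scaling that compensates the gain accumulated by the N_STAGES rotations
SCALE = 1.0 / math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_STAGES))


def cordic_stage(i, x, y, z):
    """Logic of stage i: a single shift-and-add micro-rotation."""
    d = 1.0 if z >= 0.0 else -1.0
    shift = 2.0 ** -i
    return x - d * y * shift, y + d * x * shift, z - d * ANGLES[i]


class PipelinedCordic:
    def __init__(self):
        # regs[i] models the flip-flop register behind stage i (None = empty)
        self.regs = [None] * N_STAGES

    def tick(self, angle=None):
        """One clock tick: return the registered output, then advance every stage."""
        out = self.regs[-1]
        # Update back to front so each stage reads the previous tick's value
        for i in range(N_STAGES - 1, 0, -1):
            prev = self.regs[i - 1]
            self.regs[i] = None if prev is None else cordic_stage(i, *prev)
        self.regs[0] = None if angle is None else cordic_stage(0, SCALE, 0.0, angle)
        return None if out is None else (out[0], out[1])  # (cos, sin)


pipe = PipelinedCordic()
inputs = [0.1, 0.2, 0.3]  # one new angle (radians) on each of the first three ticks
outputs = [pipe.tick(inputs[t] if t < len(inputs) else None)
           for t in range(N_STAGES + len(inputs))]
```

In this model the angle applied on tick t comes out on tick t + 16, while a new angle can still be accepted on every tick. The cost is that all 16 stages exist as separate logic at the same time.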
04-28-2021 02:26 AM
I noticed that the configuration page of the function says that the function is not pipelined.
So I don't think my guess is correct.
04-28-2021 02:31 AM
Well, it is still delayed, although they apparently don't call that pipelined. Note the Throughput control, which specifies 16 cycles per sample. This means you have to iterate 16 times before the correct result appears at the output. And yes, you need to use the handshake signals too.
04-29-2021 12:59 AM
04-29-2021 05:03 AM
Isn't pipelining when you can enter a new value every cycle, here with a latency of 16 loop iterations? In this case I assume the next input will have to wait those 16 cycles before being worked on.
04-29-2021 05:12 AM
Hi,
The function is pipelined if we reduce the Throughput setting (cycles per sample).
My question is how it is implemented without a pipeline, since CORDIC is iterative.
Also, pipelining the function would require loops with shift registers, or feedback nodes. Using feedback nodes makes the code highly complex, and loops are not supported inside an SCTL. So how does this function implement CORDIC?
04-29-2021 05:26 AM - edited 04-29-2021 05:45 AM
Well, as you can see, the function has either a 16-cycle latency or a 16-cycle pipeline (or a combination of both).
Basically the logic is what is called unrolled, and in each cycle the data goes through the next stage. The CORDIC algorithm, while iterative, seems to complete in 16 or fewer iterations, so instead of putting a loop inside a loop, you let the outer loop do the looping for you. If you decrease the latency, the algorithm uses extra pipelining, which requires more resources because the entire logic has to be fully present for each pipeline stage. Without pipelining, the logic can be reused in each iteration as far as possible, but the latency is higher and you need the outer loop to run up to 16 times as fast as the sampling frequency of your signal.
The delay from when your data arrives at the input until it appears at the output is always 16 iterations.
If you want to know more details about the actual implementation, you will have to ask Xilinx. Much of the core of the high throughput functions is not designed by NI themselves; they use premade IP modules from Xilinx for that.
This is a very simple idea of how you can do this for an algorithm with a latency of 16.
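A rough Python model of this idea, not the actual NI or Xilinx implementation: one copy of the CORDIC step is reused on every outer loop iteration, with the intermediate state and an iteration counter held in what would be feedback nodes. The names `IterativeCordic`, `cordic_step` and `tick`, the floating-point arithmetic, and the simplified "ready for input" flag (standing in for full 4-wire handshaking) are assumptions made for illustration.

```python
import math

N_ITER = 16  # assumed number of CORDIC iterations per sample
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Pre-scaling that compensates the gain accumulated by the N_ITER rotations
SCALE = 1.0 / math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))


def cordic_step(i, x, y, z):
    """The single block of logic that is reused on every tick."""
    d = 1.0 if z >= 0.0 else -1.0
    shift = 2.0 ** -i
    return x - d * y * shift, y + d * x * shift, z - d * ANGLES[i]


class IterativeCordic:
    def __init__(self):
        self.state = None    # feedback node: current (x, y, z)
        self.count = 0       # feedback node: index of the next iteration
        self.out_reg = None  # output register: finished (cos, sin)

    def tick(self, angle=None):
        """One outer loop iteration. Returns (ready_for_input, result_or_None)."""
        out, self.out_reg = self.out_reg, None
        ready = self.count == 0
        if ready:
            if angle is None:
                return ready, out            # idle, nothing to load
            state = (SCALE, 0.0, angle)      # accept a new sample
        else:
            state = self.state               # busy: the input is ignored
        state = cordic_step(self.count, *state)
        self.count += 1
        if self.count == N_ITER:             # last iteration just finished
            self.out_reg = (state[0], state[1])
            self.state, self.count = None, 0
        else:
            self.state = state
        return ready, out


unit = IterativeCordic()
# Offer 0.5 rad on the first tick, then keep offering 1.0 rad on every tick
trace = [unit.tick(0.5 if t == 0 else 1.0) for t in range(2 * N_ITER + 1)]
```

In this model the unit is ready on ticks 0, 16 and 32 only, and the result for the angle accepted on tick 0 is returned on tick 16. That is the trade-off described above: far less logic than the unrolled version, but only one sample every 16 ticks.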
04-29-2021 05:39 AM
Just to be clear,
If I understand you correctly, 16 iterations are performed, and the logic is written so that the output of the i-th iteration is passed as input to the (i+1)-th iteration through a feedback node. The latency of the logic is therefore 16 cycles, driven by the outermost loop?
And 4-wire handshaking is used to synchronize the inputs and the output?
04-29-2021 07:41 AM
Not specifically. This was just a very simple setup for an algorithm with a latency of 16 where every iteration executes the same iterative step. You may have algorithms that perform different operations at different stages, and then you would have a case structure where you now see a sequence frame.
4-wire handshaking simply makes everything more complex, and I did not worry about it for this simple example.
It's just meant to give you an idea of how a multistage algorithm can be set up to be executable in an SCTL without using an additional loop (which is not allowed inside an SCTL). This is NOT the solution to your problem, simply one possible idea of how it can be done.
I trust that you are not intending to implement the CORDIC algorithm yourself. To me it sounded like you had an academic interest in how an iterative algorithm can be implemented inside an SCTL without using loops. My simple setup gives you an idea of how it can be done. If you seriously want to implement something yourself, you will need to do a LOT more research.
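As a text stand-in for such a case structure, here is a small Python sketch in which a stage counter selects a different operation on each tick. The three operations (scale, add an offset, square) and the names `StagedUnit` and `tick` are purely hypothetical and only illustrate the control pattern.

```python
class StagedUnit:
    """Each tick() is one SCTL iteration; the stage counter picks the case."""

    N_STAGES = 3  # hypothetical number of distinct stages

    def __init__(self):
        self.stage = 0  # feedback node: which case runs on this tick
        self.acc = 0.0  # feedback node: intermediate value

    def tick(self, sample):
        done = False
        if self.stage == 0:    # case 0: load the sample and scale it
            self.acc = sample * 2.0
        elif self.stage == 1:  # case 1: add an offset (the input is ignored)
            self.acc = self.acc + 1.0
        else:                  # case 2: square, the result is now complete
            self.acc = self.acc * self.acc
            done = True
        self.stage = (self.stage + 1) % self.N_STAGES
        return done, self.acc


unit = StagedUnit()
steps = [unit.tick(3.0) for _ in range(3)]
```

Only the sample seen in stage 0 is used; the result flag goes true on the third tick, after which the counter wraps around and the unit is ready for the next sample.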