SCTL clock domain

kabooom · ‎12-04-2020

Hi

This is a good idea and could work, I guess...

I need to be sure the precision will be enough, but if I put enough points in the lookup table it should be OK...

As I said in the previous post, I'm trying to understand the SCTL. In my last compilation, I didn't have performance problems even without the SCTL.

But I will keep your idea in mind...

wiebe@CARYA · ‎12-04-2020

I think the node in the SCTL takes longer simply because the high-throughput implementation adds overhead. The throughput gets higher (1 sample per cycle) at the cost of latency.

A higher clock frequency probably isn't like overclocking a CPU. The gates in the FPGA take a certain time, and the frequency probably only tells the gates when to be in sync. So a higher frequency simply means less will fit in a cycle. If enough fits, you get the benefit of the higher frequency. But the sine node will simply need more cycles, as the workload doesn't change. So you get 42 small cycles instead of 21 larger cycles.

Another reason could be that the node delegates the task to some magic part of the FPGA, and simply gets the result when done. That wouldn't change the time to calculate if the frequency changes.

Not sure that's all 100% true. It seems plausible.


cbutcher · ‎12-04-2020

Aha! I'm sure this is a benchmarking problem now...

@kabooom wrote:

Your guesses are quite accurate... impressive 🙂

Your remarks are very welcome. Maybe I could calculate the sine before writing to memory, but the detect task rate has to be at least as fast as the data rate, otherwise the FIFO will fill up with samples. Actually, my acquisition rate is 50 kS/s, which means I have 20 µs to do the period detection, calculate the frequency for the 32 channels, write the data to memory, and do some error checks (phase, frequency)...

So at first I tried to do all the calculations in this loop, but it wasn't fast enough. That's why I decided to create a third task to do the calculation.

But as I wrote, the performance issue seems to be fixed since I duplicated the calculation task... my question was only to understand how the SCTL works.

In order to test the performance of the SCTL, I just tested the High Throughput Sin function like this:

This takes 31 µs...

This is what I expected:

With a clock of 40 MHz: (50 × 17) / 40e6 ≈ 21 µs

With a clock of 80 MHz: (50 × 17) / 80e6 ≈ 10.6 µs

The measurement you're making isn't "How long does it take to run a For loop 50 times containing a SCTL with an input, and then an output?", but rather, "How long passes in between While loop iterations, when the While loop contains a For loop with 50 iterations and each iteration contains a SCTL with a sine node?".

So any behaviour related to the start/stop of the For loop, the For iterations, or the While loop will affect your measurement.

I believe it's guaranteed that a While loop adds 2 ticks on FPGA in addition to its contents. I don't know if the same is true for For loops, but it wouldn't surprise me. That gives you maybe ~100 ticks just at the end of your loops. There may also be ticks needed to set up the SCTL.

I'd be curious to see if this behaves more like you'd expect:

Here you should be able to compile this once, and then run it with different input "Test Length" values.

See if the Ticks produced are more linear (I'd expect there to be some offset that you could identify in both "Ticks" and "Ticks 2", and then a scaling value like (L+16)* iteration period).

Don't set Test Length to less than 1!

Edit: That snippet has cycles/sample at 16, so probably the time will be 16*L*period, not (16+L)*period. But you can play with the throughput setting too (would require recompiling the bitfile though, urgh...)
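As a rough illustration of the model in the edit above, here is a minimal Python sketch of the prediction "ticks = offset + cycles/sample × L". The function names and the default offset of 0 are assumptions for illustration only; the real offset is whatever the measurement shows.

```python
def predicted_ticks(test_length, cycles_per_sample=16, offset_ticks=0):
    """Predicted tick count for `test_length` sine evaluations.

    Assumes a purely linear model: a fixed offset (loop overhead, unknown
    in advance) plus `cycles_per_sample` ticks per evaluation.
    """
    if test_length < 1:
        raise ValueError("Test Length must be at least 1")
    return offset_ticks + cycles_per_sample * test_length


def ticks_to_us(ticks, clock_hz):
    """Convert a tick count to microseconds for a given clock frequency."""
    return ticks / clock_hz * 1e6


if __name__ == "__main__":
    for length in (1, 10, 50, 100):
        ticks = predicted_ticks(length)
        print(length, ticks, ticks_to_us(ticks, 40e6))
```

Comparing these predictions against "Ticks" and "Ticks 2" for a few Test Length values should show whether the measured offset is constant.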

Intaris · ‎12-04-2020

Can you show the configuration page for both your High-Throughput Sin and your 80MHz Clock?

kabooom · ‎12-07-2020

Hi cbutcher

I tested your code this morning, and this is what I got:

Length = 1: Tick40MHz = 18, Tick80MHz = 21

Length = 10: Tick40MHz = 162, Tick80MHz = 93

Length = 50: Tick40MHz = 802, Tick80MHz = 413

Length = 100: Tick40MHz = 1602, Tick80MHz = 813

So, with the 40 MHz clock:

High Throughput Sin effectively takes 16 ticks to execute and the while loop takes 2 ticks.

With the 80 MHz clock:

High Throughput Sin effectively takes 8 ticks to execute and the while loop takes 13 ticks.

So I guess the SCTL works as expected (I just don't understand why the 80 MHz Timed Loop has a greater overhead). It was just the measurement method that was wrong...
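For anyone who wants to check the per-iteration cost and the fixed overhead from those numbers, here is a small Python sketch that fits a straight line (ticks = slope × length + offset) to the measurements. The helper `fit_line` is a hypothetical name, and reading the slope as "ticks per sine" assumes the tick counter runs on the 40 MHz top-level clock in both tests.

```python
def fit_line(lengths, ticks):
    """Ordinary least-squares fit of ticks = slope * length + offset."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ticks) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ticks))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


# Measurements copied from the post above.
lengths = [1, 10, 50, 100]
ticks_40 = [18, 162, 802, 1602]
ticks_80 = [21, 93, 413, 813]

if __name__ == "__main__":
    for name, data in (("40 MHz", ticks_40), ("80 MHz", ticks_80)):
        slope, offset = fit_line(lengths, data)
        print(f"{name}: {slope:.2f} ticks per sine, {offset:.2f} ticks fixed overhead")
```

Both data sets lie exactly on a line, so the fit reproduces the 16 + 2 and 8 + 13 split described above.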

thanks a lot for your help !

Intaris · ‎12-07-2020

What is the top-level clock defined as? 40 MHz? This dictates the basic speed of your non-SCTL code, AFAIK.

If you then embed a loop at a frequency other than this 40 MHz, clock domain crossing will be required, which costs a few cycles. Try creating a slightly different 40 MHz clock (maybe with slightly more jitter or something) and then try that one. I would almost expect it to have similar overhead to the 80 MHz version.

kabooom · ‎12-07-2020

Hi Intaris,

Just to tell you that you're right: I ran the same test as before, but with a 40 MHz clock and a 41 MHz derived clock.

Length = 1: Tick40MHz = 18, Tick41MHz = 33

Length = 10: Tick40MHz = 162, Tick41MHz = 173

Length = 50: Tick40MHz = 802, Tick41MHz = 797

Length = 100: Tick40MHz = 1602, Tick41MHz = 1578

So for one sine we have something like a 15-tick overhead (after subtracting the 2 ticks for the while loop: 31 - 16 = 15 ticks).

We still have the same overhead for the other measurements, e.g.:

- length 10: 160 × 40/41 ≈ 156 ticks, 171 - 156 = 15

- length 50: 800 × 40/41 ≈ 780 ticks, 795 - 780 = 15

- length 100: 1600 × 40/41 ≈ 1561 ticks, 1576 - 1561 = 15

As you said this must be linked to Clock Domain Crossing.
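The arithmetic above can be reproduced with a short Python sketch. It assumes, as in the earlier results, 16 SCTL cycles per sine, 2 ticks for the while loop, and a tick counter running on the 40 MHz clock; the constant and function names are made up for illustration.

```python
TOP_CLOCK_HZ = 40e6      # clock the tick counter is assumed to run on
SCTL_CLOCK_HZ = 41e6     # derived clock driving the SCTL
CYCLES_PER_SAMPLE = 16   # SCTL cycles per sine, from the 40 MHz test
WHILE_LOOP_TICKS = 2     # while-loop overhead, from the 40 MHz test

# Tick41MHz measurements copied from the post above.
measured = {1: 33, 10: 173, 50: 797, 100: 1578}


def crossing_overhead(length, measured_ticks):
    """Ticks left over after removing loop overhead and SCTL compute time."""
    # Time spent in the SCTL, expressed in 40 MHz ticks.
    sctl_ticks = CYCLES_PER_SAMPLE * length * TOP_CLOCK_HZ / SCTL_CLOCK_HZ
    return measured_ticks - WHILE_LOOP_TICKS - sctl_ticks


if __name__ == "__main__":
    for length, ticks in measured.items():
        print(length, round(crossing_overhead(length, ticks), 1))
```

The leftover stays close to 15 ticks for every length, which fits a fixed cost per measurement rather than a cost per sine.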

Intaris · ‎12-07-2020

Top trick: You can change the top-level clock that your code is running at in the properties of your FPGA target. I wonder what happens if you simply change from 40MHz to 80MHz.....

Intaris · ‎12-07-2020

If you want to go all advanced, there are ways to get around this problem by using BRAM as an interface between the two clock domains. If you take care not to read and write from the same address at the same time, this can be a very efficient way to cross clock domains. It would almost certainly improve speeds.
