08-03-2011 07:34 PM
First, I apologise for cross-posting. I initially posted this question in 'High Speed Digitisers' and was advised that this could be a more appropriate board.
So here is the copy/paste:
I am having trouble with timing violations popping up whenever I try to compile my code, and it is driving me a bit crazy.
The FlexRIO module I am using is NI 7953R with NI 5781 high speed I/O module. The aim is to perform some real-time control and estimation in an optics experiment.
The controller is to be realised as a cascade of second order sections, with reasonably high internal precision: +/-<24,4> for inputs, coefficients and outputs of sections, with +/-<48,8> for intermediate results. Obviously, the final output is truncated to 16 bits for the DAC.
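For anyone wanting to sanity-check the numerics off-target, the arithmetic of one second-order section with these word lengths can be modelled behaviourally in software. This is a Python sketch (not LabVIEW code); the coefficient values in the test below are made-up placeholders, and the quantizer is my reading of LabVIEW's signed FXP +/-&lt;word,integer&gt; with 'truncate' rounding and 'wrap' overflow:

```python
import math

def q(x, word, integer):
    """Model a LabVIEW signed fixed-point value +/-<word,integer>
    with 'truncate' rounding and 'wrap' overflow (integer bits include sign)."""
    frac = word - integer
    n = math.floor(x * (1 << frac))      # truncate extra fractional bits
    n &= (1 << word) - 1                 # wrap into the word
    if n >= 1 << (word - 1):             # reinterpret as two's complement
        n -= 1 << word
    return n / (1 << frac)

def biquad_step(x, b, a, state):
    """One Direct Form I second-order section.
    b = (b0, b1, b2), a = (a1, a2); state holds (x[n-1], x[n-2], y[n-1], y[n-2]).
    Inputs/coefficients/outputs are <24,4>; intermediate sums are <48,8>."""
    x = q(x, 24, 4)
    x1, x2, y1, y2 = state
    acc = 0.0
    for term in (b[0] * x, b[1] * x1, b[2] * x2, -a[0] * y1, -a[1] * y2):
        acc = q(acc + q(term, 48, 8), 48, 8)   # running sum kept at <48,8>
    y = q(acc, 24, 4)                          # final output back to <24,4>
    return y, (x, x1, y, y1)
```

A model like this makes it easy to compare the quantized response against a floating-point reference before committing to an FPGA compile.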
The initial simple project is just a single biquad, just to test the behaviour, before moving on to the full controller and estimation. However, this is where I got stuck :-(
Initially, I tried using the system-synchronous CLIP for 5781, with IO Mod Clock 0 driving ADC, DAC and the SCTL containing the biquad at 100MHz. Unfortunately, the compiler reported a timing violation of about 3ns over the required 10ns in the SCTL containing the biquad (consistent with the 100 MHz clock).
As 100MHz is overkill for the bandwidth of the experiment, I turned to clock division in order to run everything at a lower clock rate. 50MHz and 25MHz would satisfy the requirements, but I could not go lower than that because the I/O delay would become unacceptable (so the only lower achievable frequency, 10MHz, was not an option, but that didn't matter anyway). As clock division can only be achieved with the standard CLIP, I had to forget about using the system-synchronous CLIP and flip-flop FIFOs. Unfortunately, the only FIFO I could use across different clock domains was the block-memory one, which would increase delays to borderline acceptable.
Unfortunately, I never really got to the point of checking the delay for the biquad implementation, since I could not get past the compilation stage.
The project (attached) uses the code from LabVIEW 'clock division' example in both host and FPGA VIs. The modifications were the SCTL containing the biquad, and input clock division (the example only divides the output clock). All coefficients are created as constants, and all arithmetic operators' output configurations are set manually to +/-<48,8>, 'truncate' and 'wrap', in order to avoid making the logic in the FPGA more complex.
The FIFOs are block-memory and minimum length, while the IO Clock 0 and IO Clock 1 were set to compile for frequencies between 1 and 25 MHz (I wanted to check it working at a lower clock rate before moving to higher, and 1MHz was put in there just to relax the timing restrictions). Considering that the total delay inside the loop in the previous case was 13ns, I thought 40ns should easily accommodate two, maybe even three biquad sections.
I was wrong.
Now the compiler reports timing violations for non-diagram components! Whatever those are...
I have also attached window captures of all the reports as well as the Xilinx log. The Xilinx log shows that my code is still checked against a 10ns timing constraint, although there are no 100MHz clocks included in the project, and the loops use clocks that should be compiled for the range between 1 and 25MHz!
BTW, I tested the same project without the biquad section (replaced the biquad with a wire connecting Data In and Data Out FIFOs), and it does not do anything but it compiles and works. I tested it with a second-order FIR (basically removed all the feedback) and it fails with 6ns over 10ns constraint. So I guess that the problem lies in that SCTL.
Based on the above, I see only two possible explanations:
1. I am doing something really stupid, and am not aware of it, so that messes up my project (wouldn't be the first time).
2. Whatever the clock frequency I select, the SCTL will work only if the code in it executes in 10ns (which seems to defeat the purpose of clock division and SCTLs, since you apparently cannot do very much in 10ns, not even a second order FIR).
I honestly hope number 1. is the case.
Apologies for the length of this post, but I figured it was best to provide all the information I could right at the start.
If anybody can help I would be very grateful.
08-04-2011 04:26 PM - edited 08-04-2011 04:26 PM
So I've taken a look at the situation, and I have a few things to point out.
First, the recommended clock setting is the system sync CLIP, unless you do actually need to perform input and output at different speeds. However, it looks as though you wish them to be synchronized, so this should not be a problem.
Since you have the ability to set the clock speed during run-time, the compiler automatically specifies 100 MHz as the upper limit of the clock, regardless of what you set as the possible range of values in your project. Thus, the timing violation you saw occurred exactly as expected.
The non-block-diagram components could be any number of items in this case, from the CLIP to DSP48E slices (used when math functions are called in an SCTL) to the FIFOs. I am personally not sure which item is causing the problem, but since we know the loop, we know how to fix it. The main fix is to pipeline the code in the upper loop more thoroughly, keeping in mind that this will increase the delay from AI to AO. You can also make sure that your Xilinx compiler options are set to optimize for speed, and you might also turn off arbitration for your FIFOs completely.
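The trade-off behind pipelining can be sketched in software (a Python behavioural model, not LabVIEW): the toy computation `(a*x + b) * c` is split across a register, which is what a feedback node does in an SCTL. Each cycle then only has to fit one multiply-add instead of the whole chain, at the cost of one cycle of extra latency:

```python
def combinational(xs, a, b, c):
    """Whole computation in one clock cycle: the critical path is
    multiply -> add -> multiply."""
    return [(a * x + b) * c for x in xs]

def pipelined(xs, a, b, c):
    """Same computation with a register between the two stages, so each
    cycle only contains one multiply-add; the output lags by one cycle."""
    r1 = 0.0                     # pipeline register (the feedback node)
    out = []
    for x in xs + [0.0]:         # one extra cycle flushes the pipeline
        out.append(r1 * c)       # stage 2: uses last cycle's stage-1 value
        r1 = a * x + b           # stage 1: register the new partial result
    return out[1:]               # first sample is pipeline fill; drop it
```

Both functions produce the same samples; the pipelined version simply delivers them one clock later, which is why pipelining a control loop trades timing closure for AI-to-AO latency.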
Another option, which may work, is to compile the upper loop at the 40 MHz rate (or whatever is required) and simply set the other loops to run at the same rate, or use some logic to make sure that data is written for multiple iterations of those faster IO loops when there is no data processed in the upper loop.
I've got something compiling on my end, and I'll let you know how that goes. I would suggest doing the same on your end.
08-04-2011 04:41 PM
By turning off arbitration, using the sync CLIP, and adding feedback nodes, I got this clock domain up to a maximum of 174 MHz, so you can probably remove a strategic feedback node and still meet timing if you need to:
Let me know if this works for you.
08-04-2011 11:50 PM
Thank you so much for your help.
I am just about to leave for Germany for three weeks, and I wouldn't count on having access to LabVIEW over there (although I wouldn't completely exclude the possibility), but I'll try your suggestions as soon as I can.
Regarding your first post, I would definitely prefer the project to run as system-synchronous. I actually forgot to explain in my original post why I went for clock division. After getting the timing violation for 100MHz clock (system synchronous), I tried compiling the biquad loop at a lower frequency (set in the IO Module Clock 0 dialog). However, whichever frequency I chose, the SCTL ran at 100 MHz. I tested this by putting a 16 bit counter (a delay element and an adder so I can change the increment) inside the SCTL, wiring its output to AO 0 and observing the resulting sawtooth wave on an external oscilloscope. Regardless of the value I chose and compiled for IO Module Clock 0 frequency, the sawtooth period was always approximately 656us (consistent with 2^16 * 10ns).
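The deduction from the scope trace checks out arithmetically: a free-running 16-bit counter wraps every 2^16 ticks, so the sawtooth period directly reveals the loop clock. A quick Python check:

```python
def sawtooth_period_us(clock_hz, bits=16, increment=1):
    """Period of a free-running counter's sawtooth output, in microseconds."""
    return (2 ** bits / increment) / clock_hz * 1e6

# At 100 MHz the period is 655.36 us, matching the observed ~655 us;
# a 25 MHz loop clock would instead have shown 2621.44 us.
```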
I was eventually told by technical support here in Australia that IO Module Clock 0 will always run at 100MHz, no matter what frequency I put in the clock properties dialog, and that the only way to get lower frequencies was to go for clock dividing. I found that somewhat strange (what is the point of being able to change the frequency in the properties dialog if that is just ignored?), but it was consistent with my observations so I tried the clock dividing, and ended up with what I described in my first post.
I really don’t need to change the clock speed during runtime; in the experiment the controller is supposed to be started and forgotten about until we decide to change the transfer function. But if that happens we would restart the whole experiment. If I could set the IO Module Clock 0 to run somewhere between 25 and 40 MHz in the system synchronous mode that would be great, but I just wasn’t able to do it. Was I doing something wrong, or did I miss something? Sorry for my ignorance, I am a bit of a newbie, I recently did the NI FPGA course but FlexRIO was hardly mentioned in it.
08-05-2011 04:58 PM
I looked through this a bit more and figured out what I was missing. The CLIP specs for the system sync CLIP say that "100 MHz is the only possible case when using the internal clock; if an external clock is provided through CLK IN, the clock dividers and interpolation depend on the frequency of the provided clock.", so you and the Australia branch are correct--regardless of what you set, it will still run at 100 MHz.
I suppose this leaves us with two options:
-Use the system sync clock with 'software' clock control, updating the output only every Nth (for example, every 4th) iteration. Or, with the feedback nodes as they are, you should get data at just about the right rate, although there will be multiple cycles' worth of lag as the data goes through the system--of course, you said that you are not looking for a terribly fast data rate, so this may be acceptable.
-Use the original CLIP and configure all three clocks to run at X MHz. They will be out of sync with one another, but if they are all moving at the same rate you shouldn't have any buffer problems, and the data will be synced up to within 1 rising edge of those X MHz clocks.
Sorry about the confusion with the system sync CLIP, but apparently the 100 MHz rate is a requirement for that synchronization.
Hopefully one of these options will get you on the right track, and good luck in Germany.
08-09-2011 01:52 PM
Hey again Aleks,
At R&D's suggestion, I switched the clock over to a normal derived clock, which gave us a much better look at the timing violation--the only non-block-diagram component was the FIFO:
01-29-2012 06:31 PM
Sorry to revive this thread with such a big delay, but due to an injury I have been away from work and my computer for several months :-(
Now, I'm back and I'm trying to pick up where I left off...
Thanks for your help and suggestions from previous posts.
I decided to go for clock division, as it seems to be the most elegant solution.
I used a clock divider constant to set input and output clocks in a loop clocked by the onboard 40MHz clock. Initialisation and clock division setting are put in a frame of a sequence structure, so no actual code gets executed until this is done. I deliberately used constants to prevent interactive changes of the clock (to reduce the number of things that could go wrong during testing).
In order to make things work, I had to place the input and output modules in two separate SCTLs, and the code in a third one, controlled by its own derived clock. I used
The loops clocked by I/O clocks still get checked against the 10ns timing constraint, regardless of what I set in the clock properties dialog!
It doesn't quite make sense to be able to set the compile requirements for the clocks, and then have them completely ignored by the compiler without a warning...
With some standalone FPGA boards that I have worked with (Xilinx and Altera), I was able to manually set timing values for clocks in constraint files, and the compiler then used those values to check the design. So the behaviour I am seeing in LabVIEW is confusing; it looks like a bug to me.
I checked the I/O clocks during execution and they do indeed tick at the rate set by the divider, which makes me even more confused about LabVIEW's behaviour.
Could there be another way of telling the compiler which timing constraint to use for loops controlled by I/O clocks, rather than sticking to the 100MHz frequency, which is never used in the design or operation? I thought that the clock properties setting was the place to do it, but that obviously isn't working, even with the standard CLIP (which is what I am using).
Now, since the only allowed clock divider values are 1, 2, 4, 6, 8 and 10, the only frequencies that 7953R FPGA board and the 5781 I/O board have in common are 100MHz, 50MHz and 10MHz. Compilation results show that the biquad section in its own SCTL could be clocked up to slightly more than 40MHz, which leaves me only with a 10MHz clock option.
I'd be quite keen to use a higher frequency to avoid increasing I/O latency. 40MHz would be great as it is already provided onboard 7953R, but sadly I cannot get that frequency by clock division on 5781. 25MHz would have less latency than 10MHz, and would be acceptable, but 7953R doesn't support 25MHz (I can only get 26.6666MHz with 2/3, but not 5/8 for 25MHz).
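The overlap between the two clock families can be enumerated mechanically. Assuming the 5781 sample clock is 100 MHz divided by one of {1, 2, 4, 6, 8, 10}, and the 7953R derived clocks come from the 40 MHz onboard oscillator scaled by small integer ratios (the multiplier/divider limits used here are an assumption for illustration, not the documented limits), a Python sketch:

```python
from fractions import Fraction

# 5781 sample clocks: 100 MHz over the allowed divider values
adc_divs = [1, 2, 4, 6, 8, 10]
adc_freqs = {Fraction(100, d) for d in adc_divs}          # in MHz

# Hypothetical 7953R derived clocks: 40 MHz * m / d with small m, d
fpga_freqs = {Fraction(40) * Fraction(m, d)
              for m in range(1, 6) for d in range(1, 6)}  # in MHz

common = sorted(adc_freqs & fpga_freqs)
# Under these assumptions only 10, 50 and 100 MHz are common to both,
# which matches the situation described: 25 MHz needs 40 * 5/8,
# a ratio outside the assumed range.
```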
It seems that external clocking is the only option if I want some flexibility with clocks to give my code enough time to execute and cover for all the delays, but also reduce the I/O latency as much as I can. We've looked at 665x family of timing/synchronisation devices, but it is not clear to me whether I could use those to provide arbitrary clocks, and whether the procedure would be the same as with an external clock generator.
One thing that bothers me about external clocking is that, from the example that comes with LabVIEW, it seems that I could run into same problem as before.
Namely, according to NI 5781 CLIP Help, 'Clock Select' will set the mode for the CLK chip, but in the actual example code the loops are clocked with IO Module clock 0 (since it is system synchronous), and during execution, the clock will behave according to the selected option (i.e. internal or external source).
Now this means that during compilation the SCTLs will be checked against a 10ns timing constraint whatever my external clock choice is (and, based on my experience described above, whatever my CLIP choice is), and therefore my code will not compile.
Would you have any suggestions for me?
Thanks so much,
02-25-2012 01:28 AM