
LabVIEW Embedded for ARM performance

My work colleague and I have been doing some performance measurements for the two current LabVIEW Embedded for ARM Tier 1 boards.

 

A detailed report of the computational testing is attached (LabVIEW Embedded – PerformanceTesting.pdf). The other tests were done using the simplest VI that exercises the listed function.

 

The results for the LM3S8962 Evaluation Board are summarised below. Computational performance results for the MCB2300 Evaluation Board can be found in the attachment.

 

| Test           | Single Loop | Multiple Loops | Notes                                  |
|----------------|-------------|----------------|----------------------------------------|
| Computation    | 50 MIPS     | 10 MIPS        | Effective MIPS                         |
| UDP Read       | 4,770 Hz*   | 4,021 Hz       | Maximum rate, tested over 10,000 loops |
| UDP Write      | 13,344 Hz   | 14,371 Hz      | Maximum rate, tested over 10,000 loops |
| TCP Read       | 515 Hz      | 516 Hz         | Maximum rate, tested over 10,000 loops |
| TCP Write      | 58 Hz       | 60 Hz          | Maximum rate, tested over 10,000 loops |
| Display update | 18 Hz       | 18 Hz          | Maximum rate                           |

 

 

* Note: the quoted UDP Read rate requires a LabVIEW Embedded for ARM bug to be corrected. See http://forums.ni.com/t5/LabVIEW/LV-for-ARM-UDP-read-55-ms-wait/td-p/1812354/highlight/true/page/2 for details. With the bug present, the maximum Single Loop UDP Read rate is 1,904 Hz.
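
For anyone wanting to reproduce the Computation row on another target, an "effective MIPS" figure can be approximated by timing a tight integer loop and dividing an estimated instruction count by the elapsed time. A minimal C sketch follows; it is illustrative only (not the code behind the attached report), and it uses the hosted clock() for brevity, where the EVB tests would read a hardware timer instead.

```c
#include <stdint.h>
#include <time.h>

#define OPS_PER_PASS 4u       /* rough instruction count per iteration: */
#define PASSES       1000000u /* add, xor, shift, plus loop bookkeeping */

static double effective_mips(void)
{
    volatile uint32_t acc = 1; /* volatile keeps the loop from being optimised away */
    clock_t t0 = clock();

    for (uint32_t i = 0; i < PASSES; i++)
    {
        acc += i;
        acc ^= 0x5A5A5A5Au;
        acc >>= 1;
    }

    double seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return ((double)PASSES * OPS_PER_PASS) / (seconds * 1e6);
}
```

Running several such loops in parallel, as in the Multiple Loops column, is what exposes the scheduler overhead.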

 

Conclusions

 

1) The LabVIEW/RTOS overhead causes 80% of the CPU's performance to be lost, leaving only 10 MIPS for the application.

2) Performance is inadequate for anything but simple applications or loop rates no higher than 100 Hz.

3) A faster microcontroller would probably give disproportionately more CPU performance, assuming a fairly fixed overhead (a 100 MHz microcontroller may leave 50 MIPS for the application); see the worked figures below.

4) If you don’t require the TCP features for data communication, UDP is much quicker.
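
For what it's worth, here is a back-of-the-envelope model behind conclusions 1 and 3, assuming roughly one instruction per clock on the Cortex-M3 and a fixed LabVIEW/RTOS overhead inferred from the table:

\[
\text{MIPS}_{\text{app}} \approx f_{\text{clk}} \times 1\,\tfrac{\text{MIPS}}{\text{MHz}} - \text{MIPS}_{\text{overhead}}
\]
\[
\text{overhead} \approx 50 - 10 = 40\ \text{MIPS} \;\Rightarrow\; \text{MIPS}_{\text{app}}(100\ \text{MHz}) \approx 100 - 40 = 60\ \text{MIPS}
\]

The 50 MIPS figure in conclusion 3 is deliberately more conservative, allowing the overhead itself to grow somewhat with clock rate.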

Message 1 of 3

Congratulations on these tests; this was just the reference I needed to learn what I could still improve in my code!

I also use the LM3S8962 and have noticed the overhead, especially with the display. My application does not use Ethernet, so I can't comment on that part.

To achieve a loop time of about 100 µs for my controller loop (a state-space controller), I ran several speed tests, replacing the routines with larger delays with equivalents that have smaller delays; I did not use parallel execution and used only integer variables. I also followed several tips from NI articles. Another tip is to use inline C routines for specific, time-critical tasks; I did this for reading two ADC channels and got a significant improvement in speed. However, I expected more speed from the simple integer arithmetic blocks.
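
For reference, here is a sketch of the kind of inline C ADC routine described above, assuming the StellarisWare DriverLib API for the LM3S8962; the sample sequence number and channel choices are illustrative, not the actual code.

```c
#include "inc/hw_types.h"
#include "inc/hw_memmap.h"
#include "driverlib/sysctl.h"
#include "driverlib/adc.h"

/* One-time setup: sample sequence 2, processor-triggered, reading two
   channels (CH0 and CH1) back to back. */
void AdcSetup(void)
{
    SysCtlPeripheralEnable(SYSCTL_PERIPH_ADC0);
    ADCSequenceConfigure(ADC0_BASE, 2, ADC_TRIGGER_PROCESSOR, 0);
    ADCSequenceStepConfigure(ADC0_BASE, 2, 0, ADC_CTL_CH0);
    ADCSequenceStepConfigure(ADC0_BASE, 2, 1,
                             ADC_CTL_CH1 | ADC_CTL_IE | ADC_CTL_END);
    ADCSequenceEnable(ADC0_BASE, 2);
}

/* Inline-C style read: trigger the sequence, busy-wait for completion,
   then fetch both samples. Returns the number of samples copied. */
long AdcReadPair(unsigned long pulData[2])
{
    ADCProcessorTrigger(ADC0_BASE, 2);
    while (!ADCIntStatus(ADC0_BASE, 2, false))
    {
        /* spin until the sequence completes */
    }
    ADCIntClear(ADC0_BASE, 2);
    return ADCSequenceDataGet(ADC0_BASE, 2, pulData);
}
```

Bypassing the LabVIEW block layer like this avoids the per-call scheduling overhead, which is presumably where the speed-up comes from.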

In the case of the display, I haven't found a solution. I verified that the routine's loop time is 56 ms (regardless of the image) when I enable the update, but that is too long for my loop. I think it would be possible to reduce this time in the display's C block code, e.g. by using an interrupt, but there is a lot of embedded C code in the ARM blocks that we cannot access and modify directly.

Best regards!

 

Evandro Rech

Message 2 of 3

Yes, the display update time of 56 ms is massive. It’s interesting to examine what’s going on.

 

The Stellaris LM3S8962 Evaluation Board Manual (http://www.ti.com/lit/ug/spmu032b/spmu032b.pdf) states “The OLED display has a built-in controller IC with synchronous serial and parallel interfaces. Synchronous serial (SSI) is used on the EVB as it requires fewer microcontroller pins. Data cannot be read from the OLED controller; only one data line is necessary.” So communication with the display is one bit at a time.

 

The OLED display is a “RiT P14201 series display”. From http://www.ritdisplay.com/in_English/Product_Technology/Product_Technology.htm we learn that the controller IC is an SSD1329. The data sheet for the SSD1329 (http://www.trulydisplays.com/oled/specs/IC%20SSD1329%20Spec.pdf) shows that the minimum write cycle time is 250 ns per pixel.

 

At 50 MHz (20 ns per cycle), the LM3S8962 microcontroller can perform about 12 instructions in this 250 ns window. That is plenty of time to toggle the data and clock lines. In fact, the microcontroller will be wasting cycles, since there’s no time to go off and do anything else.

 

Although the SSD1329 IC allows addressing of individual pixels, chances are that a screen update sends all 128 x 96 pixels. If so, it would take 128 x 96 x 250 ns = 3.1 ms to update the display. That falls well short of the 56 ms observed. So, what is going on?

 

Actually, the driver IC can address 128 x 128 pixels, and if you reset the development board at the right time you’ll see that scanning runs top to bottom, so all 128 x 128 of the controller IC’s pixels are being written. Now, even 3.1 ms is a long time to be away for a deterministic system. If LabVIEW/the RTOS performed a context switch after each pixel, this would add 128 x 128 x 163 [context-switch cycles] x 20 ns = 53 ms. Add to this the time required to write the pixels themselves and we get roughly 56 ms.
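
Checking that arithmetic with the figures above (20 ns per cycle, 250 ns per pixel write):

\[
128 \times 128 \times 163 \times 20\,\text{ns} \approx 53.4\,\text{ms}, \qquad 128 \times 128 \times 250\,\text{ns} \approx 4.1\,\text{ms}
\]

for a total of roughly 57 ms, in good agreement with the observed 56 ms.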

 

So, we know why we get a whopping 56 ms display update time, when nothing else is happening.

 

However, this doesn’t explain why it still takes 56 ms even when parallel loops are disabled and presumably the RTOS is not intervening.

 

Programming bare metal (LabVIEW Embedded or C) would make this problem go away, provided you run your main loop at 96 Hz or faster: once each loop, you update one column of the display. This would only consume 0.002% of the CPU. Even if the main loop ran at 10 kHz (a nice rate), the display update would only consume 2.4% of the CPU’s computing resources.
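
To make that concrete, here is a minimal bare-metal C sketch of the one-column-per-iteration idea, assuming the rit128x96x4 OLED driver that ships with the StellarisWare examples for this board. The frame buffer and loop structure are hypothetical; note the driver packs two 4-bit pixels per byte, so the sketch pushes two-pixel-wide strips (64 strips per full refresh).

```c
#include "drivers/rit128x96x4.h"

#define OLED_WIDTH  128
#define OLED_HEIGHT 96

/* Row-major frame buffer: 4 bits per pixel, two pixels per byte. */
static unsigned char g_pucFrame[(OLED_WIDTH / 2) * OLED_HEIGHT];

/* Call once per main-loop iteration: pushes one two-pixel-wide strip
   to the OLED, so a full screen refresh takes 64 iterations. */
static void DisplayColumnUpdate(void)
{
    static unsigned long ulCol = 0;      /* current column pair, 0..63 */
    unsigned char pucStrip[OLED_HEIGHT]; /* 2 px wide x 96 rows, 4 bpp */
    unsigned long ulRow;

    for (ulRow = 0; ulRow < OLED_HEIGHT; ulRow++)
    {
        pucStrip[ulRow] = g_pucFrame[ulRow * (OLED_WIDTH / 2) + ulCol];
    }

    /* X and width are given in pixels and must be even (two pixels per byte). */
    RIT128x96x4ImageDraw(pucStrip, ulCol * 2, 0, 2, OLED_HEIGHT);

    ulCol = (ulCol + 1) % (OLED_WIDTH / 2);
}

int main(void)
{
    RIT128x96x4Init(1000000); /* 1 MHz SSI clock, as in the kit examples */

    for (;;)
    {
        /* ... do the control work for this iteration ... */
        DisplayColumnUpdate();
        /* ... wait for the next loop tick ... */
    }
}
```

Because each strip write is short and the loop never blocks on a whole-screen transfer, the display no longer holds the rest of the application hostage.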

 

It’s handy to keep close to the metal.

Message 3 of 3