
LabVIEW Embedded for ARM performance

My work colleague and I have been doing some performance measurements for the two current LabVIEW Embedded for ARM Tier 1 boards.

 

A detailed report of the computational testing is attached (LabVIEW Embedded – PerformanceTesting.pdf). The other tests were done using the simplest VI that exercises the listed function.

 

The results for the LM3S8962 Evaluation Board are summarised below. Computational performance results for the MCB2300 Evaluation Board can be found in the attachment.

 

| Test           | Single Loop | Multiple Loops | Notes                                  |
|----------------|-------------|----------------|----------------------------------------|
| Computation    | 50 MIPS     | 10 MIPS        | Effective MIPS                         |
| UDP Read       | 4,770 Hz*   | 4,021 Hz       | Maximum rate, tested over 10,000 loops |
| UDP Write      | 13,344 Hz   | 14,371 Hz      | Maximum rate, tested over 10,000 loops |
| TCP Read       | 515 Hz      | 516 Hz         | Maximum rate, tested over 10,000 loops |
| TCP Write      | 58 Hz       | 60 Hz          | Maximum rate, tested over 10,000 loops |
| Display update | 18 Hz       | 18 Hz          | Maximum rate                           |

 

 

* Note: the quoted UDP Read rate requires a LabVIEW Embedded for ARM bug to be corrected. See http://forums.ni.com/t5/LabVIEW/LV-for-ARM-UDP-read-55-ms-wait/td-p/1812354/highlight/true/page/2 for details. With the bug present, the maximum Single Loop UDP Read rate is 1,904 Hz.
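
For anyone wanting to reproduce the Computation row on another target, an "effective MIPS" figure can be approximated by timing a tight integer loop and dividing an estimated instruction count by the elapsed time. A minimal C sketch follows; it is illustrative only (not the code behind the attached report), and it uses the hosted clock() for brevity, where the EVB tests would read a hardware timer instead.

```c
#include <stdint.h>
#include <time.h>

#define OPS_PER_PASS 4u       /* rough instruction count per iteration: */
#define PASSES       1000000u /* add, xor, shift, plus loop bookkeeping */

static double effective_mips(void)
{
    volatile uint32_t acc = 1; /* volatile keeps the loop from being optimised away */
    clock_t t0 = clock();

    for (uint32_t i = 0; i < PASSES; i++)
    {
        acc += i;
        acc ^= 0x5A5A5A5Au;
        acc >>= 1;
    }

    double seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return ((double)PASSES * OPS_PER_PASS) / (seconds * 1e6);
}
```

Running several such loops in parallel, as in the Multiple Loops column, is what exposes the scheduler overhead.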

 

Conclusions

 

1) The LabVIEW/RTOS overhead causes 80% of the CPU's performance to be lost, leaving only 10 MIPS for the application.

2) Performance is inadequate for anything but simple applications or loop rates no higher than 100 Hz.

3) A faster microcontroller would probably give disproportionately more CPU performance, assuming a fairly fixed overhead (a 100 MHz microcontroller may leave 50 MIPS for the application); see the worked figures below.

4) If you don’t require the TCP features for data communication, UDP is much quicker.
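
For what it's worth, here is a back-of-the-envelope model behind conclusions 1 and 3, assuming roughly one instruction per clock on the Cortex-M3 and a fixed LabVIEW/RTOS overhead inferred from the table:

\[
\text{MIPS}_{\text{app}} \approx f_{\text{clk}} \times 1\,\tfrac{\text{MIPS}}{\text{MHz}} - \text{MIPS}_{\text{overhead}}
\]
\[
\text{overhead} \approx 50 - 10 = 40\ \text{MIPS} \;\Rightarrow\; \text{MIPS}_{\text{app}}(100\ \text{MHz}) \approx 100 - 40 = 60\ \text{MIPS}
\]

The 50 MIPS figure in conclusion 3 is deliberately more conservative, allowing the overhead itself to grow somewhat with clock rate.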

Message 1 of 3

Congratulations on these tests; this was just the reference I needed to learn what I could still improve in my code!

I also use the LM3S8962 and have noticed the overhead, especially with the display. My application does not use Ethernet, so I can't comment on that part.

To achieve a loop time of about 100 µs for my controller loop (a state-space controller), I ran several speed tests, replacing the routines with larger delays with equivalents that have smaller delays; I did not use parallel execution and used only integer variables. I also followed several tips from NI articles. Another tip is to use inline C routines for specific, time-critical tasks; I did this for reading two ADC channels and got a significant improvement in speed. However, I expected more speed from the simple integer arithmetic blocks.
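
For reference, here is a sketch of the kind of inline C ADC routine described above, assuming the StellarisWare DriverLib API for the LM3S8962; the sample sequence number and channel choices are illustrative, not the actual code.

```c
#include "inc/hw_types.h"
#include "inc/hw_memmap.h"
#include "driverlib/sysctl.h"
#include "driverlib/adc.h"

/* One-time setup: sample sequence 2, processor-triggered, reading two
   channels (CH0 and CH1) back to back. */
void AdcSetup(void)
{
    SysCtlPeripheralEnable(SYSCTL_PERIPH_ADC0);
    ADCSequenceConfigure(ADC0_BASE, 2, ADC_TRIGGER_PROCESSOR, 0);
    ADCSequenceStepConfigure(ADC0_BASE, 2, 0, ADC_CTL_CH0);
    ADCSequenceStepConfigure(ADC0_BASE, 2, 1,
                             ADC_CTL_CH1 | ADC_CTL_IE | ADC_CTL_END);
    ADCSequenceEnable(ADC0_BASE, 2);
}

/* Inline-C style read: trigger the sequence, busy-wait for completion,
   then fetch both samples. Returns the number of samples copied. */
long AdcReadPair(unsigned long pulData[2])
{
    ADCProcessorTrigger(ADC0_BASE, 2);
    while (!ADCIntStatus(ADC0_BASE, 2, false))
    {
        /* spin until the sequence completes */
    }
    ADCIntClear(ADC0_BASE, 2);
    return ADCSequenceDataGet(ADC0_BASE, 2, pulData);
}
```

Bypassing the LabVIEW block layer like this avoids the per-call scheduling overhead, which is presumably where the speed-up comes from.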

In the case of the display, I haven't found a solution. I verified that the routine's loop time is 56 ms (regardless of the image) when I enable the update, but that is too long for my loop. I think it would be possible to reduce this time in the display's C block code, e.g. by using an interrupt, but there is a lot of embedded C code in the ARM blocks that we cannot access and modify directly.

Best regards!

 

Evandro Rech

Message 2 of 3

Yes, the display update time of 56 ms is massive. It’s interesting to examine what’s going on.

 

The Stellaris LM3S8962 Evaluation Board Manual (http://www.ti.com/lit/ug/spmu032b/spmu032b.pdf) states “The OLED display has a built-in controller IC with synchronous serial and parallel interfaces. Synchronous serial (SSI) is used on the EVB as it requires fewer microcontroller pins. Data cannot be read from the OLED controller; only one data line is necessary.” So communication with the display is one bit at a time.

 

The OLED display is a “RiT P14201 series display”. From http://www.ritdisplay.com/in_English/Product_Technology/Product_Technology.htm we learn that the controller IC is an SSD1329. The data sheet for the SSD1329 (http://www.trulydisplays.com/oled/specs/IC%20SSD1329%20Spec.pdf) shows that the minimum write cycle time is 250 ns per pixel.

 

At 50 MHz (20 ns per cycle), the LM3S8962 microcontroller can perform about 12 instructions in this 250 ns window. That is plenty of time to toggle the data and clock lines. In fact, the microcontroller will be wasting cycles, since there’s no time to go off and do anything else.

 

Although the SSD1329 IC allows addressing of individual pixels, chances are that a screen update sends all 128 x 96 pixels. If so, it would take 128 x 96 x 250 ns = 3.1 ms to update the display. That falls well short of the 56 ms observed. So, what is going on?

 

Actually, the driver IC can address 128 x 128 pixels, and if you reset the development board at the right time you’ll see that scanning runs top to bottom, so all 128 x 128 of the controller IC’s pixels are being written. Now, even 3.1 ms is a long time to be away for a deterministic system. If LabVIEW/the RTOS performed a context switch after each pixel, this would add 128 x 128 x 163 [context-switch cycles] x 20 ns = 53 ms. Add to this the time required to write the pixels themselves and we get roughly 56 ms.
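
Checking that arithmetic with the figures above (20 ns per cycle, 250 ns per pixel write):

\[
128 \times 128 \times 163 \times 20\,\text{ns} \approx 53.4\,\text{ms}, \qquad 128 \times 128 \times 250\,\text{ns} \approx 4.1\,\text{ms}
\]

for a total of roughly 57 ms, in good agreement with the observed 56 ms.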

 

So, we know why we get a whopping 56 ms display update time, when nothing else is happening.

 

However, this doesn’t explain why it still takes 56 ms even when parallel loops are disabled and presumably the RTOS is not intervening.

 

Programming bare metal (LabVIEW Embedded or C) would make this problem go away, provided you run your main loop at 96 Hz or faster: once each loop, you update one column of the display. This would only consume 0.002% of the CPU. Even if the main loop ran at 10 kHz (a nice rate), the display update would only consume 2.4% of the CPU’s computing resources.
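
To make that concrete, here is a minimal bare-metal C sketch of the one-column-per-iteration idea, assuming the rit128x96x4 OLED driver that ships with the StellarisWare examples for this board. The frame buffer and loop structure are hypothetical; note the driver packs two 4-bit pixels per byte, so the sketch pushes two-pixel-wide strips (64 strips per full refresh).

```c
#include "drivers/rit128x96x4.h"

#define OLED_WIDTH  128
#define OLED_HEIGHT 96

/* Row-major frame buffer: 4 bits per pixel, two pixels per byte. */
static unsigned char g_pucFrame[(OLED_WIDTH / 2) * OLED_HEIGHT];

/* Call once per main-loop iteration: pushes one two-pixel-wide strip
   to the OLED, so a full screen refresh takes 64 iterations. */
static void DisplayColumnUpdate(void)
{
    static unsigned long ulCol = 0;      /* current column pair, 0..63 */
    unsigned char pucStrip[OLED_HEIGHT]; /* 2 px wide x 96 rows, 4 bpp */
    unsigned long ulRow;

    for (ulRow = 0; ulRow < OLED_HEIGHT; ulRow++)
    {
        pucStrip[ulRow] = g_pucFrame[ulRow * (OLED_WIDTH / 2) + ulCol];
    }

    /* X and width are given in pixels and must be even (two pixels per byte). */
    RIT128x96x4ImageDraw(pucStrip, ulCol * 2, 0, 2, OLED_HEIGHT);

    ulCol = (ulCol + 1) % (OLED_WIDTH / 2);
}

int main(void)
{
    RIT128x96x4Init(1000000); /* 1 MHz SSI clock, as in the kit examples */

    for (;;)
    {
        /* ... do the control work for this iteration ... */
        DisplayColumnUpdate();
        /* ... wait for the next loop tick ... */
    }
}
```

Because each strip write is short and the loop never blocks on a whole-screen transfer, the display no longer holds the rest of the application hostage.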

 

It’s handy to keep close to the metal.

Message 3 of 3