02-10-2012 02:31 AM
1) I think you can do it in one week and have a "basic" target by then. I spent most of my time getting the libraries to work (FlashFS and TCPNet). Just look at the examples (the three Tier 1 targets). LabVIEW and the FlashFS and TCPNet libraries expect certain functions and initialization. And as I said before: once you've done it, you can make the next targets in two days.
2) For an LPC1758 or LPC1788 I must check that, but I've seen your code and the overhead you are seeing is only the LabVIEW overhead. Try a timed loop (which is basically an RTX task in C code) and disable parallel execution. LabVIEW will then not do its own parallel-execution scheduling; RTX will do it. You also changed the build specification (disable parallel execution, use stack variables, etc.), so the difference you are seeing is not only due to going from one loop to multiple loops.
02-10-2012 02:48 AM
Programming LV for ARM is different from normal LabVIEW; keep that in mind.
It is an ARM processor, so there are dos and don'ts, especially with LV for ARM, because LabVIEW is generating the code!
Check the code that LabVIEW generates.
For example: never use Elemental I/O (huge overhead).
Just make a VI with a C node that sets/reads the GPIO.
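To illustrate the C-node suggestion, here is a minimal sketch of pin set/read helpers. Everything in it is an assumption for illustration: the names gpio_out_reg, gpio_in_reg, gpio_write_pin and gpio_read_pin are hypothetical, and the two "registers" are plain variables so the logic can be tried on a PC. On a real target they would be replaced by the port registers from the device header, and only the body of each helper would go into the C node.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a memory-mapped GPIO port. On real hardware
   these would be the port registers from the device header, not variables. */
static volatile uint32_t gpio_out_reg = 0u; /* assumed output latch */
static volatile uint32_t gpio_in_reg  = 0u; /* assumed pin-state register */

/* Drive one pin high (level != 0) or low (level == 0). */
static void gpio_write_pin(uint32_t pin, int level)
{
    assert(pin < 32u); /* a 32-bit port has pins 0..31 */
    if (level) {
        gpio_out_reg |= (1u << pin);
    } else {
        gpio_out_reg &= ~(1u << pin);
    }
}

/* Return 1 if the pin reads high, 0 otherwise. */
static int gpio_read_pin(uint32_t pin)
{
    assert(pin < 32u);
    return (int)((gpio_in_reg >> pin) & 1u);
}
```

The point of the sketch is only that a pin access can be a couple of register operations, which is what makes a C node attractive compared with a heavier generic I/O layer.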
02-10-2012 05:01 AM
1) When you say it's possible to port to a new development board in one week, does that include the peripherals such as Ethernet, I2C, CAN, etc? If this is the case then I should stop thinking and just do it!
2) When you say, "look at the examples (the three TIER 1 targets)", where would I find the examples?
3) Are you saying that if I use timed loops and have "Disable parallel execution" on, I will get parallel execution, courtesy of RTX?
4) When I did my timing test, I only did the minimum to enable parallel execution. I only set "Disable parallel execution" off. The compiler options then forced me to set "Use stack variables", "Enable expression folding" and "Generate C function calls" off. There is no choice here: it won't compile unless you also make the other changes. In summary, I made the minimum changes to allow parallel execution.
5) It would be great to see what performance degradation you get on your faster boards between code in one loop with all the fastest compiler options and the same code split into two loops with the fastest compiler options that still let it run in parallel. You'll either get only a halving in speed (great!) or a five-times reduction (bad news for me).
6) I'm surprised that Elemental I/O contributes to poor performance. Thanks for the tip.
Some of the things I think you are saying are promising and may lead me to use LabVIEW Embedded for ARM. But I'm not sure if I understand correctly. If you could do a speed check it would be great. If you want to put together your own speed test code I can try it on the Tier 1 boards.
02-10-2012 05:36 AM
1) Yes, the only things that take longer are Ethernet and FlashFS, but if your microcontroller has some "brother" micros which have examples, it goes much faster! Search the Keil example directory (keil uvision\ARM\Boards\Keil\MCB1700\RL); there you can find some examples with drivers for FlashFS and TCPNet. Also look at (Keil uvision\ARM\RL\TCPNet\Drivers and RL\FlashFS\Drivers).
I had trouble with FlashFS because the only micro that was already supported used the SPI interface for the SD card. My LPC1788 has an MCI driver.
Find a brother MCU in the same family (like Cortex-M3) and just look at the code.
Then you can find out for yourself what is needed to make your target run.
2) The examples are those board examples and the Tier 1 targets that already exist (LabVIEW\Targets\Keil\Embedded\RealVIEW\Generic). Here you find LM3Sxxxx, LPC2378, LPC2468 (and in my case also LPC1758 and LPC1788).
3) If you use timed loops you will get parallelism because LabVIEW creates an RTX task for that loop. Disable parallel execution in LV and LV won't have to do it for you anymore; the RTX kernel will do this! By default LV has one RTX task for the main (LVembeddedmain.c, with a stack of 2048; I have made this definable), and if you use TCP it will use two extra tasks (one for TCP handling and one for the TCP timer). See the LV driver RLARM_TCPWrapper.c (tcp_timer_task and tcp_task).
4) You can also do the first (reference) test with "Disable parallel execution" off. What you did is also a good test, but a little unfair, because you had parallel execution disabled in your reference. (The difference in your test results is due to the multiple loops AND the build specification.)
5) I will have to do this.
6) Have you never seen the code that LV generates? It is LabVIEW, so it checks whether all the requirements are met (LV scheduling etc.).
You will hear from me.
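For point 4, one way to separate the two effects is to time three configurations instead of two: one loop with the fastest options, one loop with the build specification that parallel execution requires, and two loops with that same build specification. The sketch below just does the arithmetic on such measurements. The struct and function names are hypothetical, and the timings are whatever you measure on your own board; nothing here comes from the attached tests.

```c
#include <assert.h>

/* Hypothetical container for three measured run times (any consistent unit)
   of the same workload. */
typedef struct {
    double one_loop_fast;      /* one loop, parallel execution disabled, fastest options */
    double one_loop_parallel;  /* one loop, build spec changed to allow parallel execution */
    double two_loops_parallel; /* two loops, same build spec as the line above */
} lv_timings;

/* Slowdown caused by the build specification change alone. */
static double build_spec_factor(const lv_timings *t)
{
    assert(t->one_loop_fast > 0.0);
    return t->one_loop_parallel / t->one_loop_fast;
}

/* Slowdown caused by splitting the work into two loops alone. */
static double loop_split_factor(const lv_timings *t)
{
    assert(t->one_loop_parallel > 0.0);
    return t->two_loops_parallel / t->one_loop_parallel;
}

/* Overall slowdown; equals the product of the two factors above. */
static double total_factor(const lv_timings *t)
{
    assert(t->one_loop_fast > 0.0);
    return t->two_loops_parallel / t->one_loop_fast;
}
```

If build_spec_factor turns out to be close to total_factor, most of the observed slowdown comes from the compiler options rather than from the extra loop.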
02-10-2012 03:15 PM
>> 3) if you use timed loops you will get parallelism because it creates a RTX task for that loop. disable parallel execution in LV and LV won't have to do it anymore for you,
What about TCP/IP then? How can we put TCP/IP communication code into a timed loop, since TCP/IP is not really a deterministic process?
02-13-2012 08:26 PM
>> 3) if you use timed loops you will get parallelism because it creates a RTX task for that loop. disable parallel execution in LV and LV won't have to do it anymore for you, the RTX kernel will do this!
In response to your suggestion to try timed loops, I have done extensive testing with LabVIEW Embedded for ARM (LVEA) targeting the MCB2300 dev board, and I am only able to get worse performance using them (in comparison to normal while loops).
This investigation began because I was unhappy with the results obtained when I split the work inside a normal while loop into two while loops: I got an 8-10 times speed decrease (presumably due to the overhead from the RTOS context switching between loops). This is unacceptably high. I understood your suggestion to mean that timed loops would overcome this problem, but in fact it got worse!
Attached are the VI snippets so you can repeat the testing yourself. I have to attach the 4th snippet in the next post since I can only attach 3 files per post. Timing results are in the comments.
02-14-2012 07:41 AM
I have run some tests and I can confirm your conclusion (in LV for ARM 2010).
I was using the timed loops in a big application that I was testing.
The timed loops there are endless; they are three tasks with different execution periods.
I am also using the disable parallel execution and C function calls options there.
I don't know why I can't use them here in my test apps.
In my opinion an RTX task should be more efficient than a LabVIEW-generated task switch.
With the timed loops, my code ran perfectly at the given periods.
I will look at why we can't use the wanted compiler options with these tests (timed loops).
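To make the "three tasks with different periods" setup concrete, here is a small host-side sketch of the release pattern such loops would follow. It is only a simulation under stated assumptions: the timed_loop struct and simulate_ticks function are hypothetical, there is no RTX kernel involved, and it says nothing about the overhead LabVIEW adds around each loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one timed loop: a period and a run counter. */
typedef struct {
    uint32_t period_ticks; /* loop body is released every period_ticks ticks */
    uint32_t run_count;    /* how many times the body has been released */
} timed_loop;

/* Step through `ticks` kernel ticks and release every loop whose period
   divides the current tick number. */
static void simulate_ticks(timed_loop *loops, uint32_t n, uint32_t ticks)
{
    for (uint32_t tick = 1u; tick <= ticks; ++tick) {
        for (uint32_t i = 0u; i < n; ++i) {
            assert(loops[i].period_ticks > 0u);
            if (tick % loops[i].period_ticks == 0u) {
                loops[i].run_count++; /* the loop body would run here */
            }
        }
    }
}
```

On the target, the kernel would block each task until its next period instead of polling a tick counter like this; the sketch only shows how often each loop is expected to run.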
02-14-2012 09:28 AM
In my huge ARM application I have disabled parallel execution and I am using C function names.
When I select "use stack variables" and "enable expression folding", LabVIEW crashes during the build...
An LV for ARM application is one RTX task (main_task).
With timed loops you can create additional RTX tasks.
Any scheduling in an LV application is done by LabVIEW code (not RTX!).
So in the tests with the two for loops, the scheduling is done by LabVIEW.
And it's logical that the code (with multiple for loops) gets slower: you're creating more code, and LabVIEW must schedule it.
So if you want to speed up your parallel code (using disable parallel execution),
you want to use timed loops; otherwise your code won't be parallel anymore.