06-26-2014 06:03 AM
I'm trying to compile a larger shared library for the NI Linux RT system.
This shared library uses in a few places some of the atomic intrinsics __sync_........ that have been available in GCC since about 4.1. The code compiles fine and shows no warnings whatsoever in the Eclipse compile toolchain. However, when I try to deploy a VI to the target system that makes use of the shared library, the deployment hangs after downloading the VI, and when I cancel the deployment I get an error message that the import __sync_bool_compare_and_swap_4 could not be resolved. This seems to happen with any of the __sync intrinsics.
The GCC documentation says as much: these intrinsics may not be available on all targets (not nice but ok), and in that case the compiler issues a warning (doesn't happen here!) and inserts a call to the __sync_<intrinsic name>_n symbol, with n being the number of bytes of the operand (this does seem to happen here!).
However: compiling an executable that makes use of these intrinsics, deploying it to the target, and running it does seem to work fine. I use the exact same compiler and linker settings for this test executable and the shared library.
In the attachment is the test executable that I used to test if the intrinsics are even available at all.
Does anybody have any idea why GCC might force these intrinsics to be external symbols in a shared library while it seems to work for a simple executable? Or is the LabVIEW load table messing with the ld loader in some way such that it cannot resolve these external symbols for my shared library? Or do I need to specify an extra explicit library for the linker command?
06-26-2014 08:54 AM
rolfk,
That is a hairy problem. Odd that it works fine when building a simple binary that uses the same intrinsics. If possible, it would be interesting to build a binary that dlopen()'s/dlsym()'s the same library that you're using, to see whether the behavior matches the binary that calls the intrinsics directly or matches LabVIEW. It would also let you debug the issue more easily than trying to attach to the lvrt binary and set a pending breakpoint in the library code (assuming it loads at all, since it seemingly has issues resolving some symbols).
Another avenue for getting more information would be to look at the lvrt logs on the target (/etc/natinst/share/log/ iirc) and/or to enable console out and hook up a serial console to the target (some information may be printed that gives some clues).
06-26-2014 04:49 PM
Some nice ideas, but loading the shared library in a small test program of course fails, since the shared library also links against LabVIEW manager functions!
So I did a few more tests:
Looking at the assembly of the executable, it seems most of the __sync_....() calls get removed, even though I compile without any optimizations. This is in principle possible, since no memory fence is required and no multithreading is present in a simple executable with just a main function. So gcc simply seems too smart here, even when compiling with debug settings.
Looking with objdump -T at the dynamic symbol table of my shared library, I can clearly see import references for __sync_fetch_and_and_4 and __sync_fetch_and_or_4. These are the actual functions I need to access and modify bitflags concurrently. I have an alternative implementation that builds these functions on a compare-exchange operation, resulting in the reference to __sync_bool_compare_and_swap, but that also gets externalized by arm-gcc.
So I'm now trying to find an alternative solution, but that seems hard. ARMv5TE doesn't really seem to allow for lockless, concurrency-safe load and store operations. It doesn't seem to provide memory fence support either, which is normally not a problem since such systems seldom have multiple CPUs. But the cRIO-9068 seems to have dual cores, so some form of memory fence seems necessary, or does NI Linux RT impose some limitations here such that it doesn't really require SMP-safe memory fences? I suppose the cores implement some form of caching of their own?
The assembly code I can find is either fully concurrency safe but for ARMv6, or, if for ARMv5, it either has a small window where concurrency safety can fail or uses more or less global spinlocks (a terrible thought for something that should run on an RT system).
Another solution is to protect the resource inside my own code with a mutex specifically for that resource (a 32-bit flag variable per device state). But since we have 2 CPUs, I might still need something extra to guarantee that two threads executing on different cores don't get inconsistent information due to memory caching.
It would seem to me that the LabVIEW realtime engine also needs this in some form or other, or does it really protect every resource that can be accessed concurrently with a mutex, and how does it deal with memory cache coherency there?
06-26-2014 05:26 PM
Since the machine is an ARMv7 ISA machine and has the SCU active, to ensure low-level atomic accesses, aim at ARMv7 assembly. Look at LDREX/STREX implementations of this sort of functionality. While the tools and libc are targeted at ARMv5, the compiler supports targeting ARMv7 (obviously), and with appropriate mcpu/mtune flags you will get ARMv7 code.
06-26-2014 06:25 PM
To add to Brad's answer, I know I have successfully used these on a 9068 with GCC 4.4:
__sync_add_and_fetch
__sync_lock_test_and_set
__sync_or_and_fetch
__sync_and_and_fetch
with 32-bit variables (resolving to the "_4" suffix). The 64-bit/_8 ones are not available. The key might be adding the flags properly, as Brad mentioned. Looking through the CPU flags I pass for my code, I see
-march=armv7-a
06-26-2014 07:37 PM
Thanks! With -march=armv7-a I can indeed use the LDREX/STREX opcodes in inline assembly. Tomorrow I will try going back to just using the __sync_ functions, as I would prefer not to have to tinker with assembly if possible.
My confusion started when looking at the predefined defines after things didn't work. The __ARM_ARCH_5TE__ in there made me assume that the Zynq chip would only support the ARMv5 architecture. I didn't realize that this might just be a default of the toolchain, with no configuration support for changing it.
I have to say that the ARM version nomenclature isn't exactly transparent. There is a hardware architecture version, a software version and who knows what else, and they mix and match with each other in rather confusing ways, with similar naming but different meanings.
06-27-2014 10:35 AM
rolfk wrote:
I have to say that the ARM version nomenclature isn't exactly transparent. There is a hardware architecture version, a software version and who knows what else, and they mix and match with each other in rather confusing ways, with similar naming but different meanings.
You're not the only one confused!
This is a pretty good guide I found (aimed at consumer devices, but still relevant here): http://www.overclock.net/t/896724/arm-naming-guide-watch-as-i-classify-your-device
07-01-2014 11:09 AM
Well, I tried various things. While there seems to be no problem using the GCC intrinsics in an executable, I can't seem to make it work in a shared library. Even with -march=armv7-a it still keeps including references to the external functions, which can't be resolved on loading of the shared library.
I have now resorted to implementing a compareAndExchange() inline function using inline assembly and simulating the other boolean functions with it. That seems to work fine and was also necessary for the VxWorks and Pharlap versions, but of course with different assembly code there.
07-02-2014 10:13 AM
One difference that I noted comparing the resulting .so file from the 4.4.1 toolchain and a locally-built 4.7.3 toolchain is that the symbols for the atomic builtins have different visibility (under the 4.4.1 toolchain, they're listed as .hidden in the .symtab symbol table; for 4.7.3 they have normal visibility in .symtab).
One additional thing to note is that the 4.4 GCC release is where these builtins were introduced, and at least for the first release they wanted to gate their use a bit, asking that you #include a header to use the functionality. I have not tested the resulting binary from my 4.7 toolchain from a LabVIEW CLFN, but it seems that there have been some changes in the way that gcc handles those builtins in later tools. Note that, within reason, you can use a different toolchain than the system library on the machine you plan to run the binary on (with too new a toolchain it will pick a symbol that is not supported or is versioned incorrectly for the CRT on the target), but it's something worth trying.
From readelf on the resulting binaries:
gcc -v == gcc version 4.7.3 (Brad's so-fresh local build)
$ readelf -a libtest.so.1.0 | grep __sync_bool
188: 0000137c 48 FUNC LOCAL DEFAULT 10 __sync_bool_compare_and_s
215: 0000134c 48 FUNC LOCAL DEFAULT 10 __sync_bool_compare_and_s
216: 00001320 44 FUNC LOCAL DEFAULT 10 __sync_bool_compare_and_s
gcc -v == gcc version 4.4.1 (Sourcery G++ Lite 2010q1-202)
$ readelf -a libtest.so.1.0 | grep __sync_bool_compare
0x12ec <__sync_bool_compare_and_swap_4>: @0x14d4
0x1318 <__sync_bool_compare_and_swap_2>: 0x80a8b0b0
182: 00001318 32 FUNC LOCAL HIDDEN 10 __sync_bool_compare_and_s
198: 00001338 28 FUNC LOCAL HIDDEN 10 __sync_bool_compare_and_s
208: 000012ec 44 FUNC LOCAL HIDDEN 10 __sync_bool_compare_and_s
The additional entries at the start deal with stack unwinding.
07-02-2014 03:43 PM
What do you mean by including a header? I simply defined my own cross-toolchain functions to resolve to the __sync functions, and that all compiles and links fine with the Sourcery 4.4.1 toolchain. For an executable it also executes fine on the cRIO-9068 system, but trying to load a .so into LabVIEW (or with dlopen()) gives missing symbols for these intrinsics.