As multi-core machines become more commonplace, demand for parallel processing on those machines keeps growing. Many of the cutting-edge applications we tackle today come down to solving a mathematical problem, and at the heart of these problems we often find key analysis functions. One such function, matrix-vector multiplication, is vital to the control systems of the Extremely Large Telescope. These key functions are often the CPU-intensive bottleneck in meeting an application's specifications. To overcome the bottleneck, we could rely on a machine with a faster processor, but such a single-core upgrade will only help so much. We need to leverage modern architectures - multi-core machines - and figure out how to parallelize the computation. It therefore becomes very important to provide a set of high-performance analysis functions in the domains of mathematics and signal processing. The characteristics of the high-performance analysis functions include:
Creating parallel versions of computational code can be challenging, and parallelization continues to be an important topic in academia and industry. Chip manufacturers have an understandable interest in it. Companies like Intel, AMD, and NVIDIA invest a lot of time and energy in optimizing their mathematical libraries, which are then leveraged by National Instruments’ products. One of the most prominent examples is the Intel Math Kernel Library (MKL).
MKL contains a collection of highly optimized, thread-safe mathematical functions. For several years now, LabVIEW Analysis has used MKL to achieve processor-tuned performance. But for reasons explained shortly, LabVIEW uses only single-threaded MKL routines, missing the opportunities offered by the latest generations of multi-threaded MKL.
Why does LabVIEW Analysis limit itself to single-threaded MKL calls? The complication comes from MKL’s thread management, which can collide with LabVIEW’s thread model. LabVIEW is already an inherently parallel language. The LabVIEW compiler automatically breaks an application’s source code into clumps, and the LabVIEW execution system schedules data-independent clumps to run in different threads. These threads can then execute on different cores of a multi-core machine. This is true for both G code and calls to an external library such as MKL. It is therefore possible for more than one LabVIEW clump to call into MKL simultaneously. Because the MKL routines are themselves threaded, there would then be no way to know exactly how many system resources should be allocated. To avoid a potentially disastrous overuse of resources, Intel strongly recommends turning off MKL threading in nested-parallelism situations, which can arise naturally in LabVIEW applications.
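The arithmetic behind this concern can be made concrete with a small sketch. It is only an illustration of nested parallelism, not a description of how LabVIEW or MKL actually allocate threads; the function names `total_threads` and `inner_thread_budget` are hypothetical and exist only for this example.

```python
def total_threads(concurrent_callers: int, threads_per_call: int) -> int:
    """Worst-case number of runnable threads under nested parallelism.

    Each outer caller (for example, one clump) invokes a library routine
    that itself starts `threads_per_call` threads.
    """
    return concurrent_callers * threads_per_call


def inner_thread_budget(cores: int, concurrent_callers: int) -> int:
    """Threads each library call could use without exceeding the core count.

    Falls back to 1 (a single-threaded library call) when the outer callers
    alone already occupy every core. This even split is an assumption made
    for illustration, not a documented policy.
    """
    if cores < 1 or concurrent_callers < 1:
        raise ValueError("cores and concurrent_callers must be positive")
    return max(1, cores // concurrent_callers)


if __name__ == "__main__":
    cores = 8
    for callers in (1, 2, 4, 8):
        # Naive case: every library call assumes it owns the whole machine.
        naive = total_threads(callers, cores)
        budget = inner_thread_budget(cores, callers)
        print(f"{callers} callers: naive total = {naive} threads, "
              f"budget = {budget} thread(s) per call")
```

With four simultaneous callers on an 8-core machine, the naive case asks for 32 threads, four times what the hardware can run at once. Once the number of outer callers reaches the core count, the budget per call drops to one thread, which is the single-threaded configuration described above.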
Despite this possible conflict, we consider MKL a compelling technology for implementing parallel analysis functions in LabVIEW. The functionality and parallelism that MKL offers let us meet the desired behavior of parallel analysis functions. Through NI Labs, we intend to make the following MKL routines available within LabVIEW.
When we implement the parallel analysis functions, we add some special support code in order to employ multi-threaded MKL while at the same time overcoming the conflict between LabVIEW and MKL thread models. Even so, you still have to use the functions with care. We recommend the following guidelines:
Download the Quick Reference Guide for the list of functions exposed.
Several examples (link) are available for download. These examples show how to use this library to accelerate your VIs. They also show how to use the support VIs to control threading behavior.
Download the examples (link) for the benchmark VI. It benchmarks the performance of matrix-matrix multiplication with different numbers of threads. The benchmark results may vary from one computer to another. The following figure shows the results from an 8-core computer with Intel Xeon W5580 processors.
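For readers who want to see the shape of such a measurement in text form, here is a minimal sketch of a thread-count benchmark harness. It is not the benchmark VI. The `matmul` function is a hypothetical pure-Python stand-in that accepts a thread count but ignores it, so this sketch will not show a real speedup; in practice you would pass a callable that wraps a genuinely multi-threaded routine.

```python
import time
from typing import Callable, Dict, Iterable, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix, n_threads: int = 1) -> Matrix:
    """Hypothetical stand-in for a threaded matrix-matrix multiply.

    `n_threads` is accepted for interface compatibility but ignored here.
    """
    cols_b = list(zip(*b))  # columns of b, so each entry is a dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def benchmark(multiply: Callable[[Matrix, Matrix, int], Matrix],
              a: Matrix, b: Matrix,
              thread_counts: Iterable[int],
              repeats: int = 3) -> Dict[int, float]:
    """Return the best-of-`repeats` wall-clock time (seconds) per thread count."""
    timings: Dict[int, float] = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            multiply(a, b, n)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


def speedups(timings: Dict[int, float], baseline: int = 1) -> Dict[int, float]:
    """Speedup of each thread count relative to the baseline thread count."""
    base = timings[baseline]
    return {n: base / t for n, t in timings.items()}


if __name__ == "__main__":
    size = 40  # kept small so the pure-Python stand-in finishes quickly
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    results = benchmark(matmul, a, b, thread_counts=(1, 2, 4, 8))
    for n, s in speedups(results).items():
        print(f"{n} thread(s): {results[n]:.6f} s, speedup x{s:.2f}")
```

Taking the best of several repeats reduces the influence of one-off scheduling noise, and reporting speedup relative to the single-threaded run makes results from different machines easier to compare.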