06-17-2021 03:32 AM
Looking for some ideas, solutions or a bit of break through genius.
I have an application which is acquiring data quite fast (180Mbps).
I'm trying to process the data as fast as possible, but unfortunately, due to huge data chunk sizes and heavy data operations (FFTs and various matrix manipulations on arrays of 20k x 5k), there is no way I can process this data at the same rate it arrives.
I need to run n different (configurable) processes on the data and overlay it.
The results are then displayed and logged to files.
Acquisition, processing, display and logging are all separate functions that happen in parallel.
I have a fairly mature (inherited but evolving) code structure with circa 30 core module functions and multiple UI displays (only 1 of which is the 'main' processing block). The requirements spec is nasty.
The processing block was originally run in series; it has now been set up to run in parallel with reentrant VIs, but I'm getting performance complaints when the configurable acquisition rate, matrix sizes and number of processes are set high.
I've looked at Windows Processor affinity but seem to be having an issue with setting the mask on PCs with high numbers of logical processors.
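As background on why the mask may fail on machines with many logical processors: Windows groups logical processors into processor groups of at most 64, and a plain affinity mask (as used by SetProcessAffinityMask) only addresses one group. A minimal illustrative sketch in Python (not the actual LabVIEW code; the helper name is hypothetical):

```python
def affinity_mask(cores):
    """Build a Windows-style affinity bitmask from logical processor indices.

    Raises ValueError for indices outside a single 64-bit processor group,
    mirroring why a plain SetProcessAffinityMask call cannot address
    processors 64 and above on high-core-count machines.
    """
    mask = 0
    for c in cores:
        if not 0 <= c < 64:
            raise ValueError(
                f"core {c} is outside the 0-63 range of one processor "
                "group; processor-group APIs would be needed instead"
            )
        mask |= 1 << c
    return mask

print(hex(affinity_mask([0, 1, 4])))  # cores 0, 1 and 4 -> 0x13
```

On systems with more than 64 logical processors, per-group APIs such as SetThreadGroupAffinity are the usual workaround.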
I've considered putting the processing block into a timed sequence and forcing it onto a specific processor (but then the sequence fails to execute if I set 2 processing blocks to the same processor at the same time).
Speed of results is not so much an issue as preventing CPU overload, but I would like to get the maximum speed possible from parallel processing. (The program is just being pushed beyond its design spec.)
Any ideas please?
06-17-2021 03:39 AM
Hi James,
@James_W wrote:
Speed of results is not so much an issue as preventing CPU overload,
You might check CPU usage when your program is running.
When usage rises above a certain limit you could add small wait states in your processing call chain…
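The idea above can be sketched as follows (in Python for brevity rather than LabVIEW; `cpu_usage` is a stand-in for whatever measurement you end up using, and the limit/wait values are placeholders):

```python
import time

def throttled_step(process_chunk, cpu_usage, limit=85.0, wait_s=0.01):
    """Run one processing step, inserting small waits while CPU load
    is above the limit, so the processing chain backs off instead of
    saturating the machine."""
    while cpu_usage() > limit:
        time.sleep(wait_s)  # small wait state until load drops
    return process_chunk()
```

The same pattern maps directly onto a LabVIEW call chain: poll the usage value, and conditionally insert a small Wait (ms) before each heavy step.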
06-17-2021 03:43 AM - edited 06-17-2021 03:51 AM
Hi Gerd,
I've tried adding small waits. How would you propose checking CPU usage?
edit:
ahh found the link:
Solved: How to check CPU usage and Paging using LabVIEW - NI Community
also found which may be worth investigation too:
High CPU usage of Labview executables when using remote desktop - NI Community
Cheers
James
06-17-2021 06:20 AM
update:
The CPU usage measurement function was from 2010 and is not available in 2015 or above (it is .NET), so that's a dead end.
Monitor resolution: not the main issue, although it is causing a small one.
06-17-2021 06:43 AM
Trying to test with timed loops. I have 12 cores. As soon as I turn on For Loop parallelism, it runs forever and I don't see some of the dialogs.
06-17-2021 07:03 AM - edited 06-17-2021 07:04 AM
Hi James,
@James_W wrote:
CPU usage measurement function was 2010 and is not available in 2015 or above (is .NET) - so deadend.
Use PerformanceCounters:
The example is made for a German Windows installation and also runs on Win10 (in contrast to the comment). The missing subVI is a running average from my user.lib; you can replace/delete it…
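One caveat with PerformanceCounters: the category and counter names are localized, so a counter opened with German names will not resolve on an English Windows. A hedged sketch of a small lookup table translating between locales before building the counter path (the German strings here are assumptions based on typical German Windows localization, not verified values):

```python
# (category, counter, instance) per locale; German names are assumed.
COUNTER_NAMES = {
    "de": ("Prozessor", "Prozessorzeit (%)", "_Total"),
    "en": ("Processor", "% Processor Time", "_Total"),
}

def counter_path(locale):
    """Return a PerfMon-style counter path for the total CPU usage."""
    category, counter, instance = COUNTER_NAMES[locale]
    return f"\\{category}({instance})\\{counter}"

print(counter_path("en"))  # \Processor(_Total)\% Processor Time
```

Windows also exposes locale-neutral counter indices (PdhLookupPerfNameByIndex in pdh.dll), which avoids hard-coding localized names altogether.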
06-17-2021 07:19 AM
Hi James,
What happens when you run into these limits? I guess maybe you don't slow down the acquisition; is it finite in duration or do you have a growing buffer of data to be processed?
I'd guess the answer(s) to these types of questions govern what you want to do about it - should some parts of the processing be (anti-?)preferentially completed, is it a pipeline or many different steps with the same source data, etc.?
Is slowing down UI updating in order to more effectively process the data and log to disk allowed, or is the responsiveness of the (all the) display(s) critical?
Is acquiring more/different hardware an option? Perhaps you'd have more luck if you could offload some of the processing, especially if it happens to involve early and plausibly simple decimation of some kind (you mentioned FFTs, but I suspect you probably want both the original data and the FFT, so that might not qualify), in which case an FPGA might help? But just guessing.
06-17-2021 08:12 AM - edited 06-17-2021 08:14 AM
Thanks Gerd,
@cbuthcher, The acquisition is a data stream from another machine that once started needs to be maintained to prevent the TCP buffer on the other machine overflowing. Between processing data "chunks" it is acceptable to chuck away streamed data locally whilst the UI updates, logging takes place and processing occurs etc.
Buffer grows and shrinks in a queue and can hold up to 5mins of data at 180Mbps (60GB) before acquisition is stopped to prevent a Windows crash from paging overflow.
Data is dealt with in 1 sec parcels locally, but each "chunk" could be up to 5 mins of data - although it's typically 15-30 secs.
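The buffering scheme described above can be sketched with a bounded queue that drops parcels when the consumer falls behind, rather than letting the buffer grow until paging kills the machine (Python for illustration only; the capacity is a toy stand-in for the real 5-minute/60 GB limit):

```python
import queue

MAX_PARCELS = 300  # e.g. 5 minutes of 1-second parcels

def enqueue_parcel(buf, parcel):
    """Store a parcel in the bounded buffer; if the buffer is full,
    drop it locally (return False) instead of blocking acquisition."""
    try:
        buf.put_nowait(parcel)
        return True
    except queue.Full:
        return False

buf = queue.Queue(maxsize=MAX_PARCELS)
```

In LabVIEW terms this corresponds to a lossy enqueue on a size-limited queue: the producer never blocks, so the TCP stream keeps draining while the processing, UI and logging loops consume at their own pace.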
The UI must be responsive whilst processing is occurring, but data is sent to the UI as it is processed (in chunks) per process number.
UI is only updated every data set so not that often anyway.
No other H/W is currently an option, and as in some cases we want large matrices and in some we want small ones, using a GPU/FPGA for the matrix operations is not necessarily an advantage due to the GPU load/unload time.
Time restrictions rule out H/W changes too for the moment.
06-17-2021 09:06 AM
@GerdW wrote:
Hi James,
@James_W wrote:
CPU usage measurement function was 2010 and is not available in 2015 or above (is .NET) - so deadend.
Use PerformanceCounters:
The example is made for a German Windows installation and also runs on Win10 (in contrast to the comment). The missing subVI is a running average from my user.lib, you can replace/delete it…
I'm obviously being dozy. Can't seem to get this working on English Windows.
Can't seem to find the documentation for it but I have looked.