Machine Vision


Benchmarking Vision Application on Different Processors - Strange Results

Greetings,

 

We have a certain apparatus set up for performance benchmarks, where printed barcodes are stuck to the blades of a rotating fan (variable speed).

 

The two computers under benchmark ran the same OS (WE7S) and the same SSD, but different processors (1- a Celeron 1037U, dual-core at 1.8 GHz, and 2- an i7-4500U, dual-core at 1.8 GHz with 2 threads per core) and different Gigabit Ethernet controllers (neither is an Intel PRO).

 

The application being run is a simple grab, hardware-triggered from a GigE 1.3 MP camera, followed by a Vision Assistant step performing simple BCG (brightness/contrast/gamma) and barcode reading, all placed in a single while loop with initialization and close at either end.
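For readers unfamiliar with the setup, the loop structure is sketched below in Python rather than LabVIEW. The `grab_frame` and `process_frame` bodies are hypothetical stand-ins, not the actual IMAQdx or Vision Assistant calls; the point is only the single, purely sequential while loop being timed:

```python
import time

def grab_frame():
    # Hypothetical stand-in for the hardware-triggered GigE grab.
    return bytes(1280 * 1024)  # ~1.3 MP, 8-bit mono frame

def process_frame(frame):
    # Hypothetical stand-in for BCG + barcode reading.
    return sum(frame[::4096])

def measure_fps(n_frames):
    # Initialize ... single sequential while loop ... close.
    start = time.perf_counter()
    done = 0
    while done < n_frames:
        process_frame(grab_frame())
        done += 1
    return done / (time.perf_counter() - start)

print(f"{measure_fps(50):.1f} FPS")
```

Note that in a sequential loop like this, grab time and processing time add up per frame, which matters for the pipelining discussion further down.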

 

The tests showed the following top-level results:

1- The 1037U processor was able to run up to 43 FPS, while the i7 was limited to 28 FPS.

 

2- On gradually increasing the fan speed, the 1037U's CPU load increased on core 1 until maximum capacity was reached; then core 2 started kicking in and increased gradually in the same manner.

 

3- On gradually increasing the speed on the i7, CPU load was divided equally across the four threads and increased gradually on all four together.

 

Questions:

1- Is the excessive multi-threading on the i7 causing the delays?

 

2- Where does this multi-threading come from, given that a simple while loop is being used with a purely sequential process and no pipelining involved?

 

3- If this multi-threading behavior is built into the OS, is there any way we can alter this behavior, or perhaps work around it in programming?

 

Our goal is to be able to utilize the extra power inherent in the Core i7 system.

 

P.S. More benchmarks using more complex architectures and parallel grabs will be done at later stages.

Message 1 of 5

Hi Omarbedair,

 

I'm not sure whether the multi-threading is slowing down the execution of the tasks, but you could go into Task Manager and set the processor affinity to use only one core and see if that has any impact on the FPS. This is not LabVIEW-specific; it can be done for any process in Task Manager.

 

This is done by right-clicking the process in Task Manager, selecting the "Set affinity" option, and unchecking all cores but one. On Windows 10, you'll first have to go to the Details tab (right-click the process and choose "Go to details").
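If you'd rather set the affinity programmatically (so it survives restarting the benchmark) than click through Task Manager every run, a sketch like this works. The Windows branch uses the standard Win32 `SetProcessAffinityMask` API via ctypes; the other branch uses the Linux-only `os.sched_setaffinity`, included just so the idea is runnable anywhere:

```python
import os
import sys

def pin_to_core(core: int) -> None:
    """Pin the current process to a single CPU core (the
    programmatic equivalent of Task Manager's "Set affinity")."""
    if sys.platform == "win32":
        import ctypes
        kernel32 = ctypes.windll.kernel32
        handle = kernel32.GetCurrentProcess()
        # Affinity mask has one bit per logical core.
        kernel32.SetProcessAffinityMask(handle, 1 << core)
    else:
        # Linux equivalent; takes a set of allowed core numbers.
        os.sched_setaffinity(0, {core})

pin_to_core(0)
```

For a LabVIEW-built executable, you would run this against that process's handle/PID rather than the current process; the mask logic is the same.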

-----------------------------------------------
Brandon Grey
Certified LabVIEW Architect

Message 2 of 5

Hi

In the past I was involved in developing high-throughput vision processing on an RT system. The system was a PXI with a quad-core processor (7 years ago) and LabVIEW 2010 RT. The image size was small, 640x50 at 8-bit, acquired on a Camera Link frame grabber, and I was able to process 700 frames/second reliably. The processing was auto-threshold, morphology, and centroid, with image transfer to a PC over TCP/IP plus some additional signal processing.

Regarding your test setup: it might not reflect real performance once you move to a pipeline.

Items to consider:

1. When image acquisition slows the process down, Task Manager will usually show all cores as busy. To time correctly, you need to copy the image to another loop and do the processing in that second loop.
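The two-loop idea in item 1 can be sketched as a producer/consumer pair. This is plain Python with dummy byte buffers standing in for IMAQ image copies; in LabVIEW the queue would be a LabVIEW queue (or IMAQ buffer ring) between two parallel while loops:

```python
import queue
import threading

def acquisition_loop(q, n_frames):
    """Loop 1: grab only, then hand a copy of the image off."""
    for _ in range(n_frames):
        frame = bytes(64)  # stand-in for an IMAQ image copy
        q.put(frame)
    q.put(None)            # sentinel: acquisition finished

def processing_loop(q, results):
    """Loop 2: timing measured here reflects processing alone."""
    while True:
        frame = q.get()
        if frame is None:
            break
        results.append(len(frame))  # stand-in for the vision step

q = queue.Queue(maxsize=8)  # bounded, like a fixed buffer pool
results = []
t = threading.Thread(target=acquisition_loop, args=(q, 100))
t.start()
processing_loop(q, results)
t.join()
print(len(results))
```

The bounded queue matters: if processing can't keep up, acquisition blocks instead of buffering unboundedly, which makes the bottleneck visible.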

2. When designing pipelined image processing, you need to consider whether your image processing is processor-limited or memory-limited. That usually depends on the size of the image and the processor's cache size. If your image processing is memory-limited, Task Manager will show all cores as busy while in reality the processor is waiting for memory to transfer the image and not really doing any work.
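As a back-of-envelope check for item 2 (the 1.3 MP frame size and 43 FPS come from the original post; the ~4 MB L3 figure is an assumption, typical of an i7-4500U):

```python
frame_bytes = 1.3e6  # 1.3 MP, 8-bit mono => ~1.3 MB per frame
fps = 43

# Raw pixel traffic through memory at the best observed rate.
stream_mb_s = frame_bytes * fps / 1e6
print(f"{stream_mb_s:.0f} MB/s")

# Does one frame fit in last-level cache? (~4 MB L3 assumed)
l3_bytes = 4e6
fits_in_cache = frame_bytes < l3_bytes
print(fits_in_cache)
```

If a frame fits in cache and the streaming rate is tiny compared to memory bandwidth, memory is unlikely to be the limit at this image size; the distinction becomes important with larger images or multi-pass algorithms that touch the whole frame repeatedly.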

3. I did all of this work in the Vision Development Module; I don't think you can really test pipelined processing in Vision Assistant. But since I usually program directly in the Vision Development Module, I'm not sure about Vision Assistant's capabilities.

 

Good luck with your testing - Amit

 

 

Amit Shachaf
Message 3 of 5

Modern processors will adjust their clock speeds dynamically based on temperature and CPU load.

 

What temperatures are the CPUs running at during the benchmark?

 

It's possible that the 4500U isn't being cooled as well and is being thermally throttled.

 

Comparing the CPUs on Intel's ARK they look pretty similar: http://ark.intel.com/compare/75460,71995

 

Note that the Celeron has a 17 W TDP, while the 4500U has a 15 W TDP. Also note that both processors have a base clock of 1.8 GHz. While the 4500U can turbo up to 3 GHz, multithreaded loads could cause it to downclock to 1.8 GHz, in which case it would perform similarly to the Celeron. At the very least I would expect both of these CPUs to perform similarly; it does seem odd that the benchmark performs worse on the i7.

Craig H. | CLA CTA CLED | Applications Engineer | National Instruments
Message 4 of 5

I can't recall if VDM takes hyperthreading into account when determining how many threads to use for multi-core algorithms. It is possible that for some CPU-heavy algorithms, the aggregate performance can get worse when utilizing hyperthreading. If you disable hyperthreading in your BIOS (so both CPUs have 2 cores, 2 threads), does it change the performance picture?

 

You can also use the IMAQ Multi-Core Options VI on the palette to force the system to use a single core (disabling the multi-threading) or just 2 cores, and see how that affects performance.

 

Eric

Message 5 of 5