LabVIEW Idea Exchange

Peti

Add OpenCL support

Status: New

Dear LabVIEW fans,

 

Motivation:

I'm a physics student who uses LabVIEW for measurement and for data evaluation. I've been a fan since version 6i (around 2005).

My typical experimental setup looks like this: a lot of different wires running to every corner of the lab, left to collect gigabytes of measurement data overnight. Sometimes I do physics simulations in LabVIEW, too. So I really depend on gigaflops.

 

I know that there is already an idea for adding CUDA support. But not all of us have an NVIDIA GPU. Typically, at least in our lab, we have Intel i5 CPUs, and some machines have a minimal AMD graphics card (others just have integrated graphics).

 

So, as I was interested in getting more flops, I wrote an OpenCL DLL wrapper. Testing it with a naive Mandelbrot-set calculation, I saw a 10x speed-up on the CPU and a 100x speed-up on the gaming GPU of my home PC, compared to a simple multi-threaded LabVIEW implementation using parallel for loops. Now I'm using this for my projects.
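
To illustrate what a naive Mandelbrot-set calculation looks like, here is a minimal sketch in Python. It is not the code used for the measurements in this post; the function names, the iteration limit, and the grid bounds are assumptions chosen for the example. Each pixel is computed independently, which is why this kind of test parallelizes well.

```python
def escape_count(cx, cy, max_iter=256):
    """Iterate z -> z*z + c from z = 0 and return how many iterations pass
    before |z| exceeds 2 (max_iter if it never does)."""
    zx = zy = 0.0
    for n in range(max_iter):
        # |z|^2 > 4 is the same test as |z| > 2, without a square root.
        if zx * zx + zy * zy > 4.0:
            return n
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def mandelbrot_grid(width, height, x_min=-2.0, x_max=1.0,
                    y_min=-1.5, y_max=1.5, max_iter=256):
    """Escape counts for a width x height grid of points (illustrative bounds).
    Requires width >= 2 and height >= 2."""
    dx = (x_max - x_min) / (width - 1)
    dy = (y_max - y_min) / (height - 1)
    return [
        [escape_count(x_min + i * dx, y_min + j * dy, max_iter)
         for i in range(width)]
        for j in range(height)
    ]
```

The early `return` inside the loop is a data-dependent branch: neighbouring pixels can need very different iteration counts.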

 

What's my idea:

-Give an option to those who don't have a CUDA-capable device and/or want their app to run on any class of compute device.

-It has to be really easy to use (I have been struggling with C++ syntax and the Khronos OpenCL specification for almost 2 years in my free time to get my DLL working...)

-It has to be easy to debug (for example, it has to give human-readable, meaningful error messages instead of crashing LabVIEW or causing a BSOD)
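
As a sketch of what "human-readable error messages" could mean in practice, the snippet below turns a numeric OpenCL status code into a readable line. The numeric values are a subset of the error codes defined in the Khronos OpenCL headers; the helper `describe_cl_status` and its message format are hypothetical and not part of any existing wrapper.

```python
# Subset of the status codes defined in the Khronos OpenCL header CL/cl.h.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -38: "CL_INVALID_MEM_OBJECT",
    -48: "CL_INVALID_KERNEL",
    -52: "CL_INVALID_KERNEL_ARGS",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}


def describe_cl_status(code, step):
    """Hypothetical helper: format the status returned by one OpenCL call.
    `step` is the name of the call that produced `code`."""
    if code == 0:
        return f"{step}: OK"
    name = CL_ERROR_NAMES.get(code, "unknown OpenCL error")
    return f"{step} failed: {name} ({code})"
```

A wrapper could append one such line per API call to the report string it hands back to the caller, so that a failure is reported instead of crashing the host application.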

 

Implemented so far, by me, for testing the idea:

 

-Get information on the DLL (e.g. "compiled by AMD's APP SDK on 7th August, 2013, 64 bits", or similar)

 

-Initialize OpenCL:

1. Select the preferred OpenCL platform and device (fall back to any platform & CL_DEVICE_TYPE_ALL if not found)

2. Get all properties of the device (clGetDeviceInfo)

3. Create a context & a command queue

4. Compile and build OpenCL kernel source code

5. Give all details back to the user as a string (even if everything was successful...)
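
The fallback logic in step 1 can be sketched as follows. This is only an illustration of the selection rule, not the wrapper's actual code: the `Device` record, the `select_device` function, and the hard-coded device list are all hypothetical, and a real implementation would obtain the list from the OpenCL API (clGetPlatformIDs and clGetDeviceIDs) rather than from Python data.

```python
from collections import namedtuple

# Hypothetical record describing one OpenCL device.
Device = namedtuple("Device", ["platform", "name", "device_type"])


def select_device(devices, preferred_platform, preferred_type):
    """Return (device, report). Prefer a device matching both platform and
    type; otherwise fall back to the first device of any platform and type."""
    report = []
    for dev in devices:
        if (preferred_platform.lower() in dev.platform.lower()
                and dev.device_type == preferred_type):
            report.append(f"Selected preferred device: {dev.name} "
                          f"({dev.platform}, {dev.device_type})")
            return dev, "\n".join(report)
    report.append(f"No {preferred_type} device found on platform "
                  f"'{preferred_platform}'; falling back to any device.")
    if not devices:
        report.append("No OpenCL devices available.")
        return None, "\n".join(report)
    dev = devices[0]
    report.append(f"Selected fallback device: {dev.name} "
                  f"({dev.platform}, {dev.device_type})")
    return dev, "\n".join(report)


# Example with a made-up device list.
available = [
    Device("Intel OpenCL", "Intel i5 CPU", "CPU"),
    Device("AMD APP", "AMD graphics card", "GPU"),
]
chosen, details = select_device(available, "NVIDIA", "GPU")
```

The report string is returned in both the success and the fallback case, matching step 5.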

 

-Read and write memory buffers (like GPU memory)

For now, only blocking read and blocking write are implemented; I had some bugs with non-blocking calls.

(again, report details to the user as a string)

 

-Execute a kernel on the selected arrays of data

(again, report details to the user as a string)

 

-Close OpenCL:

Release everything, free up memory, etc. (again, report details to the user as a string)

 

Approximate results, for your motivation (Mandelbrot-set test, single precision only so far):

10 GFLOPS on a Core 2 Duo (my office PC)

16 GFLOPS on a 6-core AMD X6 1055T

typ. 50 GFLOPS on an Intel i5

180 GFLOPS on an NVIDIA GTS 450 graphics card

 

70 GFLOPS on an EVGA SR-2 with two Xeon L5638 CPUs (that's 24 cores)

520 GFLOPS on a Tesla C2050

 

(The numbers above are my measured results; the manufacturers' spec sheets may claim far more theoretical flops. When selecting your device, take memory bandwidth into account, as well as the kind of parallelism in your code. Some devices handle conditional branches poorly, and the Mandelbrot-set test has conditional branches.)
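
For anyone who wants to compare their own hardware, a measured GFLOPS figure of this kind can be estimated from the total number of Mandelbrot iterations and the run time. The sketch below shows only the arithmetic. The post does not say how the flops were counted, so the default of 8 floating-point operations per iteration is an assumed counting convention, and both function names are hypothetical.

```python
def estimate_gflops(total_iterations, seconds, flops_per_iteration=8):
    """Estimated throughput in GFLOPS.
    flops_per_iteration=8 is an assumption, not a value from the post."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_iterations * flops_per_iteration / seconds / 1e9


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds
```

Because the result depends on the counting convention and on how many pixels escape early, such figures are best compared only between runs of the same test.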

 

Sorry for my bad English, I'm Hungarian.

I'm planning to give my code away, but I still have to clean it up and remove non-English comments...

13 Comments
Peti
Member

Thank you for the link to the OpenCLV toolkit.

 

My first impression is that it implements far more than I really need, and in a way that is hidden from me.

 

(I just "inline" some simple OpenCL kernels to speed things up a bit, nothing that advanced or complex. My wrapper itself is as small as 6 subVIs, and the DLL is about 1400 lines of C++. The compiler is 64-bit GCC; the environment is the free Code::Blocks 12.11.)

 

Also, OpenCLV is a proprietary third-party product with some password-protected subVIs. (If there is an error, there is no way to figure out what went wrong, and no way to learn the OpenCL API better; I mean the original specification on the Khronos website.)

 

OpenCL has "open" in its name. LabVIEW itself is proprietary, but almost all the programs written in it are open source. I really detest it when someone hides their code behind password protection. That doesn't help the community learn.

 

OpenCLV has a 30-day evaluation license. (The price is unknown, but anyway, I don't like paying for something that operates in a hidden, secret way. I'm a physicist because I want to understand nature, and I'm curious about everything. This is not just a single problem that is solved once by secret software with one small manual, implemented, and forgotten.)

 

(Two years ago I saw another OpenCL & LabVIEW implementation somewhere that used .NET and gave me a lot of error messages. I don't even know what the heck .NET is, but I got my simple C++ DLL to do everything I need and to give me all the debugging information in a readable way when an error occurs. The first version took me about a month of my free time. Then I had a lot of other work, and the upgrades to my DLL took 2 years of my nonexistent free time, since I typically work 10+ hours a day and there are a lot of people in my life who are always asking for help with something...)

 

Anyway, I will clean up my OpenCL wrapper code in the next few weeks and give it away to the community as freeware.

amcelroy
Member

Hey Peti,

 

I wrote the OpenCLV toolkit and would be willing to work with you to solve your problem. Most of my work is at UT - Austin doing a lot of academic image processing in LabVIEW. We desperately needed more processing power, and there seemed to be a need among others as well. Right now a few guys in our lab are using it for things like curve fitting, anisotropic filters, and optical coherence tomography processing, but it would be neat to see simulation work as well. Soon I'll be adding intensity texture analysis and Levenberg-Marquardt stuff as well.

 

I'm sorry that we disagree on code protection: it took me many months to implement OpenCL, do proper error handling, make custom palettes, organize everything, create documentation, debug, test, and write example code. NI sells CUDA even though users can implement it for free if they have the knowledge, so it seemed the proper thing to do in this case as well.

 

Feel free to PM me or we can keep this thread alive.

 

Austin

 

 

altenbach
Knight of NI