01-19-2014 11:51 PM
Hi All,
I'm having no success with this (simple?) multithreading problem on my Core i7 processor, using CVI 9.0 (32-bit compiler).
In the code snippets below, I have a node-level structure of 5 unsigned integers. I use 32 calls to calloc() to allocate 32 blocks of 128*128 (16K) nodes each, and store the returned pointers in a global array.
Node size = 20 bytes, block size = 327,680 bytes (320 KiB), total allocated size = 10,485,760 bytes (10 MiB).
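To double-check these figures, here is a small sketch that derives them from the struct itself. It assumes a 4-byte unsigned int and no struct padding (which matches the 20-byte node size quoted above). The helper names are made up for illustration and are not part of the real program.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the node structure in this post. */
struct Nodes { unsigned int a, b, c, d, e; };

enum { NODES_PER_BLOCK = 128 * 128, NUM_BLOCKS = 32 };

/* Hypothetical helpers, named here for illustration only. */
size_t node_bytes(void)  { return sizeof(struct Nodes); }
size_t block_bytes(void) { return (size_t)NODES_PER_BLOCK * node_bytes(); }
size_t total_bytes(void) { return (size_t)NUM_BLOCKS * block_bytes(); }

/* Aborts if the platform does not match the assumptions stated above. */
void check_footprint(void)
{
    assert(node_bytes()  == 20);        /* 5 fields * 4 bytes      */
    assert(block_bytes() == 327680);    /* 16384 nodes = 320 KiB   */
    assert(total_bytes() == 10485760);  /* 32 blocks   = 10 MiB    */
}
```

At 320 KiB, one block is far larger than a typical 32 KiB L1 data cache, so whether a thread's working set stays in cache depends on the exact CPU model.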
I then spawn 32 threads, each of which is passed a unique index into the "node_space" pointer_array (see code below), so each thread is manipulating (reading/writing) a separate 16K block of nodes.
It should be thread safe and scale with the number of threads, because each thread addresses a different memory block (with no overlap), but the multithreaded version runs no faster (maybe slightly) than a single thread.
I've tried various thread pool sizes and padding nodes to 16- and 64-byte boundaries, all to no avail.
Is this a memory bandwidth problem due to the size of the arrays? Does each thread somehow load the whole 32 blocks? Any help appreciated.
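On the padding attempts, here is a rough sketch of what padding every node to 64 bytes does to the memory footprint. The PaddedNode type, the CACHE_LINE value, and the helper names are assumptions for illustration (not the actual code), and a 4-byte unsigned int is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Hypothetical padded variant of the node: same five fields plus filler. */
typedef struct PaddedNode {
    unsigned int a, b, c, d, e;
    unsigned char pad[CACHE_LINE - 5 * sizeof(unsigned int)];
} PaddedNode;

/* Bytes occupied by one block of 128*128 padded nodes. */
size_t padded_block_bytes(void)
{
    return (size_t)(128 * 128) * sizeof(PaddedNode);
}

/* Bytes occupied by all 32 padded blocks. */
size_t padded_total_bytes(void)
{
    return 32 * padded_block_bytes();
}

/* Aborts if the layout assumptions above do not hold on this platform. */
void check_padded_footprint(void)
{
    assert(sizeof(PaddedNode) == CACHE_LINE);
    assert(padded_block_bytes() == 1048576);   /* 1 MiB per block */
    assert(padded_total_bytes() == 33554432);  /* 32 MiB in total */
}
```

Under those assumptions, padding each node to 64 bytes grows the data from 10 MiB to 32 MiB, i.e. 3.2 times as many bytes for the same number of nodes.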
struct Nodes
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
    unsigned int d;
    unsigned int e;
};
typedef struct Nodes Nodes;
typedef Nodes *Node_Ptr;
Node_Ptr node_space[32]; /* pointer array into 32 separate blocks (loaded via individual calloc calls for each block) */
.... Thread Spawning ....
for (index = 0; index < 32; ++index)
    CmtScheduleThreadPoolFunction(my_thread_pool_handle, My_Thread_Function, &index, NULL);
01-20-2014 01:41 AM
Hi again,
Sorry, the block indexing scheme in the original post was not right.
I actually use an indexed list of integers to give each thread a unique value via the pointer-to-data parameter of CmtScheduleThreadPoolFunction.
It should be:
struct Nodes
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
    unsigned int d;
    unsigned int e;
};
typedef struct Nodes Nodes;
typedef Nodes *Node_Ptr;
Node_Ptr node_space[32]; /* pointer array into 32 separate blocks (loaded via individual calloc calls for each block) */
int index_list[32];
/* <<<<< all above are globals >>>>> */
.... Thread Spawning ....
int index;
for (index = 0; index < 32; ++index)
{
    index_list[index] = index;
    CmtScheduleThreadPoolFunction(my_thread_pool_handle, My_Thread_Function, (index_list + index), NULL);
}
01-20-2014 04:24 AM - edited 01-20-2014 04:25 AM
Hello CVI_Rules!
Have you considered the following options in order to enhance the performance of your application?
01-20-2014 05:21 AM
Hello CVI_Rules,
It's hard to answer your question because it depends on what you are doing in your thread function. Since you are not seeing any speedup when you change the number of threads in your thread pool, you are either doing too much (or all) of the work in each thread, serializing your threads with locks, or somehow slowing down execution in each thread.
Your basic setup looks fine. You can simplify it slightly by passing the nodes directly to your thread function:
for (index = 0; index < 32; ++index)
{
    CmtScheduleThreadPoolFunction(pool, My_Thread_Function, node_space[index], NULL);
}
...
static int My_Thread_Function(void *functionData)
{
    Node_Ptr nodes = functionData;
    ...
But that's not going to affect performance.
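If you want to try the same pattern outside CVI, here is a minimal portable sketch. It uses POSIX threads purely as a stand-in for the CVI thread pool (that substitution is my assumption, not what your program uses), and block_worker, run_blocks, and the placeholder work in the loop are hypothetical names and logic for illustration. Each worker receives only the pointer to its own block and touches nothing else.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_BLOCKS      32
#define NODES_PER_BLOCK (128u * 128u)

typedef struct Nodes { unsigned int a, b, c, d, e; } Nodes;
typedef Nodes *Node_Ptr;

/* Worker: gets a pointer to its own block and writes only to that block. */
static void *block_worker(void *functionData)
{
    Node_Ptr nodes = functionData;
    unsigned int i;

    assert(nodes != NULL);
    for (i = 0; i < NODES_PER_BLOCK; ++i)
        nodes[i].a = i;              /* placeholder work (hypothetical) */
    return NULL;
}

/* Allocates the blocks, runs one thread per block, returns 1 on success. */
int run_blocks(void)
{
    Node_Ptr  node_space[NUM_BLOCKS];
    pthread_t threads[NUM_BLOCKS];
    int i, started = 0, ok = 1;

    for (i = 0; i < NUM_BLOCKS; ++i) {
        node_space[i] = calloc(NODES_PER_BLOCK, sizeof(Nodes));
        if (node_space[i] == NULL)
            ok = 0;
    }

    /* Start workers only if every block was allocated. */
    while (ok && started < NUM_BLOCKS) {
        if (pthread_create(&threads[started], NULL,
                           block_worker, node_space[started]) != 0)
            ok = 0;
        else
            ++started;
    }

    /* Wait for every worker that was actually started. */
    for (i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);

    /* Check that each worker reached the end of its own block. */
    for (i = 0; ok && i < NUM_BLOCKS; ++i)
        if (node_space[i][NODES_PER_BLOCK - 1].a != NODES_PER_BLOCK - 1)
            ok = 0;

    for (i = 0; i < NUM_BLOCKS; ++i)
        free(node_space[i]);
    return ok;
}
```

Calling run_blocks() should return 1 when all 32 workers have finished their blocks. Timing that call with different thread counts is one way to see whether the placeholder loop scales on your machine.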
Things to look into:
Apart from that, can you explain what you are doing in your thread function so that we can have a better understanding of your program and what might inhibit parallelism?
01-20-2014 08:29 PM
01-20-2014 08:40 PM
01-21-2014 08:32 AM
Hello again! We're very glad that it eventually worked out well for you and that you were able to find a solution to your problem. I wish you good luck with your university projects!