LabVIEW

How to plot a heatmap for occurrence of strings in different groups

Hello,

I have a csv file with a list of strings in the first column and their corresponding groups in the second column.

There are many strings that appear multiple times in different groups.

I would like to draw a heatmap that shows, for each pair among the N groups, the proportion of one group's strings that also occur in the other group.

Thus, I will have the groups on the x and y axes and a diagonal equal to 1.

Each cell of the heatmap holds the proportion of strings of group i (x-axis) that occur in group j (y-axis).

 

Thanks,

 

0 Kudos
Message 1 of 19
(2,744 Views)

An example of your data would help. How many groups? Also, what should the output look like?

 

Maybe maps would help. What have you tried?

0 Kudos
Message 2 of 19
(2,711 Views)

It looks a bit complicated.

I have 23 groups of strings

An example:

Jake 1

Alex 2

Jake 2

Mike 1

Olga 2

Chris 22

Olga 1

...

 

Then on the x-axis and y-axis I will have the groups from 1 to 23. Let's say we have 20 people in group 1 and 30 people in group 2.

The first item (group on the x-axis) is 1, so the proportion is 20/20 (because all the items in group 1 are obviously in group 1), but they may also exist in other groups.

The second item (group on the x-axis) is 2, so the proportion is 1/30 because only 1 item is in group 2 (Jake is included in both group 1 and group 2). The question to ask is how many items from group 1 are in group 2.

 

 

So this is to see how the people are distributed across the different groups; sometimes one person belongs to multiple groups.

0 Kudos
Message 3 of 19
(2,705 Views)

Your csv file has no resemblance to your problem description. There are quite a few commas and there is no obvious delimiter. (A plain comma gives quite a few columns, and if we use <",>, the fifth line, for example, has a problem getting the group because the string is not in quotes.)

 

Can you attach a cleaner file?

0 Kudos
Message 4 of 19
(2,695 Views)

@altenbach wrote:

Your csv file has no resemblance to your problem description. There are quite a few commas and there is no obvious delimiter. (A plain comma gives quite a few columns, and if we use <",>, the fifth line, for example, has a problem getting the group because the string is not in quotes.)

 


 

yes ... also there appear to be 25 groups instead of 23 ...

 

alexderjuengere_0-1645730929809.png

 

 

groups.png

 

Message 5 of 19
(2,688 Views)

Indeed, sorry, I attached the wrong file.

I attach it here with the 25 groups.

To simplify the problem of the heatmap, I would like to calculate for each pair of groups (i,j),  the proportion of group i in group j and the proportion of group j in group i. The groups have different sizes.

If I manage to fill in this matrix, then the heatmap will be straightforward.
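
For readers following along in a text language, filling this matrix could be sketched in Python as below. This is only an illustration, not LabVIEW code: the rows are the sample lines from earlier in the thread, and the names `members` and `overlap_matrix` are made up for the sketch. It assumes "proportion of group i in group j" means the share of group i's unique members that also appear in group j. If the other group's size is the intended denominator, replace `len(members[gi])` with `len(members[gj])`.

```python
from collections import defaultdict

# Sample rows from the example earlier in the thread: (name, group).
rows = [
    ("Jake", 1), ("Alex", 2), ("Jake", 2), ("Mike", 1),
    ("Olga", 2), ("Chris", 22), ("Olga", 1),
]

# Collect the unique names of each group.
members = defaultdict(set)
for name, group in rows:
    members[group].add(name)

groups = sorted(members)  # axis labels, used for both x and y


def overlap_matrix(members, groups):
    """matrix[i][j] = share of group i's members that also appear in group j.

    Assumption: the denominator is the size of group i (the row group).
    """
    return [
        [len(members[gi] & members[gj]) / len(members[gi]) for gj in groups]
        for gi in groups
    ]


matrix = overlap_matrix(members, groups)
for g, row in zip(groups, matrix):
    print(g, [round(v, 2) for v in row])
```

With this definition the diagonal is always 1, and a pair of groups with no common member gets 0. The matrix is generally not symmetric when the groups have different sizes.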

 

0 Kudos
Message 6 of 19
(2,680 Views)

@ziedhosni wrote:

I attach it here with the 25 groups.


Where?

0 Kudos
Message 7 of 19
(2,673 Views)

@ziedhosni wrote:

I would like to calculate for each pair of groups (i,j),  the proportion of group i in group j and the proportion of group j in group i. The groups have different sizes.


"Proportion" of what? Are you talking about the number of unique elements in each group?

0 Kudos
Message 8 of 19
(2,670 Views)

As already mentioned, I probably would use a map where the key is the group# (I32) and the value is a set of names (strings).

 

Here's a quick draft. (this works with your original file, ignoring any line that does not start with a <">.)

 

altenbach_0-1645739383681.png

 

altenbach_1-1645739740281.png

 

 

 

There are plenty of ways to compare set(i) and set(k) in the stack of loops on the right. Once your description is a bit less ambiguous, we can narrow it down. Modify as needed.

 

Note that sets only contain unique elements.
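
For anyone who wants to prototype the same idea outside LabVIEW, a rough Python analogue of the map-of-sets approach might look like the sketch below. It is not the diagram from the screenshots. The line layout (a quoted name, then `",`, then the group number) is an assumption based on the delimiter discussion above, and the sample lines and the helper names `parse_line` and `build_group_map` are hypothetical.

```python
def parse_line(line):
    """Return (name, group) for a line like '"Some, name",12', else None."""
    line = line.strip()
    if not line.startswith('"'):
        return None  # ignore any line that does not start with a quote
    # The name ends at the first '",' (assumed layout, see lead-in).
    name, sep, rest = line[1:].partition('",')
    if not sep:
        return None
    try:
        # Use only the first field after the name as the group number.
        return name, int(rest.split(",")[0])
    except ValueError:
        return None


def build_group_map(lines):
    """Map each group number to the set of unique names seen in it."""
    group_map = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        name, group = parsed
        group_map.setdefault(group, set()).add(name)
    return group_map


# Hypothetical sample lines, made up for illustration only.
lines = [
    'name,group',    # header: skipped, does not start with a quote
    '"Jake",1',
    '"Jake",1',      # duplicate: the set keeps a single entry
    '"Olga, O.",2',  # the comma inside the quotes stays part of the name
    'Mike,1',        # unquoted: skipped
]
group_map = build_group_map(lines)
print(group_map)
```

Because the values are sets, a name repeated within one group is counted once, which matches the note above about unique elements.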

Message 9 of 19
(2,657 Views)

Let me give a real-life example. Imagine some students participating in university clubs. Some of them join only one club, and some participate in several clubs.

The heatmap will measure the proportion of club i's members who are also in club j. The diagonal will obviously be 1.

If no member of the music club participates in the theatre club, then we have zero at that intersection (pair).

I keep forgetting to attach the dataset.

0 Kudos
Message 10 of 19
(2,628 Views)