How to plot a heatmap for occurrence of strings in different groups

ziedhosni · ‎02-24-2022

Hello,

I have a csv file with a list of strings in the first column and their corresponding groups in the second column.

There are many strings that appear multiple times in different groups.

I would like to draw a heatmap showing, for each group, the proportion of its strings that also occur in each of the N groups.

Thus, I will have the groups on both the x and y axes and a diagonal equal to 1.

Each cell of the heatmap holds the proportion of the strings of group i (x-axis) that also occur in group j (y-axis).

Thanks,

altenbach · ‎02-24-2022

An example of your data would help. How many groups? Also, what should the output look like?

Maybe maps would help. What have you tried?

LabVIEW Champion.

ziedhosni · ‎02-24-2022

It looks a bit complicated.

I have 23 groups of strings

An example:

Jake 1

Alex 2

Jake 2

Mike 1

Olga 2

Chris 22

Olga 1

...

Then on the x-axis and y-axis I will have the groups from 1 to 23. Let's say we have 20 people in group 1 and 30 people in group 2.

The first item (group on the x-axis) is 1. So the proportion is 20/20 (because all the items in group 1 are obviously in group 1), but they may also exist in other groups.

The second item (group on the x-axis) is 2, so the proportion is 1/30 because only 1 item is in group 2 (Jake is included in groups 1 and 2). The question to ask is how many items from group 1 are in group 2.

So this is to see how the people are partitioned across the different groups; sometimes one person belongs to multiple groups.
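The counting described above can be sketched in a few lines of Python (not LabVIEW, purely as an illustration). This sketch uses only the seven rows quoted in the post, so the numbers differ from the 20 and 30 member figures mentioned, and it assumes the denominator is the size of the first group; the names `rows`, `members` and `share_of_1_in_2` are made up for the example.

```python
from collections import defaultdict

# Only the rows quoted in the post; the real file has more ("...").
rows = [
    ("Jake", 1), ("Alex", 2), ("Jake", 2), ("Mike", 1),
    ("Olga", 2), ("Chris", 22), ("Olga", 1),
]

# Map each group number to the set of unique names seen in it.
members = defaultdict(set)
for name, group in rows:
    members[group].add(name)

# Names present in both group 1 and group 2.
shared = members[1] & members[2]

# Assumed reading: fraction of group 1's members that are also in group 2.
share_of_1_in_2 = len(shared) / len(members[1])

print(sorted(shared))
print(share_of_1_in_2)
```

With these rows, Jake and Olga are shared, so two of the three members of group 1 also appear in group 2.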

altenbach · ‎02-24-2022

Your csv file has no resemblance to your problem description. There are quite a few commas and there is no obvious delimiter (a plain comma gives quite a few columns, and if we use <",>, e.g. the fifth line has a problem getting the group because the string is not in quotes).

Can you attach a cleaner file?


alexderjuengere · ‎02-24-2022

@altenbach wrote:

Your csv file has no resemblance to your problem description. There are quite a few commas and there is no obvious delimiter (a plain comma gives quite a few columns, and if we use <",>, e.g. the fifth line has a problem getting the group because the string is not in quotes).

yes ... also there appear to be 25 groups instead of 23 ...

ziedhosni · ‎02-24-2022

Indeed, sorry, I attached the wrong file.

I attach it here with the 25 groups.

To simplify the problem of the heatmap, I would like to calculate for each pair of groups (i,j), the proportion of group i in group j and the proportion of group j in group i. The groups have different sizes.

If I manage to fill in this matrix, then the heatmap will be straightforward.
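One possible way to write that matrix down, assuming "the proportion of group i in group j" means the fraction of group i's unique strings that also appear in group j (this reading is an assumption, and the symbols $S_i$ and $P_{ij}$ are introduced here only for illustration):

```latex
% S_i : set of unique strings in group i
% P_ij: fraction of group i's strings that also occur in group j
P_{ij} = \frac{\lvert S_i \cap S_j \rvert}{\lvert S_i \rvert},
\qquad
P_{ii} = 1,
\qquad
P_{ji} = \frac{\lvert S_i \cap S_j \rvert}{\lvert S_j \rvert}
```

Under this reading the numerator is the same for both cells of a pair, so $P_{ij}$ and $P_{ji}$ differ only through the group sizes, and the matrix is not symmetric when the groups have different sizes.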

altenbach · ‎02-24-2022

@ziedhosni wrote:
I attach it here with the 25 groups.

Where?


altenbach · ‎02-24-2022

@ziedhosni wrote:

I would like to calculate for each pair of groups (i,j), the proportion of group i in group j and the proportion of group j in group i. The groups have different sizes.

"Proportion" of what? Are you talking about the number of unique elements in each group?


altenbach · ‎02-24-2022

As already mentioned, I probably would use a map where the key is the group# (I32) and the value is a set of names (strings).

Here's a quick draft. (This works with your original file, ignoring any line that does not start with a <">.)

There are plenty of ways to compare set(i) and set(k) in the stack of loops on the right. Once your description is a bit less ambiguous, we can narrow it down. Modify as needed.

Note that sets only contain unique elements.
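For readers without LabVIEW at hand, the same idea (a map from group number to a set of names, then a stack of loops comparing set(i) and set(k)) can be sketched in Python. This is not the attached VI, only a rough text analogue. It assumes a two-column layout with the name in double quotes followed by an integer group, and it picks one of the possible comparisons (shared names divided by the size of set(i)); the function names and the sample text are hypothetical.

```python
import csv

def group_sets(text):
    """Return {group number: set of names} from two-column CSV text."""
    # Keep only lines that start with a double quote, as the draft does.
    lines = [ln for ln in text.splitlines() if ln.startswith('"')]
    sets = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # no group column on this line
        name, group = row[0], int(row[1])
        sets.setdefault(group, set()).add(name)  # sets drop duplicates
    return sets

def overlap_matrix(sets):
    """Return (sorted group numbers, matrix of shared fractions)."""
    groups = sorted(sets)
    matrix = []
    for i in groups:          # outer loop: reference group
        row = []
        for k in groups:      # inner loop: group compared against
            shared = len(sets[i] & sets[k])
            row.append(shared / len(sets[i]))
        matrix.append(row)
    return groups, matrix

# Hypothetical sample, invented for illustration only.
sample = '''name,group
"Doe, Jake",1
"Alex",2
"Doe, Jake",2
"Mike",1
'''

groups, matrix = overlap_matrix(group_sets(sample))
print(groups)
print(matrix)
```

Using a quote-aware CSV reader keeps a name such as "Doe, Jake" in one field despite the embedded comma. Swapping the denominator, or using the union of both sets, gives the other comparisons hinted at above.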


ziedhosni · ‎02-25-2022

Let me give a real-life example. Let's imagine some students participating in the clubs of the university. Some of them decide to join one club each and some participate in several clubs.

The heatmap will measure the proportion of the members of each club i who are also in club j. The diagonal will obviously be 1.

If no member of the music club participates in the theatre club, then we have zero at that intersection (pair).
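To make the club picture concrete, here is a small Python sketch that fills the matrix for three invented clubs and prints it as a crude text heatmap. The memberships, the `SHADES` ramp and the `shade` helper are all hypothetical, and the denominator is assumed to be the size of the row club. In LabVIEW the same 2D array would presumably go to an intensity graph instead.

```python
# Hypothetical club memberships, invented for illustration only.
clubs = {
    "music": {"Ana", "Ben", "Cleo", "Dan"},
    "theatre": {"Eve", "Finn"},
    "chess": {"Ana", "Ben", "Eve"},
}
names = list(clubs)

# cells[i][j]: share of club i's members who are also in club j.
cells = [
    [len(clubs[a] & clubs[b]) / len(clubs[a]) for b in names]
    for a in names
]

SHADES = " .:*#"  # from 0 (blank) to 1 (darkest)

def shade(value):
    """Map a proportion in [0, 1] to one character of the ramp."""
    return SHADES[round(value * (len(SHADES) - 1))]

# One text row per club, one shaded character per compared club.
for name, row in zip(names, cells):
    print(f"{name:>8} " + " ".join(shade(v) for v in row))
```

Music and theatre share nobody, so that cell stays blank, while the music/chess pair shows the asymmetry: half of the music club plays chess, but two thirds of the chess club is in music.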

I keep forgetting the dataset.
