I was asked by my stakeholder to perform a Venn diagram in one of their dashboard. There is excellent ressource online, which explain how to do it, but they all request some data transformation or densification and sometimes it is not very convenient when working in a business environment.
So I asked my stakeholder why he needed such a visualisation and I try to find an alternative. Don't get me wrong I like Venn Diagram, they are fun and remind me to my math classes but I do not find them very effective to display information
I came across a graph displayed in Github, showing an alternative and I found it very good. I figured it later how it was called an up-set plot. I decided to try to reproduce it in Tableau, because it doesn't need any data preparation, just some calculations. I always prefer not to create a data source to create a specific visualisation, especially at work. why? Because I personnally think that beauty should not overcome functionality. And because having only one datasource always provide more flexibility for dashboard actions or future maintenance.
Disclaimer : the view below can surely be done in a more elegant way :)
I reproduce below the use case I faced at work using Superstore (of course). Let's imagine that your stakeholder wants to know how many customers bought this year any combinason of the product categories; in order to determine the cross sellings and see if we should advertised some products to certain clients to increase the sales.
1. Calculate the Sales per category and client
You need to calculate the sales for each client and category :
Sales for Furniture
{ FIXED [Customer Name]: SUM(if [Category]="Furniture" then [Sales] end) }
Sales for Office Supplies
{ FIXED [Customer Name]: SUM(if [Category]="Office Supplies" then [Sales] end) }
Sales for Technology
{ FIXED [Customer Name]:SUM( if[Category]="Technology" then [Sales] END)}
2. Calculate the number of customers having bought at least one of the category
For instance, to determine the number of clients having bought at least one furniture, we calculate the count distinct of if [sales in furniture]>0 then [Customer Name] END.
We end up with 3 calculations :
3. Calculate the number of customers for each combinaisons of products ( Furniture F, Office Supplies O, Technology T)
We need to calculate here, the number of customers having bought :
For instance F+O is calculated like this :
if [sales in furniture]>0 and [sales in office supplies]>0 and ISNULL([sales in technologies])
then [Customer Name]
END
When we plot the different calculation we obtain the graph below:
4. Now the last part : the dots :)
To create the legend, I use a dual axis and some again calculations. For instance to come up with the Furniture and Office Supplies line, I calculate the following
if [Category]="Furniture" OR [Category]="Office Supplies"
then 1
END
and place the new calculated field into color. I will hide the null value.
into column : attr(1) attr(1). : one with a round mark, the other with a line mark
into row : category
I then use dual axis and I do the same for the other calculations.... after some formatting I obtain the following view and I just have to put them together.
At the end, the following view, allows us to see that the cross selling work quite well, and that office supplies trigger the main part of the sales. Only 8 clients have bought F and T and not O...
I will suggest definitely " an introduction on how to read it" but once you understand, it is much more easy to read than the potatoes chart... I mean the Venn diagram :)
So I asked my stakeholder why he needed such a visualisation and I try to find an alternative. Don't get me wrong I like Venn Diagram, they are fun and remind me to my math classes but I do not find them very effective to display information
I came across a graph displayed in Github, showing an alternative and I found it very good. I figured it later how it was called an up-set plot. I decided to try to reproduce it in Tableau, because it doesn't need any data preparation, just some calculations. I always prefer not to create a data source to create a specific visualisation, especially at work. why? Because I personnally think that beauty should not overcome functionality. And because having only one datasource always provide more flexibility for dashboard actions or future maintenance.
Disclaimer : the view below can surely be done in a more elegant way :)
I reproduce below the use case I faced at work using Superstore (of course). Let's imagine that your stakeholder wants to know how many customers bought this year any combinason of the product categories; in order to determine the cross sellings and see if we should advertised some products to certain clients to increase the sales.
1. Calculate the Sales per category and client
You need to calculate the sales for each client and category :
Sales for Furniture
{ FIXED [Customer Name]: SUM(if [Category]="Furniture" then [Sales] end) }
Sales for Office Supplies
{ FIXED [Customer Name]: SUM(if [Category]="Office Supplies" then [Sales] end) }
Sales for Technology
{ FIXED [Customer Name]:SUM( if[Category]="Technology" then [Sales] END)}
2. Calculate the number of customers having bought at least one of the category
For instance, to determine the number of clients having bought at least one furniture, we calculate the count distinct of if [sales in furniture]>0 then [Customer Name] END.
We end up with 3 calculations :
- "at least one furniture" =if [sales in furniture]>0 then [Customer Name] END
- "at least one office supplies" =if [sales in office supplies]>0 then [Customer Name] END
- "at least one technology" = if [sales in technology]>0 then [Customer Name] END
3. Calculate the number of customers for each combinaisons of products ( Furniture F, Office Supplies O, Technology T)
We need to calculate here, the number of customers having bought :
- F + T
- F + O
- O + T
- F + T + O
For instance F+O is calculated like this :
if [sales in furniture]>0 and [sales in office supplies]>0 and ISNULL([sales in technologies])
then [Customer Name]
END
When we plot the different calculation we obtain the graph below:
4. Now the last part : the dots :)
To create the legend, I use a dual axis and some again calculations. For instance to come up with the Furniture and Office Supplies line, I calculate the following
if [Category]="Furniture" OR [Category]="Office Supplies"
then 1
END
and place the new calculated field into color. I will hide the null value.
into column : attr(1) attr(1). : one with a round mark, the other with a line mark
into row : category
I then use dual axis and I do the same for the other calculations.... after some formatting I obtain the following view and I just have to put them together.
At the end, the following view, allows us to see that the cross selling work quite well, and that office supplies trigger the main part of the sales. Only 8 clients have bought F and T and not O...
I will suggest definitely " an introduction on how to read it" but once you understand, it is much more easy to read than the potatoes chart... I mean the Venn diagram :)