Data Visualization Lab / Nice to Meet You

Adobe Illustrator

Vue, Vue Router, Vuetify, d3.js, JavaScript, Node.js, Docker, Docker Compose, Webpack, AWS

Node.js, Express, AWS RDS MySQL API

My Role in This Project

Position:

Project Lead, Design & Engineering

Design:

Data Visualization, User Interface Design, Information Architecture, User Experience, Concept, Research

Data:

Collection, Cleaning and Processing: R, Node.js, Excel
API design and implementation: Node.js, Express.js, AWS RDS MySQL, SQL

Analysis:

Unsupervised Machine Learning Model - Hierarchical Clustering, Correlation

Engineering:

Front-end: D3.js, Vue, Vue Router, Vuetify, JavaScript
Back-end & DevOps: Node.js, Express.js, Docker, Docker Compose, Nginx, Webpack, AWS

About Nice to Meet You

Nice to Meet You combines design, statistics, machine learning, and advanced engineering to respond, through data visualization, to current political and economic events and inter-group conflicts worldwide. The foundational visualization, called "Who Are You?", analyzes diversity and heterogeneity across 250 countries and territories, paving the way for subsequent visualizations.

What happens if we reconstruct the map of the world based on how closely countries resemble each other rather than on their geography? What happens if we distill countries to their levels of religious, linguistic, and ethnic diversity – the three main dividers of society? Who are our neighbors then? And is there anything we can learn from each other?

At Nice to Meet You, we visualize measures of ethnic, linguistic, religious, and cultural heterogeneity across 250 countries and territories in the world. We intuitively understand that the language we are born speaking, the religion we follow, and the color of our skin predetermine a significant part of our lives and of who we are. The goal is to use this intuition to learn more about different societies via these proxy variables, and to test whether the complexity of each country correlates with its economic outcomes, political system, geographic location, religion, or level of moral freedom.

We further employ a clustering algorithm and create a taxonomy that organizes the countries in our world by their shared complexity (heterogeneity vs homogeneity). As a result, a new world emerges, one in which our closest neighbors are not the ones sharing our borders but countries located thousands of miles away. The combined analysis and visualizations lead us to the conclusion that there is ultimately more that binds us than divides us.

About the Data Visualization Lab

During the past year, I have been dedicated to realizing a longstanding vision of mine: creating a data visualization lab. In this lab, I can apply my experience and skills in data visualization to promote a data-driven approach to some of the world's most pressing challenges. These range from the climate of political and inter-group division that is aggressively taking over countries around the world, to environmental policy and economic uncertainty. This space is called “Nice to Meet You”, and it aims to challenge us to rethink familiar issues and approach them from a new perspective.

Diversity Profile
For each of the 250 countries and territories, we build a diversity profile. Each axis represents one of the diversity measures: R (Religious), E (Ethnic), L (Linguistic), C (Cultural), and D (Diversity Score). D is the average of religious, ethnic, and linguistic diversity. We then plot the diversity profile against the world average. The graph animates on hover and shows all labels and values, which keeps the interface uncluttered and shows the information only when users request it.
Further, we plot median age against population size to provide context and a sense of magnitude. The final chart, Religious Following, shows the breakdown of religions within each country.
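A profile and its world-average baseline could be computed roughly as follows. This minimal sketch assumes each country record carries its R, E, L, and C values on the 0 to 1 scale with no missing values. The record shape and the function names (`diversityScore`, `buildProfiles`, `deviationFromWorld`) are hypothetical.

```javascript
// Hypothetical record shape: { name, R, E, L, C }, each measure on a 0-1 scale.
// Assumes complete data; records with missing measures would need filtering first.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// D (Diversity Score): the average of religious, ethnic, and linguistic diversity.
function diversityScore({ R, E, L }) {
  return mean([R, E, L]);
}

// Adds D to every country and computes the world average on each axis,
// which serves as the baseline each profile is plotted against.
function buildProfiles(countries) {
  const profiles = countries.map((c) => ({ ...c, D: diversityScore(c) }));
  const worldAverage = {};
  for (const axis of ["R", "E", "L", "C", "D"]) {
    worldAverage[axis] = mean(profiles.map((p) => p[axis]));
  }
  return { profiles, worldAverage };
}

// Signed gap between one country and the world average on each axis.
function deviationFromWorld(profile, worldAverage) {
  const gaps = {};
  for (const axis of Object.keys(worldAverage)) {
    gaps[axis] = profile[axis] - worldAverage[axis];
  }
  return gaps;
}
```

Positive gaps mark the axes on which a country is more diverse than the world average, and negative gaps mark the axes on which it is less diverse.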

Context and Trends
Each country has contextual graphs associated with it that help us better understand both its demographics and other relevant factors. The project is in beta, and in the current release the first contextual chart is Religious & Ethnic Trends. Users can switch between absolute and relative scales to easily comprehend both change over time, encoded in the slopes, and magnitude, encoded in the height of each bar.
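The absolute/relative toggle can be treated as two views of the same rows. This minimal sketch assumes one row per year with an absolute count per group. In relative mode, each count becomes its share of that year's total. The row shape and the names `toRelative` and `seriesFor` are hypothetical.

```javascript
// Hypothetical input row: { year: 2000, GroupA: 120, GroupB: 30, ... } (absolute counts).
function toRelative(rows, groups) {
  return rows.map((row) => {
    const total = groups.reduce((sum, g) => sum + (row[g] || 0), 0);
    const out = { year: row.year };
    for (const g of groups) {
      // Share of that year's total; 0 when the year has no counts at all.
      out[g] = total > 0 ? (row[g] || 0) / total : 0;
    }
    return out;
  });
}

// The chart reads from whichever series matches the current scale toggle.
function seriesFor(rows, groups, scale) {
  return scale === "relative" ? toRelative(rows, groups) : rows;
}
```

Deriving the relative view from the absolute rows keeps both modes consistent when users switch, with no second dataset to keep in sync.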

A synchronized tooltip allows users to follow the trends among all groups and interact with the dashboard as a whole rather than having to request labels for each group individually.
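One way to drive a synchronized tooltip is to map the pointer position to a single year, then look up the nearest data point in every group's series at once. This minimal sketch in plain JavaScript assumes each series is an array of `{ year, value }` objects sorted by year. The names `nearestIndex` and `syncedTooltip` are hypothetical. In d3, `d3.bisector` serves the same lookup, and a continuous scale's `invert` can turn the pointer's x-coordinate into a year.

```javascript
// Binary search: index of the point whose year is closest to `year`.
// Assumes `points` is non-empty and sorted by year.
function nearestIndex(points, year) {
  let lo = 0;
  let hi = points.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (points[mid].year < year) lo = mid + 1;
    else hi = mid;
  }
  // lo is the first point with year >= target (or the last point);
  // prefer the left neighbor when it is at least as close.
  if (lo > 0 && year - points[lo - 1].year <= points[lo].year - year) return lo - 1;
  return lo;
}

// One hover position yields tooltip rows for every group simultaneously.
function syncedTooltip(seriesByGroup, year) {
  const rows = [];
  for (const [group, points] of Object.entries(seriesByGroup)) {
    if (points.length === 0) continue; // skip groups with no data
    const p = points[nearestIndex(points, year)];
    rows.push({ group, year: p.year, value: p.value });
  }
  return rows;
}
```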

Similarity View
We then use an unsupervised machine learning algorithm, hierarchical clustering, to group similarly diverse countries together. The algorithm is tuned to be highly sensitive, so countries grouped together are within a small margin of each other.
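The text does not specify the linkage method or the distance metric. The sketch below is a naive agglomerative (bottom-up) hierarchical clustering that assumes Euclidean distance and average linkage and stops merging when `k` clusters remain. All names are hypothetical. For about 250 countries, the O(n³) loop is still cheap. A "highly sensitive" setting would presumably correspond to cutting the tree low, which means a larger `k`. In R, the same technique is available through `hclust` and `cutree`.

```javascript
// Euclidean distance between two feature vectors, e.g. [R, E, L].
function euclidean(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// Average linkage: mean pairwise distance between members of two clusters.
function averageLinkage(ca, cb, points) {
  let total = 0;
  for (const i of ca) for (const j of cb) total += euclidean(points[i], points[j]);
  return total / (ca.length * cb.length);
}

// Start with one cluster per point and repeatedly merge the closest pair
// until k clusters remain. Returns arrays of point indices.
function agglomerative(points, k) {
  let clusters = points.map((_, i) => [i]);
  while (clusters.length > k) {
    let best = { d: Infinity, a: -1, b: -1 };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const d = averageLinkage(clusters[a], clusters[b], points);
        if (d < best.d) best = { d, a, b };
      }
    }
    const merged = clusters[best.a].concat(clusters[best.b]);
    clusters = clusters.filter((_, i) => i !== best.a && i !== best.b);
    clusters.push(merged);
  }
  return clusters;
}

// Cluster id per point, convenient for coloring or grouping countries.
function labelsFrom(clusters, n) {
  const labels = new Array(n);
  clusters.forEach((members, id) => members.forEach((i) => (labels[i] = id)));
  return labels;
}
```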

The number of clusters was chosen by balancing two goals. The first was precision: the desire for more clusters and therefore more precise “neighborhoods”. The second was meeting the standard for good models, judged by common evaluation metrics for machine learning models. These are WSS (Within Sum of Squares), which evaluates the compactness of each cluster; BSS (Between Sum of Squares), which identifies the separation between clusters; and TSS (Total Sum of Squares), which relates the two (TSS = WSS + BSS). The goal was to achieve a high R² (BSS/TSS). The final clusters met that goal, explaining between 98% and 99% of the variance in the underlying data, depending on which diversity measure the clustering was applied to. Finally, the silhouette coefficient was used to determine the recommended number of clusters for each of the interval-ratio variables the model was applied to.
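These evaluation metrics follow standard definitions. WSS sums the squared distances from each point to its cluster centroid. BSS sums each cluster's size times the squared distance from its centroid to the overall centroid. The two add up to TSS, and R² = BSS / TSS. The mean silhouette compares each point's average distance to its own cluster (a) with its average distance to the nearest other cluster (b). This minimal sketch of both computations assumes points are numeric arrays and `labels` holds one cluster id per point. The function names are hypothetical.

```javascript
const squaredDistance = (a, b) => a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);
const distance = (a, b) => Math.sqrt(squaredDistance(a, b));

// Component-wise mean of a non-empty list of points.
function centroid(rows) {
  const c = new Array(rows[0].length).fill(0);
  for (const r of rows) r.forEach((v, i) => (c[i] += v / rows.length));
  return c;
}

// Groups points by their cluster label.
function byLabel(points, labels) {
  const groups = new Map();
  points.forEach((p, i) => {
    if (!groups.has(labels[i])) groups.set(labels[i], []);
    groups.get(labels[i]).push(p);
  });
  return groups;
}

// WSS: compactness, BSS: separation, TSS = WSS + BSS, R² = BSS / TSS.
function sumsOfSquares(points, labels) {
  const grand = centroid(points);
  let wss = 0;
  let bss = 0;
  for (const members of byLabel(points, labels).values()) {
    const c = centroid(members);
    for (const p of members) wss += squaredDistance(p, c);
    bss += members.length * squaredDistance(c, grand);
  }
  const tss = points.reduce((s, p) => s + squaredDistance(p, grand), 0);
  return { wss, bss, tss, r2: bss / tss };
}

// Mean silhouette over all points; requires at least two clusters.
// s(i) = (b - a) / max(a, b); singleton clusters contribute 0 by convention.
function meanSilhouette(points, labels) {
  const ids = [...new Set(labels)];
  const meanDistanceTo = (i, id) => {
    const others = points.filter((_, j) => j !== i && labels[j] === id);
    return others.reduce((s, q) => s + distance(points[i], q), 0) / others.length;
  };
  let total = 0;
  points.forEach((_, i) => {
    if (labels.filter((l) => l === labels[i]).length === 1) return;
    const a = meanDistanceTo(i, labels[i]);
    const b = Math.min(...ids.filter((id) => id !== labels[i]).map((id) => meanDistanceTo(i, id)));
    total += (b - a) / Math.max(a, b);
  });
  return total / points.length;
}
```

To pick the number of clusters, one would evaluate candidate cuts for several values of k. A high mean silhouette would then be weighed against a high R². The text does not say exactly how the two were traded off.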

Each cluster has a description associated with it, guiding users in how to interpret both the diversity profiles and the clustering model.

Complexity and User Experience
Given the complexity of the data involved, it was crucial to enable users to understand both how the data were originally collected and measured, and how they are used in the visualization. All diversity measures are on a scale from 0 to 1 and reflect probabilities. This concept requires context because it is not universally understood. Descriptions of the variables are embedded within the dropdown menus, so users can understand what they are looking at even before making a selection. Further information about each indicator and its data sources is also embedded and accessible via the (?) icon next to each variable.
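The text does not give the formula behind these 0 to 1 probabilities. A common definition for ethnic, linguistic, and religious diversity is the fractionalization index. It is the probability that two randomly selected individuals belong to different groups: 1 - Σ sᵢ², where sᵢ is the share of group i. A score of 0.75, for example, means a 75% chance that two random residents belong to different groups. This minimal sketch assumes that definition, and the function name is hypothetical.

```javascript
// Fractionalization: probability that two randomly selected individuals belong
// to different groups, F = 1 - sum(s_i^2). Accepts shares or raw counts;
// values are normalized so they sum to 1 before squaring.
function fractionalization(groupSizes) {
  const total = groupSizes.reduce((s, x) => s + x, 0);
  return 1 - groupSizes.reduce((s, x) => s + (x / total) ** 2, 0);
}
```

A fully homogeneous country (one group) scores 0. The score approaches 1 as the population splits into many small groups.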

This was possibly one of the more challenging aspects of the information architecture. It was crucial to enable users to easily understand what they are looking at while keeping the UI compact and not overwhelming them.

Individual Measures and Summary
In addition to the “Big Picture” view, which clusters similar countries together based on all diversity measures, users can interact with each individual measure independently. Here we show ethnic diversity. At the top, a summary visualization shows how countries cluster based on ethnic diversity specifically. A number of countries are preselected to help users read the visualization, and a dropdown menu lets them select and deselect countries and highlight them on the grid. The dropdown list of countries is sorted in the same order as the clustering, from high to low diversity, which enhances its utility. Just by interacting with it, users gain a sense of how diverse each country is, judging by both its rank number and its position in the list. To keep it user-friendly, the list is also searchable and updates in real time as users type each new letter.
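This minimal sketch shows the searchable dropdown behavior. It assumes the country list arrives already sorted in clustering order (high to low diversity). Ranks are assigned before filtering, so each country keeps its rank number while the user types. Matching is a case- and accent-insensitive substring test that preserves the list order. All names are hypothetical.

```javascript
// Input: countries already sorted by clustering order (high to low diversity).
// Rank is fixed up front so filtering never renumbers countries.
function rankCountries(sortedCountries) {
  return sortedCountries.map((c, i) => ({ ...c, rank: i + 1 }));
}

// Lowercase and strip diacritics, so "cote" matches "Côte d'Ivoire".
function normalizeName(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Re-run on every keystroke; Array.prototype.filter keeps the cluster order.
function filterCountries(ranked, query) {
  const q = normalizeName(query.trim());
  if (q === "") return ranked;
  return ranked.filter((c) => normalizeName(c.name).includes(q));
}
```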

Summary • Diversity Profiles • Shortcuts
Users can hover over the summary visualization to reveal more information about each country, or click on a country to see the diversity profiles of that country and the other countries in its cluster. Taking the United States as an example, we see that it is closest in ethnic diversity to Nicaragua and Morocco. All three have slightly above-average ethnic diversity.

The population and median age chart tells us that the US has a much larger population than the other two countries. In fact, about 10 countries are so much more populous than the rest of the world that their population circles are intentionally cut off, so that their magnitude can still be displayed accurately. The United States' median age is also higher than Nicaragua's and Morocco's.
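The document does not describe how the circles are sized. One plausible approach makes circle area proportional to population, so the radius grows with the square root of population, as `d3.scaleSqrt` does. Under that assumption, circles whose true radius exceeds the available space are flagged, then drawn at full size and clipped (for example with an SVG `clipPath`). Names and parameters in this sketch are hypothetical.

```javascript
// Area-proportional radius: r grows with the square root of population,
// so a country with 4x the population gets 4x the area (2x the radius).
function radiusFor(population, referencePopulation, referenceRadius) {
  return referenceRadius * Math.sqrt(population / referencePopulation);
}

// Circles larger than maxRadius keep their true radius but are flagged
// for clipping, so the visible arc still conveys their magnitude.
function sizeCircles(countries, { referencePopulation, referenceRadius, maxRadius }) {
  return countries.map((c) => {
    const r = radiusFor(c.population, referencePopulation, referenceRadius);
    return { ...c, r, clipped: r > maxRadius };
  });
}
```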

Grouping
In addition to the interval-ratio (numeric) variables, users can also closely examine the ordinal (descriptive) variables in the dataset. One such variable is political regime; here, instead of clusters, we talk about groups. The plotting logic is identical, but rather than employing machine learning, we simply plot countries belonging to the same group together.
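Because the plotting logic is shared, grouping can produce the same shape as a clustering result: an ordered list of groups with their member countries. This minimal sketch assumes each country record stores the ordinal variable as a string field and that the caller supplies the level order. The names and level labels are hypothetical.

```javascript
// Group countries by a categorical field. `order` fixes the group order (e.g. the
// ordinal levels of political regime); within a group, input order is preserved.
function groupCountries(countries, field, order) {
  const groups = new Map(order.map((level) => [level, []]));
  for (const c of countries) {
    const level = c[field];
    if (level == null) continue; // no data for this variable
    if (!groups.has(level)) groups.set(level, []); // unlisted levels go last
    groups.get(level).push(c);
  }
  // Same shape as a clustering result, so the same plotting code can render it.
  return [...groups]
    .filter(([, members]) => members.length > 0)
    .map(([level, members]) => ({ level, members }));
}
```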

In the next release, users will also be able to answer questions such as why the US is classified as a democracy rather than a full democracy.

Search
A search menu is available for all indicators. It changes dynamically based on which measure users are looking at and reflects the clustering order. Once again, this is done to enhance users' ability to orient themselves within the clusters and think outside the geographical box. There are also alphabetical and regional views, which let users look at the countries in more traditional ways.

Further, each country within the search menu has a label, “has data” or “no data”, which immediately tells users whether the database has information on the current indicator for the country they are looking for. We chose not to hide countries that lack data for a specific indicator for the sake of inclusiveness: we wanted as many people as possible to see their country on the list. Clicking on a country that has no data takes users to the alphabetical view, which shows the entire database without any filtering. This approach allowed us to include many small islands and territories, and even Antarctica, on the list. These places are typically left out of data visualization projects because there are not enough data for them.
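This minimal sketch covers the three search views and the no-data fallback. It assumes each country record stores the current indicator as a number or `null`, and that the clustering order arrives as a list of names. Every country is kept; the `hasData` flag only drives the label and the click behavior. All names are hypothetical.

```javascript
// Hypothetical shapes: countries = [{ name, region, <indicator>: number | null }],
// clusterOrder = country names in clustering order for the current indicator.
function buildSearchViews(countries, indicator, clusterOrder) {
  const entries = countries.map((c) => {
    const hasData = c[indicator] != null;
    return { name: c.name, region: c.region, hasData, tag: hasData ? "has data" : "no data" };
  });
  const byName = (a, b) => a.name.localeCompare(b.name);
  const position = new Map(clusterOrder.map((name, i) => [name, i]));
  const rankOf = (e) => (position.has(e.name) ? position.get(e.name) : Infinity);

  // Clustering order first; countries outside it (e.g. no data) follow alphabetically.
  // Infinity - Infinity is NaN, which is falsy, so ties fall through to byName.
  const clustered = [...entries].sort((a, b) => rankOf(a) - rankOf(b) || byName(a, b));
  const alphabetical = [...entries].sort(byName);
  const byRegion = {};
  for (const e of alphabetical) {
    if (!byRegion[e.region]) byRegion[e.region] = [];
    byRegion[e.region].push(e);
  }
  return { clustered, alphabetical, byRegion };
}

// Selecting a no-data country falls back to the unfiltered alphabetical view.
function routeFor(entry) {
  return entry.hasData
    ? { view: "profile", country: entry.name }
    : { view: "alphabetical", country: entry.name };
}
```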

Correlation
There are seven hypothesized correlates of diversity that users can color-code by: Income Group, Human Development Index, Political Regime, Main Religion, Level of Moral Freedom, Region, and Electoral Democracy.

These variables may be affected by, or may affect, diversity and population size, and they also interact with each other. For example, in theory, the level of moral freedom of a country's citizens should be correlated with the country's political regime and main religion. This hypothesis, and many others, can easily be tested with this project.

For the curious: the level of moral freedom does correlate strongly with the religion of the majority (main religion). It is also true that the more a regime deviates from democracy, the more likely its citizens are to have lower levels of moral freedom. Find out the details here and here.
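The text does not say which statistic underlies these findings. For two categorical variables, such as main religion and level of moral freedom, one common measure of association is Cramér's V. It is derived from the chi-squared statistic of their contingency table: V = √(χ² / (n · (min(r, c) - 1))). V ranges from 0 (no association) to 1 (perfect association). This minimal sketch is an illustration, not the project's method.

```javascript
// Cramér's V for two parallel arrays of categorical values.
function cramersV(xs, ys) {
  const n = xs.length;
  const rowLevels = [...new Set(xs)];
  const colLevels = [...new Set(ys)];
  // Contingency table of observed counts.
  const table = rowLevels.map(() => new Array(colLevels.length).fill(0));
  xs.forEach((x, i) => {
    table[rowLevels.indexOf(x)][colLevels.indexOf(ys[i])] += 1;
  });
  const rowSums = table.map((row) => row.reduce((s, v) => s + v, 0));
  const colSums = colLevels.map((_, j) => table.reduce((s, row) => s + row[j], 0));
  // Pearson chi-squared against the counts expected under independence.
  let chi2 = 0;
  table.forEach((row, i) =>
    row.forEach((observed, j) => {
      const expected = (rowSums[i] * colSums[j]) / n;
      chi2 += (observed - expected) ** 2 / expected;
    })
  );
  const k = Math.min(rowLevels.length, colLevels.length);
  return k > 1 ? Math.sqrt(chi2 / (n * (k - 1))) : 0;
}
```

For pairs in which both variables are ordinal, such as degree of democracy and level of moral freedom, a rank correlation (e.g. Spearman's) would be a natural alternative.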

Performance
Even with the heavy computation handled on the backend, resource caching, and other performance optimizations, a project of this magnitude is very taxing for browsers. The DOM is constantly updated and manipulated. The single-page application setup (Vue) speeds things up immensely on its own, but that was not sufficient. During development, I reached a point where the initial load took over 30 seconds, which was not acceptable.

Many data visualization projects have their own performance limits and optimization needs, but this was one of the most challenging and interesting problems I have had to solve. I applied multiple optimization techniques, resolved bottlenecks, tweaked sorting algorithms, and added infinite scroll. Performance has since improved considerably: the initial load now takes a couple of seconds, plus the time to fetch the data from the database, depending on the browser and internet speed. There is still more that can and will be done as the project grows in complexity.
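Infinite scroll avoids rendering every row up front. This minimal sketch covers the batching logic and assumes the list is already sorted. `createPager` is a hypothetical helper. The browser wiring, shown as comments, assumes an `IntersectionObserver` watching a sentinel element at the bottom of the list; `renderRows` and `#list-sentinel` are placeholder names.

```javascript
// Hands out consecutive slices of a long list, one batch per call.
function createPager(items, pageSize) {
  let offset = 0;
  return {
    next() {
      const batch = items.slice(offset, offset + pageSize);
      offset += batch.length;
      return batch;
    },
    hasMore() {
      return offset < items.length;
    },
  };
}

// Browser wiring (sketch only, not executed here):
// const pager = createPager(countries, 30);
// renderRows(pager.next()); // first batch on initial load
// const observer = new IntersectionObserver((entries) => {
//   if (entries[0].isIntersecting && pager.hasMore()) renderRows(pager.next());
// });
// observer.observe(document.querySelector("#list-sentinel"));
```

Only the first batch touches the DOM on initial load. Later batches are appended as the user scrolls, which spreads the rendering cost over time.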

Development and Backend
I had hosted on AWS before, but with a much simpler setup: an EC2 instance with MySQL installed directly on the operating system (Linux). However, Nice to Meet You has very specific long-term needs. I therefore opted to learn and use Docker and Docker Compose, and to host the project as a multi-container Docker environment on AWS Elastic Beanstalk (EB). Overall, it was a great learning experience where servers and proxies are concerned, and the above diagram shows the setup (work in progress). I find Docker immensely powerful and logical. After the initial investment and the steep AWS learning curve, it greatly facilitated the development experience. I am looking forward to learning more about it and using it in other projects.
