kdb Products
Overview
kdb+
kdb Insights
kdb Insights Enterprise
Capabilities
The Data Timehouse
kdb+ Time Series Database
PyKX Python Interoperability
Services & Support
Financial Services
Quant Research
Trading Analytics
Industry & IoT
Automotive
Energy & Utilities
Healthcare & Life Sciences
Manufacturing
Telco
Learn
Overview
Featured Courses
KX Academy
KX University Partnerships
Connect
KX Community
Community Events
Developer Blog
Build
Download
Documentation
Support
About Us
Partner with Us
Become a Partner
Find a Partner
Partner Signup
Join Us
Connect with Us
By Declan Fallon
Today, we will work with the charting component of KX Dashboards. Dashboards is an interactive data visualization tool for use by non-technical and power users to easily query, transform and present both static and real-time streaming data in informative and visually appealing formats. The WYSIWYG interface is familiar to any user of desktop software, with basic drag-and-drop, point-and-click operations used to design visualizations.
For a little additional flavour will we also take on board this month’s #SWDChallenge. The #SWDChallenge is an informal monthly exercise in data visualization, run by data specialist author and podcaster, Cole Nussbaumer Knaflic. A topic is announced and participants have a couple of weeks to approach the challenge and tell the story through data visualization. The focus of this month’s challenge is to introduce a visualization tool, so we will look to introduce KX Dashboards.
For the challenge we will look at the popularity of baby names over the various decades, dating back to the 1880s, as made available by the U.S. Social Security Administration website[1]. The website offers the results in a standard list of top 200 names but let’s see if we can glean a little more information from this.
The website offers data for each decade going back to the 1880s, so this will give us a chance to see patterns in naming over time. As a starting point I have aggregated the data into a kdb+ database. To focus the article a little more I’m going to show the top 20 girls and boys names for the current decade.
We are going to start with a Canvas Chart component. Drag this into your Dashboard and add a new Layer. Then we need to create a Data Source.
In my query I have an included a selection variable which will allow me to create subsets from my parent data for each decade by changing this value. These subsets will form the different chart layers added later. I will also set the Max rows variable to 20 so I only return the top 20 results; if you wanted to pull all top 200 results you can leave this value unchanged at the default value. For more information on creating a query inside KX Dashboards see my opening post in this Insights series.
Next step is to configure the Canvas Chart by defining the Axes for what we are charting on the x- and y-axis. We will also make this a horizontal chart, so we also need to enable Transpose.
We have a chart with our first layer showing the top 20 names. We need to add a new layer for the 2000s decade and additional layers for each decade back to 1880. To do this we copy the query, and paste to create a new data source query. We can then change the input value for seldec, subtracting ten for each decade, to generate the data set for the number of babies named in that decade. For example, for the 2000 decade we have:
This will add a new layer to the chart. We also need to stack the bars in the chart, this is done for both the Y- and X-Axes.
We keep on building our layers so we have a data set for our top 20 covering each decade back to 1880. This will give us something like this:
This is a quite a busy chart and is somewhat confusing, so we need to make some changes. First we have to reorder the layers so that the left-most bar is the one for number of babies named in the current decade. This can be done with a right-click-and-drag reordering of the layers; the top layer will be for 1880, moving down in sequence to the current decade, 2010.
Now we can see the semblance of order for the current decade with Emma the most popular name for the current decade and Grace the least, but the chart still remains confusing with the large number of colors.
Next we are going to emphasis the left most bar in our sequence, and make all other decades a grey color.
We can use a view state parameter to assign our color code. This way we don’t have to go back to the color palette each time to select the color, and we can keep the continuity of the color.
For the other decades, we will assign Background Color and Border Color to a Grey (#d3d3d3). This leaves us with:
We now have a clearer picture of the name ranking for the current decade but we need to return the differentiation for each decade. To do this, we will use opacity gradient to emphasis each of the decades from heaviest for the current decade, to lightest, the 1880s. The gradient is an integer (Type: int) between 0 to 100.
So the following gradients were applied for each decade:
Now we have a significantly heavier emphasis on the current decade (with color) and a long tail of data. After 1950s, the grays become very light, so I have removed the gridlines by reducing their opacity to zero as these dominate a little too much.
Now we have a chart looking like this:
But with Dashboards we can add a little more flavor. Next we will add a color to show in previous decades if any of the 20 names appeared in the top 20 for that decade. For this, we will use a Highlight Rule for each data Layer in the chart. The highlight rule will apply a wild card search to each names ranking for that decade and apply the color to that segment of the bar where that name made the top 20. This is our rule:
Now we have something like this:
Using two Text components we can add some flavor text and a title to our chart. We will have a title text box with a gray background (the same as used in the chart)
And a flavor text, where the HTML editor of the Text component is used to change the color of the highlight text to match the color of the bar chart.
Now are our chart image looks like this:
For comparison, here is the one for Top 20 Boys Names. To create this I duplicated the screen, switched the plotted data columns from Girlrank to Boyrank, and then changed the color set in our view state parameter to a blue – this single change will apply across the chart for all instances where the view state parameter is used.
What can information can we draw from these charts?
Names like William, Michael, James, David, Joseph and Daniel have remained popular for boys throughout the decades – although not as popular as they were in earlier decades. Only Elizabeth has maintained a presence in the top 20 for girls.
[1] https://www.ssa.gov/oact/babynames/decades/names2000s.html