KX in the Public Cloud: Autoscaling using kdb+

By Rebecca Kelly

As acceptance of enterprise migration to the cloud increases, many KX financial services customers are evaluating moving data, applications and other aspects of their businesses to the public cloud. KX has been preparing for this shift for years, and is in a position to offer kdb+ on the Google Cloud Platform (GCP) and the Amazon Web Services platform (AWS), with more to come.

If you are a programmer considering moving data and applications to the cloud, there are several things you should know about maximizing the performance of kdb+. In this blog, I will give a basic example of autoscaling code for cloud installations using kdb+. You can also watch the live demo of this code that I gave at a recent KX Meetup in New York City. Since that demo in February 2019, the code has been enhanced to include both AWS and GCP, and is available in my GitHub repository.

For most programmers, using the cloud is all about data at scale. You want to know how to scale in the most efficient way possible, and how to make best use of the resources you have available. The latest version of kdb+ has a number of features you may not be familiar with that I’d like to highlight, as they will enable you to get more from kdb+ on the cloud.

Deferred response (-30!)

One of the most significant changes in kdb+ v3.6 for cloud use is deferred response. It matters because it lets an application handle multiple synchronous client requests at once without multi-threading. A server can defer its reply to a synchronous request, carry on processing other messages, and respond once the result is ready, so clients keep simple synchronous calls while the server handles them asynchronously without creating blockages.
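The idea can be modeled in a few lines. The following is a minimal sketch in plain Python, not the kdb+ API: the class and method names are hypothetical, and in kdb+ itself the deferral is done with -30!. It only shows how a single-threaded gateway can accept a second synchronous request while the first is still waiting for its answer.

```python
from collections import deque


class DeferredGateway:
    """Single-threaded model of a gateway that defers replies (hypothetical names)."""

    def __init__(self):
        self.pending = {}      # request id -> client still waiting for a reply
        self.outbox = deque()  # (client, result) replies ready to be sent
        self.next_id = 0

    def on_request(self, client, query):
        """Accept a sync request, hand it off, and return without replying."""
        request_id = self.next_id
        self.next_id += 1
        self.pending[request_id] = client
        return request_id  # the worker echoes this id back with its result

    def on_worker_result(self, request_id, result):
        """Complete a deferred request once the worker has finished."""
        client = self.pending.pop(request_id)
        self.outbox.append((client, result))


gw = DeferredGateway()
a = gw.on_request("client_a", "slow query")
b = gw.on_request("client_b", "fast query")  # accepted while client_a still waits
gw.on_worker_result(b, 2)                    # results may arrive out of order
gw.on_worker_result(a, 1)
replies = list(gw.outbox)
```

Each client still sees an ordinary request followed by a reply; only the gateway knows that the replies were held back and sent in a different order.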

Dynamic slave assignment (\s)

Another recent change that enables easy scaling in kdb+ is the ability of users to change the number of slaves after initialization. When running a kdb+ instance, the number of slave threads can be adjusted dynamically up to the maximum specified on the command line. This is particularly handy when using the kdb+ on-demand offering, which is charged on a per-core per-minute basis. This offers you the flexibility to start a process with 20 slaves, and instantly scale down to one, only expanding as needed, based upon workflows.

Autoscaling with kdb+

To take full advantage of the cost efficiencies of the cloud, it is best if your application scales along with your workload. To do this, you need to understand what ‘load’ means to you. Load can mean the CPU load on your system; or it can mean offering high availability to your users; or, you may have other load considerations. In all cases, you want to be able to access your cloud infrastructure in efficient bursts of activity, so that you are only paying for the compute you use.

My live demonstration at the KX Meetup shows how quickly kdb+ instances can spin up and spin down on the cloud, depending on user load.

Coded Autoscaling Example

A basic example, provided in my GitHub, assumes a setup where there is an initial gateway (lb_gw) which routes queries to connected HDB instances, and also monitors the load on these HDBs in order to start new instances as load increases.

In this system, an HDB counts as loaded when an end user's query would wait more than 100ms before executing. Incoming queries are assumed to take 50ms on average, so two queued queries already account for a 100ms wait. When more than two queries are queued against a connected HDB process, a new instance is started to deal with any subsequent queries.
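As a rough illustration, the rule can be written as a small routing function. This is a sketch in Python rather than the q code in the repository, and it makes one assumption the text does not state: the gateway looks at the least busy HDB and asks for a new instance only when even that one is over the threshold. The function and variable names are hypothetical.

```python
AVG_QUERY_MS = 50   # assumed average execution time, from the text
MAX_WAIT_MS = 100   # wait time beyond which an HDB counts as loaded


def expected_wait_ms(queued):
    """Estimated wait for a new query behind `queued` constant-time queries."""
    return queued * AVG_QUERY_MS


def route(queue_depths):
    """Return the HDB to send the next query to, or None to start a new instance.

    queue_depths maps a (hypothetical) HDB name to its number of queued queries.
    """
    if not queue_depths:
        return None  # nothing connected yet, so an instance is needed
    name = min(queue_depths, key=queue_depths.get)  # least busy HDB
    if expected_wait_ms(queue_depths[name]) > MAX_WAIT_MS:
        return None  # even the least busy HDB would exceed the wait limit
    return name
```

With two queries queued the expected wait is exactly 100ms, which is still acceptable; a third pushes it to 150ms and triggers a new instance.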

This can be extended beyond the assumption of constant-time queries by benchmarking APIs to understand how they scale for various parameter inputs, which provides a lookup of expected execution times that can be used instead. A more advanced example may look at predicting query execution times based on historical behavior and scaling to accommodate potential demand.
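A lookup of this kind might look like the sketch below. The API names and timings are invented purely for illustration and are not benchmark results; real values would come from measuring your own APIs. The sketch rounds a request up to the next benchmarked size, which is a deliberately conservative assumption.

```python
from bisect import bisect_left

# Hypothetical benchmark table: API name -> sorted (date_range_days, ms) pairs.
# These figures are made up for illustration only.
BENCHMARKS = {
    "getTrades": [(1, 20), (5, 60), (30, 400)],
    "getQuotes": [(1, 40), (5, 150), (30, 900)],
}
DEFAULT_MS = 50  # fall back to the constant-time assumption for unknown APIs


def expected_ms(api, days):
    """Expected execution time, rounding up to the next benchmarked size."""
    points = BENCHMARKS.get(api)
    if not points:
        return DEFAULT_MS
    sizes = [size for size, _ in points]
    # Beyond the largest benchmark, reuse the largest measured time.
    i = min(bisect_left(sizes, days), len(points) - 1)
    return points[i][1]


def queue_wait_ms(queue):
    """Total expected wait behind a queue of (api, days) requests."""
    return sum(expected_ms(api, days) for api, days in queue)
```

The scaling decision then compares the summed estimate for a queue against the 100ms limit, instead of counting queries and multiplying by 50ms.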

Please refer to my GitHub for complete code and setup, and send me feedback to rebecca@devweb.kx.com.

Check back for future articles about KX on the Public Cloud!
