Benchmarking Specialized Databases for High-frequency Data

In the era of big data, industries such as finance and manufacturing generate large volumes of time series data every day. This data, unlike classical relational data, is updated more frequently, with new data points continuously generated. This necessitates the need for specialized time series databases that meet the stringent requirements of read/write throughput and latency.

In a recent paper titled “Benchmarking Specialized Databases for High-frequency Data” researchers Fazl Barez (University of Edinburgh), Paul Bilokon, and Ruijie Xiong (Imperial College London) conducted a comprehensive performance benchmark evaluation of four specialized solutions: ClickHouse, InfluxDB, kdb+, and TimescaleDB.

Setup

Using a personal PC, equipped with an Intel i7-1065G7 CPU and 16GB of RAM, the researchers defined a scenario in which each database would be evaluated to store trade and order book records derived from a 25GB cryptocurrency feed.

Benchmarks were divided into five categories: Trades, Order Book, Complex Queries, Writing, and Storage Efficiency.

The main metric for each benchmark was the query latency, or the amount of time between sending the query to the server and receiving the results. Query latency was derived from the built-in timers of each database solution.

The process was automated via Python, which made a connection with the database hosted on the local machine. All databases saved their data in SSD and adequate memory was made available during execution. It should be noted that each database was tested individually, and results were based on the average result from 10 executions.

Results

As illustrated in the table below, the results of the experiment concluded that kdb+ had the best overall performance among all tested solutions and therefore the most suitable database for financial analysis application of time series data.

kdb+ was able to quickly ingest data in bulk, with good compression ratios for storage efficiency. It also demonstrated low query latency for all benchmarks including read queries, computationally intensive queries, and complex queries.

Benchmark kdb+ InfluxDB TimescaleDB ClickHouse
 Query Latency (ms)
Write perf (day trades\order book data to persistent storage) 33889 324854 53150 765000
Avg trading volume 1 min\Day 81 93 469 272
Find the highest bid price in a week 94 146 3938 962
National Best Bid and Offer for day 34 191 1614 454
Volume Weighted Average Price 1 min\Day 75 12716 334 202
Bid/Ask spread over a day 1386 623732 2182 14056
Mid-quote returns 5 min\Day 113 99 1614 401
Volatility as standard deviation of execution price returns 5 min\Day 51 2009 324 190

 

STAC-M3 Financial Services Benchmarking

Tested to be 29x faster than Cassandra and 38x faster than MongoDB, Kdb+ holds the record for 10 out of 17 benchmarks in the Antuco suite and 9 out of 10 benchmarks for the Kanaga suite in the STAC-M3 benchmark, the industry standard for time series data analytics benchmarking in financial services.

STAC-M3 represents one of the only truly independently audited set of data analytics benchmarks available to the financial services industry and acts as a valuable service to FSI businesses by “cutting through” the proprietary technical marketing benchmarks available from each technology vendor.

 

Typical Kdb+/tick Architecture:

 

The kdb+/tick architecture is designed from the ground up to capture, process, and analyse real-time and historical data.

The superior performance of kdb+ in handling high-frequency data underscores the importance of choosing the right database for time-series analysis. And with our latest updates, we have once again pushed the boundaries of speed, achieving an additional 50% reduction in ingestion times through multithreaded data loading and a fivefold increase in performance when working with high-volume connections.

Read the paper here: [2301.12561] Benchmarking Specialized Databases for High-frequency Data (arxiv.org) then try it yourself by signing up for your free 12-month personal edition of kdb+

 

Related Resources

Seven Innovative Trading Apps and 7 Best Practices You Can Steal

Quant Trading Data Management by the Numbers

KDB+ Data Sheet

Discover kdb+ 4.1’s New Features

With the announcement of kdb+ 4.1, we’ve made significant updates in performance, security, and usability, empowering developers to turbo charge workloads, fortify transmissions and improve storage efficiency. In this blog, we’ll explore these new features and demonstrate how they compare to previous versions.

Let’s begin.

Peach/Parallel Processing Enhancements

“Peach” is an important keyword in kdb+, derived from the combination of “parallel” and “each”. It enables the parallel execution of a function on multiple arguments.
In kdb+ 4.1 the ability to nest “peach” statements now exists. Additionally, it no longer prevents the use of multithreaded primitives within the “peach” operation.

It also introduces a work-stealing algorithm technique in which idle processors can intelligently acquire tasks from busy ones. This marks a departure from the previous method of pre-allocating chunks to each thread, and with it, better utilization of CPU cores, which in our own test have resulted in significant reductions in processing time.


q)\s
8i

/Before: kdb+ 4.0
q)\t (inv peach)peach 2 4 1000 1000#8000000?1.
4406

/After: kdb+ 4.1
q)\t (inv peach)peach 2 4 1000 1000#8000000?1.
2035

 

Network Improvements

For large enterprise deployments and cloud enabled workloads, unlimited network connections offer reliability and robust performance at an unprecedented scale. With kdb+ 4.1, user-defined shared libraries now extend well beyond 1024 file descriptors for event callbacks, and HTTP Persistent Connections elevate efficiency and responsiveness of data interactions, replacing the one-and-done approach previously used to reduce latency and optimize resource utilization.

Multithreaded Data Loading

The CSV load, fixed-width load (0:), and binary load (1:) functionalities are now multithreaded, signifying a significant leap in performance and efficiency, this is particularly beneficial for handling large datasets which in our own tests have resulted in a 50% reduction in ingestion times.

Socket Performance

Socket operations have also been enhanced, resulting in a five-fold increase in throughput when tested against previous versions. With kdb+ 4.1 users can expect a significant boost in overall performance and a tangible difference, when dealing with high volume connections.


q)h:hopen `:tcps://localhost:9999
q)h ".z.w"
1004i

/Before: kdb+ 4.0
q)\ts:10000 h "2+2"
1508 512

/After: kdb+ 4.1
q)\ts:10000 h "2+2"
285 512

 

Enhanced TLS Support and Updated OpenSSL

Enhanced TLS and Updated OpenSSL Support guarantee compliance and safeguard sensitive real-time data exchanges. With kdb 4.1 OpenSSL 1.0, 1.1.x, and 3.x, coupled with dynamic searching for OpenSSL libraries and TCP and UDS encryption, offers a robust solution for industries where data integrity and confidentiality are non-negotiable.

Furthermore, TLS messaging can now be utilized on threads other than the main thread. This allows for secure Inter-Process Communication (IPC) and HTTP operations in multithreaded input queue mode. Additionally, HTTP client requests and one-shot sync messages within secondary threads are facilitated through “peach”.

More Algorithms for At-Rest Compression

With the incorporation of advanced compression algorithms, kdb+ 4.1 ensures maximized storage efficiency without compromising data access speed. This provides a strategic advantage when handling extensive datasets.

New q Language Features

Kdb+ 4.1 introduces several improvements to the q language that will help to streamline your code.

  1. Dictionary Literal Syntax
    With dictionary literals, you can concisely define dictionaries. For instance, for single element dictionaries you no longer need to enlist. Compare

    
    
    q)enlist[`aaa]!enlist 123 / 4.0
    aaa| 123
    
    

    With

    
    
    q)([aaa:123]) / 4.1
    aaa| 123
    
    

    This syntax follows rules consistent with list and table literal syntax.

    
    
    q)([0;1;2]) / implicit key names assigned when none defined
    x | 0
    x1| 1
    x2| 2
    
    q)d:([a:101;b:]);d 102 / missing values create projections
    a| 101
    b| 102
    
    q)d each`AA`BB`CC
    a b
    ------
    101 AA
    101 BB
    101 CC
    
    

     

  2. Pattern Matching
    Assignment has been extended so the left-hand side of the colon (:) can now be a pattern.

    
    
    /Traditional Method
    q)a:1 / atom
    
    /Using new method to assign b as 2 and c as 3
    q)(b;c):(2;3) / list
    
    /Pattern matching on dictionary keys
    q)([four:d]):`one`two`three`four`five!1 2 3 4 5 / dictionary
    
    /Assigning e to be the third column x2
    q)([]x2:e):([]1 2;3 4;5 6) / table
    
    q)a,b,c,d,e
    1 2 3 4 5 6
    
    

    Before assigning any variables, q will ensure left and right values match.

    
    
    q)(1b;;x):(1b;`anything;1 2 3) / empty patterns match anything
    q)x
    
    1 2 3
    
    

    Failure to match will throw an error without assigning.

    
    q)(1b;y):(0b;3 2 1)
    'match
    
    q)y
    'y
    
    

     

  3. Type checking
    While we’re checking patterns, we could also check types.

    
    
    /Checks if surname is a symbol
    q)(surname:`s):`simpson
    
    /Checks if age is a short
    q)(name:`s;age:`h):(`homer;38h)
    
    /Type error triggered as float <> short
    q)(name:`s;age:`h):(`marge;36.5) / d'oh
    'type
    
    q)name,surname / woohoo!
    `homer`simpson
    
    

    And check function parameters too.

    
    
    q)fluxCapacitor:{[(src:`j;dst:`j);speed:`f]$[speed<88;src;dst]}
    
    q)fluxCapacitor[1955 1985]87.9
    1955
    
    q)fluxCapacitor[1955 1985]88
    'type
    
    q)fluxCapacitor[1955 1985]88.1 / Great Scott!
    1985
    
    

     

  4. Filter functions

    We can extend this basic type checking to define our own ‘filter functions’ to run before assignment.

    
    
    / return the value (once we're happy)
    
    q)tempCheck:{$[x<0;' "too cold";x>40;'"too hot" ;x]}
    q)c2f:{[x:tempCheck]32+1.8*x}
    q)c2f -4.5
    'too cold
    
    q)c2f 42.8
    'too hot
    
    q)c2f 20 / just right
    68f
    
    

    We can use filter functions to change the values that we’re assigning.

    
    
    q)(a;b:10+;c:100+):1 2 3
    q)a,b,c
    1 12 103
    
    

    Amend values at depth without assignment.

    
    
    q)addv:{([v:(;:x+;:(x*10)+)]):y}
    q)addv[10;([k:`hello;v:1 2 3])]
    k| `hello
    v| 1 12 103
    
    

    Or even change the types.

    
    
    q)chkVals:{$[any null x:"J"$ "," vs x;'`badData;x]}
    q)sumVals:{[x:chkVals]sum x}
    q)sumVals "1,1,2,3,5,8"
    20
    
    q)sumVals "8,4,2,1,0.5"
    'badData
    
    

 

Happy Coding

We hope you are as excited as we are about the possibilities these enhancements bring to your development toolkit. From parallel processing and network scalability to streamlined data loading, secure transmissions, and optimized storage, our commitment is to empower you with tools that make your coding life more efficient and enjoyable.

We invite you to dive in, explore, and unlock the full potential of kdb+ 4.1. Download our free Personal Edition today.

Happy coding!

Related Resources

11 Insights to Help Quants Break Through Data and Analytics Barriers

Book a Demo

The Montauk Diaries – Two Stars Collide

by Steve Wilcockson

 

Two Stars Collide: Thursday at KX CON [23]

 

My favorite line that drew audible gasps at the opening day at the packed KX CON [23]

“I don’t work in q, but beautiful beautiful Python” said Erin Stanton of Virtu Financial simply and eloquently. As the q devotees in the audience chuckled, she qualified her statement further “I’m a data scientist. I love Python.”

The q devotees had their moments later however when Pierre Kovalev of the KX Core Team Developer didn’t show Powerpoint, but 14 rounds of q, interactively swapping characters in his code on the fly to demonstrate key language concepts. The audience lapped up the q show, it was brilliant.

Before I return to how Python and kdb/q stars collide, I’ll note the many announcements during the day, which are covered elsewhere and to which I may return in a later blog. They include:

Also, Kevin Webster of Columbia University and Imperial College highlighted the critical role of kdb in price impact work. He referenced many of my favorite price impact academics, many hailing from the great Capital Fund Management (CFM).

Yet the compelling theme throughout Thursday at KX CON [23] was the remarkable blend of the dedicated, hyper-efficient kdb/q and data science creativity offered up by Python.

Erin’s Story

For me, Erin Stanton’s story was absolutely compelling. Her team at broker Virtu Financial had converted a few years back what seemed to be largely static, formulaic SQL applications into meaningful research applications. The new generation of apps was built with Python, kdb behind the scenes serving up clean, consistent data efficiently and quickly.

“For me as a data scientist, a Python app was like Xmas morning. But the secret sauce was kdb underneath. I want clean data for my Python, and I did not have that problem any more. One example, I had a SQL report that took 8 hours. It takes 5 minutes in Python and kdb.”

The Virtu story shows Python/kdb interoperability. Python allows them to express analytics, most notably machine learning models (random forests had more mentions in 30 minutes than I’ve heard in a year working at KX, which was an utter delight! I’ve missed them). Her team could apply their models to data sets amounting to 75k orders a day, in one case 6 million orders over a 4 months data period, an unusual time horizon but one which covered differing market volatilities for training and key feature extraction. They could specify different, shorter time horizons, apply different decision metrics. ”I never have problems pulling the data.” The result: feature engineering for machine learning models that drives better prediction and greater client value. With this, Virtu Financial have been able to “provide machine learning as a service to the buyside… We give them a feature engineering model set relevant to their situation!,” driven by Python, data served up by kdb.

The Highest Frequency Hedge Fund Story

I won’t name the second speaker, but let’s just say they’re leaders on the high-tech algorithmic buy-side. They want Python to exhibit q-level performance. That way, their technical teams can use Python-grade utilities that can deliver real-time event processing and a wealth of analytics. For them, 80 to 100 nodes could process a breathtaking trillion+ events per day, serviced by a sizeable set of Python-led computational engines.

Overcoming the perceived hurdle of expressive yet challenging q at the hedge fund, PyKX bridges Python to the power of kdb/q. Their traders, quant researchers and software engineers could embed kdb+ capabilities to deliver very acceptable performance for the majority of their (interconnected, graph-node implemented) Python-led use cases. With no need for C++ plug-ins, Python controls the program flow. Behind-the-scenes, the process of conversion between NumPy, pandas, arrow and kdb objects is abstracted away.

This is a really powerful use case from a leader in its field, showing how kdb can be embedded directly into Python applications for real-time, ultra-fast analytics and processing.

Alex’s Story

Alex Donohoe of TD Securities took another angle for his exploration of Python & kdb. For one thing, he worked with over-the-counter products (FX and fixed income primarily) which meant “very dirty data compared to equities.” However, the primary impact was to explore how Python and kdb could drive successful collaboration across his teams, from data scientists and engineers to domain experts, sales teams and IT teams.

Alex’s personal story was fascinating. As a physics graduate, he’d reluctantly picked up kdb in a former life, “can’t I just take this data and stick it somewhere else, e.g., MATLAB?”

He stuck with kdb.

“I grew to love it, the cleanliness of the [q] language,” “very elegant for joins” On joining TD, he was forced to go without and worked with Pandas, but he built his ecosystem in such a way that he could integrate with kdb at a later date, which he and his team indeed did. His journey therefore had gone from “not really liking kdb very much at all to really enjoying it, to missing it”, appreciating its ability to handle difficult maths efficiently, for example “you  do need a lot of compute to look at flow toxicity.” He learnt that Python could offer interesting signals out of the box including non high-frequency signals, was great for plumbing, yet kdb remained unsurpassed for its number crunching.

Having finally introduced kdb to TD, he’s careful to promote it well and wisely. “I want more kdb so I choose to reduce the barriers to entry.” His teams mostly start with Python, but they move into kdb as the problems hit the kdb sweet spot.

On his kdb and Python journey, he noted some interesting, perhaps surprising, findings. “Python data explorers are not good. I can’t see timestamps. I have to copy & paste to Excel, painfully. Frictions add up quickly.”  He felt “kdb data inspection was much better.” From a Java perspective too, he looks forward to mimicking the developmental capabilities of Java when able to use kdb in VS Code.”

Overall, he loved that data engineers, quants and electronic traders could leverage Python, but draw on his kdb developers to further support them. Downstream risk, compliance and sales teams could also more easily derive meaningful insights more quickly, particularly important as they became more data aware wanting to serve themselves.

Thursday at KX CON [23]

The first day of KX CON [23] was brilliant. a great swathe of great announcements, and superb presentations. For me, the highlight was the different stories of how when Python and kdb stars align, magic happens, while the q devotees saw some brilliant q code.

kdb+ Personal Edition Download

This site is registered on wpml.org as a development site. Switch to a production site key to remove this banner.