Overcoming AI Challenges with KDB.AI 1.1

In 2023, KX launched KDB.AI, a groundbreaking vector database and search engine to empower developers to build the next generation of AI applications for high-speed, time-based, multi-modal workloads. Used in industries such as Financial Services, Telecommunications, Manufacturing and more, KDB.AI is today recognized as the world’s leading vector database solution for enterprise customers.

In our latest update, we’re introducing several new features that will significantly improve vector performance, search reliability, and semantic relevance.

Let’s explore.

Hybrid Search

The first is Hybrid Search, an advanced tool that merges the accuracy of keyword-focused sparse vector search with the contextual comprehension provided by semantic dense vector search.

Sparse vectors predominantly contain zero values. They are created by passing a document through a tokenizer and associating each word with a numerical token. The tokens, along with a tally of their occurrences, are then used to construct a sparse vector for that document. This is especially useful for information retrieval and natural language processing scenarios where specific keyword matching must be highly precise.
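To make the idea concrete, here is a minimal sketch of building a sparse vector from token counts. It assumes a naive whitespace tokenizer and a vocabulary built from the documents themselves; the function names are illustrative and are not part of the KDB.AI API.

```python
from collections import Counter

def build_vocab(docs):
    """Assign each distinct lower-cased word an integer token id."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def sparse_vector(doc, vocab):
    """Return {token_id: count}; every id not listed is implicitly zero."""
    counts = Counter(vocab[w] for w in doc.lower().split() if w in vocab)
    return dict(counts)

docs = ["company x q3 earnings report", "q3 earnings beat q2 earnings"]
vocab = build_vocab(docs)
vec = sparse_vector(docs[1], vocab)
print(vec)
```

Only the tokens that actually occur are stored, which is why the representation stays small even when the vocabulary is large.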

Dense vectors, in contrast, predominantly contain non-zero values and are used to encapsulate the semantic significance, relationships and attributes present within the document. They are often used with deep learning models where the semantic meaning of words is important.

With KDB.AI 1.1, analysts can tweak the relative importance of sparse and dense search results via an alpha parameter, ensuring highly pertinent data retrieval and efficient discovery of unparalleled insight.
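One way to picture the alpha parameter is as a weighted blend of the two result sets. The sketch below is an assumption for illustration only: it treats alpha as the weight on the dense score and (1 - alpha) as the weight on the sparse score, with both scores already normalized to the range 0 to 1. The formula KDB.AI uses internally may differ.

```python
def hybrid_scores(sparse, dense, alpha):
    """Blend per-document scores; alpha weights dense, (1 - alpha) weights sparse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    docs = set(sparse) | set(dense)
    return {
        d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
        for d in docs
    }

def rank(scores):
    """Order document ids by descending score, breaking ties by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical, hand-picked scores for three documents.
sparse = {"earnings_report": 0.9, "trade_policy": 0.1}
dense = {"earnings_report": 0.4, "trade_policy": 0.8, "competitor_launch": 0.7}

keyword_heavy = rank(hybrid_scores(sparse, dense, alpha=0.2))
semantic_heavy = rank(hybrid_scores(sparse, dense, alpha=0.8))
print(keyword_heavy)
print(semantic_heavy)
```

A low alpha favors the exact keyword match, while a high alpha lets semantically related documents rise to the top.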

Example Use Case

Consider a financial analyst looking for specific information on a company’s performance in order to assess investment risk. The analyst might search for “Company X’s Q3 earnings report”, a keyword-driven query at which sparse vector search would excel.

However, the analyst might also be interested in the broader context, such as market trends, competitor performance, and economic indicators that could impact Company X’s performance. Dense vector search could be used to find documents that may not contain the exact keywords but are semantically related to the query.

For example, it might find articles discussing a new product launched by a competitor or changes in trade policies affecting Company X’s industry.

With Hybrid Search the analyst is afforded the best of both worlds, and ultimately retrieves a comprehensive set of information to assist with the development of their investment strategy.

Temporal Similarity Search

The second key feature is the introduction of Temporal Similarity Search (TSS), a comprehensive suite of tools for analyzing patterns, trends, and anomalies within time series datasets.

TSS comprises two key components: Transformed TSS, for highly efficient vector searches across massive time series datasets, and Non-Transformed TSS, a solution for near real-time similarity search of fast-moving data. Together, they enable developers to extract insights faster than ever before.

Transformed Temporal Similarity Search

Transformed Temporal Similarity Search is our patent-pending compression model designed to dimensionally reduce time-series windows by more than 99%. With Transformed TSS, KDB.AI can compress data points into significantly smaller dimensions whilst maintaining the integrity of the original data’s shape.

It also enables the compression of varying sized windows into a uniform dimensionality, which is invaluable when working with time series data of different sample rates and window sizes.

By doing so, Transformed TSS significantly reduces memory usage and disk space requirements to minimize computational burden. And with the ability to attach compressed embeddings to prebuilt Approximate Nearest Neighbor (ANN) indexes, developers can expect significant optimization of retrieval operations in large scale embeddings.
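The Transformed TSS compression model itself is patent-pending and not described here, so the sketch below swaps in a well-known stand-in, piecewise aggregate approximation (PAA), purely to illustrate the general idea of reducing windows of different lengths to one fixed dimensionality. The function name and the choice of 8 output dimensions are assumptions.

```python
def paa(window, dims):
    """Piecewise aggregate approximation: average `window` into `dims` segments."""
    n = len(window)
    if dims <= 0 or n < dims:
        raise ValueError("need 1 <= dims <= len(window)")
    out = []
    for i in range(dims):
        start = i * n // dims
        end = (i + 1) * n // dims
        segment = window[start:end]
        out.append(sum(segment) / len(segment))
    return out

# Two windows of different lengths, e.g. from different sample rates.
short_window = [float(i) for i in range(100)]
long_window = [float(i) for i in range(1000)]

a = paa(short_window, 8)
b = paa(long_window, 8)
reduction = 1 - len(b) / len(long_window)
print(len(a), len(b), reduction)
```

Both windows end up with the same number of dimensions and keep their overall rising shape, so they can be compared directly or attached to the same index.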

Example Use Case

Consider a multinational retail corporation that has been experiencing stagnant growth and is now looking for ways to improve their business strategies.

With Transformed TSS, they can perform detailed analysis of their time series user interaction data, including clicks, views, and engagement times. This allows them to uncover hidden patterns and trends, revealing optimal times and contexts for ad placement.

Applying a similar concept to their retail operations, they can segment purchase history data into time windows, resulting in advanced similarity searches that unveil subtle purchase patterns, seasonal variations, and evolving consumer preferences.

Armed with these insights, the corporation can fine-tune their marketing strategies, optimize stock levels, and predict future buying trends.

Non-Transformed Temporal Similarity Search

Non-Transformed Temporal Similarity Search is a revolutionary algorithm designed for conducting near real-time similarity search with extreme memory efficiency across fast moving time-series data. It provides a precise and efficient method to analyze patterns and trends with no need to embed, extract, or store vectors in the database.
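As a rough illustration of searching raw time-series values directly, without first embedding them or building an index, here is a brute-force sliding-window baseline. It is not the Non-Transformed TSS algorithm, whose internals are not covered here; the z-normalization step and the function names are assumptions chosen to match windows by shape.

```python
import math

def znorm(xs):
    """Scale a window to zero mean and unit variance so shapes are comparable."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std == 0:
        return [0.0] * len(xs)
    return [(x - mean) / std for x in xs]

def nearest_windows(series, query, k):
    """Return the k (distance, start_index) pairs closest in shape to `query`."""
    m = len(query)
    q = znorm(query)
    scored = []
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, q)))
        scored.append((dist, start))
    return sorted(scored)[:k]

series = [0, 0, 1, 3, 1, 0, 0, 0, 2, 6, 2, 0, 0]
query = [1, 3, 1]
hits = nearest_windows(series, query, k=2)
print(hits)
```

The search runs over the stored column itself, so nothing extra has to be built or kept in memory before the first query.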

Non-Transformed TSS enables direct similarity search on columnar time-series data without the need to define an Approximate Nearest Neighbor (ANN) search index. Tested on one million vectors, it achieved a memory footprint reduction of 99% and a 17x performance boost over 1K queries.

Metric | Non-Transformed TSS | Hierarchical Navigable Small Worlds Index
Memory Footprint | 18.8MB | 2.4GB
Time to Build Index | 0s | 138s
Time for Single Similarity Search | 23ms | 1ms (on prebuilt index)
Total Time for Single Search (5 neighbors) | 23ms | 138s+1ms
Total Time for 1000 searches (5 neighbors) | 8s | 139s
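The headline figures can be checked against the benchmark numbers above with a few lines of arithmetic. The only assumption is the conversion of 1GB to 1024MB.

```python
# Memory footprint from the benchmark figures.
tss_memory_mb = 18.8
hnsw_memory_mb = 2.4 * 1024  # 2.4GB, assuming 1GB = 1024MB
memory_reduction = 1 - tss_memory_mb / hnsw_memory_mb

# Total time for 1000 searches, in seconds.
tss_total_s = 8
hnsw_total_s = (138_000 + 1000 * 1) / 1000  # 138s build + 1000 searches at 1ms
speedup = hnsw_total_s / tss_total_s

print(f"memory reduction: {memory_reduction:.1%}")
print(f"speedup over 1000 queries: {speedup:.1f}x")
```

Note that the speedup counts the time to build the index; on a prebuilt index, a single HNSW search is still faster than a single Non-Transformed TSS search.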

Example Use Case

Consider a financial organization looking to enhance its fraud detection capabilities and better respond to the increased cadence and sophistication of attacks. With millions of customers and billions of transaction records, the organization requires a computationally efficient solution that will scale on demand.

With Non-Transformed Temporal Similarity Search the organization can analyze transactions in near real-time, without the need to embed, extract or store incoming records into a database prior to analysis. Inbound transactions are compared against historical patterns in the same account, and those exhibiting a high degree of dissimilarity can be flagged for further investigation.
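The flagging logic described above might look something like the following sketch. It assumes a plain Euclidean distance, a per-account history of transaction amounts, and a hand-picked threshold; all names and values are hypothetical, and a production system would tune the distance measure and threshold to its own data.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_suspicious(history, incoming, threshold):
    """Flag `incoming` if even its nearest historical window is too far away."""
    m = len(incoming)
    if len(history) < m:
        raise ValueError("history must be at least as long as the incoming window")
    windows = [history[i:i + m] for i in range(len(history) - m + 1)]
    nearest = min(distance(w, incoming) for w in windows)
    return nearest > threshold

# Hypothetical transaction amounts for one account.
history = [20.0, 35.0, 18.0, 40.0, 22.0, 30.0, 25.0, 38.0]
normal = [21.0, 36.0, 19.0]
unusual = [20.0, 950.0, 900.0]

print(is_suspicious(history, normal, threshold=50.0))
print(is_suspicious(history, unusual, threshold=50.0))
```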

We hope that you are as excited as we are about the possibilities these enhancements bring to your AI toolkit. You can learn more by checking out our feature articles over on the KDB.AI Learning Hub, and then try them yourself by signing up for free at KDB.AI.

Related Resources

The New Dynamic Data Duo: Structured Meets Unstructured Data to Win on the Generative AI Playing Field

On Wall Street, algorithmic trading has long been the revenue playing field: statistical analysis of micro-market movements helps traders predict how the profitability winds will blow at any millisecond. Profit, or “alpha,” is found by traders who create novel approaches to anticipate those price changes before their competitors do. They win by analyzing the movement and momentum of structured data: numbers representing market prices, trades, and volume.

Today, the rise in technologies that make it cost-effective and easy to process unstructured data creates a new opportunity to gain an analytics edge: the combination of unstructured and structured data. This new source of insight is found in the connections between unstructured and structured data.

A new dynamic data duo, if you will.

Let’s explore this new insight opportunity and how firms can capitalize on it.

Structured data, meet unstructured data

Historically, unstructured data – found in PDF documents, on the web, or images, video and audio – has been explored but unexploited. Today, with the rise of generative AI and LLM technology, analyzing unstructured data creates new opportunities for insight.

In financial services, fusing structured market data with unstructured data like SEC filings, client interactions, analyst reports, social media sentiment, news, and more can reveal a new depth to insights. Combining structured and unstructured data is a revolutionary way to unlock data in ways never done before.

As we see below, unstructured data provides good, general advice about why an investment portfolio might decline in value, citing market volatility, news, currency fluctuations, and interest rate changes as reasons your portfolio might underperform.

But add individualized data from structured data sources, including account numbers, specific investments, their temporal performance, and indexes, and we get true insight (shown at right). We see why my portfolio declined. We see that my portfolio is outperforming its index. We see why my portfolio performed as it did. We see unique generative AI insight.

This dynamic duo of unstructured and structured data leverages three new computing elements.

Real-time unstructured data. Much of today’s structured data, like vital signs emitted from medical devices, are readily available. However, unstructured data for business applications, such as digital versions of doctors’ notes, are not as prevalent. But thanks to the rise in capabilities to analyze conversational data, these capabilities are becoming ubiquitous and cost-effective.

Digitized unstructured data. Thanks to the rise of generative AI, conversational messaging and intelligent document processing technologies are more prevalent, less expensive, and easier to use than ever before. One example is conference call transcription and summarization, both available in tools like Zoom and Otter.ai. These tools emit a new source of digitized unstructured data useful for analysis.

Databases that fuse unstructured and structured data. Generative AI applications also require data management systems to connect and combine unstructured with structured data via vector embeddings, synthetic data sources, and data warehouses full of fused data to help prepare data for analysis. For example, KX’s new KDB.AI offering is designed to generate vector embeddings on unstructured documents and make them available for real-time queries.
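To show what fusing the two kinds of data can look like in practice, here is a small sketch that filters on structured fields first and then ranks the survivors by vector similarity. The records, the three-dimensional embeddings, and the field names are all made up for illustration; a real system would use a database query and embeddings produced by a model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical documents: structured fields plus a toy embedding of the text.
records = [
    {"ticker": "AAA", "year": 2023, "text": "Q3 earnings call", "embedding": [0.9, 0.1, 0.0]},
    {"ticker": "AAA", "year": 2021, "text": "Old product launch", "embedding": [0.8, 0.2, 0.1]},
    {"ticker": "BBB", "year": 2023, "text": "Rate change commentary", "embedding": [0.1, 0.9, 0.2]},
    {"ticker": "AAA", "year": 2023, "text": "Currency exposure note", "embedding": [0.2, 0.7, 0.6]},
]

def search(query_embedding, ticker, min_year, k):
    """Filter on structured columns, then rank by similarity to the query."""
    candidates = [r for r in records if r["ticker"] == ticker and r["year"] >= min_year]
    candidates.sort(key=lambda r: cosine(r["embedding"], query_embedding), reverse=True)
    return [r["text"] for r in candidates[:k]]

results = search([1.0, 0.0, 0.0], ticker="AAA", min_year=2023, k=2)
print(results)
```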

The new dynamic duo and the role of LLMs 

This dynamic data duo is not only at work on Wall Street; it’s also being used in Main Street applications. Consider healthcare. When you visit a hospital, doctors talk to you. That conversation generates unstructured data, with clues hidden inside your responses to the questions. Frontline staff also take your vital signs, which provide a numerical read on how your body is actually performing.

The art of medicine is a doctor’s ability to connect numerical facts with clues revealed in conversations about how you feel. The National Institute of Health in Singapore implements this system today. Their system, Endeavor, combines conversations with vital signs in real-time to produce predictive, proactive insights.

For example, below, a machine learning algorithm evaluates unstructured doctor’s notes to identify references to abdominal pain reported by a patient.

Structured data comes from medical devices that monitor patient vital signs, and unstructured data comes from digital versions of doctor notes, patient utterances, medical journals, and research, providing a 360-degree view of insight to help improve care.

This unstructured and structured data is sent in real time to AI algorithms that silently search for and predict the likelihood of dozens of potential ailments and diseases, including eye disease, cardiac abnormalities, pulmonary disease, neurological disorders, septic shock, and oncology.

Predictions are returned to front-line medical staff who can make smarter recommendations. In this case, AI predicts that this patient is 95% likely to have appendicitis.

A new dynamic duo, a new source of insights

Traditionally, the two “data worlds” of unstructured and structured data did not collide. But today, unstructured data is easier and more cost-effective to extract than ever before, which makes it possible for the first time to easily combine with structured data to generate new insights.

This new dynamic duo of data affords new opportunities for insight hidden between conversational data and real-time streaming data, from Wall Street to Main Street. Databases designed to combine structured and unstructured data to unlock new hidden insights are the key arbiters of these data insights.

Related Resources

Seven Innovative Trading Apps and 7 Best Practices You Can Steal

Quant Trading Data Management by the Numbers

Insights from the AI Decision Makers Summit

11 Insights to Help Quants Break Through Data and Analytics Barriers

Transforming Enterprise AI with KDB.AI on LangChain

Artificial Intelligence (AI) is transforming every industry and sector, from healthcare to finance, from manufacturing to retail. However, not all AI solutions are created equal. Many of them suffer from limitations such as poor scalability, low accuracy, high latency, and lack of explainability.

That’s why we’re excited to announce the integration of KDB.AI and LangChain, two cutting-edge technologies designed to overcome these challenges and deliver unparalleled capabilities for enterprise AI via a simple and intuitive architecture that doesn’t require complex infrastructure or costly expertise.

In this blog post, I’ll give you a brief overview of each technology, discuss typical use cases, and then show you how to get started. Let’s begin.

What is KDB.AI?

KDB.AI is an enterprise grade vector database and analytics platform that enables real-time processing of both structured and unstructured time-oriented data. It’s based on kdb+, the world’s fastest time-series database, which is widely used by leading financial institutions for high-frequency trading and market data analysis.

With KDB.AI, developers can seamlessly scale from billions to trillions of vectors without performance degradation, thanks to its distributed architecture and efficient compression algorithms. It also supports various data formats, such as text, images, audio, video, and more.

With KDB.AI you can:

To learn more about KDB.AI, visit our documentation site.

 

What is LangChain?

LangChain is an open-source framework designed to simplify the creation of applications powered by language models. At its core, LangChain enables you to “chain” together components, which act as the building blocks for natural language applications such as chatbots, virtual agents and document summarization.
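The “chaining” idea can be sketched in a few lines of plain Python. This is a conceptual illustration only and deliberately does not use the LangChain API: the `chain` helper, the step functions, and the stand-in model are all hypothetical.

```python
from functools import reduce

def chain(*steps):
    """Compose steps left to right: each step's output feeds the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def build_prompt(question):
    """Turn a user question into a prompt string."""
    return f"Answer briefly: {question}"

def fake_model(prompt):
    # Stand-in for a language model call; it only reports the prompt length.
    return {"text": f"(model reply to {len(prompt)} chars)"}

def parse_output(response):
    """Extract the plain text from the model's response."""
    return response["text"].strip()

pipeline = chain(build_prompt, fake_model, parse_output)
answer = pipeline("What is a vector database?")
print(answer)
```

Because each component has a single input and a single output, any step can be swapped for another, for example a retriever or a different model, without touching the rest of the pipeline.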

LangChain doesn’t rely on traditional NLP pipelines, such as tokenization, lemmatization, or dependency parsing. Instead, it uses vector representations of natural language, such as word embeddings, sentence embeddings, or document embeddings, which capture the semantic and syntactic information of natural language in a compact and universal way.

To learn more about LangChain, visit their documentation site.

 

How KDB.AI and LangChain work together

The integration of KDB.AI and LangChain empowers developers with real-time vector processing capability and state-of-the-art NLP models. This combination opens new possibilities and use cases for enterprise AI, such as:

 

How to get started with KDB.AI and LangChain

If you’re interested in trying out KDB.AI on LangChain, I invite you to follow these simple steps.

  1. Sign up for a free trial of KDB.AI.
  2. Set up your environment and configure pre-requisites.
  3. Work through the sample integration.

We also have some great resources from our evangelism team, including samples over on the KDB.AI learning hub and regular livestreams. And should you have any feedback, questions, or issues, a dedicated team is available over on our Slack community.

Happy Coding!

 

RELATED RESOURCES

The Montauk Diaries – Two Stars Collide

by Steve Wilcockson

 

Two Stars Collide: Thursday at KX CON [23]

 

My favorite line, which drew audible gasps on the opening day of the packed KX CON [23]:

“I don’t work in q, but beautiful, beautiful Python,” said Erin Stanton of Virtu Financial simply and eloquently. As the q devotees in the audience chuckled, she qualified her statement further: “I’m a data scientist. I love Python.”

The q devotees had their moments later, however, when Pierre Kovalev, a KX Core Team developer, didn’t show PowerPoint slides but 14 rounds of q, interactively swapping characters in his code on the fly to demonstrate key language concepts. The audience lapped up the q show; it was brilliant.

Before I return to how Python and kdb/q stars collide, I’ll note the many announcements during the day, which are covered elsewhere and to which I may return in a later blog. They include:

Also, Kevin Webster of Columbia University and Imperial College highlighted the critical role of kdb in price impact work. He referenced many of my favorite price impact academics, many hailing from the great Capital Fund Management (CFM).

Yet the compelling theme throughout Thursday at KX CON [23] was the remarkable blend of the dedicated, hyper-efficient kdb/q and data science creativity offered up by Python.

Erin’s Story

For me, Erin Stanton’s story was absolutely compelling. Her team at broker Virtu Financial had converted a few years back what seemed to be largely static, formulaic SQL applications into meaningful research applications. The new generation of apps was built with Python, kdb behind the scenes serving up clean, consistent data efficiently and quickly.

“For me as a data scientist, a Python app was like Xmas morning. But the secret sauce was kdb underneath. I want clean data for my Python, and I did not have that problem any more. One example, I had a SQL report that took 8 hours. It takes 5 minutes in Python and kdb.”

The Virtu story shows Python/kdb interoperability. Python allows them to express analytics, most notably machine learning models (random forests had more mentions in 30 minutes than I’ve heard in a year working at KX, which was an utter delight! I’ve missed them). Her team could apply their models to data sets amounting to 75k orders a day, in one case 6 million orders over a four-month period, an unusual time horizon but one which covered differing market volatilities for training and key feature extraction. They could specify different, shorter time horizons and apply different decision metrics. “I never have problems pulling the data.” The result: feature engineering for machine learning models that drives better prediction and greater client value. With this, Virtu Financial has been able to “provide machine learning as a service to the buyside… We give them a feature engineering model set relevant to their situation!”, driven by Python, with data served up by kdb.

The Highest Frequency Hedge Fund Story

I won’t name the second speaker, but let’s just say they’re leaders on the high-tech algorithmic buy-side. They want Python to exhibit q-level performance. That way, their technical teams can use Python-grade utilities that can deliver real-time event processing and a wealth of analytics. For them, 80 to 100 nodes could process a breathtaking trillion+ events per day, serviced by a sizeable set of Python-led computational engines.

Overcoming the perceived hurdle of expressive yet challenging q at the hedge fund, PyKX bridges Python to the power of kdb/q. Their traders, quant researchers and software engineers could embed kdb+ capabilities to deliver very acceptable performance for the majority of their (interconnected, graph-node implemented) Python-led use cases. With no need for C++ plug-ins, Python controls the program flow. Behind-the-scenes, the process of conversion between NumPy, pandas, arrow and kdb objects is abstracted away.

This is a really powerful use case from a leader in its field, showing how kdb can be embedded directly into Python applications for real-time, ultra-fast analytics and processing.

Alex’s Story

Alex Donohoe of TD Securities took another angle for his exploration of Python & kdb. For one thing, he worked with over-the-counter products (FX and fixed income primarily) which meant “very dirty data compared to equities.” However, the primary impact was to explore how Python and kdb could drive successful collaboration across his teams, from data scientists and engineers to domain experts, sales teams and IT teams.

Alex’s personal story was fascinating. As a physics graduate, he’d reluctantly picked up kdb in a former life, “can’t I just take this data and stick it somewhere else, e.g., MATLAB?”

He stuck with kdb.

“I grew to love it, the cleanliness of the [q] language,” he said, finding it “very elegant for joins.” On joining TD, he was forced to go without and worked with pandas, but he built his ecosystem in such a way that he could integrate with kdb at a later date, which he and his team indeed did. His journey had therefore gone from “not really liking kdb very much at all to really enjoying it, to missing it”, appreciating its ability to handle difficult maths efficiently; for example, “you do need a lot of compute to look at flow toxicity.” He learnt that Python could offer interesting signals out of the box, including non-high-frequency signals, and was great for plumbing, yet kdb remained unsurpassed for its number crunching.

Having finally introduced kdb to TD, he’s careful to promote it well and wisely. “I want more kdb so I choose to reduce the barriers to entry.” His teams mostly start with Python, but they move into kdb as the problems hit the kdb sweet spot.

On his kdb and Python journey, he noted some interesting, perhaps surprising, findings. “Python data explorers are not good. I can’t see timestamps. I have to copy & paste to Excel, painfully. Frictions add up quickly.” He felt “kdb data inspection was much better.” From a Java perspective too, he looks forward to mimicking the developmental capabilities of Java when able to use kdb in VS Code.

Overall, he loved that data engineers, quants and electronic traders could leverage Python, but draw on his kdb developers to further support them. Downstream risk, compliance and sales teams could also more easily derive meaningful insights more quickly, particularly important as they became more data aware wanting to serve themselves.

Thursday at KX CON [23]

The first day of KX CON [23] was brilliant: a great swathe of announcements and superb presentations. For me, the highlight was the different stories of how, when Python and kdb stars align, magic happens, while the q devotees saw some brilliant q code.
