The new dynamic data duo: Structured meets unstructured data to win on the generative AI playing field

Mark Palmer
21 March 2024 | 5 minutes

On Wall Street, algorithmic trading has long been the revenue playing field: statistical analysis of micro-market movements helps traders predict how the profitability winds will blow at any millisecond. Profit, or “alpha,” goes to traders who devise novel ways to anticipate price changes before their competitors do. They find it by analyzing the movement and momentum of structured data: numbers representing market prices, trades, and volume.

Today, technologies that make it easy and cost-effective to process unstructured data create a new opportunity to gain an analytics edge. The new source of insight lies in the connections between unstructured and structured data.

A new dynamic data duo, if you will.

Let’s explore this new insight opportunity and how firms can capitalize on it.

Structured data, meet unstructured data

Historically, unstructured data – found in PDF documents, on the web, and in images, video, and audio – has been explored but unexploited. Today, with the rise of generative AI and LLM technology, analyzing unstructured data creates new opportunities for insight.

In financial services, fusing structured market data with unstructured data like SEC filings, client interactions, analyst reports, social media sentiment, news, and more can add new depth to insights. Each kind of data supplies context the other lacks: the numbers show what happened, and the text helps explain why.

As we see below, unstructured data provides good, general advice about why an investment portfolio might decline in value, citing market volatility, news, currency fluctuations, and interest rate changes as reasons your portfolio might underperform.

But add individualized data from structured data sources, including account numbers, specific investments, their temporal performance, and indexes, and we get true insight (shown at right). We see why my portfolio declined. We see that my portfolio is outperforming its index. We see why my portfolio performed as it did. We see unique generative AI insight.
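A minimal sketch of this idea in Python: structured account data is turned into specific facts and placed next to general, text-derived context before anything would be sent to a language model. The holdings, the index figures, and the `build_prompt` helper are all hypothetical, and no LLM is called here.

```python
def period_return(start_value, end_value):
    """Simple return over the period, as a fraction."""
    return (end_value - start_value) / start_value

def build_prompt(general_context, holdings, index_start, index_end):
    """Merge general text with facts computed from structured data."""
    start = sum(h["start_value"] for h in holdings)
    end = sum(h["end_value"] for h in holdings)
    portfolio_ret = period_return(start, end)
    index_ret = period_return(index_start, index_end)
    verdict = "outperformed" if portfolio_ret > index_ret else "did not outperform"
    # The holding with the lowest percentage return over the period.
    worst = min(holdings, key=lambda h: period_return(h["start_value"], h["end_value"]))
    facts = [
        f"Portfolio return: {portfolio_ret:.1%}",
        f"Index return: {index_ret:.1%}",
        f"The portfolio {verdict} its index.",
        f"Weakest holding: {worst['name']}",
    ]
    return general_context + "\n\nAccount facts:\n" + "\n".join(facts)

# Invented account data, for illustration only.
holdings = [
    {"name": "Fund A", "start_value": 6000.0, "end_value": 5700.0},
    {"name": "Fund B", "start_value": 4000.0, "end_value": 4000.0},
]
prompt = build_prompt(
    "Portfolios can decline because of market volatility, news, "
    "currency fluctuations, and interest rate changes.",
    holdings,
    index_start=100.0,
    index_end=95.0,
)
print(prompt)
```

The general paragraph alone could apply to anyone. The appended account facts are what let an answer speak to one specific portfolio.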

This dynamic duo of unstructured and structured data leverages three new computing elements.

Real-time unstructured data. Much of today’s structured data, like vital signs emitted from medical devices, is readily available. Unstructured data for business applications, such as digital versions of doctors’ notes, is not as prevalent. But as tools for analyzing conversational data improve, this kind of data is becoming ubiquitous and cost-effective to capture.

Digitized unstructured data. Thanks to the rise of generative AI, conversational messaging and intelligent document processing technologies are more prevalent, less expensive, and easier to use than ever before. One example is conference call transcription and summarization, both available in tools like Zoom and Otter.ai. These tools emit a new source of digitized unstructured data useful for analysis.

Databases that fuse unstructured and structured data. Generative AI applications also require data management systems that connect and combine unstructured and structured data through vector embeddings, synthetic data sources, and data warehouses of fused data prepared for analysis. For example, KX’s new KDB.AI offering is designed to generate vector embeddings from unstructured documents and make them available for real-time queries.
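To make the fusion idea concrete, here is a minimal Python sketch. It is not KDB.AI's API: the `embed` function is a toy word-count stand-in for a real embedding model, and the records, tickers, and price moves are invented for illustration. It filters records on a structured numeric field, then ranks the survivors by cosine similarity between the query and each record's text.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical records: unstructured text stored beside structured fields.
DOCS = [
    {"ticker": "AAA", "day_change_pct": -4.2,
     "text": "regulator opens probe into accounting practices"},
    {"ticker": "BBB", "day_change_pct": 1.1,
     "text": "company announces new product launch"},
    {"ticker": "CCC", "day_change_pct": -3.0,
     "text": "accounting probe widens as regulator subpoenas executives"},
]

def search(query, max_change_pct, top_k=2):
    """Structured filter first, then similarity ranking on the text."""
    q = embed(query)
    candidates = [d for d in DOCS if d["day_change_pct"] <= max_change_pct]
    ranked = sorted(candidates, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return [d["ticker"] for d in ranked[:top_k]]

# News resembling the query, restricted to stocks that fell 2% or more.
results = search("regulator accounting probe", max_change_pct=-2.0)
print(results)
```

Neither half answers the question alone: the price filter cannot tell which drops relate to an accounting probe, and the text search cannot tell which stories coincided with a price move.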

The new dynamic duo and the role of LLMs 

This dynamic data duo is not only at work on Wall Street; it’s also being used in Main Street applications. Consider healthcare. When you visit a hospital, doctors talk to you. That conversation generates unstructured data, with clues hidden inside your responses to the questions. Frontline staff also take your vital signs, which provide a numerical read on how your body is actually performing.

The art of medicine is a doctor’s ability to connect numerical facts with clues revealed in conversations about how you feel. The National Institute of Health in Singapore puts this approach into practice today. Their system, Endeavor, combines conversations with vital signs in real time to produce predictive, proactive insights.

For example, below, a machine learning algorithm evaluates unstructured doctor’s notes to identify references to abdominal pain reported by a patient.

Structured data comes from medical devices that monitor patient vital signs, and unstructured data comes from digital versions of doctor notes, patient utterances, medical journals, and research, providing a 360-degree view of insight to help improve care.

This unstructured and structured data is sent in real time to AI algorithms that silently search for and predict the likelihood of dozens of potential ailments and diseases, including eye disease, cardiac abnormalities, pulmonary disease, neurological disorders, septic shock, and cancer.

Predictions are returned to frontline medical staff, who can then make smarter recommendations. In this case, AI predicts that this patient is 95% likely to have appendicitis.
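The flow described above can be reduced to a toy Python sketch. This is not how Endeavor works internally, which this article does not describe. The phrase list, thresholds, and weights are invented purely for illustration and have no clinical validity. The sketch only shows the shape of the idea: flags pulled from free-text notes and readings from devices feed a single score.

```python
import math

# Hypothetical phrases to look for in free-text notes.
SYMPTOM_PHRASES = ("abdominal pain", "nausea", "loss of appetite")

def note_features(note):
    """Return which symptom phrases appear in the unstructured note."""
    text = note.lower()
    return {phrase: (phrase in text) for phrase in SYMPTOM_PHRASES}

def risk_score(note, temperature_c, heart_rate_bpm):
    """Combine text-derived flags with vital signs into a score in (0, 1).

    The weights are made up for this sketch, not fitted to any data.
    """
    flags = note_features(note)
    z = -4.0
    z += 2.0 * flags["abdominal pain"]
    z += 1.0 * flags["nausea"]
    z += 1.0 * flags["loss of appetite"]
    z += 1.5 * (temperature_c >= 38.0)  # fever, from structured data
    z += 1.0 * (heart_rate_bpm >= 100)  # elevated heart rate
    return 1.0 / (1.0 + math.exp(-z))   # logistic function

note = "Patient reports abdominal pain since last night, with nausea."
text_only = risk_score(note, temperature_c=36.8, heart_rate_bpm=72)
with_vitals = risk_score(note, temperature_c=38.6, heart_rate_bpm=108)
print(f"normal vitals: {text_only:.2f}, abnormal vitals: {with_vitals:.2f}")
```

The same note yields a higher score when the vital signs are also abnormal, which is the point of fusing the two sources. A production system would learn its weights from data rather than hard-code them.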

A new dynamic duo, a new source of insights

Traditionally, the two “data worlds” of unstructured and structured data did not collide. But today, unstructured data is easier and more cost-effective to extract than ever before, which makes it practical for the first time to combine it with structured data and generate new insights.

This new dynamic duo of data opens up insights hidden between conversational data and real-time streaming data, from Wall Street to Main Street. Databases designed to combine structured and unstructured data are the key enablers of these insights.
