Data Management in Surveillance, and Beyond

26 January 2021 | 7 minutes

by James Corcoran, CTO Enterprise Solutions.

One of the challenges in setting out a product strategy is striking the balance between what is wanted and what is needed, or more correctly, what will be needed. It’s not just about addressing problems, it’s about anticipating them. The unfortunate consequence is that, almost by definition, a roadmap cannot be validated at the outset. It is only when the envisaged need arises, and the solution is already in place, that the roadmap is proven to have been correct.

It was refreshing, therefore, to be on an A-Team panel recently with Nick Maslavets, Global Head of Surveillance Design Methodology and Analytics at Citadel, and Ilija Zovko, Senior Researcher at Aspect Capital, where the vision we have been evolving for our streaming analytics platform was confirmed both by their insights and by a survey of peers and practitioners in the market.

The theme discussed by the panel was data management in the context of surveillance, and while some of the points were specific to the discipline the overall thrust was that as data pervades the organization, it should be viewed as an enterprise asset rather than through the lens of individual domain-specific requirements. The consequence of not doing so is blatantly clear in too many organizations: the dreaded data silos they struggle to eliminate.

The strategy for our streaming analytics platform rests on three main pillars for supporting current and future needs:

  • Combining both real-time streaming and historical data in decision making and developing insights 
  • Being Cloud-led but not Cloud-only, i.e. embracing the cloud but accommodating on-premises too
  • Building scalability for growth in volume, usage, and scope

To the first point, surveillance is a great example of the value of combining real-time and historical data to enable informed decision making. In the realm of algo trading, for example, early detection of a malfunctioning algorithm is critical in limiting its damage. The same applies in detecting market abuse techniques like quote stuffing or spoofing. In other cases, however, the abuse is not revealed in any individual update but rather in a pattern, say momentum ignition, that can only be detected by coupling the live stream with historical data. Even beyond detection, access to historical data at a very granular level is crucial for investigations or for recreating a book to examine how or why a certain set of events unfolded.
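To make that concrete, here is a minimal sketch in q of how a live quote-update rate might be compared against a historical baseline to flag possible quote stuffing. It assumes an in-memory quote table and a quoteStats baseline derived from the historical database; the table, column and function names are illustrative rather than part of any KX API.

    / stand-in data so the sketch runs on its own
    quote:([] time:.z.n-0D00:00:00.01*til 50; sym:50?`AAA`BBB; bid:50?100f; ask:50?100f);
    quoteStats:([] sym:`AAA`BBB; avgPerSec:2 3f);           / e.g. built from the HDB

    / quote updates per symbol over the last second of the stream
    recentRate:{[t] select rate:count i by sym from quote where time>t-0D00:00:01};

    / flag symbols whose live rate exceeds, say, 10x their historical average
    flagStuffing:{[t]
      select sym,rate,avgPerSec from ((0!recentRate t) lj `sym xkey quoteStats)
        where rate>10*avgPerSec };

    flagStuffing exec max time from quote

The point is not the specific rule but the shape of the query: one expression spanning what is happening now and what normal has historically looked like.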

The importance of cloud adoption is no longer debated, but the legacy of on-premises computing is too large, and too ingrained, in many operations to enable total and instant migration. There may be legal barriers to moving certain data, or perhaps performance considerations in certain ultra-low-latency trading activities, as mentioned by Ilija, where the need for proximity predominates. But the longer-term benefits of the cloud in terms of flexibility through microservices and further hardware agnosticism via serverless computing will become increasingly essential. So, we require a cloud approach that addresses the hybrid of remote and local, yet makes the location, access and processing of data invisible to users. On the technical side, that requires smart routing of queries and the adoption of cloud-native technologies using best practices across areas like security, authentication and load balancing in order to present the data fabric as a continuum, rather than a discrete set of entities.
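As a rough illustration of what that smart routing might look like, the sketch below shows a gateway splitting a dated query between a recent on-premises store and an older cloud-hosted one, then merging the results. In practice hLocal and hCloud would be IPC handles opened with hopen; simple lambdas stand in here (ignoring the query itself) so the sketch is self-contained, and all names are illustrative, not a KX API.

    / stand-ins for an on-prem handle (recent data) and a cloud handle (older data)
    hLocal:{[m] n:1+m[2]-m[1]; ([] src:n#`local; date:m[1]+til n)};
    hCloud:{[m] n:1+m[2]-m[1]; ([] src:n#`cloud; date:m[1]+til n)};

    cutover:.z.d-5;                                   / last 5 days served on-prem

    / split a dated query across the two stores and merge the partial results
    route:{[startDate;endDate;query]
      res:();
      if[endDate>=cutover; res,:enlist hLocal (query; startDate|cutover; endDate)];
      if[startDate<cutover; res,:enlist hCloud (query; startDate; cutover-1)];
      raze res };

    / the caller neither knows nor cares where the data lives
    route[.z.d-10; .z.d; {[s;e] select from trade where date within (s;e)}]

The caller expresses intent once; the gateway owns the knowledge of where each slice of the data currently resides.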

On the scalability side, data volumes continue to grow. Market intelligence firm IDC predicts that global data creation will grow to an enormous 163 zettabytes (ZB) by 2025, a figure ten times higher than the amount of data produced in 2017. With it will grow usage, as new analytics, derivations and reports are developed. Thinking about scalability only when capacity is stretched is too late; it needs to be factored in up-front. That’s one of the innate benefits of using KX, where scalability is a proven design feature of the underlying technology, which scales both vertically and horizontally with supporting failover and fault-tolerance capabilities.

Making it Usable

Aside from the technical challenges are the operational and usage challenges. As both Ilija and Nick pointed out, their overarching objective is to democratize data and make it easy and efficient to use. Their role is not to tame data; it is to ask questions of it and to extract value from it. For that, they need the ability to view, filter and analyze it in flexible ways, knowing that it is complete, accurate and consistent across the enterprise. It is the role of the platform to make that easy to do.

Governance, ironically, is the solution, and it applies in three areas: integration, processing and interaction. 

On the integration side, all data should be captured in a common ingestion layer to ensure consistency, avoid duplication and provide first-line capabilities for ensuring data quality and integrity. Moreover, that layer should make it quick and easy to introduce new sources, and indeed new formats, of data. Remember, our digital transformation is still in its relative infancy and decision making will come to depend on many additional factors, not all of which may currently be in digital form. We have seen already how social media data, for example, can inform surveillance investigations. We can but wonder what new sources will prevail.
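A minimal sketch in q of what those first-line checks might look like is below: incoming records are validated against an expected schema and deduplicated before they are inserted. The table names, schemas and the ingest function are illustrative, not a KX API.

    / empty target tables with the expected column layout
    trade:([] time:`timespan$(); sym:`$(); price:`float$(); size:`long$());
    quote:([] time:`timespan$(); sym:`$(); bid:`float$(); ask:`float$());

    / expected columns per table, enforced before anything is persisted
    schemas:`trade`quote!(`time`sym`price`size;`time`sym`bid`ask);

    / validate incoming data, drop exact duplicates, then insert
    ingest:{[tbl;data]
      if[not cols[data]~schemas tbl; '"schema mismatch for ",string tbl];
      data:distinct data;                      / first-line dedup
      tbl insert data;
      count data };

    ingest[`trade; ([] time:2#.z.n; sym:`AAA`BBB; price:101.5 99.25; size:100 200)]

Because every feed passes through the same gate, every downstream consumer can take the basic quality guarantees for granted.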

On the processing side, the goal should be to have data cleansed once, and only once, as close to the source as possible, so that multiple consumers benefit from the validated pre-processing and the enterprise-wide consistency it imposes. Where possible, and for the same reasons, value-added analytics should be similarly applied centrally, but accommodation must be made for proprietary or exploratory analytics, especially in areas of R&D or short-window opportunities. User-defined functions and sandboxes are examples of how this can be accommodated.
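One way to picture that balance is a central registry of shared analytics that individual users can still extend. The sketch below, in q, assumes a hypothetical .udf namespace for registering and running such functions; it is illustrative only, not a KX API.

    / a central registry so a value-added analytic is defined once and shared
    .udf.registry:()!();
    .udf.register:{[name;fn] .udf.registry[name]:fn;};
    .udf.run:{[name;args] .udf.registry[name] . args};

    / e.g. a shared VWAP analytic, registered once close to the source data
    .udf.register[`vwap; {[t] select vwap:size wavg price by sym from t}];

    trades:([] sym:`AAA`AAA`BBB; price:10 11 20f; size:100 300 200);
    .udf.run[`vwap; enlist trades]

Exploratory work can register its own functions in a separate, sandboxed namespace without touching the shared ones, which is exactly the kind of loose coupling argued for here.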

On the consumer side, an interface layer should control who can access the data but should also make it easy for them to do so. Querying and filtering data should be made easy via pre-canned views, while more complex queries should be supported through scripting or interactive dashboards for instant visualization. As mentioned previously, the required data may reside on-premises or in the cloud, and over time its location may change, so APIs and supporting gateways form a robust mechanism for abstracting from the query both the location of the data and its representation in the data model. This approach greatly reduces the risk and maintenance overhead of regular upgrades.
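Tying the pieces together, a consumer-facing API might look something like the sketch below: an entitlement check in front of a pre-canned view that delegates to the route function from the gateway sketch earlier, so the caller never sees where the data lives. The entitlements dictionary and getTrades function are hypothetical names used for illustration.

    / who may query which tables
    entitlements:`alice`bob!(`trade`quote; enlist`quote);

    / pre-canned view: entitlement check, then delegate to the gateway's route
    getTrades:{[user;startDate;endDate]
      if[not `trade in entitlements user; '"user not entitled to trade data"];
      route[startDate; endDate; {[s;e] select from trade where date within (s;e)}] };

    / getTrades[`alice; .z.d-10; .z.d]

Because the query is expressed against the API rather than against a physical location or schema, data can move between on-premises and cloud storage without consumers changing a line of code.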

The Value of a Platform

The approach can be summarized as a combination of tightly integrated but loosely coupled processing that makes the streaming analytics platform a horizontal capability across the organization. This was well illustrated recently by one of our clients whose initial implementation of our platform was to support a one-location, single-asset use case. The application subsequently expanded to a global, multi-asset implementation – all based on the same initial platform. This is why I refer to “and beyond” in the title of this blog. A platform’s remit should not be limited by the applications it serves, it should be an enterprise-wide asset that can serve many others.

Nick made an interesting observation at the start of our panel discussion when he noted that decision making has always depended on having access to data. The difference today is that there is simply too much data for any individual to process intuitively and without support. Insights are no longer visible; they must be mined, and they must be mined quickly, before either a problem occurs or an opportunity passes. For that, you need a robust streaming analytics platform.

Please visit this link if you would like to listen to the discussion in full.
