Co-author: Christopher Peel
Did you know that in a typical week, an investment bank’s traders will send millions of messages to each other and to third parties? Whilst banks have a duty to detect and mitigate the risk of unlawful behavior by traders, identifying it in communication data can be a herculean task, given the volume of information and the limited ability of existing systems to flag suspicious behavior automatically.
Activities that investigators look for include the disclosure of sensitive client information, collusion with competitors, and other attempts to manipulate markets. Banks have a direct financial incentive to detect these, as well as protecting their brand: in just one anti-trust case in 2019, seven banks were collectively fined £2.15 billion by regulators.
In this post, HCL’s European Data Science Team will outline how they implemented a novel, cutting-edge approach to communications surveillance that overcomes the most common challenges by applying machine learning algorithms.
Current systems for flagging suspicious messages to investigation teams are ‘lexicon’ based: if a word from a predefined list is present in a message, the investigators are alerted. These alerts have a false-positive rate of more than 99.9% and require huge amounts of investigative resource to trawl through and uncover instances of behavior that are of interest. This performance is not surprising: context is critical in determining whether a set of words is suspicious or not, but lexicon-based systems don’t consider context at all.
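To illustrate why, a lexicon-based check amounts to little more than a word-list lookup. The sketch below uses a tiny hypothetical watch list; note how it fires on an innocent message and misses a genuinely concerning one:

```python
# Hypothetical watch list -- real lexicons run to thousands of entries.
LEXICON = {"fix", "secret", "guarantee"}

def lexicon_alert(message: str) -> bool:
    """Alert if any lexicon word appears, regardless of context."""
    return any(word in LEXICON for word in message.lower().split())

print(lexicon_alert("we need to fix the printer"))   # True  -- a false positive
print(lexicon_alert("let's coordinate our quotes"))  # False -- a miss
```

The check has no notion of meaning: “fix the printer” alerts, while an actual attempt to coordinate pricing does not.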
Banks have experimented with more complex approaches, including deep-learning classification algorithms; however, these have proved problematic for other reasons.
These include the:
- Difficulty or impossibility of gathering enough labelled data for training
- Resource intensive nature of training and maintenance
- Limited ability to explain why specific messages are flagged as suspicious
- Inability to identify new, undesirable trader behavior without explicitly training the model
- Necessity of retraining the model before new data sources can be used
- Large upfront resource commitment before any results can be produced, with no guarantee that they would be useful
A communications surveillance solution was required that could increase the efficiency and efficacy of the investigations team without suffering from the above issues.
We proposed to create a self-organizing, explainable, and holistic surveillance system that could detect suspicious and anomalous behavior in the communications data. This would be built in a modular and scalable fashion so that new data sources and functionality could be easily added later. To achieve this objective, the team worked using open-source technologies, coding the system in Python and focusing on one data source initially (Bloomberg chat data).
From a data science perspective, this solution utilized natural language processing (NLP) for data harvesting; network graph theory and custom-built algorithms for behavior profiling and clustering; and interactive visualization capabilities.
The system consisted of four main components: the data harvester, the self-organizing holistic surveillance system graph, the associative memory fabric, and an interactive visualization – these are explained in more detail below.
The Data Harvester Service
First, we had to ingest and parse the Bloomberg chat data. The Data Harvester performed the following steps:
- Ingesting and parsing the chat data
- Basic cleaning of conversation contents by removing punctuation and non-relevant special characters
- Fixing spelling mistakes, typos and ‘texting’ language (e.g. b4 = before)
- Removing ‘stop words’, such as ‘and’, ‘the’, etc. This is because they are extremely common words that contain little important information
- Replacing acronyms and trader slang with consistent terms
Finally, the critical information was extracted from the cleaned messages, including conversation participants, financial jargon terms, institutions, time-based information, geographic locations, and financial instruments.
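The cleaning steps above can be sketched as follows. The slang and stop-word tables here are small, hypothetical stand-ins for the much larger lists used in practice:

```python
import re

# Hypothetical lookup tables -- real deployments use far larger ones.
SLANG = {"b4": "before", "thx": "thanks", "pls": "please"}
STOP_WORDS = {"and", "the", "a", "an", "of", "to", "in", "will"}

def clean_message(text: str) -> list[str]:
    """Apply the harvester's cleaning steps to a single chat message."""
    text = text.lower()
    # Remove punctuation and non-relevant special characters.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    # Normalize 'texting' language and trader slang to consistent terms.
    tokens = [SLANG.get(tok, tok) for tok in tokens]
    # Drop stop words, which carry little information for surveillance.
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(clean_message("Thx, will call b4 the close!"))
# -> ['thanks', 'call', 'before', 'close']
```

The spelling-correction step is omitted here; in practice it would sit between tokenization and slang replacement.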
The Holistic Surveillance Graph
The business concepts captured by the Data Harvester were represented as nodes and relationships in a network graph.
Figure 2: Illustration of the nodes and relationships in a graph format
The importance of terms within a conversation is captured as node and relationship weights in the Knowledge Graph. The weights are calculated using a machine learning algorithm based on TextRank, a variation of the PageRank algorithm originally used by Google to rank web pages.
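A minimal sketch of this kind of weighting is shown below: a plain PageRank iteration over a term co-occurrence graph. This is a simplified stand-in for the production TextRank variant, and the example conversation terms are invented:

```python
from collections import defaultdict
from itertools import combinations

def term_weights(sentences, damping=0.85, iterations=30):
    """Score terms with a PageRank-style iteration over a co-occurrence graph."""
    # Terms appearing in the same sentence are linked (undirected edges).
    neighbors = defaultdict(set)
    for terms in sentences:
        for a, b in combinations(set(terms), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = list(neighbors)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # Each neighbor passes on its score, split among its own links.
            rank = sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            new[n] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score

convo = [["client", "order", "price"],
         ["price", "fix", "competitor"],
         ["competitor", "client"]]
weights = term_weights(convo)
print(max(weights, key=weights.get))  # 'price' -- the most connected term
```

Terms that co-occur with many other important terms accumulate weight, which is what lets the graph capture importance rather than mere frequency.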
The Associative Memory Fabric
With the conversation data now in the Knowledge Graph, custom-built machine-learning algorithms, based on the concept of Growing Neural Gas with some modifications, were used to generalize and cluster normal behavior and detect anomalies (new behavior) in the data. Neural Gas is an artificial neural network and the name refers to the way its data representations (“neurons”) distribute themselves in the data space.
As new conversations are fed into the machine, their representation is associated with a ‘nearest neighbor’ neuron (hence the term “associative memory”) or, if an existing neuron is not close enough, a new one gets created. Through a process known as Competitive Hebbian Learning (CHL) the neurons in the data space get updated when new data is seen. CHL ensures that only the best matching neuron and its direct neighbors get updated, rather than the whole network, ensuring computational efficiency.
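The association step described above can be sketched as follows. This is a deliberate simplification of Growing Neural Gas (fixed learning rates, no edge ageing or neuron pruning), with hypothetical parameter values:

```python
import math

class AssociativeMemory:
    """Simplified sketch of the nearest-neighbor association step."""

    def __init__(self, threshold=1.0, winner_rate=0.2, neighbor_rate=0.02):
        self.neurons = []          # neuron positions in the data space
        self.edges = set()         # Hebbian links between neuron indices
        self.threshold = threshold
        self.winner_rate = winner_rate
        self.neighbor_rate = neighbor_rate

    def _nudge(self, i, x, rate):
        # Move neuron i a fraction of the way toward the input x.
        self.neurons[i] = [p + rate * (q - p) for p, q in zip(self.neurons[i], x)]

    def present(self, x):
        """Return (neuron index, is_new); is_new=True signals an anomaly."""
        if self.neurons:
            dists = [math.dist(x, n) for n in self.neurons]
            best = dists.index(min(dists))
            if dists[best] <= self.threshold:
                # Competitive Hebbian Learning: only the winner and its
                # direct neighbors are updated, not the whole network.
                self._nudge(best, x, self.winner_rate)
                for i, j in self.edges:
                    if best in (i, j):
                        self._nudge(j if i == best else i, x, self.neighbor_rate)
                return best, False
            # Nothing close enough: link the new neuron to the best match.
            self.edges.add((best, len(self.neurons)))
        self.neurons.append(list(x))
        return len(self.neurons) - 1, True

mem = AssociativeMemory(threshold=1.0)
print(mem.present([0.0, 0.0]))   # (0, True)  -- first point creates a neuron
print(mem.present([0.1, 0.1]))   # (0, False) -- associated with neuron 0
print(mem.present([5.0, 5.0]))   # (1, True)  -- far from everything: anomaly
```

In the real system the inputs are vector representations of conversations rather than raw coordinates, but the create-or-associate logic is the same.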
Once the system has ingested some base data, a representation of normal behaviors emerges. Thus, if some new data does in fact result in a new neuron being created, we have detected an anomaly!
Although an anomaly may not represent suspicious behavior (it is simply different from what has been seen before), if an investigator does deem the behavior to be of interest, the neuron representing it can be labelled as suspicious. Any future behavior that gets associated with this neuron can then be automatically flagged as suspicious. In this way, the alerting incrementally improves over time.
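The incremental labelling loop might look like this. The neuron ids are hard-coded for illustration; in the real system they come from the associative memory:

```python
suspicious_neurons = set()   # neurons an investigator has marked

def handle(conversation_id, neuron_id, is_anomaly):
    """Route a conversation based on its associated neuron."""
    if is_anomaly:
        return "send to investigator"   # new behavior: needs human review
    if neuron_id in suspicious_neurons:
        return "auto-flag"              # learned from an earlier label
    return "ignore"                     # matches known-normal behavior

print(handle("c1", 7, is_anomaly=True))    # send to investigator
suspicious_neurons.add(7)                  # investigator labels neuron 7
print(handle("c2", 7, is_anomaly=False))   # auto-flag
print(handle("c3", 3, is_anomaly=False))   # ignore
```

Each investigator decision thus feeds directly back into future alerting without any model retraining.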
Crucially, this approach does not require any ‘labelled’ data (a set of messages that were labelled as suspicious or not suspicious) up front, thus overcoming one of the biggest obstacles to implementing machine learning in this context.
In practice, multiple associative memory networks were created to represent generalizations of different types of data (topics, financial terms etc.). These were combined to create broader generalizations of data and this is where the term Associative Memory Fabric comes from; it describes the entire set of neural networks in the system.
The Interactive Visualization
To enable business users to understand what was captured in the Knowledge Graph and to explore its contents, we needed a visualization tool. As the front end of the graph database we were using was not flexible enough, and visualization tools such as Tableau do not support network graphs, we built our own in Python.
The backend of the application handled the communication with the database and parsing the results of queries, whilst the front end gave an attractive, easy way for users to explore the millions of data points and their relationships.
Results and Next Steps
While implementing this system for a client, we successfully demonstrated its ability to detect anomalous behavior in a meaningful, explainable way and have secured considerable interest from senior stakeholders in putting the system into production.
There are three clear ways this system will provide value to organizations in the banking and finance domain:
- Increased likelihood of suspicious behavior being detected which could otherwise damage the brand and lead to financial penalties
- Increased efficiency of the investigation teams, meaning that an increase in transaction volume does not necessitate a proportional increase in team size
- Mitigation of potential regulatory fines, which can run to tens of millions of pounds, by demonstrating investment in cutting-edge tools and technology
The next steps for this project will be to:
- Include more communication sources such as email
- Use the system’s predictive capability
- Implement advanced search capability (e.g. find instances where a person in role type A has talked about topic B with organization types C at time D) so that specific anti-trust scenarios can be sought out proactively
We have proven the potential of this technology to significantly enhance alerting on communications by considering not just particular words, but also their meaning and the context in which they were used.
While the capabilities we have demonstrated so far are valuable, it is only a small part of what this system could do. The ever-improving automated alerting, together with surfacing the predictive and advanced search capabilities, would truly transform the client’s surveillance and investigation capabilities.
This technology is applicable to other domains and contexts, and is not limited to text data: it can be used to cluster, classify, and predict numeric data, and can replace and improve on many existing applications of machine learning where greater transparency and flexibility are required.