Data Quality: Why Do Most Models for Predicting Financial Crimes Fail?

Excerpts of an interview: Liudmyla Glashchenko, Digital Head, in conversation with Abhishek Gupta, Managing Director of Effiya Technologies.

This is the first interview published as part of the ‘Effiya Insights’ series: conversations on building models for predicting financial crimes, managing data quality and governance, and the related financial compliance product landscape.



Why Do Most Models for Predicting Financial Crimes Fail?

There are three big challenges with these models. The first is the data itself. Most of the time, senior management has a clear vision of how the models will help the business. What they often do not understand, and what we frequently need to make them aware of, is that models are always ‘Garbage In, Garbage Out’.


Data Quality is the first requirement for a financial model

If we do not focus on the quality of the data coming in, the model suffers. For example, when many banks are asked, "Can you identify the risky transactions that eventually led to an STR filing?", they cannot, because most banks do not have this information. In half the cases, the classification or tagging of basic data is wrong. Data quality issues like these become one of the biggest obstacles to good model output.

“Models are always Garbage In, Garbage Out. So if we do not focus on the quality of data which is supposed to come in, we are at a loss.”


Proper classification of events is important

The second challenge is how you define and classify your events. For example, some people define a filed suspicious transaction report (STR) as the event, while others define case creation as the event. Now, case creation certainly indicates an escalated alert, but it does not prove that somebody has committed a financial crime. Most of the time, depending on data availability, these kinds of events get mixed together, and the predictive power of the model ends up poor.
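The point above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical alerts table whose field names (`case_created`, `str_filed`) are invented for the example, not taken from any real AML system: keeping case creation and STR filing as separate columns lets the modelling team choose a strict label definition instead of a mixed one.

```python
# Hypothetical alert records; field names are illustrative only.
alerts = [
    {"alert_id": "A1", "case_created": True,  "str_filed": True},
    {"alert_id": "A2", "case_created": True,  "str_filed": False},
    {"alert_id": "A3", "case_created": False, "str_filed": False},
]

# A mixed definition ("any escalation counts as an event") inflates the positive class:
mixed_labels = [int(a["case_created"] or a["str_filed"]) for a in alerts]

# A strict definition keeps only confirmed STR filings as positives:
str_labels = [int(a["str_filed"]) for a in alerts]

print(mixed_labels)  # two positives under the mixed definition
print(str_labels)    # one positive under the STR-only definition
```

Training against `mixed_labels` would teach the model to predict escalation, not confirmed suspicion, which is one way the event definition quietly degrades predictive power.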


Proficiency of the modeller impacts quality of output

The third is the proficiency of the modellers themselves. Modelling is 80% science and 20% art. If you do not really understand the mechanics of how a particular variable, coded in a particular way, influences the outcome, you can make mistakes while developing models like these. These are the three key reasons why, we believe, models do not perform to expectations.

“Modelling is 80% science and 20% art. Keen understanding of business situations determines the predictive power of the model.”


If a bank does not have good quality data, what can it do? Does it still have a chance?

It does. The first possibility is that the back-end data itself is poor, or sometimes it is just the way IT has collated it in the traditional AML system. Often you can ignore the tagging and categorisation of the traditional AML system and instead collect and correlate the data from the source systems in the particular form you require. Nobody wants to do that exercise, and IT always pushes its core AML solution data to you for modelling.

The second possibility is that you do not have the required data in the front end. You will have to collect that data, which will take time.

But in my experience, the focus should primarily be on correctly codifying the data from the source systems, rather than pushing the AML data warehouse to the modelling team to develop a model.




Does Effiya Technologies have a solution for data quality and data management for organisations without a proper system?

Effiya Technologies has developed a data governance product that helps both customers setting up a data repository for the first time and customers who need ongoing data quality assessment.

It has all the features a data governance product should have, and it can control and steer data to different groups or departments. You can put business rules in place to assess and enhance the quality of the data. So whatever can be done systematically is possible with this setup. What is not available even in the source system, or what is not pushed into the system, cannot be helped; that is something you need to address in your process.
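As a rough illustration of what such business rules look like in practice, here is a minimal rule-based check over transaction records. The rule set, function name, and field names are hypothetical examples, not taken from Effiya's product:

```python
# Illustrative rule-based data quality checks; rules and fields are hypothetical.
def check_record(rec):
    """Return a list of data quality rule violations for one transaction record."""
    issues = []
    if not rec.get("customer_id"):
        issues.append("missing customer_id")
    if rec.get("amount") is None or rec["amount"] < 0:
        issues.append("invalid amount")
    if rec.get("currency") not in {"USD", "EUR", "GBP", "INR"}:
        issues.append("unknown currency")
    return issues

records = [
    {"customer_id": "C001", "amount": 250.0, "currency": "USD"},
    {"customer_id": "",     "amount": -10.0, "currency": "XXX"},
]

# Build a simple quality report keyed by customer id.
report = {rec.get("customer_id") or "<blank>": check_record(rec) for rec in records}
print(report)
```

The output flags the second record on all three rules, which is exactly the kind of systematic check the product can apply; anything absent from the source system itself, as noted above, no rule can recover.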

