Risk Modeling with Hadoop - dummies

Risk Modeling with Hadoop

By Dirk deRoos

Risk modeling is another major use case that’s energized by Hadoop. You’ll find that it closely matches the use case of fraud detection in that it’s a model-based discipline. The more data you have and the more you can “connect the dots,” the better your risk-prediction models are likely to be.

The all-encompassing word risk can take on a lot of meanings. For example, customer churn prediction deals with the risk of a client moving to a competitor; the risk of a loan book relates to the risk of default; risk in health care runs the gamut from outbreak containment to food safety to the probability of reinfection and more.

The financial services sector (FSS) is now investing heavily in Hadoop-based risk modeling. This sector seeks to increase the automation and accuracy of its risk assessment and exposure modeling.

Hadoop lets firms in this sector extend the data sets used in their risk models to include underutilized sources (or sources that are never utilized), such as e-mail, instant messaging, social media, and interactions with customer service representatives.

Risk models in FSS pop up everywhere. They’re used for customer churn prevention, trade manipulation modeling, corporate risk and exposure analytics, and more.

When a company issues an insurance policy against natural disasters at home, one challenge is seeing clearly how much money is potentially at risk. If the insurer fails to reserve enough money for possible payouts, regulators will intervene (the insurer doesn’t want that); if the insurer puts too much money into its reserves to pay out future policy claims, it can’t invest that premium money and make a profit (the insurer doesn’t want that, either).

Some companies are “blind” to the risk they face because they have been unable to run an adequate number of catastrophe simulations that vary wind speed or precipitation rates (among other variables) and relate those variations to their exposure.

Quite simply, these companies have difficulty stress-testing their risk models. The ability to fold in more data — for example, weather patterns or the ever-changing socioeconomic distribution of their client base — gives them a lot more insight and capability when it comes to building better risk models.
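To make the idea of a catastrophe simulation concrete, here is a minimal Monte Carlo sketch in Python. It is an illustration under stated assumptions, not an actuarial model: the normal distribution for peak wind speed, the linear damage curve, and every parameter value (mean_wind, wind_sd, damage_threshold, max_wind) are hypothetical and chosen only to show the mechanics. Each trial draws one wind speed and converts it to a loss against the insurer's exposure; a stress test then reruns the trials with more wind-speed variance.

```python
import random


def simulate_annual_loss(rng, exposure, mean_wind=90.0, wind_sd=25.0,
                         damage_threshold=120.0, max_wind=250.0):
    """Simulate one year: draw a peak wind speed and map it to a loss.

    All defaults are hypothetical illustration values, not calibrated figures.
    """
    wind = max(0.0, rng.gauss(mean_wind, wind_sd))
    if wind <= damage_threshold:
        return 0.0
    # Assumed damage curve: linear above the threshold, capped at total loss.
    damage_fraction = min(1.0, (wind - damage_threshold) / (max_wind - damage_threshold))
    return exposure * damage_fraction


def run_simulations(n_trials, exposure, seed=42, **wind_params):
    """Run n_trials independent years and return the losses sorted ascending."""
    rng = random.Random(seed)
    return sorted(simulate_annual_loss(rng, exposure, **wind_params)
                  for _ in range(n_trials))


def loss_quantile(sorted_losses, q):
    """Return the loss at quantile q (0 <= q <= 1) of an ascending list."""
    index = min(len(sorted_losses) - 1, int(q * len(sorted_losses)))
    return sorted_losses[index]


def stress_test(n_trials, exposure, base_sd=25.0, stressed_sd=40.0):
    """Compare the 99th-percentile loss under base and stressed wind variance."""
    base = run_simulations(n_trials, exposure, wind_sd=base_sd)
    stressed = run_simulations(n_trials, exposure, wind_sd=stressed_sd)
    return loss_quantile(base, 0.99), loss_quantile(stressed, 0.99)
```

Calling stress_test(10000, 1_000_000.0) returns the two tail losses side by side, which is the kind of figure an insurer would weigh when sizing reserves. On a cluster, the trials are independent of one another, so they could be split across many workers; this sketch runs them in a single process.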

Building and stress-testing risk models like the one just described is an ideal task for Hadoop. These operations are often computationally expensive and, when you’re building a risk model, likely impractical to run against a data warehouse, for these reasons:

  • The warehouse probably isn’t optimized for the kinds of queries issued by the risk model. (Hadoop isn’t bound by the data models used in data warehouses.)

  • A large, ad hoc batch job such as an evolving risk model would add load to the warehouse and slow down existing analytic applications. (Hadoop can assume this workload, freeing up the warehouse for regular business reporting.)

  • More advanced risk models may need to factor in unstructured data, such as raw text. (Hadoop can handle that task efficiently.)
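The last point, factoring raw text into a risk model, can be sketched in the map-and-reduce style that Hadoop jobs follow. The Python example below is a hedged illustration only: the tab-separated record layout (customer ID, then message text), the RISK_TERMS word list, and the function names are all hypothetical, and the code runs in a single process rather than on a cluster. The map step emits a count for each risk-related word it finds in a message, and the reduce step totals those counts per customer, producing a simple feature that a churn model could consume.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary of words that may signal churn risk.
RISK_TERMS = {"cancel", "complaint", "refund", "competitor"}


def map_record(line):
    """Map step: turn one 'customer_id<TAB>message' line into (customer_id, 1) pairs."""
    customer_id, _, text = line.partition("\t")
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in RISK_TERMS:
            yield customer_id, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each customer ID."""
    totals = defaultdict(int)
    for customer_id, count in pairs:
        totals[customer_id] += count
    return dict(totals)


def run_job(lines):
    """Chain the map and reduce steps in-process, standing in for a cluster run."""
    pairs = (pair for line in lines for pair in map_record(line))
    return reduce_counts(pairs)
```

For example, passing run_job three messages, two of them from the same customer, yields one risk-term total for that customer and nothing for a customer whose message contains none of the listed words. Because the map step treats each line independently, the same logic could be distributed across many input splits.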