Insurance data analytics: some case studies of advanced algorithms and applications

INTRODUCTION
Christian ROBERT

Data analysis has always been at the very heart of the insurance business, which is strongly rooted in data processing and statistical analysis. Historically, data has been collected and used to support underwriting decisions, policy pricing, claims settlement and fraud prevention. Insurance companies have long sought increasingly granular data sets for their predictive models, so the relevance of Machine Learning (ML) and Artificial Intelligence (AI) to the industry came as no surprise. Nevertheless, the insurance sector is also often regarded as one of the more old-fashioned and change-resistant sectors of the economy, which is why it has often been questioned whether AI would ultimately have such a significant impact. The pace of technological transformation and changes in consumer behavior have heralded a new wave of competition from technology companies that many insurance companies have found threatening. The collection of new types of data (e.g. unstructured data such as reports, emails, images, contracts) and the emergence of new algorithms have created new ways to significantly disrupt the industry. Insurance companies had to react!

Some large insurance companies have invested heavily in AI, but most insurers have moved forward in a measured way, not always knowing how best to deploy these technologies. Insurers first examined every aspect of their organization to determine the best way to deploy AI. Rather than treating AI as a technology issue, the effort began with an assessment of business needs and of opportunities where AI could generate business value. For each opportunity, insurers had to identify the data needed to take advantage of AI, including data from external sources. As a result, most insurers have had to implement stronger data governance to ensure that they have access to accurate, timely, reliable, and regulatory-compliant data. They then had to develop expertise in AI. New AI skills were required, which meant hiring additional talent, partnering with third-party experts, and collaborating with, or even acquiring, technology startups. Disciplined experimentation had to be encouraged. Insurers had to develop a tolerance for experimentation while combining it with rigorous measurement of return on investment, so that failures could be quickly eliminated and successful pilots fully implemented. Finally, the dangers of misused AI had to be considered. Applications that make inappropriate or biased decisions can cause significant damage through faulty decisions, but can also lead to reputational damage and a loss of confidence in the business that would result in a loss of value. Insurers also needed to establish rigorous AI procedures to ensure that their applications are designed to work properly, even as they evolve and adapt over time.

In its first period, from 2010 to 2015, the « Management de la Modélisation » research chair focused on the use of mathematical models in the human decision-making processes of insurance companies, to enable them to adapt to a changing world. In its second period, from 2015 to 2020, the chair investigated the reasoned use of decision models and the control of the algorithmic disruptions induced by the emergence of AI. The chair was renamed « Data Analytics & Models for Insurance » (DAMI) to support a research program focused on understanding the challenges of AI for insurance companies, policyholders and regulators.

Through this book, the chair wanted to bring together contributions from its own researchers and from researchers who have taken part in its life and events, as well as from outside experts whose experience and knowledge are indispensable. This book is divided into two parts.

The first part focuses on applications and case studies of ML and AI algorithms by insurance companies. It comprises eight chapters covering actuarial science, knowledge extraction (from reinsurance contracts or reports), and the monitoring of partner networks and customer requests.

In Chapter 1, Quentin Guibert, Pierrick Piette and Frédéric Planchet propose two applications of ML techniques for mortality modelling and the construction of prospective life tables. The first approach considers Random Vector Functional Link (RVFL) neural networks, which are typically used for multivariate time series projections; however, the calibration of such models on mortality data is delicate. The second approach considers a Vector-Autoregressive Elastic-Net (VAR-ENET) model of differentiated log-mortality to produce predictions for different populations. This approach yields, with little effort, consistent mortality predictions regardless of the characteristics of the populations.
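As a rough illustration of the second approach, the following is a minimal sketch (not the authors’ implementation) of a VAR-type model with elastic-net regularization fitted to first differences of log-mortality rates, using scikit-learn’s ElasticNet on toy data; the lag order and penalty settings are assumptions.

```python
# Minimal sketch: elastic-net-regularized VAR on differentiated log-mortality
# (one column per population). Illustrative only, not the authors' implementation.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
log_m = np.cumsum(rng.normal(-0.01, 0.02, size=(50, 5)), axis=0)  # toy log-mortality: 50 years x 5 populations

d = np.diff(log_m, axis=0)              # differentiated log-mortality
p = 1                                   # VAR lag order (assumption)
X = d[:-p, :]                           # lagged differences as regressors
Y = d[p:, :]                            # one-step-ahead targets

models = []
for j in range(Y.shape[1]):             # one sparse regression per population
    m = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, Y[:, j])
    models.append(m)

last = d[-1, :].reshape(1, -1)
next_diff = np.array([m.predict(last)[0] for m in models])
next_log_m = log_m[-1, :] + next_diff   # forecast of next year's log-mortality
```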

In Chapter 2, Alexandre Boumezoued and Christian Robert study the potential of ML tools for predicting individual reserves (i.e. loss by loss). They consider the stochastic modelling of individual claims occurrence and development based on Marked Point Processes. They investigate individual claims models, present a unified framework, and explain how these models can complement aggregate, triangle-based models. The authors present both parametric and nonparametric learning approaches for individual claims data and discuss the advantages and drawbacks of each.
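To fix ideas on the modelling framework, here is a toy simulation of individual claims as a marked Poisson process, with reporting delay and severity as marks; the distributions and parameters are illustrative assumptions, not taken from the chapter.

```python
# Toy simulation of individual claims as a marked Poisson process:
# occurrence times on [0, T], with reporting delay and severity as marks.
import numpy as np

rng = np.random.default_rng(1)
T, lam = 5.0, 100.0                      # observation window (years), claim intensity per year

n = rng.poisson(lam * T)                 # number of claims occurring on [0, T]
occurrence = np.sort(rng.uniform(0.0, T, size=n))
reporting_delay = rng.exponential(scale=0.5, size=n)    # mark 1: delay until the claim is reported
severity = rng.lognormal(mean=8.0, sigma=1.2, size=n)   # mark 2: ultimate claim severity

# Claims reported by the valuation date T are handled case by case;
# the others are IBNR (incurred but not reported) and must be predicted.
reported = occurrence + reporting_delay <= T
ibnr_count = (~reported).sum()
ibnr_amount = severity[~reported].sum()
print(f"{ibnr_count} IBNR claims, total simulated cost {ibnr_amount:,.0f}")
```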

Cyber-risk is an increasingly present threat to our modern societies, which are highly dependent on the digital economy and the Internet. Chapter 3 examines the financial losses caused by cyber-attacks, using a public database. In this chapter, Sébastien Farkas, Olivier Lopez and Maud Thomas show how tree-based machine learning techniques can be used to improve knowledge of the risk by identifying groups of events that behave in a similar way in terms of severity. The approach combines the CART algorithm with tools from extreme value theory.
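The following is a hedged sketch of the general idea, not the authors’ code: a regression tree groups events with similar severity behaviour, and a generalized Pareto distribution is fitted to the exceedances above a high threshold within each leaf, using scikit-learn and scipy on simulated data.

```python
# Sketch of CART + extreme value theory: tree-based grouping of events,
# then a generalized Pareto tail fit per leaf. Data and thresholds are toy assumptions.
import numpy as np
from scipy.stats import genpareto
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 3))                                            # toy covariates of each cyber event
severity = np.exp(1.0 + 0.5 * X[:, 0] + rng.standard_t(df=3, size=2000))  # heavy-tailed losses

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=200)
tree.fit(X, np.log(severity))                 # the tree groups events with similar severity behaviour

leaves = tree.apply(X)
for leaf in np.unique(leaves):
    s = severity[leaves == leaf]
    u = np.quantile(s, 0.9)                   # high threshold within the leaf (assumption)
    exceedances = s[s > u] - u
    if len(exceedances) >= 30:
        shape, loc, scale = genpareto.fit(exceedances, floc=0.0)
        print(f"leaf {leaf}: threshold={u:.1f}, GPD shape={shape:.2f}, scale={scale:.1f}")
```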

Extra-financial criteria have become significantly more important in investors’ financial decisions in recent years. These criteria, frequently referred to by the initials of the three pillars (E, S, G, for Environment, Social and Governance), aim to quantify the positive or negative effects of a company’s activity on the human ecosystem. In Chapter 4, Christophe Geissler and Vincent Margot explain that the multiplicity of these extra-financial criteria nevertheless makes their integration into an investment process rather complex. They present a self-learning and interpretable strategy, based on an ML algorithm they have developed, designed to simultaneously improve the extra-financial and, possibly, financial performance of an investment strategy over the medium term.

Maximilien Baudry completed a PhD thesis in AI as part of the DAMI research chair. He worked on several innovative topics and was particularly interested in the possibility of generating time series using deep learning networks. The appeal of such an approach is the ability to generate time series from a single realization without imposing any prior model structure. This was very challenging, since deep learning work on time series has shown strong results in forecasting, but very little has been done on generation. In Chapter 5, Maximilien designs an autoregressive Implicit Quantile Network (AIQN) for time series and shows that such a network is able to efficiently learn the fundamental time dependencies of the underlying stochastic process. His work may be used in the future to generate the economic scenarios needed for best estimate calculations of insurers’ liabilities.
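The sketch below illustrates, under simplifying assumptions, how such a network can be set up for a univariate series: a small feed-forward network maps a past window and a quantile level tau to the tau-quantile of the next value, is trained with the pinball loss, and generates new trajectories by sampling tau uniformly at each step. The architecture, sizes and training loop are illustrative, not the design used in the chapter.

```python
# Illustrative autoregressive implicit quantile network (PyTorch), trained with
# the pinball loss; architecture and hyperparameters are assumptions.
import torch
import torch.nn as nn

class AIQN(nn.Module):
    def __init__(self, window=20, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, past, tau):
        # past: (batch, window), tau: (batch, 1) -> predicted tau-quantile of next value
        return self.net(torch.cat([past, tau], dim=-1)).squeeze(-1)

def pinball_loss(pred, target, tau):
    err = target - pred
    t = tau.squeeze(-1)
    return torch.mean(torch.maximum(t * err, (t - 1.0) * err))

window = 20
model = AIQN(window)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

series = torch.cumsum(torch.randn(1000), dim=0)   # toy series
X = torch.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
for _ in range(200):
    tau = torch.rand(len(X), 1)                   # random quantile levels per sample
    loss = pinball_loss(model(X, tau), y, tau)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Generation: sample tau ~ U(0,1) at each step and feed the quantile back in.
past = series[-window:].clone()
sample = []
for _ in range(50):
    tau = torch.rand(1, 1)
    with torch.no_grad():
        nxt = model(past.unsqueeze(0), tau)
    sample.append(nxt.item())
    past = torch.cat([past[1:], nxt])
```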

Natural Language Processing is a set of AI algorithms that allows computers to interpret human languages, structure texts and draw conclusions from textual information. In Chapter 6, Aurélien Couloumy and Roman Castagné walk us through the full set of tasks required to succeed in insurance applications of these algorithms: from preprocessing and content extraction to the construction of specific architectures, through the choice of a good embedding, while adapting to the specificities of the insurance and reinsurance sectors.
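As a minimal, hedged stand-in for the richer embeddings and architectures discussed in the chapter, the following pipeline chains basic preprocessing, TF-IDF vectorization and a linear classifier to route short insurance texts; the documents and labels are invented for illustration.

```python
# Minimal text-classification pipeline: preprocessing, vectorization, model.
# A simple stand-in for more elaborate embeddings and architectures.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

docs = [
    "water damage in the kitchen, claim for repairs",
    "request to update the beneficiary clause of a life policy",
    "car accident on the highway, third party involved",
    "question about the premium increase on the home policy",
]
labels = ["claim", "policy_admin", "claim", "policy_admin"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(docs, labels)
print(pipeline.predict(["hail damage to the roof, need an expert visit"]))
```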

In Chapter 7, Auriol Wabo, Frédéric Planchet and Maxence de Lussac seek to measure the impact of a network of experts on the cost of an automobile claim. To do so, they propose a method and algorithms to quantify the effect of a qualitative explanatory variable on a binary response (expert A or expert B) that is more flexible than the simple coefficient of a multiplicative GLM. Their approach provides a measure that does not require the proportionality assumption of a GLM and is completely decorrelated from the effect of the other explanatory variables included in the model.
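A hedged sketch of one possible model-agnostic effect measure (not the method developed in the chapter): fit a flexible classifier and contrast the average predicted probabilities obtained by setting the qualitative variable to each of its levels for every observation, a partial-dependence-type contrast that does not rely on a GLM form. The data-generating process and variable names below are invented.

```python
# Illustrative model-agnostic effect measure for a qualitative variable on a
# binary response, via a partial-dependence-type contrast. Toy data only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "expert": rng.choice(["A", "B"], size=n),        # qualitative variable of interest
    "vehicle_age": rng.integers(0, 20, size=n),
    "region": rng.choice([0, 1, 2], size=n),
})
logit = -0.5 + 0.6 * (df["expert"] == "A") + 0.05 * df["vehicle_age"]
y = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))   # toy binary response

X = pd.get_dummies(df, columns=["expert"])
model = GradientBoostingClassifier().fit(X, y)

def avg_prob(level):
    Xc = X.copy()
    Xc["expert_A"], Xc["expert_B"] = (level == "A"), (level == "B")
    return model.predict_proba(Xc)[:, 1].mean()

effect = avg_prob("A") - avg_prob("B")   # contrast free of the GLM proportionality assumption
print(f"estimated effect of expert A vs B: {effect:.3f}")
```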

In Chapter 8, Patrick Laub, Nicole El Karoui, Stéphane Loisel and Yahia Salhi question the ability of ML algorithms to detect breaks in seasonal time series as quickly as possible, compared with the CUSUM algorithm, which is the most relevant from a theoretical point of view. They study the problem of detecting, sufficiently early, cases where a call center will, with high probability, need additional staff in the near future. They illustrate their conclusions on real data provided by a French insurer.
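For reference, here is a minimal one-sided CUSUM detector for an upward mean shift in a deseasonalized series of daily call volumes; the seasonal pattern, reference value and decision threshold are illustrative assumptions, not those of the chapter.

```python
# Minimal one-sided CUSUM detector on a deseasonalized daily call-volume series.
import numpy as np

rng = np.random.default_rng(4)
days = np.arange(400)
seasonal = 10 * np.sin(2 * np.pi * days / 7)          # weekly seasonality
calls = 100 + seasonal + rng.normal(0, 5, size=400)
calls[300:] += 15                                     # break: workload jumps at day 300

residual = calls - (100 + seasonal)                   # remove the known seasonal pattern
k, h = 2.5, 20.0                                      # reference value and decision threshold (assumptions)

s, alarm = 0.0, None
for t, r in enumerate(residual):
    s = max(0.0, s + r - k)                           # one-sided CUSUM recursion
    if s > h:
        alarm = t
        break
print(f"break introduced at day 300, CUSUM alarm at day {alarm}")
```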

The second part of this book discusses the dangers and limits of using complex algorithms derived from AI and the need for regulation. It is composed of six chapters that focus on the specificities of the insurance sector regarding the use of AI, the need for tools to explain the decisions made by these algorithms, and the perception by decision-makers and regulatory authorities of the dangers and opportunities of AI. This part ends with a reflection on the use of personal health data and the risks of re-identification of poorly anonymized databases.

In Chapter 9, Arthur Charpentier and Michel Denuit discuss the limits of applying learning algorithms in insurance. For example, they ask whether the search for so-called discriminating variables for a model, although economically justified, does not lead to potentially discriminatory practices, which are, on the contrary, morally and legally reprehensible. They also address the problem of segmentation taken to its extreme with sophisticated algorithms, as well as the problems of bias and fairness that can arise with these algorithms.

Marc Juillard and Yolan Honore-Rouge discuss in Chapter 10 the problem of the interpretability of machine learning models, with a focus on the insurance sector. Their ambition is not to be exhaustive, as the subject is vast, but rather to explain what lies behind the concept of interpretability and to detail the challenges of interpreting machine learning models in the insurance context. They provide a state-of-the-art literature review of methodologies for explaining the results of a machine learning model.
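As one concrete example of the model-agnostic tools typically covered in such reviews, the sketch below computes permutation feature importance with scikit-learn on a toy classification problem; the data and model are placeholders, not drawn from the chapter.

```python
# Permutation feature importance: a simple model-agnostic explanation tool.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: importance {mean:.3f} +/- {std:.3f}")
```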

In Chapter 11, Christian Robert discusses the dangers of algorithms and the implementation of appropriate governance. If AI is to become an asset for insurance companies, rather than a threat, it is essential to understand its limits, biases, and risks, and to develop appropriate governance. He identifies different options for implementing governance to reduce the risks of decision-making algorithms. An important question he raises is where algorithm governance should be positioned within the current regulatory framework for insurance companies. A first possibility is to classify algorithmic risk as a component of model risk, a risk that is well described in the current principles of risk management; the main limitation is that algorithmic risk introduces new dimensions that are still imperfectly known and described in internal documentation. A second possibility is to clearly distinguish algorithmic risk as a new risk alongside the main large risks (financial, underwriting, etc.) and to create a corresponding key function, associated with an activity that will be permanently monitored.

Chapter 12 brings together two contributions on the attitudes of regulators and insurance managers towards AI. The first, by Stéphane Loisel, Frank Schiller, and Jennifer Wang, presents the first experiences of supervisors in Europe, Asia, and America with AI and Insurtechs. The second, written by Anani Olympio, Denis Clot, David Ingram, and Stéphane Loisel, examines the attitudes of insurance company managers towards AI and compares them with their attitudes towards the models usually used in insurance, based on a questionnaire sent to insurance managers.

In Chapter 13, Erwan Médy and Céline Guérin-Faucheur start from the premise that health and health insurance will be profoundly transformed in the coming years by changes in society, data technologies and therapeutic advances. The chapter focuses on the challenges of building personalized preventive services based on the use of health data, and on the place that insurance could take in a new health organization oriented towards the more advanced exploitation of data.

Chapter 14, written by Michel Bera, describes an approach to measuring the re-identification risk of anonymized data and tries to answer the question of whether the actuarial mathematics of extreme risks makes it possible to insure or reinsure the quality of anonymization of a dataset.

We hope that readers, whether experts or newcomers to the field of AI and data analytics, will enjoy going through these chapters, and that reading them will help develop the subtle skills required for the proper use of AI in an insurance business context.