Modern analytics methodologies

Modern analytical methodologies are the tools used to handle, explore and learn from massive datasets, for example in Kaggle challenges. In conjunction with the Data Lab of BNP Paribas Cardif, one of the main purposes is to challenge the celebrated logistic regression models with these modern analytical tools. In this context, a Kaggle competition may be organised in collaboration with the Chair. The teams of BNP Paribas Cardif could also be involved alongside the ISFA Kaggle team (created during the ISFA Forum 2014), which consists of thirty ISFA students and is led by Denis Clot, a lecturer and researcher at ISFA.

Learning comes at a cost in automated bidding systems for internet advertising, and more generally in strategies monitored by self-learning models. What are the optimal strategies for learning quickly enough while keeping costs under control? How can strategies adapt to a changing world? Moreover, learning is sometimes not enough to anticipate the behaviour of new users, for whom little information is available. One of the essential tasks of item recommendation systems is to predict the ratings that new customers will give to items they have not yet acquired. Although numerous models and algorithms have been proposed, making accurate predictions for new users with very few recorded ratings remains a real statistical challenge, known as the “cold start” problem.
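The trade-off between learning quickly and keeping costs in check can be sketched with an epsilon-greedy bandit, one of the simplest self-learning strategies: with a small probability the system explores a strategy at random, otherwise it exploits the one with the best observed payoff. The click-through rates used below are purely hypothetical and stand in for competing bidding strategies.

```python
import random

def epsilon_greedy(arms, n_rounds=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over Bernoulli arms.

    arms: hypothetical success probabilities (e.g. click-through rates).
    Returns the total reward earned and how often each arm was pulled.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)       # pulls per arm
    sums = [0.0] * len(arms)       # cumulative reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in counts:
            i = rng.randrange(len(arms))                  # explore (pay to learn)
        else:
            means = [s / c for s, c in zip(sums, counts)]
            i = means.index(max(means))                   # exploit best estimate
        reward = 1.0 if rng.random() < arms[i] else 0.0   # Bernoulli payoff
        counts[i] += 1
        sums[i] += reward
        total += reward
    return total, counts

# Three hypothetical bidding strategies with different click-through rates
total, counts = epsilon_greedy([0.02, 0.05, 0.08])
```

Each exploratory pull has an expected cost (the gap to the best arm's payoff), which is exactly the "cost of learning" evoked above; tuning epsilon balances that cost against the risk of never identifying the best strategy.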

This also raises the question of data expiry in the world of big data, as well as the nature of the variables to be considered. Sometimes the absence of change over a long period is, in itself, more informative than the current value of the variable under consideration. In insurance, selection effects tend to fade after a few years, and certain variables can exhibit regime changes. In line with the Chair's work on detecting the loss of validity of actuarial hypotheses, consideration must be given to detecting regime changes in key variables within massive databases.
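As a sketch of what such regime-change detection could look like, a classical two-sided CUSUM control scheme accumulates deviations of a series from its reference mean and raises an alarm when they exceed a threshold. The slack and threshold values below are illustrative choices, not those studied by the Chair.

```python
def cusum_alarm(series, target_mean, k=0.5, h=5.0):
    """Two-sided CUSUM: return the first index at which the cumulative
    deviation from target_mean (beyond a slack k) exceeds threshold h,
    or None if no regime change is flagged."""
    s_hi = s_lo = 0.0
    for t, x in enumerate(series):
        s_hi = max(0.0, s_hi + (x - target_mean - k))  # upward-shift statistic
        s_lo = max(0.0, s_lo + (target_mean - x - k))  # downward-shift statistic
        if s_hi > h or s_lo > h:
            return t  # first alarm: possible regime change
    return None

# Toy series: the mean jumps from 0 to 2 at index 50
data = [0.0] * 50 + [2.0] * 50
alarm = cusum_alarm(data, target_mean=0.0)  # flags index 53, shortly after the shift
```

The small detection delay (here, four observations after the shift) is the price paid for robustness to noise: a lower threshold h reacts faster but produces more false alarms on stable variables.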