What are predictive analytics? And why does the idea create so much confusion?

The greatest barrier to the widespread impact of predictive analytics in higher education is adoption. No matter how great the technology is, if people don’t use it effectively, any potential value is lost.

In the early stages of predictive analytics implementations at colleges and universities, a common obstacle comes in the form of questions that arise from some essential misunderstandings about data science and predictive analytics.  Without a clear understanding of what predictive analytics are, how they work, and what they do, it is easy to establish false expectations.  When predictive analytics fail to live up to these expectations, the result is disappointment, frustration, poor adoption, and a failure to fully actualize their potential value for student success.

This post is the first in a series addressing common misunderstandings about data science that can have serious consequences for the success of an educational data or learning analytics initiative in higher education.  The most basic misunderstanding that people have is about the language of prediction. What do we mean by ‘predictive’ analytics, anyway?

Why is the concept of ‘Predictive Analytics’ so confusing?

The term ‘predictive analytics’ is used widely, not just in education, but across all knowledge domains. We use the term because everyone else uses it, but it is actually pretty misleading.

I have written about this at length elsewhere, but in a nutshell, the term ‘prediction’ has a long history of being associated with a kind of mystical access to true knowledge about future events in a deterministic universe.  The history of the term is important, because it explains why many people get hung up on issues of accuracy, as if the goal of predictive analytics were to become something akin to the gold standard of a crystal ball.  It also explains why others are immediately creeped out by conversations about predictive analytics in higher education, because the term ‘prediction’ carries with it a set of pretty heavy metaphysical and epistemological connotations.  It is not uncommon in discussions of ethics and AI in higher education to hear comparisons between predictive analytics and the world of the film Minority Report (which is awesome), in which government agents are able to intervene and arrest people for crimes before they are committed.  In these conversations, however, it is rarely remembered that the predictions in Minority Report were quasi-magical in origin, whereas predictive analytics involve computational power applied to incomplete information.

Predictive analytics are not magic, even if the language of prediction sets us up to think of them in this way.  In The Signal and the Noise, Nate Silver suggests that we can begin to overcome this confusion by using the language of forecasting instead.  Where the goal of prediction is to be correct, the goal of a forecast is to be prepared.  I watch the weather channel, not because I want to know what the weather is going to be like, but because I want to know whether I need to pack an umbrella.

In higher education, it is unlikely that we will stop talking about predictive analytics any time soon.  But it is important to shift our thinking and set our expectations along the lines of forecasting.  When it comes to the early identification of at-risk students, our aim is not to be 100% accurate, and we are not making deterministic claims about a particular student’s future behavior.  What we are doing is providing a forecast based on incomplete information about groups of students in the past so that instructors and professional advisors can take action. The goal of predictive analytics in higher education is to offer students an umbrella when the sky turns grey and there is a strong chance of rain.
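
For readers who like to see an idea in code, here is a minimal Python sketch of the forecasting mindset described above. Everything in it is an assumption made up for illustration: the handful of past-student records, the grouping by ‘low’ and ‘high’ early-term engagement, and the 0.5 cut-off for reaching out. Real systems draw on far richer data and models. The point is only that the output is a chance of rain for a group of similar students, not a verdict on any individual, and that its purpose is to prompt action.

```python
from collections import defaultdict

# Hypothetical historical records (invented for illustration):
# (early-term engagement level, did the student complete the course?)
past_students = [
    ("low", False), ("low", False), ("low", True), ("low", False),
    ("high", True), ("high", True), ("high", False), ("high", True),
]


def forecast_risk(history):
    """Return, per group, the share of past students who did not complete."""
    totals = defaultdict(int)
    not_completed = defaultdict(int)
    for group, completed in history:
        totals[group] += 1
        if not completed:
            not_completed[group] += 1
    return {group: not_completed[group] / totals[group] for group in totals}


def needs_outreach(risk, threshold=0.5):
    """Decide whether to act. The 0.5 threshold is an illustrative assumption."""
    return risk >= threshold


risk_by_group = forecast_risk(past_students)
for group, risk in sorted(risk_by_group.items()):
    action = "reach out" if needs_outreach(risk) else "no action yet"
    print(f"{group} engagement: {risk:.0%} chance of not completing -> {action}")
```

Notice that the forecast for the low-engagement group is not wrong just because one of those past students went on to complete the course. It was never a claim about that student. It was a reason to offer the umbrella.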

Also published on Medium.