Should edtech vendors stop selling ‘predictive analytics’? A response to Tim McKay

Pedantic rants about the use and misuse of language are a lot of fun. We all have our soap boxes, and I strongly encourage everyone to hop on theirs from time to time. But when we enter into conversations around the use and misuse of jargon, we must always keep two things in mind: (1) conceptual boundaries are fuzzy, particularly when common terms are used across different disciplines, and (2) our conceptual commitments have serious consequences for how we perceive the world.

Tim McKay recently wrote a blog post called Hey vendors! Stop calling what you’re selling colleges and universities “Predictive Analytics”. In this piece, McKay does two things. First, he tries to strongly distinguish the kind of ‘predictive analytics’ work done by vendors from the kind of ‘real’ prediction that is done within his own native discipline, which is astronomy. Second, on the basis of this distinction, he asserts that what analytics companies are calling ‘predictive analytics’ is actually not predictive at all. All of this is to imply what he later says explicitly in a tweet to Mike Sharkey: the language of prediction in higher ed analytics is less about helpfully describing the function of a particular tool, and more about marketing.

What I’d like to do here is to unpack Tim’s claims, and in so doing, soften the kind of strong antagonism that he erects between vendors and the rest of the academy, which is not particularly productive as vendors, higher educational institutions, government, and others seek to work together to promote student success, both in the US and abroad.

What is predictive analytics?

A hermeneutic approach

Let’s begin with defining analytics. Analytics is simply the visual display of quantitative information in support of human decision-making. That’s it. In practice, we see the broad category of analytics sub-divided in a wide variety of ways: by domain (e.g., website analytics), by content area (e.g., learning analytics, supply chain analytics), and by intent (as in the common distinction between descriptive, predictive, and prescriptive analytics).

Looking specifically at predictive analytics, it is important not to take the term out of context. In the world of analytics, the term ‘predictive’ always refers to intent. Since analytics is always in the service of human decision-making, it always involves factors that are subject to change on the basis of human activity. Hence, ‘predictive analytics’ involves the desire to anticipate and represent some likely future outcome that is subject to change on the basis of human intervention. When considering the term ‘predictive analytics,’ then, it is important not to consider ‘predictive’ in a vacuum, separate from related terms (descriptive and prescriptive) and the concept of analytics, of which predictive analytics is a type. Pulling a specialized term out of one domain and evaluating it on the terms of another is unfair and is only possible under the presumption that language is static and ontologically bound to specific things.

So, when Tim McKay talks about scientific prediction and complains that predictive analytics do not live up to the rigorous standards of the former, he is absolutely right. But he is right because the language of prediction is deployed in two very different ways. In McKay’s view, scientific prediction involves applying one’s knowledge of governing rules to determine some future state of affairs with a high degree of confidence. In contrast, predictive analytics involves creating a mathematical model that anticipates a likely state of affairs based on observable quantitative patterns in a way that makes no claim to understanding how the world works. Scientific prediction, in McKay’s view, involves an effort to anticipate events that cannot be changed. Predictive analytics involves events that can be changed, and in many cases should be changed.

The distinction that McKay notes is indeed incredibly important. But, unlike McKay, I’m not particularly bothered by the existence of this kind of ambiguity in language. I’m also not particularly prone to lay blame for this kind of ambiguity at the feet of marketers, but I’ll address this later.

An Epistemological Approach

One approach to dealing with the disconnect between scientific prediction and predictive analytics is to admit that there is a degree of ambiguity in the term ‘prediction,’ to adopt a hermeneutic approach, and to be clear that the term is simply being deployed relative to a different set of assumptions. In other words, science and analytics are both right.

Another approach, however, might involve looking more carefully at the term ‘prediction’ itself and reconciling science and analytics by acknowledging that the difference is a matter of degree, and that they are both equally legitimate (and illegitimate) in their respective claims to the term.

McKay is actually really careful in the way that he describes scientific prediction. To paraphrase, scientific prediction involves (1) accurate information about a state of affairs (e.g., the solar system), and (2) an understanding of the rules that govern changes in that state of affairs (e.g., the laws of gravity). As McKay acknowledges, both our measurements and understanding of the rules of the universe are imperfect and subject to error, but when it comes to something like predicting an eclipse, the information we have is good enough that he is willing to “bet you literally anything in my control that this will happen – my car, my house, my life savings, even my cat. Really. And I’m prepared to settle up on August 22nd.”

Scientific prediction is inductive. It involves the creation of models that adequately describe past states of affairs, an assumption that the future will behave in very much the same way as the past, and some claim about a future event. It’s a systematic way of learning from experience. McKay implies that explanatory scientific models are the same as the ‘rules that govern,’ but I feel like his admission that ‘Newton’s law of gravity is imperfect but quite adequate’ concedes that they are not in fact the same. Our models might adequately approximate the rules, but the rules themselves are eternally out of our reach (a philosophical point that has been borne out time and time again in the history of science).

Scientific prediction involves the creation of a good enough model that, in spite of errors in measurement and assuming that the patterns of the past will persist into the future, we are able to predict something like a solar eclipse with an incredibly high degree of probability. But what if I hated eclipses? What if they really ground my gears? If I had enough time, money, and expertise, might it not be possible for me to…

…wait for it…

…build a GIANT LASER and DESTROY THE MOON?!

Based on my experience as an armchair science fiction movie buff, I think the answer is yes.

How is this fundamentally different from how predictive analytics works? Predictive analytics involves the creation of mathematical models based on past states of affairs, an admission that models are inherently incomplete and subject to error in measurement, an assumption that the future will behave in ways very similar to the past, and an acknowledgement that predicted future states of affairs might change with human (or extraterrestrial) intervention. Are the models used to power predictive analytics in higher education as accurate as those we have to predict a solar eclipse? Certainly not. Is the data collected to produce predictive models of student success free from error? Hardly. But these are differences in degree rather than differences in the thing itself. By this logic, both predictive analytics and scientific prediction function in the exact same way. The only difference is that the social world is way more complex than the astronomical world.

So, if scientific predictions are predictive, then student risk predictions are predictive as well. The latter might not be as accurate as the former, but the process and assumptions are identical for both.
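To make the parallel concrete, here is a minimal sketch in Python of the process described above. Everything in it is an assumption for illustration: the data is made up, the single indicator (whether a student submitted the first assignment) is hypothetical, and the helper name fit_rates is mine. It is not how any particular vendor builds a model. It only shows the shape of the reasoning: describe the past, assume the future will resemble it, and state a likely outcome.

```python
from collections import defaultdict


def fit_rates(history):
    """Estimate the pass rate for each group from past (group, passed) records."""
    counts = defaultdict(lambda: [0, 0])  # group -> [passes, total]
    for group, passed in history:
        counts[group][0] += int(passed)
        counts[group][1] += 1
    return {group: passes / total for group, (passes, total) in counts.items()}


# Hypothetical past cohort: (submitted the first assignment?, passed the course?)
history = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

rates = fit_rates(history)

# Forecast for a current student who has not submitted the first assignment.
# This assumes the future resembles the past; it is a probability, not a verdict.
forecast = rates[False]
print(f"Estimated chance of passing: {forecast:.0%}")
```

The number that comes out is only as good as the records that went in, and it describes what is likely if nothing changes. An advisor who reaches out to that student is, in a small way, the giant laser.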

An admission

It is unfortunate that, even as he grumbles about how the term ‘predictive’ is used in higher education analytics, McKay doesn’t offer a better alternative.

I’ll admit at this point that, with McKay, I don’t love the term ‘predictive.’ I feel like it is either too strong (in that it assumes some kind of god-like vision into the future) or too weak (in that it is used so widely in common speech and across disciplines that it ceases to have a specific meaning). With Nate Silver, I much prefer the term ‘forecast,’ especially in higher education.

In The Signal and the Noise, Silver notes that the terms ‘prediction’ and ‘forecast’ are used differently in different fields of study, and often interchangeably. In seismology, however, the two terms have very specific meanings: “A prediction is a definitive and specific statement about when and where an earthquake will strike: a major earthquake will hit Kyoto, Japan on June 28…whereas a forecast is a probabilistic statement usually over a longer time scale: there is a 60 percent chance of an earthquake in Southern California over the next thirty years.”

There are two things to highlight in Silver’s discussion. First, the term ‘prediction’ is used differently and with varying degrees of rigor depending on the discipline. Second, if we really want to make a distinction, then what we call prediction in higher ed analytics should really be called forecasting. In principle, I like this a lot. When we produce a predictive model of student success, we are forecasting, because we are anticipating an outcome with a known degree of probability. When we take these forecasts and visualize them for the purpose of informing decisions, are we doing ‘forecasting analytics’? ‘forecastive analytics’? ‘forecast analytics’? I can’t actually think of a related term that I’d like to use on a regular basis. Acknowledging that no discipline owns the definition of ‘prediction,’ I’d far rather preserve the term ‘predictive analytics’ in higher education since it both rolls off the tongue, and already has significant momentum within the domain.

Is ‘predictive analytics’ a marketing gimmick?

Those who have read my book will know that I like conceptual history. When we look at the history of the concept of prediction, we find that it has Latin roots and significantly predates the scientific revolution. Quoting Silver again:

The words predict and forecast are largely used interchangeably today, but in Shakespeare’s time, they meant different things.  A prediction was what a soothsayer told you […]

The term forecast came from English’s Germanic roots, unlike predict which is from Latin. Forecasting reflected the new Protestant worldliness rather than the otherworldliness of the Holy Roman Empire. Making a forecast typically implied planning under conditions of uncertainty. It suggested having prudence, wisdom, and industriousness, more like the way we currently use the word foresight.

The term ‘prediction’ has a long and varied history. Its meaning is slippery. But what I like about Silver’s summary of the term’s origins is that it essentially takes it off the table for everyone except those who presume a kind of privileged access to the divine. In other words, using the language of prediction might actually be pretty arrogant, regardless of your field of study, since it presumes to have both complete information and an accurate understanding of the rules that govern the universe. Prediction is an activity reserved for gods, not men.

Digressions aside, the greatest issue that I have with McKay’s piece is that it uses the term ‘prediction’ as a site of antagonism between vendors and the academy. If we bracket all that has been said, and for a second accept McKay’s strong definition of ‘prediction,’ it is easy to demonstrate that vendors are not the only ones misusing the term ‘predictive analytics’ in higher education. Siemens and Baker deploy the term in their preface to the Cambridge Handbook of the Learning Sciences. Manuela Ekowo and Iris Palmer from New America comfortably make use of the term in their recent policy paper on The Promise and Peril of Predictive Analytics in Higher Education. EDUCAUSE actively encourages the adoption of the term ‘predictive analytics’ through large numbers of publications including the Sept/Oct 2016 edition of the EDUCAUSE Review, which was dedicated entirely to the topic. The term appears in the ‘Journal of Learning Analytics,’ and is used in the first edition of the Handbook of Learning Analytics published by the Society for Learning Analytics Research (SoLAR). University administrators use the term. Government officials use the term. The examples are too numerous to cite (a search for “predictive analytics in higher education” in Google Scholar yields about 58,700 results). If we want to establish the true definition of ‘prediction’ and judge every use by this gold standard, then it is not simply educational technology vendors who should be charged with misuse. If there is a problem with how people are using the term, it is not a vendor problem: it is a problem of language, and of culture.

I began this essay by stating that we need to keep two things in mind when we enter into conversations about conceptual distinctions: (1) conceptual boundaries are fuzzy, particularly when common terms are used across different disciplines, and (2) our conceptual commitments have serious consequences for how we perceive the world. By now, I hope that I have demonstrated that the term ‘prediction’ is used in a wide variety of ways depending on context and intention. That’s not a bad thing. That’s just language. A serious consequence of McKay’s discussion of how ed tech vendors use the term ‘predictive analytics’ is that it tacitly pits vendors against the interests of higher education, and of students, more generally. Not only is such a sweeping implication unfair, but it is also unproductive. It is the shared task of colleges, universities, vendors, government, not-for-profits, and others to work together in support of the success of students in the 21st century. The language of student success is coalescing in such a way as to make possible a common vision and concerted action around a set of shared goals. The term ‘predictive analytics’ is just one of many evolving terms that make up our contemporary student success vocabulary, and is evidence of an important paradigm shift in how we view higher education in the US. Instead of quibbling about the ‘right’ use of language, we should recognize that language is shaped by values, and so work together to ensure that the words we use reflect the kinds of outcomes we collectively wish to bring about.

Climbing out of the Trough of Disillusionment: Making Sense of the Educational Data Hype Cycle

In 2014, I wrote a blog post in which I claimed (along with others) that analytics had reached a ‘peak of inflated expectations.’ Is the use of analytics in higher education now entering what Gartner would call the ‘trough of disillusionment’?

In 2011, Long and Siemens famously argued that big data and analytics represented “the most dramatic factor shaping the future of higher education.”  Since that time, the annual NMC Horizon Report has looked forward to the year 2016 as the year when we would see widespread adoption of learning analytics in higher education.  But as 2016 comes to a close, the widespread adoption of learning analytics still lies on the distant horizon.  Colleges and universities are still very much in their infancy when it comes to the effective use of educational data.  In fact, poor implementations and uncertain ROI have led to what Kenneth C. Green has termed ‘angst about analytics.’

As a methodology, the Gartner Hype Cycle is not without criticism. Audrey Watters, for example, takes issue with the fact that it is proprietary and so ‘hidden from scrutiny.’ Any proprietary methodology is in fact difficult to take seriously as a methodology. It should also be noted that the methodology is improperly named, as any methodology that assumes a particular outcome (i.e., that assumes that all technology adoption trends follow the same patterns) is unworthy of the term. But as a heuristic, it is a useful way of visualizing analytics adoption in higher education to date, and it offers some helpful language for describing the state of the field.