This is the second in my series on common misunderstandings about predictive analytics that hinder their adoption in higher education. Last week I talked about the language of predictive analytics. This week, I want to comment on another common misconception: that predictive analytics (and educational data mining more generally) is a social science.
The greatest barrier to the widespread impact of predictive analytics in higher education is adoption. No matter how great the technology is, if people don’t use it effectively, any potential value is lost.
In the early stages of predictive analytics implementations at colleges and universities, a common obstacle comes in the form of questions that arise from some essential misunderstandings about data science and predictive analytics. Without a clear understanding of what predictive analytics are, how they work, and what they do, it is easy to establish false expectations. When predictive analytics fail to live up to these expectations, the result is disappointment, frustration, poor adoption, and a failure to fully actualize their potential value for student success.
This post is the first in a series of posts addressing common misunderstandings about data science that can have serious consequences for the success of an educational data or learning analytics initiative in higher education. The most basic misunderstanding that people have is about the language of prediction. What do we mean by ‘predictive’ analytics, anyway?
Why is the concept of ‘Predictive Analytics’ so confusing?
The term ‘predictive analytics’ is used widely, not just in education, but across all knowledge domains. We use the term because everyone else uses it, but it is actually pretty misleading.
I have written about this at length elsewhere, but in a nutshell, the term ‘prediction’ has a long history of being associated with a kind of mystical access to true knowledge about future events in a deterministic universe. The history of the term is important, because it explains why many people get hung up on issues of accuracy, as if the goal of predictive analytics were to become something akin to the gold standard of a crystal ball. It also explains why others are immediately creeped out by conversations about predictive analytics in higher education, because the term ‘prediction’ carries with it a set of pretty heavy metaphysical and epistemological connotations. It is not uncommon in discussions of ethics and AI in higher education to hear comparisons between predictive analytics and the world of the film Minority Report (which is awesome), in which government agents are able to intervene and arrest people for crimes before they are committed. In these conversations, however, it is rarely remembered that the predictions in Minority Report were quasi-magical in origin, whereas predictive analytics involve computational power applied to incomplete information.
Predictive analytics are not magic, even if the language of prediction sets us up to think of them in this way. In The Signal and the Noise, Nate Silver suggests that we can begin to overcome this confusion by using the language of forecasting instead. Where the goal of prediction is to be correct, the goal of a forecast is to be prepared. I watch the weather channel, not because I want to know what the weather is going to be like, but because I want to know whether I need to pack an umbrella.
In higher education, it is unlikely that we will stop talking about predictive analytics any time soon. But it is important to shift our thinking and set our expectations along the lines of forecasting. When it comes to the early identification of at-risk students, our aim is not to be 100% accurate, and we are not making deterministic claims about a particular student’s future behavior. What we are doing is providing a forecast based on incomplete information about groups of students in the past so that instructors and professional advisors can take action. The goal of predictive analytics in higher education is to offer students an umbrella when the sky turns grey and there is a strong chance of rain.
In higher education, and in general, an increasing amount of attention is being paid to questions about the ethical use of data. People are working to produce principles, guidelines and ethical frameworks. This is a good thing.
Despite being well-intentioned, however, most of these projects are doomed to failure. The reason is that, amidst talk about arriving at an ethics, or developing an ethical framework, the terms ‘ethics’ and ‘framework’ are rarely well-defined from the outset. If you don’t have a clear understanding of your goal, you can’t define a strategy to achieve it, and you won’t know whether you have reached it even if you do.
As a foundation to future blog posts that I will write on the matter of ethics in AI, what I’d like to do is propose a couple of key definitions, and invite comment where my assumptions might not make sense.
What do we mean by ‘ethics’?
Ethics is hard to do. It is one of those five inter-related sub-disciplines of philosophy defined by Aristotle that also includes metaphysics, epistemology, aesthetics, and logic. To do ethics involves establishing a set of first principles, and developing a system for determining right action as a consequence of those principles. For example, if we presume the existence of a creator god that has given us some kind of access to true knowledge, then we can apply that knowledge to our day-to-day life as a guide to evaluating right or wrong courses of action. Or, instead of appealing to the transcendent, we might begin with certain assumptions about human nature and develop ethical guidelines meant to cultivate those essential and unique attributes. Or, if we decide that the limits of our knowledge preclude us from knowing anything about the divine, or even ourselves, except for the limits of our knowledge, there are ethical consequences of that as well. There are many approaches and variations here, but the key thing to understand is that ethics is hard. It requires us to be thoughtful about arriving at a set of first principles, being transparent, and systematically deriving ethical judgements as consequences of our metaphysical, epistemological, and logical commitments.
What ethics is NOT is a set of unsystematically articulated opinions about situations that make us feel uneasy. Unfortunately, when we read about ethics in data science, in education, and in general, this is typically what we end up with. Indeed, the field of education is particularly bad about talking about ethics (and about philosophy in general) in this way.
What do we mean by a ‘framework’?
The interesting thing about the language of frameworks is that it has the potential to liberate us from much of the heavy burden placed on us by ethical thinking. The reason for this is that the way this language is used in relation to ethics — as in an ‘ethical framework’ — already presupposes a specific philosophical perspective: Pragmatism.
What is Pragmatism? I’m going to do it a major disservice here, but it is a perspective that rejects our ability to know ‘truth’ in any transcendent or universal way, and so affirms that the truth in any given situation is a belief that ‘works.’ In other words, the right course of action is the one with the best practical set of consequences. (There’s a strong and compelling similarity here between Pragmatism and Pyrrhonian Skepticism, but I won’t go into that here…except to note that, in philosophy, everything new is actually really old).
The reason that ethical frameworks are pragmatic is that they do not seek to define sets of universal first principles, but instead set out to establish methods or approaches for arriving at the best possible result at a given time, and in a given place.
The idea of an ethical framework is really powerful when discussing the human consequences of technological innovation. Laws and culture are constantly changing, and they differ radically around the globe. Were we to set out to define an ethics of educational data use, it could be a wonderful and fruitful academic exercise. A strong undergraduate thesis, or perhaps even a doctoral dissertation. But it would never be globally adopted, if for no other reason than because it would rest on first principles, the very definition of which is that they cannot themselves be justified. There will always be differences in opinion.
But an ethical framework CAN claim universality in a way that an ethics cannot, because it defines an approach to weighing a variety of factors that may be different from place to place, and that may change over time, but in a way that nevertheless allows people to make ethical judgments that work here and now. Where differences of opinion create issues for ethics, they are a valuable source of information for frameworks, which aim to balance and negotiate differences in order to arrive at the best possible outcome.
Laying my cards on the table (as if they weren’t on the table already), I am incredibly fond of the framework approach. Ethical frameworks are good things, and we should definitely strive to create ethical frameworks for AI in education. We have already seen several attempts, and these have played an important role in getting the conversation started, but I see the language of ‘ethical framework’ being used with a lack of precision. The result has been some helpful, but rather ungrounded and unsystematic sets of claims pertaining to how data should be used in certain situations. These are not frameworks. Nor are they ethics. They are merely opinions. These efforts have been great for promoting public dialogue, but we need something more if we are going to make a difference.
Only by being absolutely clear from the outset about what an ethical framework is, and what it is meant to do, can we begin to make a significant and coordinated impact on law, public policy, data standards, and industry practices.
Coming up with a list of the top eventers based on their performance in 2016 is hard. The sport of three-day eventing is complex and multi-faceted, and the decisions we make about which factors to consider make a significant difference to the final result of any evaluation process. It is a result of this complexity, and the fact that there is bound to be strong disagreement about who ends up being included in a list of this kind, that it is rare to see anything like this published. And yet, I still believe that this exercise has value, particularly for fans like myself who find rankings a useful way of understanding the sport.
Note that the ranking that I have produced is the result of a lot of thinking and expert consultation. It is also a work in progress. I have tried to document some of the theory and methods underlying the list(s), but if you want to bypass this discussion, feel free to skip over these sections and see the lists themselves.
All ranking schemes involve subjective judgement. They involve establishing criteria on the basis of values. Since values differ from individual to individual, disagreement is bound to happen and conflicting lists are bound to appear. But there are two guiding principles that I believe should apply to all rankings:
(1) Look to the data – Human beings are great at making decisions and at coming up with justifications after the fact. We all have biases, and we are all terrible at overcoming them. By limiting ourselves to measurable qualities and available data, we can lessen the impact of irrelevant and inconsistently applied preferences.
(2) Be transparent – Being data-driven in our decision-making processes doesn’t mean being objective. Decisions have to be made about the kinds of data to include, the ways in which that data is transformed, and the analytical tools that are applied. This is not a bad thing. Not only are these decisions necessary, they are also important because it is here that data becomes meaningful. Here, I argue that making the ‘right’ decisions is less important than making your decisions explicit.
Who should be considered for inclusion in a list of top eventers world-wide? Here is a list of criteria that I believe any eventer needs to satisfy in order to be considered among the top in the sport. This is where values and judgement come in, and there is bound to be some disagreement. So it goes.
CCI only
There are several significant differences between CCI and CIC events. The demands that each of these event types place on horse and rider are so different that, for all intents and purposes, they should be considered different sports entirely. Compared to CIC events, CCIs are characterized by longer cross country courses, have stricter vetting requirements, and include show jumping as the final of the three phases. CIC competitions are developmental. The most elite riders in the world must be able to compete, complete, and excel in CCI events. For this reason, I have chosen only to include CCI riders in the list.
3* and 4* only
This list is meant to include the best of the best. What this means is only including riders who have successfully competed at either 3 star or 4 star levels. Why not just include riders who have competed at the 4 star level and exclude 3 star results? The fact that there are only six 4 star events means that we don’t have a whole lot of data from year to year. The decision to include 3 star data also makes sense in light of recent decisions to downgrade Olympic and World Equestrian Games events to the 3 star level.
At least two competitions
There is a difference between CCI 3*/4* pairs and pairs that have merely competed at that level. In order to be considered in the list, a horse and rider combination must have completed a minimum of two CCI events at either the 3 star or 4 star level.
100% event completion rate
As recent Olympic history has underscored, the most important quality of an elite rider is the ability to consistently complete events at the highest level. Consistency is key. So I have only included riders in the list that successfully completed every CCI event they entered in 2016.
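The four criteria above can be read as a filter over a table of results. What follows is only a minimal sketch of that reading, not the code behind the lists: the `Result` record, its field names, and the sample rows are all made up for illustration, and it assumes that the completion requirement applies to CCI 3*/4* events only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One pair's outcome at one event (hypothetical record layout)."""
    rider: str
    horse: str
    event_type: str  # "CCI" or "CIC"
    stars: int       # star level of the event
    completed: bool  # True if the pair finished the event


def eligible_pairs(results):
    """Return the (rider, horse) pairs that satisfy all four criteria."""
    by_pair = defaultdict(list)
    for r in results:
        # CCI only, and 3* / 4* only: everything else is ignored.
        if r.event_type == "CCI" and r.stars in (3, 4):
            by_pair[(r.rider, r.horse)].append(r)
    return {
        pair
        for pair, runs in by_pair.items()
        # At least two competitions, and a 100% completion rate.
        if len(runs) >= 2 and all(r.completed for r in runs)
    }


# Made-up results, for illustration only.
sample = [
    Result("Rider A", "Horse A", "CCI", 4, True),
    Result("Rider A", "Horse A", "CCI", 3, True),
    Result("Rider B", "Horse B", "CCI", 3, True),   # only one CCI 3*/4* run
    Result("Rider B", "Horse B", "CIC", 3, True),   # CIC runs are ignored
    Result("Rider C", "Horse C", "CCI", 3, True),
    Result("Rider C", "Horse C", "CCI", 4, False),  # did not complete
]
print(eligible_pairs(sample))
```

In the sample, only the first pair survives: the second has a single qualifying run, and the third failed to complete one of its events.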
Once we have established a pool of eligible pairs, what is the best way to rank them? Do we simply take an average of their final scores? How do we account for the fact that some pairs excel in dressage while others shine on cross country or in show jumping? How do we account for the fact that judging differs from event to event, and for differences in terrain, weather, and course design? From a statistical perspective, we know that some events are ‘easier’ than others. How do we fairly compare the relative performance of horses and riders competing under different sets of conditions, even at the same level?
One way of overcoming these differences is through a statistical process called standardization. A z-score is the difference between the number of points that a pair earned and the average number of points earned by all competitors at the same event, expressed in standard deviation units. A score of 0 means that a pair is average. Because eventing scores are penalty points, lower is better: a negative z-score means the pair performed better than average, and a positive score means that it performed worse. By converting points into z-scores, we are able to account for various differences from event to event. By comparing average final z-scores, we can more easily and reliably compare horse and rider combinations on an even playing field.
Once we have standardized final scores, we can sort pairs according to their average z-score and take the top 10. VOILA! We have a list of top riders. Here are the results, along with a little more useful information about their performance at 3* and 4* levels.
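To make the standardization step concrete, here is a rough sketch using only the Python standard library. It is not the code used to produce the lists. The function names, the event and pair labels, and the scores are all invented for illustration, and the choice of the population standard deviation (`pstdev`) rather than the sample version is an assumption, since the post does not say which was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def event_z_scores(scores):
    """Convert one event's final penalty scores into z-scores.

    `scores` maps a pair's name to its final score at that event.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Every pair scored the same, so every pair is exactly average.
        return {pair: 0.0 for pair in scores}
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


def rank_pairs(events, top_n=10):
    """Rank pairs by their average z-score across events, best first."""
    z_by_pair = defaultdict(list)
    for scores in events.values():
        for pair, z in event_z_scores(scores).items():
            z_by_pair[pair].append(z)
    averages = {pair: mean(zs) for pair, zs in z_by_pair.items()}
    # Scores are penalties, so the most negative average z-score ranks first.
    return sorted(averages.items(), key=lambda item: item[1])[:top_n]


# Made-up scores, for illustration only.
events = {
    "Event X": {"Pair A": 40.0, "Pair B": 50.0, "Pair C": 60.0},
    "Event Y": {"Pair A": 45.0, "Pair B": 60.0, "Pair C": 51.0},
}
for pair, avg_z in rank_pairs(events):
    print(f"{pair}: {avg_z:+.2f}")
```

Each event is standardized separately before averaging, so a pair is only ever compared against the field it actually competed against on the day.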
The Results (worldwide)
- Michael Jung & Fischerrocana FST (GER)
- Maxime Livio & Qalao des Mers (FRA)
- Hazel Shannon & Clifford (AUS)
- Oliver Townend & ODT Ghareeb (GBR)
- Jonelle Price & Classic Moet (NZL)
- Andrew Nicholson & Teseo (NZL)
- Hannah Sue Burnett & Under Suspection (USA)
- Nicola Wilson & Annie Clover (GBR)
- Andreas Dibowski & FRH Butts Avedon (GER)
- Oliver Townend & Lanfranco (GBR)
The Results (USA)
If we apply the same criteria above, but only consider American CCI 3*/4* riders in 2016, we get the following list:
- Hannah Sue Burnett & Under Suspection
- Hannah Sue Burnett & Harbour Pilot
- Boyd Martin & Welcome Shadow
- Buck Davidson & Copper Beach
- Elisa Wallace & Simply Priceless
- Lauren Kieffer & Landmark’s Monte Carlo
- Lillian Heard & LCC Barnaby
- Kurt Martin & Delux Z
- Phillip Dutton & Fernhill Fugitive
- Sharon White & Cooley on Show
Some may find it odd that Phillip Dutton & Mighty Nice didn’t make either top 10 list, in spite of being a bronze medalist at the 2016 Olympic Games in Rio, Brazil. The reason for this is that the FEI dataset that I have used intentionally excludes Olympic results because they are kind of strange…a horse of a different color, so to speak. Not including the Olympics, this pair only competed at one CCI event in 2016: the Rolex Kentucky Three Day Event, where they finished in 4th with a final score of 57.8, which converts to a z-score of -1.11. Based on this score, the pair would rank first in terms of national rankings, and fifth in the world. But this is only one CCI event, and so I could not include them in the lists based on the criteria I established above.
Originally posted to horseHubby.com