How should we approach the evaluation of predictive models in higher education?
It is easy to fall into the trap of thinking that the goal of a predictive algorithm is to be as accurate as possible. But, as I have explained previously, the desire to increase the accuracy of a model for its own sake is one that fundamentally misunderstands the purpose of predictive analytics. The goal of predictive analytics in identifying at-risk students is not to ‘get it right,’ but rather to inform action. Accuracy certainly matters, but it is not the most important thing, and getting hung up on academic conversations about a model can actually obscure its purpose and impede the progress we are able to make in support of student success.
Let’s take a hypothetical example. Consider a model with an accuracy of 70% in predicting a student’s chances of completing a course with a grade of C or higher. For a cohort of 100 students, a confusion matrix representing this might look something like this:

| | Actually failed | Actually passed |
|---|---|---|
| **Predicted at risk** | 30 | 15 |
| **Predicted to pass** | 15 | 40 |
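To make the arithmetic behind this hypothetical concrete, here is a minimal sketch that computes the standard metrics from the four confusion-matrix counts. The specific counts are an illustrative reconstruction consistent with the figures cited in this post (70% accuracy, 45 students flagged, 30 of them truly at risk), not output from any real model.

```python
# Confusion-matrix counts for the hypothetical 100-student cohort.
# These numbers are illustrative, reconstructed from the post's example.
tp = 30  # flagged as at risk, actually failed
fp = 15  # flagged as at risk, actually passed
fn = 15  # predicted to pass, actually failed (falls through the cracks)
tn = 40  # predicted to pass, actually passed

total = tp + fp + fn + tn
accuracy = (tp + tn) / total      # share of all predictions that were correct
precision = tp / (tp + fp)        # share of flagged students truly at risk
recall = tp / (tp + fn)           # share of at-risk students the model catches

print(f"accuracy:  {accuracy:.0%}")   # 70%
print(f"precision: {precision:.1%}")
print(f"recall:    {recall:.1%}")
```

Note that a single "accuracy" figure hides the two very different kinds of error that matter here: flagging a student who was fine (precision) versus missing a student who needed help (recall).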
Too much emphasis on model accuracy can lead to a kind of paralysis, or hesitation to reach out to students for fear that the model has misclassified them. Institutions might worry about students falling through the cracks because the model predicted they would pass when they actually failed. But what is worse? Acting wrongly? Or not acting at all?
Let’s consider this from the perspective of the academic advisor. In the absence of a predictive model, advisors may approach their task from one of the following two perspectives.
- No proactive outreach – this is the traditional walk-in model of academic advising. We know that the students who are most likely to seek out an academic advisor are also among the most likely to succeed anyway. What this means is that an academic advisor will probably only see some portion of the 40 students in the above scenario who were on track to pass anyway, and make very little impact, since those students would probably do just fine without them.
- Proactively reach out to everyone – we know that proactive advising works, so why not try to reach everyone? This would obviously be amazing! But institutions simply do not have the capacity to do this very well. With average advising loads of 450 students or more, it is impossible for advisors to reach out to all their students in time to ensure that they are on track and remain on track each semester. If an advisor only had the ability to see 50 students before week six of the semester, selected at random, only 25 (50%) of the students seen would actually have been in need of academic support.
Compare the results of each of these scenarios with the results of an advisor who only reaches out to students that the algorithm has identified as being at risk of failure. In this case, an advisor would only need to see 45 students, which means that they have more time available to meet with each of them. True, only 30 of these students would truly be at risk of failing, but this is significantly greater than the number of at-risk students they would otherwise be able to meet with. There is, of course, no harm in meeting with students who are not actually at risk. Complemented by additional information about student performance and engagement, a trained academic advisor could also further triage students flagged as being at risk, and communicate with instructors to increase the accuracy and impact of outreach attempts.
What about the students who fall through the cracks? The students that the model predicts would be successful but who actually fail the course? This is obviously something we’d like to avoid, but 15% is far lower than the 45% that fall through in a traditional advising context, and the roughly 25% that fall through using the scattershot approach. Of course, this is an artificial example, describing an advisor who only makes outreach decisions on the basis of the recommendation produced by the predictive model. In practice, however, through a system like Blackboard Predict, advisors and faculty have access to additional information and communications tools to help them fine-tune their judgments and increase the accuracy and impact of outreach efforts even further.
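The comparison across the three outreach strategies can be sketched in a few lines. The figures below use the same illustrative 100-student cohort with 45 students truly at risk; the walk-in and random-outreach numbers are expected values under stated assumptions, not empirical results.

```python
# Illustrative comparison of the three outreach strategies described above,
# for a hypothetical 100-student cohort with 45 students truly at risk.
cohort = 100
at_risk = 45
capacity = 50  # students an advisor can see before week six

# 1. Walk-in only: assume at-risk students rarely self-refer, so all are missed.
missed_walk_in = at_risk

# 2. Random outreach: half the cohort is seen, so on average
#    half of the at-risk students are missed.
missed_random = at_risk * (1 - capacity / cohort)

# 3. Model-guided outreach: only the 15 false negatives are missed.
missed_model = 15

for name, missed in [("walk-in", missed_walk_in),
                     ("random", missed_random),
                     ("model-guided", missed_model)]:
    print(f"{name:>12}: {missed:.1f} of {at_risk} at-risk students missed")
```

Even with an imperfect model, the model-guided strategy misses far fewer at-risk students than either alternative, which is the heart of the argument here.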
What I hope this example underscores is that predictive analytics should be viewed as simply a tool. Prediction is not prophecy. It is an opportunity to have a conversation. Accuracy is important, but not to the point that modeling efforts get in the way of the actual interventions that drive student success. It is understandable that institutions might worry that a perceived lack of sufficient model accuracy might erode faculty and advisor confidence in the model and prevent them from taking action. It is therefore incredibly important that misunderstandings about the nature of prediction, predictive modeling, and action be addressed from the outset so that time and resources can be committed where they will make the greatest impact: in the development and implementation of high-impact advising practices that use predictive analytics as a valuable source of information alongside others, including the kind of wisdom that comes through experience.
This is the third in a series of blog posts on misunderstandings about predictive analytics that get in the way of high-impact campus adoption. For the first two posts in this series, check out What are predictive analytics? And why does the idea create so much confusion? and Predictive analytics are not social science: A common misunderstanding with major consequences for higher education.