Predictive analytics are not social science: A common misunderstanding with major consequences for higher education

This is the second in my series on common misunderstandings about predictive analytics that hinder their adoption in higher education. Last week I talked about the language of predictive analytics. This week, I want to comment on another common misconception: that predictive analytics (and educational data mining more generally) is a social science.

What are predictive analytics? And why does the idea create so much confusion?

The greatest barrier to the widespread impact of predictive analytics in higher education is adoption. No matter how great the technology is, if people don’t use it effectively, any potential value is lost.

In the early stages of predictive analytics implementations at colleges and universities, a common obstacle comes in the form of questions that arise from some essential misunderstandings about data science and predictive analytics.  Without a clear understanding of what predictive analytics are, how they work, and what they do, it is easy to establish false expectations.  When predictive analytics fail to live up to these expectations, the result is disappointment, frustration, poor adoption, and a failure to fully actualize their potential value for student success.

This post is the first in a series of posts addressing common misunderstandings about data science that can have serious consequences for the success of an educational data or learning analytics initiative in higher education.  The most basic misunderstanding that people have is about the language of prediction. What do we mean by ‘predictive’ analytics, anyway?

Why is the concept of ‘Predictive Analytics’ so confusing?

The term ‘predictive analytics’ is used widely, not just in education, but across all knowledge domains. We use the term because everyone else uses it, but it is actually pretty misleading.

I have written about this at length elsewhere, but in a nutshell the term ‘prediction’ has a long history of being associated with a kind of mystical access to true knowledge about future events in a deterministic universe.  The history of the term is important, because it explains why many people get hung up on issues of accuracy, as if the goal of predictive analytics were to become something akin to the gold standard of a crystal ball.  It also explains why others are immediately creeped out by conversations about predictive analytics in higher education, because the term ‘prediction’ carries with it a set of pretty heavy metaphysical and epistemological connotations.  It is not uncommon in discussions of ethics and AI in higher education to hear comparisons between predictive analytics and the world of the film Minority Report (which is awesome), in which government agents are able to intervene and arrest people for crimes before they are committed.  In these conversations, however, it is rarely remembered that the predictions in Minority Report were quasi-magical in origin, whereas predictive analytics involve computational power applied to incomplete information.

Predictive analytics are not magic, even if the language of prediction sets us up to think of them in this way.  In The Signal and the Noise, Nate Silver suggests that we can begin to overcome this confusion by using the language of forecasting instead.  Where the goal of prediction is to be correct, the goal of a forecast is to be prepared.  I watch the Weather Channel, not because I want to know what the weather is going to be like, but because I want to know whether I need to pack an umbrella.

In higher education, it is unlikely that we will stop talking about predictive analytics any time soon.  But it is important to shift our thinking and set our expectations along the lines of forecasting.  When it comes to the early identification of at-risk students, our aim is not to be 100% accurate, and we are not making deterministic claims about a particular student’s future behavior.  What we are doing is providing a forecast based on incomplete information about groups of students in the past so that instructors and professional advisors can take action. The goal of predictive analytics in higher education is to offer students an umbrella when the sky turns grey and there is a strong chance of rain.
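To make the distinction concrete, here is a minimal sketch in Python of the difference between a deterministic prediction and a probabilistic forecast of student risk. The features and numbers are entirely hypothetical illustrations, not any institution’s or vendor’s actual model.

```python
# A minimal sketch of prediction vs. forecasting for at-risk identification.
# All features and values are hypothetical illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data: [logins_per_week, assignments_submitted]
X_past = np.array([[1, 2], [2, 3], [8, 9], [7, 8], [3, 1], [9, 10]])
y_past = np.array([1, 1, 0, 0, 1, 0])  # 1 = did not complete the course

model = LogisticRegression().fit(X_past, y_past)

# A current student, described by incomplete information about the term so far.
current_student = np.array([[4, 3]])

# A "prediction" collapses everything into a single yes/no claim...
print(model.predict(current_student))        # e.g. [1]

# ...whereas a forecast reports a chance of rain, so an advisor can decide
# whether to offer an umbrella (a nudge, outreach, a referral to support).
print(model.predict_proba(current_student))  # e.g. [[0.35, 0.65]]
```

The value of the second output is not that it is right about this particular student, but that it tells an advisor how strong the chance of rain is.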

Ethical AI in Higher Education: Are we doing it wrong?

In higher education, and in general, an increasing amount of attention is being paid to questions about the ethical use of data. People are working to produce principles, guidelines and ethical frameworks. This is a good thing.

Despite being well-intentioned, however, most of these projects are doomed to failure. The reason is that, amidst talk about arriving at an ethics, or developing an ethical framework, the terms ‘ethics’ and ‘framework’ are rarely well-defined from the outset. If you don’t have a clear understanding of your goal, you can’t define a strategy to achieve it, and you won’t know whether you have reached it even if you do.

As a foundation to future blog posts that I will write on the matter of ethics in AI, what I’d like to do is propose a couple of key definitions, and invite comment where my assumptions might not make sense.

What do we mean by ‘ethics’?

Ethics is hard to do. It is one of the five inter-related sub-disciplines of philosophy defined by Aristotle, alongside metaphysics, epistemology, aesthetics, and logic. To do ethics involves establishing a set of first principles, and developing a system for determining right action as a consequence of those principles. For example, if we presume the existence of a creator god that has given us some kind of access to true knowledge, then we can apply that knowledge to our day-to-day life as a guide to evaluating right or wrong courses of action. Or, instead of appealing to the transcendent, we might begin with certain assumptions about human nature and develop ethical guidelines meant to cultivate those essential and unique attributes. Or, if we decide that the limits of our knowledge preclude us from knowing anything about the divine, or even ourselves, except for the limits of our knowledge, there are ethical consequences of that as well. There are many approaches and variations here, but the key thing to understand is that ethics is hard. It requires us to be thoughtful about arriving at a set of first principles, to be transparent about them, and to systematically derive ethical judgments as consequences of our metaphysical, epistemological, and logical commitments.

What ethics is NOT is a set of unsystematically articulated opinions about situations that make us feel uneasy. Unfortunately, when we read about ethics in data science, in education, and in general, this is typically what we end up with. Indeed, the field of education is particularly bad about talking about ethics (and philosophy in general) in this way.

What do we mean by a ‘framework’?

The interesting thing about the language of frameworks is that it has the potential to liberate us from much of the heavy burden placed on us by ethical thinking. The reason for this is that the way this language is used in relation to ethics — as in an ‘ethical framework’ — already presupposes a specific philosophical perspective: Pragmatism.

What is Pragmatism? I’m going to do it a major disservice here, but it is a perspective that rejects our ability to know ‘truth’ in any transcendent or universal way, and so affirms that the truth in any given situation is a belief that ‘works.’ In other words, the right course of action is the one with the best practical set of consequences. (There’s a strong and compelling similarity here between Pragmatism and Pyrrhonian Skepticism, but I won’t go into that here…except to note that, in philosophy, everything new is actually really old.)

The reason that ethical frameworks are pragmatic is that they do not seek to define sets of universal first principles, but instead set out to establish methods or approaches for arriving at the best possible result at a given time, and in a given place.

The idea of an ethical framework is really powerful when discussing the human consequences of technological innovation. Laws and culture are constantly changing, and they differ radically around the globe. Were we to set out to define an ethics of educational data use, it could be a wonderful and fruitful academic exercise. A strong undergraduate thesis, or perhaps even a doctoral dissertation. But it would never be globally adopted, if for no other reason than that it would rest on first principles, the very definition of which is that they cannot themselves be justified. There will always be differences in opinion.

But an ethical framework CAN claim universality in a way that an ethics cannot, because it defines an approach to weighing a variety of factors that may be different from place to place, and that may change over time, but in a way that nevertheless allows people to make ethical judgments that work here and now. Where differences of opinion create issues for ethics, they are a valuable source of information for frameworks, which aim to balance and negotiate differences in order to arrive at the best possible outcome.

Laying my cards on the table (as if they weren’t on the table already), I am incredibly fond of the framework approach. Ethical frameworks are good things, and we should definitely strive to create an ethical framework for AI in education. We have already seen several attempts, and these have played an important role in getting the conversation started, but I see the language of ‘ethical framework’ being used with a lack of precision. The result has been some helpful, but rather ungrounded and unsystematic, sets of claims about how data should be used in certain situations. These are not frameworks. Nor are they ethics. They are merely opinions. These efforts have been great for promoting public dialogue, but we need something more if we are going to make a difference.

Only by being absolutely clear from the outset about what an ethical framework is, and what it is meant to do, can we begin to make a significant and coordinated impact on law, public policy, data standards, and industry practices.

Liquid modernity & learning analytics: On educational data in the 21st century

I was recently interviewed for a (forthcoming) piece in eLearn Magazine.  Below are my responses to a couple of key questions, reproduced here in their entirety.


eLearn: You have a Ph.D. in Philosophy. Could you share with us a little about your history and your work with learning analytics?

TH: What drives me in my capacity as a philosopher and social theorist is an interest in how changes in information technology affect how we think about society, and in the implications our changing conceptions of society have for the role of education.

I think about how the rapid increase in our access to information as a result of the internet has led to the advent of what Zygmunt Bauman has called ‘liquid modernity.’ In contrast to the world as recently as a half century ago — a world defined by hard and fast divisions of labor, career tracks, class distinctions, power hierarchies, and relationships — the world we live in now is far more fluid: relationships are unstable, changes in job and career are rapid, and the rate of technological change is increasing exponentially. The kind of training that made sense in the 1950s not only doesn’t work, but renders students ill-prepared to survive, let alone thrive, in the 21st century.

When I think about our liquid modern world, I am comforted to know that this is not the first time we have lived in a world of constant change.  We experienced it in Ancient Greece, and we experienced it during the Renaissance.  In both of these periods, the role of the teacher was incredibly important.  The Sophists were teachers.  So were the Humanists.  For both of these groups, the task of education was to train citizens to survive and thrive under conditions of constant change by cultivating ingenuity, or the ability to mobilize a variety of disparate elements to solve specific problems in the here and now.  For them, education was less about training than it was about cultivating the imagination, and encouraging the development of a kind of practical wisdom that could only be gained through experience.

It is common among people in analytics circles to use a quote apocryphally attributed to Peter Drucker: “What gets measured gets managed.” Indeed, when we look at the history of analytics, we can find its origins in the modern period immediately following industrialization, concerned with optimizing efficiency through standardization and specialization.  Something that has worried me is whether or not there is a mismatch between analytics – an approach to measurement with roots in early modernity – and the demands of education in the 21st century, when students don’t need to be managed so much as prepared to adapt.

Is learning analytics compatible with 21st century education?

I believe the answer is yes, but it requires us to think carefully about what data mean, and the ways in which data are exposed.  In essence, it means appreciating that analytics do not represent an objective source of truth.  They are not a replacement for human judgment.  Rather, they represent important artifacts that need to be considered along with a variety of other sources of knowledge (including the wisdom that comes from experience) in order to solve particular problems here and now.  In this, I am really excited about the kind of reflective approaches to learning analytics being explored and championed by people like John Fritz, Alyssa Wise, Bodong Chen, Simon Buckingham Shum, Andrew Gibson, and others.

eLearn: You wrote in an article for Blackboard Blog that “analytics take place at the intersection of information and human wisdom”. What does it mean to consider humanistic values when dealing with data? Why is it important?

TH: I mean this in two ways.  On the one hand, analytics is nothing more and nothing less than the visual display of quantitative information.  The movement from activity, to capturing that activity in the form of data, to transforming that data into information, to its visual display in the form of tables, charts, and graphs involves human judgment at every stage.  As an interpretive activity, the visual display of quantitative information involves decisions about what is important.  But it is also a rhetorical activity, designed to support particular kinds of decisions in particular kinds of ways.  Analytics is a form of communication.  It is not neutral, and it always embeds particular sets of values.  Hence, it is incumbent upon researchers, practitioners, and educational technology vendors to be thoughtful about the values that they bring to bear on their analytics, and also to be transparent about those values so that they can inform the interpretation of analytics by others.

On the other hand, to the extent that analytics are designed to support human decision-making, they are not a replacement for human judgment.  They are an important form of information, but they still need to be interpreted.  The most effective institutions are those with experienced and prudent practitioners who can carefully consider the data within the context of deep knowledge and experience about students, institutional practices, cultural factors, and other things.

As an artifact, analytics is the result of meaning-making, and it informs meaning-making.
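To illustrate the first point with a deliberately simple sketch (the numbers here are hypothetical): the same set of grades ‘says’ different things depending on how we choose to summarize it, and that choice is itself a human judgment about what matters.

```python
# A minimal sketch of how summary choices embed judgment.
# The grades below are hypothetical.
import numpy as np

grades = np.array([55, 58, 60, 62, 90, 92, 94, 96])

# Choice 1: report the mean -- the class looks like it is doing fine.
print(f"Mean grade: {grades.mean():.1f}")  # ~75.9

# Choice 2: report the distribution -- the class is actually split into a
# struggling group and a thriving group, which call for different action.
low, high = grades[grades < 70], grades[grades >= 70]
print(f"Below 70: {len(low)} students (mean {low.mean():.1f})")
print(f"70 or above: {len(high)} students (mean {high.mean():.1f})")
```

Neither summary is false, but each supports a different kind of decision.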

eLearn: Do you think that institutions are already taking advantage of all the benefits that learning analytics can offer? What are their main challenges?

TH: No.  The field of learning analytics is really only six years old. We began with access to data and a sense of inflated expectation.

The initial excitement and sense of inflated expectation actually represents a significant challenge.  In those early days, institutions, organizations, and vendors alike promised and expected a lot.  But no one really knew what they had, or what was reasonable to expect.

Mike Sharkey and I recently wrote a series of pieces for EDUCAUSE and Next Generation Learning on the analytics hype cycle, in which we argued that we have entered the trough of disillusionment and have begun to ascend the slope of enlightenment (see HERE & HERE).  Many early-adopter institutions got excited, invested, and got hurt. We are at an exciting moment right now because institutions, media, and vendors are beginning to develop far more realistic expectations. We know more, and can now start getting stuff done.

Another major challenge is adoption.  It’s easy to buy a technology.  It’s harder to get people to use it, and even harder to get people to use it effectively.  Overcoming the adoption challenge involves strong leadership, good marketing, and excellent faculty development.  It also requires courage.  Change is hard, and initially even the most successful institutions encountered significant flak.  But what we see time and time again is that a well-executed adoption plan that emphasizes value while assuring safety (it should never be punitive) very quickly overcomes negativity and sees broad-based success.

Lastly, a major challenge that institutions have is being overwhelmed by the data, and losing sight of the questions and challenges they want to address.  It is important to invest in data access so that you have the material you need to understand and address barriers when they arise, but questions should come first.

The difference between IT and Ed Tech

In a recent interview with John Jantsch for the Duct Tape Marketing podcast, Danny Iny argued that the difference between information and education essentially comes down to responsibility. Information is simply about presentation. Here are some things you might want to know. Whether and the extent to which you come to know them is entirely up to you.

In contrast, education implies that the one presenting information also takes on a degree of responsibility for ensuring that it is learned. Education is a relationship in which teachers and learners agree to share in the responsibility for the success of the learning experience.

This distinction, argues Iny, accounts for why books are so cheap and university is so expensive. Books merely present information, while universities take on a non-trivial amount of responsibility for what is learned, and how well.

(It is a shame that many teachers don’t appreciate this distinction, and their role as educators. I will admit that, when I was teaching, I didn’t fully grasp the extent of my responsibility for the success of my students. I wish I could go back and reteach those courses as an educator instead of as a mere informer.)

If we accept Iny’s distinction between information and education, what are the implications for what we today call educational technologies, or ‘Ed Tech’? As we look to the future of technology designed to meet specific needs of teachers and learners, is educational technology something that we wish to aspire to, or avoid?

Accepting Iny’s definition, I would contend that what we call educational technologies today are not really educational technologies at all. The reason is that neither they nor the vendors that maintain them take specific responsibility for the success or failure of the individual students they touch. Although vendors are quick to take credit for increased rates of student success, taking credit is not the same as taking responsibility. In higher education, the contract is between the student and the institution. If the student does not succeed, responsibility is shared between the two. No technology or ed tech vendor wants to be held accountable for the success of an individual student. In the absence of such a willingness or desire to accept a significant degree of responsibility for the success of particular individuals, what we have are not educational technologies, but rather information technologies designed for use in educational contexts. Like books…but more expensive.

With the advent of AI, however, we are beginning to see an increasing shift as technologies appear to take more and more responsibility for the learning process itself. Adaptive tutoring. Automated nudging. These approaches are designed to do more than present information. Instead, they are designed to promote learning itself. Should we consider these educational technologies? I think so. And yet they are not treated as such, because vendors in these areas are still unwilling (accountability is tricky) or unable (because of resistance from government and institutions) to accept responsibility for individual student outcomes. There is no culpability. That’s what teachers are for. In the absence of a willingness to carry the burden of responsibility for a student’s success, even these sophisticated approaches are still treated as information technologies, when they should actually be considered far more seriously.

As we look to the future, it does seem possible that the information technology platforms deployed in the context of education will, indeed, increasingly become and be considered full educational technologies. But this can only happen if vendors are willing to accept the kind of responsibility that comes with such a designation, and teachers are willing to share responsibility with technologies capable of automating them out of a job. This possible future state of educational technology may or may not be inevitable. It also may or may not be desirable.


Why the National Student Clearinghouse matters, and why it should matter more

In analytics circles, it is common to quote Peter Drucker: “What gets measured gets managed.” By quantifying our activities, it becomes possible to measure the impact of decisions on important outcomes, and optimize processes with a view to continual improvement.  With analytics, there comes a tremendous opportunity to make evidence-based decisions where before there was only anecdote.

But there is a flip side to all this.  Where measurement and management go hand in hand, the measurable can easily limit the kinds of things we think of as important.  Indeed, this is what we have seen in recent years around the term ‘student success.’  As institutions have gained more access to their own institutional data, they have gained tremendous insight into the factors contributing to things like graduation and retention rates.  Graduation and retention rates are easy to measure, because they don’t require access to data outside of institutions, and so retention and graduation have become the de facto metrics for student success.  Because colleges and universities can easily report on these things, they are also easy to incorporate into rankings of educational quality, accreditation standards, and government statistics.

But are institutional retention and graduation rates actually the best measures of student success? Or are they simply the most expedient given limitations on data collection standards?  What if we had greater visibility into how students flowed into and out of institutions?  What if we could reward institutions for effectively preparing their students for success at other institutions despite a failure to retain high numbers through to graduation?  In many ways, limited data access between institutions has led to conceptions of student success and a system of incentives that foster competition rather than cooperation, and may in fact create obstacles to the success of non-traditional students.  These are the kinds of questions that have recently motivated a bipartisan group of senators to introduce a bill that would lift a ban on the federal collection of employment and graduation outcomes data.

More than 98% of US institutions provide data to, and have access to, the National Student Clearinghouse (NSC).  For years, the NSC has provided a rich source of information about the flow of students between institutions in the U.S., but colleges and universities often struggle to make this information available for easy analysis.  Institutions see the greatest benefit from access to NSC data when they combine it with other institutional data sources, especially the demographic and performance information stored in their student information systems.  This kind of integration is helpful, not only for understanding and mitigating barriers to enrollment and progression, but also as institutions work together to understand the kinds of data that are important to them.  As argued in a recent article in Politico, external rating systems have a significant impact on setting institutional priorities and, in so doing, may have the effect of promoting systematic inequity on the basis of class and other factors.  As we see at places like Georgia State University, the more data an institution has at its disposal, and the more power it has to combine multiple data sources, the more it can align its measurement practices with its own values, and do what’s best for its students.
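As a purely illustrative sketch of that kind of integration (the layouts and column names here are hypothetical, not the NSC’s actual export format), combining an NSC-style enrollment record with demographic data from a student information system might look something like this:

```python
# A minimal, hypothetical sketch of combining NSC-style records with SIS data
# so that transfer-out patterns can be examined by student population,
# rather than counted as simple attrition.
import pandas as pd

# Hypothetical NSC extract: where students enrolled after leaving us.
nsc = pd.DataFrame({
    "student_id": [101, 102, 103],
    "subsequent_institution": ["State College", None, "Community College"],
})

# Hypothetical SIS extract: demographics and performance at our institution.
sis = pd.DataFrame({
    "student_id": [101, 102, 103],
    "pell_eligible": [True, False, True],
    "gpa_at_departure": [3.1, 2.0, 2.7],
})

merged = nsc.merge(sis, on="student_id", how="left")

# Students who left but continued elsewhere are not simply "not retained";
# the merged view lets us ask which populations we prepared for success
# at other institutions.
transferred = merged[merged["subsequent_institution"].notna()]
print(transferred[["student_id", "subsequent_institution", "pell_eligible"]])
```

Even a toy example like this shows why questions should come first: the merge itself is easy, but deciding which populations and outcomes matter is the real work.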

Is Facebook making us more adventurous?

When was the last time you heard someone say “get off of Facebook (or Instagram, or Twitter, or …) and DO something!”?

I have a favorite passage from Jean-Paul Sartre’s Nausea:

This is what I thought: for the most banal event to become an adventure, you must (and this is enough) begin to recount it. This is what fools people: a man is always a teller of tales, he sees everything that happens to him through them; and he tries to live his own life as if he were telling a story. But you have to choose: live or tell.

We experience life through the stories we tell, and through the stories of others. It has always been the case. Even before the internet.

So does that mean that social media, which demands the persistent sharing of ‘adventures,’ actually makes our lives richer? Does the compulsion to share more moments as if they were significant events render our lives more event-ful?

I eat the same breakfast every day and never remember it. I take a single picture of my meal, and oatmeal becomes an event.

Research suggests that kids today are doing less. And that is probably right. But as they have more opportunities to narrate their lives, perhaps they are more adventurous.