2014 Learning Analytics Summer Institutes Begin Tomorrow

Geolocation from LASI 2014 (June 30, 2014)
FIGURE 1: Geolocation map of user tweets using the hashtag #lasi14 (As of 2014-06-29 19:00:00 EST)

The second annual Learning Analytics Summer Institutes (LASI) begin tomorrow. I am delighted to have been selected as one of the participants for this year’s event, and look forward to coming away at the end of the three days with new skills and insights that I can put immediately into practice, and share with others in my local community in Atlanta, GA. The Society for Learning Analytics Research and the International Educational Data Mining Society together form a vibrant, diverse, and welcoming community of scholars and practitioners. Professional conferences and related events are, generally speaking, terrible. The Learning Analytics and Knowledge Conference, which I had the great pleasure of attending this year, was a rare exception, and I expect LASI to be just as exceptional.

LASI14 Social Network Analysis
FIGURE 2: Social Network Diagram of user tweets using the Twitter hashtag #lasi14 (As of 2014-06-29 19:00:00 EST)

Unlike an academic conference, the summer institutes are meant to function as an intensive ‘summer camp’ for educational data scientists. This year, in addition to keynote lectures by Pierre Dillenbourg (“What Does Eye Tracking Tell Us On MOOCs”), Phil Winne (“Learning Analytics for Learning Science When N = me”), and Tiffany Barnes (“Making a meaningful difference: Leveraging data to improve learning for most of people most of the time”), the event also gives participants the opportunity to take part in several hands-on workshops. Of course, the most valuable aspect of LASI is the chance to connect with experts in the field of learning analytics, to share ideas, inspire one another, and generate opportunities for future collaboration.

In addition to the live event at the Harvard Graduate School of Education in Boston, MA, there are also several international satellite events taking place at the same time. Activity from all these events will be tagged #lasi2014, and I will do my best to summarize this activity at the end of each day.

Upping my ‘creepiness factor,’ I have also borrowed my wife’s Narrative Clip, which I will wear periodically over the next few days. My wife, Elisa Wallace, is an elite equestrian who has recently started working with Narrative to look at device applications in the sport of three-day eventing. From the perspective of analytics, wearing the clip will give me not only an ongoing photo record of this year’s LASI, but also GPS and accelerometer data, which I look forward to reviewing as well.

Using SNAPP in OSX

In recent months, many folks (myself included) have been frustrated by a sudden incompatibility between SNAPP and the latest versions of popular browsers and Java. SNAPP (Social Networks Adapting Pedagogical Practice) is a highly useful tool for visualizing discussion board activity in Blackboard, Moodle, and Sakai. I have wanted to use it, and to recommend it to others. With this workaround, I can do both.

I should note up front that I have only successfully implemented this workaround on my own desktop environment, and have not tested it on other platforms or using other versions. If this works for you under other conditions, or if you find other more versatile workarounds, please post to the comments.

The environment in which I have been able to successfully launch SNAPP is as follows:

Under these conditions, getting SNAPP to work is actually…well, a snap!

1. In the Java Control Panel (System Preferences –> Java), go to the Security tab and edit the Site List to include both the code source and the base URL from which you expect to launch the SNAPP applet. (If you point to http://www.snappvis.org, then Java will yell at you for not using https. You’ll have to accept, or else host the code in some other more secure location. So it goes.)

SNAPP Java Fix

2. Click OK to commit the changes.

3. Hurray!!!
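
For anyone who would rather script step 1 than click through the control panel, here is a minimal sketch. It assumes the Site List is kept as a plain-text file with one URL per line (Oracle's deployment documentation describes an exception.sites file for this purpose). The function name, the file path mentioned in the comment, and the lms.example.edu address are my own illustrative placeholders, not anything provided by SNAPP or Java, so treat this as a starting point rather than a supported method.

```python
from __future__ import annotations

from pathlib import Path


def add_exception_sites(site_list: Path, urls: list[str]) -> list[str]:
    """Append URLs to a one-URL-per-line site list, skipping any already present.

    Returns the URLs that were actually added.
    """
    existing = site_list.read_text().splitlines() if site_list.exists() else []
    kept = [line.strip() for line in existing if line.strip()]
    seen = set(kept)

    added = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            added.append(url)

    if added:
        # Create the containing folder if this is the first entry.
        site_list.parent.mkdir(parents=True, exist_ok=True)
        site_list.write_text("\n".join(kept + added) + "\n")
    return added


# Example call (assumed OS X location of the file; check your own machine first):
#   sites = Path.home() / "Library/Application Support/Oracle/Java/Deployment/security/exception.sites"
#   add_exception_sites(sites, ["http://www.snappvis.org", "https://lms.example.edu"])
```

Running the function twice with the same URLs leaves the file unchanged the second time, which makes it safe to re-run after a Java update.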

In addition to the primary website for the SNAPP tool (http://www.snappvis.org), Sandeep Jayaprakash from Marist College has provided some excellent documentation on installing SNAPP to work on a local machine. It is well worth checking out: https://confluence.sakaiproject.org/pages/viewpage.action?pageId=84902193

Teaching the Unteachable: On the Compatibility of Learning Analytics and Humane Education

This paper is an exploratory effort to find a place for learning analytics in humane education. After distinguishing humane education from training on the basis of the Aristotelian model of intellectual capabilities, and arguing that humane education is distinct by virtue of its interest in cultivating prudence, which is unteachable, an account of three key characteristics of humane education is provided. Appealing to thinkers of the Italian Renaissance, it is argued that ingenium, eloquence, and self-knowledge constitute the what, how, and why of humane education. Lastly, looking to several examples from recent learning analytics literature, it is demonstrated that learning analytics is helpful as a set of aids for ensuring success not only in scientific and technical disciplines, but in the humanities as well. In order to function effectively as an aid to humane education, however, learning analytics must be embedded within a context that encourages continuous reflection, responsiveness, and personal responsibility for learning.

Access Full Text Here

Learning Analytics for a World of Constant Change

Slides from a presentation delivered during a Digital Pedagogy Meetup in Atlanta (20 February 2014), discussing ways in which traditional analytics may stifle innovation, and identifying several ways in which embedded approaches to learning analytics may actually contribute to the development of personal responsibility, critical thinking, digital citizenship, and imagination: characteristics so vital to surviving and thriving in the 21st century.

“The Society for Learning Analytics Research defines learning analytics as the measurement, collection, analysis, and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. Universities are increasingly using analytics to increase student retention and performance. Yet, the assumptions that frequently underlie such initiatives are also inconsistent with pedagogies that would seek to cultivate creativity and innovation, capacities that are necessary in order to survive and thrive in a world that is constantly and increasingly changing. It will be argued that innovation and analytics are not incompatible, however, but rather that they are reconcilable through a shift in emphasis and priority. The presentation will sketch a provisional model of learning analytics that puts analytics in the service of (a humanist conception of) learning, rather than the reverse, and provide concrete examples of how this might be applied in practice.”

“Educational Data Mining and Learning Analytics”

This week, Ryan Baker posted a link to a piece, co-written with George Siemens, that is meant to function as an introduction to the fields of Educational Data Mining (EDM) and Learning Analytics (LA). “Educational Data Mining and Learning Analytics” is a book chapter primarily concerned with methods and tools, and does an excellent job of summarizing some of the key similarities and differences between the two fields in this regard. However, in spite of the fact that the authors make a point of explicitly stating that EDM and LA are distinctly marked by an emphasis on making connections to educational theory and philosophy, the theoretical content of the piece is unfortunately quite sparse.

The tone of this work actually brings up some concerns that I have about EDM/LA as a whole. The authors observe that EDM and LA have been made possible, and have in fact been fueled, by (1) increases in technological capacity and (2) advances in business analytics that are readily adaptable to educational environments.

“The use of analytics in education has grown in recent years for four primary reasons: a substantial increase in data quantity, improved data formats, advances in computing, and increased sophistication of tools available for analytics”

The authors also make a point of highlighting the centrality of theory and philosophy in informing methods and interpretation.

“Both EDM and LA have a strong emphasis on connection to theory in the learning sciences and education philosophy…The theory-oriented perspective marks a departure of EDM and LA from technical approaches that use data as their sole guiding point”

My fear, however, which seems justified in light of the imbalance between theory and method in this chapter (a work meant to introduce, summarize, and so represent the two fields), is that the tools and methods that the fields have adopted, along with the technological- and business-oriented assumptions (and language) that those methods imply, have actually had a tendency to drive their educational philosophy. From their past work, I get the sense that Baker and Siemens would both agree that the educational / learning space differs markedly from the kind of spaces we encounter in IT and business more generally. If this is the case, I would like to see more reflection on the nature of those differences, and then to see various statistical and machine learning methods evaluated in terms of their relevance to educational environments as educational environments.

Donkey-Carried-by-the-Cart

As a set of tools for “understanding and optimizing learning and the environments in which it occurs” (solaresearch.org), learning analytics should be driven, first and foremost, by an interest in learning. This means that each EDM/LA project should begin with a strong conception of what learning is, and of the types of learning that it wants to ‘optimize’ (a term that is, itself, imported from technical and business environments into the education/learning space, and which is not at all neutral). To my mind, however, basic ideas like ‘learning’ and ‘education’ have not been sufficiently theorized or conceptualized by the field. In the absence of such critical reflection on the nature of education, and on the extent to which learning can in fact be measured, it is impossible to say exactly what it is that EDM/LA are taking as their object. How can we measure something if we do not know what it is? How can we optimize something unless we know what it is for? In the absence of critical reflection, and of maintaining a constant eye on our object, it becomes all too easy to consider our object as if its contours are the same as the limits of our methods, when in actual fact we need to be vigilant in our appreciation of just how much of the learning space our methods leave untouched.

If it is true that the field of learning analytics has emerged as a result of, and is driven by, advancements in machine learning methods, computing power, and business intelligence, then I worry about the risk of mistaking the cart for the horse and, in so doing, becoming blind to the possibility that our horse might actually be a mule—an infertile combination of business and education, which is also neither.

Four (Bad) Questions about Big Data

A colleague recently sent me an email that included four questions that he suggested were the most concerning to both data management companies and customers: *

  • Big Data Tools – What’s working today? What’s next?
  • Big Data Storage – Do organizations have a manageable and scalable storage strategy?
  • Big Data Analytics – How are organizations using analytics to manage their large volume of data and put it to use?
  • Big Data Accessibility – How are organizations leveraging this data and making it more accessible?

These are bad questions.

I should be clear that the questions are not bad on account of the general concerns they are meant to address. Questions about tools, scalable storage, the ways in which data are analyzed (and visualized), and the availability of information are central to an organization’s long-term information strategy. Each of these four questions addresses a central concern that has very significant consequences for the extent to which available data can be leveraged to meet current informational requirements, but also future capacity. These concerns are good and important. The questions, however, are still bad.

The reason these questions are bad (okay, maybe they’re not bad…maybe I just don’t like them) is that they are unclear about their terms and definitions. In the first place, they imply that there is a separation between something called ‘Big Data’ and the tools, storage, analytics (here used very loosely), and accessibility necessary to manage it. In actual fact, however, there is no such ‘thing’ as Big Data in the absence of each of those four things. Transactional systems (in the most general sense, which also includes sensors) produce a wide variety of data, and it is an interest in identifying patterns in this data that has always motivated empirical scientific research. In other words, it is data, and not ‘Big Data’ that is our primary concern.

The problem with data as objects is that, until recently, we have been radically limited in our ability to capture and store them. A transactional system may produce data, but how much can we capture? How much can we store? For how long? Until recently, technological limitations have radically limited our ability to capture, store, and analyze the immense quantities of data that are generated, and have meant working with samples, and using inferential statistics to make probable judgements about a population. In the era of Big Data, these technological limitations are rapidly disappearing. As we increase our capacity to capture and store data, we increasingly have access to entire populations. A radical increase in available data, however, is not yet ‘Big Data.’ It doesn’t matter how much data you can store if you don’t also have the capacity to access it. Without massive processing power, sophisticated statistical techniques, and visualization aids, all of the data we collect is for naught, pure potentiality in need of actualization. It is only once we make population data meaningful in its entirety (not sampling from our population data) through the application of statistical techniques and sound judgement that we have something that can legitimately be called ‘Big Data.’ A datum is a thing given to experience. The collection and visualization of a population of data produces another thing given to experience, a meta-datum, perhaps.
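
To make the contrast between samples and populations concrete, here is a minimal sketch in Python. The ‘population’ is simulated (the mean of 70 and spread of 10 are arbitrary assumptions of mine, not real data), and the helper name estimate_mean is hypothetical. With only a sample in hand, the best we can do is an estimate with a margin of error; with the whole population, the mean is simply computed.

```python
import random
import statistics

random.seed(42)

# Hypothetical "population": one simulated measurement per record.
population = [random.gauss(70, 10) for _ in range(100_000)]


def estimate_mean(sample):
    """Return (sample mean, half-width of an approximate 95% confidence interval)."""
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, 1.96 * stderr


# The older situation: we can only afford to capture 200 records.
sample = random.sample(population, 200)
est, margin = estimate_mean(sample)

# The newer situation: every record is available, so no inference is needed.
true_mean = statistics.fmean(population)

print(f"sample estimate: {est:.2f} +/- {margin:.2f}")
print(f"population mean: {true_mean:.2f}")
```

The 1.96 multiplier is the usual normal approximation for a 95% interval; it is a convenience here, not a claim about how any particular analytics product works.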

In light of these brief reflections, I would like to propose the following (VERY) provisional definition of Big Data (which resonates strongly, I think, with much of the other literature I have read):

Big Data is the set of capabilities (capture, storage, analysis) necessary to make meaningful judgements about populations of data.

By way of closing, I think it is also important to distinguish between ‘Big Data’ on the one hand, and ‘Analytics’ on the other. Although the two are often used in conjunction with each other, it is important to note that using Big Data is not the same as doing analytics. Just as the defining characteristic of Big Data above is increased access (access to data populations instead of samples), so too is increased access the defining characteristic of analytics. In the past, the ability to make data-driven judgements meant either having some level of sophisticated statistical knowledge oneself, or else (more commonly) relying upon a small number of ‘data gurus,’ hired expressly because of their statistical expertise. In contrast to more traditional approaches to institutional intelligence, which involve data collection, cleaning, analysis, and reporting (all of which took time), analytics toolkits quickly perform these operations in real time, and make use of visual dashboards that allow stakeholders to make timely and informed decisions without also having the skills and expertise necessary to generate these insights ‘from scratch.’

Where Big Data gives individuals access to all the data, Analytics makes Big Data available to all

Big Data is REALLY REALLY exciting. Of course, there are some significant ethical issues that need to be addressed in this area, particularly as the data collected are coming from human actors, but from a methodological point of view, having direct access to populations of data is something akin to a holy grail. From a social scientific perspective, the ability to track and analyze actual behavior instead of relying on self-reporting about behavior on surveys can give us insight into human interactions that, until now, was completely impossible. Analytics, on the other hand, is something about which I am a little more ambivalent. There is definitely something to be said for encouraging data-driven decision-making, even by those with limited statistical expertise. Confronted by pretty dashboards that are primarily (if not exclusively) descriptive, without the statistical knowledge to ask even basic questions about significance (just because there appears to be a big difference between populations on a graph, it doesn’t necessarily mean that there is one), and with no knowledge about the ways in which data are being extracted, transformed, and loaded into proprietary data warehousing solutions, I wonder about the extent to which analytics do not, at least sometimes, just offer the possibility of a new kind of anecdotal evidence justified by appeal to the authority of data. Insights generated in this way are akin to undergraduate research papers that lean heavily upon Wikipedia because, if it’s on the internet, it’s got to be true.

If it’s data-driven, it’s got to be true.
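
The point about significance can be illustrated with a minimal sketch of a permutation test. The scores below are made up for illustration, and the function name permutation_p_value is my own; this is one simple way to ask whether a gap that looks big on a dashboard could plausibly be chance, not the only way.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the share of random relabelings whose difference in means is
    at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical exam scores for two course sections (invented numbers).
section_a = [62, 91, 70, 85, 58, 94]
section_b = [55, 88, 64, 79, 51, 90]

gap = statistics.fmean(section_a) - statistics.fmean(section_b)
p = permutation_p_value(section_a, section_b)
print(f"difference in means: {gap:.1f} points, permutation p-value: {p:.2f}")
```

A bar chart of these two sections would show a gap of several points, yet with so few students and so much spread, random relabelings produce gaps that large quite often, so the chart alone does not justify a conclusion.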

Analytics Four Square Diagram

I’m not really happy with this diagram. Definitely a work in progress, but hopefully it captures the gist of what I’m trying to sort out here.

* The source of these questions is an event that was recently put on by the POTOMAC Officer’s Club entitled “Big Data Analytics – Critical Support for the Agency Mission”, featuring Ely Kahn, Todd Myers, and Raymond Hensberger.