Keep Big Data Out of Small Classrooms


The Village School, Albert Anker, 1896

by Mario Carpo

It all started with cellphones, a long time ago. No student, and few teachers, would make voice calls from class, but in the early 2000s GSM phones started to offer nearly free text messaging, and students (and faculty) started to text during lectures and seminars. Before long students were composing text messages without even looking at their phones, courtesy of the good old duodecimal keyboard; some could actually text from a phone in their pocket. Then of course 3G, web-enabled smartphones came, followed by tablets, and as most of our classrooms generously provide Wi-Fi connections, everyone sitting in a class these days has endless ways to reach out to whatever can be found online, which is to say almost anything. Permanent connectivity is a wondrous development and we all profit immensely from it, for all kinds of purposes. But permanent connectivity in a higher education environment, and particularly in research seminars, is a mixed blessing, and can easily get out of hand.

I do not worry here about the many unsuitable, illegal or just plain silly classroom uses of web-enabled information technologies. I recently saw a student choose and buy a sweater from her tablet while sitting in a lecture theatre, only a few rows from the speaker. This would not have been possible only a few years ago, but people who want to waste their time will always find a way, irrespective of the technologies at hand. While in high school I was myself very advanced in the art of not being seen reading a newspaper (in print) during some classes. Today’s students striving to use touch screens to type notes may be inflicting unnecessary pain upon themselves, but they are not very different from many students of my generation who took handwritten notes from the first to the last minute of class, non-stop, as if writing under dictation.

Generally speaking, the degree of attention that teachers and students can muster in class is probably constant over time, and it is only marginally and temporarily affected by technological change. One reason for this is that classroom lectures or seminars have always been based upon just one medium, and one information technology: the human voice, and the spoken word, whether in the form of discourse (one-to-many, in the case of lectures), or dialogue (one-to-one or many-to-many, in the case of seminars). This format has survived all technological change to this day, which makes it today almost absurdly anachronistic; and it may now be under its most insidious attack ever, as today’s enemy is coming from within, and in disguise.

Socrates famously taught by oral questions and answers – viva voce. His dialogues are known to us only because they were put into writing (purportedly by Plato) one generation later. A bit later still Aristotle abandoned the dialogic format – it is not clear if by chance or by design – and his writings expound arguments without anyone to speak for them: Aristotle’s speaker is in fact the book itself, not a living person whose voice has been recorded and transcribed. Not surprisingly, teaching in medieval (and mostly Aristotelian) universities was, in theory, based on written texts. But scribal transmission was aleatoric and expensive, and good copies were few and far between. As a result, manuscripts were often read aloud, and the few extant authors were memorized, annotated and commented upon ad infinitum. This of course changed with print – to this day, most classroom assignments imply reading many texts, not parsing and memorizing just one.

Digital technologies may not, or not yet, have significantly extended the corpus of relevant texts for each discipline, but they have already made many relevant texts permanently and immediately available, and searchable, through any web-connected device – at the time of writing, as small as a cellphone, a pair of glasses, a wristwatch. Text and image-based information retrieval technologies are an extraordinary scholarly asset, but when a universal catalogue of all kinds of sources and an ever-growing repository of data of all sorts is brought to a seminar or a lecture class, where anyone can search through it on the spot and on the fly, by a simple tap on a tablet screen, strange and unwieldy things start to happen.

There are only so many things one can say in two hours, so every teacher preparing a lecture, or a discussion session, makes a careful selection of things to say – which also means, implicitly, a much longer list of things that should remain unsaid, at least as long as the class will last. Facts are sifted, compared, selected, and those that do not fit the topic of the day are dropped. That’s the way the human mind works – and most likely always has: by building simple theories out of many apparently unrelated events. We cope with Big Data by dropping most of them, and carefully arranging the few data we need to make some limited sense of the world – by inferring patterns, laws, rules, principles, mathematical functions, etc. And one need not invoke a general theory of science to understand that a two-hour session can only deal with very few data indeed – those we can fit into an argument we can memorize and that takes no more than two hours to present and discuss. The classroom, based as it is on voice, words, and memory, is the realm of Small Data, not of Big Data.

But today each web-enabled tablet, phone and computer is a window open onto Big Data – and Big Data are instantaneously searchable. This means that every fact mentioned in class can be immediately checked or further researched by all – a good thing, evidently. But today, twenty students checking the same banal item of information online will in most cases quickly come up with twenty slightly different results. This is partly due to algorithmic search customization, and partly to the unauthorized nature of most freely searchable digital data: there is so much of it online precisely because most of it is raw, crowdsourced, and often inaccurate. The nature of hypertextual information favors post-modern aggregation, to the detriment of modern authorial precision. This is why Wikipedia works, sometimes surprisingly well, while the Encyclopedia Britannica in print recently went out of business. By their own technical logic, most digital data are very reliable on average or in aggregate, but they are never entirely trustworthy if taken one by one. Even if they were, their sheer quantity suggests that they would not sit well in a two-hour seminar. Twenty years ago no one would have come to class with a 35-volume encyclopedia. Today, many students and instructors think they can bring to class almost all the data in the world, to search at will.

For the time being, the irruption of Big Data in the word-based environment of the classroom may appear as little more than an occasional nuisance, but it is one that flags and points to a major cultural and technological issue of our time. Digital information retrieval systems are increasingly at odds with the processes and logic of orality, human memory, and even of alphabetical writing and print. No one can prove that we still need oral teaching, and that we can still profit from this fossilized survival of the most ancestral of all information technologies. But in case we want to preserve this format, perhaps just for two hours a week, then for those two hours Big Data should be shut down.

Evidently, we would still need digital technologies in class to show and process documents, and for a number of other very good reasons. But additional data should not be brought in after class has started. The best means to this end is for all (including instructors) to abstain from using any web-connected browser while the class is in session. In fact, in many cases, instructors and students could easily come to class with no information technology at all, except their memory, and words to give it voice. For those who do not trust their memory, a sheet of paper and a pen may help – but few arguments that cannot be memorized and oralized from memory may be worth remembering anyway. Either way, fact checking should be left outside of class – and with that, the hypertextual, serendipitous pleasures of fact surfing. Thanks to digital searches, new data will be found, sifted, collated, streamlined and new arguments will arise to present and discuss the next time the class meets.

A long time ago, when marketplaces were physical places, traders did not bring stock or cash to the exchange: they traded by voice, and their word was trusted. At the end of the day, accounts were reconciled and woe to the trader who had sold stock he did not own, or paid with cash he did not have. The same principle should apply to the voice-based environment of classrooms and seminars. The only tablet Aristotle could bring to school, or Cicero to the senate, was a wax tablet. In alphabetical mode, a wax tablet can hold approximately 2 kilobytes of data. We are lucky to have so many more technological options to choose from today. But when we are in class, if we want to profit from it, similar limits would still make plenty of sense.

About the Author:

Mario Carpo, architectural historian and critic, is the author of Architecture in the Age of Printing (MIT Press, 2001), The Alphabet and the Algorithm (MIT Press, 2011), The Digital Turn in Architecture (Wiley, 2012), and other books. He was recently appointed professor of architectural history at the Bartlett School of Architecture in London.