Corpus Poetry: Berfrois Interviews Hannes Bajohr
Resting Couple, Ernst Ludwig Kirchner, 1907
by Amanda DeMarco
I first came across Hannes Bajohr’s digital literature when I read “Glaube, Liebe, Hoffnung,” a text assembled from comments containing the words “faith,” “love,” and “hope” on the Facebook page of Pegida, a German anti-Islamic political group. I was impressed by how the text employed a highly literary, spiritual framework (1 Corinthians 13: “And now these three remain: faith, hope and love. But the greatest of these is love.”) to reveal something about a modern hate group in its digital incarnation.
I asked him if he’d like to try a similar approach to a notoriously difficult subject: sex. His book Timidities was the result. I talked to Bajohr about the process of making digital literature and its ethical implications.
You created the poem “Monologue” by using software to scrape the text from letters sent to Savage Love, picking out sentences that start with “I am,” and linking the results with conjunctions. The other poems in your book Timidities similarly use code to gather and shape online text. For us non-technically minded folk, could you explain the nuts and bolts of that process?
It is a very simple process, and you’ve already described the basic principle. I call what I do “corpus poetry” because I start with gathering a large body of text—a corpus—which I then process according to specific rules. It’s similar to older poetics that work with found text. Think of Brion Gysin’s cut-up technique, in which he sliced apart newspapers or his own writing, shuffled it, and put it back together, relying on the ‘random’ creation of new meanings.
But there are a couple of differences. The most obvious one is that I use digital means both for gathering the source text and for producing the result. Usually, I will use a free web scraper, like Kimono or Martins Balodis’s Chrome plug-in. I tell it to gather a certain part of thousands of similar pages—let’s say, the body text of all erotic stories from literotica.com (one of the sources used for Timidities). Thus, and this is the second difference, the most natural sources for me to use are those already born digital—and I would say amateur internet sex writing like this is already a pretty digital genre.
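For readers curious what "gather a certain part of thousands of similar pages" means in practice, here is a minimal sketch using only Python's standard `html.parser`. It is not Kimono or the Chrome plug-in Bajohr mentions. The class name `story-body` is a hypothetical stand-in for whatever container a real site wraps its story text in, and the pages are assumed to have been downloaded already.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class BodyTextExtractor(HTMLParser):
    """Collect text inside every element whose class list contains target_class.

    target_class is a hypothetical placeholder; inspect real pages to find it.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0                 # > 0 while inside a matching element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1            # a tag nested inside the target element
        elif self.target_class in classes:
            self.depth = 1             # entering the target element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_corpus(pages: list[str], target_class: str = "story-body") -> str:
    """Pull the same part out of many similar pages; one line of text per page."""
    texts = []
    for html in pages:
        parser = BodyTextExtractor(target_class)
        parser.feed(html)
        parser.close()
        texts.append(" ".join(parser.chunks))
    return "\n".join(texts)
```

Tracking nesting depth, rather than a simple on/off flag, keeps text from nested paragraphs inside the story container while skipping navigation, ads, and comments around it.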
The third difference is that the corpora produced this way are vastly larger than a paper source could feasibly yield. The biggest corpus I used in Timidities is about 300 MB of plain text, which would equal roughly 150,000 standard pages. Big data lit, if you will. This directly affects what I can do with the corpus. If you have such a large body of text, you inevitably find patterns. You can find them with software that linguists (and other people in the digital humanities) use, or by writing your own code. “Monologue” was produced by code I wrote in Python, both for extracting the sentences I wanted to use and for putting them back together.
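The two steps described for “Monologue,” extracting sentences that begin with “I am” and rejoining them with conjunctions, can be sketched in a few lines of Python. This is an illustration, not Bajohr's actual script. The naive regex sentence splitter and the rotating list `CONJUNCTIONS` are assumptions; the interview does not say which conjunctions the poem uses or how it chooses them.

```python
import re

# Hypothetical joining words; the actual choice in "Monologue" is not documented here.
CONJUNCTIONS = ["and", "but", "so", "yet"]


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def i_am_sentences(text: str) -> list[str]:
    """Keep only sentences that open with 'I am '."""
    return [s for s in split_sentences(text) if s.startswith("I am ")]


def monologue(sentences: list[str]) -> str:
    """Chain the sentences into one run-on, rotating through CONJUNCTIONS."""
    if not sentences:
        return ""
    parts = [sentences[0].rstrip(".!?")]
    for i, s in enumerate(sentences[1:]):
        conj = CONJUNCTIONS[i % len(CONJUNCTIONS)]
        parts.append(f"{conj} {s.rstrip('.!?')}")
    return ", ".join(parts) + "."
```

Applied to a short letter such as "Dear Dan. I am 34 and married. My wife is great. I am bisexual! I am not sure what to do?", the two functions yield "I am 34 and married, and I am bisexual, but I am not sure what to do."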
When you create digital literature, you’re using other people’s words. What are the ethical concerns involved for you in your practice?
There is an easy and a more complicated answer to this. The easy one would be: there aren’t any. Text is being repurposed every day, by all of us, be it by retweeting someone or by telling a joke you heard or adopting an idea someone else had. It’s a reality. In this sense, authorship is a useless concept from the start, and any text is fair game.
But that is often a merely theoretical position, and I don’t think anyone really buys it. Any appropriation has the potential to be an act of violence. You rip someone’s words out of context, you mix them up with others’. It is very easy to mock. And sometimes you might want to; especially in a political context, the juxtaposition of sentences, or even their mere repetition, can be downright devastating.
This was not at all my intention in “Monologue.” Rather, I hope to have retained a sense of the fragility that comes from identifying yourself, especially when you want advice, which is a gesture of vulnerability. All these people introduce themselves in a way that is relevant to the problems they have and the help they seek; often, these problems are directly related to these identities. There is a literal identity conflict here. So, to answer your question: every piece requires an individual ethical decision. The fact that appropriation can be misused does not render it unethical in general.
It seems a digital approach would produce cold, technical poetry, but when reading your texts, terms like empathy, compassion, and insecurity seem relevant. How do you employ code to these ends? And why?
The assumption that code is cold might simply be wrong. You may get that idea because processing text according to pre-given rules still makes people uncomfortable. But even traditionally written literature is very often rule-based. Think of repetition and ellipsis, which are easily modeled through code, but are also basic literary techniques.
The title piece, “Timidities,” is an example: it consists of what are called n-grams, sequences of n consecutive words, and specifically the ones that occur most often in a text. As a corpus, I used a website of gay erotic stories, but since I was only looking for 4-grams, the most repeated phrases break off before they get to the point: “he was going to…,” “i could feel his…,” etc. Because this technique simulates ellipsis, these sentences have a shy, timid, but also excited quality.
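To make the n-gram idea concrete, the sketch below counts every run of four consecutive words and returns the most frequent ones, each printed with a trailing ellipsis. It is a generic illustration, not the code behind “Timidities.” The function names are hypothetical, and it makes two simplifying choices of its own: crude lowercase word tokenization, and counting n-grams within sentences only, so they do not span a full stop.

```python
import re
from collections import Counter


def top_ngrams(text: str, n: int = 4, k: int = 10) -> list[tuple[str, int]]:
    """Return the k most frequent n-word sequences with their counts."""
    counts: Counter[str] = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Slide a window of n words across the sentence.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


def as_ellipses(grams: list[tuple[str, int]]) -> list[str]:
    """Render each n-gram as a broken-off phrase."""
    return [f"{gram}…" for gram, _ in grams]
```

With `n=4` the window is too short to reach the end of most thoughts, which is exactly why the output reads as if it stops mid-breath. For example, `as_ellipses(top_ngrams(text, k=1))` on a text containing "He was going to" twice returns `["he was going to…"]`.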
All of the websites you scraped text from for this book were sex-related, but in the past you’ve created similar work from political sources. How did the difference in the base material affect your process?
The corpus dictates the concept, not the other way round. Almost always, I stumble upon some great source of text that gets me excited enough to work to get it (and gathering corpora can be work) without knowing what I will do with it. Together with Gregor Weichbrodt, I’m part of the writers’ collective 0x0a, and a lot of our communication consists of this kind of pointing out interesting material to each other.
The starting point for a piece that was published by The Offing in March was a 1989/90 collection of press articles, parliamentary minutes, and flyers from East and West Germany. It was assembled by a German linguistic institute in order to allow the analysis of common speech patterns and syntactic structures used during the Wende (the political turn that led to the reunification of the two countries). What a great source! A country discussing its identity: an immensely complex process. But when it’s discussed today, you only hear of some amorphous “we” that unanimously came together, excluding all contention. By extracting from that corpus all sentences of a certain length that begin with “we,” I brought these voices into dialogue again. There is contention, contradiction, while at the same time everyone tries to speak for a “we” that might simply be a lazy (or dangerous) construct. And I would say it was the corpus that suggested this concept.
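The filter described here, sentences of a certain length that begin with “we,” amounts to a two-condition test on each sentence. The following Python sketch is hypothetical and not the code used for the Offing piece. The word-count bounds are parameters because the interview does not give the length. The `pronoun` parameter defaults to "we", but since the source corpus is German, a real run would presumably look for "wir" instead.

```python
import re


def we_sentences(text: str, min_words: int, max_words: int,
                 pronoun: str = "we") -> list[str]:
    """Keep sentences that start with `pronoun` and have min..max words."""
    out = []
    for s in re.split(r"(?<=[.!?])\s+", text):
        words = s.strip().split()
        if not words:
            continue
        first = words[0].strip(",;:").lower()   # tolerate "We," etc.
        if first == pronoun and min_words <= len(words) <= max_words:
            out.append(s.strip())
    return out
```

Fixing the length keeps the extracted voices roughly comparable in weight. That helps them read as turns in a single conversation rather than as a mix of slogans and speeches.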
About the Author:
Amanda DeMarco is the founder of Readux Books. She is currently translating Franz Hessel’s Walking in Berlin (Scribe Publications, 2016), as well as Gaston de Pawlowski’s New Inventions and the Latest Innovations (Wakefield Press).