Topic Modeling All 75 Years of Slavic Review

Slavic Languages and Literatures
February 22, 2016

Slavic Languages and Literatures students Jacob Lassin and Carlotta Chenoweth are working with Assistant Professor Marijeta Bozovic and Trip Kirkpatrick, senior instructional technologist at the Yale Center for Teaching and Learning, on a project to “topic model” Slavic Review, the premier academic journal in the field of Slavic Studies. Published since 1941, Slavic Review covers a period that has seen enormous political, cultural, and social changes.
Topic modeling is a methodology used in digital humanities — an emerging field at the intersection of computing and the traditional humanities that interprets the cultural and social impact of new media and information technologies, and also creates and applies these technologies to cultural, social, historical, and philological questions.

Topic modeling commonly uses an algorithm called Latent Dirichlet Allocation (LDA), which tracks how often words occur together across the documents in a corpus. If certain words frequently appear in the same documents (for example, “poet” and “Pushkin”), they are grouped together, creating a “topic.” Unlike traditional methods of categorizing text, the LDA algorithm uses quantitative, rather than qualitative, factors. The computer ignores the meaning of the words and pays attention only to where, and alongside which other words, they occur.
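To make the grouping idea concrete, here is a minimal sketch of one common way to fit LDA, collapsed Gibbs sampling, written in plain Python. It is an illustration only and not the tooling used in the Slavic Review project: the function names (`fit_lda`, `top_words`), the four toy documents, and the settings (`alpha`, `beta`, 200 iterations, two topics) are all assumptions chosen for the example. In this sketch, words count as occurring "together" when they appear in the same document.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    # Count tables: words per topic, topics per document, tokens per topic.
    topic_word = [Counter() for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_total = [0] * num_topics

    # Start by assigning every word token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            topic_word[k][word] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assignments[d][i]
                topic_word[k][word] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1

                # Weight each topic by how much this document uses it and
                # how much the topic uses this word. No word meanings are
                # involved, only counts.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                topic_word[k][word] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n most frequent words currently assigned to each topic."""
    return [
        [word for word, count in counts.most_common(n) if count > 0]
        for counts in topic_word
    ]


# Hypothetical miniature corpus: two "literary" and two "political" documents.
docs = [
    "poet pushkin verse poem poet".split(),
    "pushkin poem verse lyric".split(),
    "government regime power opposition".split(),
    "power government leaders regime power".split(),
]

topic_word, doc_topic = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"Topic {k}: {words}")
```

On a corpus this small the result depends heavily on the random seed and the settings, so the printed topics should be read as a demonstration of the mechanics rather than a finding. A real project would typically rely on an established topic-modeling package and a far larger corpus.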

“One of the most useful outcomes of topic modeling is the ability to see certain correlations and connections between elements of a corpus that might not otherwise be recognized,” says Chenoweth. “Take, for example, Topic 117 in our data set, which contains the words ‘political,’ ‘power,’ ‘government,’ ‘country,’ ‘leaders,’ ‘regime,’ ‘opposition,’ and so on.” Although these words all concern politics, “the computer has no sense of the semantic meaning of the words, because they are joined together only by their location within the text. We also find here ‘future,’ ‘period,’ and ‘year,’ which may imply that our understanding of historical time is rooted in political language,” she says.

Topic modeling “will allow us to look at the entire run of the journal and see the development of themes and trends that are present over time,” says Lassin. “We will be able to confirm certain beliefs about the field of Slavic Studies, dispel some myths, and learn things that we were not able to see before.”

Lassin’s own research focuses on the intersection of Russian literature, the Russian Orthodox Church, and digital culture. He is interested in “how the Church uses the Internet to take part in conversations on literature and Russian culture and identity, writ large.” He is currently participating in the Mellon Graduate Concentration on digital humanities.

His article, “The Digital City in Post-Soviet Identity Formation: The Case of,” was recently published in the journal Digital Icons. In it, he uses the online project OurBaku to explore how non-national identities are being revived and promoted through new media for post-socialist audiences.

Chenoweth’s dissertation research addresses the Soviet avant-garde of the 1920s and its production of literature for illiterate or semiliterate audiences, with particular focus on the work of Vladimir Mayakovsky and Aleksandr Rodchenko.