Maciej Eder
Text Analysis Is Easy, Unless It Is Not: Reliability Issues in Measuring Textual Similarities
Text analysis aimed at determining the degree of textual similarity in a collection of documents is often associated with authorship attribution, but it can easily be generalized to address broader research questions, e.g. stylistic differentiation between genres, traces of gender, chronology, intertextuality, as well as the identification of other stylometric ‘signals’. Simple as it is, at least at first glance, the methodology used to group documents according to their similarities rests on several tacit assumptions and approximations that users are not always aware of. The talk will revolve around a few text analysis problems, including classification, clustering, and visualization, and will focus on their limitations. A few ideas on how to improve the analysis will also be discussed.
Maciej Eder is the director of the Institute of Polish Language (Polish Academy of Sciences), chair of the Committee of Linguistics at the Polish Academy of Sciences, principal investigator of the project Computational Literary Studies Infrastructure, co-founder of the Computational Stylistics Group, and the main developer of the R package ‘Stylo’ for performing stylometric analyses. He is interested in European literature of the Renaissance and the Baroque, classical heritage in early modern literature, and quantitative approaches to style variation. These include measuring style using statistical methods, authorship attribution based on quantitative measures, as well as “distant reading” methods to analyze dozens (or hundreds) of literary works at a time.
Andrea Kocsis
https://www.eca.ed.ac.uk/profile/dr-andrea-kocsis
Can digital humanities rewrite concepts from non-digital heritage studies?
Combining distant and close reading, my paper aimed to re-evaluate why some heritage sites do not evoke hot cognition in visitors. Hot cognition is a form of affect, a direct emotional way in which we can interpret heritage experiences before or without thinking them over. Applying the term to heritage studies, David Uzzell claimed that the likelihood of the hot interpretation of a dissonant heritage site depends on the time passed between the original traumatic event and the visit. However, I argue that the exhibition’s curation, the story-telling, and levels of immersion play a more critical role in the hot interpretation than the time that has passed since the atrocity. To demonstrate this, I wanted to revise Uzzell’s classic theory with new methodologies offered by digital humanities. To test my hypothesis, I analysed 6000 TripAdvisor reviews about sites commemorating temporally distant tragedies, such as Clifford’s Tower in York, the Mary Rose Museum in Portsmouth, and the Medieval Massacre exhibition at the Swedish History Museum. While the close reading of the extant data proved fruitful and supported my hypothesis, methodologically, the research ran into a contradiction that the talk explores. I aimed to test popular computational methods in heritage affect research (sentiment analysis, lexicon-based emotion detection, topic modelling) and compare them to close reading. However, the quantitative and qualitative methods came to different conclusions. The paper investigates the reasons behind this result and points out the limitations of crowdsourced lexicon labels, lexicon-based methods deploying a categorical model of emotions, and off-the-shelf code, as well as the lack of control studies and methodological triangulation.
Meelis Kull
Humans and AI: similarities, differences, and why it matters
Meelis Kull is a Professor of Artificial Intelligence at the University of Tartu. He is the head of the Estonian Centre of Excellence in AI. His main research topics are machine learning, artificial intelligence, and data science, with a focus on uncertainty quantification and trustworthiness.