Exploring, Transforming and Visualizing Digital Editions and Corpora with Visual and AI Augmented Workflows

Sasha Rudan, Sinisha Rudan, Lazar Kovacevic, Eugenia Kelbert, Lucija Mandic

At the workshop, we will use the ColaboFlow framework, integrated in the LitTerra platform for presenting multilingual corpora, to explore and transform digital editions and corpora with visual and AI augmented workflows.
We will provide 3 corpora for participants to work with:
1. Henrik Ibsen’s multilingual corpus from the Centre for Ibsen Studies at the University of Oslo
2. Jane Eyre’s multilingual corpus from the Oxford University project, Prismatic Jane Eyre
3. Vladimir Nabokov’s multilingual work from the IMPULZ project IM-2022-68
The ColaboFlow framework is an expressive tool for creating, executing, and sharing visual and AI augmented workflows. On the one hand, it is designed to be user-friendly and accessible to researchers with no programming skills; it therefore supports BPMN diagrams as a visual representation of workflows and as a means of interacting with them. On the other hand, it supports fully executable workflows that can run in a scalable, distributed environment. It provides semantic descriptions of the datasets propagating through workflows and of the workflows themselves. This enables AI to correctly comprehend workflows, which can then be composed, extended and executed.
During the workshop, we will have a hands-on session in which participants practice this process. We call it workflow solidification: participants start by describing their own workflows, then solidify them from descriptive to “taskative” (task-structured), and finally to fully executable workflows. The purpose of these steps is to arrive at workflows that can be executed on real corpora and datasets to generate the results the research requires. Along the way, participants will be able to visualize their workflows, refine them, expand them, and interpret the results.
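The three stages of solidification can be illustrated with a minimal Python sketch. It is not ColaboFlow's API: the `Step` class, the `stage` and `execute` helpers, and the two toy steps are hypothetical names introduced only to show how one workflow might move from descriptive to taskative to executable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One workflow step (hypothetical structure, not ColaboFlow's model)."""
    description: str                              # free text, present from the start
    task: Optional[str] = None                    # task type, added at the "taskative" stage
    run: Optional[Callable[[list], list]] = None  # implementation, added last


def stage(steps):
    """Report how far a workflow has been solidified."""
    if all(s.task is not None and s.run is not None for s in steps):
        return "executable"
    if all(s.task is not None for s in steps):
        return "taskative"
    return "descriptive"


def execute(steps, data):
    """Run the steps in order; only fully solidified workflows may run."""
    if stage(steps) != "executable":
        raise ValueError("workflow is not yet executable")
    for s in steps:
        data = s.run(data)
    return data


# 1. Descriptive: the researcher only states what should happen.
workflow = [Step("Split each text into words"), Step("Count the words per text")]
stage_1 = stage(workflow)

# 2. Taskative: each description is tied to a named task type.
workflow[0].task = "tokenize"
workflow[1].task = "count"
stage_2 = stage(workflow)

# 3. Executable: each task receives an implementation.
workflow[0].run = lambda texts: [t.split() for t in texts]
workflow[1].run = lambda token_lists: [len(tokens) for tokens in token_lists]
stage_3 = stage(workflow)

result = execute(workflow, ["a doll house", "ghosts"])
```

In this sketch the free-text description is never discarded, so a step stays readable to non-programmers even after it has become executable.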
After solidifying the workflows and executing them on real data, participants will be able to visualize the results using the LitTerra platform’s AI augmented chart and graph generation feature. The visualization process will follow the “grammar of graphics” model (Wilkinson 2012), itself visualized as a workflow, so that participants can interact with the visualizations and modify them. This will enable them to interpret the results and make further decisions about the research process.
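To make the idea of a chart as a workflow concrete, the sketch below chains three grammar-of-graphics stages (statistic, scale, geometry) in plain Python. It is an assumption-laden illustration rather than LitTerra's implementation: the function names, the `spec` dictionary, and the toy rows with invented sentence lengths are all hypothetical.

```python
from collections import defaultdict


def stat_mean(rows, x, y):
    """Statistic stage: mean of field y for each value of field x."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x]].append(row[y])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def scale_linear(values, width):
    """Scale stage: map data values to bar lengths in characters."""
    top = max(values.values())
    return {key: round(val / top * width) for key, val in values.items()}


def geom_bar(lengths):
    """Geometry stage: draw one text bar per category."""
    return [f"{key:<3}{'#' * n}" for key, n in lengths.items()]


def render(rows, spec):
    """Run the stages in order, as a small visualization workflow."""
    summary = spec["stat"](rows, spec["x"], spec["y"])
    lengths = scale_linear(summary, spec["width"])
    return geom_bar(lengths)


# Toy rows with invented values, for illustration only.
rows = [
    {"language": "no", "sentence_length": 12},
    {"language": "no", "sentence_length": 8},
    {"language": "en", "sentence_length": 20},
]

# The chart is described declaratively; changing the spec changes the chart.
spec = {"x": "language", "y": "sentence_length", "stat": stat_mean, "width": 20}
chart = render(rows, spec)
print("\n".join(chart))
```

Because each stage is a separate step, a participant could swap the statistic or the geometry without touching the rest of the chain, which is the kind of interaction the workflow view is meant to allow.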
Finally, they will be able to use the LitTerra platform to augment the corpora with both visualizations and distant reading augmentation (such as showing charts alongside the text and annotating the text with annotations generated from the workflow results). This will enable them to explore the corpora in a new way and to make new discoveries, which will eventually start a new cycle of the research process.
ColaboFlow workflows are designed to be collaborative and reusable/reproducible. This means that we will practice collaboration across research teams and annotation of the workflows, as well as reuse of the workflows and corpora in subsequent research cycles. To be reproducible, the workflows support versioning, specialization (for specific parts of corpora, such as specific tools for specific text languages), and provenance tracking, while datasets support versioning and provenance tracking. With these features, we can run identical workflows against different text witnesses or their translations, aiming for solid, well-argued comparative results.
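A minimal sketch of how versioning, language specialization and provenance tracking could fit together is given below. It does not reproduce ColaboFlow's data model: `WORKFLOW_VERSION`, the `SPECIALIZATIONS` registry, the two tokenizers and the fields of the provenance record are hypothetical and chosen only for illustration.

```python
import hashlib

WORKFLOW_VERSION = "1.0.0"  # hypothetical version label for this workflow


def tokenize_default(text):
    """Generic step: split on whitespace."""
    return text.split()


def tokenize_lower(text):
    """Specialized step: lowercase before splitting."""
    return text.lower().split()


# Specialization: a language code may override the generic step (assumed registry).
SPECIALIZATIONS = {"en": tokenize_lower}


def run(text, language):
    """Run the step for one text and record where the result came from."""
    step = SPECIALIZATIONS.get(language, tokenize_default)
    output = step(text)
    provenance = {
        "workflow_version": WORKFLOW_VERSION,
        "step": step.__name__,
        "language": language,
        # A content hash identifies the exact input the result was derived from.
        "input_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    return output, provenance


tokens_en, prov_en = run("Jane Eyre", "en")
tokens_no, prov_no = run("Et dukkehjem", "no")
```

Running the same workflow version over a text witness and its translation then yields results whose provenance records differ only in the language, the selected step and the input hash, which is what makes a comparison traceable.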

The setup of the workshop will be as follows:
– Introduction: We will start with a brief introduction to the ColaboFlow framework and the LitTerra platform for participants to get familiar with the concepts and the benefits of the tools and overall methodology.
– Potential and challenges: After that we will have a group discussion on the potential use cases and the challenges that participants are facing in their research.
– Human-in-the-loop: We will discuss the concept of human-in-the-loop, particularly in the context of AI capable of composing and executing research workflows.
– Hands-on session: The main part of the workshop will be a hands-on session where participants will be able to explore and transform digital editions and corpora with visual and AI augmented workflows. We will provide a set of predefined workflow descriptions and corpora for participants to work with. Participants will be able to modify the workflows and corpora and see how the results change.
– Evaluation: We will evaluate participants’ experience, the resulting workflows, and their results. We will present some of them individually and provide overall statistics on behavioral patterns and outcomes.
– Discussion: We will conclude the workshop with a discussion on the potential future developments and the benefits of using visual and AI augmented workflows in digital humanities research. We will also discuss the potential future collaborations and the ways to continue the work started at the workshop.

About the organizers

Sasha Mile Rudan is completing his Ph.D. in Computer Science at the University of Oslo, focusing on collaborative virtual systems for augmenting DH workflows, social processes, knowledge management, and dialogue. He serves as infrastructure architect at the Centre for Ibsen Studies and as a researcher at Uppsala University, exploring infrastructure for researching hagiographic texts. He co-founded the LitTerra Foundation, which supports digital literature and cultural heritage through platforms for computational text analysis and visualization. Additionally, he co-founded ChaOS, an NGO that facilitates trans-disciplinary projects combining socially-engaged art, sustainability, culture, and ecology, involving researchers, writers, and artists from diverse fields.

Sinisha Rudan is an international lecturer, independent researcher, leader of several multidisciplinary projects, entrepreneur, IT developer and artist. His work is transdisciplinary, spanning Socio-IT collaborative ecosystems and Collective Creativity and Intelligence, and uniting them with Social Psychology, Art and Literature, Sustainable Development, Interactive formats, and the startup world (with a focus on Social Entrepreneurship).

Lazar Kovacevic is an independent researcher focusing on the application of information technology to education, creativity, collaboration, and social action. He has worked on many projects in the areas of (web) information retrieval systems, text analysis and natural language processing, machine learning, data mining, and collaboration. He enjoys participating in multidisciplinary environments and working on interdisciplinary solutions to real-world problems. He has co-authored several papers discussing creative features in time series ranging from physical and biological to physiological and psychological processes (e.g., a healthy heart shows more creative features than an unhealthy one). He has also developed algorithms for increasing the diversity of perspectives in search results.

Eugenia Kelbert is Assistant Professor of Comparative Literature and Philology at the Higher School of Economics in Moscow and Leverhulme Early Career Fellow at the British Centre for Literary Translation at the University of East Anglia. Her work to date has focused on literary bilingualism, and especially on literature written in the second language of the author (translingualism). Her dissertation on this topic won the 2016 Charles Bernheimer Prize for best dissertation in Comparative Literature; she is currently reworking it as a book and researching her second book on translation and cross-lingual stylistic transfer. She has published on Joseph Brodsky, Rainer Maria Rilke and Eugene Jolas, among others, and is involved in collaborations in literary multilingualism, transnational creative writing and digital humanities.

Lucija Mandic graduated in Slovenian and Comparative Literature from the Faculty of Arts, University of Ljubljana. In 2020, she became a doctoral student at the Postgraduate School ZRC SAZU (module ‘Literature in Context’) to work on a dissertation on distant reading of the Slovenian novel. She works as a junior researcher at the Institute of Slovenian Literature and Literary Studies ZRC SAZU. Her research interests include Slovenian literature of the nineteenth and twentieth centuries, minor literatures, digital literary studies, neo-avant-garde literature, cultural nationalism and cultural transfers.

Bibliography

Reynolds, Matthew, et al. Prismatic Jane Eyre: Close-Reading a World Novel Across Languages. Open Book Publishers, 2023.
Reynolds, Matthew, and Giovanni Pietro Vitali. “Mapping and Reading a World of Translations: Prismatic Jane Eyre.” Modern Languages Open 1 (2021).
Rothwell, Andrew, Andy Way, and Roy Youdale, eds. Computer-Assisted Literary Translation. Taylor & Francis Group, 2023.
Rudan, Sasha Mile, Sinisa Rudan, and Birger Møller-Pedersen. “Extending BPM(N) to Support Face-to-Virtual (F2V) Process Modeling.” MODELSWARD. 2021.
Rudan, Sasha Mile, et al. “Twin Talk: Bukvik + LitTerra + Colabo.Space - An Example of DH Collaboration across Disciplines, Languages, and Style.” (2020): 15-29.
Rudan, Sasha Mile, et al. “Colabo.Space - Participatory Platform for Evolving Research and Publishing Workflows.” Linking Theory and Practice of Digital Libraries: 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Virtual Event, September 13–17, 2021, Proceedings 25. Springer International Publishing, 2021.
Wilkinson, Leland. The Grammar of Graphics. Springer Berlin Heidelberg, 2012.