An exploration of speech-oriented research in non-speech-centric disciplines

Jens Edlund, Ambika Kirkland, Axel Ekström, Christina Tånnander, Ghazaleh Esfandiari, Jim O’Regan

This workshop aims to explore speech-oriented research in non-speech-centric disciplines.

Research that involves speech and spoken interaction routinely encounters significant challenges and pitfalls in analysis and experiment design. Despite these hurdles, many fields rely heavily on understanding spoken communication, whether in oral history, social sciences, medical diagnostics, or the arts. In the areas of speech technology and speech science, at least some of these hurdles are known, as are some of their solutions. However, there remains a lack of collaboration and knowledge exchange between speech-centric disciplines and disciplines in which speech and spoken interaction are a central part, but not the topic of study per se.

The workshop will focus on two main strands: studies involving live participants, such as recordings or analysis of live interactions, and studies involving the analysis of existing data. The former involves considerable methodological as well as technical and ethical issues, while the main pitfalls in the latter involve the appropriateness and reliability of the analysis methods and of the data, given the research question.

The half-day workshop on “Speech-oriented research” recognises the need for greater coordination of interdisciplinary collaboration involving the analysis of or experimentation with speech and spoken interaction. Situated within the context of digital humanities, where traditionally non-technical fields increasingly engage with data and technology, this workshop brings together researchers from various domains to discuss the complexities of studies involving spoken human communication.

A preliminary half-day agenda includes the following, with roughly 1h per item, including breaks:

  • An introduction to the workshop and its participants. Includes some examples of innovative collaborations (organisers)
  • A keynote (invited external speaker)
  • Poster presentations focussing on practical matters: experienced problems, successes, research questions, data sets. Submission and solicitation. Editorial review; choice based on a balanced set of presentations. (Accepted contributions.)
  • An open discussion of themes gathered from the presentations. Structured questions for which we propose answers. (All, potentially in groups.)
  • Interest in forum; interest in participation in white paper authoring, interest in participation in general organisation, interest in Dagstuhl proposal writing (all; the latter may be excluded from open discussion depending on number of participants).
  • Concluding words. (short)
  • Social event

Expected outcomes

The long-term goal is an increased awareness and willingness to collaborate on speech-oriented questions.

The practical goals are:

  • A white paper. Workshop participants as well as external authors will be invited to participate.
  • An international interest group (potentially, at some point, a formal SIG) with a structured forum (e.g. a mailing list).
  • A Dagstuhl proposal to continue work on best practices and methods of information sharing between disciplines that avoid constraining individual disciplines.
  • Finally, as a result of the Dagstuhl seminar, a continued regular workshop.

About the organisers

Jens Edlund holds a PhD in Speech Communication and a Docent in Speech Technology. He is a full professor at KTH Royal Institute of Technology in Stockholm, and the Director of Språkbanken Tal, the speech branch of the Swedish National Research Infrastructure Nationella språkbanken, as well as the head of the KTH representation in the National Research Infrastructure HumInfra. He is responsible for CLARIN SPEECH, a CLARIN ERIC K-centre focusing on speech technology in the humanities and social sciences, and for the KTH participation in the Swedish DARIAH membership.
Edlund’s research is highly interdisciplinary, and he is currently the PI of a handful of interdisciplinary research projects ranging from accessibility research to conceptual development in parliamentary discourse. He has published well over 100 peer-reviewed journal and conference articles, with speech and spoken interaction as a common denominator across a range of disciplines.

Ambika Kirkland is a PhD student at KTH. One of her main interests is the analysis of human perception of different types of speech, where she experiments with different technologies ranging from web-based perception experiments to EEG.

Axel Ekström is a postdoc at the University of Zurich. His work investigates the origin of speech from an acoustic and articulatory perspective, and spans humans, primates, and other mammals. Among other things, it involves attempts at clarifying the terminology and methodology issues that arise when borrowing methods from one field to another.

Christina Tånnander is a speech technologist at the Swedish Agency for Accessible Media and an industrial PhD student at KTH. Her work includes evaluation methods that put human needs at the centre, and sits at the intersection of several relatively disparate fields.

Ghazaleh Esfandiari is a PhD student at KTH. She studies multimodal, multiparty human interaction in, for example, meetings, borrowing questions and methods from several disciplines.

Jim O’Regan is a PhD student at KTH. He works on new methods of processing large amounts of existing speech, mainly from historical archives and/or under-resourced languages. He is involved in several projects where non-speech-centric researchers investigate speech-oriented phenomena.