Workshop on Cultural Heritage Data Mining

Lars Kjær, Anders Klindt Myrvoll

Are you fascinated by AI and text and data mining? Want to get insight into fundamental concepts in the field of machine learning? And are you interested in finding ways to use cultural heritage data as material in your Digital Humanities courses?

Then take part in this workshop and learn how, using the user-friendly software Orange. Orange is a comprehensive, component-based software suite for machine learning and data mining, developed at the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia, together with the open-source community (Demsar et al., 2013).

The advantage of Orange is that it is no-code software that handles many of the same tasks and processes that are integral to data science as it is typically applied across the natural sciences, social sciences, and humanities.

Orange can be used by researchers and employees in the GLAM sector. If you need to understand and evaluate results produced by digital methods, Orange is a great way to learn about AI and text and data mining. If you wish to introduce Digital Humanities in the classroom, Orange can be used instead of Python or R for many data science tasks. If you need to build datasets or experiment with new approaches to a digital collection, Orange can be used to sort, classify, analyse, categorise, and retrieve data.

Target audience: University researchers and employees in the GLAM sector.

Expected learning outcome: In this workshop you will learn about:

  • Orange’s documentation

  • Orange’s interface

  • Importing data into Orange

  • Preprocessing text; preparing data for natural language processing tasks

  • Embedding; transformation of images and text into vectors

  • Classification; using classifiers to analyse and interpret data

  • Visualisation of data

Format: The half-day workshop is based on active participation and alternates between short talks and hands-on work on small tasks.

Technical requirements and data: To participate, you need to bring your own computer, and please download and install Orange on it before the workshop begins. Orange is available from the download page on the Orange website. You are very welcome to work on your own data in the form of image files, text files, or CSV files; if you do not have your own data, I will provide data for you to use.

Anticipated number of participants: 25 participants.

About the organizers

Lars Kjær is a Special Advisor in Digital Humanities at the Copenhagen University Library, part of the Royal Danish Library. He helps advance DH initiatives at the University of Copenhagen. His responsibilities include designing and facilitating both onsite and online DH workshops, as well as advising on how the Royal Danish Library's digitised cultural heritage material can be used for research and teaching at the university. He has a wide range of DH skills, including an understanding of trends in DH across various humanistic disciplines. He teaches Python scripting, photogrammetry software, GIS, and no-code software such as Voyant, Spyral, and Orange. Lars has an academic background in history from the University of Copenhagen. He also contributes to the digital humanities community as a board member and treasurer of DHNB.

Anders Klindt Myrvoll has been the Programme Manager at the national Danish web archive, Netarkivet, at the Royal Danish Library since 2018. Together with colleagues, he collects, preserves and provides access to the Danish web. Before moving into web archiving, Anders worked for more than 13 years in the broadcast, film and media industry, collaborating globally on high-end localisation, making original content for children, saving digital cultural heritage, strategy, optimisation, leadership and much more. You can find him on LinkedIn or as @andersklindt on X/Twitter.

Bibliography

Demsar J, Curk T, Erjavec A, Gorup C, Hocevar T, Milutinovic M, Mozina M, Polajnar M, Toplak M, Staric A, Stajdohar M, Umek L, Zagar L, Zbontar J, Zitnik M, Zupan B (2013) Orange: Data Mining Toolbox in Python, Journal of Machine Learning Research 14(Aug): 2349–2353.