Reusing digital collections from GLAM Labs: a Jupyter Notebook approach

Time: Wednesday, 29/May/2024: 8:30 – 12:00

Organisers: Gustavo Candela (1), Mirjam Cuper (2), Olga Holownia (3), Max Pedersen (4)

(1) University of Alicante, Spain; (2) National Library of the Netherlands; (3) International Internet Preservation Consortium; (4) Royal Danish Library, Denmark

For decades, GLAM organizations have been exploring new ways to make their collections available. Recent methods for publishing digital collections through initiatives such as “GLAM Labs” (glamlabs.io) and “Collections as Data” (collectionsasdata.github.io) have focused on the adoption and reuse of computational access methods. Over the past years, cultural heritage institutions have gradually been offering access to digital collections containing different materials such as metadata, text, and images. In this context, Jupyter Notebooks have emerged as a powerful tool for providing digital humanities researchers with additional documentation about the collections. GLAM institutions have started to employ Jupyter Notebooks as a new approach to demonstrate how reusers can access and experiment with datasets derived from their collections. The International GLAM Labs Community has compiled a selection of projects provided by relevant institutions such as the Data Foundry at the National Library of Scotland (data.nls.uk), the National Library of Luxembourg (data.bnl.lu), the Library of Congress (data.labs.loc.gov), and the Austrian National Library (labs.onb.ac.at/en/datasets). The selection is available on the community website in the section dedicated to computational access (glamlabs.io/computational-access-to-digital-collections). Some members of the community have also published a research article that provides a methodology for assessing the quality of these Jupyter Notebook projects [1] and a checklist for publishing collections as data [2].

The main goal of the workshop is to demonstrate how Jupyter Notebooks can be used as a tool for working with a dataset derived from a library collection. Following approaches such as the GLAM Workbench (glam-workbench.net) [3], a new collection of Jupyter Notebooks will be made available before the conference and used in the practical exercises to provide examples of how to reuse digital collections. The collection will be published on GitHub and prepared so that it can be opened in an executable environment such as Binder, making the code reproducible. Additional tools will be introduced, such as Python environments (e.g., conda) and libraries for working with tabular data and natural language processing (e.g., pandas and NLTK, the Natural Language Toolkit). This work intends to foster the use of Jupyter Notebooks in the GLAM context as well as provide an introduction for DH researchers and anyone interested in working with digital collections.
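As an illustration of the kind of tabular analysis mentioned above, the following is a minimal sketch of how a metadata table might be explored with pandas. It is not taken from the workshop notebooks: the sample records, the column names (title, year, language) and the helper functions are invented for the example. A tokenizer from NLTK could replace the simple regular expression used here.

```python
import io

import pandas as pd

# Hypothetical metadata export (invented for this sketch); a real dataset
# would be loaded from a file or URL published by the institution.
SAMPLE_CSV = """title,year,language
A history of Alicante,1890,es
Danish folk songs,1902,da
Letters from The Hague,1890,nl
Songs of the north,1902,da
"""


def load_records(source):
    """Read a CSV source (path, URL or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def records_per_year(df):
    """Count how many records were published in each year."""
    return df.groupby("year").size().to_dict()


def title_word_counts(df):
    """Count lower-cased words across all titles (naive regex tokenization)."""
    words = df["title"].str.lower().str.findall(r"[a-z]+").explode()
    return words.value_counts()


records = load_records(io.StringIO(SAMPLE_CSV))
print(records_per_year(records))
print(title_word_counts(records).head(3))
```

In a notebook, each of these steps would typically sit in its own code cell, preceded by a markdown cell explaining what the cell does and why.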

The workshop will be structured as follows:

  1. [15 mins] Introduction to the workshop
  2. [15 mins] Introduction to the computational access section on the GLAM Labs website (https://glamlabs.io/computational-access-to-digital-collections/). This part will cover the projects listed, how information about the projects is added to Wikidata and used to create charts, and how the new section was implemented.

  3. [2 hours] Practical exercises will cover the following steps:

  • opening a selection of Jupyter Notebooks in an executable environment using a web browser as the main tool
  • understanding the structure of a notebook (code and markdown cells), and how users can add, run and remove cells
  • learning how to create a project of Jupyter Notebooks from scratch, and understanding the tools, platforms and services that can be used
  • a brief installation guide for Python and environments such as Anaconda
  4. [25 mins] Presentation of the article “An approach to assess the quality of Jupyter projects published by GLAM institutions”, recently published in the Journal of the Association for Information Science and Technology [1]. The article describes a methodology, based on best practices and guidelines, for assessing the quality of Jupyter Notebook projects made available by GLAM institutions, and provides a list of steps to follow when assessing a project.

  5. [5 mins] Wrap-up
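
The notebook structure covered in the practical exercises (code and markdown cells that can be added and removed) can be sketched with the Python standard library alone. The example below assumes a simplified version of the notebook file format (nbformat version 4), in which a notebook is a JSON document holding a list of cells; real notebooks carry additional metadata, and the helper functions and cell contents are hypothetical.

```python
import json


def new_cell(cell_type, source):
    """Build a minimal cell; cell_type is "markdown" or "code"."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also record their execution count and outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def new_notebook(cells):
    """Wrap a list of cells in a minimal nbformat 4 notebook document."""
    return {
        "cells": list(cells),
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = new_notebook([
    new_cell("markdown", "# Exploring a collection"),
    new_cell("code", "print('hello, GLAM')"),
])

# Adding a cell appends to the list; removing a cell deletes from it.
notebook["cells"].append(new_cell("markdown", "Notes on the results"))
del notebook["cells"][0]

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
serialized = json.dumps(notebook, indent=1)  # text that would go in an .ipynb file
print(cell_types)
```

The Jupyter interface performs the same operations through its toolbar and keyboard shortcuts; the sketch only shows what changes in the underlying file.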

Workshop coordinators

  • Gustavo Candela, University of Alicante
  • Olga Holownia, International Internet Preservation Consortium
  • Max Odsbjerg Pedersen, Royal Danish Library
  • Mirjam Cuper, National Library of the Netherlands

Format: on-site tutorial

Target audience: DH and CS researchers, librarians, archive staff, university staff and students

Number of participants: 20-25 max.

Technical requirements: Internet connection and laptops. Prior knowledge of Jupyter Notebooks and the Python programming language is recommended but not required. Participants who want to develop and run the Jupyter Notebooks on their own computers should install Python or an environment such as Anaconda before the workshop. The recommended web browsers are the latest versions of Firefox or Google Chrome.

Learning outcomes

The following list describes the learning outcomes of this workshop:

  • create awareness of the relevance of Collections as Data in the context of GLAM
  • appreciate the usefulness and history of Jupyter Notebooks in the GLAM context
  • analyze the structure of a digital collection suitable for computational use
  • understand the steps involved when creating a collection of Jupyter Notebooks
  • understand the structure of a Jupyter Notebook and how to use it
  • create awareness of the relevance of documentation

References

[1] Candela, G., Chambers, S. and Sherratt, T. (2023), “An approach to assess the quality of Jupyter projects published by GLAM institutions”, J. Assoc. Inf. Sci. Technol. 74(13): 1550-1564. https://doi.org/10.1002/asi.24835

[2] Candela, G., Gabriëls, N., Chambers, S., Dobreva, M., Ames, S., Ferriter, M., Fitzgerald, N., Harbo, V., Hofmann, K., Holownia, O., Irollo, A., Mahey, M., Manchester, E., Pham, T.-A., Potter, A. and Van Keer, E. (2023), “A checklist to publish collections as data in GLAM institutions”, Global Knowledge, Memory and Communication, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/GKMC-06-2023-0195

[3] Sherratt, T. (2021), “GLAM Workbench (version v1.0.0)”, Zenodo. https://doi.org/10.5281/zenodo.5603060