Research Internship – Creating automated metadata for Dutch Literary Masterpieces – Koninklijke Bibliotheek – Den Haag
The Digital Library of Dutch Literature (DBNL, see https://dbnl.nl/) is a digital collection of texts from Dutch literature, linguistics and cultural history, from the earliest times to the present. The collection represents the entire Dutch language area and is the result of a collaboration between the Taalunie, the Flemish Heritage Library and the National Library of the Netherlands in The Hague. The website is consulted around four million times a year.
The DBNL digitizes texts to a very high standard. The texts are fully encoded in the sustainable XML-TEI format, with an error rate of at most 0.005%, meaning 1 error in 20,000. Besides ensuring this high quality, the DBNL pays much attention to enriching the texts, which gives good access to the literary heritage: the texts refer to relevant information about authors, titles, place names and dates. The DBNL website currently contains more than 4.5 million pages of digitized text.
During this research internship or graduation assignment we would like to explore the possibilities of automatically assigning metadata within the DBNL production process.
This involves two parts:
In every text we look for various kinds of information, such as names, places and dates. We currently link this information manually to our thesauri: databases of, for example, titles, authors and countries. These links allow the DBNL to make interesting cross-connections. We now want to explore the use of Named Entity Recognition (NER) to extract this information from the texts automatically. The extracted entities must then be linked to the thesauri, taking disambiguation and spelling variation into account.
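As an illustration of the linking step, the Python sketch below shows one very simple way to link a name that a NER tool has already found to a thesaurus entry while tolerating spelling variation. Everything in it is hypothetical: the spelling rules for older Dutch orthography, the threshold and margin values, the example thesaurus, and the functions normalize and link_entity are assumptions made for illustration, not part of the DBNL process. Plain string similarity from Python's difflib stands in for the statistical disambiguation models the internship would actually investigate; when the two best candidates score too closely, the sketch declines to choose and leaves the decision to a human.

```python
import difflib
import re
from typing import Optional

# Hypothetical rewrite rules for older Dutch spellings (an assumption, not a
# DBNL rule set), applied in order to both mentions and thesaurus labels.
SPELLING_RULES = [("ae", "aa"), ("ck", "k"), ("gh", "g"), ("y", "ij")]


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace and apply the spelling rules."""
    text = re.sub(r"\s+", " ", name.strip().lower())
    for old, new in SPELLING_RULES:
        text = text.replace(old, new)
    return text


def link_entity(mention: str, thesaurus: dict[str, str],
                threshold: float = 0.85, margin: float = 0.05) -> Optional[str]:
    """Return the id of the best-matching thesaurus label, or None when no
    label is similar enough or the top two candidates are too close to call."""
    target = normalize(mention)
    scored = sorted(
        ((difflib.SequenceMatcher(None, target, normalize(label)).ratio(), key)
         for key, label in thesaurus.items()),
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # no sufficiently similar entry
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # ambiguous: leave for manual disambiguation
    return scored[0][1]


if __name__ == "__main__":
    places = {"P1": "Haarlem", "P2": "Rijswijk", "P3": "Amsterdam"}
    print(link_entity("Haerlem", places))  # historical spelling of Haarlem
    print(link_entity("Ryswyck", places))  # historical spelling of Rijswijk
```

With this toy thesaurus, the historical spellings "Haerlem" and "Ryswyck" normalize to the modern forms and are linked to P1 and P2. In practice the rules, threshold and margin would need to be tuned on DBNL data, and real disambiguation would also draw on context such as the publication date of the text.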
Besides adding metadata about the content, we also manually produce metadata about the structure of the texts, such as headings, page numbers, poems and tables. For some titles (often prose) the layout is clear, but for textbooks and magazines, for example, it is more complicated. Are there ways to add this type of information automatically?
We are looking for a student who:
- is in the final stage of a degree in Artificial Intelligence, Data Science, Software Engineering, (Digital) Humanities or a related field
- can work independently on the technical side, with substantive support from our Data Science and DBNL teams
- can work with existing tools, for example for NER and layout analysis, or knows how to get up to speed with them
- has expertise in NLP, machine learning and statistical models, for example for disambiguating the named entities found
- has a basic knowledge of Dutch; this is not required, but some understanding will be helpful
We offer:
- A research internship with the Data Science Team of the Research Department of the National Library of the Netherlands (18 fte) for a maximum of 6 months, tailored to your needs and to the requirements of your university and supervisor
- A workplace at the KB offices in downtown The Hague, only a 3-minute walk from Central Station
- Substantive support from both our Data Science Team and the DBNL Team
- Access to all DBNL data, to the tooling we have developed previously (see https://lab.kb.nl) and to the hardware we have available for running our own experiments
- Reimbursement of travel costs and a compensation in line with our regular internship compensation
The KB is a nationally and internationally renowned institution: with more than 500 employees, we are one of the major Dutch heritage and science institutions, and we have an important coordinating role in the network of public libraries. Our tasks include collecting, preserving and making available all publications published in or about the Netherlands, and building the national digital library. We also consider it important to train young colleagues.
We regularly offer internships to students from various programmes and disciplines, in both academic and higher professional education (e.g. book science, literary studies, Artificial Intelligence, Data Science, Software Engineering, (Digital) Humanities, IT, HRM, finance, facility management and communication). For example, we supervise HBO (higher professional education) students during their work placements, as well as university students who want to carry out their (graduation) research or graduation project at the KB.
For more information about this internship or to apply, please contact Martijn Kleppe (Head of Research) at 06-24221048 or send an e-mail to