The objectives were to (1) update our partners on conducted activities since first workshop, and (2) collaboratively work on historical research scenarios to further understand historians’ needs. Overall, this meeting helped to set a midway milestone in the co-design work of the interface and foster dialogue within the project and with the partners.
Update on recent activities
Although not a lot is tangible yet, Impresso team made progresses on several fronts during the last months. As for data acquisition, things are progressing slowly but surely: beside NZZ and LeTemps which are already on our servers, first Swiss newspapers have been transferred (here our thanks go to RERO and Swiss libraries), Luxembourg ones are about to arrive, and contacts with further libraries are on-going.
Data transfer comes with its share of technical manipulations and surprises: corrupted archives, missing files, unexpected format changes, etc. Copying files from left to right takes more than pressing a button and asks for careful, systematic and quite long verifications.
Once on our drives, xml-encoded METS/ALTO OCR information is converted into a “canonical” light data representation, encoded in JSON. Our goal here is to be able to easily process - potentially in a distributed manner - millions of files containing only the information of interest to us.
Regarding images, the impact of image format and compression rate on OCR quality is under evaluation, in order to find the optimum balance between file size and OCR quality in case we will need to re-OCRise. A blog post on this topic will soon be published.
Co-design & historical use cases
Discussions on the co-design of the interface had previously been summarised in a white paper setting out the requirements that need to be met if the interface is to be used for academic purposes, and in a green paper reviewing existing interfaces. Team members and associates had the opportunity to read and comment these summaries in advance, which set the scene for the workshop discussion. Both documents are work-in-progress and will be refined as discussions advance, until next workshop in July where they will be signed off.
The meeting was also an opportunity to discuss various historical use cases associated with the project. Research scenarios on the figure of Jean Monnet (Raphaëlle Ruppen-Coutaz, UNIL), on the Battle of Arnhem (Marten Düring, C2DH), and on the Resistance to “Europe” (Estelle Bunout, C2DH) were discussed from a technical and research point of view in terms, considering the perspectives of history NLP, and also data mining.
The discussions on the historical use cases were intended to help all the partners express their needs and requirements, and to serve as stepping stones for testing tools and evaluating their relevance for the final project outcomes. Use cases enable historians to express their expectations in more concrete terms (and questions), based on their actual research practices and needs.
The design of the interface was discussed on the basis of a short review of inspiring existing interfaces, and of first interface sketches (wireframes) presented by the designers. These presentations sparked off a constructive and useful discussion on various aspects of practical interface feature implementation.
Through this second workshop, it appears that the regular interdisciplinary dialogue fosters the synergy of the project, and enables the continued and active participation of end users in the design of the interface as well as their progressive appropriation of the tools.
The next impresso workshop scheduled on 5-6 July 2018 in Lausanne will focus on media history and epistemological questions, and present the very first interface prototype.
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
A Creative Commons Attribution-NoDerivatives 4.0 (CC BY-ND 4.0) license applies
to all contents published in impresso. While articles published on impresso can
be copied by anyone for noncommercial purposes if proper credit is given,
all materials are published under an open-access license with authors retaining
full and permanent ownership of their work.