Irene Amstutz and Martin Reisacher. Digital compass for 2.5 million historical newspaper clippings.
The Swiss Economic Archives SWA owns a collection of newspaper clippings on companies, persons and factual topics related to the Swiss economy, dating back to the 1850s. The 2.5 million clippings represent a well-used, central source collection for economic and social history. As part of an ongoing project, about one third of the newspaper clippings have been digitized and made available in full text on a IIIF-based interface, offering new points of entry into the collection. The presentation is intended to give researchers an understanding of the dossier principle used historically and to introduce the innovative IIIF-based search options. The aim is to initiate a dialogue regarding the expectations of the digital presentation of a newspaper clipping collection and its relevance after the digital full-text revolution.
Steven Claeyssens. Digitised Collections of Historical Publications as Bibliographic Objects: the Delpher Newspaper Case.
The KB, the national library of the Netherlands, publishes around 100 million pages of historical paper publications, available for digital research, on the national platform Delpher. It is argued that a better understanding of this type of massive collection of digital surrogates necessarily starts with identifying the bibliographic objects that constitute the collection and establishing their complex relationships. Expanding on the argument by Cordell (2017) that ‘we might think of OCR as a compositor setting text in a language it does not comprehend’, the paper takes a bibliographic approach to investigate the collection of digitised newspapers published on Delpher.
Claire-Lise Gaillard. Feuilleter la presse ancienne par gigaoctets.
Not yet available.
Martin Gasteiner and Andreas Enderlin. Crossing or Intersecting the Emperor’s Desk with digitized Newspaper Data.
Entity-source-networks in the late Habsburg Empire: Connecting entities and sources from newspaper databases and government files. The project seeks to highlight and formalize newspapers and government files as interconnected sources and to offer a typological model for them. This significantly increases the epistemic value of the sources for historical research and historiography. The almost innumerable, highly formalized written acts of decision that Emperor Francis Joseph I received from his cabinet office during his reign show a clear focus on domestic affairs, which rarely went beyond the borders of the monarchy. If, however, the perspective is steered away from the decisions themselves towards the broader process of decision-making, a well-informed course of events integrated into far-reaching networks of governance consisting of numerous new actors becomes visible. Reports about these processes and actors are evident in newspapers of the time. This approach of entangled history and interconnected sources on a digital basis enables surprising and new insights for researchers in the humanities. The project is a collaboration between the FWF project ‘The Emperor’s Desk: a site of policy making’ (funded by the Austrian Science Fund), Newseye 70299 (Horizon 2020), the Vienna University Library and the Austrian National Archives.
Martin Grandjean. Une analyse systématique de la médiatisation des débats de la Société des Nations dans le Journal de Genève.
Not yet available.
Malorie Guilbaud Perez. How a national common memory of a traumatic event has been built: the case of the Triangle Fire in newspaper collections, digitized or on-site used.
The fire that devastated the three top floors of the Asch building, workplace of the five hundred employees of the Triangle Company, is recorded as one of the worst catastrophes to have occurred in New York during the twentieth century. Not only did this event make the headlines of the main newspapers, but over the years it also took root in ever-wider social and territorial circles. By comparing digitized newspaper collections with on-site explorations, and with the help of software-based analysis, this work explores to what extent a newspaper corpus can be built and used to understand it.
Christoph Hanzig, Martin Munke and Michael Thoß. Digitizing and presenting a Nazi newspaper – the example “Der Freiheitskampf”.
The Saxon daily newspaper “Der Freiheitskampf” has been digitized and presented online in order to fill the significant lack of sources about the Saxon NSDAP. The editors are systematically indexing the content of its articles as well as compiling and categorizing the Saxon-related articles in a relational database. The talk will address several aspects of using digitized newspapers as a historical source, namely how to a) offer a more critical approach to digital print sources by means of content-based indexing rather than OCR; b) contextualize a newspaper collection by using standardized data; and c) engage with legal and moral questions relating to the digital presentation of an ideologically highly tainted source collection. Tackling these issues from the perspective of both a research institution and a provider of a research infrastructure allows different points of view on the mass digitization of historical newspapers to be discussed.
Zoé Kergomard. Out of the comfort zone. Digitized newspapers as a probe and an incentive for more reflexivity.
How can digitized newspapers be integrated into a broader and diverse source corpus? In my research project, I look at electoral abstention as an object of contention between political, media and intellectual actors in post-war France, Germany and Switzerland. Digitized newspapers can first be used as a wonderful “probe” to explore newspapers’ coverage of the issue. But integrating them into a wider corpus also leads one to question the selection criteria for other (digitized or non-digitized) sources, from parliamentary debates to the traditional press reviews gathered by archivists. Rather than aiming at representativeness, the use of digitized newspapers should hence challenge all historians to be more reflexive in their research process.
Erik Koenen, Simon Sax and Falko Krause. Die “Berliner Volkszeitung” (1853–1944) aus der Distanz lesen. Praxis und Perspektiven des “Distant Reading” von digitalisierten Zeitungen.
Not yet available.
Monika Kovarova-Simecek. Cultural History of Financial News in Vienna (1771-1914) – A historical analysis using ANNO/ÖNB digital repository and ÖNBLabs.
The talk illustrates how ANNO, together with the digital tools of ÖNBLabs, can be used to explore the cultural history of financial news in Vienna from the foundation of the Vienna Stock Exchange in 1771 until its temporary closure in 1914. A keyword-based approach was used to extract and statistically analyse a data set of 4 million hits from 174 historical newspapers. The discovered patterns provide information about, among other things, which newspapers and audiences were relevant in the evolution of financial journalism in Austria, to what extent a financial press emerged, and which social discourses accompanied this development.
Suzanna Krivulskaya. The Crimes of Preachers: Religion, Scandal, and the Trouble with Digitized Archives.
Late-nineteenth-century U.S. newspapers provide a rich archive of the scandalous. Their rapid digitization has allowed historians access to an unprecedented volume of data. Yet nineteenth-century North American newspaper editors were not always reliable narrators of the past: they reprinted others’ stories, embellished key details, and sensationalized their narratives to sell copy. Focusing specifically on sex scandals involving Protestant pastors, this paper interrogates the possibilities and limitations of working with digitized newspapers in the age of big data.
Pierre-Carl Langlais. Classified news.
Numapresse, a digital humanities project devoted to the historical study of French-language newspapers from 1800 onwards, has trained models to recognize major newspaper genres, from political news to sports sections or serial novels. The key output of this program has been the automated classification of most of the major French dailies of the interwar period, from 1920 to 1939. Automated classification makes it possible to visualize structural patterns that are not easily accounted for by more qualitative historical approaches, such as the gradual emergence of modern newspaper genres in the 19th century or the development of thematic supplements in the interwar period. These large-scale classification programs are currently being extended to non-textual objects such as news images or editorial structures.
Sarah Oberbichler. Challenges and Pitfalls when working with Digital Newspaper Collections - an Example on the Topic “Return Migration”.
Based on a case study on return migration (defined as the movement of emigrants, refugees, prisoners of war, etc. back to their country of origin), this paper shows the impact of noisy automatic text recognition (OCR), limited search and download options, and missing metadata on research with historical newspapers. The attempt to find suitable subcorpora for further empirical analysis has clearly shown that what can be found via interfaces does not always reflect what is actually reported in the newspapers. This raises the question of how we can reduce serious distortions and a one-sided “tunnel vision” while building subcorpora to such an extent that valid research results can be obtained.
Petri Paju, Heli Rantala and Hannu Salmi. Cycles of Text Reuse in Finnish Newspapers and Periodicals, 1771–1920: Ontological and epistemological perspectives.
The paper explores the ontological and epistemological ramifications of text reuse. It draws on a research project on Finnish digitized newspapers and magazines, five million pages in total, from 1771 to 1920. The paper analyzes the different cycles of text reuse. Some repetition chains were very slow, lasting over 140 years, some were very rapid, viral chains of reuse. Our argument is that text reuse has ontological ramifications on how we conceive the past processes in a media network. At the same time, it involves epistemological considerations, especially on the material conditions of both newspaper publishing and its digitization process.
Claudia Resch and Dario Kampkaspar. iffi«-‘ ■= und £ann$b*i - why the quality of full text matters (for historians and historical linguists).
For some time now, researchers, librarians and IT experts have been cooperating intensively to advance the digitisation of historical newspapers. Funding initiatives are pushing for the digital provision of both images and full texts. However, little is said about the quality of these texts on the word and sign level. In fact, there are significant differences: they range from automatically generated OCR to manually improved, almost error-free full-text versions comparable to gold standard corpora. Based on a digitisation project for the Wiener Zeitung, this paper will discuss which measures could help to increase text quality and how historical newspapers could thus become a reusable Eldorado for historians and historical linguists.
François Robinet and Rémi Korman. Des usages des collections numériques de la presse rwandaise pour écrire l’histoire du génocide des Tutsi.
Not yet available.
Yannick Rochat and Selim Krichane. The colourful task of exploring video game magazines.
Among the main sources documenting the history of video games (their development, practice, or reception) are the numerous magazines published from the late 1970s to the mid-2000s. They contain digests of industry announcements, summary calendars, reports and game reviews, and are invaluable in documenting the history of the medium. During this talk, we will show a web application, based on community-driven archival material, which allows the exploration of some of these magazines. In particular, we will discuss the difficulties of dealing with sources that have unusual formatting, such as non-aligned text or poor contrast between text and page background.
Fredrik Norén with Pelle Snickars, Alexandra Borg, Johan Jarlbrink, Erik Edoff and Måns Magnusson. Measuring “the political” in Swedish newspaper data during the 1960s and 1970s.
The aim of our paper is to present, examine and explore the development of the concept of “the political” in Swedish newspaper data, gleaned from four major newspapers, during the post-war era. The traditional assumption is that “the political” increased and diversified dramatically during the 1960s and 1970s with new notions such as “the personal is political”, “political consciousness” and even “political fantasy”. We will empirically examine this assumption by first extracting and studying bigrams containing “political” from the newspaper data, and then deepening the analysis with explorative tools for topic modeling and network analysis.
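As an illustration of the bigram step mentioned in this abstract, the following is a minimal sketch and not the authors' pipeline, which the abstract does not describe. The function name `political_bigrams`, the prefix `politi` and the sample sentence are assumptions made for the example; bigrams are counted within sentences so that pairs do not span a sentence boundary.

```python
import re
from collections import Counter


def political_bigrams(text, stem="politi"):
    """Count word pairs in which at least one word starts with `stem`.

    Hypothetical helper for illustration only: sentences are split on
    ., ! and ? so that no bigram crosses a sentence boundary.
    """
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"\w+", sentence)
        for pair in zip(tokens, tokens[1:]):
            if any(token.startswith(stem) for token in pair):
                counts[pair] += 1
    return counts


# Invented sample text, not taken from the newspaper data.
sample = ("The personal is political. Political consciousness grew, "
          "and political fantasy entered the debate.")
counts = political_bigrams(sample)
for pair, n in counts.most_common():
    print(" ".join(pair), n)
```

Matching on a prefix rather than a full word keeps inflected forms together, which is a simplification; a real study would more likely rely on lemmatisation.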
Phillip Stroebel. Towards Large-Scale OCR for Historical Newspapers with Self-Trained Models.
Not yet available.
Giorgia Tolfo & Living with Machines team. Hunting for Treasure: Living with Machines and the British Library Newspaper Collection.
We discuss the open access digitisation programme undertaken by Living with Machines, exploring the range of constraints that inform digitisation strategies and selection priorities. Because the landscape of digitised newspaper collections is so complex, and research and digitisation processes operate on different timelines, we have focused on opportunities to make digitisation choices both transparent and pragmatic. Working towards solutions that reflect collaborations between library curators and scholars, we will introduce: a) our custom visualisation tool—the Press Picker—designed to support decision making about digitisation; b) the process of automatic metadata generation from the Newspaper Press Directories, a contemporaneous record of British newspapers.
Andrew Torget. Mapping Texts: Examining the Effects of OCR Noise on Historical Newspaper Collections.
This paper documents the “Mapping Texts Project,” an experiment focused on the problem of OCR noise in historical newspapers. The purpose of the project was to combine text-mining with data-visualization to measure both OCR noise rates and their effects on how scholars detect meaningful language patterns embedded in large-scale digital newspaper collections. The project developed two interactive visualizations measuring OCR quality and its effects on detecting language patterns, revealing the depth of the challenges facing humanities scholars seeking greater transparency of OCR data in historical newspaper databases.
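To make the idea of an OCR noise rate concrete, here is a minimal sketch under a strong assumption: noise is approximated as the share of alphabetic tokens that do not appear in a reference vocabulary. The abstract does not specify how the project measured noise, so the function `ocr_noise_rate`, the tiny vocabulary and the sample strings are all hypothetical.

```python
import re


def ocr_noise_rate(text, vocabulary):
    """Return the fraction of alphabetic tokens not found in `vocabulary`.

    Hypothetical proxy for OCR noise: a token counts as noisy when it is
    missing from the reference word list. Returns 0.0 for empty input.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)


# Invented vocabulary and sample lines, for illustration only.
vocab = {"the", "governor", "signed", "bill", "today"}
clean = "The governor signed the bill today"
noisy = "Tlie govcrnor signed tbe bill today"

print(ocr_noise_rate(clean, vocab))
print(ocr_noise_rate(noisy, vocab))
```

A dictionary lookup of this kind also flags proper nouns and historical spellings as noise, so the resulting rate is better read as an upper bound than as an exact error count.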
Tobias von Waldkirch. Le locuteur je comme principe d’organisation du discours dans les correspondances particulières du Journal de Genève au XIXe siècle : l’exemple de la couverture de guerre.
Not yet available.
Melvin Wevers. Historical Advertisements in Digitized Newspapers as a Lens on the Past.
Historians have turned their focus to newspaper articles as a proxy of public discourse, while advertisements remain an understudied source of digitized information. This paper focuses on two aspects of working with extensive collections of advertisements. First, advertisements come in all shapes and sizes, ranging from classifieds and shipping reports to full-page ads. The size and position of these advertisements, as well as particular word usage, can help to construct particular subsets of advertisements. Second, this paper describes how we can extract textual information from historical advertisements for the historical analysis of trends and particularities.