Named Entity Processing on Historical Newspapers


HIPE (Identifying Historical People, Places and other Entities) is a named entity processing evaluation campaign on historical newspapers in French, German and English, organized in the context of the impresso project and run as a CLEF 2020 Evaluation Lab.

Key information


Tasks: Named Entity Recognition and Classification and Named Entity Linking.

Results (update 12.06.2020): Now published! Learn more about the results of the 13 participating teams, who submitted more than 70 runs, here. Approaches will be described in working notes papers presented in September at the CLEF conference (online) and published via the CEUR-WS platform.

COVID-19: Check the new HIPE calendar and the CLEF home page. The final conference will take place online on the same date.

Registration: until 26 April 2020.

Data: in this GitHub repository.

Participation: the participation guidelines (v1.1) describe the tasks in detail and provide instructions for participating.


  • For information about the metrics, visit the dedicated page or section 4 of the participation guidelines.

  • The HIPE scorer is available here.



Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Two recent trends characterise its development: the adoption of deep learning architectures, and the growing attention to textual material from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges in the form of heterogeneous, historical and noisy inputs. Although NE processing tools are increasingly used on historical documents, their performance is lower than on contemporary data, and reported results are hardly comparable. In this context, the objective of HIPE is threefold:

  1. to strengthen the robustness of existing approaches on non-standard input;
  2. to enable performance comparison of NE processing on historical texts; and, in the long run,
  3. to foster efficient semantic indexing of historical documents in order to support scholarship on digital cultural heritage collections.