Named Entity Processing on Historical Newspapers
HIPE (Identifying Historical People, Places and other Entities) is an evaluation campaign on named entity processing on historical newspapers in French, German and English, which was organized in the context of the impresso project and run as a CLEF 2020 Evaluation Lab.
Tasks: Named Entity Recognition and Classification and Named Entity Linking.
HIPE Final Workshop (Sept 2020):
Learn more about the results of the 13 teams who participated and submitted more than 70 runs here.
Data: in this GitHub repository (Zenodo record under finalization)
Participation: the participation guidelines (v1.1) offer a detailed description of the tasks and provide instructions relative to participation.
For information about the metrics, visit the dedicated page or section 4 of the participation guidelines.
The HIPE scorer is available HERE.
- Twitter: news will be posted via the @ImpressoProject account. Participants can also follow the @clef_initiative account and use the #clef2020 hashtag.
- Discussion group for participants: https://groups.google.com/forum/#!forum/clef-hipe-2020
Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends have characterised its development: the adoption of deep learning architectures, and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. Although NE processing tools are increasingly used on historical documents, their performance is lower than on contemporary data, and results are hardly comparable. In this context, the objective of HIPE is threefold:
- to strengthen the robustness of existing approaches on non-standard input;
- to enable performance comparison of NE processing on historical texts; and, in the long run,
- to foster efficient semantic indexing of historical documents in order to support scholarship on digital cultural heritage collections.