Impresso API resources

Search

Search content items in the Impresso corpus.

impresso.search.find(term='Titanic', limit=10)

impresso.resources.search.SearchResource

Bases: Resource

Search content items in the Impresso database.

find(term=None, order_by=None, limit=None, offset=None, with_text_contents=False, title=None, front_page=None, entity_id=None, newspaper_id=None, date_range=None, language=None, mention=None, topic_id=None, collection_id=None, country=None, partner_id=None, text_reuse_cluster_id=None)

Search for content items in Impresso.

Parameters:
  • term (str | AND[str] | OR[str] | None, default: None ) –

    Search term.

  • order_by (SearchOrderByLiteral | None, default: None ) –

    Field to order results by.

  • limit (int | None, default: None ) –

    Number of results to return.

  • offset (int | None, default: None ) –

    Number of results to skip.

  • with_text_contents (bool | None, default: False ) –

    Return only content items with text contents.

  • title (str | AND[str] | OR[str] | None, default: None ) –

    Return only content items that have this term or all/any of the terms in the title.

  • front_page (bool | None, default: None ) –

    Return only content items that were on the front page.

  • entity_id (str | AND[str] | OR[str] | None, default: None ) –

    Return only content items that mention this entity or all/any of the entities.

  • date_range (DateRange | None, default: None ) –

    Return only content items that were published in this date range.

  • language (str | OR[str] | None, default: None ) –

    Return only content items that are in this language or any of the languages.

  • mention (str | AND[str] | OR[str] | None, default: None ) –

    Return only content items that mention an entity matching this term, or all/any of the entities matching the terms.

  • topic_id (str | AND[str] | OR[str] | None, default: None ) –

    Return only content items that are about this topic or all/any of the topics.

  • collection_id (str | OR[str] | None, default: None ) –

    Return only content items that are in this collection or any of the collections.

  • country (str | OR[str] | None, default: None ) –

    Return only content items that are from this country or any of the countries.

  • partner_id (str | OR[str] | None, default: None ) –

    Return only content items that are from this partner or any of the partners.

  • text_reuse_cluster_id (str | OR[str] | None, default: None ) –

    Return only content items that are in this text reuse cluster or any of the clusters.

Returns:
  • SearchDataContainer (SearchDataContainer) –

    Data container with a page of results of the search.
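Because `find` returns one page at a time, `limit` and `offset` can be combined to walk through a larger result set. The following is a minimal sketch under stated assumptions: `iter_pages` is a hypothetical helper, not part of the library, and it relies only on the `limit` and `offset` parameters and the `df` property documented on this page. It treats a page with fewer rows than `page_size` as the last one.

```python
from typing import Any, Callable, Iterator


def iter_pages(
    find: Callable[..., Any],
    page_size: int = 20,
    max_pages: int = 5,
    **filters: Any,
) -> Iterator[Any]:
    """Yield result pages from a find-style method (hypothetical helper)."""
    for page_number in range(max_pages):
        page = find(limit=page_size, offset=page_number * page_size, **filters)
        rows = len(page.df)  # row count of the page's DataFrame
        if rows == 0:
            break  # no more results
        yield page
        if rows < page_size:
            break  # a short page is taken to be the last one


# Intended usage, assuming `impresso` is an already connected client:
# for page in iter_pages(impresso.search.find, term="Titanic"):
#     print(page.df)
```

The `max_pages` cap keeps an exploratory loop from issuing an unbounded number of requests; raise it deliberately when a full export is needed.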

impresso.api_client.models.search_order_by.SearchOrderByLiteral = Literal['date', 'id', 'relevance', '-date', '-relevance', '-id'] module-attribute
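`order_by` accepts only the string values listed in this literal. When the value comes from user input, it can be checked before calling `find`. This is a sketch only: `check_order_by` is a hypothetical helper, and `SearchOrderBy` is a local copy of the values above rather than the library's own alias.

```python
from typing import Literal, get_args

# Local copy of the values listed above (assumption: kept in sync by hand).
SearchOrderBy = Literal["date", "id", "relevance", "-date", "-relevance", "-id"]


def check_order_by(value: str) -> str:
    """Return value unchanged if it is an accepted order_by literal."""
    allowed = get_args(SearchOrderBy)
    if value not in allowed:
        raise ValueError(f"order_by must be one of {allowed}, got {value!r}")
    return value
```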

impresso.resources.search.SearchDataContainer

Bases: DataContainer

Response of a search call.

df: DataFrame property

Return the data as a pandas dataframe.

Entities

Search entities in the Impresso corpus.

impresso.entities.find(term="Douglas Adams")

impresso.resources.entities.EntitiesResource

Bases: Resource

Search entities in the Impresso database.

find(term=None, wikidata_id=None, entity_id=None, entity_type=None, order_by=None, resolve=False, limit=None, offset=None)

Search entities in Impresso.

Parameters:
  • term (str | None, default: None ) –

    Search term.

  • wikidata_id (str | AND[str] | OR[str] | None, default: None ) –

    Return only entities resolved to this Wikidata ID.

  • entity_id (str | AND[str] | OR[str] | None, default: None ) –

    Return only the entity with this ID.

  • entity_type (EntityType | AND[EntityType] | OR[EntityType] | None, default: None ) –

    Return only entities of this type.

  • order_by (FindEntitiesOrderByLiteral | None, default: None ) –

    Field to order results by.

  • resolve (bool, default: False ) –

    Return Wikidata details for entities that are linked to a Wikidata entry.

  • limit (int | None, default: None ) –

    Number of results to return.

  • offset (int | None, default: None ) –

    Number of results to skip.

Returns:
  • FindEntitiesContainer (FindEntitiesContainer) –

    Data container with a page of results of the search.

get(id)

Get entity by ID.

impresso.resources.entities.EntityType = Literal['person', 'location'] module-attribute

impresso.api_client.models.find_entities_order_by.FindEntitiesOrderByLiteral = Literal['count', 'count-mentions', 'name', 'relevance', '-relevance', '-name', '-count', '-count-mentions'] module-attribute

Media sources

Search media sources available in the Impresso corpus.

impresso.media_sources.find(
    term="wort",
    order_by="lastIssue",
)

impresso.resources.media_sources.MediaSourcesResource

Bases: Resource

Search media sources in the Impresso database.

find(term=None, type=None, order_by=None, with_properties=False, limit=None, offset=None)

Search media sources in Impresso.

Parameters:
  • term (str | None, default: None ) –

    Search term.

  • type (FindMediaSourcesTypeLiteral | None, default: None ) –

    Type of media sources to search for.

  • order_by (FindMediaSourcesOrderByLiteral | None, default: None ) –

    Field to order results by.

  • with_properties (bool, default: False ) –

    Include properties in the results.

  • limit (int | None, default: None ) –

    Number of results to return.

  • offset (int | None, default: None ) –

    Number of results to skip.

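Returns:
  • FindMediaSourcesContainer (FindMediaSourcesContainer) –

    Data container with a page of results of the search.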

impresso.api_client.models.find_media_sources_order_by.FindMediaSourcesOrderByLiteral = Literal['countIssues', 'firstIssue', 'lastIssue', 'name', '-name', '-firstIssue', '-lastIssue', '-countIssues'] module-attribute

impresso.resources.media_sources.FindMediaSourcesContainer

Bases: DataContainer

Response of a search call.

df: DataFrame property

Return the data as a pandas dataframe.

Content Items

Get a single content item by ID.

impresso.content_items.get("NZZ-1794-08-09-a-i0002")

Collections

Work with collections.

impresso.resources.collections.CollectionsResource

Bases: Resource

Work with collections.

add_items(collection_id, item_ids)

Add items to a collection by their IDs.

NOTE: Items are not added immediately. The operation may take up to a few minutes to complete and be reflected in the collection.

Parameters:
  • collection_id (str) –

    ID of the collection.

  • item_ids (list[str]) –

    IDs of the content items to add.
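Because additions are applied asynchronously, code that reads the collection right after `add_items` may not see the new items yet. One way to handle this is to poll `items` until the expected number of items is visible. The sketch below is an assumption-laden illustration: `add_and_wait` is a hypothetical helper, `collections` stands for a collections resource object, and `expected_total` is the number of items the caller expects the collection to hold once the addition has completed.

```python
import time
from typing import Any, Callable, Sequence


def add_and_wait(
    collections: Any,
    collection_id: str,
    item_ids: Sequence[str],
    expected_total: int,
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Add items, then poll until the collection shows expected_total items.

    Returns True once the items are visible, False if the timeout elapses.
    """
    collections.add_items(collection_id, list(item_ids))
    waited = 0.0
    while True:
        page = collections.items(collection_id, limit=expected_total)
        if len(page.df) >= expected_total:
            return True
        if waited >= timeout:
            return False
        sleep(interval)  # injectable so the wait can be skipped in tests
        waited += interval


# Intended usage, assuming `impresso` is an already connected client:
# ok = add_and_wait(impresso.collections, "my-collection-id",
#                   ["NZZ-1794-08-09-a-i0002"], expected_total=1)
```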

find(term=None, order_by=None, limit=None, offset=None)

Search collections in Impresso.

Parameters:
  • term (str | None, default: None ) –

    Search term.

  • order_by (FindCollectionsOrderByLiteral | None, default: None ) –

    Field to order results by.

  • limit (int | None, default: None ) –

    Number of results to return.

  • offset (int | None, default: None ) –

    Number of results to skip.

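Returns:
  • FindCollectionsContainer (FindCollectionsContainer) –

    Data container with a page of results of the search.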

get(id)

Get collection by ID.

items(collection_id, limit=None, offset=None)

Return a page of content items from a collection.

Parameters:
  • collection_id (str) –

    ID of the collection.

  • limit (int | None, default: None ) –

    Number of results to return.

  • offset (int | None, default: None ) –

    Number of results to skip.

Returns:
  • SearchDataContainer (SearchDataContainer) –

    Data container with a page of results of the search.

remove_items(collection_id, item_ids)

Remove items from a collection by their IDs.

NOTE: Items are not removed immediately. The operation may take up to a few minutes to complete and be reflected in the collection.

Parameters:
  • collection_id (str) –

    ID of the collection.

  • item_ids (list[str]) –

    IDs of the content items to remove.

impresso.api_client.models.find_collections_order_by.FindCollectionsOrderByLiteral = Literal['date', 'size', '-date', '-size'] module-attribute

impresso.resources.collections.FindCollectionsContainer

Bases: DataContainer

Response of a find call.

df: DataFrame property

Return the data as a pandas dataframe.

Named entity recognition

The Python library contains a set of named entity recognition methods that use the same NER model that was used to add entities to the Impresso database.

impresso.resources.tools.ToolsResource

Bases: Resource

Various helper tools.

nel(text)

Named Entity Linking

This method requires named entities to be enclosed in tags: [START]entity[END].

Parameters:
  • text (str) –

    Text to process.

Returns:
  • NerContainer (NerContainer) –

    Container with the results.
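Since `nel` expects each entity to be wrapped in `[START]` and `[END]` tags, the input text usually needs a small preprocessing step. The helper below is a hypothetical illustration, not part of the library: it wraps the first occurrence of a known mention. The attribute name `impresso.tools` in the usage comment is also an assumption.

```python
def tag_mention(text: str, mention: str) -> str:
    """Wrap the first occurrence of mention in [START]...[END] tags."""
    start = text.find(mention)
    if start == -1:
        raise ValueError(f"{mention!r} does not occur in the text")
    end = start + len(mention)
    return f"{text[:start]}[START]{mention}[END]{text[end:]}"


tagged = tag_mention("Douglas Adams was born in Cambridge.", "Douglas Adams")
# tagged == "[START]Douglas Adams[END] was born in Cambridge."

# Intended usage, assuming `impresso` is an already connected client:
# result = impresso.tools.nel(tagged)
```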

ner(text)

Named Entity Recognition

This method is faster than ner_nel but does not provide any linking to external resources.

Parameters:
  • text (str) –

    Text to process.

Returns:
  • NerContainer (NerContainer) –

    Container with the results.

ner_nel(text)

Named Entity Recognition and Named Entity Linking

This method is slower than ner but provides linking to external resources.

Parameters:
  • text (str) –

    Text to process.

Returns:
  • NerContainer (NerContainer) –

    Container with the results.

impresso.resources.tools.NerContainer

Bases: DataContainer

Named entity recognition result container.

df: DataFrame property

Return the data as a pandas dataframe.

limit: int property

Page size.

offset: int property

Page offset.

size: int property

Number of results in the current page.

total: int property

Total number of results.
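The paging properties are enough to tell whether more results remain after the current page. A minimal sketch, assuming `offset` counts the results skipped so far and `size` counts the results actually present in the current page; `next_offset` is a hypothetical helper that works on any object exposing these properties.

```python
from typing import Any, Optional


def next_offset(container: Any) -> Optional[int]:
    """Return the offset of the next page, or None on the last page."""
    following = container.offset + container.size
    return following if following < container.total else None
```

The returned value can be passed as `offset` to the next call, and a `None` result ends the loop.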

Text reuse

Two resources can be used to search text reuse clusters and passages.

impresso.resources.text_reuse.clusters.TextReuseClustersResource

Bases: Resource

Text reuse clusters resource.

impresso.resources.text_reuse.passages.TextReusePassagesResource

Bases: Resource

Text reuse passages resource.