Impresso API resources

Search content items in the Impresso corpus.

impresso.search.find(term='Titanic', limit=10)

impresso.resources.search.SearchResource

Bases: Resource

Search content items in the Impresso database.

find(term=None, order_by=None, limit=None, offset=None, with_text_contents=False, title=None, front_page=None, entity_id=None, newspaper_id=None, date_range=None, language=None, mention=None, topic_id=None, collection_id=None, country=None, partner_id=None, text_reuse_cluster_id=None)

Search for content items in Impresso.

Parameters:

Name Type Description Default
term str | AND[str] | OR[str] | None

Search term.

None
order_by SearchOrderByLiteral | None

Order by aspect.

None
limit int | None

Number of results to return.

None
offset int | None

Number of results to skip.

None
with_text_contents bool | None

Return only content items with text contents.

False
title str | AND[str] | OR[str] | None

Return only content items that have this term or all/any of the terms in the title.

None
front_page bool | None

Return only content items that were on the front page.

None
entity_id str | AND[str] | OR[str] | None

Return only content items that mention this entity or all/any of the entities.

None
date_range DateRange | None

Return only content items that were published in this date range.

None
language str | OR[str] | None

Return only content items that are in this language or any of the languages.

None
mention str | AND[str] | OR[str] | None

Return only content items that mention an entity matching this term, or all/any of the entities matching the terms.

None
topic_id str | AND[str] | OR[str] | None

Return only content items that are about this topic or all/any of the topics.

None
collection_id str | OR[str] | None

Return only content items that are in this collection or any of the collections.

None
country str | OR[str] | None

Return only content items that are from this country or any of the countries.

None
partner_id str | OR[str] | None

Return only content items that are from this partner or any of the partners.

None
text_reuse_cluster_id str | OR[str] | None

Return only content items that are in this text reuse cluster or any of the clusters.

None

Returns:

Name Type Description
SearchDataContainer SearchDataContainer

Data container with a page of results of the search.
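Because `find` returns one page at a time, collecting a full result set means stepping `offset` forward by `limit`. The sketch below shows one way to do that for any `find`-like callable. To stay runnable without network access it uses `fake_find`, a hypothetical stand-in that returns a plain list; `iter_pages` is likewise an illustrative helper and not part of the library. With the real client the page contents would be read from the returned container instead, and the stopping condition may need adjusting.

```python
from typing import Any, Callable, Iterator, List


def iter_pages(find: Callable[..., List[Any]], page_size: int, **filters: Any) -> Iterator[List[Any]]:
    """Yield successive pages from a find-like callable until a short or empty page is seen."""
    offset = 0
    while True:
        page = find(limit=page_size, offset=offset, **filters)
        if not page:
            break  # nothing at this offset: we are past the last result
        yield page
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size


# Hypothetical stand-in for a real find() call, backed by an in-memory list.
_CORPUS = [f"item-{i}" for i in range(25)]


def fake_find(term=None, limit=None, offset=None):
    start = offset or 0
    end = None if limit is None else start + limit
    return _CORPUS[start:end]


pages = list(iter_pages(fake_find, page_size=10, term="Titanic"))
print([len(p) for p in pages])  # [10, 10, 5]
```

Yielding whole pages rather than single items keeps the caller in control of how many requests are made.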

impresso.api_client.models.search_order_by.SearchOrderByLiteral = Literal['date', 'id', 'relevance', '-date', '-relevance', '-id'] module-attribute
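The accepted `order_by` values come in pairs, with and without a leading `-`. The following sketch validates a value against the literal before it is sent and splits it into a field name and a direction. The literal is copied from the definition above; `split_order_by` is a hypothetical helper, and reading `-` as "descending" is an assumption based on common convention, not something this reference states.

```python
from typing import Literal, Tuple, get_args

# Copied from the reference above.
SearchOrderByLiteral = Literal["date", "id", "relevance", "-date", "-relevance", "-id"]


def split_order_by(value: str) -> Tuple[str, bool]:
    """Validate an order_by value and return (field, descending).

    Assumption: a leading "-" means descending order.
    """
    if value not in get_args(SearchOrderByLiteral):
        raise ValueError(f"unsupported order_by value: {value!r}")
    descending = value.startswith("-")
    return value.lstrip("-"), descending


print(split_order_by("-date"))  # ('date', True)
```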

impresso.resources.search.SearchDataContainer

Bases: DataContainer

Response of a search call.

df: DataFrame property

Return the data as a pandas dataframe.

Entities

Search entities in the Impresso corpus.

impresso.entities.find(term="Douglas Adams")

impresso.resources.entities.EntitiesResource

Bases: Resource

Search entities in the Impresso database.

find(term=None, wikidata_id=None, entity_id=None, entity_type=None, order_by=None, resolve=False, limit=None, offset=None)

Search entities in Impresso.

Parameters:

Name Type Description Default
term str | None

Search term.

None
wikidata_id str | AND[str] | OR[str] | None

Return only entities resolved to this Wikidata ID.

None
entity_id str | AND[str] | OR[str] | None

Return only the entity with this ID.

None
entity_type EntityType | AND[EntityType] | OR[EntityType] | None

Return only entities of this type.

None
order_by FindEntitiesOrderByLiteral | None

Field to order results by.

None
resolve bool

Return Wikidata details of the entities, if the entity is linked to a Wikidata entry.

False
limit int | None

Number of results to return.

None
offset int | None

Number of results to skip.

None

Returns:

Name Type Description
FindEntitiesContainer FindEntitiesContainer

Data container with a page of results of the search.

get(id)

Get entity by ID.

impresso.resources.entities.EntityType = Literal['person', 'location'] module-attribute

impresso.api_client.models.find_entities_order_by.FindEntitiesOrderByLiteral = Literal['count', 'count-mentions', 'name', 'relevance', '-relevance', '-name', '-count', '-count-mentions'] module-attribute

Newspapers

Search newspapers available in the Impresso corpus.

impresso.newspapers.find(
    term="wort",
    order_by="lastIssue",
)

impresso.resources.newspapers.NewspapersResource

Bases: Resource

Search newspapers in the Impresso database.

find(term=None, order_by=None, limit=None, offset=None)

Search newspapers in Impresso.

Parameters:

Name Type Description Default
term str | None

Search term.

None
order_by FindNewspapersOrderByLiteral | None

Field to order results by.

None
limit int | None

Number of results to return.

None
offset int | None

Number of results to skip.

None

Returns:

Name Type Description
FindNewspapersContainer FindNewspapersContainer

Data container with a page of results of the search.

impresso.api_client.models.find_newspapers_order_by.FindNewspapersOrderByLiteral = Literal['countIssues', 'endYear', 'firstIssue', 'lastIssue', 'name', 'startYear', '-name', '-countIssues', '-startYear', '-endYear', '-firstIssue', '-lastIssue'] module-attribute

impresso.resources.newspapers.FindNewspapersContainer

Bases: DataContainer

Response of a search call.

df: DataFrame property

Return the data as a pandas dataframe.

Content Items

Get a single content item by ID.

impresso.content_items.get("NZZ-1794-08-09-a-i0002")

Collections

Work with collections.

impresso.resources.collections.CollectionsResource

Bases: Resource

Work with collections.

add_items(collection_id, item_ids)

Add items to a collection by their IDs.

NOTE: Items are not added immediately. This operation may take up to a few minutes to complete and reflect in the collection.

Parameters:

Name Type Description Default
collection_id str

ID of the collection.

required
item_ids list[str]

IDs of the content items to add.

required
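Since added items can take a few minutes to appear, code that reads the collection straight after `add_items` may see stale contents. One way to handle this is to poll until the expected IDs are visible. The sketch below is generic: `wait_for_items` and `FakeCollection` are hypothetical names, and the fake stands in for whatever call lists a collection's item IDs. The default delay is a guess, not a documented value.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_items(fetch_ids: Callable[[], Iterable[str]], expected: Set[str],
                   attempts: int = 10, delay_seconds: float = 30.0) -> bool:
    """Poll until every expected ID is visible, or give up after `attempts` tries."""
    for attempt in range(attempts):
        if expected.issubset(set(fetch_ids())):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # no sleep after the final attempt
    return False


class FakeCollection:
    """Hypothetical stand-in: added items only become visible on the third read."""

    def __init__(self, pending):
        self._pending = list(pending)
        self.reads = 0

    def item_ids(self):
        self.reads += 1
        return self._pending if self.reads >= 3 else []


fake = FakeCollection(["NZZ-1794-08-09-a-i0002"])
ok = wait_for_items(fake.item_ids, {"NZZ-1794-08-09-a-i0002"}, attempts=5, delay_seconds=0)
print(ok, fake.reads)  # True 3
```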

find(term=None, order_by=None, limit=None, offset=None)

Search collections in Impresso.

Parameters:

Name Type Description Default
term str | None

Search term.

None
order_by FindCollectionsOrderByLiteral | None

Order by aspect.

None
limit int | None

Number of results to return.

None
offset int | None

Number of results to skip.

None

Returns:

Name Type Description
FindCollectionsContainer FindCollectionsContainer

Data container with a page of results of the search.

get(id)

Get collection by ID.

items(collection_id, limit=None, offset=None)

Return all content items from a collection.

Parameters:

Name Type Description Default
collection_id str

ID of the collection.

required
limit int | None

Number of results to return.

None
offset int | None

Number of results to skip.

None

Returns:

Name Type Description
SearchDataContainer SearchDataContainer

Data container with a page of results of the search.

remove_items(collection_id, item_ids)

Remove items from a collection by their IDs.

NOTE: Items are not removed immediately. This operation may take up to a few minutes to complete and reflect in the collection.

Parameters:

Name Type Description Default
collection_id str

ID of the collection.

required
item_ids list[str]

IDs of the content items to remove.

required

impresso.api_client.models.find_collections_order_by.FindCollectionsOrderByLiteral = Literal['date', 'size', '-date', '-size'] module-attribute

impresso.resources.collections.FindCollectionsContainer

Bases: DataContainer

Response of a find call.

df: DataFrame property

Return the data as a pandas dataframe.

Named entity recognition

The Python library contains a set of named entity recognition methods that use the same NER model that was used to add entities to the Impresso database.

impresso.resources.tools.ToolsResource

Bases: Resource

Various helper tools.

nel(text)

Named Entity Linking

This method requires named entities to be enclosed in tags: [START]entity[END].

Parameters:

Name Type Description Default
text str

Text to process

required

Returns:

Name Type Description
NerContainer NerContainer

List of named entities
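The tag format required by `nel` can be produced with plain string handling. The sketch below wraps the first occurrence of a mention in `[START]` and `[END]`; `tag_mention` is a hypothetical helper, not part of the library, and whether `nel` accepts several tagged mentions in one text is not covered here.

```python
def tag_mention(text: str, mention: str) -> str:
    """Wrap the first occurrence of `mention` in [START]...[END] tags."""
    index = text.find(mention)
    if index == -1:
        raise ValueError(f"mention not found in text: {mention!r}")
    end = index + len(mention)
    return f"{text[:index]}[START]{mention}[END]{text[end:]}"


tagged = tag_mention("Douglas Adams wrote about a galaxy.", "Douglas Adams")
print(tagged)  # [START]Douglas Adams[END] wrote about a galaxy.
```

The tagged string is what would then be passed as the `text` argument.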

ner(text)

Named Entity Recognition

This method is faster than ner_nel but does not provide any linking to external resources.

Parameters:

Name Type Description Default
text str

Text to process

required

Returns:

Name Type Description
NerContainer NerContainer

List of named entities

ner_nel(text)

Named Entity Recognition and Named Entity Linking

This method is slower than ner but provides linking to external resources.

Parameters:

Name Type Description Default
text str

Text to process

required

Returns:

Name Type Description
NerContainer NerContainer

List of named entities

impresso.resources.tools.NerContainer

Bases: DataContainer

Named entity recognition result container.

df: DataFrame property

Return the data as a pandas dataframe.

limit: int property

Page size.

offset: int property

Page offset.

size: int property

Current page size.

total: int property

Total number of results.
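The four paging properties are enough to tell whether more results remain. The sketch below uses `PageInfo`, a hypothetical stand-in that exposes the same four fields, so it runs without the library. It assumes that `offset` counts items rather than pages and that `total` counts all matching results.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical stand-in exposing the four paging properties listed above."""
    limit: int
    offset: int
    size: int
    total: int


def has_next_page(page: PageInfo) -> bool:
    """True if results remain beyond the current page (assumes item-based offsets)."""
    return page.offset + page.size < page.total


def next_offset(page: PageInfo) -> int:
    """Offset to request for the following page."""
    return page.offset + page.size


first = PageInfo(limit=10, offset=0, size=10, total=25)
last = PageInfo(limit=10, offset=20, size=5, total=25)
print(has_next_page(first), next_offset(first), has_next_page(last))  # True 10 False
```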

Text reuse

Two resources can be used to search text reuse clusters and passages.

impresso.resources.text_reuse.clusters.TextReuseClustersResource

Bases: Resource

Text reuse clusters resource.

impresso.resources.text_reuse.passages.TextReusePassagesResource

Bases: Resource

Text reuse passages resource.