Impresso API resources
Search
Search content items in the Impresso corpus.
impresso.search.find(term='Titanic', limit=10)
impresso.resources.search.SearchResource
Bases: Resource
Search content items in the Impresso database.
find(term=None, order_by=None, limit=None, offset=None, with_text_contents=False, title=None, front_page=None, entity_id=None, newspaper_id=None, date_range=None, language=None, mention=None, topic_id=None, collection_id=None, country=None, partner_id=None, text_reuse_cluster_id=None)
Search for content items in Impresso.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
term | str \| AND[str] \| OR[str] \| None | Search term. | None |
order_by | SearchOrderByLiteral \| None | Field to order results by. | None |
limit | int \| None | Number of results to return. | None |
offset | int \| None | Number of results to skip. | None |
with_text_contents | bool \| None | Return only content items with text contents. | False |
title | str \| AND[str] \| OR[str] \| None | Return only content items whose title contains this term, or all/any of the given terms. | None |
front_page | bool \| None | Return only content items that were on the front page. | None |
entity_id | str \| AND[str] \| OR[str] \| None | Return only content items that mention this entity, or all/any of the given entities. | None |
date_range | DateRange \| None | Return only content items that were published in this date range. | None |
language | str \| OR[str] \| None | Return only content items that are in this language, or in any of the given languages. | None |
mention | str \| AND[str] \| OR[str] \| None | Return only content items that mention an entity matching this term, or all/any of the given terms. | None |
topic_id | str \| AND[str] \| OR[str] \| None | Return only content items that are about this topic, or all/any of the given topics. | None |
collection_id | str \| OR[str] \| None | Return only content items that are in this collection, or in any of the given collections. | None |
country | str \| OR[str] \| None | Return only content items that are from this country, or from any of the given countries. | None |
partner_id | str \| OR[str] \| None | Return only content items that are from this partner, or from any of the given partners. | None |
text_reuse_cluster_id | str \| OR[str] \| None | Return only content items that are in this text reuse cluster, or in any of the given clusters. | None |
Returns:
Name | Type | Description |
---|---|---|
SearchDataContainer | SearchDataContainer | Data container with a page of search results. |
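Because `find` returns one page at a time, walking through a larger result set means advancing `offset` in steps of `limit`. The helper below is a minimal sketch of that loop, not part of the library: `iter_pages` is a hypothetical name, and it assumes the returned container exposes a `size` property (listed further down for `NerContainer`; assumed here to hold for other containers as well).

```python
from typing import Any, Callable, Iterator


def iter_pages(
    find: Callable[..., Any], page_size: int = 100, **filters: Any
) -> Iterator[Any]:
    """Yield pages from a find-style callable until a page comes back short."""
    offset = 0
    while True:
        page = find(limit=page_size, offset=offset, **filters)
        # Assumption: `size` is the number of results held by this page.
        if page.size == 0:
            break
        yield page
        if page.size < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += page_size
```

With a connected client this could be used as `for page in iter_pages(impresso.search.find, page_size=50, term="Titanic"): ...`, where `impresso` is the client object used in the examples on this page.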
impresso.api_client.models.search_order_by.SearchOrderByLiteral = Literal['date', 'id', 'relevance', '-date', '-relevance', '-id']
module-attribute
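Since `order_by` only accepts the literal values listed above, a value can be checked locally before a request is sent. This is a small sketch under two assumptions: the `Literal` is copied here by hand rather than imported, and `check_order_by` is a hypothetical helper name. The `-` prefix presumably selects descending order, following the usual convention, but this page does not state it.

```python
from typing import Literal, get_args

# Copied by hand from the SearchOrderByLiteral definition above.
SearchOrderBy = Literal["date", "id", "relevance", "-date", "-relevance", "-id"]


def check_order_by(value: str) -> str:
    """Return `value` unchanged if it is an accepted order_by literal."""
    allowed = get_args(SearchOrderBy)
    if value not in allowed:
        raise ValueError(f"order_by must be one of {allowed}, got {value!r}")
    return value
```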
impresso.resources.search.SearchDataContainer
Bases: DataContainer
Response of a search call.
df: DataFrame
property
Return the data as a pandas dataframe.
Entities
Search entities in the Impresso corpus.
impresso.entities.find(term="Douglas Adams")
impresso.resources.entities.EntitiesResource
Bases: Resource
Search entities in the Impresso database.
find(term=None, wikidata_id=None, entity_id=None, entity_type=None, order_by=None, resolve=False, limit=None, offset=None)
Search entities in Impresso.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
term | str \| None | Search term. | None |
wikidata_id | str \| AND[str] \| OR[str] \| None | Return only entities resolved to this Wikidata ID. | None |
entity_id | str \| AND[str] \| OR[str] \| None | Return only the entity with this ID. | None |
entity_type | EntityType \| AND[EntityType] \| OR[EntityType] \| None | Return only entities of this type. | None |
order_by | FindEntitiesOrderByLiteral \| None | Field to order results by. | None |
resolve | bool | Return Wikidata details for entities that are linked to a Wikidata entry. | False |
limit | int \| None | Number of results to return. | None |
offset | int \| None | Number of results to skip. | None |
Returns:
Name | Type | Description |
---|---|---|
FindEntitiesContainer | FindEntitiesContainer | Data container with a page of search results. |
get(id)
Get entity by ID.
impresso.resources.entities.EntityType = Literal['person', 'location']
module-attribute
impresso.api_client.models.find_entities_order_by.FindEntitiesOrderByLiteral = Literal['count', 'count-mentions', 'name', 'relevance', '-relevance', '-name', '-count', '-count-mentions']
module-attribute
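The parameters of `entities.find` combine as keyword filters. The sketch below shows one plausible combination: it restricts the search to persons, asks for Wikidata details, and orders by mention count. `find_people` is a hypothetical wrapper, and `client` stands for the connected client object (named `impresso` in the examples on this page); reading `-count` as "highest count first" is an assumption.

```python
from typing import Any


def find_people(client: Any, name: str, limit: int = 5) -> Any:
    """Look up person entities by name and request Wikidata details."""
    return client.entities.find(
        term=name,
        entity_type="person",  # one of the EntityType literals
        resolve=True,          # include Wikidata details where available
        order_by="-count",     # one of the FindEntitiesOrderByLiteral values
        limit=limit,
    )
```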
Newspapers
Search newspapers available in the Impresso corpus.
impresso.newspapers.find(
    term="wort",
    order_by="lastIssue",
)
impresso.resources.newspapers.NewspapersResource
Bases: Resource
Search newspapers in the Impresso database.
find(term=None, order_by=None, limit=None, offset=None)
Search newspapers in Impresso.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
term | str \| None | Search term. | None |
order_by | FindNewspapersOrderByLiteral \| None | Field to order results by. | None |
limit | int \| None | Number of results to return. | None |
offset | int \| None | Number of results to skip. | None |
Returns:
Name | Type | Description |
---|---|---|
FindNewspapersContainer | FindNewspapersContainer | Data container with a page of search results. |
impresso.api_client.models.find_newspapers_order_by.FindNewspapersOrderByLiteral = Literal['countIssues', 'endYear', 'firstIssue', 'lastIssue', 'name', 'startYear', '-name', '-countIssues', '-startYear', '-endYear', '-firstIssue', '-lastIssue']
module-attribute
impresso.resources.newspapers.FindNewspapersContainer
Bases: DataContainer
Response of a search call.
df: DataFrame
property
Return the data as a pandas dataframe.
Content Items
Get a single content item by ID.
impresso.content_items.get("NZZ-1794-08-09-a-i0002")
Collections
Work with collections.
impresso.resources.collections.CollectionsResource
Bases: Resource
Work with collections.
add_items(collection_id, item_ids)
Add items to a collection by their IDs.
NOTE: Items are not added immediately. This operation may take up to a few minutes to complete and reflect in the collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_id | str | ID of the collection. | required |
item_ids | list[str] | IDs of the content items to add. | required |
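Because added items only show up after a delay, code that reads the collection right after `add_items` may see stale contents. One way to handle this is to poll the collection until its item count has grown. The sketch below is an illustration, not library behaviour: `add_and_wait` is a hypothetical helper, it assumes the container returned by `items` exposes a `total` property, and it assumes none of the items is already in the collection.

```python
import time
from typing import Any, Callable, Sequence


def add_and_wait(
    client: Any,
    collection_id: str,
    item_ids: Sequence[str],
    timeout: float = 300.0,
    poll_every: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Add items, then poll until the collection has grown by len(item_ids).

    Returns True once the new total is visible, False if `timeout` elapses.
    """
    # Assumption: `total` is the number of items currently in the collection.
    before = client.collections.items(collection_id, limit=1).total
    client.collections.add_items(collection_id, list(item_ids))

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = client.collections.items(collection_id, limit=1).total
        if current >= before + len(item_ids):
            return True
        sleep(poll_every)
    return False
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` is enough.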
find(term=None, order_by=None, limit=None, offset=None)
Search collections in Impresso.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
term | str \| None | Search term. | None |
order_by | FindCollectionsOrderByLiteral \| None | Field to order results by. | None |
limit | int \| None | Number of results to return. | None |
offset | int \| None | Number of results to skip. | None |
Returns:
Name | Type | Description |
---|---|---|
FindCollectionsContainer | FindCollectionsContainer | Data container with a page of search results. |
get(id)
Get collection by ID.
items(collection_id, limit=None, offset=None)
Return all content items from a collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_id | str | ID of the collection. | required |
limit | int \| None | Number of results to return. | None |
offset | int \| None | Number of results to skip. | None |
Returns:
Name | Type | Description |
---|---|---|
SearchDataContainer | SearchDataContainer | Data container with a page of search results. |
remove_items(collection_id, item_ids)
Remove items from a collection by their IDs.
NOTE: Items are not removed immediately. This operation may take up to a few minutes to complete and reflect in the collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_id | str | ID of the collection. | required |
item_ids | list[str] | IDs of the content items to remove. | required |
impresso.api_client.models.find_collections_order_by.FindCollectionsOrderByLiteral = Literal['date', 'size', '-date', '-size']
module-attribute
impresso.resources.collections.FindCollectionsContainer
Bases: DataContainer
Response of a find call.
df: DataFrame
property
Return the data as a pandas dataframe.
Named entity recognition
The Python library provides named entity recognition methods that use the same NER model that was used to add entities to the Impresso database.
impresso.resources.tools.ToolsResource
Bases: Resource
Various helper tools.
nel(text)
Named Entity Linking
This method requires named entities to be enclosed in tags: [START]entity[END].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to process. | required |
Returns:
Name | Type | Description |
---|---|---|
NerContainer | NerContainer | List of named entities. |
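Since `nel` expects the entity to be marked up as `[START]entity[END]`, the input text usually has to be prepared first. The helper below is a minimal sketch of that step; `tag_entity` is a hypothetical name and not part of the library.

```python
def tag_entity(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] in the [START]...[END] markers that nel() expects."""
    if not 0 <= start < end <= len(text):
        raise ValueError("span must be non-empty and lie inside the text")
    return f"{text[:start]}[START]{text[start:end]}[END]{text[end:]}"


sentence = "Douglas Adams was born in Cambridge."
name = "Douglas Adams"
begin = sentence.index(name)
tagged = tag_entity(sentence, begin, begin + len(name))
```

The resulting string can then be passed to `nel`, for instance as `impresso.tools.nel(tagged)`. The `tools` accessor name is assumed here by analogy with `impresso.search` and `impresso.entities`.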
ner(text)
Named Entity Recognition
This method is faster than ner_nel but does not provide any linking to external resources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to process. | required |
Returns:
Name | Type | Description |
---|---|---|
NerContainer | NerContainer | List of named entities. |
ner_nel(text)
Named Entity Recognition and Named Entity Linking
This method is slower than ner but provides linking to external resources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to process. | required |
Returns:
Name | Type | Description |
---|---|---|
NerContainer | NerContainer | List of named entities. |
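The trade-off between `ner` (faster, no linking) and `ner_nel` (slower, with linking) can be made explicit with a small dispatcher. This is only a sketch: `extract_entities` is a hypothetical name, and `tools` stands for the `ToolsResource` instance of a connected client.

```python
from typing import Any


def extract_entities(tools: Any, text: str, link: bool = False) -> Any:
    """Run plain NER by default; use the slower NER + NEL only when links are needed."""
    if link:
        return tools.ner_nel(text)
    return tools.ner(text)
```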
impresso.resources.tools.NerContainer
Bases: DataContainer
Named entity recognition result container.
df: DataFrame
property
Return the data as a pandas dataframe.
limit: int
property
Page size.
offset: int
property
Page offset.
size: int
property
Current page size.
total: int
property
Total number of results.
Text reuse
Two resources can be used to search text reuse clusters and passages.
impresso.resources.text_reuse.clusters.TextReuseClustersResource
Bases: Resource
Text reuse clusters resource.
impresso.resources.text_reuse.passages.TextReusePassagesResource
Bases: Resource
Text reuse passages resource.