Image handling

Image Utils

class impresso_commons.images.img_utils.BoxStrategy(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

jpg_highest = 'jpg_highest'
jpg_uniq = 'jpg_uniq'
png_highest = 'png_highest'
png_uniq = 'png_uniq'
tif = 'tif'
impresso_commons.images.img_utils.compose(path_img_one, path_img_two, path_img_three)
impresso_commons.images.img_utils.get_img_from_archive(archive, path_checker, ext_checker, name_checker=None)
impresso_commons.images.img_utils.get_imgdimensions(image_data)

Returns the image height and width.

impresso_commons.images.img_utils.get_jpg(jpgs, page_digit)
impresso_commons.images.img_utils.get_page_folders(archive)
impresso_commons.images.img_utils.get_png(pngs, page_digit)
impresso_commons.images.img_utils.get_tif(tifs, page_digit)
impresso_commons.images.img_utils.run_cmd(cmd)

Execute ‘cmd’ in the shell and return the result (stdout and stderr).
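A minimal sketch of the behaviour described above, using the standard library's `subprocess` module. The name `run_cmd_sketch` and the `(stdout, stderr)` tuple it returns are assumptions for illustration; the exact return shape of `run_cmd` is not documented here.

```python
import subprocess


def run_cmd_sketch(cmd: str) -> tuple[str, str]:
    """Hypothetical stand-in for run_cmd: run `cmd` in a shell, return (stdout, stderr)."""
    # shell=True hands the whole string to the system shell, as the description implies
    completed = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return completed.stdout, completed.stderr
```

Because the command goes through the shell, only pass trusted strings: shell metacharacters in `cmd` are interpreted.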

Olive Boxes

Functions to support re-computation of Olive box coordinates.

impresso_commons.images.olive_boxes.compute_box(scale_factor, input_box)

Compute IIIF box coordinates of input_box relative to scale_factor.

Parameters:
  • scale_factor (float) – ratio between the dimensions of two images

  • input_box (str) – string with 4 values separated by spaces

Returns:

new box coordinates

Return type:

str
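A minimal sketch of scaling a box string by a scale factor. The helper name `scale_box_sketch`, the rounding to whole pixels and the space-separated output are assumptions; the documentation only states that `compute_box` takes a 4-value string and returns the new coordinates as a string.

```python
def scale_box_sketch(scale_factor: float, input_box: str) -> str:
    """Hypothetical stand-in for compute_box: scale every coordinate of a box string."""
    coords = [float(v) for v in input_box.split()]
    if len(coords) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(coords)}")
    # Assumption: scaled values are rounded to the nearest integer pixel
    scaled = [round(c * scale_factor) for c in coords]
    return " ".join(str(v) for v in scaled)
```

For example, `scale_box_sketch(0.5, "100 200 300 400")` halves each value and yields `"50 100 150 200"`.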

impresso_commons.images.olive_boxes.compute_scale_factor(img_source_path, img_dest_path)

Computes the x scale factor between two images.

Parameters:
  • img_source_path – full path to the source image

  • img_dest_path – full path to the destination image

Returns:

scale factor

Return type:

float
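A sketch of the arithmetic, assuming the x scale factor is the destination width divided by the source width, so that multiplying source coordinates by it yields destination coordinates. The real function reads the two images from their paths; this hypothetical helper takes the widths directly to stay self-contained.

```python
def scale_factor_from_widths(source_width: int, dest_width: int) -> float:
    """Hypothetical helper: x scale factor mapping source pixels to destination pixels."""
    if source_width <= 0:
        raise ValueError("source width must be positive")
    # Assumption: ratio is destination over source (see lead-in)
    return dest_width / source_width
```

Under this convention, a box measured on a 1000-pixel-wide source maps onto a 2000-pixel-wide jp2 with a factor of 2.0, and identical images give 1.0.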

impresso_commons.images.olive_boxes.convert_box(input_box)

Convert a box with [x y x y] coordinates to [x y w h].

Parameters:

input_box (str) – box with 4 coordinates: x and y of the upper-left and lower-right corners

Returns:

box with 4 coordinates: x and y of the upper-left corner, width and height

Return type:

str
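The conversion amounts to subtracting the upper-left corner from the lower-right one. A minimal sketch, assuming integer coordinates separated by spaces and a width/height computed as a plain difference (no +1 for inclusive pixel ranges); the name `convert_box_sketch` is hypothetical.

```python
def convert_box_sketch(input_box: str) -> str:
    """Hypothetical stand-in for convert_box: [x1 y1 x2 y2] -> [x y w h]."""
    # Unpacking fails with ValueError if the string does not hold exactly 4 values
    x1, y1, x2, y2 = (int(v) for v in input_box.split())
    # Assumption: width and height are plain differences of the corner coordinates
    return f"{x1} {y1} {x2 - x1} {y2 - y1}"
```

For instance, a box from (10, 20) to (110, 220) becomes `"10 20 100 200"`.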

impresso_commons.images.olive_boxes.get_iiif_url(page_id: str, box: str, base: str = 'http://dhlabsrv17.epfl.ch/iiif_impresso', iiif_manifest_uri: str = None, pct: bool = False) → str

Returns the impresso IIIF URL for a given page id and box.

Parameters:
  • page_id (str) – impresso page id, e.g. EXP-1930-06-10-a-p0001

  • box (str) – IIIF box as 4 space-separated coordinate values (x, y, w, h)

Returns:

iiif url of the box

Return type:

str
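A sketch of how such a URL can be assembled following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The `full/0/default.jpg` suffix, the mapping of `pct=True` to the IIIF `pct:` region prefix, and the helper name `build_iiif_url` are assumptions; `iiif_manifest_uri` is not modelled.

```python
DEFAULT_BASE = "http://dhlabsrv17.epfl.ch/iiif_impresso"


def build_iiif_url(page_id: str, box: str, base: str = DEFAULT_BASE, pct: bool = False) -> str:
    """Hypothetical sketch of an IIIF Image API URL for a box on an impresso page."""
    # The box arrives as "x y w h"; IIIF regions are comma-separated
    x, y, w, h = box.split()
    region = ",".join((x, y, w, h))
    if pct:
        # Assumption: pct switches to IIIF percentage regions ("pct:x,y,w,h")
        region = "pct:" + region
    # Assumption: full size, no rotation, default quality, jpg output
    return f"{base}/{page_id}/{region}/full/0/default.jpg"
```

With the page id from the parameter example, `build_iiif_url("EXP-1930-06-10-a-p0001", "10 20 30 40")` returns `http://dhlabsrv17.epfl.ch/iiif_impresso/EXP-1930-06-10-a-p0001/10,20,30,40/full/0/default.jpg`.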

impresso_commons.images.olive_boxes.get_scale_factor(issue_dir_path, archive, page_xml, box_strategy, img_source_name)

Returns the scale factor in Olive context, given a strategy to choose the source image.

Parameters:
  • issue_dir_path (str) – the path of the issue

  • archive (zipfile.ZipFile) – the zip archive

  • page_xml (bytes) – the XML content of the page

  • box_strategy (str) – the box strategy, as found in the info.txt file of the jp2 folder

  • img_source_name – the source image name, as found in the info.txt file of the jp2 folder

Returns:

the estimated scale factor

Return type:

float

Background information

Impresso converts library images to JP2, taking the best image available: tif > highest png > jpg. Olive box coordinates were computed relative to one source image, which has to be identified among several candidates. The available image formats vary from issue to issue, so a strategy is needed for each case.
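The preference order tif > highest png > jpg can be expressed as a choice among the `BoxStrategy` values. This is a hypothetical sketch: `choose_strategy` is not part of the documented API, the enum is copied locally so the block runs on its own, and the mapping of several jpgs to `jpg_highest` is an assumption by analogy with the png case.

```python
from enum import Enum


class BoxStrategy(Enum):
    # Local copy of the documented enum values, so the sketch is self-contained
    jpg_highest = "jpg_highest"
    jpg_uniq = "jpg_uniq"
    png_highest = "png_highest"
    png_uniq = "png_uniq"
    tif = "tif"


def choose_strategy(tifs: list[str], pngs: list[str], jpgs: list[str]) -> BoxStrategy:
    """Hypothetical helper: pick a strategy following tif > highest png > jpg."""
    if tifs:
        return BoxStrategy.tif
    if pngs:
        return BoxStrategy.png_uniq if len(pngs) == 1 else BoxStrategy.png_highest
    if jpgs:
        # Assumption: several jpgs are handled like several pngs
        return BoxStrategy.jpg_uniq if len(jpgs) == 1 else BoxStrategy.jpg_highest
    raise ValueError("no tif, png or jpg image found for this page")
```

A tif always wins, even when pngs are also present, which matches the conversion order stated above.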

Case 1: tif

The tif is present and is the file from which the jp2 was converted. Dest: the tif dimensions can therefore be used as the jp2 dimensions, with no need to read the jp2 file. Source: the source image dimensions are (normally) present in the page.xml.

Case 2: several png

In this case the jp2 was acquired from the png with the highest resolution. Dest: it appears that when there are several pngs, Olive also used the highest-resolution one for the OCR. It is therefore possible to rely on the resolution indicated in the page xml, which should be the same as that of our jp2. N.B.: the page width and height indicated in the xml usually do not correspond to the highest-resolution png (there is therefore a discrepancy in the Olive file between the tag ‘images_resolution’ on the one hand, and ‘page_width|height’ on the other). It seems this discrepancy can be ignored in the current case, relying on the resolution only. Source: the highest png. Here the source and destination dimensions are equal, so the function returns 1.

Case 3: one png only

It remains to be checked whether this case occurs. Here there is no choice: the Olive OCR and the JP2 acquisition should come from the same source, hence a scale factor of 1. As an additional check, we verify that page_width|height are the same as the image dimensions. The only risk is that Olive used another image file and did not provide it.

Case 4: one jpg only

Same as Case 3: a scale factor of 1, with the same additional check that page_width|height are the same as the image dimensions (there is only one image, so the dimensions should match, unlike in Case 2).
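The four cases can be summarised as a dispatch on the strategy value. This hypothetical sketch is not the body of `get_scale_factor`: it takes already-extracted sizes instead of an archive and page XML, uses widths only (an x scale factor), raises on a page-size mismatch where the real behaviour is undocumented, and has no rule for `jpg_highest`, which the cases above do not cover.

```python
from typing import Optional, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def scale_factor_for_case(strategy: str, source_size: Size, dest_size: Size,
                          page_size: Optional[Size] = None) -> float:
    """Hypothetical helper: scale factor for the background cases, keyed by BoxStrategy value."""
    if strategy == "tif":
        # Case 1: source size from page.xml, destination size from the tif (= jp2)
        return dest_size[0] / source_size[0]
    if strategy == "png_highest":
        # Case 2: Olive OCR and the jp2 both used the highest-resolution png
        return 1.0
    if strategy in ("png_uniq", "jpg_uniq"):
        # Cases 3 and 4: a single image; check page_width|height against it
        if page_size is not None and tuple(page_size) != tuple(dest_size):
            # Assumption: a mismatch is treated as an error rather than a warning
            raise ValueError(f"page size {page_size} differs from image size {dest_size}")
        return 1.0
    raise ValueError(f"no rule sketched for strategy {strategy!r}")
```

Only Case 1 yields a factor other than 1; the other cases reduce to consistency checks.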

impresso_commons.images.olive_boxes.test()

A testing function, to be moved to a proper unit test module.