Image handling

Image Utils

class impresso_commons.images.img_utils.BoxStrategy(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

jpg_highest = 'jpg_highest'

jpg_uniq = 'jpg_uniq'

png_highest = 'png_highest'

png_uniq = 'png_uniq'

tif = 'tif'

impresso_commons.images.img_utils.compose(path_img_one, path_img_two, path_img_three)

impresso_commons.images.img_utils.get_img_from_archive(archive, path_checker, ext_checker, name_checker=None)

impresso_commons.images.img_utils.get_imgdimensions(image_data): Returns image height and width

impresso_commons.images.img_utils.get_jpg(jpgs, page_digit)

impresso_commons.images.img_utils.get_page_folders(archive)

impresso_commons.images.img_utils.get_png(pngs, page_digit)

impresso_commons.images.img_utils.get_tif(tifs, page_digit)

impresso_commons.images.img_utils.run_cmd(cmd): Execute ‘cmd’ in the shell and return result (stdout and stderr).
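
As an illustration of what such a helper might look like, here is a minimal sketch built on the standard `subprocess` module. The name `run_shell` and the tuple return shape are assumptions made for this example; the actual `run_cmd` may combine or return stdout and stderr differently.

```python
import subprocess
from typing import Tuple


def run_shell(cmd: str) -> Tuple[str, str]:
    """Hypothetical stand-in for run_cmd: run `cmd` in a shell, return (stdout, stderr)."""
    # shell=True hands the string to the system shell; only use it with trusted input.
    completed = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return completed.stdout, completed.stderr


out, err = run_shell("echo hello")
print(out.strip())
```

Returning both streams separately lets the caller decide whether stderr content should be treated as a failure.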

Olive Boxes

Functions to support re-computation of Olive box coordinates.

impresso_commons.images.olive_boxes.compute_box(scale_factor, input_box)

Compute IIIF box coordinates of input_box relative to scale_factor.

Parameters:

scale_factor (float) – ratio between 2 images with different dimensions
input_box (str) – string with 4 values separated by spaces

Returns:

new box coordinates

Return type:

str
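
The following sketch shows one plausible way to rescale a box string by a scale factor. The helper name `scale_box`, the multiplication direction, and the rounding to integers are all assumptions for illustration; the library's `compute_box` may differ in any of these details.

```python
def scale_box(scale_factor: float, input_box: str) -> str:
    """Hypothetical helper: scale 4 space-separated coordinates by scale_factor."""
    # Assumption: every coordinate is multiplied by the factor and rounded to an integer.
    scaled = [int(round(float(value) * scale_factor)) for value in input_box.split()]
    return " ".join(str(value) for value in scaled)


print(scale_box(2.0, "10 20 100 200"))  # 20 40 200 400
```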

impresso_commons.images.olive_boxes.compute_scale_factor(img_source_path, img_dest_path)

Computes the x scale factor between 2 images.

Parameters:

img_source_path (full path to the image) – the source image
img_dest_path (full path to the image) – the destination image

Returns:

scale factor

Return type:

float
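
To make the notion of an x scale factor concrete, here is a minimal sketch that works on image widths directly rather than on file paths. The helper name `x_scale_factor` and the direction of the ratio (destination width divided by source width) are assumptions; the source does not state which way `compute_scale_factor` divides.

```python
def x_scale_factor(source_width: int, dest_width: int) -> float:
    """Hypothetical helper: ratio between the widths of two versions of an image."""
    if source_width <= 0:
        raise ValueError("source_width must be positive")
    # Assumption: the factor maps source coordinates onto the destination image.
    return dest_width / source_width


print(x_scale_factor(1000, 2500))  # 2.5
```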

impresso_commons.images.olive_boxes.convert_box(input_box)

Convert a box with [x y x y] coordinates to [x y w h]

Parameters:

input_box (str) – box with 4 coordinates, x and y upper left and lower right

Returns:

box with 4 coordinates, x and y upper left, width and height

Return type:

str
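
The conversion from corner coordinates to width and height can be sketched as below. The helper name `xyxy_to_xywh` is hypothetical, and integer coordinates are assumed; this is an illustration of the arithmetic, not the library's implementation.

```python
def xyxy_to_xywh(box: str) -> str:
    """Hypothetical helper: convert 'x1 y1 x2 y2' into 'x y w h'."""
    # Assumption: the box holds 4 integer values separated by spaces.
    x1, y1, x2, y2 = (int(value) for value in box.split())
    # Width and height are the differences between the two corners.
    return f"{x1} {y1} {x2 - x1} {y2 - y1}"


print(xyxy_to_xywh("10 20 110 220"))  # 10 20 100 200
```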

impresso_commons.images.olive_boxes.get_iiif_url(page_id: str, box: str, base: str = 'http://dhlabsrv17.epfl.ch/iiif_impresso', iiif_manifest_uri: str = None, pct: bool = False) → str

Returns the impresso IIIF URL for a given page id and box.

Parameters:

page_id (str) – impresso page id, e.g. EXP-1930-06-10-a-p0001
box (str (4 coordinate values blank separated)) – iiif box (x, y, w, h)

Returns:

iiif url of the box

Return type:

str
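
For orientation, the sketch below assembles a region URL following the generic IIIF Image API 2.x template `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The helper name `build_iiif_region_url`, the example base URL, and the fixed `full/0/default.jpg` suffix are assumptions; the exact URL produced by `get_iiif_url` is not documented here and may differ.

```python
def build_iiif_region_url(base: str, page_id: str, box: str, pct: bool = False) -> str:
    """Hypothetical helper: build an IIIF Image API 2.x style URL for a box region."""
    # The IIIF region parameter is comma separated: x,y,w,h
    region = ",".join(box.split())
    if pct:
        # 'pct:' marks the region values as percentages of the image dimensions.
        region = f"pct:{region}"
    # Assumption: full size, no rotation, default quality, jpg format.
    return f"{base.rstrip('/')}/{page_id}/{region}/full/0/default.jpg"


url = build_iiif_region_url(
    "https://example.org/iiif",  # placeholder base, not the impresso server
    "EXP-1930-06-10-a-p0001",
    "10 20 100 200",
)
print(url)
# https://example.org/iiif/EXP-1930-06-10-a-p0001/10,20,100,200/full/0/default.jpg
```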

impresso_commons.images.olive_boxes.get_scale_factor(issue_dir_path, archive, page_xml, box_strategy, img_source_name)

Returns the scale factor in Olive context, given a strategy to choose the source image.

Parameters:

issue_dir_path (str) – the path of the issue
archive (zipfile.ZipFile) – the zip archive
page_xml (bytes) – the xml handler of the page
box_strategy (str) – the box strategy such as found in the info.txt from jp2 folder
img_source_name – as found in the info.txt from jp2 folder

Returns:

the hopefully correct scale factor

Return type:

float

Background information

Impresso converts library images to JP2, taking the best image available: tif > highest png > jpg. Olive box coordinates were computed against a source image, which we have to identify among several candidates. Image format coverage differs from issue to issue, so a separate strategy is needed for each of the cases below.

Case 1: tif

The tif is present and is the file from which the jp2 was converted. Dest: the tif dimensions can therefore be used as the jp2 dimensions, with no need to read the jp2 file. Source: the source image dimension is (normally) present in the page.xml.

Case 2: several png

In this case the jp2 was acquired using the png with the highest dimension. Dest: it appears that when several png files exist, Olive also took the highest one for the OCR. It is therefore possible to rely on the resolution indicated in the page xml, which should be the same as our jp2. N.B.: the page width and height indicated in the xml do not (usually) correspond to the highest resolution png (there is therefore a discrepancy in the Olive file between the tag ‘images_resolution’ on the one hand, and ‘page_width|height’ on the other). It seems we can ignore this and rely on the resolution only in the current case. Source: the highest png. Here the source and dest dimensions are equal, so the function returns 1.

Case 3: one png only

Whether this case actually occurs remains to be checked. In this case there is no choice: Olive OCR and JP2 acquisition should be from the same source, hence a scale factor of 1. Here we do an additional check to see if the page_width|height are the same as the image ones. The only danger is if Olive used another image file and did not provide it.

Case 4: one jpg only

Same as Case 3, scale factor of 1. Here we do an additional check to see if the page_width|height are the same as the image ones. Since there is only one image, the dimensions should match, unlike in Case 2.
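
The four cases above can be summarised as a small dispatch on the box strategy. This is only a sketch under stated assumptions: `Strategy` is a local mirror of the `BoxStrategy` values, `choose_scale_factor` and its width-only arguments are hypothetical, the tif ratio direction (image width divided by xml width) is assumed, and the warning on mismatch is a guess at what the "additional check" does. The `jpg_highest` value is not covered by the cases described, so the sketch rejects it.

```python
import logging
from enum import Enum


class Strategy(Enum):
    """Local mirror of the BoxStrategy values, for illustration only."""
    jpg_highest = "jpg_highest"
    jpg_uniq = "jpg_uniq"
    png_highest = "png_highest"
    png_uniq = "png_uniq"
    tif = "tif"


def choose_scale_factor(strategy: Strategy, xml_width: int, image_width: int) -> float:
    """Hypothetical sketch of the case analysis; not the library function."""
    if strategy is Strategy.tif:
        # Case 1: source width from page.xml, destination width from the tif.
        return image_width / xml_width
    if strategy is Strategy.png_highest:
        # Case 2: page_width in the xml is ignored; source and dest are equal.
        return 1.0
    if strategy in (Strategy.png_uniq, Strategy.jpg_uniq):
        # Cases 3 and 4: a single image, so dimensions should match.
        if xml_width != image_width:
            logging.warning("page width %s differs from image width %s",
                            xml_width, image_width)
        return 1.0
    raise ValueError(f"strategy not covered by this sketch: {strategy.value}")


print(choose_scale_factor(Strategy("tif"), 1000, 2000))  # 2.0
```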

impresso_commons.images.olive_boxes.test(): A testing function, to be moved into a proper unit test.