Image handling
Image Utils
- class impresso_commons.images.img_utils.BoxStrategy(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- jpg_highest = 'jpg_highest'
- jpg_uniq = 'jpg_uniq'
- png_highest = 'png_highest'
- png_uniq = 'png_uniq'
- tif = 'tif'
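As an illustration of how these members can be looked up by value, here is a minimal sketch. It defines a local mirror of the enum rather than importing the library, and the helper `parse_strategy` is hypothetical (it is not part of impresso_commons); it assumes the strategy arrives as a plain string, for example read from a text file.

```python
from enum import Enum


class BoxStrategy(Enum):
    """Local mirror of the documented members, for illustration only."""
    jpg_highest = "jpg_highest"
    jpg_uniq = "jpg_uniq"
    png_highest = "png_highest"
    png_uniq = "png_uniq"
    tif = "tif"


def parse_strategy(raw: str) -> BoxStrategy:
    """Hypothetical helper: map a raw string to an enum member.

    Enum lookup by value raises ValueError for unknown strategies.
    """
    return BoxStrategy(raw.strip())


if __name__ == "__main__":
    print(parse_strategy("png_highest\n"))
```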
- impresso_commons.images.img_utils.compose(path_img_one, path_img_two, path_img_three)
- impresso_commons.images.img_utils.get_img_from_archive(archive, path_checker, ext_checker, name_checker=None)
- impresso_commons.images.img_utils.get_imgdimensions(image_data)
Returns image height and width
- impresso_commons.images.img_utils.get_jpg(jpgs, page_digit)
- impresso_commons.images.img_utils.get_page_folders(archive)
- impresso_commons.images.img_utils.get_png(pngs, page_digit)
- impresso_commons.images.img_utils.get_tif(tifs, page_digit)
- impresso_commons.images.img_utils.run_cmd(cmd)
Execute ‘cmd’ in the shell and return result (stdout and stderr).
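The behaviour described for run_cmd can be approximated with the standard library. The sketch below is an assumption about the general approach, not the library's code: the name `run_cmd_sketch` and the choice to return a `(stdout, stderr)` tuple of strings are illustrative.

```python
import subprocess


def run_cmd_sketch(cmd: str) -> tuple[str, str]:
    """Run `cmd` through the shell and return (stdout, stderr) as text.

    Assumption: the real function may return the result in another shape.
    """
    completed = subprocess.run(
        cmd,
        shell=True,           # the command is interpreted by the shell
        capture_output=True,  # collect both stdout and stderr
        text=True,            # decode bytes to str
    )
    return completed.stdout, completed.stderr


if __name__ == "__main__":
    out, err = run_cmd_sketch("echo hello")
    print(out.strip(), err.strip())
```

Because the command goes through the shell, it should only ever be built from trusted input.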
Olive Boxes
Functions to support re-computation of Olive box coordinates.
- impresso_commons.images.olive_boxes.compute_box(scale_factor, input_box)
Compute IIIF box coordinates of input_box relative to scale_factor.
- Parameters:
scale_factor (float) – ratio between 2 images with different dimensions
input_box (str) – string with 4 values separated by spaces
- Returns:
new box coordinates
- Return type:
str
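A minimal sketch of the rescaling described above, assuming that each of the four values is multiplied by the scale factor and rounded to the nearest integer. The name `compute_box_sketch` and the rounding choice are assumptions; the library's own implementation may differ.

```python
def compute_box_sketch(scale_factor: float, input_box: str) -> str:
    """Scale the 4 space-separated values of `input_box` by `scale_factor`."""
    values = [int(v) for v in input_box.split()]
    if len(values) != 4:
        raise ValueError("expected 4 coordinate values")
    # Assumption: coordinates are rounded to whole pixels after scaling.
    scaled = [round(v * scale_factor) for v in values]
    return " ".join(str(v) for v in scaled)


if __name__ == "__main__":
    print(compute_box_sketch(2.0, "10 20 30 40"))
```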
- impresso_commons.images.olive_boxes.compute_scale_factor(img_source_path, img_dest_path)
Computes the x scale factor between two images.
- Parameters:
img_source_path (full path to the image) – the source image
img_dest_path (full path to the image) – the destination image
- Returns:
scale factor
- Return type:
float
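The x scale factor is a ratio of image widths. The sketch below takes the two widths directly instead of opening image files, so it stays free of any imaging dependency. The direction of the ratio (destination over source) is an assumption, chosen so that multiplying a source coordinate by the factor yields the destination coordinate; `scale_factor_from_widths` is a hypothetical name.

```python
def scale_factor_from_widths(source_width: int, dest_width: int) -> float:
    """Return the x scale factor from source to destination.

    Assumption: factor = dest_width / source_width.
    """
    if source_width <= 0 or dest_width <= 0:
        raise ValueError("widths must be positive")
    return dest_width / source_width


if __name__ == "__main__":
    # A box edge at x=400 in a 1000 px wide source maps to x=1000
    # in a 2500 px wide destination.
    factor = scale_factor_from_widths(1000, 2500)
    print(factor, 400 * factor)
```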
- impresso_commons.images.olive_boxes.convert_box(input_box)
Convert a box with [x y x y] coordinates to [x y w h]
- Parameters:
input_box (str) – box with 4 coordinates, x and y upper left and lower right
- Returns:
box with 4 coordinates, x and y upper left, width and height
- Return type:
str
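The conversion amounts to subtracting the upper-left corner from the lower-right corner. A minimal sketch, with the hypothetical name `convert_box_sketch` and assuming integer coordinates separated by spaces:

```python
def convert_box_sketch(input_box: str) -> str:
    """Convert "x1 y1 x2 y2" (upper left, lower right) to "x y w h"."""
    x1, y1, x2, y2 = (int(v) for v in input_box.split())
    width = x2 - x1   # horizontal extent of the box
    height = y2 - y1  # vertical extent of the box
    return f"{x1} {y1} {width} {height}"


if __name__ == "__main__":
    print(convert_box_sketch("10 20 110 220"))
```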
- impresso_commons.images.olive_boxes.get_iiif_url(page_id: str, box: str, base: str = 'http://dhlabsrv17.epfl.ch/iiif_impresso', iiif_manifest_uri: str = None, pct: bool = False) str
Returns the Impresso IIIF URL for a given page id and box.
- Parameters:
page_id (str) – impresso page id, e.g. EXP-1930-06-10-a-p0001
box (str (4 coordinate values blank separated)) – iiif box (x, y, w, h)
- Returns:
iiif url of the box
- Return type:
str
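For orientation, the sketch below assembles a URL following the IIIF Image API 2.x pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. It is not the library's implementation: the name `iiif_url_sketch`, the placeholder base URL, the fixed `full/0/default.jpg` suffix, and the reading of `pct` as a switch to the `pct:` region form are all assumptions.

```python
def iiif_url_sketch(
    page_id: str,
    box: str,
    base: str = "https://example.org/iiif",  # placeholder, not a real endpoint
    pct: bool = False,
) -> str:
    """Build an IIIF Image API style URL for the region given by `box`."""
    coords = ",".join(box.split())  # "x y w h" -> "x,y,w,h"
    # Assumption: pct=True means the box is expressed in percentages.
    region = f"pct:{coords}" if pct else coords
    return f"{base}/{page_id}/{region}/full/0/default.jpg"


if __name__ == "__main__":
    print(iiif_url_sketch("EXP-1930-06-10-a-p0001", "10 20 100 200"))
```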
- impresso_commons.images.olive_boxes.get_scale_factor(issue_dir_path, archive, page_xml, box_strategy, img_source_name)
Returns the scale factor in Olive context, given a strategy to choose the source image.
- Parameters:
issue_dir_path (str) – the path of the issue
archive (zipfile.ZipFile) – the zip archive
page_xml (bytes) – the xml handler of the page
box_strategy (str) – the box strategy such as found in the info.txt from jp2 folder
img_source_name – the source image name, as found in the info.txt from jp2 folder
- Returns:
the hopefully correct scale factor
- Return type:
float
Background information
Impresso converts library images to JP2, taking the best image available: tif > highest png > jpg. Olive box coordinates were computed against one source image, which we have to identify among several candidates. Image format coverage differs from issue to issue, so a strategy is needed for each case.
Case 1: tif
The tif is present and is the file from which the jp2 was converted. Dest: the tif dimensions can therefore be used as the jp2 dimensions, so there is no need to read the jp2 file. Source: the source image dimensions are normally present in the page.xml.
Case 2: several png
In this case the jp2 was produced from the png with the highest dimensions. Dest: it appears that when several png files exist, Olive also used the highest one for the OCR. It is therefore possible to rely on the resolution indicated in the page xml, which should be the same as that of our jp2. N.B.: the page width and height indicated in the xml usually do not correspond to the highest-resolution png (there is therefore a discrepancy in the Olive file between the tag ‘images_resolution’ on the one hand and ‘page_width|height’ on the other). It seems we can ignore this and rely on the resolution only in this case. Source: the highest png. Here the source and dest dimensions are equal, so the function returns 1.
Case 3: one png only
Whether this case actually occurs remains to be checked. In this case there is no choice: Olive OCR and JP2 acquisition should be from the same source, hence a scale factor of 1. Here we do an additional check to see whether the page_width|height are the same as the image dimensions. The only danger is that Olive used another image file and did not provide it.
Case 4: one jpg only
Same as Case 3: scale factor of 1. Here we do an additional check to see whether the page_width|height are the same as the image dimensions (there is only one image, so things should fit, unlike in Case 2).
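The four cases above follow the priority tif > highest png > jpg. The sketch below shows one way to map the image files available for a page to a strategy name. It is an illustration under assumptions: the function `pick_strategy_sketch` is hypothetical, it decides from file extensions alone, and the use of `jpg_highest` for several jpg files is inferred from the enum member name rather than from the cases described here.

```python
from pathlib import PurePosixPath


def pick_strategy_sketch(image_names: list[str]) -> str:
    """Return a strategy name for the given image file names.

    Priority (from the background notes): tif > png > jpg.
    """
    exts = [PurePosixPath(name).suffix.lower() for name in image_names]
    if ".tif" in exts:
        return "tif"            # Case 1
    n_png = exts.count(".png")
    if n_png > 1:
        return "png_highest"    # Case 2
    if n_png == 1:
        return "png_uniq"       # Case 3
    n_jpg = exts.count(".jpg")
    if n_jpg > 1:
        return "jpg_highest"    # assumption: not described as a case above
    if n_jpg == 1:
        return "jpg_uniq"       # Case 4
    raise ValueError("no supported image found")


if __name__ == "__main__":
    print(pick_strategy_sketch(["Pg001_72.png", "Pg001_144.png"]))
```

In Cases 2 to 4 the source and destination images are expected to coincide, so a caller would use a scale factor of 1 for those strategies and compute a ratio only for the tif case.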
- impresso_commons.images.olive_boxes.test()
A testing function, to be moved into a proper unit test.