h3.dataprocessing package#

Submodules#

h3.dataprocessing.DataAugmentation module#

h3.dataprocessing.crop_images module#

h3.dataprocessing.crop_images.crop_images(img, polygon_df, zoom_level: int, pixel_num: int, im_size: int, output_path: str)[source]#

Crops imagery based on the building polygon and the desired pixel size. It can zoom in to certain levels, whilst maintaining the pixel size by bilinear interpolation. The building will be centered in the image and all buildings that are cut off by the image boundary are not included.

Parameters:
  • img (dataset object) – The pre-event imagery that underlays the building polygon. This image will be cropped.

  • polygon_df (object) – The building polygon that needs to be cropped around.

  • zoom_level (int) – A number that dictates how large the crop_size should be, based on the pixel number required by the model.

  • pixel_num (int) – Side length, in pixels, of the square cropped image used as input for the model.

  • im_size (int) – Side length, in pixels, of the square input image.

  • output_path (str) – Path used to save the zoomed cropped images.

Returns:

Returns whether this building polygon is red flagged and should be ignored, based on whether the building polygon lies on the edge of the image.

Return type:

int
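The zoom-and-crop behaviour described above can be sketched in outline. This is a simplified, hypothetical stand-in, assuming a single-band image: `center_crop_and_resize` and its parameters are illustrative, and the hand-rolled bilinear resampling stands in for whatever resampling call the module actually uses.

```python
import numpy as np

def center_crop_and_resize(img, cy, cx, crop_size, pixel_num):
    """Center-crop a crop_size x crop_size window around pixel (cy, cx)
    and resample it to pixel_num x pixel_num with bilinear interpolation.
    Returns None (the 'red flag' case) if the crop window falls outside
    the image, mirroring the edge-of-image exclusion described above."""
    half = crop_size // 2
    top, left = cy - half, cx - half
    if (top < 0 or left < 0
            or top + crop_size > img.shape[0]
            or left + crop_size > img.shape[1]):
        return None
    crop = img[top:top + crop_size, left:left + crop_size]
    # Bilinear resample to the model's input size.
    ys = np.linspace(0, crop_size - 1, pixel_num)
    xs = np.linspace(0, crop_size - 1, pixel_num)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, crop_size - 1); x1 = np.minimum(x0 + 1, crop_size - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    return (crop[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
            + crop[np.ix_(y0, x1)] * (1 - wy) * wx
            + crop[np.ix_(y1, x0)] * wy * (1 - wx)
            + crop[np.ix_(y1, x1)] * wy * wx)
```

A larger `crop_size` (from a higher zoom level) is always resampled down or up to the same `pixel_num`, which is how the pixel size stays fixed across zoom levels.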

h3.dataprocessing.crop_images.extract_coords(row)[source]#
h3.dataprocessing.crop_images.image_processing(zoom_levels: list, pixel_num: int)[source]#

Loads images and crops them based on the required zoom levels and the required imagery input pixel size for the model.

Parameters:
  • zoom_levels (list) – list containing all required zoom levels as integers.

  • pixel_num (int) – Side length, in pixels, of the square cropped image used as input for the model.

h3.dataprocessing.crop_images.main()[source]#
h3.dataprocessing.crop_images.mask_to_bb(Y)[source]#

Takes a filled building mask and converts it into a rectangular bounding box.

Parameters:

Y (numpy array) – The filled polygon mask from the polygon_mask function.

Returns:

Array of the rectangular bounding box as a mask.

Return type:

numpy array
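The mask-to-bounding-box conversion can be sketched as follows; `mask_to_bb_sketch` is an illustrative stand-in for the function above, not its actual implementation.

```python
import numpy as np

def mask_to_bb_sketch(Y):
    """Turn a filled building mask into a rectangular bounding-box mask
    covering the same extent: find the occupied row/column range and
    fill that rectangle."""
    rows = np.any(Y, axis=1)
    cols = np.any(Y, axis=0)
    bb = np.zeros_like(Y)
    if not rows.any():                 # empty mask -> empty bounding box
        return bb
    y0, y1 = np.where(rows)[0][[0, -1]]
    x0, x1 = np.where(cols)[0][[0, -1]]
    bb[y0:y1 + 1, x0:x1 + 1] = 1
    return bb
```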

h3.dataprocessing.crop_images.polygon_mask(img, polygon, im_size: int)[source]#

Fills up a polygon to create a mask of the building.

Parameters:
  • img (dataset object) – pre-event image file aligning with the building polygon.

  • polygon (object) – polygon containing the outline of the building.

  • im_size (int) – Side length, in pixels, of the square input image.

Returns:

filled mask of the building.

Return type:

numpy array
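The polygon-filling step can be sketched with an even-odd (ray-casting) test. This is an assumption-laden stand-in: the real function works from a rasterio dataset and polygon object, while `polygon_mask_sketch` below takes plain (x, y) vertex tuples.

```python
import numpy as np

def polygon_mask_sketch(polygon_xy, im_size):
    """Rasterise a polygon, given as (x, y) vertex tuples, into a filled
    im_size x im_size mask by toggling each pixel centre once per polygon
    edge its horizontal ray crosses (even-odd rule)."""
    xs, ys = np.meshgrid(np.arange(im_size) + 0.5, np.arange(im_size) + 0.5)
    inside = np.zeros((im_size, im_size), dtype=bool)
    n = len(polygon_xy)
    with np.errstate(divide="ignore", invalid="ignore"):
        for i in range(n):
            x1, y1 = polygon_xy[i]
            x2, y2 = polygon_xy[(i + 1) % n]
            # Toggle pixels whose horizontal ray crosses this edge.
            crosses = ((y1 > ys) != (y2 > ys)) & (
                xs < (x2 - x1) * (ys - y1) / (y2 - y1) + x1)
            inside ^= crosses
    return inside.astype(np.uint8)
```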

h3.dataprocessing.crop_images_img module#

h3.dataprocessing.crop_images_img.crop_images(image_array, img_metadata, polygon_df, zoom_level: int, pixel_num: int, im_size: int, output_path: str)[source]#

Crops imagery based on the building polygon and the desired pixel size. It can zoom in to certain levels, whilst maintaining the pixel size by bilinear interpolation. The building will be centered in the image and all buildings that are cut off by the image boundary are not included.

Parameters:
  • image_array (ndarray) – The pre-event image array given from rasterio.DatasetReader.read() that underlays the building polygon. This image will be cropped.

  • img_metadata (dict) – The dictionary of the image metadata. See rasterio.DatasetReader.meta.

  • polygon_df (object) – The building polygon that needs to be cropped around.

  • zoom_level (int) – A number that dictates how large the crop_size should be, based on the pixel number required by the model.

  • pixel_num (int) – Side length, in pixels, of the square cropped image used as input for the model.

  • im_size (int) – Side length, in pixels, of the square input image.

  • output_path (str) – Path used to save the zoomed cropped images.

Returns:

Returns whether this building polygon is red flagged and should be ignored, based on whether the building polygon lies on the edge of the image.

Return type:

int

h3.dataprocessing.crop_images_img.extract_coords(row)[source]#
h3.dataprocessing.crop_images_img.image_processing(zoom_levels: list, pixel_num: int)[source]#

Loads images and crops them based on the required zoom levels and the required imagery input pixel size for the model.

Parameters:
  • zoom_levels (list) – list containing all required zoom levels as integers.

  • pixel_num (int) – Side length, in pixels, of the square cropped image used as input for the model.

h3.dataprocessing.crop_images_img.main()[source]#
h3.dataprocessing.crop_images_img.mask_to_bb(Y)[source]#

Takes a filled building mask and converts it into a rectangular bounding box.

Parameters:

Y (numpy array) – The filled polygon mask from the polygon_mask function.

Returns:

Array of the rectangular bounding box as a mask.

Return type:

numpy array

h3.dataprocessing.crop_images_img.polygon_mask(img_array, polygon, im_size: int)[source]#

Fills up a polygon to create a mask of the building.

Parameters:
  • img_array (ndarray) – pre-event image array aligning with the building polygon.

  • polygon (object) – polygon containing the outline of the building.

  • im_size (int) – Side length, in pixels, of the square input image.

Returns:

filled mask of the building.

Return type:

numpy array

h3.dataprocessing.crop_images_img.save_image(output_path, resized_img, img_metadata)[source]#

h3.dataprocessing.extract_metadata module#

h3.dataprocessing.extract_metadata.extract_damage_allfiles_ensemble(filepaths_dict: dict, crs: str)[source]#

Filters all pre and post label files for hurricanes, extracts the metadata from the post and pre json files. Takes damage information from post and adds that to the pre-event metadata dataframe.

Parameters:
  • filepaths_dict (dict) – .json files in the xBD data folder to filter, organised by their folder. These files are values keyed by the holdout, tier1, tier3 and test folders.

  • crs (str) – coordinate reference system to put as geometry in the geodataframe.

Returns:

geodataframes with a summary of metadata for all pre-event hurricane events with post-event labels.

Return type:

geodataframe
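The ensemble idea of carrying post-event damage onto pre-event records can be sketched without geopandas. The `building_id` key used for matching here is a simplifying assumption purely for illustration; the module matches pre- and post-event records by polygon overlap instead (see overlapping_polygons below).

```python
def attach_post_damage(pre_records, post_records):
    """Copy the damage class from each post-event record onto the
    matching pre-event record; records with no post-event match keep
    damage_class=None."""
    damage_by_id = {r["building_id"]: r["damage_class"] for r in post_records}
    return [dict(r, damage_class=damage_by_id.get(r["building_id"]))
            for r in pre_records]
```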

h3.dataprocessing.extract_metadata.extract_damage_allfiles_separate(filepaths_dict: dict, crs: str, event: Literal['pre', 'post'])[source]#

Filters all label files for hurricanes, extracts the metadata, concatenates all files for post and pre images separately.

Parameters:
  • filepaths_dict (dict) – .json files in the xBD data folder to filter, organised by their folder. These files are values keyed by the holdout, tier1, tier3 and test folders.

  • crs (str) – coordinate reference system to put as geometry in geodataframe.

  • event (str) – post or pre event json files to filter out.

Returns:

two geodataframes with a summary of metadata for all hurricane events with labels.

Return type:

geodataframe

h3.dataprocessing.extract_metadata.extract_metadata(json_link: str, CLASSES_DICT: dict, crs: str, event_type: str)[source]#

Extracts location in xy and long-lat format, gives damage name, class and date.

Parameters:
  • json_link (path) – path to json file containing location and metadata.

  • CLASSES_DICT (dict) – dictionary mapping between damage classes (str) and damage numbers (int).

  • crs (str) – coordinate reference system to put as geometry in geodataframe.

  • event_type (str) – post or pre event json files to filter out.

Returns:

contains polygons of json file, corresponding metadata.

Return type:

Geodataframe

h3.dataprocessing.extract_metadata.extract_point(building)[source]#

Extract coordinate information from polygon and convert to a centroid point.

Parameters:

building (object) – polygon information in shapely coordinates.

Returns:

centroid point of polygon.

Return type:

object
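The polygon-to-centroid step can be sketched in pure Python with the shoelace formula; the module itself presumably relies on shapely's centroid, so `polygon_centroid` here is an illustrative stand-in.

```python
def polygon_centroid(vertices):
    """Centroid of a simple polygon given as (x, y) vertex tuples,
    computed via the shoelace (signed-area) formula."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1   # twice the signed triangle area
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)
```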

h3.dataprocessing.extract_metadata.extract_polygon(building)[source]#

Extract polygon coordinate reference system information.

Parameters:

building (object) – polygon shapely coordinates.

Returns:

polygon with spatial coordinate information.

Return type:

object

h3.dataprocessing.extract_metadata.filter_files(files: list, filepath: str, search_criteria: str) list[source]#

Filters all json label files and returns a list of post-event files for hurricanes.

Parameters:
  • files (list) – list of json files in the label directory

  • filepath (str) – path to file, assisting in search criteria process

  • search_criteria (str) – glob-style pattern used to filter hurricane json files, e.g. hurricane*pre*json for pre-event labels; supports the glob wildcard *.

Returns:

list of filtered files for corresponding criteria.

Return type:

list
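The glob-style filtering can be sketched with the standard library's `fnmatch`; `filter_files_sketch` is an illustrative stand-in for the function above, assuming the criteria are applied to the joined path.

```python
import fnmatch
import os

def filter_files_sketch(files, filepath, search_criteria):
    """Return the files whose joined path matches the glob-style
    search criteria, e.g. 'hurricane*pre*json'."""
    pattern = os.path.join(filepath, search_criteria)
    return [f for f in files
            if fnmatch.fnmatch(os.path.join(filepath, f), pattern)]
```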

h3.dataprocessing.extract_metadata.load_and_save_df(filepaths_dict: dict, output_dir: str, reload_pickle: bool = False)[source]#

Loads the json label files for all hurricanes in the “hold” section of the xBD data, extracts the points and polygons in both xy coordinates, referring to the corresponding imagery file, and the longitude and latitude.

Parameters:
  • filepaths_dict (dict) – pathnames in a dictionary for the holdout, tier1, tier3 and test folder.

  • output_dir (str) – directory in which to save the output geodataframe.

  • reload_pickle (bool, optional) – If True recreate the pickle files as if they did not exist. The default is False.

Returns:

All metadata and locations in a geodataframe, saved in the data/datasets/EFs directory. The returned gdf uses the long-lat coordinate system with pre-event polygons and post-event damage, as this is most useful for choosing the EFs.

Return type:

Geodataframe
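The reload_pickle behaviour follows a common cache-or-rebuild pattern, which can be sketched as below. `load_cached` and `build_fn` are hypothetical names for illustration, not the module's API.

```python
import os
import pickle

def load_cached(pickle_path, build_fn, reload_pickle=False):
    """Reuse a pickled result if it exists; otherwise (or when
    reload_pickle=True) rebuild it with build_fn and re-pickle it."""
    if not reload_pickle and os.path.exists(pickle_path):
        with open(pickle_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()                 # the expensive extraction step
    with open(pickle_path, "wb") as f:
        pickle.dump(result, f)
    return result
```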

h3.dataprocessing.extract_metadata.main()[source]#
h3.dataprocessing.extract_metadata.overlapping_polygons(geoms, p)[source]#

Checks if polygons from pre- and post-event imagery overlap. If they do, the damage class from post-event dataframe can be allocated to the pre-event polygon.

Parameters:
  • geoms (series) – post-event geodataframe geometry column containing the polygon

  • p (object) – pre-event polygon extracted from geodataframe

Returns:

column in post-event dataframe containing which row number the post-event polygon matches with in the pre-event dataframe.

Return type:

series
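The overlap matching can be sketched with axis-aligned bounding boxes as a simplifying stand-in for the polygon geometry the function actually uses; `match_overlaps` and the (xmin, ymin, xmax, ymax) tuples are illustrative assumptions.

```python
def match_overlaps(post_boxes, pre_box):
    """Return the indices of post-event boxes that overlap the pre-event
    box, so the post-event damage class can be allocated to the matching
    pre-event building."""
    def overlaps(a, b):
        # Two boxes overlap iff they overlap on both axes.
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
    return [i for i, box in enumerate(post_boxes) if overlaps(box, pre_box)]
```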

h3.dataprocessing.pre_processing module#

h3.dataprocessing.pre_processing.image_loading(polygons_df, zoom_levels: list, pixel_num: int, zoomdir_dict: dict)[source]#

Loads images and crops them based on the required zoom levels and the required imagery input pixel size for the model.

Parameters:
  • polygons_df (geopandas dataframe) – Geopandas dataframe containing metadata information about the pre-event polygons, combined with the damage class from the post-event data. The reference system is “xy”, referring to the corresponding image file.

  • zoom_levels (list) – list containing all required zoom levels as integers.

  • pixel_num (int) – Side length, in pixels, of the square cropped image used as input for the model.

  • zoomdir_dict (dict) – Dictionary containing all filepaths for the zoom directories with the values of the zoom level.

h3.dataprocessing.pre_processing.main()[source]#