hazimp.context

The purpose of this module is to provide objects to process a series of jobs in sequential order. The order is determined by the queue of jobs.

Module Contents

Classes

Context

Context is a singleton storing all of the run-specific data, such as the exposure features and their attributes.

Functions

save_csv(write_dict, filename)

Save a dictionary of arrays as a csv file.

save_csv_agg(write_dict, filename)

Save a pandas.DataFrame as a csv file.

Attributes

LOGGER

DATEFMT

EX_LAT

EX_LONG

hazimp.context.LOGGER
hazimp.context.DATEFMT = %Y-%m-%d %H:%M:%S %Z
hazimp.context.EX_LAT = exposure_latitude
hazimp.context.EX_LONG = exposure_longitude
class hazimp.context.Context

Bases: object

Context is a singleton storing all of the run-specific data, such as the exposure features and their attributes, vulnerability sets, aggregations, pivot tables, provenance, etc.

Variables:
  • exposure_lat – Latitude values of the exposure data

  • exposure_long – Longitude values of the exposure data

  • exposure_att – A pandas.DataFrame to hold the exposure attributes

  • exposure_agg – A pandas.DataFrame.groupby object that holds aggregated exposure data after executing save_exposure_aggregation()

  • exposure_vuln_curves – A dict of hazimp.jobs.RealisedVulnerabilityCurves

  • vulnerability_sets – A dict of the available vulnerability sets

  • vul_function_titles – A dictionary with keys being vulnerability_set_ids and values being the exposure attribute whose values are vulnerability function IDs.

  • pivot (pandas.DataFrame) – An Excel-style pivot table (e.g. for tabulation of results)

  • prov (prov.ProvDocument) – Provenance information for the analysis.

  • provlabel (str) – Qualified label for the provenance information

  • provtitle (str) – Descriptive title for provenance information

  • provstarttime (str) – Formatted datetime representing the start of the analysis.

set_prov_label(self, label, title='HazImp analysis')

Set the qualified label for the provenance data

Parameters:
  • label – the qualified label name. This is used to reference the analysis activity in other functions and methods.

  • title – Optional value for the dcterms:title element
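
For example, a minimal sketch assuming a Context instance named context and a hypothetical label:

>>> context.set_prov_label('prov:WindImpact2021', title='Wind impact analysis')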

get_site_shape(self)

Get the numpy.shape of the sites the context is storing, based on the shape of exposure_long.

Returns:

The numpy.shape of sites the context is storing.
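
For example (a minimal sketch assuming a Context instance named context; since exposure_long is one-dimensional, the first element is the number of sites):

>>> shape = context.get_site_shape()
>>> n_sites = shape[0]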

clip_exposure(self, min_long, min_lat, max_long, max_lat)

Clip the exposure data so only the exposure values within the rectangle formed by max_lat, min_lat, max_long and min_long are included.

Note: This must be called before the exposure_vuln_curves are determined, since the curves have a site dimension.

Parameters:
  • min_long (float) – minimum longitude of exposure data to use

  • min_lat (float) – minimum latitude of exposure data to use

  • max_long (float) – maximum longitude of exposure data to use

  • max_lat (float) – maximum latitude of exposure data to use
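
For example, a minimal sketch using a hypothetical bounding box (assuming a Context instance named context):

>>> context.clip_exposure(min_long=145.0, min_lat=-20.0, max_long=147.0, max_lat=-18.0)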

save_exposure_atts(self, filename, use_parallel=True)

Save the exposure attributes, including latitude and longitude. The file type saved is based on the filename extension.

Parameters:
  • use_parallel – Set to True for parallel behaviour, in which only node 0 writes to file.

  • filename – The file to be written. If the extension is ‘.npz’, the arrays are saved to an uncompressed numpy format file.

Return write_dict:

The whole dictionary, returned for testing.
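
For example (a minimal sketch; the output filename is hypothetical):

>>> write_dict = context.save_exposure_atts('exposure_output.csv', use_parallel=False)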

save_exposure_aggregation(self, filename, use_parallel=True)

Save the aggregated exposure attributes. The file type saved is based on the filename extension.

Parameters:
  • use_parallel – Set to True for parallel behaviour, in which only node 0 writes to file.

  • filename – The file to be written. If the extension is ‘.npz’, the arrays are saved to an uncompressed numpy format file.

Return write_dict:

The whole pandas.DataFrame, returned for testing.
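
For example, a minimal sketch assuming aggregate_loss() has already been called so that exposure_agg is populated (filename and field names are hypothetical):

>>> context.aggregate_loss(groupby='MB_CODE11', kwargs={'structural': ['mean']})
>>> context.save_exposure_aggregation('aggregated_loss.csv', use_parallel=False)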

save_aggregation(self, filename, boundaries, impactcode, boundarycode, categories, fields, categorise, use_parallel=True)

Save data aggregated to geospatial regions.

Parameters:
  • filename (str) – Destination filename

  • boundaries (str) – File name of a geospatial dataset that contains geographical boundaries to serve as aggregation boundaries

  • impactcode (str) – Field name in the exposure DataFrame to aggregate by

  • boundarycode (str) – Corresponding field name in the geospatial dataset.

  • categories (bool) – Add columns for the number of buildings in each damage state defined in the ‘Damage state’ attribute. This requires that a ‘categorise’ job has been included in the pipeline, which in turn requires the bins and labels to be defined in the job configuration.

  • fields (dict) – A dict with keys of valid column names (from the pandas.DataFrame) and values being lists of aggregation functions to apply to the columns.

  • categorise (dict) – categorise job attributes

  • use_parallel (bool) – True for parallel behaviour, which is only node 0 writing to file
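
For example, a minimal sketch with hypothetical file, field and boundary names; no damage-state categories are added here, so an empty categorise dict is passed:

>>> fields = {'structural': ['mean']}
>>> context.save_aggregation('impact_by_region.json',
...                          boundaries='regions.geojson',
...                          impactcode='SA1_CODE',
...                          boundarycode='SA1_CODE21',
...                          categories=False,
...                          fields=fields,
...                          categorise={},
...                          use_parallel=False)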

aggregate_loss(self, groupby=None, kwargs=None)

Aggregate data by the groupby attribute, using the kwargs to perform any arithmetic aggregation on fields (e.g. summation, mean, etc.)

Parameters:
  • groupby – A column in the DataFrame that corresponds to regions by which to aggregate data

  • kwargs – A dict with keys of valid column names (from the DataFrame) and values being lists of aggregation functions to apply to the columns.

For example:

kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'],
          'structural': ['mean', 'std']}

See https://tinyurl.com/54rbacwm for more guidance on using aggregation with pd.DataFrames.

>>> groupby = 'MB_CODE11'
>>> kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'],
...           'structural': ['mean', 'std']}
>>> context.aggregate_loss(groupby, kwargs)

categorise(self, bins, labels, field_name)

Bin values into discrete intervals.

Parameters:
  • bins (list) – Monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.

  • labels – Specifies the labels for the returned bins. Must be the same length as the resulting bins.

  • field_name (str) – Name of the new column in the exposure_att pandas.DataFrame

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html for more details

>>> bins = [0.0, 0.02, 0.1, 0.2, 0.5, 1.0]
>>> labels = ['Negligible', 'Slight', 'Moderate', 'Extensive',
...           'Complete']
>>> field_name = 'Damage state'
>>> context.categorise(bins, labels, field_name)
>>> context.exposure_att.head()

tabulate(self, file_name, index=None, columns=None, aggfunc=None)

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame, then writes to an Excel file. This function does not support data aggregation - multiple values will result in a MultiIndex in the columns.

See https://tinyurl.com/6x535u5t for further details.

Parameters:
  • file_name – destination for the pivot table

  • index – column or list of columns. Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.

  • columns – column or list of columns. Keys to group by on the pivot table columns. If an array is passed, it is used in the same manner as column values.

  • aggfunc – function, list of functions, dict, default numpy.mean. If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If dict is passed, the key is column to aggregate and value is function or list of functions.

Example:

Include the following in the configuration file:

- tabulate:
   file_name: wind_impact_table.xlsx
   index: MESHBLOCK_CODE_2011
   columns: Damage state
   aggfunc: size

This will produce a file called “wind_impact_table.xlsx”, with the count of buildings in each “Damage state”, grouped by the index field MESHBLOCK_CODE_2011.
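
The equivalent direct call (a minimal sketch assuming a Context instance named context):

>>> context.tabulate('wind_impact_table.xlsx',
...                  index='MESHBLOCK_CODE_2011',
...                  columns='Damage state',
...                  aggfunc='size')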

hazimp.context.save_csv(write_dict, filename)

Save a dictionary of arrays as a csv file. The first dimension is assumed to have the same length for all arrays. In the csv file the keys become column titles and the arrays become values.

If an array has more than one dimension, the other dimensions are averaged to produce a 1d array.

Parameters:
  • write_dict (dict) – The dictionary of arrays to write as a csv file.

  • filename – The csv file will be written here.
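
For example, a minimal sketch with hypothetical data and filename:

>>> import numpy as np
>>> from hazimp.context import save_csv
>>> write_dict = {'latitude': np.array([-19.5, -19.6]),
...               'structural': np.array([0.10, 0.25])}
>>> save_csv(write_dict, 'impact.csv')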

hazimp.context.save_csv_agg(write_dict, filename)

Save a pandas.DataFrame as a csv file. The first dimension is assumed to have the same length for all arrays. In the csv file the keys become column titles and the arrays become values.

If an array has more than one dimension, the other dimensions are averaged to produce a 1d array.

Parameters:
  • write_dict (pandas.DataFrame) – The DataFrame to write as a csv file.

  • filename – The csv file will be written here.
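
For example, a minimal sketch with a hypothetical aggregated DataFrame and filename:

>>> import pandas as pd
>>> from hazimp.context import save_csv_agg
>>> agg = pd.DataFrame({'structural_mean': [0.10, 0.25]},
...                    index=['region_a', 'region_b'])
>>> save_csv_agg(agg, 'aggregated_impact.csv')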