hazimp.context¶
The purpose of this module is to provide objects to process a series of jobs in a sequential order. The order is determined by the queue of jobs.
Module Contents¶
Classes¶
- Context — Context is a singleton storing all of the run specific data, such as the exposure features and their attributes.
Functions¶
- save_csv — Save a dictionary of arrays as a csv file.
- save_csv_agg — Save a pandas.DataFrame as a csv file.
Attributes¶
- hazimp.context.LOGGER¶
- hazimp.context.DATEFMT = %Y-%m-%d %H:%M:%S %Z¶
- hazimp.context.EX_LAT = exposure_latitude¶
- hazimp.context.EX_LONG = exposure_longitude¶
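DATEFMT is a strftime-style format string; a minimal sketch of formatting a timestamp with it (the link to the provstarttime attribute is an assumption, not confirmed by this page):

```python
from datetime import datetime, timezone

# DATEFMT as defined above; using it for provstarttime is an assumption
DATEFMT = "%Y-%m-%d %H:%M:%S %Z"

# Format a fixed UTC datetime with the module's date format
stamp = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc).strftime(DATEFMT)
print(stamp)  # "2024-01-01 12:00:00 UTC"
```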
- class hazimp.context.Context¶
Bases: object
Context is a singleton storing all of the run specific data, such as the exposure features and their attributes, vulnerability sets, aggregations, pivot tables, provenance, etc.
- Variables:
exposure_lat – Latitude values of the exposure data
exposure_long – Longitude values of the exposure data
exposure_att – A pandas.DataFrame to hold the exposure attributes
exposure_agg – A pandas.DataFrame.groupby object that holds aggregated exposure data after executing save_exposure_aggregation()
exposure_vuln_curves – A dict of hazimp.jobs.RealisedVulnerabilityCurves
vulnerability_sets – A dict of the available vulnerability sets
vul_function_titles – A dictionary with keys being vulnerability_set_ids and values being the exposure attribute whose values are vulnerability function IDs.
pivot – A pandas.DataFrame for an Excel-style pivot table (e.g. for tabulation of results)
prov – A prov.ProvDocument for provenance information
provlabel (str) – Qualified label for the provenance information
provtitle (str) – Descriptive title for provenance information
provstarttime (str) – Formatted datetime representing the start of the analysis.
- set_prov_label(self, label, title='HazImp analysis')¶
Set the qualified label for the provenance data
- Parameters:
label – the qualified label name. This is used to reference the analysis activity in other functions and methods.
title – Optional value for the dcterms:title element
- get_site_shape(self)¶
Get the numpy.shape of sites the context is storing. It is based on the shape of exposure_long.
- Returns:
The numpy.shape of sites the context is storing.
- clip_exposure(self, min_long, min_lat, max_long, max_lat)¶
Clip the exposure data so only the exposure values within the rectangle formed by max_lat, min_lat, max_long and min_long are included.
Note: This must be called before the exposure_vuln_curves are determined, since the curves have a site dimension.
- Parameters:
min_long (float) – minimum longitude of exposure data to use
min_lat (float) – minimum latitude of exposure data to use
max_long (float) – maximum longitude of exposure data to use
max_lat (float) – maximum latitude of exposure data to use
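The clipping described above amounts to a rectangular boolean mask over the longitude and latitude columns. A minimal sketch of that selection on a toy exposure frame (the column names and data are illustrative, not HazImp's actual internals):

```python
import pandas as pd

# Stand-in for Context.exposure_att: three sites, one outside the box
exposure_att = pd.DataFrame({
    "exposure_latitude": [-23.5, -24.1, -30.0],
    "exposure_longitude": [150.1, 150.8, 155.0],
})

min_long, min_lat, max_long, max_lat = 149.0, -25.0, 152.0, -23.0

# Keep only sites inside the rectangle (inclusive bounds assumed)
mask = (
    exposure_att["exposure_longitude"].between(min_long, max_long)
    & exposure_att["exposure_latitude"].between(min_lat, max_lat)
)
clipped = exposure_att[mask]
print(len(clipped))  # 2 sites fall inside the rectangle
```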
- save_exposure_atts(self, filename, use_parallel=True)¶
Save the exposure attributes, including latitude and longitude. The file type saved is based on the filename extension.
- Parameters:
use_parallel – Set to True for parallel behaviour, in which only node 0 writes to file.
filename – The file to be written. If the extension is ‘.npz’, the arrays are saved to an uncompressed numpy format file.
- Return write_dict:
The whole dictionary, returned for testing.
- save_exposure_aggregation(self, filename, use_parallel=True)¶
Save the aggregated exposure attributes. The file type saved is based on the filename extension.
- Parameters:
use_parallel – Set to True for parallel behaviour, in which only node 0 writes to file.
filename – The file to be written. If the extension is ‘.npz’, the arrays are saved to an uncompressed numpy format file.
- Return write_dict:
The whole pandas.DataFrame, returned for testing.
- save_aggregation(self, filename, boundaries, impactcode, boundarycode, categories, fields, categorise, use_parallel=True)¶
Save data aggregated to geospatial regions.
- Parameters:
filename (str) – Destination filename
boundaries (str) – File name of a geospatial dataset that contains geographical boundaries to serve as aggregation boundaries
impactcode (str) – Field name in the dframe to aggregate by
boundarycode (str) – Corresponding field name in the geospatial dataset.
categories (bool) – Add columns for the number of buildings in each damage state defined in the ‘Damage state’ attribute. This requires that a ‘categorise’ job has been included in the pipeline, which in turn requires the bins and labels to be defined in the job configuration.
fields (dict) – A dict with keys of valid column names (from the pandas.DataFrame) and values being lists of aggregation functions to apply to the columns.
categorise (dict) – categorise job attributes
use_parallel (bool) – True for parallel behaviour, in which only node 0 writes to file
- aggregate_loss(self, groupby=None, kwargs=None)¶
Aggregate data by the groupby attribute, using the kwargs to perform any arithmetic aggregation on fields (e.g. summation, mean, etc.)
- Parameters:
groupby – A column in the DataFrame that corresponds to regions by which to aggregate data
kwargs – A dict with keys of valid column names (from the DataFrame) and values being lists of aggregation functions to apply to the columns.
For example:
kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'], 'structural': ['mean', 'std']}
See https://tinyurl.com/54rbacwm for more guidance on using aggregation with pandas.DataFrames.
>>> groupby = 'MB_CODE11'
>>> kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'], 'structural': ['mean', 'std']}
>>> context.aggregate_loss(groupby, kwargs)
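The aggregation above can be sketched with plain pandas, which is what a groupby/agg call with that kwargs dict produces (the toy frame and values are illustrative, not HazImp data):

```python
import pandas as pd

# Toy exposure frame with a region code and two loss-related columns
df = pd.DataFrame({
    "MB_CODE11": ["A", "A", "B"],
    "REPLACEMENT_VALUE": [100.0, 300.0, 50.0],
    "structural": [0.1, 0.3, 0.5],
})

# The same kwargs shape as in the example above
kwargs = {"REPLACEMENT_VALUE": ["mean", "sum"], "structural": ["mean", "std"]}
agg = df.groupby("MB_CODE11").agg(kwargs)

# The result has hierarchical (column, function) columns
print(agg.loc["A", ("REPLACEMENT_VALUE", "sum")])  # 400.0
```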
- categorise(self, bins, labels, field_name)¶
Bin values into discrete intervals.
- Parameters:
bins (list) – Monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
labels – Specifies the labels for the returned bins. Must be the same length as the resulting bins.
field_name (str) – Name of the new column in the exposure_att pandas.DataFrame
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html for more details
>>> bins = [0.0, 0.02, 0.1, 0.2, 0.5, 1.0]
>>> labels = ['Negligible', 'Slight', 'Moderate', 'Extensive', 'Complete']
>>> field_name = 'Damage state'
>>> context.categorise(bins, labels, field_name)
>>> context.exposure_att.head()
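Since the method refers to pandas.cut, the binning can be previewed with pandas directly. A minimal sketch using the bins and labels from the example above (the loss values are made up):

```python
import pandas as pd

# Bin edges and labels from the categorise example above
bins = [0.0, 0.02, 0.1, 0.2, 0.5, 1.0]
labels = ["Negligible", "Slight", "Moderate", "Extensive", "Complete"]

# Illustrative loss ratios; intervals are right-inclusive by default
losses = pd.Series([0.01, 0.15, 0.7])
states = pd.cut(losses, bins, labels=labels)
print(list(states))  # ['Negligible', 'Moderate', 'Complete']
```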
- tabulate(self, file_name, index=None, columns=None, aggfunc=None)¶
Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame, then writes to an Excel file. This function does not support data aggregation - multiple values will result in a MultiIndex in the columns.
See https://tinyurl.com/6x535u5t for further details.
- Parameters:
file_name – destination for the pivot table
index – Column or list of columns. Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.
columns – Column or list of columns. Keys to group by on the pivot table columns. If an array is passed, it is used in the same manner as column values.
aggfunc – Function, list of functions or dict, default numpy.mean. If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If a dict is passed, the key is the column to aggregate and the value is a function or list of functions.
Example:
Include the following in the configuration file:
- tabulate:
    file_name: wind_impact_table.xlsx
    index: MESHBLOCK_CODE_2011
    columns: Damage state
    aggfunc: size
This will produce a file called “wind_impact_table.xlsx”, with the count of buildings in each “Damage state”, grouped by the index field MESHBLOCK_CODE_2011.
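The table that configuration produces can be sketched with pandas, assuming tabulate wraps pandas.DataFrame.pivot_table (the frame and codes below are illustrative):

```python
import pandas as pd

# Toy exposure data: one row per building
df = pd.DataFrame({
    "MESHBLOCK_CODE_2011": [1, 1, 2, 2, 2],
    "Damage state": ["Slight", "Slight", "Moderate", "Slight", "Moderate"],
})

# aggfunc="size" counts buildings per (index, column) cell
table = df.pivot_table(index="MESHBLOCK_CODE_2011",
                       columns="Damage state",
                       aggfunc="size", fill_value=0)
print(table)
```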
- hazimp.context.save_csv(write_dict, filename)¶
Save a dictionary of arrays as a csv file. The first dimension of each array is assumed to have the same length for all arrays. In the csv file the keys become column titles and the arrays become values.
If an array has more than one dimension, the other dimensions are averaged to get a 1d array.
- Parameters:
write_dict (dict) – The data to write as a csv file.
filename – The csv file will be written here.
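The documented behaviour (higher-dimensional arrays averaged down to 1d before writing) can be sketched as follows; this is an illustration of the contract, not the actual HazImp implementation, and the averaging axis is an assumption:

```python
import io
import numpy as np
import pandas as pd

# Toy write_dict: one 1-D array and one 2-D array (assumed averaged
# over the non-leading axes, one value per row)
write_dict = {
    "site_id": np.array([1, 2, 3]),
    "loss": np.array([[0.1, 0.3], [0.2, 0.4], [0.0, 1.0]]),
}
flat = {k: (v.reshape(v.shape[0], -1).mean(axis=1) if v.ndim > 1 else v)
        for k, v in write_dict.items()}

# Keys become column titles, arrays become values
buf = io.StringIO()
pd.DataFrame(flat).to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # "site_id,loss"
```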
- hazimp.context.save_csv_agg(write_dict, filename)¶
Save a pandas.DataFrame as a csv file. The first dimension of each array is assumed to have the same length for all arrays. In the csv file the keys become column titles and the arrays become values.
If an array has more than one dimension, the other dimensions are averaged to get a 1d array.
- Parameters:
write_dict (dict) – The data to write as a csv file.
filename – The csv file will be written here.