:py:mod:`hazimp.context`
========================

.. py:module:: hazimp.context

.. autoapi-nested-parse::

   This module provides objects that process a series of jobs sequentially.
   The order is determined by the queue of jobs.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   hazimp.context.Context


Functions
~~~~~~~~~

.. autoapisummary::

   hazimp.context.save_csv
   hazimp.context.save_csv_agg


Attributes
~~~~~~~~~~

.. autoapisummary::

   hazimp.context.LOGGER
   hazimp.context.DATEFMT
   hazimp.context.EX_LAT
   hazimp.context.EX_LONG


.. py:data:: LOGGER

.. py:data:: DATEFMT
   :annotation: = %Y-%m-%d %H:%M:%S %Z

.. py:data:: EX_LAT
   :annotation: = exposure_latitude

.. py:data:: EX_LONG
   :annotation: = exposure_longitude

.. py:class:: Context

   Bases: :py:obj:`object`

   Context is a singleton storing all of the run-specific data, such as the
   exposure features and their attributes, vulnerability sets, aggregations,
   pivot tables, provenance, etc.

   :ivar exposure_lat: Latitude values of the exposure data
   :ivar exposure_long: Longitude values of the exposure data
   :ivar exposure_att: A :class:`pandas.DataFrame` to hold the exposure
      attributes
   :ivar exposure_agg: A :class:`pandas.DataFrame.groupby` object that holds
      aggregated exposure data after executing
      :meth:`save_exposure_aggregation`
   :ivar exposure_vuln_curves: A :class:`dict` of
      :class:`hazimp.jobs.RealisedVulnerabilityCurves`
   :ivar vulnerability_sets: A :class:`dict` of the available vulnerability
      sets
   :ivar vul_function_titles: A dictionary whose keys are
      vulnerability_set_ids and whose values are the exposure attributes
      whose values are vulnerability function IDs.
   :ivar pivot: :class:`pandas.DataFrame` for an Excel-style pivot table
      (e.g. for tabulation of results)
   :ivar prov: :class:`prov.ProvDocument` for provenance information.
   :ivar str provlabel: Qualified label for the provenance information
   :ivar str provtitle: Descriptive title for provenance information
   :ivar str provstarttime: Formatted datetime representing the start of the
      analysis.

   .. py:method:: set_prov_label(self, label, title='HazImp analysis')

      Set the qualified label for the provenance data.

      :param label: The qualified label name. This is used to reference the
         analysis activity in other functions and methods.
      :param title: Optional value for the dcterms:title element

   .. py:method:: get_site_shape(self)

      Get the :class:`numpy.shape` of sites the context is storing.
      It is based on the shape of :data:`exposure_long`.

      :return: The :class:`numpy.shape` of sites the context is storing.

   .. py:method:: clip_exposure(self, min_long, min_lat, max_long, max_lat)

      Clip the exposure data so that only the exposure values within the
      rectangle formed by max_lat, min_lat, max_long and min_long are
      included.

      Note: This must be called before the exposure_vuln_curves are
      determined, since the curves have a site dimension.

      :param float min_long: minimum longitude of exposure data to use
      :param float min_lat: minimum latitude of exposure data to use
      :param float max_long: maximum longitude of exposure data to use
      :param float max_lat: maximum latitude of exposure data to use

   .. py:method:: save_exposure_atts(self, filename, use_parallel=True)

      Save the exposure attributes, including latitude and longitude.
      The file type saved is based on the filename extension.

      :param use_parallel: Set to True for parallel behaviour, in which only
         node 0 writes to file.
      :param filename: The file to be written. If the extension is '.npz',
         then the arrays are saved to an uncompressed numpy format file.
      :return write_dict: The whole dictionary, returned for testing.

   .. py:method:: save_exposure_aggregation(self, filename, use_parallel=True)

      Save the aggregated exposure attributes.
      The file type saved is based on the filename extension.

      :param use_parallel: Set to True for parallel behaviour, in which only
         node 0 writes to file.
      :param filename: The file to be written. If the extension is '.npz',
         then the arrays are saved to an uncompressed numpy format file.
      :return write_dict: The whole :class:`pandas.DataFrame`, returned for
         testing.

   .. py:method:: save_aggregation(self, filename, boundaries, impactcode, boundarycode, categories, fields, categorise, use_parallel=True)

      Save data aggregated to geospatial regions.

      :param str filename: Destination filename
      :param str boundaries: File name of a geospatial dataset that contains
         geographical boundaries to serve as aggregation boundaries
      :param str impactcode: Field name in the `dframe` to aggregate by
      :param str boundarycode: Corresponding field name in the geospatial
         dataset.
      :param bool categories: Add columns for the number of buildings in each
         damage state defined in the 'Damage state' attribute. This requires
         that a `categorise` job has been included in the pipeline, which in
         turn requires the bins and labels to be defined in the job
         configuration.
      :param dict fields: A `dict` with keys of valid column names (from the
         :class:`pandas.DataFrame`) and values being lists of aggregation
         functions to apply to the columns.
      :param dict categorise: `categorise` job attributes
      :param bool use_parallel: True for parallel behaviour, in which only
         node 0 writes to file

   .. py:method:: aggregate_loss(self, groupby=None, kwargs=None)

      Aggregate data by the `groupby` attribute, using the `kwargs` to
      perform any arithmetic aggregation on fields (e.g. summation, mean,
      etc.)

      :param groupby: A column in the `DataFrame` that corresponds to regions
         by which to aggregate data
      :param kwargs: A `dict` with keys of valid column names (from the
         `DataFrame`) and values being lists of aggregation functions to
         apply to the columns.

      For example::

         kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'],
                   'structural': ['mean', 'std']}

      See https://tinyurl.com/54rbacwm for more guidance on using
      aggregation with :class:`pandas.DataFrame` objects.

      >>> groupby = 'MB_CODE11'
      >>> kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'], 'structural': ['mean', 'std']}
      >>> context.aggregate_loss(groupby, kwargs)

   .. py:method:: categorise(self, bins, labels, field_name)

      Bin values into discrete intervals.

      :param list bins: Monotonically increasing array of bin edges,
         including the rightmost edge, allowing for non-uniform bin widths.
      :param labels: Specifies the labels for the returned bins. Must be the
         same length as the resulting bins.
      :param str field_name: Name of the new column in the `exposure_att`
         :class:`pandas.DataFrame`

      See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
      for more details.

      >>> bins = [0.0, 0.02, 0.1, 0.2, 0.5, 1.0]
      >>> labels = ['Negligible', 'Slight', 'Moderate', 'Extensive',
      ...           'Complete']
      >>> field_name = 'Damage state'
      >>> context.categorise(bins, labels, field_name)
      >>> context.exposure_att.head()

   .. py:method:: tabulate(self, file_name, index=None, columns=None, aggfunc=None)

      Reshape data (produce a "pivot" table) based on column values. Uses
      unique values from specified `index` / `columns` to form axes of the
      resulting DataFrame, then writes to an Excel file. This function does
      not support data aggregation - multiple values will result in a
      MultiIndex in the columns.
      See https://tinyurl.com/6x535u5t for further details.

      :param file_name: destination for the pivot table
      :param index: column or list of columns. Keys to group by on the pivot
         table index. If an array is passed, it is used in the same manner
         as column values.
      :param columns: column or list of columns. Keys to group by on the
         pivot table column. If an array is passed, it is used in the same
         manner as column values.
      :param aggfunc: function, list of functions, dict, default numpy.mean.
         If a list of functions is passed, the resulting pivot table will
         have hierarchical columns whose top level are the function names
         (inferred from the function objects themselves). If a dict is
         passed, the key is the column to aggregate and the value is a
         function or list of functions.

      Example: Include the following in the configuration file::

         - tabulate:
            file_name: wind_impact_table.xlsx
            index: MESHBLOCK_CODE_2011
            columns: Damage state
            aggfunc: size

      This will produce a file called "wind_impact_table.xlsx", with the
      count of buildings in each "Damage state", grouped by the `index`
      field `MESHBLOCK_CODE_2011`.

.. py:function:: save_csv(write_dict, filename)

   Save a dictionary of arrays as a csv file.
   The first dimension in the arrays is assumed to have the same length for
   all arrays.
   In the csv file the keys become titles and the arrays become values.
   If the array is higher than 1d the other dimensions are averaged to get a
   1d array.

   :param write_dict: The dictionary of arrays to write as a csv file.
   :type write_dict: Dictionary.
   :param filename: The csv file will be written here.

.. py:function:: save_csv_agg(write_dict, filename)

   Save a `pandas.DataFrame` as a csv file.
   The first dimension in the arrays is assumed to have the same length for
   all arrays.
   In the csv file the keys become titles and the arrays become values.
   If the array is higher than 1d the other dimensions are averaged to get a
   1d array.

   :param write_dict: The data to write as a csv file.
   :type write_dict: Dictionary.
   :param filename: The csv file will be written here.
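
The reduction described for :func:`save_csv` (keys become column titles, and any dimensions beyond the first are averaged so that each key contributes one value per row) can be illustrated with a short sketch. This is not the HazImp implementation: ``flatten_for_csv`` and ``write_csv_sketch`` are hypothetical names introduced here, and averaging with ``numpy.mean`` over the trailing axes is an assumption about how the reduction is carried out.

```python
import csv
import io

import numpy as np


def flatten_for_csv(write_dict):
    """Reduce every array to 1d by averaging over all axes after the first.

    Hypothetical helper for illustration; not part of hazimp.context.
    """
    flat = {}
    for key, values in write_dict.items():
        arr = np.asarray(values)
        if arr.ndim > 1:
            # Average over every axis except the first (the site axis).
            arr = arr.mean(axis=tuple(range(1, arr.ndim)))
        flat[key] = arr
    return flat


def write_csv_sketch(write_dict, handle):
    """Write the keys as a header row, then one row per site."""
    flat = flatten_for_csv(write_dict)
    keys = list(flat)
    writer = csv.writer(handle)
    writer.writerow(keys)
    # zip(*columns) turns the per-key columns into per-site rows.
    for row in zip(*(flat[key] for key in keys)):
        writer.writerow(row)


# Two sites: a 1d latitude array and a 2d (site x sample) loss array.
example = {
    'exposure_latitude': np.array([-35.0, -36.0]),
    'structural': np.array([[0.1, 0.3], [0.2, 0.6]]),
}
buffer = io.StringIO()
write_csv_sketch(example, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of file-system side effects; passing an open file handle instead would produce a csv file on disk.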