:py:mod:`hazimp.context`
========================

.. py:module:: hazimp.context

.. autoapi-nested-parse::

   This module provides objects for processing a series of jobs
   sequentially. The order is determined by the queue of jobs.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   hazimp.context.Context


Functions
~~~~~~~~~

.. autoapisummary::

   hazimp.context.save_csv
   hazimp.context.save_csv_agg


Attributes
~~~~~~~~~~

.. autoapisummary::

   hazimp.context.LOGGER
   hazimp.context.DATEFMT
   hazimp.context.EX_LAT
   hazimp.context.EX_LONG


.. py:data:: LOGGER
   

.. py:data:: DATEFMT
   :annotation: = %Y-%m-%d %H:%M:%S %Z
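
   A minimal sketch of how a format string like this one can be applied with
   the standard library. The fixed, timezone-aware timestamp is an
   illustrative assumption; it is not necessarily how HazImp produces
   ``provstarttime``.

   .. code-block:: python

      from datetime import datetime, timezone

      DATEFMT = "%Y-%m-%d %H:%M:%S %Z"

      # A fixed, timezone-aware timestamp so that %Z has a name to print.
      stamp = datetime(2021, 3, 4, 5, 6, 7, tzinfo=timezone.utc)
      formatted = stamp.strftime(DATEFMT)
      print(formatted)  # 2021-03-04 05:06:07 UTC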

   
.. py:data:: EX_LAT
   :annotation: = exposure_latitude

   
.. py:data:: EX_LONG
   :annotation: = exposure_longitude

   
.. py:class:: Context

   Bases: :py:obj:`object`

   Context is a singleton storing all of the run specific data, such as the
   exposure features and their attributes, vulnerability sets, aggregations,
   pivot tables, provenance, etc.

   :ivar exposure_lat: Latitude values of the exposure data
   :ivar exposure_long: Longitude values of the exposure data

   :ivar exposure_att: A :class:`pandas.DataFrame` to hold the exposure
       attributes
   :ivar exposure_agg: A :class:`pandas.DataFrame.groupby` object that holds
       aggregated exposure data after executing
       :meth:`save_exposure_aggregation`
   :ivar exposure_vuln_curves: A :class:`dict` of
       :class:`hazimp.jobs.RealisedVulnerabilityCurves`
   :ivar vulnerability_sets: A :class:`dict` of the available vulnerability
       sets
   :ivar vul_function_titles: A dictionary with keys being
       vulnerability_set_ids and values being the exposure attribute whose
       values are vulnerability function IDs.
   :ivar pivot: :class:`pandas.DataFrame` for an Excel-style pivot table (e.g.
       for tabulation of results)
   :ivar prov: :class:`prov.ProvDocument` for provenance information.
   :ivar str provlabel: Qualified label for the provenance information
   :ivar str provtitle: Descriptive title for provenance information
   :ivar str provstarttime: Formatted datetime representing the start of
       the analysis.


   .. py:method:: set_prov_label(self, label, title='HazImp analysis')

      Set the qualified label for the provenance data

      :param label: the qualified label name. This is used to reference the
          analysis activity in other functions and methods.
      :param title: Optional value for the dcterms:title element


   .. py:method:: get_site_shape(self)

      Get the :class:`numpy.shape` of sites the context is storing.
      It is based on the shape of :data:`exposure_long`.

      :return: The :class:`numpy.shape` of sites the context is storing.


   .. py:method:: clip_exposure(self, min_long, min_lat, max_long, max_lat)

      Clip the exposure data so only the exposure values within
      the rectangle formed by max_lat, min_lat, max_long and
      min_long are included.

      Note: This must be called before the exposure_vuln_curves
      are determined, since the curves have a site dimension.

      :param float min_long: minimum longitude of exposure data to use
      :param float min_lat: minimum latitude of exposure data to use
      :param float max_long: maximum longitude of exposure data to use
      :param float max_lat: maximum latitude of exposure data to use
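
      The clipping described above can be sketched with a NumPy boolean mask.
      This is an illustration only: the coordinate arrays are made up, and
      treating the rectangle edges as inclusive is an assumption rather than
      confirmed HazImp behaviour.

      .. code-block:: python

         import numpy as np

         # Hypothetical site coordinates and one attribute (not HazImp data).
         lon = np.array([148.0, 149.5, 151.2, 153.0])
         lat = np.array([-36.0, -35.2, -33.9, -27.5])
         value = np.array([10.0, 20.0, 30.0, 40.0])

         min_long, min_lat, max_long, max_lat = 149.0, -36.5, 152.0, -33.0

         # True where a site lies inside the rectangle (edges inclusive here).
         inside = ((lon >= min_long) & (lon <= max_long) &
                   (lat >= min_lat) & (lat <= max_lat))

         # Apply the same mask to every per-site array so they stay aligned.
         clipped_lon = lon[inside]
         clipped_lat = lat[inside]
         clipped_value = value[inside]
         print(clipped_value)

      Because every per-site array is filtered with the same mask, arrays that
      carry a site dimension need to be built after the clip, which is the
      reason for the ordering note above.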


   .. py:method:: save_exposure_atts(self, filename, use_parallel=True)

      Save the exposure attributes, including latitude and longitude.
      The file type saved is based on the filename extension.

      :param use_parallel: Set to True for parallel behaviour, in which only
          node 0 writes to file.
      :param filename: The file to be written. If the extension is '.npz',
          then the arrays are saved to an uncompressed numpy format file.

      :return write_dict: The whole dictionary, returned for testing.
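
      A rough sketch of the '.npz' case, using :func:`numpy.savez` to write a
      dictionary of arrays to an uncompressed file and read it back. The
      dictionary contents and the file name are hypothetical, and this is not
      necessarily how the method itself is implemented.

      .. code-block:: python

         import os
         import tempfile

         import numpy as np

         # Hypothetical attribute dictionary; two keys mirror EX_LAT / EX_LONG.
         write_dict = {
             "exposure_latitude": np.array([-35.2, -33.9]),
             "exposure_longitude": np.array([149.5, 151.2]),
             "structural": np.array([0.05, 0.4]),
         }

         with tempfile.TemporaryDirectory() as tmpdir:
             filename = os.path.join(tmpdir, "exposure.npz")
             # savez stores each keyword argument as a named, uncompressed array.
             np.savez(filename, **write_dict)
             with np.load(filename) as saved:
                 restored = {key: saved[key] for key in saved.files}

         print(sorted(restored))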


   .. py:method:: save_exposure_aggregation(self, filename, use_parallel=True)

      Save the aggregated exposure attributes.
      The file type saved is based on the filename extension.

      :param use_parallel: Set to True for parallel behaviour, in which only
          node 0 writes to file.
      :param filename: The file to be written. If the extension is '.npz',
          then the arrays are saved to an uncompressed numpy format file.

      :return write_dict: The whole :class:`pandas.DataFrame`, returned for
          testing.


   .. py:method:: save_aggregation(self, filename, boundaries, impactcode, boundarycode, categories, fields, categorise, use_parallel=True)

      Save data aggregated to geospatial regions.

      :param str filename: Destination filename
      :param str boundaries: File name of a geospatial dataset that contains
          geographical boundaries to serve as aggregation boundaries
      :param str impactcode: Field name in the `dframe` to aggregate by
      :param str boundarycode: Corresponding field name in the geospatial
          dataset.
      :param boolean categories: Add columns for the number of buildings in
          each damage state defined in the 'Damage state' attribute. This
          requires that a `categorise` job has been included in the pipeline,
          which in turn requires the bins and labels to be defined in the job
          configuration.
      :param dict fields: A `dict` with keys of valid column names (from the
          :class:`pandas.DataFrame`) and values being lists of aggregation
          functions to apply to the columns.
      :param dict categorise: categorise job attributes
      :param bool use_parallel: True for parallel behaviour, in which only
          node 0 writes to file


   .. py:method:: aggregate_loss(self, groupby=None, kwargs=None)

      Aggregate data by the `groupby` attribute, using the `kwargs` to
      perform any arithmetic aggregation on fields (e.g. summation,
      mean, etc.)

      :param groupby: A column in the `DataFrame` that corresponds to regions
          by which to aggregate data
      :param kwargs: A `dict` with keys of valid column names (from the
          `DataFrame`) and values being lists of aggregation functions to
          apply to the columns.

      For example::

          kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'],
                    'structural': ['mean', 'std']}

      See https://tinyurl.com/54rbacwm for more guidance on using aggregation
      with :class:`pandas.DataFrame` objects.

      >>> groupby = 'MB_CODE11'
      >>> kwargs = {'REPLACEMENT_VALUE': ['mean', 'sum'],
      ...           'structural': ['mean', 'std']}
      >>> context.aggregate_loss(groupby, kwargs)
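
      The effect of such a `kwargs` dictionary can be seen on a plain
      :class:`pandas.DataFrame`. The sketch below uses made-up data and calls
      pandas directly; it illustrates the aggregation pattern and is not
      necessarily the method's exact implementation.

      .. code-block:: python

         import pandas as pd

         # Hypothetical exposure attributes: four buildings in two regions.
         exposure_att = pd.DataFrame({
             "MB_CODE11": ["A", "A", "B", "B"],
             "REPLACEMENT_VALUE": [100.0, 300.0, 200.0, 400.0],
             "structural": [0.1, 0.3, 0.2, 0.6],
         })

         groupby = "MB_CODE11"
         kwargs = {"REPLACEMENT_VALUE": ["mean", "sum"],
                   "structural": ["mean", "std"]}

         # One row per region; columns form a (field, function) MultiIndex.
         aggregated = exposure_att.groupby(groupby).agg(kwargs)
         print(aggregated)

      Individual results are addressed with a tuple, for example
      ``aggregated.loc["A", ("REPLACEMENT_VALUE", "sum")]``.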


   .. py:method:: categorise(self, bins, labels, field_name)

      Bin values into discrete intervals.

      :param list bins: Monotonically increasing array of bin edges,
          including the rightmost edge, allowing for non-uniform bin widths.
      :param labels: Specifies the labels for the returned
          bins. Must be the same length as the resulting bins.
      :param str field_name: Name of the new column in the `exposure_att`
          :class:`pandas.DataFrame`

      See
      https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
      for more details

      >>> bins = [0.0, 0.02, 0.1, 0.2, 0.5, 1.0]
      >>> labels = ['Negligible', 'Slight', 'Moderate', 'Extensive',
      ...           'Complete']
      >>> field_name = 'Damage state'
      >>> context.categorise(bins, labels, field_name)
      >>> context.exposure_att.head()
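
      As a standalone illustration of the binning, the sketch below applies
      :func:`pandas.cut` to a made-up column. The column name ``structural``
      and the use of pandas' default right-closed intervals are assumptions;
      the method may select a different column or interval closure.

      .. code-block:: python

         import pandas as pd

         bins = [0.0, 0.02, 0.1, 0.2, 0.5, 1.0]
         labels = ["Negligible", "Slight", "Moderate", "Extensive", "Complete"]

         # Hypothetical damage index values, one per building.
         exposure_att = pd.DataFrame({"structural": [0.01, 0.05, 0.15, 0.3, 0.9]})

         # Six bin edges give five intervals, hence five labels.
         # pandas.cut defaults to right-closed intervals: (0.0, 0.02], ...
         exposure_att["Damage state"] = pd.cut(
             exposure_att["structural"], bins, labels=labels)
         print(exposure_att)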


   .. py:method:: tabulate(self, file_name, index=None, columns=None, aggfunc=None)

      Reshape data (produce a "pivot" table) based on column values. Uses
      unique values from specified `index` / `columns` to form axes of the
      resulting DataFrame, then writes to an Excel file. This function does
      not support data aggregation - multiple values will result in a
      MultiIndex in the columns.

      See https://tinyurl.com/6x535u5t for further details.

      :param file_name: destination for the pivot table
      :param index: column or list of columns. Keys to group by on the pivot
          table index. If an array is passed, it is used in the same
          manner as column values.
      :param columns: column or list of columns. Keys to group by on the
          pivot table column. If an array is passed, it is used in
          the same manner as column values.
      :param aggfunc: function, list of functions, dict, default numpy.mean.
          If list of functions passed, the resulting pivot table will have
          hierarchical columns whose top level are the function names
          (inferred from the function objects themselves). If dict is passed,
          the key is column to aggregate and value is function or list of
          functions.

      Example:

      Include the following in the configuration file::

       - tabulate:
          file_name: wind_impact_table.xlsx
          index: MESHBLOCK_CODE_2011
          columns: Damage state
          aggfunc: size

      This will produce a file called "wind_impact_table.xlsx", with the
      count of buildings in each "Damage state", grouped by the `index` field
      `MESHBLOCK_CODE_2011`.
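
      The table such a configuration describes can be sketched with
      :meth:`pandas.DataFrame.pivot_table` on made-up data. The
      ``fill_value=0`` argument is an assumption added here so that empty
      combinations show as zero, and the sketch skips the Excel output step.

      .. code-block:: python

         import pandas as pd

         # Hypothetical exposure attributes: five buildings in two meshblocks.
         exposure_att = pd.DataFrame({
             "MESHBLOCK_CODE_2011": ["A", "A", "A", "B", "B"],
             "Damage state": ["Slight", "Slight", "Moderate",
                              "Slight", "Complete"],
         })

         # aggfunc="size" counts the rows in each (index, columns) combination.
         pivot = exposure_att.pivot_table(index="MESHBLOCK_CODE_2011",
                                          columns="Damage state",
                                          aggfunc="size",
                                          fill_value=0)
         print(pivot)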


.. py:function:: save_csv(write_dict, filename)

   Save a dictionary of arrays as a csv file.
   The first dimension is assumed to have the same length
   for all arrays.
   In the csv file the keys become titles and the arrays become values.

   If the array is higher than 1d the other dimensions are averaged to get a
   1d array.

   :param write_dict: The dictionary of arrays to write as a csv file.
   :type write_dict: Dictionary.
   :param filename: The csv file will be written here.
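
   The behaviour described above can be sketched as follows. This is a
   hypothetical re-implementation on made-up data that writes to an in-memory
   buffer instead of `filename`; it is not the function's actual code.

   .. code-block:: python

      import csv
      import io

      import numpy as np

      # Hypothetical input: a 1-D and a 2-D array, three sites each.
      write_dict = {
          "exposure_latitude": np.array([-35.2, -33.9, -27.5]),
          "structural": np.array([[0.1, 0.3], [0.2, 0.4], [0.5, 0.7]]),
      }

      columns = {}
      for key, array in write_dict.items():
          array = np.asarray(array)
          if array.ndim > 1:
              # Average over every axis except the first (the site axis).
              array = array.mean(axis=tuple(range(1, array.ndim)))
          columns[key] = array

      buffer = io.StringIO()
      writer = csv.writer(buffer, lineterminator="\n")
      writer.writerow(columns.keys())            # keys become the titles
      writer.writerows(zip(*columns.values()))   # one row per site
      csv_text = buffer.getvalue()
      print(csv_text)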


.. py:function:: save_csv_agg(write_dict, filename)

   Save a `pandas.DataFrame` as a csv file.
   The first dimension is assumed to have the same length
   for all arrays.
   In the csv file the keys become titles and the arrays become values.

   If the array is higher than 1d the other dimensions are averaged to get a
   1d array.

   :param write_dict: The data to write as a csv file.
   :type write_dict: Dictionary.
   :param filename: The csv file will be written here.