Fill and Extract Helpers

To fill data into the dismod databases in a meaningful way for the cascade, and to read results back out, there are two subclasses of DismodIO. DismodFiller fills tables based on a model version’s settings, and DismodExtractor extracts data frames from the database tables.

Dismod Filler

class cascade_at.dismod.api.dismod_filler.DismodFiller(path, settings_configuration, measurement_inputs, grid_alchemy, parent_location_id, sex_id, child_prior=None, mulcov_prior=None)[source]

Bases: cascade_at.dismod.api.dismod_io.DismodIO

Sits on top of the DismodIO class, takes everything from the collector module, and puts it into the Dismod database tables in the correct construction.

Dismod Filler wraps a dismod database and fills all of the tables using the measurement inputs object, settings, and the grid alchemy constructor.

It optionally includes rate priors and covariate multiplier priors.

Parameters
  • path (Union[str, Path]) – the path of the dismod database

  • settings_configuration (SettingsConfig) – the settings configuration object

  • measurement_inputs (MeasurementInputs) – the measurement inputs object

  • grid_alchemy (Alchemy) – the grid alchemy object

  • parent_location_id (int) – the parent location ID for this database

  • sex_id (int) – the reference sex for this database

  • child_prior (Optional[Dict[str, Dict[str, ndarray]]]) – a dictionary of child rate priors to use. The first level of the dictionary is the rate name, and the second is the type of prior: value, dage, or dtime.

self.parent_child_model

Model that was constructed from grid_alchemy parameters for one specific parent and its descendants

Examples

>>> from pathlib import Path
>>> from cascade_at.dismod.api.dismod_filler import DismodFiller
>>> from cascade_at.model.grid_alchemy import Alchemy
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> inputs = MeasurementInputsFromSettings(settings)
>>> inputs.demographics.location_id = [102, 555]  # subset the locations to make it go faster
>>> inputs.get_raw_inputs()
>>> inputs.configure_inputs_for_dismod(settings)
>>> alchemy = Alchemy(settings)
>>> da = DismodFiller(path=Path('temp.db'),
...                   settings_configuration=settings,
...                   measurement_inputs=inputs,
...                   grid_alchemy=alchemy,
...                   parent_location_id=1,
...                   sex_id=3)
>>> da.fill_for_parent_child()

get_omega_df()[source]

Get the correct omega data frame for this two-level model.

Return type

DataFrame

get_parent_child_model()[source]

Construct a two-level model that corresponds to this parent location ID and its children.

Return type

Model

calculate_reference_covariates()[source]

Calculates reference covariate values based on the input object and the parent/sex we have in the two-level model. Modifies the baseline covariate specs object.

Return type

CovariateSpecs

fill_for_parent_child(**options)[source]

Fills the Dismod database with inputs and a model construction for a parent location and its descendants.

Pass optional keyword arguments to fill the option table with additional options or to override the defaults.

Return type

None

node_id_from_location_id(location_id)[source]

Get the node ID from a location ID in an already created node table.

Return type

int
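
The lookup can be pictured as a filter on the node table. The sketch below is an illustration only, not the method's implementation: the helper name `node_id_from_location` is hypothetical, the node table is a list of dictionaries standing in for a DataFrame, and the `c_location_id` column name is an assumption.

```python
def node_id_from_location(node_rows, location_id):
    """Return the node_id of the single row matching location_id."""
    # Assumption: the node table stores the location in "c_location_id".
    matches = [r["node_id"] for r in node_rows if r["c_location_id"] == location_id]
    if len(matches) != 1:
        raise ValueError(
            f"Expected one node for location {location_id}, found {len(matches)}"
        )
    return matches[0]


# Made-up node table for illustration.
node_rows = [
    {"node_id": 0, "node_name": "1", "c_location_id": 1},
    {"node_id": 1, "node_name": "102", "c_location_id": 102},
    {"node_id": 2, "node_name": "555", "c_location_id": 555},
]
node_id = node_id_from_location(node_rows, 102)
```

Raising on zero or multiple matches keeps a missing or duplicated location from silently producing a wrong node ID.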

fill_reference_tables()[source]

Fills all of the reference tables including density, node, covariate, age, and time.

fill_data_tables()[source]

Fills the data tables including data and avgint.

fill_grid_tables()[source]

Fills the grid-like tables including weight, rate, smooth, smooth_grid, prior, integrand, mulcov, nslist, nslist_pair.

construct_option_table(**kwargs)[source]

Constructs the option table with the default arguments. Pass keyword arguments to add new options or to override the defaults.

Return type

DataFrame
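
The defaults-then-overrides behaviour can be sketched as a dictionary update. This is a minimal illustration rather than the library's implementation: `build_option_rows` and the default values are hypothetical, and the rows are plain dictionaries standing in for a DataFrame with `option_name` and `option_value` columns.

```python
def build_option_rows(defaults, **overrides):
    """Merge keyword overrides into default options and return table rows.

    Hypothetical helper that only illustrates the override pattern.
    """
    options = dict(defaults)   # copy so the defaults are not mutated
    options.update(overrides)  # kwargs replace defaults or add new options
    return [
        {"option_id": i, "option_name": name, "option_value": str(value)}
        for i, (name, value) in enumerate(options.items())
    ]


# Assumed example defaults; the real defaults come from the model settings.
DEFAULTS = {"max_num_iter_fixed": 100, "print_level_fixed": 5}
rows = build_option_rows(DEFAULTS, max_num_iter_fixed=50, random_seed=0)
```

Here `max_num_iter_fixed` is overridden, `print_level_fixed` keeps its default, and `random_seed` is added as a new option.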

Dismod Extractor

class cascade_at.dismod.api.dismod_extractor.DismodExtractor(path)[source]

Bases: cascade_at.dismod.api.dismod_io.DismodIO

Sits on top of the DismodIO class, and extracts helpful data frames from the dismod database tables.

Parameters

path (str) – The database filepath

get_predictions(locations=None, sexes=None, samples=False, predictions=None)[source]

Get the predictions from the predict table for the given locations and sexes. Returns a ‘mean’ column if samples is False and a ‘draw’ column otherwise; the draws can then be reshaped wide if necessary.

Return type

DataFrame
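
Reshaping long-format draws to wide can be sketched as follows. This is an illustration under assumptions, not part of the class: the helper `draws_long_to_wide`, the `sample_index` column, and the `draw_<n>` output names are hypothetical, and rows are plain dictionaries standing in for a DataFrame.

```python
from collections import defaultdict


def draws_long_to_wide(rows, id_columns):
    """Pivot long-format draw rows into one row per ID combination."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[column] for column in id_columns)
        wide[key][f"draw_{row['sample_index']}"] = row["draw"]
    # Rebuild each output row from its ID values plus its draw_<n> columns.
    return [dict(zip(id_columns, key), **draws) for key, draws in wide.items()]


# Made-up long-format predictions: one row per location and sample.
long_rows = [
    {"location_id": 102, "sex_id": 1, "sample_index": 0, "draw": 0.10},
    {"location_id": 102, "sex_id": 1, "sample_index": 1, "draw": 0.12},
    {"location_id": 555, "sex_id": 1, "sample_index": 0, "draw": 0.20},
    {"location_id": 555, "sex_id": 1, "sample_index": 1, "draw": 0.22},
]
wide_rows = draws_long_to_wide(long_rows, ["location_id", "sex_id"])
```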

gather_draws_for_prior_grid(location_id, sex_id, rates, value=True, dage=False, dtime=False, samples=True)[source]

Takes draws and formats them for a prior grid for values, dage, and dtime. Assumes that age_lower == age_upper and time_lower == time_upper for all data rows. Pass False for value, dage, or dtime to skip that prior type.

Parameters
  • location_id (int) –

  • sex_id (int) –

  • rates (List[str]) – list of rates to get the draws for

  • value (bool) – whether to calculate value priors

  • dage (bool) – whether to calculate dage priors

  • dtime (bool) – whether to calculate dtime priors

  • samples (bool) – whether the prior came from samples

Returns

Dictionary of 3-d arrays of value, dage, and dtime draws over age and time for this location and sex
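
To show what a 3-d array of draws over age and time looks like, here is a small sketch using nested lists. It is not the method's implementation: the helper `draws_to_grid`, the `sample_index` column, and the (age, time, draw) axis order are assumptions made for illustration.

```python
def draws_to_grid(rows, ages, times, n_draws):
    """Arrange point-valued draw rows into grid[age][time][draw]."""
    age_index = {age: i for i, age in enumerate(ages)}
    time_index = {time: j for j, time in enumerate(times)}
    grid = [[[None] * n_draws for _ in times] for _ in ages]
    for row in rows:
        # Each row must be a point in age and time, not an interval.
        if row["age_lower"] != row["age_upper"] or row["time_lower"] != row["time_upper"]:
            raise ValueError("Prior grid rows must be points, not intervals.")
        i = age_index[row["age_lower"]]
        j = time_index[row["time_lower"]]
        grid[i][j][row["sample_index"]] = row["draw"]
    return grid


# Made-up draws on a 2 x 2 age-time grid with 3 samples per point.
ages, times = [0.0, 50.0], [1990.0, 2000.0]
rows = [
    {"age_lower": a, "age_upper": a, "time_lower": t, "time_upper": t,
     "sample_index": s, "draw": a + t + s}
    for a in ages for t in times for s in range(3)
]
grid = draws_to_grid(rows, ages, times, n_draws=3)
```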

format_predictions_for_ihme(gbd_round_id, locations=None, sexes=None, samples=False, predictions=None)[source]

Formats predictions from the prediction table and returns either the mean or the draws, depending on whether samples is False or True.

Parameters
  • locations (Optional[List[int]]) – A list of locations to extract from the predictions

  • sexes (Optional[List[int]]) – A list of sexes to extract from the predictions

  • gbd_round_id (int) – The GBD round ID to format the predictions for

  • samples (bool) – Whether the predictions have draws (samples) or come from just one fit.

  • predictions (Optional[DataFrame]) – An optional data frame with the predictions to use rather than reading them directly from the database.

Returns

Data frame with predictions formatted for the IHME databases.

Table Creation

The DismodFiller uses the following table creation functions internally.

Formatting Reference Tables

The dismod database needs some standard reference tables. These are made with the following functions.

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_integrand_table(data_cv_from_settings=None, default_data_cv=0.0)[source]

Constructs the integrand table and adds data CV in the minimum_meas_cv column.

Parameters
  • data_cv_from_settings (Optional[dict]) – key-value pairs that map integrands to data CV

  • default_data_cv (float) – default value for data CV to use

Return type

DataFrame
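
The way the two parameters interact can be sketched as a lookup with a fallback. This is an illustration, not the function's source: `integrand_rows` is a hypothetical helper, rows are dictionaries standing in for a DataFrame, and the integrand names are taken from the dismod_at documentation and may not match exactly what this function writes.

```python
# Integrand names as documented by dismod_at (assumed for this sketch).
INTEGRAND_NAMES = [
    "Sincidence", "remission", "mtexcess", "mtother", "mtwith",
    "susceptible", "withC", "prevalence", "Tincidence", "mtspecific",
    "mtall", "mtstandard", "relrisk",
]


def integrand_rows(data_cv_from_settings=None, default_data_cv=0.0):
    """Build integrand rows, using the default CV where no setting exists."""
    cv = data_cv_from_settings or {}
    return [
        {
            "integrand_id": i,
            "integrand_name": name,
            "minimum_meas_cv": cv.get(name, default_data_cv),
        }
        for i, name in enumerate(INTEGRAND_NAMES)
    ]


table = integrand_rows({"prevalence": 0.2}, default_data_cv=0.1)
by_name = {row["integrand_name"]: row["minimum_meas_cv"] for row in table}
```

Only `prevalence` receives the CV from the settings dictionary here; every other integrand falls back to `default_data_cv`.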

cascade_at.dismod.api.fill_extract_helpers.reference_tables.default_rate_table()[source]

Constructs the default rate table with rate names and ids.

Return type

DataFrame
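
As a rough picture of the result, dismod_at defines five rates (pini, iota, rho, chi, omega). The sketch below builds ID and name rows for them; `rate_rows` is a hypothetical helper, and the rows are dictionaries standing in for a DataFrame.

```python
# The five dismod_at rates, in the order dismod_at documents them.
RATE_NAMES = ["pini", "iota", "rho", "chi", "omega"]


def rate_rows():
    """Return one row per rate with a sequential rate_id."""
    return [{"rate_id": i, "rate_name": name} for i, name in enumerate(RATE_NAMES)]


rates = rate_rows()
```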

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_node_table(location_dag)[source]

Constructs the node table from a location DAG’s to_dataframe() method.

Parameters

location_dag (LocationDAG) – location hierarchy object

Return type

DataFrame
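
Conceptually, building a node table means assigning each location a sequential node ID and translating each parent location into a parent node ID. The sketch below illustrates that idea only: `node_rows_from_hierarchy` is a hypothetical helper, the hierarchy is a plain dictionary rather than a LocationDAG, and the output column names are assumptions.

```python
def node_rows_from_hierarchy(parent_of):
    """Build node rows from a mapping of location_id -> parent location_id.

    The root location maps to None.
    """
    location_ids = list(parent_of)
    node_of = {loc: i for i, loc in enumerate(location_ids)}
    return [
        {
            "node_id": node_of[loc],
            "node_name": str(loc),
            # Translate the parent location into its node ID.
            "parent": None if parent_of[loc] is None else node_of[parent_of[loc]],
            "c_location_id": loc,
        }
        for loc in location_ids
    ]


# Made-up hierarchy: location 1 is the root with two children.
nodes = node_rows_from_hierarchy({1: None, 102: 1, 555: 1})
```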

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_covariate_table(covariates)[source]

Constructs the covariate table from a list of Covariate objects.

Return type

DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_density_table()[source]

Constructs the default density table.

Return type

DataFrame

Formatting Dismod Data Tables

There are helper functions to create data files. They are broken up into small functions to help with unit testing.

cascade_at.dismod.api.fill_extract_helpers.data_tables.prep_data_avgint(df, node_df, covariate_df)[source]

Preps both the data table and the avgint table by mapping locations to nodes and covariates to names.

Both tables go through the same function because the mapping is identical, but data and avgint need to be prepped in separate calls because dismod requires different columns for each.

Parameters
  • df (DataFrame) – The data frame to map

  • node_df (DataFrame) – The node table from dismod db

  • covariate_df (DataFrame) – The covariate table from dismod db
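
The two mappings can be sketched with plain dictionaries. This is an illustration under assumptions, not the function's source: `map_locations_and_covariates` is a hypothetical helper, and the `c_location_id`, `c_covariate_name`, and `x_<id>` column conventions are assumed.

```python
def map_locations_and_covariates(rows, node_rows, covariate_rows):
    """Replace location_id with node_id and rename covariate columns."""
    node_of = {n["c_location_id"]: n["node_id"] for n in node_rows}
    column_of = {
        c["c_covariate_name"]: f"x_{c['covariate_id']}" for c in covariate_rows
    }
    mapped = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key == "location_id":
                new_row["node_id"] = node_of[value]
            else:
                # Covariate columns are renamed; other columns pass through.
                new_row[column_of.get(key, key)] = value
        mapped.append(new_row)
    return mapped


# Made-up inputs for illustration.
node_rows = [{"node_id": 0, "c_location_id": 1}, {"node_id": 1, "c_location_id": 102}]
covariate_rows = [{"covariate_id": 0, "c_covariate_name": "s_sex"}]
data = [{"location_id": 102, "meas_value": 0.3, "s_sex": 0.5}]
prepped = map_locations_and_covariates(data, node_rows, covariate_rows)
```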

cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_data_table(df, node_df, covariate_df, ages, times)[source]

Constructs the data table from input df.

Parameters
  • df (DataFrame) – data frame of inputs that have been prepped for dismod

  • node_df (DataFrame) – the dismod node table

  • covariate_df (DataFrame) – the dismod covariate table

  • ages (ndarray) –

  • times (ndarray) –

cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_gbd_avgint_table(df, node_df, covariate_df, integrand_df, ages, times)[source]

Constructs the avgint table using the output df from the inputs.to_avgint() method.

Parameters
  • df (DataFrame) – The data frame to construct the avgint table from, that has things like ages, times, nodes (locations), sexes, etc.

  • node_df (DataFrame) – dismod node data frame

  • covariate_df (DataFrame) – dismod covariate data frame

  • integrand_df (DataFrame) – dismod integrand data frame

  • ages (ndarray) – array of ages for the model

  • times (ndarray) – array of times for the model

Return type

DataFrame

Formatting Grid Tables

There are helper functions to create grid tables in the dismod database. These are things like WeightGrid and SmoothGrid.

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_model_tables(model, location_df, age_df, time_df, covariate_df)[source]

Main function that loops through the items from a model object, which include rate, random_effect, alpha, beta, and gamma and constructs the modeling tables in dismod db.

Each of these is a “grid” var, so it needs entries in prior, smooth, and smooth_grid. This function returns those tables.

It also constructs the rate, integrand, and mulcov tables (alpha, beta, gamma), plus nslist and nslist_pair tables.

Parameters
  • model (Model) – A model object that has rate information

  • location_df (DataFrame) – A location / node data frame

  • age_df (DataFrame) – An age data frame for dismod

  • time_df (DataFrame) – A time data frame for dismod

  • covariate_df (DataFrame) – A covariate data frame for dismod

Returns

A dictionary of data frames for each table name, which includes the rate, prior, smooth, smooth_grid, mulcov, nslist, nslist_pair, and subgroup tables

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_weight_grid_tables(weights, age_df, time_df)[source]

Constructs the weight and weight_grid tables.

Parameters
  • weights (Dict[str, Var]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.

  • age_df – Age data frame from dismod db

  • time_df – Time data frame from dismod db

Returns

Tuple of the weight table and the weight grid table
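
The relationship between the two tables can be sketched as one summary row per weight plus one grid row per age-time point. This is a simplified illustration: `weight_tables` is a hypothetical helper, each weight is a dictionary keyed by (age_id, time_id) rather than a Var, and the column names follow the dismod_at weight and weight_grid tables.

```python
def weight_tables(weights):
    """Return (weight_rows, weight_grid_rows) for a dictionary of weights."""
    weight_rows, grid_rows = [], []
    for weight_id, (name, grid) in enumerate(weights.items()):
        age_ids = sorted({age_id for age_id, _ in grid})
        time_ids = sorted({time_id for _, time_id in grid})
        weight_rows.append({
            "weight_id": weight_id, "weight_name": name,
            "n_age": len(age_ids), "n_time": len(time_ids),
        })
        for (age_id, time_id), value in sorted(grid.items()):
            grid_rows.append({
                "weight_grid_id": len(grid_rows), "weight_id": weight_id,
                "age_id": age_id, "time_id": time_id, "weight": value,
            })
    return weight_rows, grid_rows


# Made-up constant grid on one age and two times, reused for all four weights.
constant = {(0, 0): 1.0, (0, 1): 1.0}
names = ["constant", "susceptible", "with_condition", "total"]
weight_rows, grid_rows = weight_tables({name: dict(constant) for name in names})
```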

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_subgroup_table()[source]

Constructs the default subgroup table. If we want to actually use the subgroup table, this function needs to be extended to build it.

Return type

DataFrame

Helper Functions

Posterior to Prior

“Posterior to prior” means taking the fit from a parent database and using its rate posteriors as the priors for the child fits. This happens in DismodFiller when it builds the two-level model with Alchemy, because it replaces the default priors with the ones passed in.

The posterior is passed down by predicting the parent model on the rate grid for the children. To construct the rate grid, we use the following function:

cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.get_prior_avgint_grid(grids, sexes, locations, midpoint=False)[source]

Get a data frame to use for setting up posterior predictions on a grid. The grids are specified in the grids parameter.

The result still needs covariates added to it, and it must be passed through cascade_at.dismod.api.fill_extract_helpers.data_tables.prep_data_avgint to convert nodes and covariate names before it can be input into the avgint table in a database.

Parameters
  • grids (Dict[str, Dict[str, ndarray]]) – A dictionary of grids with keys for each integrand, which are dictionaries for “age” and “time”.

  • sexes (List[int]) – A list of sexes

  • locations (List[int]) – A list of locations

  • midpoint (bool) – Whether to midpoint the grid lower and upper values (recommended for rates).

Returns

Data frame with columns “avgint_id”, “integrand_id”, “location_id”, “weight_id”, “subgroup_id”, “age_lower”, “age_upper”, “time_lower”, “time_upper”, “sex_id”
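
A simplified sketch of building such a grid is shown here. It is one possible reading, not the function's source: `prior_grid_rows` is a hypothetical helper, it assumes each row is a point with equal lower and upper bounds, it assumes midpoint=True replaces the grid points with the midpoints of adjacent points, and it omits the integrand_id, weight_id, and subgroup_id columns.

```python
from itertools import product


def midpoints(values):
    """Midpoints of adjacent values; empty if fewer than two values."""
    return [(lo + hi) / 2 for lo, hi in zip(values[:-1], values[1:])]


def prior_grid_rows(grids, sexes, locations, midpoint=False):
    """Build one point row per integrand, age, time, sex, and location."""
    rows = []
    for integrand, grid in grids.items():
        ages = midpoints(grid["age"]) if midpoint else list(grid["age"])
        times = midpoints(grid["time"]) if midpoint else list(grid["time"])
        for age, time, sex, loc in product(ages, times, sexes, locations):
            rows.append({
                "avgint_id": len(rows), "integrand": integrand,
                "location_id": loc, "sex_id": sex,
                "age_lower": age, "age_upper": age,
                "time_lower": time, "time_upper": time,
            })
    return rows


# Made-up grid for a single rate.
grids = {"iota": {"age": [0, 10, 20], "time": [1990, 2000]}}
point_rows = prior_grid_rows(grids, sexes=[1], locations=[102])
mid_rows = prior_grid_rows(grids, sexes=[1], locations=[102], midpoint=True)
```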

To upload those priors from the rate grid to the IHME databases, which require standard GBD ages and times, we use the following function. This is for visualization purposes only:

cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.format_rate_grid_for_ihme(rates, gbd_round_id, location_id, sex_id)[source]

Formats a grid of mean, upper, and lower for a prior rate for the IHME database. Only does this for Gaussian priors.

Parameters
  • rates (Dict[str, SmoothGrid]) – A dictionary of SmoothGrids, keyed by primary rates like “iota”

  • gbd_round_id (int) – the GBD round

  • location_id (int) – the location ID to append to this data frame

  • sex_id (int) – the sex ID to append to this data frame

Returns

A data frame formatted for the IHME databases

Multithreading

When we want to do multithreading on a dismod database, we can define a process that works on, for example, only a subset of a database’s data or samples. To support this, there is a base class here that is subclassed in sample and predict, since both have tasks that can be done in parallel on one database.

class cascade_at.dismod.api.multithreading._DismodThread(main_db, index_file_pattern)[source]

Splits a dismod database into multiple databases to run parallel processes on the database. The work happens when you call an instantiated _DismodThread.

cascade_at.dismod.api.multithreading.dmdismod_in_parallel(dm_thread, sims, n_pool)[source]

Run a dismod thread in parallel by constructing a multiprocessing pool. A dismod thread is anything based on _DismodThread, so it has a __call__ method with an overridden _process method.
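
The pattern of a callable base class with an overridden _process method, mapped over a pool, can be sketched as follows. This is an illustration only: `BaseWorker`, `EchoWorker`, and `run_in_parallel` are hypothetical names, the sketch only names per-index databases rather than creating them, and it uses `multiprocessing.pool.ThreadPool` (same interface as a process pool) to stay self-contained.

```python
from multiprocessing.pool import ThreadPool


class BaseWorker:
    """Hypothetical stand-in for a _DismodThread-style base class."""

    def __init__(self, main_db, index_file_pattern):
        self.main_db = main_db
        self.index_file_pattern = index_file_pattern

    def __call__(self, index):
        # Name a per-index database from the pattern (no file is created here).
        index_db = self.index_file_pattern.format(index=index)
        return self._process(index_db, index)

    def _process(self, db, index):
        raise NotImplementedError("Subclasses define the per-database work.")


class EchoWorker(BaseWorker):
    def _process(self, db, index):
        # Placeholder work: report which database this index would use.
        return f"{db} <- {self.main_db}"


def run_in_parallel(worker, sims, n_pool):
    """Call the worker once per simulation index using a pool of n_pool workers."""
    with ThreadPool(n_pool) as pool:
        return pool.map(worker, sims)


results = run_in_parallel(
    EchoWorker("main.db", "sample_{index}.db"), sims=[0, 1, 2], n_pool=2
)
```

Because the worker instance is itself callable, it can be handed directly to `pool.map`, and subclasses only need to supply `_process`.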