Fill and Extract Helpers¶

Two subclasses of DismodIO move data in and out of the dismod databases for the cascade: DismodFiller fills tables based on a model version’s settings, and DismodExtractor extracts data frames from a database.

Dismod Filler¶

class cascade_at.dismod.api.dismod_filler.DismodFiller(path, settings_configuration, measurement_inputs, grid_alchemy, parent_location_id, sex_id, child_prior=None, mulcov_prior=None)[source]¶

Bases: cascade_at.dismod.api.dismod_io.DismodIO

Sits on top of the DismodIO class, takes everything from the collector module, and puts it into the Dismod database tables in the correct construction.

Dismod Filler wraps a dismod database and fills all of the tables using the measurement inputs object, settings, and the grid alchemy constructor.

It optionally includes rate priors and covariate multiplier priors.

Parameters

path (Union[str, Path]) – the path of the dismod database
settings_configuration (SettingsConfig) – the settings configuration object
measurement_inputs (MeasurementInputs) – the measurement inputs object
grid_alchemy (Alchemy) – the grid alchemy object
parent_location_id (int) – the parent location ID for this database
sex_id (int) – the reference sex for this database
child_prior (Optional[Dict[str, Dict[str, ndarray]]]) – a dictionary of child rate priors to use. The first level of the dictionary is the rate name, and the second is the type of prior: value, dage, or dtime.
mulcov_prior – optional covariate multiplier priors to use

self.parent_child_model¶: Model that was constructed from grid_alchemy parameters for one specific parent and its descendants

Examples

>>> from pathlib import Path
>>> from cascade_at.dismod.api.dismod_filler import DismodFiller
>>> from cascade_at.model.grid_alchemy import Alchemy
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings

>>> settings = load_settings(BASE_CASE)
>>> inputs = MeasurementInputsFromSettings(settings)
>>> inputs.demographics.location_id = [102, 555] # subset the locations to make it go faster
>>> inputs.get_raw_inputs()
>>> inputs.configure_inputs_for_dismod(settings)
>>> alchemy = Alchemy(settings)

>>> da = DismodFiller(path=Path('temp.db'),
...                   settings_configuration=settings,
...                   measurement_inputs=inputs,
...                   grid_alchemy=alchemy,
...                   parent_location_id=1,
...                   sex_id=3)
>>> da.fill_for_parent_child()

get_omega_df()[source]¶

Get the correct omega data frame for this two-level model.

Return type: DataFrame

get_parent_child_model()[source]¶

Construct a two-level model that corresponds to this parent location ID and its children.

Return type: Model

calculate_reference_covariates()[source]¶

Calculates reference covariate values based on the input object and the parent/sex we have in the two-level model. Modifies the baseline covariate specs object.

Return type: CovariateSpecs

fill_for_parent_child(**options)[source]¶

Fills the Dismod database with inputs and a model construction for a parent location and its descendants.

Pass optional keyword arguments to fill the option table with additional info or to override the defaults.

Return type: None

node_id_from_location_id(location_id)[source]¶

Get the node ID from a location ID in an already created node table.

Return type: int
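As a rough illustration of the lookup this method performs, the sketch below finds a node ID in a small stand-in node table. The table is a plain list of dictionaries rather than the real dismod node table, and both the `c_location_id` column name and the free function are assumptions made for illustration only.

```python
from typing import Dict, List

# Stand-in for an already created node table (hypothetical rows and column names).
node_table: List[Dict[str, int]] = [
    {"node_id": 0, "c_location_id": 1},
    {"node_id": 1, "c_location_id": 102},
    {"node_id": 2, "c_location_id": 555},
]


def node_id_from_location_id(location_id: int, nodes: List[Dict[str, int]]) -> int:
    """Return the node ID of the single row that matches location_id."""
    matches = [row["node_id"] for row in nodes if row["c_location_id"] == location_id]
    if len(matches) != 1:
        raise ValueError(
            f"Expected one node for location {location_id}, found {len(matches)}"
        )
    return matches[0]


node_id = node_id_from_location_id(102, node_table)
```

Raising on a missing or duplicated location is a design choice of this sketch; it makes a malformed node table fail loudly instead of silently returning the wrong node.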

fill_reference_tables()[source]¶: Fills all of the reference tables including density, node, covariate, age, and time.

fill_data_tables()[source]¶: Fills the data tables including data and avgint.

fill_grid_tables()[source]¶: Fills the grid-like tables including weight, rate, smooth, smooth_grid, prior, integrand, mulcov, nslist, nslist_pair.

construct_option_table(**kwargs)[source]¶

Constructs the option table with the default arguments. Pass keyword arguments to add new options or to override the defaults.

Return type: DataFrame
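The default-plus-override behavior described above can be sketched with a plain dictionary. The default values below are hypothetical, the helper name `build_option_rows` is not part of the package, and the output is a list of rows rather than a DataFrame; the `option_id`, `option_name`, and `option_value` column names are assumed to follow the dismod option table layout.

```python
from typing import Dict, List

# Hypothetical defaults; the real defaults are defined inside DismodFiller.
DEFAULT_OPTIONS: Dict[str, str] = {
    "parent_node_id": "0",
    "random_seed": "0",
    "print_level_fixed": "5",
}


def build_option_rows(**kwargs: object) -> List[Dict[str, object]]:
    """Merge keyword overrides into the defaults and return option-table rows."""
    options = dict(DEFAULT_OPTIONS)
    # New keys are appended; existing keys are overridden in place.
    options.update({name: str(value) for name, value in kwargs.items()})
    return [
        {"option_id": i, "option_name": name, "option_value": value}
        for i, (name, value) in enumerate(options.items())
    ]


rows = build_option_rows(random_seed=1234, max_num_iter_fixed=100)
```

Here `random_seed` replaces a default while `max_num_iter_fixed` is added as a new row at the end.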

Dismod Extractor¶

class cascade_at.dismod.api.dismod_extractor.DismodExtractor(path)[source]¶

Bases: cascade_at.dismod.api.dismod_io.DismodIO

Sits on top of the DismodIO class, and extracts helpful data frames from the dismod database tables.

Parameters: path (str) – The database filepath

get_predictions(locations=None, sexes=None, samples=False, predictions=None)[source]¶

Get the predictions from the predict table for the given locations and sexes. Returns a ‘mean’ column if samples is False, otherwise a ‘draw’ column, which can then be reshaped wide if necessary.

Return type: DataFrame
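The sketch below illustrates what reshaping long-format draws to wide might look like. It uses plain dictionaries instead of a DataFrame so it runs without dependencies; the `sample_index` column and the `draw_<n>` output names are assumptions, not names confirmed by this page.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical long-format predictions: one row per demographic key and sample.
long_rows = [
    {"location_id": 102, "sex_id": 1, "sample_index": 0, "draw": 0.10},
    {"location_id": 102, "sex_id": 1, "sample_index": 1, "draw": 0.12},
    {"location_id": 555, "sex_id": 1, "sample_index": 0, "draw": 0.20},
    {"location_id": 555, "sex_id": 1, "sample_index": 1, "draw": 0.22},
]


def reshape_wide(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Collapse one row per sample into one row per key with draw_<n> columns."""
    wide: Dict[tuple, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        key = (row["location_id"], row["sex_id"])
        wide[key][f"draw_{row['sample_index']}"] = row["draw"]
    return [
        {"location_id": loc, "sex_id": sex, **draws}
        for (loc, sex), draws in sorted(wide.items())
    ]


wide_rows = reshape_wide(long_rows)
```

With a real DataFrame the same step is typically a pivot on the sample index.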

gather_draws_for_prior_grid(location_id, sex_id, rates, value=True, dage=False, dtime=False, samples=True)[source]¶

Takes draws and formats them for a prior grid for values, dage, and dtime. Assumes that age_lower == age_upper and time_lower == time_upper for all data rows. Pass False for value, dage, or dtime to skip that prior type.

Parameters

location_id (int) –
sex_id (int) –
rates (List[str]) – list of rates to get the draws for
value (bool) – whether to calculate value priors
dage (bool) – whether to calculate dage priors
dtime (bool) – whether to calculate dtime priors
samples (bool) – whether the prior came from samples

Returns

Dictionary of 3-d arrays of value, dage, and dtime draws over age and time for this location and sex

format_predictions_for_ihme(gbd_round_id, locations=None, sexes=None, samples=False, predictions=None)[source]¶

Formats predictions from the prediction table and returns either the mean or draws, based on whether or not samples is False or True.

Parameters

locations (Optional[List[int]]) – A list of locations to extract from the predictions
sexes (Optional[List[int]]) – A list of sexes to extract from the predictions
gbd_round_id (int) – The GBD round ID to format the predictions for
samples (bool) – Whether or not the predictions have draws (samples) or whether it is just one fit.
predictions (Optional[DataFrame]) – An optional data frame with the predictions to use rather than reading them directly from the database.

Returns

Data frame with predictions formatted for the IHME databases.

Table Creation¶

The DismodFiller uses the following table creation functions internally.

Formatting Reference Tables¶

The dismod database needs some standard reference tables. These are made with the following functions.

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_integrand_table(data_cv_from_settings=None, default_data_cv=0.0)[source]¶

Constructs the integrand table and adds data CV in the minimum_meas_cv column.

Parameters

data_cv_from_settings (Optional[dict]) – key-value pairs that map integrands to data CV
default_data_cv (float) – default value for data CV to use

Return type

DataFrame
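A minimal sketch of how the two parameters interact, assuming that integrands missing from `data_cv_from_settings` fall back to `default_data_cv`. The helper name is hypothetical, only a subset of integrand names is used, and rows are plain dictionaries rather than a DataFrame.

```python
from typing import Dict, List, Optional

# A subset of integrand names, used here only for illustration.
INTEGRAND_NAMES = ["Sincidence", "remission", "mtexcess", "prevalence"]


def build_integrand_rows(
    data_cv_from_settings: Optional[Dict[str, float]] = None,
    default_data_cv: float = 0.0,
) -> List[Dict[str, object]]:
    """Build integrand rows, using default_data_cv where no CV is given."""
    cv_by_integrand = data_cv_from_settings or {}
    return [
        {
            "integrand_id": i,
            "integrand_name": name,
            "minimum_meas_cv": cv_by_integrand.get(name, default_data_cv),
        }
        for i, name in enumerate(INTEGRAND_NAMES)
    ]


integrand_rows = build_integrand_rows({"prevalence": 0.2}, default_data_cv=0.1)
```

In this example only prevalence gets its own CV; the other three integrands receive the default.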

cascade_at.dismod.api.fill_extract_helpers.reference_tables.default_rate_table()[source]¶

Constructs the default rate table with rate names and ids.

Return type: DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_node_table(location_dag)[source]¶

Constructs the node table from a location DAG’s to_dataframe() method.

Parameters: location_dag (LocationDAG) – location hierarchy object
Return type: DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_covariate_table(covariates)[source]¶

Constructs the covariate table from a list of Covariate objects.

Return type: DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_density_table()[source]¶

Constructs the default density table.

Return type: DataFrame

Formatting Dismod Data Tables¶

There are helper functions to create data files. They are broken up into small functions to help with unit testing.

cascade_at.dismod.api.fill_extract_helpers.data_tables.prep_data_avgint(df, node_df, covariate_df)[source]¶

Preps both the data table and the avgint table by mapping locations to nodes and covariates to names.

Both tables go through the same function because the mapping is identical, but the function must be called separately for data and avgint because dismod requires different columns for each.

Parameters

df (DataFrame) – The data frame to map
node_df (DataFrame) – The node table from dismod db
covariate_df (DataFrame) – The covariate table from dismod db
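The mapping step can be sketched as follows. This is an illustration under stated assumptions, not the package’s implementation: the `c_location_id` and `c_covariate_name` columns are assumed names, covariate columns are renamed to `x_<covariate_id>`, and rows are plain dictionaries instead of DataFrames.

```python
from typing import Dict, List


def prep_rows(
    rows: List[Dict[str, object]],
    node_rows: List[Dict[str, object]],
    covariate_rows: List[Dict[str, object]],
) -> List[Dict[str, object]]:
    """Replace location IDs with node IDs and covariate names with x_<id> columns."""
    node_by_location = {n["c_location_id"]: n["node_id"] for n in node_rows}
    column_by_covariate = {
        c["c_covariate_name"]: f"x_{c['covariate_id']}" for c in covariate_rows
    }
    prepped = []
    for row in rows:
        new_row: Dict[str, object] = {}
        for key, value in row.items():
            if key == "location_id":
                new_row["node_id"] = node_by_location[value]
            else:
                # Covariate columns are renamed; every other column is kept as-is.
                new_row[column_by_covariate.get(key, key)] = value
        prepped.append(new_row)
    return prepped


# Hypothetical inputs.
nodes = [{"node_id": 0, "c_location_id": 1}, {"node_id": 1, "c_location_id": 102}]
covariates = [
    {"covariate_id": 0, "c_covariate_name": "s_sex"},
    {"covariate_id": 1, "c_covariate_name": "c_income"},
]
data = [{"location_id": 102, "meas_value": 0.01, "s_sex": 0.5, "c_income": 1.3}]
prepped_rows = prep_rows(data, nodes, covariates)
```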

cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_data_table(df, node_df, covariate_df, ages, times)[source]¶

Constructs the data table from input df.

Parameters

df (DataFrame) – data frame of inputs that have been prepped for dismod
node_df (DataFrame) – the dismod node table
covariate_df (DataFrame) – the dismod covariate table
ages (ndarray) – array of ages for the model
times (ndarray) – array of times for the model

cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_gbd_avgint_table(df, node_df, covariate_df, integrand_df, ages, times)[source]¶

Constructs the avgint table using the output df from the inputs.to_avgint() method.

Parameters

df (DataFrame) – The data frame to construct the avgint table from, that has things like ages, times, nodes (locations), sexes, etc.
node_df (DataFrame) – dismod node data frame
covariate_df (DataFrame) – dismod covariate data frame
integrand_df (DataFrame) – dismod integrand data frame
ages (ndarray) – array of ages for the model
times (ndarray) – array of times for the model

Return type

DataFrame

Formatting Grid Tables¶

There are helper functions to create grid tables in the dismod database. These are things like WeightGrid and SmoothGrid.

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_model_tables(model, location_df, age_df, time_df, covariate_df)[source]¶

Main function that loops through the items from a model object, which include rate, random_effect, alpha, beta, and gamma, and constructs the modeling tables in the dismod db.

Each of these is a “grid” var, so it needs entries in prior, smooth, and smooth_grid. This function returns those tables.

It also constructs the rate, integrand, and mulcov tables (alpha, beta, gamma), plus nslist and nslist_pair tables.

Parameters

model (Model) – A model object that has rate information
location_df (DataFrame) – A location / node data frame
age_df (DataFrame) – An age data frame for dismod
time_df (DataFrame) – A time data frame for dismod
covariate_df (DataFrame) – A covariate data frame for dismod

Returns

A dictionary of data frames for each table name, which includes the rate, prior, smooth, smooth_grid, mulcov, nslist, nslist_pair, and subgroup tables

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_weight_grid_tables(weights, age_df, time_df)[source]¶

Constructs the weight and weight_grid tables.

Parameters

weights (Dict[str, Var]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.
age_df – Age data frame from dismod db
time_df – Time data frame from dismod db

Returns

Tuple of the weight table and the weight grid table
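To make the relationship between the two returned tables concrete, here is a simplified sketch. It assumes each weight is a single constant value (the real input is a `Var` per weight), takes lists of age and time IDs instead of data frames, and uses column names assumed to match the dismod weight and weight_grid tables. The helper name is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Row = Dict[str, object]


def build_weight_tables(
    weights: Dict[str, float], age_ids: List[int], time_ids: List[int]
) -> Tuple[List[Row], List[Row]]:
    """Return (weight rows, weight_grid rows) for constant-valued weights."""
    weight_rows: List[Row] = []
    grid_rows: List[Row] = []
    for weight_id, (name, value) in enumerate(weights.items()):
        weight_rows.append(
            {
                "weight_id": weight_id,
                "weight_name": name,
                "n_age": len(age_ids),
                "n_time": len(time_ids),
            }
        )
        # One weight_grid row per (age, time) point of this weight.
        for age_id, time_id in product(age_ids, time_ids):
            grid_rows.append(
                {
                    "weight_grid_id": len(grid_rows),
                    "weight_id": weight_id,
                    "age_id": age_id,
                    "time_id": time_id,
                    "weight": value,
                }
            )
    return weight_rows, grid_rows


constant_weights = {
    "constant": 1.0,
    "susceptible": 1.0,
    "with_condition": 1.0,
    "total": 1.0,
}
weight_table, weight_grid_table = build_weight_tables(
    constant_weights, age_ids=[0, 1], time_ids=[0]
)
```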

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_subgroup_table()[source]¶

Constructs the default subgroup table. If we want to actually use the subgroup table, that functionality still needs to be built in.

Return type: DataFrame

Helper Functions¶

Posterior to Prior¶

“Posterior to prior” means taking the fit from a parent database and using the rate posteriors as the priors for the child fits. This happens in DismodFiller when it builds the two-level model with Alchemy, because it replaces the default priors with the ones passed in.

The posterior is passed down by predicting the parent model on the rate grid for the children. To construct the rate grid, we use the following function:

cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.get_prior_avgint_grid(grids, sexes, locations, midpoint=False)[source]¶

Get a data frame to use for setting up posterior predictions on a grid. The grids are specified in the grids parameter.

The result still needs covariates added to it, and must be passed through cascade_at.dismod.api.fill_extract_helpers.data_tables.prep_data_avgint to convert nodes and covariate names before it can be input into the avgint table in a database.

Parameters

grids (Dict[str, Dict[str, ndarray]]) – A dictionary of grids with keys for each integrand, which are dictionaries for “age” and “time”.
sexes (List[int]) – A list of sexes
locations (List[int]) – A list of locations
midpoint (bool) – Whether to midpoint the grid lower and upper values (recommended for rates).

Returns

Data frame with columns “avgint_id”, “integrand_id”, “location_id”, “weight_id”, “subgroup_id”, “age_lower”, “age_upper”, “time_lower”, “time_upper”, “sex_id”
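The grid expansion can be pictured as a cross product of ages, times, locations, and sexes. The sketch below is an approximation under assumptions: `midpoint=True` is taken to mean midpoints between consecutive grid points, lower and upper bounds are set equal at every point, the integrand is kept by name rather than ID, and the weight and subgroup columns are omitted. The helper names are hypothetical.

```python
from itertools import product
from typing import Dict, List


def midpoints(values: List[float]) -> List[float]:
    """Midpoints between consecutive grid values (empty for a one-point grid)."""
    return [(lo + hi) / 2 for lo, hi in zip(values[:-1], values[1:])]


def build_prior_grid(
    grids: Dict[str, Dict[str, List[float]]],
    sexes: List[int],
    locations: List[int],
    midpoint: bool = False,
) -> List[Dict[str, object]]:
    """Expand each integrand's age/time grid over all locations and sexes."""
    rows: List[Dict[str, object]] = []
    for integrand, grid in grids.items():
        ages = midpoints(grid["age"]) if midpoint else list(grid["age"])
        times = midpoints(grid["time"]) if midpoint else list(grid["time"])
        for age, time, location_id, sex_id in product(ages, times, locations, sexes):
            rows.append(
                {
                    "avgint_id": len(rows),
                    "integrand": integrand,
                    "location_id": location_id,
                    "sex_id": sex_id,
                    # Point predictions: lower and upper bounds coincide.
                    "age_lower": age,
                    "age_upper": age,
                    "time_lower": time,
                    "time_upper": time,
                }
            )
    return rows


example_grids = {"iota": {"age": [0.0, 10.0, 20.0], "time": [1990.0, 2000.0]}}
prior_rows = build_prior_grid(example_grids, sexes=[1, 2], locations=[102], midpoint=True)
```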

To upload those priors from the rate grid to the IHME databases, which require standard GBD ages and times, we use the following function. This is for visualization purposes only:

cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.format_rate_grid_for_ihme(rates, gbd_round_id, location_id, sex_id)[source]¶

Formats a grid of mean, upper, and lower for a prior rate for the IHME database. Only does this for Gaussian priors.

Parameters

rates (Dict[str, SmoothGrid]) – A dictionary of SmoothGrids, keyed by primary rates like “iota”
gbd_round_id (int) – the GBD round
location_id (int) – the location ID to append to this data frame
sex_id (int) – the sex ID to append to this data frame

Returns

A data frame formatted for the IHME databases

Multithreading¶

To multithread work on a dismod database, we can define a process that works on, for example, only a subset of the database’s data or samples. The base class below supports this; it is subclassed in sample and Predict, since both have tasks that can be done in parallel on one database.

class cascade_at.dismod.api.multithreading._DismodThread(main_db, index_file_pattern)[source]¶: Splits a dismod database into multiple databases to run parallel processes on the database. The work happens when you call an instantiated _DismodThread.

cascade_at.dismod.api.multithreading.dmdismod_in_parallel(dm_thread, sims, n_pool)[source]¶: Run a dismod thread in parallel by constructing a multiprocessing pool. A dismod thread is anything that is based on _DismodThread, so it has a __call__ method with an overridden _process method.