Fill and Extract Helpers
Two subclasses of DismodIO make it easier to work with the dismod databases in the cascade. One fills the tables in a meaningful way based on a model version’s settings, and the other extracts helpful data frames from them.
Dismod Filler
-
class cascade_at.dismod.api.dismod_filler.DismodFiller(path, settings_configuration, measurement_inputs, grid_alchemy, parent_location_id, sex_id, child_prior=None, mulcov_prior=None)
Bases: cascade_at.dismod.api.dismod_io.DismodIO
Sits on top of the DismodIO class, and takes everything from the collector module and puts them into the Dismod database tables in the correct construction.
Dismod Filler wraps a dismod database and fills all of the tables using the measurement inputs object, settings, and the grid alchemy constructor.
It optionally includes rate priors and covariate multiplier priors.
- Parameters
path (Union[str, Path]) – the path of the dismod database
settings_configuration (SettingsConfig) – the settings configuration object
measurement_inputs (MeasurementInputs) – the measurement inputs object
grid_alchemy (Alchemy) – the grid alchemy object
parent_location_id (int) – the parent location ID for this database
sex_id (int) – the reference sex for this database
child_prior (Optional[Dict[str, Dict[str, ndarray]]]) – a dictionary of child rate priors to use. The first level of the dictionary is the rate name, and the second is the type of prior, being value, dage, or dtime.
-
self.parent_child_model
Model that was constructed from grid_alchemy parameters for one specific parent and its descendants
Examples
>>> from pathlib import Path
>>> from cascade_at.dismod.api.dismod_filler import DismodFiller
>>> from cascade_at.model.grid_alchemy import Alchemy
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings

>>> settings = load_settings(BASE_CASE)
>>> inputs = MeasurementInputsFromSettings(settings)
>>> inputs.demographics.location_id = [102, 555]  # subset the locations to make it go faster
>>> inputs.get_raw_inputs()
>>> inputs.configure_inputs_for_dismod(settings)
>>> alchemy = Alchemy(settings)

>>> da = DismodFiller(path=Path('temp.db'),
...                   settings_configuration=settings,
...                   measurement_inputs=inputs,
...                   grid_alchemy=alchemy,
...                   parent_location_id=1,
...                   sex_id=3)
>>> da.fill_for_parent_child()
-
get_omega_df()
Get the correct omega data frame for this two-level model.
- Return type
DataFrame
-
get_parent_child_model()
Construct a two-level model that corresponds to this parent location ID and its children.
- Return type
-
calculate_reference_covariates()
Calculates reference covariate values based on the input object and the parent/sex we have in the two-level model. Modifies the baseline covariate specs object.
- Return type
-
fill_for_parent_child(**options)
Fills the Dismod database with inputs and a model construction for a parent location and its descendants.
Pass in optional keyword arguments to fill the option table with additional info or to override the defaults.
- Return type
None
-
node_id_from_location_id(location_id)
Get the node ID from a location ID in an already created node table.
- Return type
int
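As a rough illustration of this lookup, the sketch below maps a location ID to a node ID using a plain list of rows standing in for the node table. The c_location_id column name and the node_id_from_rows helper are assumptions made for this example; they are not taken from the DismodFiller implementation.

```python
from typing import Dict, List


def node_id_from_rows(node_rows: List[Dict[str, int]], location_id: int) -> int:
    """Return the node_id whose (assumed) c_location_id column matches location_id."""
    matches = [row["node_id"] for row in node_rows if row["c_location_id"] == location_id]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one node for location {location_id}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical node table: node IDs are consecutive integers, location IDs are not.
node_rows = [
    {"node_id": 0, "c_location_id": 1},
    {"node_id": 1, "c_location_id": 102},
    {"node_id": 2, "c_location_id": 555},
]
looked_up = node_id_from_rows(node_rows, location_id=555)
```

Raising on zero or multiple matches is a design choice for the sketch, so that a location missing from the node table fails loudly instead of returning a wrong node.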
-
fill_reference_tables()
Fills all of the reference tables including density, node, covariate, age, and time.
Dismod Extractor
-
class cascade_at.dismod.api.dismod_extractor.DismodExtractor(path)
Bases: cascade_at.dismod.api.dismod_io.DismodIO
Sits on top of the DismodIO class, and extracts helpful data frames from the dismod database tables.
- Parameters
path (str) – The database filepath
-
get_predictions(locations=None, sexes=None, samples=False, predictions=None)
Get the predictions from the predict table for locations and sexes. Returns a ‘mean’ column if samples is False, otherwise a ‘draw’ column, which can then be reshaped wide if necessary.
- Return type
DataFrame
-
gather_draws_for_prior_grid(location_id, sex_id, rates, value=True, dage=False, dtime=False, samples=True)
Takes draws and formats them for a prior grid for values, dage, and dtime. Assumes that age_lower == age_upper and time_lower == time_upper for all data rows. Not all of value, dage, and dtime may be needed, so pass False for any that you want to skip.
- Parameters
location_id (int) –
sex_id (int) –
rates (List[str]) – list of rates to get the draws for
value (bool) – whether to calculate value priors
dage (bool) – whether to calculate dage priors
dtime (bool) – whether to calculate dtime priors
samples (bool) – whether the prior came from samples
- Returns
Dictionary of 3-d arrays of value, dage, and dtime draws over age and time for this location and sex
-
format_predictions_for_ihme(gbd_round_id, locations=None, sexes=None, samples=False, predictions=None)
Formats predictions from the prediction table and returns either the mean or the draws, depending on whether samples is False or True.
- Parameters
locations (Optional[List[int]]) – A list of locations to extract from the predictions
sexes (Optional[List[int]]) – A list of sexes to extract from the predictions
gbd_round_id (int) – The GBD round ID to format the predictions for
samples (bool) – Whether the predictions have draws (samples) or are just one fit.
predictions (Optional[DataFrame]) – An optional data frame with the predictions to use rather than reading them directly from the database.
- Returns
Data frame with predictions formatted for the IHME databases.
Table Creation
The DismodFiller uses the following table creation functions internally.
Formatting Reference Tables
The dismod database needs some standard reference tables. These are made with the following functions.
-
cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_integrand_table(data_cv_from_settings=None, default_data_cv=0.0)
Constructs the integrand table and adds data CV in the minimum_meas_cv column.
- Parameters
data_cv_from_settings (optional dict) – key, value pairs that map integrands to data CV
default_data_cv (float) – default value for data CV to use
- Return type
DataFrame
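The interplay between the two arguments can be sketched as follows: every integrand gets default_data_cv unless data_cv_from_settings supplies its own value. This is a minimal sketch using plain dictionaries in place of a DataFrame. The sketch_integrand_table name, the shortened integrand list, and the integrand_id column are assumptions made for the example, not the library's code.

```python
from typing import Dict, List, Optional, Union

# A few dismod_at integrand names; the list is shortened for the example.
INTEGRAND_NAMES = ["Sincidence", "remission", "mtexcess", "prevalence", "mtall"]

Row = Dict[str, Union[int, str, float]]


def sketch_integrand_table(
    data_cv_from_settings: Optional[Dict[str, float]] = None,
    default_data_cv: float = 0.0,
) -> List[Row]:
    """Build one row per integrand, using the per-integrand CV where given."""
    overrides = data_cv_from_settings or {}
    return [
        {
            "integrand_id": integrand_id,
            "integrand_name": name,
            # Fall back to the default when the settings have no entry.
            "minimum_meas_cv": overrides.get(name, default_data_cv),
        }
        for integrand_id, name in enumerate(INTEGRAND_NAMES)
    ]


table = sketch_integrand_table({"prevalence": 0.2}, default_data_cv=0.1)
```

Here only the prevalence row receives 0.2, and all other rows keep the default of 0.1.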
-
cascade_at.dismod.api.fill_extract_helpers.reference_tables.default_rate_table()
Constructs the default rate table with rate names and ids.
- Return type
DataFrame
-
cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_node_table(location_dag)
Constructs the node table from a location DAG’s to_dataframe() method.
- Parameters
location_dag (LocationDAG) – location hierarchy object
- Return type
DataFrame
Formatting Dismod Data Tables
There are helper functions to create data files. They are broken up into small functions to help with unit testing.
-
cascade_at.dismod.api.fill_extract_helpers.data_tables.prep_data_avgint(df, node_df, covariate_df)
Preps both the data table and the avgint table by mapping locations to nodes and covariates to names.
Both are handled by the same function because the mapping is the same, but it needs to be called separately for data and avgint because dismod requires different columns for each.
- Parameters
df (DataFrame) – The data frame to map
node_df (DataFrame) – The node table from dismod db
covariate_df (DataFrame) – The covariate table from dismod db
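The mapping step described above can be sketched for a single row as follows. This is only an illustration under stated assumptions: the map_row_for_dismod helper, the lookup dictionaries, and the covariate names "sex" and "one" are hypothetical, and the x_0, x_1 column names follow dismod_at's convention of naming covariate columns by covariate ID.

```python
from typing import Any, Dict


def map_row_for_dismod(
    row: Dict[str, Any],
    location_to_node: Dict[int, int],
    covariate_to_column: Dict[str, str],
) -> Dict[str, Any]:
    """Swap location_id for node_id and rename covariate columns; copy the rest."""
    mapped: Dict[str, Any] = {}
    for key, value in row.items():
        if key == "location_id":
            mapped["node_id"] = location_to_node[value]
        elif key in covariate_to_column:
            mapped[covariate_to_column[key]] = value
        else:
            mapped[key] = value
    return mapped


# Hypothetical lookups that would be derived from the node and covariate tables.
location_to_node = {1: 0, 102: 1, 555: 2}
covariate_to_column = {"sex": "x_0", "one": "x_1"}

row = {"location_id": 102, "meas_value": 0.05, "sex": 0.5, "one": 1.0}
mapped_row = map_row_for_dismod(row, location_to_node, covariate_to_column)
```

The same mapping would apply to rows destined for either the data or the avgint table; only the set of remaining columns differs between the two.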
-
cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_data_table(df, node_df, covariate_df, ages, times)
Constructs the data table from input df.
- Parameters
df (DataFrame) – data frame of inputs that have been prepped for dismod
node_df (DataFrame) – the dismod node table
covariate_df (DataFrame) – the dismod covariate table
ages (ndarray) –
times (ndarray) –
-
cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_gbd_avgint_table(df, node_df, covariate_df, integrand_df, ages, times)
Constructs the avgint table using the output df from the inputs.to_avgint() method.
- Parameters
df (DataFrame) – The data frame to construct the avgint table from, that has things like ages, times, nodes (locations), sexes, etc.
node_df (DataFrame) – dismod node data frame
covariate_df (DataFrame) – dismod covariate data frame
integrand_df (DataFrame) – dismod integrand data frame
ages (ndarray) – array of ages for the model
times (ndarray) – array of times for the model
- Return type
DataFrame
Formatting Grid Tables
There are helper functions to create grid tables in the dismod database. These are things like WeightGrid and SmoothGrid.
-
cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_model_tables(model, location_df, age_df, time_df, covariate_df)
Main function that loops through the items from a model object, which include rate, random_effect, alpha, beta, and gamma, and constructs the modeling tables in dismod db.
Each of these is a “grid” var, so it needs entries in prior, smooth, and smooth_grid. This function returns those tables.
It also constructs the rate, integrand, and mulcov tables (alpha, beta, gamma), plus nslist and nslist_pair tables.
- Parameters
model (Model) – A model object that has rate information
location_df (DataFrame) – A location / node data frame
age_df (DataFrame) – An age data frame for dismod
time_df (DataFrame) – A time data frame for dismod
covariate_df (DataFrame) – A covariate data frame for dismod
- Returns
A dictionary of data frames keyed by table name, including the rate, prior, smooth, smooth_grid, mulcov, nslist, nslist_pair, and subgroup tables
-
cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_weight_grid_tables(weights, age_df, time_df)
Constructs the weight and weight_grid tables.
- Parameters
weights (Dict[str, Var]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.
age_df – Age data frame from dismod db
time_df – Time data frame from dismod db
- Returns
Tuple of the weight table and the weight grid table
Helper Functions
Posterior to Prior
“Posterior to prior” means taking the fit from a parent database and using the rate posteriors as the priors for the child fits. This happens in DismodFiller when it builds the two-level model with Alchemy, because it replaces the default priors with the ones passed in.
The posterior is passed down by predicting the parent model on the rate grid for the children. To construct the rate grid, we use the following function:
-
cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.get_prior_avgint_grid(grids, sexes, locations, midpoint=False)
Get a data frame to use for setting up posterior predictions on a grid. The grids are specified in the grids parameter.
The result still needs covariates added to it, and needs to go through dismod.api.data_tables.prep_data_avgint to convert nodes and covariate names, before it can be input into the avgint table in a database.
- Parameters
grids (Dict[str, Dict[str, ndarray]]) – A dictionary of grids with keys for each integrand, which are dictionaries for “age” and “time”.
sexes (List[int]) – A list of sexes
locations (List[int]) – A list of locations
midpoint (bool) – Whether to midpoint the grid lower and upper values (recommended for rates).
- Returns
Data frame with columns “avgint_id”, “integrand_id”, “location_id”, “weight_id”, “subgroup_id”, “age_lower”, “age_upper”, “time_lower”, “time_upper”, “sex_id”
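To give a feel for the shape of such a prediction grid, the sketch below expands each grid over every combination of age, time, sex, and location, with age_lower == age_upper and time_lower == time_upper at each grid point. It is a simplified illustration with plain dictionaries, not the library's implementation: the sketch_prior_avgint_rows helper and the "iota" key are assumptions, the midpoint option is not modeled, and the integrand_id, weight_id, and subgroup_id columns are omitted.

```python
from itertools import product
from typing import Any, Dict, List, Sequence


def sketch_prior_avgint_rows(
    grids: Dict[str, Dict[str, Sequence[float]]],
    sexes: Sequence[int],
    locations: Sequence[int],
) -> List[Dict[str, Any]]:
    """One row per (grid key, age, time, sex, location) combination."""
    rows: List[Dict[str, Any]] = []
    for name, grid in grids.items():
        for age, time, sex_id, location_id in product(
            grid["age"], grid["time"], sexes, locations
        ):
            rows.append({
                "avgint_id": len(rows),  # running row index
                "name": name,
                "location_id": location_id,
                "sex_id": sex_id,
                # Point predictions: lower and upper bounds coincide.
                "age_lower": age,
                "age_upper": age,
                "time_lower": time,
                "time_upper": time,
            })
    return rows


# Hypothetical grid with 3 ages and 2 times, for 2 sexes and 1 location.
grids = {"iota": {"age": [0.0, 50.0, 100.0], "time": [1990.0, 2020.0]}}
rows = sketch_prior_avgint_rows(grids, sexes=[1, 2], locations=[102])
```

With 3 ages, 2 times, 2 sexes, and 1 location this produces 12 rows, which is the number of predictions the parent model would need to make for that grid.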
To upload those priors from the rate grid to the IHME databases, which require standard GBD ages and times, we use the following function. This is for visualization purposes only:
-
cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.format_rate_grid_for_ihme(rates, gbd_round_id, location_id, sex_id)
Formats a grid of mean, upper, and lower for a prior rate for the IHME database. Only does this for Gaussian priors.
- Parameters
rates (Dict[str, SmoothGrid]) – A dictionary of SmoothGrids, keyed by primary rates like “iota”
gbd_round_id (int) – the GBD round
location_id (int) – the location ID to append to this data frame
sex_id (int) – the sex ID to append to this data frame
- Returns
A data frame formatted for the IHME databases
Multithreading
When we want to do multithreading on a dismod database, we can define a process that works on, for example, only a subset of the database’s data or samples. To support this, there is a base class here that is subclassed in sample and Predict, since these are tasks that can be done in parallel on one database.
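The general pattern can be sketched as a callable base class whose subclasses process one subset of indices, plus a dispatcher that splits the work and runs the pieces in parallel. All names here (SubsetProcess, SumSamples, split_indices, run_in_parallel) are hypothetical, and the sketch uses a thread pool from the standard library for simplicity; the actual base class and its parallelism mechanism may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


class SubsetProcess:
    """Hypothetical base class: one unit of work on a subset of a database."""

    def __call__(self, indices: Sequence[int]) -> float:
        raise NotImplementedError


class SumSamples(SubsetProcess):
    """Toy subclass that sums the samples at the given indices."""

    def __init__(self, samples: Sequence[float]) -> None:
        self.samples = samples

    def __call__(self, indices: Sequence[int]) -> float:
        return sum(self.samples[i] for i in indices)


def split_indices(n_items: int, n_chunks: int) -> List[List[int]]:
    """Split range(n_items) into up to n_chunks contiguous, near-equal chunks."""
    size, remainder = divmod(n_items, n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        stop = start + size + (1 if k < remainder else 0)
        if stop > start:  # skip empty chunks when n_chunks > n_items
            chunks.append(list(range(start, stop)))
        start = stop
    return chunks


def run_in_parallel(process: SubsetProcess, n_items: int, n_workers: int) -> List[float]:
    """Apply the process to each chunk of indices, keeping chunk order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(process, split_indices(n_items, n_workers)))


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
partial_sums = run_in_parallel(SumSamples(samples), len(samples), n_workers=2)
```

Each worker only sees its own chunk of indices, so the results can be combined afterwards without the workers coordinating with each other.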