Cascade-AT
Cascade-AT estimates incidence, prevalence, and mortality rates for a single disease, for every country or district in the world, from all available disease and death observations across countries.
This page documents the API and explains some of the Dismod-AT concepts. It is meant for developers, not modelers. If you are a Dismod-AT modeler, please see the internal documentation here.
Installation
Installation of Cascade-AT
Cascade-AT wraps Dismod-AT, running it within the IHME infrastructure. Clone it from Cascade-AT on GitHub.
We recommend you create a conda environment into which to install the code. Then clone the repository and run the tests:

    git clone https://github.com/ihmeuw/cascade-at.git
    cd cascade-at
    pip install ".[ihme,docs]"
    python setup.py develop
    cd tests && pytest

NOTE: The [ihme,docs] extras are optional.
NOTE: The above steps are intended for installing on the cluster. When working on a local machine, replace python setup.py develop with python setup.py install.
For instructions on how to install all of the IHME dependencies, see the internal documentation here.
For instructions on how to install dismod_at, see Brad Bell's documentation here.
Module Documentation
This is the documentation for each of the modules in the cascade-at package. Here is an overview of how all of the modules fit together:

Defining and Sequencing the Work
Cascade-AT performs work by calling scripts that are included as entry points to the package.
Scripts as Building Blocks
The work in a Cascade-AT model is sequenced like building blocks, moving from micro to macro; each type of building block is summarized at the end of this section.
The following submodules contain classes and functions for constructing a job graph that runs Dismod-AT. The smallest unit is a cascade operation, which defines one executable task. Operations can be stacked together into sequences (stacks), and then recursively assembled into a tree structure (dags). The cascade commands are wrappers around the dags.
For documentation of the current "traditional cascade" implementation, see TraditionalCascade.
Cascade Operations

class cascade_at.cascade.cascade_operations._CascadeOperation(upstream_commands=None, executor_parameters=None)
    Bases: object
    The base class for a cascade operation.
    Parameters:
        upstream_commands (Optional[List[str]]) – A list of commands that are upstream of this operation, meaning they will be run before it.
        executor_parameters (Optional[Dict[str, Any]]) – An optional dictionary of execution parameters that updates DEFAULT_EXECUTOR_PARAMETERS.

    _make_template_kwargs(**kwargs)
        Takes kwargs like model_version_id=0 and turns them into a kwargs dict that looks like {'model_version_id': '--model-version-id 0'}. For boolean args, the result looks like {'do_this': '--do-this'}. For arguments from self.arg_list that have defaults, the default value is filled in if it is not passed in the kwargs (unless it is None). Used for converting arguments into Jobmon TaskTemplates.
        Parameters:
            kwargs – Keyword arguments.
        Return type: Dict[str, str]
        Returns: A dictionary of keyword arguments similar to what was passed, but with values converted to what the TaskTemplate in Jobmon expects, and with defaults filled in for arguments that are listed in the ArgumentList for self but not passed.
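To make the conversion concrete, here is a minimal sketch of the kwarg-to-flag translation described above; it is an illustrative re-implementation, not the package's actual code, and it omits the default filling from the argument list:

    def make_template_kwargs(**kwargs):
        """Translate Python kwargs into the flag strings Jobmon TaskTemplates expect."""
        result = {}
        for name, value in kwargs.items():
            flag = "--" + name.replace("_", "-")
            if isinstance(value, bool):
                # Boolean args become bare flags: do_this=True -> '--do-this'
                result[name] = flag if value else ""
            else:
                # Valued args become 'flag value': model_version_id=0 -> '--model-version-id 0'
                result[name] = f"{flag} {value}"
        return result

    print(make_template_kwargs(model_version_id=0, do_this=True))
    # {'model_version_id': '--model-version-id 0', 'do_this': '--do-this'}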
Cascade Operation Sequences
Cascade Operation Stacking Functions
These functions make sequences of _CascadeOperation objects with the appropriate upstream dependencies. They can then be used together to create a _CascadeCommand.
cascade_at.cascade.cascade_stacks.single_fit(model_version_id, location_id, sex_id)
    Create a sequence of tasks to do a single fit both model: configures inputs, does a fit fixed, then fit both, then predicts and uploads the result. Will fit the model based on the settings attached to the model version ID.
    Parameters:
        model_version_id (int) – The model version ID.
        location_id (int) – The parent location ID to run the model for.
        sex_id (int) – The sex ID to run the model for.
    Returns: A list of CascadeOperations.
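A hypothetical call, with illustrative IDs:

    from cascade_at.cascade.cascade_stacks import single_fit

    operations = single_fit(model_version_id=0, location_id=101, sex_id=2)
    # `operations` is an ordered list of cascade operations, each upstream of the next.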
cascade_at.cascade.cascade_stacks.single_fit_with_uncertainty(model_version_id, location_id, sex_id, n_sim=100, n_pool=20, skip_configure=False, ode_fit_strategy=True)
    Create a sequence of tasks to do a single fit both model: configures inputs, does a fit fixed, then fit both, then predicts and uploads the result. Will fit the model based on the settings attached to the model version ID.
    Parameters:
        model_version_id (int) – The model version ID.
        location_id (int) – The parent location ID to run the model for.
        sex_id (int) – The sex ID to run the model for.
        n_sim (int) – The number of simulations to do, i.e. the number of draws to make.
        n_pool (int) – The number of multiprocessing pools to use in creating the draws.
    Returns: A list of CascadeOperations.
cascade_at.cascade.cascade_stacks.root_fit(model_version_id, location_id, sex_id, child_locations, child_sexes, skip_configure=False, mulcov_stats=True, n_sim=100, n_pool=20, ode_fit_strategy=True)
    Create a sequence of tasks to do a top-level prior fit. Does a fit fixed, then fit both, then creates posteriors that can be used as priors later on. Saves its fit to be uploaded.
    Parameters:
        model_version_id (int) – The model version ID.
        location_id (int) – The parent location ID to run the model for.
        sex_id (int) – The sex ID to run the model for.
        child_locations (List[int]) – The children to fill the avgint table with.
        child_sexes (List[int]) – The sexes to predict for.
        skip_configure (bool) – Don't run a task to configure the inputs. Only do this if configuration has already happened; it disables building the inputs.p and settings.json files.
        mulcov_stats (bool) – Compute mulcov statistics at this level.
        n_sim (int) – The number of simulations to do.
        n_pool (int) – The number of multiprocessing pools to use.
    Returns: A list of CascadeOperations.
cascade_at.cascade.cascade_stacks.branch_fit(model_version_id, location_id, sex_id, prior_parent, prior_sex, child_locations, child_sexes, upstream_commands=None, n_sim=100, n_pool=20, ode_fit_strategy=False)
    Create a sequence of tasks to do a mid-level cascade fit. Does a fit fixed, then fit both, then predicts on the prior rate grid to create posteriors that can be used as priors later on. Saves its fit to be uploaded.
    Parameters:
        model_version_id (int) – The model version ID.
        location_id (int) – The parent location ID to run the model for.
        sex_id (int) – The sex ID to run the model for.
        prior_parent (int) – The location ID corresponding to the database to pull the prior from.
        prior_sex (int) – The sex ID corresponding to the database to pull the prior from.
        child_locations (List[int]) – The children to fill the avgint table with.
        child_sexes (List[int]) – The sexes to predict for.
        upstream_commands (Optional[List[str]]) – Commands that need to be run before this stack.
    Returns: A list of CascadeOperations.
cascade_at.cascade.cascade_stacks.leaf_fit(model_version_id, location_id, sex_id, prior_parent, prior_sex, n_sim=100, n_pool=20, upstream_commands=None, ode_fit_strategy=False)
    Create a sequence of tasks for a leaf-node fit (no children). Does a fit fixed, then sample simulate to create posteriors. Saves its fit to be uploaded.
    Parameters:
        model_version_id (int) – The model version ID.
        location_id (int) – The parent location ID to run the model for.
        sex_id (int) – The sex ID to run the model for.
        prior_parent (int) – The location ID corresponding to the database to pull the prior from.
        prior_sex (int) – The sex ID corresponding to the database to pull the prior from.
        n_sim (int) – The number of simulations to do to get the posterior fit.
        n_pool (int) – The number of pools to use to do the simulation fits.
        upstream_commands (Optional[List[str]]) – Commands that need to be run before this stack.
    Returns: A list of CascadeOperations.
Cascade Job Graphs

cascade_at.cascade.cascade_dags.branch_or_leaf(dag, location_id, sex, model_version_id, parent_location, parent_sex, n_sim, n_pool, upstream, tasks)
    Recursive function that creates either a branch (by calling itself) or a leaf fit, depending on whether or not it is at a terminal node. Determines whether it is at a terminal node using the dag.successors() method from networkx. Appends tasks onto the tasks parameter.

cascade_at.cascade.cascade_dags.make_cascade_dag(model_version_id, dag, location_start, sex_start, split_sex, n_sim=100, n_pool=100, skip_configure=False)
    Make a traditional cascade dag for a model version. Relies on a location DAG and a starting point in the DAG for locations and sexes.
    Parameters:
        model_version_id (int) – Model version ID.
        dag (LocationDAG) – A location DAG that specifies the location hierarchy.
        location_start (int) – Where to start in the location hierarchy.
        sex_start (int) – Which sex to start with; can be most detailed or both.
        split_sex (bool) – Whether or not to split sex into most detailed. If not, the cascade stays at 'both' sex.
        n_sim (int) – Number of simulations to do in sample simulate.
        n_pool (int) – Number of multiprocessing pools to create during sample simulate.
        skip_configure (bool) – Don't configure inputs. Only do this if it has already been done.
    Returns: A list of _CascadeOperation.
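For example, a sketch of building the whole job graph; here `dag` is assumed to be a LocationDAG constructed elsewhere from an IHME location hierarchy, and the IDs are illustrative:

    from cascade_at.cascade.cascade_dags import make_cascade_dag

    tasks = make_cascade_dag(
        model_version_id=0,
        dag=dag,             # a LocationDAG, assumed to exist
        location_start=1,    # 1 = Global in the IHME hierarchy
        sex_start=3,         # 3 = Both
        split_sex=True,
    )
    # `tasks` is the list of _CascadeOperation for the entire cascade.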
Cascade Commands
Sequences of cascade operations that work together to create a cascade command that will run the whole cascade (or a drill, which is a restricted version of the cascade).
class cascade_at.cascade.cascade_commands._CascadeCommand
    Bases: object
    Initializes a task dictionary. All tasks added to this command, in the form of cascade operations, are added to the dictionary.

    task_dict
        A dictionary of cascade operations, keyed by the command for that operation, so that a task can be looked up later by its exact command.

    add_task(cascade_operation)
        Adds a cascade operation to the task dictionary.
        Parameters:
            cascade_operation (_CascadeOperation) – A cascade operation to add to the command dictionary.
        Return type: None
class cascade_at.cascade.cascade_commands.Drill(model_version_id, drill_parent_location_id, drill_sex, n_sim, n_pool=10, skip_configure=False)
    Bases: cascade_at.cascade.cascade_commands._CascadeCommand
    A cascade command that runs a drill model, meaning it runs one Dismod-AT model for a parent plus its children.
    Parameters:
        model_version_id (int) – The model version ID to create the drill for.
        drill_parent_location_id (int) – The parent location ID to start the drill from.
        drill_sex (int) – Which sex to drill for.
        n_sim (int) – The number of simulations to do to get uncertainty at the leaf nodes.
        n_pool (int) – The number of threads to create in a multiprocessing pool. If this is 1, multiprocessing is not used.
class cascade_at.cascade.cascade_commands.TraditionalCascade(model_version_id, split_sex, dag, n_sim, n_pool=10, location_start=None, sex=None, skip_configure=False)
    Bases: cascade_at.cascade.cascade_commands._CascadeCommand
    Runs the "traditional" dismod cascade. The traditional cascade as implemented here runs fit fixed all the way to the leaf nodes of the cascade to save time (rather than fit both). To get posterior-to-prior, it uses the coefficient of variation to get the variance of the posterior that becomes the prior at the next level. At the leaf nodes, to get final posteriors, it does sample asymptotic; if sample asymptotic fails due to bad constraints, it does sample simulate instead.
    Parameters:
        model_version_id (int) – The model version ID.
        split_sex (bool) – Whether or not to split sex.
        dag (LocationDAG) – A location dag that specifies the structure of the cascade hierarchy.
        n_sim (int) – The number of simulations to do to get uncertainty at the leaf nodes.
        n_pool (int) – The number of threads to create in a multiprocessing pool. If this is 1, multiprocessing is not used.
        location_start (Optional[int]) – Which location to start the cascade from (typically 1 = Global).
        sex (Optional[int]) – Which sex to run the cascade for. If it is 3 = Both, the cascade will split sex; if it is 1 or 2, it will only run for that sex.
        skip_configure (bool) – Skip the initial inputs pulling; should only be used in debugging cases by developers.
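A sketch of constructing this command; again, `dag` is an assumed LocationDAG and the IDs are illustrative:

    from cascade_at.cascade.cascade_commands import TraditionalCascade

    cascade = TraditionalCascade(
        model_version_id=0,
        split_sex=True,
        dag=dag,            # a LocationDAG, assumed to exist
        n_sim=100,
        n_pool=10,
        location_start=1,   # 1 = Global
        sex=3,              # 3 = Both, so the cascade splits sex on the way down
    )
    # cascade.task_dict maps each command string to its cascade operation.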
To recap, each type of building block is described briefly below, from micro to macro.
Scripts: All potential work starts out as a script.
Cascade Operations: We build wrappers around the scripts, called cascade operations. These wrappers are helpful because they define the command-line string that will be executed to perform the work (calling the script with particular arguments), the name of the job if it is submitted through a qsub, etc. They also interface directly with jobmon, an IHME package that submits and tracks parallel jobs.
Cascade Operation Sequences: Some sequences of work often go together, such as running a fit fixed, then a sample, then a predict. These sequences are called stacks, because they are "stacks" of cascade operations.
Cascade Job Graphs: Once we take many sequences and form them into a tree-like structure that traverses a location hierarchy, that is a DAG, or job graph. The structure of this DAG is based on an IHME location hierarchy, and it defines the work for the entire cascade. The dags module provides functions to, for example, recursively create stacks going down a tree.
Cascade Commands: This is the most "macro" type of work. You say "I want to do a cascade" or "I want to do a drill" by creating a cascade command, and it works its way through DAGs -> stacks -> operations -> scripts to define all of the work, with arguments based on the settings of the model version ID that you pass to the cascade command.
Arguments
Each of the scripts takes some arguments that are pre-defined using the tools documented in Argument Parsing.
Argument Parsing
Each of the scripts from Defining and Sequencing the Work uses the argument utilities described here. Arguments are single command-line args, passed in as flags like --do-this-thing or --location-id 101.
We use the argparse package to interpret these arguments and to define which arguments are allowed for which scripts.
Arguments are building blocks for argument lists. Each script has an argument list, included at the top of the script, that defines the arguments that can be passed to it.
Arguments
There are general arguments and specific arguments that we define here so we don't have to define them over and over.

exception cascade_at.executor.args.args.CascadeArgError
    Bases: cascade_at.core.CascadeATError

class cascade_at.executor.args.args.IntArg(*args, **kwargs)
    Bases: cascade_at.executor.args.args._Argument
    An integer argument.

class cascade_at.executor.args.args.FloatArg(*args, **kwargs)
    Bases: cascade_at.executor.args.args._Argument
    A float argument.

class cascade_at.executor.args.args.StrArg(*args, **kwargs)
    Bases: cascade_at.executor.args.args._Argument
    A string argument.

class cascade_at.executor.args.args.BoolArg(*args, **kwargs)
    Bases: cascade_at.executor.args.args._Argument
    A boolean argument.

class cascade_at.executor.args.args.ListArg(*args, **kwargs)
    Bases: cascade_at.executor.args.args._Argument
    A list argument. Passed to argparse as an argument with nargs='+'.

class cascade_at.executor.args.args.ModelVersionID
    Bases: cascade_at.executor.args.args.IntArg
    The Model Version ID argument. This is the only task argument, meaning an argument that makes the commands it is used in unique across workflows.

class cascade_at.executor.args.args.ParentLocationID
    Bases: cascade_at.executor.args.args.IntArg
    A parent location ID argument.

class cascade_at.executor.args.args.SexID
    Bases: cascade_at.executor.args.args.IntArg
    A sex ID argument.

class cascade_at.executor.args.args.DmCommands
    Bases: cascade_at.executor.args.args.ListArg
    A dismod commands argument, based on the list argument.

class cascade_at.executor.args.args.DmOptions
    Bases: cascade_at.executor.args.args.ListArg
    A dismod options argument, based on the list argument. Options are passed in as a list whose elements look like KEY=VALUE=TYPE. So, if you wanted the options to look like {'kind': 'random'}, you would pass kind=random=str on the command line.

class cascade_at.executor.args.args.NSim
    Bases: cascade_at.executor.args.args.IntArg
    Number of simulations argument. Defaults to 1.

class cascade_at.executor.args.args.NPool
    Bases: cascade_at.executor.args.args.IntArg
    Number of threads for a multiprocessing pool argument. Defaults to 1, which means no multiprocessing.

class cascade_at.executor.args.args.LogLevel
    Bases: cascade_at.executor.args.args.StrArg
    Logging level argument. Defaults to "info".
Argument List
Argument lists are made up of arguments and are defined at the top of each of the Defining and Sequencing the Work scripts. They are helpful because we can use them to parse command-line arguments and, at the same time, to validate arguments in Cascade Operations, which makes building new cascade operations much less error-prone. An argument list also has a method to convert itself into a task template command for Utilizing Jobmon. A hedged sketch follows.
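As a rough illustration, an argument list at the top of a script might look like the following; the ArgumentList container and its import location are assumptions inferred from the text above, not verified against the source:

    from cascade_at.executor.args.arg_utils import ArgumentList  # location assumed
    from cascade_at.executor.args.args import (
        LogLevel, ModelVersionID, ParentLocationID, SexID,
    )

    # Defines which flags this hypothetical script accepts.
    ARG_LIST = ArgumentList([
        ModelVersionID(),
        ParentLocationID(),
        SexID(),
        LogLevel(),
    ])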
Argument Encoding
When we define arguments to an operation, we don't want to write them as we would on the command line, especially for things like dictionaries of dismod options and lists of dismod database commands.
The following functions encode and decode the dismod option dictionaries and dismod commands that are run on a dismod database.

cascade_at.executor.args.arg_utils.encode_options(options)
    Encode an option dict into a command-line string that cascade_at can understand.
    Returns: A list of strings that can be passed on the command line.

cascade_at.executor.args.arg_utils.parse_options(option_list)
    Parse a list of key=value=type command-line args.
    Returns: A dictionary of options with the correct types.
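A minimal sketch of the key=value=type parsing that parse_options performs; the logic is inferred from the docstrings rather than copied from the source:

    def parse_options(option_list):
        """Parse ['kind=random=str', 'n=5=int'] into {'kind': 'random', 'n': 5}."""
        casts = {"str": str, "int": int, "float": float}
        options = {}
        for item in option_list:
            key, value, type_name = item.split("=")
            options[key] = casts[type_name](value)
        return options

    print(parse_options(["kind=random=str"]))  # {'kind': 'random'}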
Jobmon
The submitting and tracking of the distributed jobs in a cascade is done by the IHME package jobmon. Cascade Operations are roughly jobmon tasks, and Cascade Commands are roughly jobmon workflows.
We have to convert between cascade operations and jobmon tasks, and between cascade commands and jobmon workflows. Helper functions for these conversions are documented in Utilizing Jobmon.
Jobmon uses information from cascade operations and cascade commands to interface directly with the IHME cluster and the Jobmon databases. See Utilizing Jobmon.
Utilizing Jobmon
Unfortunately, we can't document these functions here because jobmon is not yet open source, so the sphinx-autodoc extension won't work. To be continued once it is released; for now, please see the source code directly here.
Jobmon Workflows
At the highest level, we need to make a workflow from a Cascade Command. This uses the Jobmon Guppy version, which allows us to create "task templates". In the Guppy terminology, a Cascade-AT workflow is considered to come from a dismod-at "tool".
Resources
Using jobmon requires some knowledge of the amount of cluster resources a job will use. Right now, there is no resource prediction algorithm implemented in Cascade-AT. The base resources are the same for all jobs; some are then increased or decreased for specific tasks, as options passed to _CascadeOperation.
Entry Points
Each of these scripts takes arguments, defined at the top of the script. Here we list the different types of work that are done; each section includes:
The main function in the script, with documentation.
The cascade operation associated with that script.
They are listed in the order in which they typically occur when running a Cascade-AT model from start to finish, with the exception of Run a Cascade-AT Model, which is how all of this work is kicked off in the first place.
Run a Cascade-AT Model
Run a Cascade-AT model from start to finish using the run cascade function. All of the tasks that it constructs can be found in the scripts linked from Defining and Sequencing the Work.

cascade_at.executor.run.run(model_version_id, jobmon=True, make=True, n_sim=10, n_pool=10, addl_workflow_args=None, skip_configure=False, json_file=None, test_dir=None, execute_dag=True)
    Runs the whole cascade or drill for a model version (whichever is specified in the model version settings).
    Creates a cascade command and a set of cascade operations based on the model version settings. More information on this structure is in Defining and Sequencing the Work.
    Parameters:
        model_version_id (int) – The model version to run.
        jobmon (bool) – Whether or not to use Jobmon. If not using Jobmon, executes the commands in sequence in this session.
        make (bool) – Whether or not to make the directory structure for the databases, inputs, and outputs.
        n_sim (int) – Number of simulations to do going down the cascade.
        addl_workflow_args (Optional[str]) – Additional workflow args to add to the jobmon workflow name so that it is unique if you're testing.
        skip_configure (bool) – Skip configuring the inputs (only do this if they have already been configured).
    Return type: None
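A hypothetical invocation of this entry point, with an illustrative model version ID:

    from cascade_at.executor.run import run

    run(
        model_version_id=0,  # illustrative
        jobmon=True,         # submit through Jobmon rather than running in-process
        make=True,           # build the directory structure first
        n_sim=10,
    )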
Configure Inputs
Configure inputs for a Cascade-AT model.
Inputs Script

cascade_at.executor.configure_inputs.configure_inputs(model_version_id, make, configure, test_dir=None, json_file=None)
    Grabs the inputs for a specific model version ID, sets up the folder structure, pickles the inputs object, and writes the settings json for use later on. Also uploads CSMR to the database attached to the model version, if applicable.
    Optionally use a json file for settings instead of the model version ID's json file.
    Parameters:
        model_version_id (int) – The model version ID to configure inputs for.
        make (bool) – Whether or not to make the directory structure for the model version ID.
        configure (bool) – Configure the application for the IHME cluster; otherwise use test_dir for the directory tree instead.
        test_dir (Optional[str]) – A test directory to use rather than the directory specified by the model version context in the IHME file system.
        json_file (Optional[str]) – An optional filepath pointing to a different json than the one attached to the model_version_id, to be used for settings instead.
    Return type: None
Inputs Cascade Operation
Dismod Database Creation and Commands
This is the script we use when we want to fill a dismod database with data for a model and then run commands on it.
We fill and extract dismod databases using the Fill and Extract Helpers classes and functions. The databases are filled according to their settings and the arguments passed to these scripts, such as whether to override the prior in the settings with a parent prior (this is called "posterior to prior") or whether to add a covariate multiplier prior.
Dismod Database Script

cascade_at.executor.dismod_db.dismod_db(model_version_id, parent_location_id, sex_id=None, dm_commands=[], dm_options={}, prior_samples=False, prior_parent=None, prior_sex=None, prior_mulcov_model_version_id=None, test_dir=None, fill=False, save_fit=True, save_prior=True)
    Creates a dismod database using the saved inputs and the file structure specified in the context. Alternatively, it skips the filling stage and moves straight to the command stage if you don't pass --fill.
    Then runs an optional set of commands on the database, passed in the --commands argument.
    Also passes an optional --options argument as a dictionary to the dismod database to fill or modify the options table.
    Parameters:
        model_version_id (int) – The model version ID.
        parent_location_id (int) – The parent location for the database.
        sex_id (Optional[int]) – The parent sex for the database.
        dm_commands (List[str]) – A list of commands to pass to the run_dismod_commands function, executed directly on the dismod database.
        dm_options (Dict[str, Union[int, float, str]]) – A dictionary of options to pass to the dismod option table.
        prior_samples (bool) – Whether or not the prior was derived from samples.
        prior_mulcov_model_version_id (Optional[int]) – The model version ID to use for pulling covariate multiplier statistics as priors for this fit.
        prior_parent (Optional[int]) – An optional parent location ID that specifies where to pull the prior information from.
        prior_sex (Optional[int]) – An optional parent sex ID that specifies where to pull the prior information from.
        test_dir (Optional[str]) – A test directory to create the database in, rather than the location specified by the IHME file system context.
        fill (bool) – Whether or not to fill the database with new inputs based on the model_version_id, parent_location_id, and sex_id. If not filling, this script can be used to just execute commands on the database.
        save_fit (bool) – Whether or not to save the fit from this database as the parent fit.
        save_prior (bool) – Whether or not to save the prior for the parent as the parent's prior.
    Return type: None
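A hedged sketch of filling a database and running a fit on it. The IDs are illustrative; the command strings follow Dismod-AT's verbs and the option name is a standard Dismod-AT option, but neither is verified against this script's exact expectations:

    from cascade_at.executor.dismod_db import dismod_db

    dismod_db(
        model_version_id=0,
        parent_location_id=101,
        sex_id=2,
        fill=True,                               # build the database from cached inputs
        dm_commands=["init", "fit fixed"],       # run these directly on the database
        dm_options={"max_num_iter_fixed": 100},  # written to the option table
    )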
cascade_at.executor.dismod_db.save_predictions(db_file, model_version_id, gbd_round_id, out_dir, locations=None, sexes=None, sample=False, predictions=None)
    Save the fit from this dismod database for a specific location and sex, to be uploaded later on.
    Return type: None

cascade_at.executor.dismod_db.fill_database(path, settings, inputs, alchemy, parent_location_id, sex_id, child_prior, mulcov_prior, options)
    Fill a DisMod database at the specified path with the inputs, model, and settings specified, for a specific parent location and sex ID, with options to override the priors.

cascade_at.executor.dismod_db.get_mulcov_priors(model_version_id)
    Read in covariate multiplier statistics from a specific model version ID and return a dictionary with a prior object keyed by covariate multiplier type, covariate name, and rate or integrand.
    Parameters:
        model_version_id (int) – The model version ID to pull covariate multiplier statistics from.
    Return type: Dict[Tuple[str, str, str], _Prior]
Dismod Database Cascade Operation

class cascade_at.cascade.cascade_operations._DismodDB(model_version_id, parent_location_id, sex_id, fill, prior_samples=False, prior_mulcov=False, prior_parent=None, prior_sex=None, dm_options=None, dm_commands=None, save_prior=False, save_fit=False, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._CascadeOperation
    Base class for creating an operation that interfaces with the dismod database.
    Parameters:
        model_version_id (int) – The model version to run the model for.
        parent_location_id (int) – The parent location for this dismod database.
        sex_id (int) – The reference sex for this dismod database.
        fill (bool) – Whether or not to fill this database with new data based on the cached inputs for this model version.
        prior_samples (bool) – Whether the prior came from samples or just a mean fit.
        prior_mulcov (bool) – The model version ID where the covariate multiplier statistics are saved. If this is included, a prior is added for the covariate multiplier(s) associated with this model version ID.
        prior_parent (Optional[int]) – The location ID of the parent database to grab the prior from.
        prior_sex (Optional[int]) – The sex ID of the parent database to grab the prior from.
        dm_options (Optional[Dict[str, Union[int, float, str]]]) – Additional options to pass to the dismod database, outside of those that would be passed based on the model settings.
        dm_commands (Optional[List[str]]) – Commands to run on the dismod database.
        save_prior (bool) – Whether or not to save the prior as the prior for this parent location.
        save_fit (bool) – Whether or not to save the fit as the fit for this parent location.
        kwargs – Additional keyword arguments passed to _CascadeOperation.
class cascade_at.cascade.cascade_operations.Fit(model_version_id, parent_location_id, sex_id, predict=True, fill=True, both=False, save_fit=False, save_prior=False, ode_fit_strategy=False, ode_init=False, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._DismodDB
    Perform a fit on the dismod database for this model version ID, parent location, and sex ID. (See _DismodDB for the arguments not documented here.)
    Parameters:
        model_version_id (int)
        parent_location_id (int)
        sex_id (int)
        predict (bool) – Whether or not to run a predict on this database. Predicts for the avgint table, which is based on the IHME-GBD demographics grid.
        fill (bool)
        both (bool) – Whether to run a fit both (True) or a fit fixed only (False).
        save_fit (bool)
        save_prior (bool)
        kwargs – Additional keyword arguments passed to _DismodDB.
Create Samples of Variables
After we've run a fit on a database, we can make posterior samples of the variables.
Sample Script

cascade_at.executor.sample.simulate(path, n_sim)
    Simulate from a database, within a database.
    Parameters:
        path (Union[str, Path]) – A path to the database object to create simulations in.
        n_sim (int) – Number of simulations to create.

class cascade_at.executor.sample.FitSample(fit_type, **kwargs)
    Bases: cascade_at.dismod.api.multithreading._DismodThread
    Fit samples for a database in parallel. Copies the sample table and fits for just one sample index. Uses the __call__ method from _DismodThread.
    Parameters:
        main_db – Path to the main database to sample from.
        index_file_pattern – File pattern for creating the index databases with different samples.
        fit_type (str) – The type of fit to run, one of "fixed" or "both".

cascade_at.executor.sample.sample_simulate_pool(main_db, index_file_pattern, fit_type, n_sim, n_pool)
    Fit the samples in a database in parallel by making copies of the database, fitting them separately, and then combining the results back together in the sample table of main_db.
    Parameters:
        main_db (Union[str, Path]) – Path to the main database that will be spawned.
        index_file_pattern (str) – File pattern for the new databases, whose index will equal the simulation number.
        fit_type (str) – The type of fit to run, one of "fixed" or "both".
        n_sim (int) – Number of simulations that will be fit.
        n_pool (int) – Number of pools for the multiprocessing.
cascade_at.executor.sample.sample_simulate_sequence(path, n_sim, fit_type)
    Fit the samples in a database in sequence.
    Parameters:
        path (Union[str, Path]) – A path to the database object to create simulations in.
        n_sim (int) – Number of simulations to create.
        fit_type (str) – Type of fit, "fixed" or "both".
cascade_at.executor.sample.sample(model_version_id, parent_location_id, sex_id, n_sim, n_pool, fit_type, asymptotic=False)
    Creates variable samples from a dismod database that has already had a fit run on it, optionally in parallel. Defaults to stochastic samples (like a parametric bootstrap). If you request asymptotic samples, it will try those first and fall back to stochastic samples if they fail.
    Parameters:
        model_version_id (int) – The model version ID.
        parent_location_id (int) – The parent location ID specifying the location of the database.
        sex_id (int) – The sex ID specifying the location of the database.
        n_sim (int) – The number of simulations to do.
        n_pool (int) – The number of multiprocessing pools to create. If 1, all simulations are run together in one dmdismod command, without pools.
        fit_type (str) – The type of fit that was performed on this database, one of "fixed" or "both".
        asymptotic (bool) – Whether to do asymptotic samples or fit-refit (stochastic) samples.
    Return type: None
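A hypothetical call, drawing posterior samples after a fit (illustrative IDs):

    from cascade_at.executor.sample import sample

    sample(
        model_version_id=0,
        parent_location_id=101,
        sex_id=2,
        n_sim=100,
        n_pool=20,          # fit the simulations across 20 processes
        fit_type="both",
        asymptotic=True,    # falls back to fit-refit samples if this fails
    )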
Sample Cascade Operation

class cascade_at.cascade.cascade_operations.Sample(model_version_id, parent_location_id, sex_id, n_sim, fit_type, asymptotic, n_pool=1, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._CascadeOperation
    Create posterior samples from a dismod database that has already had a fit run on it. This may be done in parallel with a multiprocessing pool. The samples can either be asymptotic (sampling from a multivariate normal distribution) or stochastic simulations. If you choose to sample asymptotic and it fails (it may fail because of issues with the constraints), it automatically falls back to a sample simulate.
    Parameters:
        model_version_id (int) – The model version ID.
        parent_location_id (int) – The parent location ID.
        sex_id (int) – The reference sex ID for the database.
        n_sim (int) – The number of posterior samples to create.
        fit_type (str) – The original fit type for this database. Should be either 'fixed' or 'both' (it could also be 'random', but we don't use that).
        asymptotic (bool) – Whether to do asymptotic samples or simulation-based samples.
        n_pool (int) – The number of threads to create in a multiprocessing pool. If this is 1, multiprocessing is not used.
        kwargs – Additional keyword arguments passed to _CascadeOperation.
Compute Covariate Multiplier Statistics
Mulcov Statistics Script
(Note: "mulcov" is short for "covariate multiplier".)
Once we've run a sample on a database to get posteriors, we can compute statistics of the covariate multipliers.
This is useful because we often want to use the covariate multiplier statistics at one level of the cascade as a prior for the covariate multiplier estimation at another level.
cascade_at.executor.mulcov_statistics.get_mulcovs(dbs, covs, table='fit_var')
    Get mulcov values from all of the dbs, for all of the common covariates.
    Parameters:
        dbs – A list of dismod I/O objects.
        covs – A list of covariate names.
        table – Name of the table to pull from (can be fit_var or sample).
    Return type: DataFrame

cascade_at.executor.mulcov_statistics.compute_statistics(df, mean=True, std=True, quantile=None)
    Compute statistics on a data frame of covariate multipliers.
    Parameters:
        df (pd.DataFrame) – A data frame of covariate multipliers.
        mean (bool) – Whether to compute the mean.
        std (bool) – Whether to compute the standard deviation.
        quantile (optional list) – Quantiles to compute.
    Returns: A dictionary with the requested statistics.
cascade_at.executor.mulcov_statistics.mulcov_statistics(model_version_id, locations, sexes, outfile_name, sample=True, mean=True, std=True, quantile=None)
    Compute statistics for the covariate multipliers on a dismod database, and save them to a file.
    Parameters:
        model_version_id (int) – The model version ID.
        locations (List[int]) – A list of locations that, in combination with sexes, point to the databases to pull covariate multiplier estimates from.
        sexes (List[int]) – A list of sexes that, in combination with locations, point to the databases to pull covariate multiplier estimates from.
        outfile_name (str) – A filepath specifying where to save the covariate multiplier statistics.
        sample (bool) – Whether the results are stored in the sample table or the fit_var table.
        mean (bool) – Whether or not to compute the mean.
        std (bool) – Whether or not to compute the standard deviation.
        quantile (Optional[List[float]]) – An optional list of quantiles to compute.
    Return type: None
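A hypothetical call (the file name and IDs are illustrative):

    from cascade_at.executor.mulcov_statistics import mulcov_statistics

    mulcov_statistics(
        model_version_id=0,
        locations=[101],
        sexes=[1, 2],
        outfile_name="mulcov_stats.csv",  # illustrative
        sample=True,                      # read from the sample table
        mean=True,
        std=True,
        quantile=[0.025, 0.975],
    )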
Mulcov Statistics Cascade Operation

class cascade_at.cascade.cascade_operations.MulcovStatistics(model_version_id, locations, sexes, sample, mean, std, quantile, outfile_name=None, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._CascadeOperation
    A cascade operation that runs the mulcov statistics script. It inherits the following parameters from _CascadeOperation.
    Parameters:
        upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before it.
        executor_parameters – Optional dictionary of execution parameters that updates DEFAULT_EXECUTOR_PARAMETERS.
Make Predictions of Integrands
Once we've fit to a database and/or made posterior samples, we can make predictions using the fit or sampled variables on the average integrand grid. This is how we make predictions for age groups and times on the IHME grid.
Predict Script

cascade_at.executor.predict.fill_avgint_with_priors_grid(inputs, alchemy, settings, source_db_path, child_locations, child_sexes)
    Fill the average integrand table with the grid that the priors are on, so that we can "predict" the prior for the next level of the cascade.
    Parameters:
        inputs (MeasurementInputs) – An inputs object.
        alchemy (Alchemy) – A grid alchemy object.
        settings (SettingsConfig) – A settings configuration object.
        source_db_path (Union[str, Path]) – The path of the source database that has had a fit on it.
        child_locations (List[int]) – The child locations to predict for.
        child_sexes (List[int]) – The child sexes to predict for.
class cascade_at.executor.predict.Predict(**kwargs)
    Bases: cascade_at.dismod.api.multithreading._DismodThread
    Predicts for a database in parallel. Chops the sample table into a set of copies, each with only one sample.

cascade_at.executor.predict.predict_sample_sequence(path, table)
    Runs predict for either fit_var or sample, based on the table.

cascade_at.executor.predict.predict_sample_pool(main_db, index_file_pattern, n_sim, n_pool)
    Run predict sample in a pool by making copies of the existing database, splitting the sample table across n_sim databases, running predict sample on each, and combining the results back into the main database.
cascade_at.executor.predict.predict_sample(model_version_id, parent_location_id, sex_id, child_locations, child_sexes, prior_grid=True, save_fit=False, save_final=False, sample=False, n_sim=1, n_pool=1)
    Takes a database that has already had a fit and simulate sample run on it, fills the avgint table for the child_locations and child_sexes you want to make predictions for, and then predicts on that grid. Makes predictions on the grid that is specified for the primary rates in the model, and for the primary rates only.
    Parameters:
        model_version_id (int) – The model version ID.
        parent_location_id (int) – The parent location ID that specifies where the database is stored.
        sex_id (int) – The sex ID that specifies where the database is stored.
        child_locations (List[int]) – The child locations to make predictions for on the rate grid.
        child_sexes (List[int]) – The child sexes to make predictions for on the rate grid.
        prior_grid (bool) – Whether or not to replace the default gbd-avgint grid with a prior grid for the rates.
        save_fit (bool) – Whether or not to save the fit for upload later.
        save_final (bool) – Whether or not to save the final results for upload later.
        sample (bool) – Whether to predict from the sample table or the fit_var table.
        n_sim (int) – The number of simulations to predict for.
        n_pool (int) – The number of multiprocessing pools to create. If 1, all simulations are run together in one dmdismod command, without pools.
    Return type: None
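A hypothetical call, predicting on the prior grid for two child locations (illustrative IDs):

    from cascade_at.executor.predict import predict_sample

    predict_sample(
        model_version_id=0,
        parent_location_id=101,
        sex_id=2,
        child_locations=[102, 103],
        child_sexes=[2],
        prior_grid=True,   # predict on the prior rate grid rather than the gbd-avgint grid
        sample=True,       # predict from the sample table
        n_sim=100,
        n_pool=20,
    )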
Predict Cascade Operation

class cascade_at.cascade.cascade_operations.Predict(model_version_id, parent_location_id, sex_id, child_locations=None, child_sexes=None, prior_grid=True, save_fit=False, save_final=False, sample=True, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._CascadeOperation
    A cascade operation that runs the predict script. It inherits the following parameters from _CascadeOperation.
    Parameters:
        upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before it.
        executor_parameters – Optional dictionary of execution parameters that updates DEFAULT_EXECUTOR_PARAMETERS.
Upload Results
After a Cascade-AT model has finished running, we can upload the results to the IHME epi database.
Upload Script

cascade_at.executor.upload.upload_prior(context, rh)
    Uploads the saved priors to the epi database, in the table epi.model_prior.
    Parameters:
        rh (ResultsHandler) – A results handler object.
        context (Context) – A context object.
    Return type: None

cascade_at.executor.upload.upload_fit(context, rh)
    Uploads the saved fit results to the epi database, in the table epi.model_estimate_fit.
    Parameters:
        rh (ResultsHandler) – A results handler object.
        context (Context) – A context object.
    Return type: None

cascade_at.executor.upload.upload_final(context, rh)
    Uploads the saved final results to the epi database, in the table epi.model_estimate_final.
    Parameters:
        rh (ResultsHandler) – A results handler object.
        context (Context) – A context object.
    Return type: None
Upload Cascade Operation

class cascade_at.cascade.cascade_operations.Upload(model_version_id, final=False, fit=False, prior=False, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._CascadeOperation
    A cascade operation that runs the upload script. It inherits the following parameters from _CascadeOperation.
    Parameters:
        upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before it.
        executor_parameters – Optional dictionary of execution parameters that updates DEFAULT_EXECUTOR_PARAMETERS.
Clean Up Files
The cleanup script deletes unnecessary databases once we have final results for a model.
Cleanup Script
Cleanup Cascade Operation

class cascade_at.cascade.cascade_operations.CleanUp(model_version_id, **kwargs)
    Bases: cascade_at.cascade.cascade_operations._CascadeOperation
    A cascade operation that runs the cleanup script. It inherits the following parameters from _CascadeOperation.
    Parameters:
        upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before it.
        executor_parameters – Optional dictionary of execution parameters that updates DEFAULT_EXECUTOR_PARAMETERS.
EpiViz-AT Settings
The EpiViz-AT settings are the set of all choices a user makes in the EpiViz-AT user interface; they are how the interface sends those choices to the command-line EpiViz-AT.
The list of all possible settings is in https://github.com/ihmeuw/cascade/blob/develop/src/cascade-at/input_data/configuration/form.py, where any setting with the word Dummy is ignored.
Any setting that is unset (the user has used the close box so that it is greyed out in the EpiViz-AT user interface) will be missing from the EpiViz-AT settings sent to the command-line program, which understands that it should use a default.
Settings Configuration
Helper Functions

cascade_at.settings.settings.load_settings(settings_json)
    Loads settings from a settings json.
    Parameters:
        settings_json (Dict[str, Any]) – Dictionary of settings.
    Examples:
        >>> from cascade_at.settings.base_case import BASE_CASE
        >>> settings = load_settings(BASE_CASE)
    Return type: SettingsConfig

cascade_at.settings.settings.settings_json_from_model_version_id(model_version_id, conn_def)
    Loads settings for a specific model version ID into a json.
    Parameters:
        model_version_id (int) – The model version ID.
        conn_def (str) – The connection definition, like 'dismod-at-dev'.
    Return type: Dict[str, Any]

cascade_at.settings.settings.settings_from_model_version_id(model_version_id, conn_def)
    Loads settings for a specific model version ID.
    Parameters:
        model_version_id (int) – The model version ID.
        conn_def (str) – The connection definition, like 'dismod-at-dev'.
    Examples:
        >>> settings = settings_from_model_version_id(
        ...     model_version_id=395837, conn_def='dismod-at-dev')
    Return type: SettingsConfig
Settings Configuration Form
All available options from EpiViz-AT.

class cascade_at.settings.settings_config.SettingsConfig(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    The root Form of the whole settings inputs tree. This collects all settings from EpiViz-AT and adds default values when they are missing.
    A representation of the configuration form we expect to receive from EpiViz. The hope is that this form will do as much validation and precondition checking as is feasible, within the constraint that it must be able to validate a full EpiViz parameter document in significantly less than one second. This is because it is used as part of a web service that gates EpiViz submissions and must return in near real time.
    The SettingsConfig class is the root of the form.
    Example:
        >>> import json
        >>> input_data = json.loads(json_blob)
        >>> form = SettingsConfig(input_data)
        >>> errors = form.validate_and_normalize()
    Fields:
        model : cascade_at.settings.settings_config.Model = None
        policies : cascade_at.settings.settings_config.Policies = None
        gbd_round_id : cascade_at.core.form.fields.IntField = None
        random_effect : cascade_at.core.form.fields.FormList = None
        rate : cascade_at.core.form.fields.FormList = None
        country_covariate : cascade_at.core.form.fields.FormList = None
        study_covariate : cascade_at.core.form.fields.FormList = None
        eta : cascade_at.settings.settings_config.Eta = None
        students_dof : cascade_at.settings.settings_config.StudentsDOF = None
        log_students_dof : cascade_at.settings.settings_config.StudentsDOF = None
        location_set_version_id : cascade_at.core.form.fields.IntField = None
        csmr_cod_output_version_id : cascade_at.core.form.fields.IntField = None
        csmr_mortality_output_version_id : cascade_at.core.form.fields.Dummy = NO_VALUE
        min_cv : cascade_at.core.form.fields.FormList = None
        min_cv_by_rate : cascade_at.core.form.fields.FormList = None
        re_bound_location : cascade_at.core.form.fields.FormList = None
        derivative_test : cascade_at.settings.settings_config.DerivativeTest = None
        max_num_iter : cascade_at.settings.settings_config.FixedRandomInt = None
        print_level : cascade_at.settings.settings_config.FixedRandomInt = None
        accept_after_max_steps : cascade_at.settings.settings_config.FixedRandomInt = None
        tolerance : cascade_at.settings.settings_config.FixedRandomFloat = None
        data_cv_by_integrand : cascade_at.core.form.fields.FormList = None
        data_eta_by_integrand : cascade_at.core.form.fields.FormList = None
        data_density_by_integrand : cascade_at.core.form.fields.FormList = None
        config_version : cascade_at.core.form.fields.StrField = None
class cascade_at.settings.settings_config.SmoothingPrior(*args, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Priors for smoothing.
    Fields:
        prior_type : cascade_at.core.form.fields.OptionField = None
        age_lower : cascade_at.core.form.fields.FloatField = None
        age_upper : cascade_at.core.form.fields.FloatField = None
        time_lower : cascade_at.core.form.fields.FloatField = None
        time_upper : cascade_at.core.form.fields.FloatField = None
        born_lower : cascade_at.core.form.fields.FloatField = None
        born_upper : cascade_at.core.form.fields.FloatField = None
        density : cascade_at.core.form.fields.OptionField = None
        min : cascade_at.core.form.fields.FloatField = None
        mean : cascade_at.core.form.fields.FloatField = None
        max : cascade_at.core.form.fields.FloatField = None
        std : cascade_at.core.form.fields.FloatField = None
        nu : cascade_at.core.form.fields.FloatField = None
        eta : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.SmoothingPriorGroup(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        dage : cascade_at.settings.settings_config.SmoothingPrior = None
        dtime : cascade_at.settings.settings_config.SmoothingPrior = None
        value : cascade_at.settings.settings_config.SmoothingPrior = None

class cascade_at.settings.settings_config.Smoothing(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        rate : cascade_at.core.form.fields.OptionField = None
        location : cascade_at.core.form.fields.IntField = None
        age_grid : cascade_at.core.form.fields.StringListField = None
        time_grid : cascade_at.core.form.fields.StringListField = None
        default : cascade_at.settings.settings_config.SmoothingPriorGroup = None
        mulstd : cascade_at.settings.settings_config.SmoothingPriorGroup = None
        detail : cascade_at.core.form.fields.FormList = None
        age_time_specific : cascade_at.core.form.fields.IntField = None
        custom_age_grid : cascade_at.core.form.fields.Dummy = NO_VALUE
        custom_time_grid : cascade_at.core.form.fields.Dummy = NO_VALUE

class cascade_at.settings.settings_config.StudyCovariate(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        study_covariate_id : cascade_at.core.form.fields.IntField = None
        measure_id : cascade_at.core.form.fields.StrField = None
        mulcov_type : cascade_at.core.form.fields.OptionField = None
        transformation : cascade_at.core.form.fields.IntField = None
        age_time_specific : cascade_at.core.form.fields.IntField = None
        age_grid : cascade_at.core.form.fields.StringListField = None
        time_grid : cascade_at.core.form.fields.StringListField = None
        default : cascade_at.settings.settings_config.SmoothingPriorGroup = None
        mulstd : cascade_at.settings.settings_config.SmoothingPriorGroup = None
        detail : cascade_at.core.form.fields.FormList = None
        custom_age_grid : cascade_at.core.form.fields.Dummy = NO_VALUE
        custom_time_grid : cascade_at.core.form.fields.Dummy = NO_VALUE

class cascade_at.settings.settings_config.CountryCovariate(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        country_covariate_id : cascade_at.core.form.fields.IntField = None
        measure_id : cascade_at.core.form.fields.StrField = None
        mulcov_type : cascade_at.core.form.fields.OptionField = None
        transformation : cascade_at.core.form.fields.IntField = None
        age_time_specific : cascade_at.core.form.fields.IntField = None
        age_grid : cascade_at.core.form.fields.StringListField = None
        time_grid : cascade_at.core.form.fields.StringListField = None
        default : cascade_at.settings.settings_config.SmoothingPriorGroup = None
        mulstd : cascade_at.settings.settings_config.SmoothingPriorGroup = None
        detail : cascade_at.settings.settings_config.SmoothingPrior = None
        custom_age_grid : cascade_at.core.form.fields.Dummy = NO_VALUE
        custom_time_grid : cascade_at.core.form.fields.Dummy = NO_VALUE

class cascade_at.settings.settings_config.Model(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        modelable_entity_id : cascade_at.core.form.fields.IntField = None
        decomp_step_id : cascade_at.core.form.fields.IntField = None
        model_version_id : cascade_at.core.form.fields.IntField = None
        random_seed : cascade_at.core.form.fields.IntField = None
        minimum_meas_cv : cascade_at.core.form.fields.FloatField = None
        add_csmr_cause : cascade_at.core.form.fields.IntField = None
        title : cascade_at.core.form.fields.StrField = None
        description : cascade_at.core.form.fields.StrField = None
        crosswalk_version_id : cascade_at.core.form.fields.IntField = None
        bundle_id : cascade_at.core.form.fields.IntField = None
        drill : cascade_at.core.form.fields.OptionField = None
        drill_location : cascade_at.core.form.fields.IntField = None
        drill_location_start : cascade_at.core.form.fields.IntField = None
        drill_location_end : cascade_at.core.form.fields.NativeListField = None
        drill_sex : cascade_at.core.form.fields.OptionField = None
        birth_prev : cascade_at.core.form.fields.OptionField = None
        default_age_grid : cascade_at.core.form.fields.StringListField = None
        default_time_grid : cascade_at.core.form.fields.StringListField = None
        constrain_omega : cascade_at.core.form.fields.OptionField = None
        exclude_data_for_param : cascade_at.core.form.fields.ListField = None
        ode_step_size : cascade_at.core.form.fields.FloatField = None
        addl_ode_stpes : cascade_at.core.form.fields.StringListField = None
        split_sex : cascade_at.core.form.fields.OptionField = None
        quasi_fixed : cascade_at.core.form.fields.OptionField = None
        zero_sum_random : cascade_at.core.form.fields.ListField = None
        bound_frac_fixed : cascade_at.core.form.fields.FloatField = None
        bound_random : cascade_at.core.form.fields.FloatField = None
        rate_case : cascade_at.core.form.fields.StrField = None
        data_density : cascade_at.core.form.fields.StrField = None
        relabel_incidence : cascade_at.core.form.fields.IntField = None
        midpoint_approximation : cascade_at.core.form.fields.NativeListField = None

class cascade_at.settings.settings_config.Eta(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        priors : cascade_at.core.form.fields.FloatField = None
        data : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.DataCV(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        integrand_measure_id : cascade_at.core.form.fields.IntField = None
        value : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.MinCV(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        cascade_level_id : cascade_at.core.form.fields.StrField = None
        value : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.MinCVRate(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        cascade_level_id : cascade_at.core.form.fields.StrField = None
        rate_measure_id : cascade_at.core.form.fields.StrField = None
        value : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.DataEta(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        integrand_measure_id : cascade_at.core.form.fields.IntField = None
        value : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.DataDensity(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        value : cascade_at.core.form.fields.StrField = None
        integrand_measure_id : cascade_at.core.form.fields.IntField = None

class cascade_at.settings.settings_config.StudentsDOF(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        priors : cascade_at.core.form.fields.FloatField = None
        data : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.DerivativeTest(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        fixed : cascade_at.core.form.fields.OptionField = None
        random : cascade_at.core.form.fields.OptionField = None

class cascade_at.settings.settings_config.FixedRandomInt(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        fixed : cascade_at.core.form.fields.IntField = None
        random : cascade_at.core.form.fields.IntField = None

class cascade_at.settings.settings_config.FixedRandomFloat(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        fixed : cascade_at.core.form.fields.FloatField = None
        random : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.RandomEffectBound(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        location : cascade_at.core.form.fields.IntField = None
        value : cascade_at.core.form.fields.FloatField = None

class cascade_at.settings.settings_config.Policies(source=None, name_field=None, nullable=False, display=None, **kwargs)
    Bases: cascade_at.core.form.abstract_form.Form
    Fields:
        estimate_emr_from_prevalence : cascade_at.core.form.fields.OptionField = None
        use_weighted_age_group_midpoints : cascade_at.core.form.fields.OptionField = None
        number_of_fixed_effect_samples : cascade_at.core.form.fields.IntField = None
        with_hiv : cascade_at.core.form.fields.BoolField = None
        age_group_set_id : cascade_at.core.form.fields.IntField = None
        exclude_relative_risk : cascade_at.core.form.fields.OptionField = None
        meas_noise_effect : cascade_at.core.form.fields.OptionField = None
        limited_memory_max_history_fixed : cascade_at.core.form.fields.IntField = None
        gbd_round_id : cascade_at.core.form.fields.IntField = None
Converting Settings¶
These functions convert settings with missing data into dictionaries filled in with default values for things like the data coefficient of variation, eta (the log offset), and so on.
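For instance, a minimal sketch of calling these converters on the base-case settings used elsewhere in these docs:
from cascade_at.settings.base_case import BASE_CASE
from cascade_at.settings.settings import load_settings
from cascade_at.settings.convert import (
    data_eta_from_settings, density_from_settings, nu_from_settings)

settings = load_settings(BASE_CASE)
data_eta = data_eta_from_settings(settings)  # Dict[str, float]; np.nan where unset
density = density_from_settings(settings)    # Dict[str, str]; "gaussian" where unset
nu = nu_from_settings(settings)              # Dict[str, float]; np.nan where unset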
-
cascade_at.settings.convert.
midpoint_list_from_settings
(settings)[source]¶ Takes the settings configuration for which integrands to midpoint, which arrives as measure IDs, and translates them to integrand enums.
- Parameters
settings (SettingsConfig) – The settings configuration to convert from
- Return type
List[str]
-
cascade_at.settings.convert.
measures_to_exclude_from_settings
(settings)[source]¶ Gets the measures to exclude from the data from the model settings configuration.
- Parameters
settings (SettingsConfig) – The settings configuration to convert from
- Return type
List[str]
-
cascade_at.settings.convert.
data_eta_from_settings
(settings, default=nan)[source]¶ Gets the data eta from the settings Configuration. The default data eta is np.nan.
- Parameters
settings (SettingsConfig) – The settings configuration to convert from
default (float) – The default eta to use
- Return type
Dict[str, float]
-
cascade_at.settings.convert.
density_from_settings
(settings, default='gaussian')[source]¶ Gets the density from the settings Configuration. The default density is “gaussian”.
- Parameters
settings (SettingsConfig) – The settings configuration to convert from
default (str) – The default data density to use
- Return type
Dict[str, str]
-
cascade_at.settings.convert.
data_cv_from_settings
(settings, default=0.0)[source]¶ Gets the data min coefficient of variation from the settings Configuration
- Parameters
settings (SettingsConfig) – The settings configuration to convert from
default (float) – The default data coefficient of variation
- Return type
Dict[str, float]
-
cascade_at.settings.convert.
min_cv_from_settings
(settings, default=0.0)[source]¶ Gets the minimum coefficient of variation by rate and level of the cascade from the settings. The first key is the cascade level, the second is the rate.
- Parameters
settings (SettingsConfig) – The settings configuration from which to pull
default (float) – The default min CV to use when not specified
- Return type
defaultdict
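As a sketch of how the result might be used (the cascade-level key shown here is hypothetical), the returned defaultdict is indexed first by level and then by rate, falling back to the default for missing keys:
from cascade_at.settings.base_case import BASE_CASE
from cascade_at.settings.settings import load_settings
from cascade_at.settings.convert import min_cv_from_settings

settings = load_settings(BASE_CASE)
min_cv = min_cv_from_settings(settings, default=0.1)
# Unspecified level/rate combinations fall back to the default of 0.1.
iota_min_cv = min_cv['most_detailed']['iota']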
-
cascade_at.settings.convert.
nu_from_settings
(settings, default=nan)[source]¶ Gets nu from the settings Configuration. The default nu is np.nan.
- Parameters
settings (SettingsConfig) – The settings configuration from which to pull
default (float) – The default nu to use when not specified in the settings
- Return type
Dict[str, float]
Data Inputs for Cascade-AT¶
Wrangling the inputs for a Cascade-AT model is a very important first step. All of the inputs at this time come from the IHME epi databases. In the future we’d like to create input data classes that don’t depend on the epi databases.
Input Components documents the inputs that are pulled for a model run. Input Demographics describes the demographic and location inputs that need to be set for a model. Covariates describes how covariates are pulled and transformed. Other-Cause Mortality describes how we calculate other cause mortality from the mortality inputs and use them as a constraint. Measurement Inputs documents how each of those inputs works together to create one large object that stores all of the input data for a model run (including each of the input components).
Input Demographics¶
There are two main demographic objects needed to pull data from the IHME databases, and more generally for building the cascade model.
Demographics¶
-
class
cascade_at.inputs.demographics.
Demographics
(gbd_round_id, location_set_version_id=None)[source]¶ Bases:
object
Grabs and stores demographic information needed for shared functions. Will also make a location hierarchy dag.
- Parameters
gbd_round_id (int) – The GBD round
location_set_version_id (Optional[int]) – The location set version to use (right now EpiViz-AT is passing dismod location set versions, but this will eventually switch to the cause of death hierarchy that is more extensive).
Location Hierarchy¶
-
class
cascade_at.inputs.locations.
LocationDAG
(location_set_version_id=None, gbd_round_id=None, df=None, root=None)[source]¶ Bases:
object
Create a location DAG from the GBD location hierarchy, using a networkx graph where each node is a location ID and carries all of the properties from db_queries.
The root of this dag is the global location ID.
- Parameters
location_set_version_id (Optional[int]) – The location set version corresponding to the hierarchy to pull from the IHME databases
gbd_round_id (Optional[int]) – Which gbd round the location set version is coming from
df (Optional[DataFrame]) – An optional df to pass instead of location sets and gbd rounds if you’d rather construct the DAG from a pandas data frame.
-
descendants
(location_id)[source]¶ Gets all descendants (not just direct children) for a location ID.
- Parameters
location_id (int) – The location ID to get descendants for
- Return type
List[int]
-
ancestors
(location_id)[source]¶ Gets all the ancestors (not just the direct parent) for a location ID.
- Return type
List
[int
]
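A short usage sketch (location_set_version_id=429 is the value used in the Model example later in this document; the location IDs are hypothetical):
from cascade_at.inputs.locations import LocationDAG

dag = LocationDAG(location_set_version_id=429)
below = dag.descendants(location_id=102)  # every location under 102, not just children
above = dag.ancestors(location_id=527)    # the chain of parents up to the global root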
Input Components¶
These are all of the inputs that are pulled for a model run. Some may not be pulled depending on the settings (for example, some models don’t have cause-specific mortality data).
Crosswalk Version¶
-
class
cascade_at.inputs.data.
CrosswalkVersion
(crosswalk_version_id, exclude_outliers, demographics, conn_def, gbd_round_id)[source]¶ Bases:
cascade_at.inputs.base_input.BaseInput
Pulls and formats all of the data from a crosswalk version in the epi database.
- Parameters
crosswalk_version_id (int) – The crosswalk version to pull from
exclude_outliers (bool) – whether to exclude outliers
conn_def (str) – database connection definition
gbd_round_id (int) – The GBD round
demographics (Demographics) – The demographics object
-
get_raw
()[source]¶ Pulls the raw crosswalk version from the database. These are the observations that will be used in the bundle.
-
configure_for_dismod
(relabel_incidence, measures_to_exclude=None)[source]¶ Configures the crosswalk version for DisMod.
- Parameters
measures_to_exclude (Optional[List[str]]) – list of measures to exclude, by name
relabel_incidence (int) – how to label incidence – see RELABEL_INCIDENCE_MAP
- Return type
DataFrame
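A sketch of the pull-then-configure pattern (the crosswalk version ID, GBD round, and relabel_incidence value are hypothetical; this requires access to the IHME databases):
from cascade_at.inputs.demographics import Demographics
from cascade_at.inputs.data import CrosswalkVersion

demographics = Demographics(gbd_round_id=6)  # hypothetical GBD round
cv = CrosswalkVersion(crosswalk_version_id=12345, exclude_outliers=True,
                      demographics=demographics, conn_def='epi', gbd_round_id=6)
cv.get_raw()                                  # pull the raw observations
df = cv.configure_for_dismod(relabel_incidence=1)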
Cause-Specific Mortality Rate¶
-
class
cascade_at.inputs.csmr.
CSMR
(cause_id, demographics, decomp_step, gbd_round_id)[source]¶ Bases:
cascade_at.inputs.base_input.BaseInput
Get cause-specific mortality rate for demographic groups from a specific CodCorrect output version.
- Parameters
cause_id (int) – The GBD cause of death to pull mortality from
demographics (Demographics) –
decomp_step (str) –
gbd_round_id (int) –
All-Cause Mortality Rate¶
-
class
cascade_at.inputs.asdr.
ASDR
(demographics, decomp_step, gbd_round_id)[source]¶ Bases:
cascade_at.inputs.base_input.BaseInput
Gets age-specific all-cause death rate for all demographic groups.
- Parameters
demographics (Demographics) –
decomp_step (str) –
gbd_round_id (int) –
Population¶
-
class
cascade_at.inputs.population.
Population
(demographics, decomp_step, gbd_round_id)[source]¶ Bases:
cascade_at.inputs.base_input.BaseInput
Gets population for all demographic groups. This is not an input for DisMod-AT; it is just used to do covariate interpolation over non-standard age groups and years.
- Parameters
demographics (Demographics) – A demographics object
decomp_step (str) – The decomp step
gbd_round_id (int) – The gbd round
Covariates¶
Covariates Design from EpiViz¶
EpiViz-AT classifies covariates as country and study types. The study covariates are 0 or 1 and are specific to the bundle. The country covariates are floating-point values defined for every age / location / sex / year.
The strategy for parsing these and putting them into the model is to
split the data download and normalization from construction of model priors.
The EpiVizCovariate
is the information part.
The EpiVizCovariateMultiplier
is the model prior part.
For reading data, the main complication is that covariates have several IDs and names.
- study_covariate_id and country_covariate_id may be equal for different covariates. That is, they are two sets of IDs. We have no guarantee this is not the case (even if someone tells us it is not the case).
- In the inputs, each covariate has a short_name, which is what we use. The short name, in other inputs, can contain spaces. I don’t know that study and country short names are guaranteed to be distinct. Therefore…
- We prefix study and country covariates with s_ and c_.
- Covariates are often transformed into log space, exponential space, or others. These get _log, _exp, or whatever appended.
- When covariates are put into the model, they have English names, but inside Dismod-AT, they get renamed to x_0, x_1, x_....
-
class
cascade_at.inputs.utilities.covariate_specifications.
EpiVizCovariate
(study_country, covariate_id, transformation_id)[source]¶ Bases:
object
This specifies covariate data from settings. It is separate from the cascade.model.Covariate, which is a Dismod-AT covariate. EpiViz-AT distinguishes study and country covariates and encodes them into the Dismod-AT covariate names.
-
transformation_id
¶ Which function to apply to this covariate column (log, exp, etc)
-
untransformed_covariate_name
¶ The name for this covariate before transformation.
-
property
spec
¶ Unique identifier for a covariate because two multipliers may refer to the same covariate.
-
property
name
¶ The name for this covariate in the final data.
-
-
class
cascade_at.inputs.utilities.covariate_specifications.
EpiVizCovariateMultiplier
(covariate, settings)[source]¶ Bases:
object
- Parameters
covariate (EpiVizCovariate) – The covariate
settings (StudyCovariate|CountryCovariate) – Section of the form.
-
property
group
¶ The name of the DismodGroups group, so it’s alpha, beta, or gamma.
-
property
key
¶ Key for the
DismodGroups
object, so it is a tuple of (covariate name, rate) or (covariate name, integrand) where rate and integrand are strings.
-
cascade_at.inputs.utilities.covariate_specifications.
create_covariate_specifications
(country_covariate, study_covariate)[source]¶ Parses EpiViz-AT settings to create two data structures for Covariate creation.
Covariate multipliers will only contain country covariates. Covariate specifications will contain both the country and study covariates, which are only the ‘sex’ and ‘one’ covariates.
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> multipliers, data_spec = create_covariate_specifications(settings.country_covariate, settings.study_covariate)
- Parameters
country_covariate (List[CountryCovariate]) – The country_covariate member of the EpiViz-AT settings.
study_covariate (List[StudyCovariate]) – The study_covariate member of the EpiViz-AT settings.
- Return type
(List[EpiVizCovariateMultiplier], List[EpiVizCovariate])
- Returns
The multipliers are specifications for making SmoothGrids. The covariates are specifications for downloading data and attaching it to the crosswalk version and average integrand tables. The multipliers use the covariates in order to know the name of the covariate.
The following class is a wrapper around the covariate specifications that makes them easier to work with and provides helpful metadata.
Definition of Study and Country¶
There are three reasons to use a covariate.
- Country Covariate
We believe this covariate predicts disease behavior.
- Study Covariate
THIS IS DEPRECATED: the only study covariates are sex and one, described below.
The covariate marks a set of studies that behave differently. For instance, different sets of measurements may have different criteria for when a person is said to have the disease. We assign a covariate to the set of studies to account for bias from study design.
- Sex Covariate
This is usually used to select a subset of data by sex, but this could be done based on any covariate associated with observation data. In addition to being used to subset data, the sex covariate is a covariate multiplier applied the same way as a study covariate.
- One Covariate
The “one covariate” is a covariate of all ones. It’s treated within the bundle management system as a study covariate. It’s used as a covariate on measurement standard deviations, in order to account for between-study heterogeneity. A paper that might be a jumping-off point for understanding this is [Serghiou2019].
A covariate column that is used just for exclusion doesn’t need a covariate multiplier. In practice, the sex covariate is used at global or super-region level as a study covariate. Then the adjustments determined at the upper level are applied as constraints down the hierarchy. This means there is a covariate multiplier for sex, and its smooth is a grid of constraints, not typical priors.
Dismod-AT applies covariate effects to one of three different variables. It either uses the covariate to predict the underlying rate, or it applies the covariate to predict the measured data. It can be an effect on either the measured data value or the observation data standard deviation. Dismod-AT calls these, respectively, the alpha, beta, and gamma covariates.
As a rule of thumb, the three uses of covariates apply to different variables, as shown in the table below.
Use of Covariate | Rate | Measured Value | Measured Stddev
---|---|---|---
Country | Yes | Maybe | Maybe
Study | Maybe | Yes | Yes
Sex (exclusion) | No | Yes | No
Country and study covariates can optionally use outliering. The sex covariate is defined by its use of regular outliering. Female and male data are assigned values of -0.5 and 0.5, respectively, and the reference value and maximum difference are adjusted to include one, the other, or both sexes.
Policies for Study and Country Covariates¶
Sex is added as a covariate called
s_sex
, which Dismod-AT translates to x_0
for its db file format. It is -0.5 for women, 0.5 for men, and 0 for both or neither. This covariate is used to exclude data by setting a reference value equal to -0.5 or 0.5 and a max allowed difference to 0.75, so that the “both” category is included and the other sex is excluded. The
s_one
covariate is a study covariate of ones. This can be selected in the user interface and is usually used as a gamma covariate, meaning it is a covariate multiplier on the standard deviation of measurement data. Its covariate id is 1604, and it appears in the db file as x_1
with a reference value of 0 and no max difference.
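A sketch of that sex-exclusion policy, using the Covariate class documented under Covariate Multipliers below:
from cascade_at.model.covariate import Covariate

# reference=-0.5 keeps female rows (difference 0) and "both" rows (difference 0.5);
# male rows sit at |0.5 - (-0.5)| = 1.0 > 0.75 and are excluded.
sex = Covariate(column_name='s_sex', reference=-0.5, max_difference=0.75)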
- Serghiou2019
Serghiou, Stylianos, and Steven N. Goodman. “Random-Effects Meta-analysis: Summarizing Evidence With Caveats.” JAMA 321.3 (2019): 301-302.
Country Covariate Data¶
To grab the data for the covariates, we use this class that is part of the core data inputs.
-
class
cascade_at.inputs.covariate_data.
CovariateData
(covariate_id, demographics, decomp_step, gbd_round_id)[source]¶ Bases:
cascade_at.inputs.base_input.BaseInput
Get covariate estimates, and map them to the necessary demographic ages and sexes. If only one age group is present in the covariate data then that means that it’s not age-specific and we want to copy the values over to all the other age groups we’re working with in demographics. Same with sex.
-
configure_for_dismod
(pop_df, loc_df)[source]¶ Configures covariates for DisMod. Completes covariate ages, sexes, and locations based on what covariate data is already available.
To fill in ages, it copies over all age or age standardized covariates into each of the specific age groups.
To fill in sexes, it copies over any both sex covariates to the sex specific groups.
To fill in locations, it takes a population-weighted average of child locations for parent locations all the way up the location hierarchy.
- Parameters
pop_df (DataFrame) – A data frame with population info for all ages, sexes, locations, and years
loc_df (DataFrame) – A data frame with location hierarchy information
-
Because study covariates are deprecated, we don’t need to get data for those.
Instead, in the MeasurementInputs
class we just assign the sex and one covariate values on the fly.
Covariate Interpolation¶
When we attach covariate values to data points, we often need to interpolate across ages or times because the data points don’t fit nicely into the covariate age and time groups that come from the GBD database.
The interpolation happens inside of
MeasurementInputs
,
using the following function that creates a
CovariateInterpolator
for each covariate.
-
cascade_at.inputs.utilities.covariate_weighting.
get_interpolated_covariate_values
(data_df, covariate_dict, population_df)[source]¶ Gets the unique age-time combinations from the data_df, and creates interpolated covariate values for each of these combinations by population-weighting the standard GBD age-years that span the non-standard combinations.
- Parameters
data_df (DataFrame) – A data frame with data observations in it
covariate_dict (Dict[str, DataFrame]) – A dictionary of covariate data frames with covariate names as keys
population_df (DataFrame) – A data frame with population in it
- Return type
DataFrame
-
class
cascade_at.inputs.utilities.covariate_weighting.
CovariateInterpolator
(covariate, population)[source]¶ Bases:
object
Interpolates a covariate by population weighting.
- Parameters
covariate (DataFrame) – Data frame with covariate information
population (DataFrame) – Data frame with population information
Covariate Multipliers¶
All of the above sections involve pre-processing of the EpiViz-AT settings and covariate data. This is all so that we can make a covariate correctly in the dismod model specifications.
For the “covariate multiplier” that uses all of this information and converts it into something that dismod can understand, see Covariate Multipliers.
Other-Cause Mortality¶
The IHME databases supply all-cause mortality, but Dismod-AT uses other-cause mortality. It can impute what it needs to know using all-cause mortality, but it is helpful to add other-cause mortality not just as input data but as a constraint to the model.
We use total mortality as other-cause mortality. The correct formulae to use are for “cause-deleted lifetables” or “cause deletion.”
Omega Constraint¶
This constrains other-cause mortality using data from mtother, which is the integrand for other-cause mortality.
The choice to use an omega constraint is set in EpiViz-AT, and this is obeyed. If the user does choose to constrain omega, then it is included with the following function.
-
cascade_at.inputs.utilities.data.
calculate_omega
(asdr, csmr)[source]¶ Calculates other cause mortality (omega) from ASDR (mtall – all-cause mortality) and CSMR (mtspecific – cause-specific mortality). For most diseases, mtall is a good approximation to omega, but we calculate omega = mtall - mtspecific in case it isn’t. For diseases without CSMR (csmr_cause_id = None), omega = mtall.
- Parameters
asdr (DataFrame) – data frame with age-specific all-cause mortality rates
csmr (DataFrame) – data frame with age-specific cause-specific mortality rates
- Return type
DataFrame
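A self-contained sketch of the arithmetic (the demographic key columns and the meanvalue column name are assumptions for illustration, not the library's actual schema):
import pandas as pd

keys = ['location_id', 'sex_id', 'age_group_id', 'year_id']  # assumed demographic keys
asdr = pd.DataFrame({'location_id': [102], 'sex_id': [2], 'age_group_id': [10],
                     'year_id': [2000], 'meanvalue': [0.012]})
csmr = asdr.assign(meanvalue=0.002)
merged = asdr.merge(csmr, on=keys, suffixes=('_all', '_specific'))
merged['omega'] = merged['meanvalue_all'] - merged['meanvalue_specific']  # omega = mtall - mtspecific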
Measurement Inputs¶
Measurement inputs collects all of the things from Input Components and has a bunch of helper functions to format and combine them in accordance with the model settings for dismod.
-
class
cascade_at.inputs.measurement_inputs.
MeasurementInputs
(model_version_id, gbd_round_id, decomp_step_id, conn_def, country_covariate_id, csmr_cause_id, crosswalk_version_id, location_set_version_id=None, drill_location_start=None, drill_location_end=None)[source]¶ Bases:
object
The class that constructs all of the measurement inputs. Pulls ASDR, CSMR, crosswalk versions, and country covariates, and puts them into one data frame that then formats itself for the dismod database. Performs covariate value interpolation if age and year ranges don’t match up with GBD age and year ranges.
- Parameters
model_version_id (int) – the model version ID
gbd_round_id (int) – the GBD round ID
decomp_step_id (int) – the decomp step ID
csmr_cause_id (int) – the cause to pull CSMR from
crosswalk_version_id (int) – crosswalk version to use
country_covariate_id (List[int]) – list of covariate IDs
conn_def (str) – connection definition from .odbc file (e.g. ‘epi’) to connect to the IHME databases
location_set_version_id (Optional[int]) – can be None; if None, get the best location_set_version_id for the estimation hierarchy of this GBD round
drill_location_start (Optional[int]) – which location ID to drill from as the parent
drill_location_end (Optional[List[int]]) – which immediate children of the drill_location_start parent to include in the drill
-
self.
decomp_step
¶ the decomp step in string form
- Type
str
-
self.
demographics
¶ a demographics object that specifies the age group, sex, location, and year IDs to grab
-
self.
integrand_map
¶ dictionary mapping from GBD measure IDs to DisMod IDs
- Type
Dict[int, int]
-
self.
asdr
¶ all-cause mortality input object
-
self.
csmr
¶ cause-specific mortality input object from cause csmr_cause_id
-
self.
data
¶ crosswalk version data from IHME database
-
self.
covariate_data
¶ list of covariate data objects that contains the raw covariate data mapped to IDs
-
self.
location_dag
¶ DAG of locations to be used
-
self.
population
¶ population object that is used for covariate weighting
-
self.
data_eta
¶ dictionary of eta values to be applied to each measure
- Type
Dict[str, float]
-
self.
density
¶ dictionary of densities to be applied to each measure
- Type
Dict[str, str]
-
self.
nu
¶ dictionary of nu values to be applied to each measure
- Type
Dict[str, float]
-
self.
dismod_data
¶ resulting dismod data formatted to be used in the dismod database
- Type
pd.DataFrame
Examples
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>>
>>> settings = load_settings(BASE_CASE)
>>> covariate_id = [i.country_covariate_id for i in settings.country_covariate]
>>>
>>> i = MeasurementInputs(
>>>     model_version_id=settings.model.model_version_id,
>>>     gbd_round_id=settings.gbd_round_id,
>>>     decomp_step_id=settings.model.decomp_step_id,
>>>     csmr_cause_id=settings.model.add_csmr_cause,
>>>     crosswalk_version_id=settings.model.crosswalk_version_id,
>>>     country_covariate_id=covariate_id,
>>>     conn_def='epi',
>>>     location_set_version_id=settings.location_set_version_id
>>> )
>>> i.get_raw_inputs()
>>> i.configure_inputs_for_dismod(settings)
-
configure_inputs_for_dismod
(settings, mortality_year_reduction=5)[source]¶ Modifies the inputs for DisMod based on model-specific settings.
- Parameters
settings (SettingsConfig) – Settings for the model
mortality_year_reduction (int) – number of years to decimate csmr and asdr
-
prune_mortality_data
(parent_location_id)[source]¶ Remove mortality data for descendants that are not children of parent_location_id from the configured dismod data before it gets filled into the dismod database.
- Return type
DataFrame
-
add_covariates_to_data
(df)[source]¶ Add on covariates to a data frame that has age_group_id, year_id or time-age upper / lower, and location_id and sex_id. Adds both country-level and study-level covariates.
- Return type
DataFrame
-
to_gbd_avgint
(parent_location_id, sex_id)[source]¶ Converts the demographics of the model to the avgint table.
- Return type
DataFrame
-
interpolate_country_covariate_values
(df, cov_dict)[source]¶ Interpolates the covariate values onto the data so that the non-standard ages and years match up to meaningful covariate values.
-
transform_country_covariates
(df)[source]¶ Transforms the covariate data with the transformation ID.
- Parameters
df (pd.DataFrame) – The data frame with covariate columns to transform
- Returns
self
-
calculate_country_covariate_reference_values
(parent_location_id, sex_id)[source]¶ Gets the country covariate reference value for a covariate ID and a parent location ID. Also gets the maximum difference between the reference value and covariate values observed.
Run this when you’re going to make a DisMod AT database for a specific parent location and sex ID.
- Parameters
parent_location_id (int) – The parent location ID
sex_id (int) – The sex ID
- Returns
List[CovariateSpec] – list of the covariate specs with the correct reference values and max diff.
-
class
cascade_at.inputs.measurement_inputs.
MeasurementInputsFromSettings
(settings)[source]¶ Bases:
cascade_at.inputs.measurement_inputs.MeasurementInputs
Wrapper for MeasurementInputs that takes a settings object rather than the individual arguments. For convenience.
Examples
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> i = MeasurementInputsFromSettings(settings)
>>> i.get_raw_inputs()
>>> i.configure_inputs_for_dismod(settings)
Modeling¶
The model module provides tools to build a Dismod-AT model with variables, constraints, priors, and in the grid structure that Dismod-AT requires.
The main model object is documented here Model Class. The model object has two levels maximum (parents and children). To build that model object with “global” settings from an EpiViz-AT model, we have a wrapper around the model object, described below in Grid Alchemy that builds a two-level model at any parent location ID in a model hierarchy.
Var Class¶
-
class
cascade_at.model.var.
Var
(ages, times, column_name='mean')[source]¶ A Var is a function of age and time, defined by values on a grid. It linearly interpolates over values defined at grid points in a rectangular grid of age and time.
This is a single age-time grid. It is usually found in a cascade.model.DismodGroups object, which is a set of age-time grids. The following are DismodGroups containing cascade.model.Var: the fit, initial guess, truth var, and scale var.
: the fit, initial guess, truth var, and scale var.- Parameters
ages (List[float]) – Points along the age axis.
times (List[float]) – Points in time.
column_name (str) – A var has an internal Pandas DataFrame representation, and this column name can be mean or meas_value, depending on which Var is needed.
-
__setitem__
(at_slice, value)[source]¶ To set a value on a Var instance, set it on ranges of age and time or at specific ages and times.
>>> var = Var([0, 10, 20], [2000])
>>> var[:, :] = 0.001
>>> var[5:50, 2000] = 0.01
>>> var[10, :] = 0.02
- Parameters
at_slice (slice, slice) – What to change, as integer offset into ages and times.
value (float) – A float or integer.
-
__getitem__
(age_and_time)[source]¶ Gets the value of a Var at a single point. The point has to be one of the ages and times defined when the var was created.
>>> var = Var([0, 50, 100], [1990, 2000, 2010])
>>> var[:, :] = 1e-4
>>> assert var[50, 2000] == 1e-4
Trying to read from an age and time not in the ages and times of the grid will result in a
KeyError
.An easy way to set values is to use the age_time iterator, which loops through the ages and times in the underlying grid.
>>> for age, time in var.age_time():
>>>     var[age, time] = 0.01 * age
- Parameters
age_and_time (age, time) – A two-dimensional index of age and time.
- Returns
The value at this age and time.
- Return type
float
-
set_mulstd
(kind, value)[source]¶ Set the value of the multiplier on the standard deviation. Kind must be one of “value”, “dage”, or “dtime”. The value should be convertible to a float.
>>> var = Var([50], [2000, 2001, 2002])
>>> var.set_mulstd("value", 0.4)
-
get_mulstd
(kind)[source]¶ Get the value of a standard deviation multiplier for a Var.
>>> var = Var([50], [2000, 2001, 2002])
>>> var.set_mulstd("value", 0.4)
>>> assert var.get_mulstd("value") == 0.4
If the standard deviation multiplier wasn’t set, then this will return a nan.
>>> assert np.isnan(var.get_mulstd("dage"))
-
__call__
(age, time)[source]¶ A Var is a function of age and time, and this is how to call it.
>>> var = Var([0, 100], [1990, 2000])
>>> var[0, 1990] = 0
>>> var[0, 2000] = 1
>>> var[100, 1990] = 2
>>> var[100, 2000] = 3
>>> for a, t in var.age_time():
>>>     print(f"At corner ({a}, {t}), {var(a, t)}")
>>> for a, t in [[53, 1997], [-5, 2000], [120, 2000], [0, 1900], [0, 2010]]:
>>>     print(f"Anywhere ({a}, {t}), {var(a, t)}")
At corner (0.0, 1990.0), 0.0
At corner (0.0, 2000.0), 1.0
At corner (100.0, 1990.0), 2.0
At corner (100.0, 2000.0), 3.0
Anywhere (53, 1997), 1.76
Anywhere (-5, 2000), 1.0
Anywhere (120, 2000), 3.0
Anywhere (0, 1900), 0.0
Anywhere (0, 2010), 1.0
The grid points in a Var represent a continuous function, determined by bivariate interpolation. All points outside the grid are equal to the nearest point inside the grid.
Age Time Grid¶
-
class
cascade_at.model.age_time_grid.
AgeTimeGrid
(ages, times, columns)[source]¶ Bases:
object
The AgeTime grid holds rows of a table at each age and time value.
At each age and time point is a DataFrame consisting of the columns given in the constructor. So getting an item returns a dataframe with those columns. Setting a DataFrame sets those columns. Each AgeTimeGrid has three possible mulstds, for value, dage, dtime.
>>> atg = AgeTimeGrid([0, 10, 20], [1990, 2000, 2010], ["height", "weight"])
>>> atg[:, :] = [6.1, 195]
>>> atg[:, :].height = [5.9]
>>> atg[10, 2000] = [5.7, 180]
>>> atg[5:17, 1980:1990].weight = 125
>>> assert (atg[20, 2000].weight == 195).all()
>>> assert isinstance(atg[0, 1990], pd.DataFrame)
If the column has the same name as a function (mean), then access it with getitem,
>>> atg[:, :]["mean"] = [5.9]
Why is this in Pandas, when it’s a regular array of data with an index, which makes it better suited to XArray, or even a Numpy array? It needs to interface with a database representation, and Pandas is a better match there.
-
property
mulstd
¶
-
DismodGroups Class¶
-
class
cascade.model.
DismodGroups
¶ A DismodGroups contains Var instances or contains SmoothGrid instances. It gives them the shape of the whole model, so it expresses what rates are nonzero, what random effects are defined, and on which of these there are covariate multipliers.
The DismodGroups structure will appear in lots of places. The fit returned by Dismod will be a DismodGroups containing Var objects. The Model itself is a DismodGroups containing SmoothGrid objects.
A classic use of this is to create a new DismodGroups of Var. The first loop is over the rate, random effect, and covariate group names. The inner loop is over particular sets of keys, which are composed of tuples of the primary rate, covariate name, and location IDs:
var_groups = DismodGroups()
for group_name, group in var_ids.items():
    for key, var_id_mapping in group.items():
        var_groups[group_name][key] = var_builder(table, var_id_mapping)
-
rate[primary_rate]
This is a dictionary of rates. They are always one of the five underlying rates: iota, chi, omega, rho, pini:
dg = DismodGroups()
dg.rate["iota"] = Var([0, 1, 50], [2000])
dg = DismodGroups() dg.rate["iota"] = Var([0, 1, 50], [2000])
-
random_effect[(primary_rate, child_location)]
A dictionary of random effects on the rates, so the keys are a rate and the ID of the child for which this is a random effect. When constructing a
Model
, we typically want to make oneSmoothGrid
of priors for all child random effects on a particular rate. In that case, specify the child ID asNone
:model = Model() # A Model is a DismodGroups object, too. model.random_effect[("iota", None)] = SmoothGrid([0, 100], [1990]) scale = DismodGroups() scale.random_effect[("omega", 2)] = Var([0, 100], [1990, 2000]) scale.random_effect[("omega", 3)] = Var([0, 100], [1990, 2000])
-
alpha[(covariate_name, rate_name)]
Alpha are covariate multipliers on the rates. The key is the name of the Covariate, which should match the name in the class given as an argument to the Session object. The rate name is one of the five underlying rates.
-
beta[(covariate_name, integrand_name)]
Beta are covariate multipliers on the measured value of the integrands. The integrand name is one of the canonical values.
-
gamma[(covariate_name, integrand_name)]
Gamma are covariate multipliers on the measured standard deviation of the integrands.
-
Priors¶
These are classes for the priors.
-
class
cascade_at.model.priors.
_Prior
¶ All priors have these methods.
-
parameters
()¶ Returns a dictionary of all parameters for this prior, including the prior type as “density”.
-
assign
(parameter=value, parameter=value...)¶ Creates a new Prior object with the same parameters as this Prior, except for the requested changes.
-
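A brief sketch of these two methods (the "density" value printed is an assumption about the parameters() output):
from cascade_at.model.priors import Gaussian

value_prior = Gaussian(mean=0.01, standard_deviation=0.1, lower=0.0)
wider = value_prior.assign(standard_deviation=0.2)  # new prior, other parameters kept
print(wider.parameters()['density'])                # assumed to print 'gaussian'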
-
class
cascade_at.model.priors.
Uniform
(lower, upper, mean=None, eta=None, name=None)[source]¶ - Parameters
lower (float) – Lower bound
upper (float) – Upper bound
mean (float) – Not meaningful for a uniform distribution, but used to seed the solver.
eta (float) – Used for logarithmic distributions.
name (str) – A name in case this is a pet prior.
-
class
cascade_at.model.priors.
Constant
(mean, name=None)[source]¶ - Parameters
mean (float) – The const value.
name (str) – A name for this prior, e.g. Susan.
-
class
cascade_at.model.priors.
Gaussian
(mean, standard_deviation, lower=- inf, upper=inf, eta=None, name=None)[source]¶ A Gaussian is
\[f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}\]where \(\sigma\) is the standard deviation and \(\mu\) the mean.
- Parameters
mean (float) – This is \(\mu\).
standard_deviation (float) – This is \(\sigma\).
lower (float) – lower limit.
upper (float) – upper limit.
eta (float) – Offset for calculating standard deviation.
name (str) – Name for this prior.
-
class
cascade_at.model.priors.
Laplace
(mean, standard_deviation, lower=- inf, upper=inf, eta=None, name=None)[source]¶ This version of the Laplace distribution is parametrized by its variance instead of by scaling of the axis. Usually, the Laplace distribution is
\[f(x) = \frac{1}{2b}e^{-|x-\mu|/b}\]where \(\mu\) is the mean and \(b\) is the scale, but the variance is \(\sigma^2=2b^2\), so the Dismod-AT version looks like
\[f(x) = \frac{1}{\sqrt{2}\sigma}e^{-\sqrt{2}|x-\mu|/\sigma}.\]The standard deviation assigned is \(\sigma\).
-
class
cascade_at.model.priors.
StudentsT
(mean, standard_deviation, nu, lower=- inf, upper=inf, eta=None, name=None)[source]¶ This Student's-t must have \(\nu>2\). The Student's-t distribution is usually
\[f(x,\nu) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}(1+x^2/\nu)^{-(\nu+1)/2}\]with mean 0 for \(\nu>1\). The variance is \(\nu/(\nu-2)\) for \(\nu>2\). Dismod-AT rewrites this using \(\sigma^2=\nu/(\nu-2)\) to get
\[f(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)} \left(1 + (x-\mu)^2/(\sigma^2(\nu-2))\right)^{-(\nu+1)/2}\]
-
class
cascade_at.model.priors.
LogGaussian
(mean, standard_deviation, eta, lower=- inf, upper=inf, name=None)[source]¶ Dismod-AT parametrizes the Log-Gaussian with the standard deviation as
\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\log((x-\mu)/\sigma)^2/2}\]-
mle
(draws)[source]¶ Assign new mean and stdev, with mean clamped between upper and lower. This does a fit using a normal distribution.
- Parameters
draws (np.ndarray) – A 1D array of floats.
- Returns
With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.
- Return type
Gaussian
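A sketch of refitting a prior from draws (the draw values here are made up):
import numpy as np
from cascade_at.model.priors import LogGaussian

prior = LogGaussian(mean=0.01, standard_deviation=0.5, eta=1e-5, lower=0.0, upper=1.0)
draws = np.random.lognormal(mean=-4.6, sigma=0.3, size=500)  # hypothetical posterior draws
refit = prior.mle(draws)  # returns a Gaussian; its mean is clamped to [lower, upper]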
-
-
class
cascade_at.model.priors.
LogLaplace
(mean, standard_deviation, eta, lower=- inf, upper=inf, name=None)[source]¶
-
class
cascade_at.model.priors.
LogStudentsT
(mean, standard_deviation, nu, eta, lower=- inf, upper=inf, name=None)[source]¶ -
mle
(draws)[source]¶ Assign new mean and stdev, with mean clamped between upper and lower. This does a fit using a normal distribution.
- Parameters
draws (np.ndarray) – A 1D array of floats.
- Returns
With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.
- Return type
Gaussian
-
Covariate Multipliers¶
See more information about how covariate settings and data are pulled in Covariates.
-
class
cascade_at.model.covariate.
Covariate
(column_name, reference=None, max_difference=None)[source]¶ Bases:
object
Establishes a reference value for a covariate column on input data and in output data. It is possible to create a covariate column with nothing but a name, but it must have a reference value before it can be used in a model.
- Parameters
column_name (str) – Name of the column in the input data.
reference (Optional[float]) – Reference where covariate has no effect.
max_difference (Optional[float]) – If a data point’s covariate is farther than max_difference from the reference value, then this data point is excluded from the calculation. Must be greater than or equal to zero.
-
property
name
¶
-
property
reference
¶
-
property
max_difference
¶
SmoothGrid Class¶
A SmoothGrid represents model priors (as opposed to data priors) in
a Dismod-AT model. A Model
is a bunch of
SmoothGrids, one for each rate, random effect, and covariate multiplier.
For instance, in order to set priors on underlying incidence rate, iota, create a SmoothGrid, set its priors, and add it to the Model:
smooth = SmoothGrid([0, 5, 10, 50, 100], [1990, 2015])
smooth.value[:, :] = Uniform(mean=0.01, lower=1e-6, upper=5)
smooth.dage[:, :] = Gaussian(mean=0, standard_deviation=10)
smooth.dtime[:, :] = Gaussian(mean=0, standard_deviation=0.1)
All of the priors in a SmoothGrid need to be defined. There is a value prior at each age and time, but the priors for differences in age and time are forward differences, so there is no prior for the largest age point and largest time point. That means you’ll notice examples with no dtime priors when the underlying grid is defined for only one year.
If you want more control over exact priors, iterate over them. The age_time_diff iterator returns the age and time at the age points but also the difference in age and time to the next age point:
for age, time, age_diff, time_diff in smooth.age_time_diff():
if not isinf(age_diff):
smooth.dage[age, time] = \
Gaussian(mean=0, standard_deviation=1 + 5 * age_diff)
This would change the standard deviation as the age interval changes, which could be helpful when age intervals change greatly. The check for isinf catches the last age difference, which returns the value inf because there is no next age point.
It is also possible to see what priors are set. This gets the prior at each age and time. Then it sets a new value for the prior with twice-as-large a standard deviation but the same density:
for age, time in smooth.age_time():
prior = smooth.value[age, time]
print(f"prior mean {prior.mean} std {prior.standard_deviation}")
smooth.value[age, time] = prior.assign(standard_deviation=2 * prior.standard_deviation)
Smooth Grid Priors¶
For the Rate tab, Random Effect tab, Study tab and Country Covariate tab,
the interface sets priors. This describes how those settings are interpreted.
Most of this work happens in the function
cascade.executor.construct_model.smooth_grid_from_smoothing_form()
,
and you can check its source there.
The default value, dage, and dtime priors are used to initialize those parts of the smooth grid. For smooth grids with only one age, the dage priors aren’t meaningful, and the same is true for dtime priors when there is only one year.
After that, the detailed priors are applied in the order they appear in the settings, and note that the order may or may not reflect the order in the user interface. There are three ways to specify which age and time points each detailed prior applies to:
age_lower
andage_upper
- A missing value here (one that’s not filled-in in the UI) is treated as -infinity or infinity, respectively.
time_lower
andtime_upper
- Missing values similarly set to include all points on that side.
born_lower
andborn_upper
- Each line for the born limit corresponds to \(a \le t - b\) or \(a \ge t - b\), respectively.
For each prior, all three of these sieves are applied to the grid of ages and times defined by the age values and time values for that smooth grid. If a detailed prior doesn’t match any of the age and time points in this grid, there will be a statement in the log that says “No ages and times match prior with extents <lower and upper extents>.”
Model Class¶
The Model
holds all of the SmoothGrids that define priors on
rates, random effects, and covariates. It also has a few other
properties necessary to define a complete model:
- Which of the rates are nonzero. This is a list of, for instance, [“iota”, “omega”, “chi”].
- The parent location as an integer ID. These correspond to the IDs supplied to the Dismod-AT session.
- A list of child locations. Not children and grandchildren, but the direct child locations as integer IDs.
- A list of covariates, supplied as Covariate objects.
- Weight functions, which are used to compute integrands. Each weight function is a Var.
- A scaling function, which sets the scale for every model variable. If this isn’t set, it will be calculated by Dismod-AT from the mean of value priors. It is used to ensure different terms in the likelihood have similar importance.
-
class
cascade_at.model.model.
Model
(nonzero_rates, parent_location, child_location=None, covariates=None, weights=None)[source]¶
>>> from cascade_at.inputs.locations import LocationDAG
>>> locations = LocationDAG(location_set_version_id=429)
>>> m = Model(["chi", "omega", "iota"], 6, locations.dag.successors(6))
- Parameters
nonzero_rates (List[str]) – A list of rates, using the Dismod-AT terms for the rates, so they are “iota”, “chi”, “omega”, “rho”, and “pini”.
parent_location (int) – The location ID for the parent.
child_location (Optional[List[int]]) – List of the children.
covariates (Optional[List[Covariate]]) – A list of covariate objects. This supplies the reference values and max differences used to exclude data by covariate value.
weights (Optional[Dict[str, Var]]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.
Grid Alchemy¶
In order to build two-level models with the settings from EpiViz-AT but at different parent locations, extracting the correct information from the measurement inputs, we use a wrapper around all of the modeling components, with a method called construct_two_level_model.
This alchemy object is one of the three things that are read in each time we grab a Context object.
-
class
cascade_at.model.grid_alchemy.
Alchemy
(settings)[source]¶ Bases:
object
An object initialized with model settings from cascade.settings.configuration.Configuration that can be used to construct parent-child location-specific models with the method construct_two_level_model().
Examples
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> settings = load_settings(BASE_CASE)
>>> mc = Alchemy(settings)
>>> i = MeasurementInputsFromSettings(settings)
>>> i.get_raw_inputs()
>>> mc.construct_two_level_model(location_dag=i.location_dag,
>>>                              parent_location_id=102,
>>>                              covariate_specs=i.covariate_specs)
-
construct_age_time_grid
()[source]¶ Construct a DEFAULT age-time grid, to be updated when we initialize the model.
- Return type
Dict
[str
,ndarray
]
-
construct_single_age_time_grid
()[source]¶ Construct a single age-time grid. Use this age and time when a smooth grid doesn’t depend on age and time.
- Return type
Tuple
[ndarray
,ndarray
]
-
get_smoothing_grid
(rate)[source]¶ Construct a smoothing grid for any rate in the model.
- Parameters
rate (Smoothing) – Some smoothing form for a rate.
- Return type
SmoothGrid
- Returns
The rate translated into a SmoothGrid based on the model settings’ default age and time grids.
-
get_all_rates_grids
()[source]¶ Get a dictionary of all the rates and their grids in the model.
- Return type
Dict
[str
,SmoothGrid
]
-
static
override_priors
(rate_grid, update_dict, new_prior_distribution='gaussian')[source]¶ Override priors for rates. This is used when we want to do posterior to prior, so we are overriding the global settings with location-specific settings based on parent posteriors.
- Parameters
rate_grid (
SmoothGrid
) – SmoothGrid object for a rateupdate_dict – Dictionary with ages and times vectors and draws for values, dage, and dtime to use in overriding the prior.
new_prior_distribution (
Optional
[str
]) – The new prior distribution to override the existing priors.
-
static
apply_min_cv_to_prior_grid
(prior_grid, min_cv, min_std=1e-10)[source]¶ Applies the minimum coefficient of variation to a _PriorGrid to enforce that minCV across all variables in the grid. Updates the _PriorGrid in place.
- Return type
None
-
construct_two_level_model
(location_dag, parent_location_id, covariate_specs, weights=None, omega_df=None, update_prior=None, min_cv=None, update_mulcov_prior=None)[source]¶ Construct a Model object for a parent location and its children.
- Parameters
location_dag (LocationDAG) – Location DAG specifying the location hierarchy
parent_location_id (int) – Parent location to build the model for
covariate_specs (CovariateSpecs) – covariate specifications; specifically will use covariate_specs.covariate_multipliers
weights (Optional[Dict[str, Var]]) –
omega_df (Optional[DataFrame]) – data frame with omega values in it (other cause mortality)
update_prior (Optional[Dict[str, Dict[str, ndarray]]]) – dictionary of dictionaries for prior updates to rates
update_mulcov_prior (Optional[Dict[Tuple[str, str, str], _Prior]]) – dictionary of mulcov prior updates
min_cv (Optional[Dict[str, Dict[str, float]]]) – dictionary (can be a defaultdict) for minimum coefficient of variation, keyed by cascade level, then by rate
-
Dismod Database API¶
This module describes the interface for reading from and writing to dismod databases. Dismod-AT works on SQLite databases, and we need a user-friendly way to write data to and extract data from these databases. It is also important to make sure we have all of the correct columns and column types.
The input tables and column types are explained here, and the output tables and column types are explained here.
We mimic that table metadata here, and then build an interface on top of it for easy reading and writing.
Interface¶
The base interface is DismodSQLite
, and the input and output class has getters and setters
for each of the tables (DismodIO
, not documented here).
To use a DismodIO(DismodSQLite)
interface, you can do
from cascade_at.dismod.api.dismod_io import DismodIO
file = 'my_database.db'
db = DismodIO(file)
# Tables are stored as attributes, e.g.
db.data
db.age
db.time
db.prior
# Tables can be set with
db.data = pd.DataFrame(...)
-
class
cascade_at.dismod.api.dismod_sqlite.
DismodSQLite
(path)[source]¶ Bases:
object
Initiates an SQLite reader from the path.
- Parameters
path (
Union
[str
,Path
]) – A string or Path pointing to the DisMod database file.
-
update_table_columns
(table_name, table)[source]¶ Updates the table columns with additional columns, like those prefixed with “c_” (comments) and “x_” (covariates).
Run Dismod Commands¶
To run dismod commands on a database (all possible options are
here),
you can use the following helper functions. They will figure out
where your dmdismod
executable is, whether it is installed on your
computer or pulled from Docker, based on the installation of cascade_at_scripts
.
-
cascade_at.dismod.api.run_dismod.
run_dismod
(dm_file, command)[source]¶ Executes a command on a dismod file.
- Parameters
dm_file (str) – the dismod db filepath
command (str) – a command to run
-
cascade_at.dismod.api.run_dismod.
run_dismod_commands
(dm_file, commands, sys_exit=True)[source]¶ Runs multiple commands on a dismod file and returns the exit statuses. Will raise an exception if it runs into an error.
- Parameters
dm_file (str) – the dismod db filepath
commands (List[str]) – a list of command strings
sys_exit – whether to exit the code altogether if there is an error. If False, then it will pass the error string back to the original python process.
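For example, a typical init-fit-predict sequence might look like the following (the database path is hypothetical; the command strings are standard Dismod-AT commands):
from cascade_at.dismod.api.run_dismod import run_dismod_commands

# Initialize the database, fit fixed effects, then predict on the avgint table.
run_dismod_commands(dm_file='temp.db', commands=['init', 'fit fixed', 'predict fit_var'])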
Fill and Extract Helpers¶
In order to fill data into the dismod databases
in a meaningful way for the cascade, we have two
classes that are subclasses
of DismodIO
and provide easy functionality for filling tables
based on a model version’s settings.
Dismod Filler¶
-
class
cascade_at.dismod.api.dismod_filler.
DismodFiller
(path, settings_configuration, measurement_inputs, grid_alchemy, parent_location_id, sex_id, child_prior=None, mulcov_prior=None)[source]¶ Bases:
cascade_at.dismod.api.dismod_io.DismodIO
Sits on top of the DismodIO class, and takes everything from the collector module and puts them into the Dismod database tables in the correct construction.
Dismod Filler wraps a dismod database and fills all of the tables using the measurement inputs object, settings, and the grid alchemy constructor.
It optionally includes rate priors and covariate multiplier priors.
- Parameters
path (Union[str, Path]) – the path of the dismod database
settings_configuration (SettingsConfig) – the settings configuration object
measurement_inputs (MeasurementInputs) – the measurement inputs object
grid_alchemy (Alchemy) – the grid alchemy object
parent_location_id (int) – the parent location ID for this database
sex_id (int) – the reference sex for this database
child_prior (Optional[Dict[str, Dict[str, ndarray]]]) – a dictionary of child rate priors to use. The first level of the dictionary is the rate name, and the second is the type of prior, being value, dage, or dtime.
-
self.
parent_child_model
¶ Model that was constructed from grid_alchemy parameters for one specific parent and its descendants
Examples
>>> from pathlib import Path
>>> from cascade_at.model.grid_alchemy import Alchemy
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> inputs = MeasurementInputsFromSettings(settings)
>>> inputs.demographics.location_id = [102, 555]  # subset the locations to make it go faster
>>> inputs.get_raw_inputs()
>>> inputs.configure_inputs_for_dismod(settings)
>>> alchemy = Alchemy(settings)
>>> da = DismodFiller(path=Path('temp.db'),
>>>                   settings_configuration=settings,
>>>                   measurement_inputs=inputs,
>>>                   grid_alchemy=alchemy,
>>>                   parent_location_id=1,
>>>                   sex_id=3)
>>> da.fill_for_parent_child()
-
get_omega_df
()[source]¶ Get the correct omega data frame for this two-level model.
- Return type
DataFrame
-
get_parent_child_model
()[source]¶ Construct a two-level model that corresponds to this parent location ID and its children.
- Return type
Model
-
calculate_reference_covariates
()[source]¶ Calculates reference covariate values based on the input object and the parent/sex we have in the two-level model. Modifies the baseline covariate specs object.
- Return type
-
fill_for_parent_child
(**options)[source]¶ Fills the Dismod database with inputs and a model construction for a parent location and its descendants.
Pass in some optional keyword arguments to fill the option table with additional info or to over-ride the defaults.
- Return type
None
-
node_id_from_location_id
(location_id)[source]¶ Get the node ID from a location ID in an already created node table.
- Return type
int
-
fill_reference_tables
()[source]¶ Fills all of the reference tables including density, node, covariate, age, and time.
Dismod Extractor¶
-
class
cascade_at.dismod.api.dismod_extractor.
DismodExtractor
(path)[source]¶ Bases:
cascade_at.dismod.api.dismod_io.DismodIO
Sits on top of the DismodIO class, and extracts helpful data frames from the dismod database tables.
- Parameters
path (
str
) – The database filepath
-
get_predictions
(locations=None, sexes=None, samples=False, predictions=None)[source]¶ Get the predictions from the predict table for locations and sexes. Will either return a column of ‘mean’ if not samples, otherwise ‘draw’, which can then be reshaped wide if necessary.
- Return type
DataFrame
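A quick sketch of pulling predictions out of a filled and fitted database (the path and IDs are hypothetical):
from cascade_at.dismod.api.dismod_extractor import DismodExtractor

ext = DismodExtractor(path='temp.db')
means = ext.get_predictions(locations=[102], sexes=[2], samples=False)  # 'mean' column
draws = ext.get_predictions(locations=[102], sexes=[2], samples=True)   # 'draw' column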
-
gather_draws_for_prior_grid
(location_id, sex_id, rates, value=True, dage=False, dtime=False, samples=True)[source]¶ Takes draws and formats them for a prior grid for values, dage, and dtime. Assumes that age_lower == age_upper and time_lower == time_upper for all data rows. We might not want to do all value, dage, and dtime, so pass False if you want to skip those.
- Parameters
location_id (int) –
sex_id (int) –
rates (List[str]) – list of rates to get the draws for
value (bool) – whether to calculate value priors
dage (bool) – whether to calculate dage priors
dtime (bool) – whether to calculate dtime priors
samples (bool) – whether the prior came from samples
- Returns
Dictionary of 3-d arrays of value, dage, and dtime draws over age and time for this loc and sex
-
format_predictions_for_ihme
(gbd_round_id, locations=None, sexes=None, samples=False, predictions=None)[source]¶ Formats predictions from the prediction table and returns either the mean or draws, based on whether or not samples is False or True.
- Parameters
locations (Optional[List[int]]) – A list of locations to extract from the predictions
sexes (Optional[List[int]]) – A list of sexes to extract from the predictions
gbd_round_id (int) – The GBD round ID to format the predictions for
samples (bool) – Whether or not the predictions have draws (samples) or whether it is just one fit.
predictions (Optional[DataFrame]) – An optional data frame with the predictions to use rather than reading them directly from the database.
- Returns
Data frame with predictions formatted for the IHME databases.
Table Creation¶
The DismodFiller
uses the following table creation functions internally.
Formatting Reference Tables¶
The dismod database needs some standard reference tables. These are made with the following functions.
-
cascade_at.dismod.api.fill_extract_helpers.reference_tables.
construct_integrand_table
(data_cv_from_settings=None, default_data_cv=0.0)[source]¶ Constructs the integrand table and adds data CV in the minimum_meas_cv column.
- Parameters
data_cv_from_settings (Optional[Dict]) – key, value pairs mapping integrands to data CV
default_data_cv (float) – default value for data CV to use
- Return type
DataFrame
-
cascade_at.dismod.api.fill_extract_helpers.reference_tables.
default_rate_table
()[source]¶ Constructs the default rate table with rate names and ids.
- Return type
DataFrame
-
cascade_at.dismod.api.fill_extract_helpers.reference_tables.
construct_node_table
(location_dag)[source]¶ Constructs the node table from a location DAG’s to_dataframe() method.
- Parameters
location_dag (LocationDAG) – location hierarchy object
- Return type
DataFrame
Formatting Dismod Data Tables¶
There are helper functions to create data files. They are broken up into small functions to help with unit testing.
-
cascade_at.dismod.api.fill_extract_helpers.data_tables.
prep_data_avgint
(df, node_df, covariate_df)[source]¶ Preps both the data table and the avgint table by mapping locations to nodes and covariates to names.
Putting it in the same function because it does the same stuff, but data and avgint need to be called separately because dismod requires different columns.
- Parameters
df (DataFrame) – The data frame to map
node_df (DataFrame) – The node table from dismod db
covariate_df (DataFrame) – The covariate table from dismod db
-
cascade_at.dismod.api.fill_extract_helpers.data_tables.
construct_data_table
(df, node_df, covariate_df, ages, times)[source]¶ Constructs the data table from input df.
- Parameters
df (DataFrame) – data frame of inputs that have been prepped for dismod
node_df (DataFrame) – the dismod node table
covariate_df (DataFrame) – the dismod covariate table
ages (ndarray) –
times (ndarray) –
-
cascade_at.dismod.api.fill_extract_helpers.data_tables.
construct_gbd_avgint_table
(df, node_df, covariate_df, integrand_df, ages, times)[source]¶ Constructs the avgint table using the output df from the inputs.to_avgint() method.
- Parameters
df (DataFrame) – The data frame to construct the avgint table from, that has things like ages, times, nodes (locations), sexes, etc.
node_df (DataFrame) – dismod node data frame
covariate_df (DataFrame) – dismod covariate data frame
integrand_df (DataFrame) – dismod integrand data frame
ages (ndarray) – array of ages for the model
times (ndarray) – array of times for the model
- Return type
DataFrame
Formatting Grid Tables¶
There are helper functions to create grid tables in the dismod database. These are things like WeightGrid and SmoothGrid.
-
cascade_at.dismod.api.fill_extract_helpers.grid_tables.
construct_model_tables
(model, location_df, age_df, time_df, covariate_df)[source]¶ Main function that loops through the items from a model object, which include rate, random_effect, alpha, beta, and gamma and constructs the modeling tables in dismod db.
Each of these is a “grid” var, so it needs entries in the prior, smooth, and smooth_grid tables. This function returns those tables.
It also constructs the rate, integrand, and mulcov tables (alpha, beta, gamma), plus nslist and nslist_pair tables.
- Parameters
model (Model) – A model object that has rate information
location_df (DataFrame) – A location / node data frame
age_df (DataFrame) – An age data frame for dismod
time_df (DataFrame) – A time data frame for dismod
covariate_df (DataFrame) – A covariate data frame for dismod
- Returns
rate, prior, smooth, smooth_grid, mulcov, nslist, nslist_pair, and subgroup tables
- Return type
A dictionary of data frames, one for each table name
-
cascade_at.dismod.api.fill_extract_helpers.grid_tables.
construct_weight_grid_tables
(weights, age_df, time_df)[source]¶ Constructs the weight and weight_grid tables.
- Parameters
weights (Dict[str, Var]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.
age_df – Age data frame from the dismod db
time_df – Time data frame from the dismod db
- Returns
Tuple of the weight table and the weight grid table
Helper Functions¶
Posterior to Prior¶
“Posterior to prior” means taking the fit from a parent database and using the rate posteriors as the priors for the child fits. This happens in DismodFiller when it builds the two-level model with Alchemy, because it replaces the default priors with the ones passed in.
The posterior is passed down by predicting the parent model on the rate grid for the children. To construct the rate grid, we use the following function:
-
cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.
get_prior_avgint_grid
(grids, sexes, locations, midpoint=False)[source]¶ Get a data frame to use for setting up posterior predictions on a grid. The grids are specified in the grids parameter.
The result will still need covariates added, and prep_data_avgint from dismod.api.data_tables must be applied to convert nodes and covariate names, before it can be input into the avgint table in a database.
- Parameters
grids (Dict[str, Dict[str, ndarray]]) – A dictionary of grids with keys for each integrand, each of which is a dictionary with keys “age” and “time”
sexes (List[int]) – A list of sexes
locations (List[int]) – A list of locations
midpoint (bool) – Whether to midpoint the grid lower and upper values (recommended for rates)
- Returns
Data frame with columns “avgint_id”, “integrand_id”, “location_id”, “weight_id”, “subgroup_id”, “age_lower”, “age_upper”, “time_lower”, “time_upper”, “sex_id”
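A sketch of a call, with placeholder ages, times, and location IDs:
import numpy as np
from cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior import (
    get_prior_avgint_grid,
)

grids = {
    "iota": {
        "age": np.array([0.0, 5.0, 10.0]),
        "time": np.array([1990.0, 2000.0]),
    },
}
avgint = get_prior_avgint_grid(grids=grids, sexes=[1, 2], locations=[101], midpoint=True)
# avgint still needs covariates added, and prep_data_avgint applied,
# before it can go into a database's avgint table.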
Then, because the IHME databases require standard GBD ages and times, we use the following function to upload those priors from the rate grid to the IHME databases. This is just for visualization purposes:
-
cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.
format_rate_grid_for_ihme
(rates, gbd_round_id, location_id, sex_id)[source]¶ Formats a grid of mean, upper, and lower for a prior rate for the IHME database. Only does this for Gaussian priors.
- Parameters
rates (Dict[str, SmoothGrid]) – A dictionary of SmoothGrids, keyed by primary rates like “iota”
gbd_round_id (int) – the GBD round
location_id (int) – the location ID to append to this data frame
sex_id (int) – the sex ID to append to this data frame
- Return type
A data frame formatted for the IHME databases
Multithreading¶
When we want to do multithreading on a dismod database, we can define a process that works on, for example, only a subset of a database’s data or samples. To support this, there is a base class here that is subclassed in sample and Predict, since those tasks can be done in parallel on one database.
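The actual base class lives in the linked module; as a rough sketch of the pattern only (all names below are hypothetical, not the real API), each unit of work touches an independent slice of the database, so the units can run in a process pool:
from multiprocessing import Pool

def run_one_sample(args):
    # Hypothetical unit of work: operate on one sample index
    # of a dismod database copy.
    db_path, sample_index = args
    return sample_index

def run_samples_in_parallel(db_path, n_samples, n_workers=4):
    # Each task reads and writes only its own sample, so no locking is needed.
    with Pool(n_workers) as pool:
        return pool.map(run_one_sample, [(db_path, i) for i in range(n_samples)])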
Constants¶
Dismod-AT makes assumptions about the order of variables. In some cases, it has relaxed those assumptions over time, but we retain these as conventions.
-
class
cascade_at.dismod.constants.
RateEnum
(value)[source]¶ These are the five underlying rates.
-
pini
= 0¶ Initial prevalence of the condition at birth, as a fraction of one.
-
iota
= 1¶ Incidence rate for leaving susceptible to become diseased.
-
rho
= 2¶ Remission from disease to susceptible.
-
chi
= 3¶ Excess mortality rate.
-
omega
= 4¶ Other-cause mortality rate.
-
-
class
cascade_at.dismod.constants.
IntegrandEnum
(value)[source]¶ These are all of the integrands Dismod-AT supports, and they will have exactly these IDs when serialized.
-
Sincidence
= 0¶ Susceptible incidence, where the denominator is the number of susceptibles. Corresponds to iota.
-
remission
= 1¶ Remission rate, corresponds to rho.
-
mtexcess
= 2¶ Excess mortality rate, corresponds to chi.
-
mtother
= 3¶ Other-cause mortality, corresponds to omega.
-
mtwith
= 4¶ Mortality rate for those with condition.
-
susceptible
= 5¶ Fraction of susceptibles out of total population.
-
withC
= 6¶ Fraction of population with the disease. Total pop is the denominator.
-
prevalence
= 7¶ Fraction of those alive with the disease, so S+C is denominator.
-
Tincidence
= 8¶ Total-incidence, where denominator is susceptibles and with-condition.
-
mtspecific
= 9¶ Cause-specific mortality rate, so mx_c.
-
mtall
= 10¶ All-cause mortality rate, mx.
-
mtstandard
= 11¶ Standardized mortality ratio.
-
relrisk
= 12¶ Relative risk.
-
incidence
= -99¶ This integrand should never be used, but we need it when we are initially converting measures from the epi database.
-
-
class
cascade_at.dismod.constants.
DensityEnum
(value)[source]¶ The distributions supported by Dismod-AT. They always have these ids.
-
uniform
= 0¶ Uniform Distribution
-
gaussian
= 1¶ Gaussian Distribution
-
laplace
= 2¶ Laplace Distribution
-
students
= 3¶ Students-t Distribution
-
log_gaussian
= 4¶ Log-Gaussian Distribution
-
log_laplace
= 5¶ Log-Laplace Distribution
-
log_students
= 6¶ Log-Students-t Distribution
-
-
class
cascade_at.dismod.constants.
WeightEnum
(value)[source]¶ Dismod-AT allows arbitrary weights, which are functions of space and time, defined by bilinear interpolations on grids. These weights are used to average rates over age and time intervals. Given this problem, there are three kinds of weights that are relevant.
-
constant
= 0¶ This weight is constant everywhere at 1. This is the no-weight weight.
-
susceptible
= 1¶ For measures that are integrals over population without the condition.
-
with_condition
= 2¶ For measures that are integrals over those with the disease.
-
total
= 3¶ For measures where the denominator is the whole population.
-
-
constants.
INTEGRAND_TO_WEIGHT
= {'Sincidence': <WeightEnum.susceptible: 1>, 'Tincidence': <WeightEnum.total: 3>, 'mtall': <WeightEnum.total: 3>, 'mtexcess': <WeightEnum.with_condition: 2>, 'mtother': <WeightEnum.total: 3>, 'mtspecific': <WeightEnum.total: 3>, 'mtstandard': <WeightEnum.constant: 0>, 'mtwith': <WeightEnum.with_condition: 2>, 'prevalence': <WeightEnum.total: 3>, 'relrisk': <WeightEnum.constant: 0>, 'remission': <WeightEnum.with_condition: 2>, 'susceptible': <WeightEnum.constant: 0>, 'withC': <WeightEnum.constant: 0>}¶ Each integrand has a natural association with a particular weight because it is a count of events with one of four denominators: constant, susceptibles, with-condition, or the total population.
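Because the enum values are the serialized IDs, the constants can be used directly, for example:
from cascade_at.dismod.constants import (
    INTEGRAND_TO_WEIGHT,
    IntegrandEnum,
    RateEnum,
    WeightEnum,
)

# Enum values are the IDs written to the dismod database.
assert RateEnum.iota.value == 1
assert IntegrandEnum.mtexcess.value == 2

# Each integrand's natural denominator, as a WeightEnum.
assert INTEGRAND_TO_WEIGHT["mtexcess"] is WeightEnum.with_condition
assert INTEGRAND_TO_WEIGHT["prevalence"] is WeightEnum.total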
Dismod Integrand Mappings¶
A fair amount of mapping has to occur between GBD measures and Dismod-AT integrands and rates. Functions and dictionaries aid in this mapping, for example:
-
cascade_at.dismod.integrand_mappings.
integrand_to_gbd_measures
(df, integrand_col)[source]¶ Maps the integrand column to measure IDs and adds in filler measures where necessary (e.g. copies over Sincidence to incidence).
- Parameters
df (DataFrame) – data frame with an integrand column
integrand_col (str) – column name for the integrand column
- Returns
data frame with integrands mapped to measures
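A sketch of a call, with a toy data frame (the column names other than the integrand column are illustrative):
import pandas as pd
from cascade_at.dismod.integrand_mappings import integrand_to_gbd_measures

df = pd.DataFrame({
    "integrand": ["Sincidence", "prevalence"],
    "mean": [0.01, 0.10],
})
# Maps each integrand to its GBD measure ID, copying Sincidence
# over to incidence where needed.
with_measures = integrand_to_gbd_measures(df, integrand_col="integrand")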
See here for more details.
Core Functions¶
The core module has the following miscellaneous components. Form Validation provides the building blocks for the Settings Configuration Form coming from EpiViz. Shared Functions lets us import internal shared functions into open-source environments (like Travis CI).
Form Validation¶
The form classes are the building blocks for the Settings Configuration Form.
Fields¶
This module defines specializations of the general tools in abstract_form, mostly useful field types.
-
class
cascade_at.core.form.fields.
NativeListField
(*args, **kwargs)[source]¶ Because we already have a ListField for space-separated strings that become lists, this field type should be used when the .json config returns a native Python list.
-
class
cascade_at.core.form.fields.
FormList
(inner_form_constructor, *args, **kwargs)[source]¶ This represents a homogeneous list of forms. For example, it might be used to contain a list of priors within a smoothing grid.
- Parameters
inner_form_constructor – A factory which produces an instance of a Form subclass. Most often it will just be the Form subclass itself.
-
class
cascade_at.core.form.fields.
Dummy
(*args, **kwargs)[source]¶ A black hole which consumes all values without error. Use to mark sections of the configuration which have yet to be implemented and should be ignored.
-
validate_and_normalize
(instance, root=None)[source]¶ Validates the data for this field on the given parent instance and transforms the data into its normalized form. The actual details of validating and transforming are delegated to subclasses, except for checking for missing data, which is handled here.
- Parameters
instance (Form) – the instance of the form for which this field should be validated.
root (Form) – pointer back to the base of the form hierarchy.
- Returns
a list of error messages with path strings showing where in this object they occurred. For most fields the path will always be empty.
- Return type
[(str, str, str)]
-
-
class
cascade_at.core.form.fields.
OptionField
(options, *args, constructor=<class 'str'>, **kwargs)[source]¶ A field which will only accept values from a predefined set.
- Parameters
options (list) – The list of options to choose from
constructor – A function which takes a string and returns the expected type. Behaves as the constructor for SimpleTypeField. Defaults to str
-
class
cascade_at.core.form.fields.
ListField
(*args, constructor=<class 'str'>, separator=' ', **kwargs)[source]¶ A field which takes a string containing values demarcated by some separator and transforms them into a homogeneous list of items of an expected type.
- Parameters
constructor – A function which takes a string and returns the expected type. Behaves as the constructor for SimpleTypeField. Defaults to str
separator (str) – The string to split by. Defaults to a single space.
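As a sketch of how these field types combine on a form (the settings names here are invented for illustration):
from cascade_at.core.form.abstract_form import Form, SimpleTypeField
from cascade_at.core.form.fields import Dummy, ListField, OptionField

class ExampleSettings(Form):
    # "10 20 30" becomes [10, 20, 30].
    locations = ListField(constructor=int)
    # Only these two strings are accepted.
    method = OptionField(["fixed", "random"])
    # Not implemented yet; swallow whatever arrives without error.
    experimental = Dummy()
    threshold = SimpleTypeField(float)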
Abstract Form¶
This module defines general tools for building validators for messy hierarchical parameter data. It provides a declarative API for creating form validators. It tries to follow conventions from form validation systems in the web application world since that is a very similar problem.
Example
Validators are defined as classes with attributes which correspond to the values they expect to receive. For example, consider this JSON blob:
{"field_a": "10", "field_b": "22.4", "nested": {"field_c": "Some Text"}}
A validator for that document would look like this:
class NestedValidator(Form):
    field_c = SimpleTypeField(str)

class BlobValidator(Form):
    field_a = SimpleTypeField(int)
    field_b = SimpleTypeField(float)
    nested = NestedValidator()
And could be used as follows:
>>> form = BlobValidator(json.loads(document))
>>> form.validate_and_normalize()
>>> form.field_a
10
>>> form.nested.field_c
"Some Text"
-
class
cascade_at.core.form.abstract_form.
NoValue
[source]¶ Represents an unset value, which is distinct from None because None may actually appear in input data.
-
class
cascade_at.core.form.abstract_form.
FormComponent
(nullable=False, default=None, display=None, validation_priority=100)[source]¶ Base class for all form components. It bundles up behavior shared by both (sub)Forms and Fields.
Note
FormComponent, Form and Field all make heavy use of the descriptor protocol (https://docs.python.org/3/howto/descriptor.html). That means that the relationship between objects and the data they operate on is more complex than usual. Read up on descriptors, if you aren’t familiar, and pay close attention to how __set__ and __get__ access data.
- Parameters
nullable (bool) – If False then missing data for this node is considered an error. Defaults to False.
default – Default value to return if unset
display (str) – The name used in the EpiViz interface.
validation_priority (int) – Sort order for validation.
-
class
cascade_at.core.form.abstract_form.
Field
(*args, **kwargs)[source]¶ A field within a form. Fields are responsible for validating the data they contain (without respect to data in other fields) and transforming it into a normalized form.
-
validate_and_normalize
(instance, root=None)[source]¶ Validates the data for this field on the given parent instance and transforms the data into its normalized form. The actual details of validating and transforming are delegated to subclasses, except for checking for missing data, which is handled here.
- Parameters
instance (Form) – the instance of the form for which this field should be validated.
root (Form) – pointer back to the base of the form hierarchy.
- Returns
a list of error messages with path strings showing where in this object they occurred. For most fields the path will always be empty.
- Return type
[(str, str, str)]
-
-
class
cascade_at.core.form.abstract_form.
SimpleTypeField
(constructor, *args, **kwargs)[source]¶ A field which transforms input data using a constructor function and emits errors if that transformation fails.
In general this is used to convert to simple types like int or float. Because it emits only very simple messages it is not appropriate for cases where the cause of any error isn’t obvious from knowing the name of the constructor function and a string representation of the input value.
- Parameters
constructor – a function which takes one argument and returns a normalized version of that argument. It must raise ValueError, TypeError or OverflowError if transformation is not possible.
-
class
cascade_at.core.form.abstract_form.
Form
(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]¶ The parent class of all forms.
Validation for forms happens in two stages. First, all the form’s fields and sub-forms are validated. If none of those have errors, then the form is known to be in a consistent state, and its _full_form_validation method is run to finalize validation. If any field or sub-form is invalid, then this form’s _full_form_validation method will not be run, because the form may be in an inconsistent state.
Simple forms will be valid if all their fields are valid, but more complex forms require additional checks across multiple fields, which are handled by _full_form_validation (see the sketch after this entry).
Note
A nested form may be marked nullable. It is considered null if all of its children are null. If a nullable form is null, then it is not an error for non-nullable fields in it to be null. If any of the form’s fields are non-null, then the whole form is considered non-null, at which point missing data for non-nullable fields becomes an error again.
- Parameters
source (dict) – The input data to parse. If None, it can be supplied later by calling process_source
name_field (str) – If supplied, then a field of the same name must be present on the subclass. That field will always have the name of the attribute this class is assigned to in its parent, rather than the value, if any, that the field had in the input data.
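A sketch of the two-stage idea (the exact signature and return convention of _full_form_validation are assumptions here, not documented API):
from cascade_at.core.form.abstract_form import Form, SimpleTypeField

class AgeRange(Form):
    lower = SimpleTypeField(float)
    upper = SimpleTypeField(float)

    def _full_form_validation(self, root):
        # Runs only after lower and upper each validated on their own,
        # so this cross-field comparison sees consistent data.
        if self.lower > self.upper:
            return ["lower must not exceed upper"]
        return []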
Error-Handling Plans¶
The following is a proposal for how to handle errors. This is not implemented.
Exception-Handling Proposal¶
If we look at the layers of the code, we can handle errors in different ways between and within the layers.
EpiViz is one version of the top of this chain.
At the top, catch all exceptions and return as strings to EpiViz on initial call.
Within processing the settings, any settings that don’t make sense are returned as a list of errors, not through exception-handling.
Below this, assume we are working inside a UGE job. Failure of one job does not kill all jobs, because modelers can often use whatever data they do get.
The UGE job catches all exceptions and sends them to logs. This includes both random exceptions and exceptions that are about the more complex construction of the model. An example of such an exception is that you created a covariate but never set its reference value.
Within the code to setup the model, throw exceptions from our custom hierarchy when there is something a modeler could do differently.
Dismod-AT errors… maybe these are returned as exceptions?
That would be a hierarchy that looks like:
CascadeModelError, which catches the more complicated model setup faults and is for the modelers:
Data selection problems.
Algorithm selection problems.
Settings selection problems.
It is only used within the model construction and serialization, not during checking of settings.
Logging¶
The modelers should be able to see statistical choices, and those can be separate from debugging statements. Those logs would have separate lifetimes on disk, too.
Code log: This records regular debugging statements, such as function entry and exit. It is kept on disk.
Math log: This has information about choices the code makes with the data. It is shown to the users in EpiViz. All of the math log is always kept.
Code Log
Debug: Up to the coder. Will be turned off during production runs.
Info: Kept on in production runs.
Warn: Kept on in production runs. Any warning that fires requires action to disable it.
Error: Kept on in production runs, and we read all of these.
Math Log
Debug: About choices that are built in to model logic.
Info: About choices where a switch decides what to do.
Warn: A problem that needs to be fixed, possibly with another run, but it doesn’t make this run completely fail.
Error: Has to be addressed in order to complete this Cascade run.
MATHLOG statements should have the following properties; a sketch follows this list.
Put MATHLOG statements in places where you have context on the data the function affects. This often means the log statement is in the caller.
Include in the log statement summary stats like the number of rows, names of variables, things that inform about this run.
If something was a choice, indicate how a modeler made that choice, and hence how she could unmake it, so refer to the EpiViz selection.
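For instance, a math-log statement in a caller might look like this (the logger name and the option it cites are illustrative, not the real ones):
import logging

MATHLOG = logging.getLogger("cascade_at.math")

def drop_zero_standard_error(df):
    kept = df[df["standard_error"] > 0]
    # Logged from the caller, with summary stats and a pointer to the
    # EpiViz choice a modeler could change.
    MATHLOG.info(
        "Dropped %d of %d rows with zero standard error, per the EpiViz option.",
        len(df) - len(kept),
        len(df),
    )
    return kept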
Alec proposes we could construct a hierarchical and narrative MATHLOG which reads, for the modeler, like:
Preparing model:
Downloading input data:
...
Constructing model representation:
Adding mortality data from GBD:
Assigning standard error based on bounds
...
Running dismodat
We could write this as a streamable HTML document.
Faults and Failures¶
Classify failures by the faults that caused them. Highlight to the modeler the ones they can fix.
Model configuration:
settings don’t meet needs and can be changed.
bundle values don’t make sense in some way.
IHME data, where modelers may recognize the problem:
database doesn’t have data we think it should.
IHME database not responding or otherwise having a problem.
All the other faults, which modelers are unlikely to fix:
logic errors.
environment errors regarding directories and writability.
All errors go to the math log, which also goes to the code log.
Logging Usability¶
Messages to the GUI user should include
The line in the code, with a link to that line in Github.
A link to the exception description in the help on docs.
A link to the function in which the exception occurred.
These would require, for the link to Github, knowing
the git commit so that it links to the right line.
For the URL, it would mean having the refs from the objects.inv
file that sphinx makes when it makes the docs. It has
the mapping from Python entity to its URL and tag in the
documentation.
Application Context¶
Each model run needs to have an object that determines the file structure, connections to the IHME databases, etc.
This context can be modified for a local environment, but that’s not currently implemented in an intuitive or user-friendly way. When we want to enable local runs of an entire cascade, this configuration is what we need to do design work on.
Configuration¶
There is an additional repository that stores application information for the IHME configuration.
Context¶
Based on the configuration above, and a model version ID from the epi database, we define a context object that keeps track of database connections and file structures.
-
class
cascade_at.context.model_context.
Context
(model_version_id, make=False, configure_application=True, root_directory=None)[source]¶ Bases:
object
Context for running a model.
- Parameters
model_version_id (int) – The model version ID for this context. If you’re not configuring the application, it doesn’t matter what this is.
make (bool) – Whether or not to make the directory tree for the model.
configure_application (bool) – Configure the production application. If False, this can be used for testing on a local machine.
-
db_file
(location_id, sex_id)[source]¶ Gets the database file for a given location and sex.
- Parameters
location_id (int) – Location ID for the database (parent).
sex_id (int) – Sex ID for the database, as the reference.
- Return type
Path
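A sketch of local use, under the assumption that configure_application=False is enough to avoid the IHME configuration (the location and sex IDs are placeholders):
from cascade_at.context.model_context import Context

context = Context(
    model_version_id=0,  # ignored when not configuring the application
    make=False,
    configure_application=False,
)
db_path = context.db_file(location_id=101, sex_id=2)  # a pathlib.Path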
-
db_index_file_pattern
(location_id, sex_id)[source]¶ Gets the database file pattern for databases with indices. Used in sample simulate when it’s done in parallel.
- Parameters
location_id (int) – Location ID for the database (parent).
sex_id (int) – Sex ID for the database, as the reference.
- Return type
String representing the absolute path to the index database.
It also provides methods to read in the three things that are always needed to construct models.
Saver and Uploader¶
The saver module takes results from a Cascade-AT model, saves them in the correct format to the IHME file system, and uploads them to the epi databases.
The results of a Cascade-AT model need to be saved to the IHME epi databases. This module wrangles the draw files from a completed model and uploads summaries to the epi databases for visualization in EpiViz.
Eventually, this module should be replaced by something like save_results_at
.
-
exception
cascade_at.saver.results_handler.
ResultsError
[source]¶ Raised when there is an error with uploading or validating the results.
-
class
cascade_at.saver.results_handler.
ResultsHandler
[source]¶ -
self.draw_keys¶ The keys of the draw data frames
-
self.summary_cols¶ The columns that need to be present in all summary files
-
summarize_results
(df)[source]¶ Summarizes results from either mean or draw columns to get mean, upper, and lower columns.
- Parameters
df (DataFrame) – A data frame with draw columns or just a mean column
- Return type
DataFrame
-
save_draw_files
(df, model_version_id, directory, add_summaries)[source]¶ Saves a data frame by location and sex in .csv files. This currently saves the summaries, but when we get save_results working it will save draws and then summaries as part of that.
- Parameters
df (DataFrame) – Data frame with the following columns: [‘location_id’, ‘year_id’, ‘age_group_id’, ‘sex_id’, ‘measure_id’, ‘mean’ OR ‘draw’]
model_version_id (int) – The model version to attach to the data
directory (Path) – Path to save the files to
add_summaries (bool) – Save an additional file with summaries to upload
- Return type
None
-
save_summary_files
(df, model_version_id, directory)[source]¶ Saves a data frame with summaries by location and sex in summary.csv files.
- Parameters
df (DataFrame) – Data frame with the following columns: [‘location_id’, ‘year_id’, ‘age_group_id’, ‘sex_id’, ‘measure_id’, ‘mean’, ‘lower’, and ‘upper’]
model_version_id (int) – The model version to attach to the data
directory (Path) – Path to save the files to
- Return type
None
-
static
upload_summaries
(directory, conn_def, table)[source]¶ Uploads results from a directory to the model_estimate_final table in the Epi database specified by the conn_def argument.
In the future, this will probably be replaced by save_results_dismod but we don’t have draws to work with so we’re just uploading summaries for now directly.
- Parameters
directory (Path) – Directory where files are saved
conn_def (str) – Connection to a database to be used with db_tools.ezfuncs
table (str) – which table to upload to
- Return type
None
-
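A sketch of the save-then-upload flow (the data frame is assumed to have the columns listed above, and conn_def is whatever connection definition your db_tools setup uses):
from pathlib import Path
from cascade_at.saver.results_handler import ResultsHandler

def save_and_upload(df, model_version_id, directory, conn_def):
    handler = ResultsHandler()
    # Writes per-location, per-sex CSVs, plus summary files for upload.
    handler.save_draw_files(
        df=df,
        model_version_id=model_version_id,
        directory=Path(directory),
        add_summaries=True,
    )
    # Uploads the summaries to the given epi table.
    ResultsHandler.upload_summaries(
        directory=Path(directory), conn_def=conn_def, table="model_estimate_final"
    )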
Dismod-AT Concepts¶
The following pages explain some helpful concepts in Dismod-AT.
Measurement, Rate, Integrand¶
The Dismod-AT program has its own documentation, which serves well for specifics about database tables, definitions of distributions, and other details. This documentation is a high-level view of what Dismod-AT does in order to explain what you can do with the Cascade.
Dismod-AT does statistical estimation. It is a nonlinear, multi-level regression. The two hierarchical levels are the measurements, at the micro level, and the locations, at the macro level.
Measurements are input data from data bundles. Every measurement has a positive, non-zero standard deviation. A measurement may or may not have the same upper and lower age or the same upper and lower time. All measurements are associated with locations.
Dismod-AT’s central feature is that it estimates rates of a disease process. The disease process is nonlinear and described by a differential equation. We can discuss the behavior of that model in detail later. For this differential equation,
Rates go in.
Prevalence and death come out.
A Rate is incidence, remission, excess mortality, other-cause mortality, or initial prevalence. A rate is a continuous function of age and time. It’s specified as a set of points, and interpolated between those points, but it’s continuous. Even the initial prevalence is continuous across time but defined only for the youngest age. The data associated with rates is defined at points of age and time, so it isn’t associated with age or time ranges. It also doesn’t have standard deviations.
If we think of a typical linear regression, \(y = a + bx + \epsilon\), we can draw an equivalence for Dismod-AT where \(x\) are the covariates, \(b\) are the covariate multipliers, \(\epsilon\) are distributions of priors, \(a\) are the rates, and \(y\) are the observations. How Dismod-AT connects rates to observations is much more complicated than a typical linear regression.
In order to relate a rate to an observation, Dismod-AT has to do a few steps.
Use the ODE to predict prevalence and death.
Construct a function of rates, prevalence, and death to form the desired observation.
Integrate that function over the requested age and time range to get a single value for the observation.
Integrands are outputs from Dismod-AT that are predictions of either measurements or rates. Because studies observe participants with ranges of ages over periods of time, they are generally associated with the integral of the continuous rates underlying the disease process. For this reason, Dismod-AT calls its predictions of observations integrands. It supports a wide variety of integrands.
Flow of Commands in Dismod-AT¶
There are a few different ways to use Dismod-AT to examine data. They correspond to different sequences of Dismod-AT commands.
Stream Out Prevalence The simplest use of Dismod-AT is to ask it to run the ordinary differential equation on known rates and produce prevalence, death, and integrands derived from these.
Precondition Provide known values for all rates over the whole domain. List the integrands desired for the output.
Run predict on those rates.
Postcondition Dismod-AT places any requested integrands in its predict table. These can be rates, prevalence, death, or any of the integrands.
Simple Fit to a Dataset This describes a fit with the simplest way to determine uncertainty.
Precondition The input data is observations, with standard deviations, of any of the known integrands.
Run fit on those observations to produce rates and covariate multipliers.
Run predict on the rates to produce integrands.
Postcondition Integrands are in the predict table.
Fit with Asymptotic Uncertainty This fit produces some values of uncertainty.
Precondition The input data is observations, with standard deviations, of any of the known integrands.
Run fit on those observations to produce rates and covariate multipliers.
Run sample asymptotic.
Postcondition Integrands are in the predict table.
Fit with Simulated Uncertainty This uses multiple predictions in order to obtain a better estimate of uncertainty.
Precondition The input data is observations, with standard deviations, of any of the known integrands.
Run fit on those observations to produce rates and covariate multipliers.
Run simulate to generate simulations of measurements data and priors.
Run sample simulate.
Postcondition Integrands are in the predict table.
Smoothing Continuous Functions¶
We said that rates and covariate multipliers are continuous functions of age and time. It takes a little work to parametrize an interpolated function of age and time.
You have to tell it where the control points are. In Cascade, we call this the AgeTimeGrid. It’s a list of ages and a list of times that define a rectangular grid.
At each of the control points of the age-time grid, Dismod-AT will evaluate how close the rate or covariate multiplier is to some reference value. At these points, we define prior distributions. Cascade makes these value priors part of the PriorGrid.
It’s rare to have data points that are dense across all of age and time. Dismod-AT needs to take a data point at one end, a data point at the other end, and draw a line that connects them. We help it by introducing constraints on how quickly a value can change over age and time. These are a kind of regularization of the problem, called age-time difference priors. They apply to the difference in value between one age-time point and the next greater in age and the next greater in time. As with value priors, these are specified in the Cascade as part of the PriorGrid.
The random effect for locations is also a continuous quantity.
Hierarchical Model¶
The hierarchical part of Dismod-AT does one thing: estimate how locations affect rates. If the rate at grid point \((i,k)\) is \(q_{ik}(a,t)\), and the covariate multiplier is \(\alpha_{ik}(a,t)\), then the adjusted rate is

\[q_{ik}(a,t)\,e^{u + x\,\alpha_{ik}(a,t)}.\]

The offset, \(u\), is linear with the covariates, but it is inside the exponential, which guarantees that all rates remain positive. This offset is the only random effect in the problem, and it is called the child rate effect, because each location, or node in Dismod-AT’s language, is considered a child of a parent.
Because the child rate effect is continuous, you can conclude that it must be defined on a smoothing grid. Dismod-AT will either define one smoothing grid for each child rate effect (one for each of the five rates) or let you define a smoothing grid for every location and every child rate effect, should that be necessary.
Model Variables - The Unknowns¶
When we ask Dismod-AT to do a fit, what unknowns will it solve for? If we do a fit to a linear regression, \(y ~ b_0 + b_1 x\), then it tells us the parameters \(b_i\). It also tells us the uncertainty, as determined by residuals between predicted and actual \(y\). In the case of Dismod-AT, the model variables are equivalent to those parameters \(b_i\). Dismod-AT documentation lists all of the model variables, but let’s cover the most common ones here.
First are the five disease rates, which are inputs to the ODE. Each rate is a continuous function of age and time, specified by an interpolation among points on an age-time grid. Therefore, the model variables from a rate are its value at each of the age-time points.
The covariate multipliers are also continuous functions of age and time. Each of the covariate multipliers has model variables for every point in its smoothing. There can be a covariate multiplier for each combination of covariate column and application to rate value, measurement value, or measurement standard deviation, so that’s a possible \(3c\) covariate multipliers, where \(c\) is the number of covariate columns.
The child rate effects also are variables. Because there is one for each location, and there is a smoothing grid for child rate effects, this creates many model variables.
Covariates¶
Covariates are the independent variables in the statistical model. They appear as columns in observation data, associated with each measurement. The word covariate is overloaded, so we will refer to a covariate column, a covariate use, a covariate multiplier, and applying a covariate.
A covariate column has a unique name and a reference value for which the observed data is considered unadjusted. All priors on covariates are with respect to this unadjusted value.
Outliering by Covariates¶
Each covariate column has an optional maximum difference to set. If the covariate is beyond the maximum difference from its reference value, then the data point is outliered. As a consequence, that data point will not be in the data subset table. Nor will it subsequently appear in the avgint table.
If there is a need to use two different references or maximum differences for the same covariate column, then duplicate the column.
Usage¶
Covariate data is columns in the input DataFrame and in the average integrand DataFrame. Let’s not discuss here how to obtain this covariate data, but discuss what Dismod-AT needs to know about those covariate columns in order to use it for a fit.
In order to use a covariate column as a country covariate, specify
its reference value
an optional maximum difference, beyond which a data point’s covariate value causes that point to be considered an outlier,
one of the five rates (iota, rho, chi, omega, pini), to which it will apply
a smoothing grid, as a support on which the covariate effect is solved. This grid defines a mean prior and elastic priors on age and time, as usual for smoothing grids.
We give Dismod-AT measured data with associated covariates. Dismod-AT treats the covariates as a continuous function of age and time, which we call the covariate multiplier. It solves for that continuous function, much like it solves for the rates. Therefore, each application of a covariate column to a rate or measured value or standard deviation requires a smoothing grid.
Applying a study covariate is much the same, except that it usually applies not to a rate but to the value or standard deviation of an integrand.
For instance:
# Assume smooth = Smooth() exists.
income = Covariate("income", 1000)
income_cov = CovariateMultiplier(income, smooth)
model.rates.iota.covariate_multipliers.append(income_cov)
model.outputs.integrands.prevalence.value_covariate_multipliers.append(income_cov)
model.outputs.integrands.prevalence.std_covariate_multipliers.append(income_cov)
Covariates are unique combinations of the covariate column and the rate, measured value, or standard deviation, so they can be accessed that way.
Missing Values¶
Were a covariate value to be missing, Dismod-AT would assume it has the reference value. In this sense, every measurement always has a covariate. Therefore, the interface requires that every measurement explicitly have every covariate.
Hazard Rates¶
The hazard rate is defined first for an individual:
A hazard rate is the probability, per unit time, that an event will happen given that it has not yet happened.
For a population, the hazard rate is the sum of the hazard rates for all individuals in that population. For instance, the remission rate, as a function of age, averages over all the different times someone may have entered the with-condition state.
The Dismod-AT compartmental model has four Dismod-AT primary rates, all of which are hazard rates,
Susceptible Incidence rate, \(\iota\)
Remission rate, \(\rho\)
Excess mortality rate, \(\chi\)
Other-cause mortality rate, \(\omega\)
and an initial condition, birth prevalence, \(p_{ini}\). We call the primary rates hazard rates because they are the probability per unit time that an individual, age \(x\), moves from one compartment to another, given that they have not yet left their current compartment. Note that birth prevalence for a cohort is, when we look at it across years, a birth rate. That is why you will see birth prevalence called one of the Dismod-AT primary rates.
These primary rates are exactly the parameters in the Dismod-AT differential equation,

\[\frac{dS}{dx} = -(\iota(x) + \omega(x))\,S(x) + \rho(x)\,C(x), \qquad \frac{dC}{dx} = \iota(x)\,S(x) - (\rho(x) + \chi(x) + \omega(x))\,C(x),\]

where \(S(x)\) are susceptibles as a function of cohort age and \(C(x)\) are with-condition as a function of cohort age.
S-Incidence and T-Incidence¶
We distinguish the susceptible incidence rate from the total incidence rate. These are also called s-incidence and t-incidence. The total incidence rate is the number of new observations of a disease per person in the population, where both people with and without the disease are counted. Because hazard rates are the probability per unit time of a transition, given that the transition has not happened, we wouldn’t call t-incidence a hazard rate, because it includes people for whom the transition to the disease state has already happened. Both, however, can be population rates.
Population Rates¶
Measurements of a population count events that happen to some set of people. They take the form

\[\text{rate} = \frac{\text{number of events}}{\text{person-years of exposure}}.\]

Different measurements have different denominators, and those denominators become weight functions in Dismod-AT. If you get the weight function wrong, then you get the comparison from hazard rates to population measurements wrong. This section lists various measurements and their denominators. People in the \(S\) state are exposed to incidence and death. People in the \(C\) state are exposed to remission and death.
Some population rates are estimates of hazard rates. The population rate for s-incidence is an estimate of a hazard rate. As the age-extent and time-extent for the measurement get closer to a point estimate, the population rate and the hazard rate become the same value.
We can be exact about the relationship between population rates and hazard rates by following the example of the mortality rate in Preston’s Demography. The mortality rate is

\[{}_n m_x = \frac{\int_x^{x+n} \mu(a)\,l(a)\,da}{\int_x^{x+n} l(a)\,da},\]

where \(l(x)=S(x)+C(x)\) is the remaining fraction of those alive and \(\mu(x)\) is the total mortality rate. The numerator in that equation is the age-specific death rate, and the denominator is the exposure, as person-years lived, or \({}_nL_x\). When we look at these numbers over age and time, instead of over cohort age, \(x\), the integral changes to a double integral over both age and time.
Let’s not write out the double integral for all the examples below; Dismod-AT does perform its integration over both age and time, but we write a single \(\int\) as short-hand for that double integral.
Similarly, the population susceptible incidence rate is

\[\frac{\int \iota\,S}{\int S}.\]
The population remission rate has the same problem as the incidence, in that it can be counted as a percentage of those with condition who remit or as a percentage of the population that remits. If we consider the remission hazard rate, which is the former, then it is

\[\frac{\int \rho\,C}{\int C}.\]
Note
We could define a t-remission as

\[\frac{\int \rho\,C}{\int (S+C)},\]
but we don’t. Is that because all remission is of one type or another? Which type?
The population excess mortality rate is

\[\frac{\int \chi\,C}{\int C}.\]
Other-cause mortality is just like mortality, but only for susceptibles,

\[\frac{\int \omega\,S}{\int S}.\]
The population rates for mtall and mtspecific both use \(S(x)+C(x)\) as their weight. The same is true of the standardized mortality ratio and relative risk.
Note
Dismod-AT expects the user to provide weight functions. The GBD provides weight functions, which should correspond to \(S(x)+C(x)\). These should also be close enough for \(S(x)\). It would make sense to create and refine the weight corresponding to \(C(x)\) as we solve down the location hierarchy.
Crude Population Rates¶
Dismod-AT works with life table rates, not crude rates. A crude rate is the number of deaths divided by the number of people exposed to that event. If \(k(t)\) is the birth rate over time, then a crude mortality rate is

\[{}_n M_x = \frac{\int_x^{x+n} k(t-a)\,l(a)\,\mu(a)\,da}{\int_x^{x+n} k(t-a)\,l(a)\,da},\]

so the size of each cohort enters the weighting. The life table rate adjusts the crude rate to remove the effect of varying birth rates. In Dismod-AT, the birth rate is normalized to a rate of 1 for all populations. In demographic textbooks, \({}_nm_x\) is called the life table mortality rate, and \({}_nM_x\) is called the crude mortality rate.
Note
The bundles aggregate measurements from many sources. Do they use crude population rates or lifetable population rates?
This matters when there is a birth pulse that skews data towards younger or older sides of an age interval. Dismod-AT assumes that the average over an age interval is determined by the lifetable person-years lived.
Testing¶
Running Tests¶
Running Unit Tests¶
Unit tests can run from either the root directory or tests
subdirectory
using pytest
. Note the following useful options for pytest. The first
couple are custom flags we created.
pytest --ihme
This is a flag we created that enables those tests which we would run within the IHME environment. If you write a test that calls IHME databases, you must include the ihme fixture in order for that unit test to run. This guarantees that when Jenkins runs without the --ihme flag, none of the tests it runs require the IHME databases.
pytest --dismod
This is a flag we created that enables those tests which require having a command-line Dismod-AT running. Using --ihme turns on --dismod.
pytest --signals
This is a flag we created that enables those tests that send UNIX signals to test failure modes. They are off by default because they are annoying on the Mac, which helpfully offers to inform Apple of every application failure.
The rest are standard options, but they are so important that I’m listing them here.
pytest --log-cli-level debug
Captures log messages nicely within unit tests.
pytest --pdb
This flag asks to drop into the debugger when an exception happens in a unit test. Very helpful when using tests for test-driven development.
pytest --capture=no
This allows stdout, stderr, and logging to be printed when running tests.
pytest -k partial_name
This picks out all tests whose names contain the letters “partial_name”.
pytest --lf
Run the last set of failing tests.
pytest --ff
Run the last set of failing tests, and then run the rest of the tests.
pytest -x
Die on the first failure.
In order to make a test that relies on IHME databases, use the global fixture
called ihme
:
def test_get_ages(ihme):
gbd_round = 12
ages = get_gbd_age_groups(gbd_round)
This test will automatically be disabled until the --ihme
flag is on.
Running Acceptance Tests¶
There is a separate directory for acceptance tests. It’s called acceptance
in the unit directory. Here, too, run pytest
, but it will take longer
and do thread tests, which are tests from one interaction to a response.
Unit and acceptance tests are run with the --ihme
flag
turned on, just before the end of installation. If they fail, then
installation fails. Be sure to run unit tests on the cluster
with --ihme
, even if they pass in Tox, which runs a subset
of tests.
Structure of Tests¶
Testing structure follows the component structure of the code, but there are a few tests that outweigh others in importance because they are system integration tests. If we look at the larger architectural parts, those system integration tests mock out different pieces. The larger architectural parts are:
Main success scenario (MSS), which does a fit and simulate with Dismod-AT.
Input data of various kinds
Bundle data records
IHME databases of mortality.
EpiViz-AT settings.
Interface with the Dismod-AT db file
Main Success Scenario¶
There is a single file that runs the core set of steps for the wrapper, using no inputs from external sources. It does the first two of these steps. As we work through the main success scenario, we should make it do all of the steps.
Generate input data with Dismod-AT predict.
Fit that data.
Generate simulations.
Fit those simulations.
Summarize simulation outputs.
Create posterior data.
It’s in tests/model/main_success_scenario.py
.
It’s set up to run through different types of models
and different combinations of input parameters. It does
a fractional-factorial experiment on those parameters, working
up to seeing how two parameters interact, and whether the
code still runs.
This same script generates files with timings on how long it takes Dismod-AT to do a fit, for a given set of parameters and data.
Test Settings Parsing¶
This mocks the creation of EpiViz-AT settings and
then runs stochastically-generated settings through
the builder for models, all the way to writing a Dismod-AT db file.
It’s in test_construct_model.py
Live Tests against Database¶
These use a real MVID, and pull settings and data
for it in order to build a database, in test_estimate_locations.py
.
Testing the Dismod-AT DB File Interface¶
These tests skip any IHME database interaction.
These redo the extensive tests included with Dismod-AT,
but they do it using the internal interface.
This is what tells us our internal interface works.
In test_dismod_examples.py
.
What’s Missing¶
There should be a test that creates settings and input data, and runs completely through the main scenario. This would save us from waiting for the IHME databases to send data and would exercise the later part of the main success scenario, which isn’t covered enough yet.