Cascade-AT

Cascade-AT estimates incidence, prevalence, and mortality rates for a single disease, for every country or district in the world, by combining all available disease and death observations from all countries.

The main purpose of this page is to document the API and to explain some of the Dismod-AT concepts.

This documentation is meant for developers, not modelers. If you are a DisMod-AT modeler, please see the internal documentation here.

Installation

Installation of Cascade-AT

Cascade-AT wraps Dismod-AT and runs it within the IHME infrastructure. Clone it from Cascade-AT on GitHub.

We recommend creating a conda environment into which to install the code. Then clone the repository and run the tests:

git clone https://github.com/ihmeuw/cascade-at.git
cd cascade-at
pip install ".[ihme,docs]"
python setup.py develop
cd tests && pytest

NOTE: [ihme,docs] are optional extras.

NOTE: The above steps are intended for installing on the cluster. When working on a local machine, replace python setup.py develop with python setup.py install.

For instructions on how to install all of the IHME dependencies, see the internal documentation here.

For instructions on how to install dismod_at, see Brad Bell’s documentation here.

Module Documentation

This is the documentation for each of the modules in the cascade-at package. Here is an overview of how all of the modules fit together:

(Overview diagram: _images/overview.png)

Defining and Sequencing the Work

Cascade-AT performs work by calling scripts that are included as entry points to the package.

Scripts as Building Blocks

The work in a Cascade-AT model is sequenced like building blocks. Here is how the work starts out, moving from micro to macro.

The following submodules contain classes and functions for constructing a job graph that runs Dismod-AT. The smallest is a cascade operation, which defines one executable task. These can be stacked together into sequences (stacks), and then recursively put into a tree structure (dags). The cascade commands are wrappers around the dags.

To see documentation for the current “traditional cascade” that is implemented, see TraditionalCascade.

Cascade Operations
class cascade_at.cascade.cascade_operations._CascadeOperation(upstream_commands=None, executor_parameters=None)[source]

Bases: object

The base class for a cascade operation.

Parameters
  • upstream_commands (Optional[List[str]]) – A list of commands that are upstream of this operation, meaning they will be run before this operation.

  • executor_parameters (Optional[Dict[str, Any]]) – Optional dictionary of execution parameters that overrides the defaults in DEFAULT_EXECUTOR_PARAMETERS

static _script()[source]
_make_template()[source]
_make_command(**kwargs)[source]
_make_name()[source]
_validate(**kwargs)[source]
_make_template_kwargs(**kwargs)[source]

Takes kwargs like model_version_id=0 and turns them into a kwargs dict that looks like {'model_version_id': '--model-version-id 0'}.

For boolean args, the result looks like {'do_this': '--do-this'}. For arguments from self.arg_list that have defaults, it fills in the default value if it is not passed in the kwargs (unless the default is None).

Used for converting things into Jobmon TaskTemplates.

Parameters

kwargs – Keyword arguments

Return type

Dict[str, str]

Returns

Dictionary of keyword arguments similar to what was passed in, but with values converted to what the TaskTemplate in Jobmon expects, also filling in default arguments that are not passed but are listed in the ArgumentList for self.
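
For illustration, here is a minimal sketch of the conversion described above. This is not the actual implementation (the real method also consults self.arg_list for defaults); it only mimics the documented input/output behavior.

def make_template_kwargs_sketch(**kwargs):
    """Hypothetical re-implementation of the kwargs -> Jobmon-template
    conversion, for illustration only."""
    template_kwargs = {}
    for name, value in kwargs.items():
        flag = '--' + name.replace('_', '-')
        if isinstance(value, bool):
            # Boolean args become bare flags: {'do_this': '--do-this'}
            template_kwargs[name] = flag if value else ''
        else:
            # Other args become '--flag value' strings
            template_kwargs[name] = f'{flag} {value}'
    return template_kwargs

print(make_template_kwargs_sketch(model_version_id=0, do_this=True))
# {'model_version_id': '--model-version-id 0', 'do_this': '--do-this'}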

_configure(**command_args)[source]

Validates the keyword arguments passed in and creates the command, job name, and task template kwargs.

Parameters

command_args – Keyword arguments to be passed to the cascade operation

Return type

None

Cascade Operation Sequences
Cascade Operation Stacking Functions

These functions make sequences of _CascadeOperation and the appropriate upstream dependencies. They can then be used together to create a _CascadeCommand.

cascade_at.cascade.cascade_stacks.single_fit(model_version_id, location_id, sex_id)[source]

Create a sequence of tasks to run a single fit-both model. Configures inputs, does a fit fixed, then a fit both, then predicts and uploads the result. Fits the model based on the settings attached to the model version ID.

Parameters
  • model_version_id (int) – The model version ID.

  • location_id (int) – The parent location ID to run the model for.

  • sex_id (int) – The sex ID to run the model for.

Returns

Return type

List of CascadeOperations.
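
As a usage sketch (the model version, location, and sex IDs here are hypothetical), this builds the ordered list of cascade operations for a single fit:

>>> from cascade_at.cascade.cascade_stacks import single_fit
>>> operations = single_fit(model_version_id=12345, location_id=101, sex_id=2)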

cascade_at.cascade.cascade_stacks.single_fit_with_uncertainty(model_version_id, location_id, sex_id, n_sim=100, n_pool=20, skip_configure=False, ode_fit_strategy=True)[source]

Create a sequence of tasks to run a single fit-both model. Configures inputs, does a fit fixed, then a fit both, then predicts and uploads the result. Fits the model based on the settings attached to the model version ID.

Parameters
  • model_version_id (int) – The model version ID.

  • location_id (int) – The parent location ID to run the model for.

  • sex_id (int) – The sex ID to run the model for.

  • n_sim (int) – The number of simulations to do, number of draws to make

  • n_pool (int) – The number of multiprocessing pools to use in creating the draws

Returns

Return type

List of CascadeOperations.

cascade_at.cascade.cascade_stacks.root_fit(model_version_id, location_id, sex_id, child_locations, child_sexes, skip_configure=False, mulcov_stats=True, n_sim=100, n_pool=20, ode_fit_strategy=True)[source]

Create a sequence of tasks to do a top-level prior fit. Does a fit fixed, then fit both, then creates posteriors that can be used as priors later on. Saves its fit to be uploaded.

Parameters
  • model_version_id (int) – The model version ID.

  • location_id (int) – The parent location ID to run the model for.

  • sex_id (int) – The sex ID to run the model for.

  • child_locations (List[int]) – The children to fill the avgint table with

  • child_sexes (List[int]) – The sexes to predict for.

  • skip_configure (bool) – Don’t run a task to configure the inputs. Only do this if it has already happened. This skips building the inputs.p and settings.json files.

  • mulcov_stats (bool) – Compute mulcov statistics at this level

  • n_sim (int) – The number of simulations to do, i.e. the number of draws to make

  • n_pool (int) – The number of multiprocessing pools to use in creating the draws

Returns

Return type

List of CascadeOperations.

cascade_at.cascade.cascade_stacks.branch_fit(model_version_id, location_id, sex_id, prior_parent, prior_sex, child_locations, child_sexes, upstream_commands=None, n_sim=100, n_pool=20, ode_fit_strategy=False)[source]

Create a sequence of tasks to do a cascade fit (mid-level). Does a fit fixed, then fit both, predicts on the prior rate grid to create posteriors that can be used as priors later on. Saves its fit to be uploaded.

Parameters
  • model_version_id (int) – The model version ID.

  • location_id (int) – The parent location ID to run the model for.

  • sex_id (int) – The sex ID to run the model for.

  • prior_parent (int) – The location ID corresponding to a database to pull the prior from

  • prior_sex (int) – The sex ID corresponding to a database to pull the prior from

  • child_locations (List[int]) – The children to fill the avgint table with

  • child_sexes (List[int]) – The sexes to predict for.

  • upstream_commands (Optional[List[str]]) – Commands that need to be run before this stack.

Returns

Return type

List of CascadeOperations.

cascade_at.cascade.cascade_stacks.leaf_fit(model_version_id, location_id, sex_id, prior_parent, prior_sex, n_sim=100, n_pool=20, upstream_commands=None, ode_fit_strategy=False)[source]

Create a sequence of tasks for a leaf-node fit (no children). Does a fit fixed, then sample simulate to create posteriors. Saves its fit to be uploaded.

Parameters
  • model_version_id (int) – The model version ID.

  • location_id (int) – The parent location ID to run the model for.

  • sex_id (int) – The sex ID to run the model for.

  • prior_parent (int) – The location ID corresponding to a database to pull the prior from

  • prior_sex (int) – The sex ID corresponding to a database to pull the prior from

  • n_sim (int) – The number of simulations to do to get the posterior fit.

  • n_pool (int) – The number of pools to use to do the simulation fits.

  • upstream_commands (Optional[List[str]]) – Commands that need to be run before this stack.

Returns

Return type

List of CascadeOperations.

Cascade Job Graphs
cascade_at.cascade.cascade_dags.branch_or_leaf(dag, location_id, sex, model_version_id, parent_location, parent_sex, n_sim, n_pool, upstream, tasks)[source]

Recursive function that either creates a branch (by calling itself) or a leaf fit depending on whether or not it is at a terminal node. Determines if it’s at a terminal node using the dag.successors() method from networkx. Appends tasks onto the tasks parameter.

cascade_at.cascade.cascade_dags.make_cascade_dag(model_version_id, dag, location_start, sex_start, split_sex, n_sim=100, n_pool=100, skip_configure=False)[source]

Make a traditional cascade dag for a model version. Relies on a location DAG and a starting point in the DAG for locations and sexes.

Parameters
  • model_version_id (int) – Model version ID

  • dag (LocationDAG) – A location DAG that specifies the location hierarchy

  • location_start (int) – Where to start in the location hierarchy

  • sex_start (int) – Which sex to start with, can be most detailed or both.

  • split_sex (bool) – Whether or not to split sex into most detailed. If not, then will just stay at ‘both’ sex.

  • n_sim (int) – Number of simulations to do in sample simulate

  • n_pool (int) – Number of multiprocessing pools to create during sample simulate

  • skip_configure (bool) – Don’t configure inputs. Only do this if it’s already been done.

Returns

Return type

List of _CascadeOperation.
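
A usage sketch follows. The LocationDAG import path and constructor arguments are assumptions about the package layout (adjust to your install), and all IDs are hypothetical:

>>> from cascade_at.inputs.locations import LocationDAG  # assumed path
>>> from cascade_at.cascade.cascade_dags import make_cascade_dag
>>> dag = LocationDAG(location_set_version_id=544, gbd_round_id=6)  # hypothetical args
>>> operations = make_cascade_dag(
...     model_version_id=12345, dag=dag, location_start=1,
...     sex_start=3, split_sex=True, n_sim=100, n_pool=20)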

Cascade Commands
Cascade Commands

Sequences of cascade operations that work together to create a cascade command that will run the whole cascade (or a drill – which is a version of the cascade).

class cascade_at.cascade.cascade_commands._CascadeCommand[source]

Bases: object

Initializes a task dictionary. All tasks added to this command in the form of cascade operations are added to the dictionary.

self.task_dict

A dictionary of cascade operations, keyed by the command for that operation. This is so that we can look up the task later by the exact command.

add_task(cascade_operation)[source]

Adds a cascade operation to the task dictionary.

Parameters

cascade_operation (_CascadeOperation) – A cascade operation to add to the command dictionary

Return type

None

get_commands()[source]

Gets a list of commands in sequence so that you can run them without using jobmon.

Returns

Return type

Returns a list of commands that you can run on the command-line.

class cascade_at.cascade.cascade_commands.Drill(model_version_id, drill_parent_location_id, drill_sex, n_sim, n_pool=10, skip_configure=False)[source]

Bases: cascade_at.cascade.cascade_commands._CascadeCommand

A cascade command that runs a drill model, meaning that it runs one Dismod-AT model with a parent plus its children.

Parameters
  • model_version_id (int) – The model version ID to create the drill for

  • drill_parent_location_id (int) – The parent location ID to start the drill from

  • drill_sex (int) – Which sex to drill for

  • n_sim (int) – The number of simulations to do to get uncertainty at the leaf nodes

  • n_pool (int) – The number of threads to create in a multiprocessing pool. If this is 1, then it will not do multiprocessing.
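
A usage sketch with hypothetical IDs, showing how a drill command exposes its runnable command strings via get_commands():

>>> from cascade_at.cascade.cascade_commands import Drill
>>> cascade = Drill(model_version_id=12345, drill_parent_location_id=101,
...                 drill_sex=2, n_sim=100, n_pool=10)
>>> commands = cascade.get_commands()  # ordered list of command-line strings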

class cascade_at.cascade.cascade_commands.TraditionalCascade(model_version_id, split_sex, dag, n_sim, n_pool=10, location_start=None, sex=None, skip_configure=False)[source]

Bases: cascade_at.cascade.cascade_commands._CascadeCommand

Runs the “traditional” dismod cascade. The traditional cascade as implemented here runs fit fixed all the way to the leaf nodes of the cascade to save time (rather than fit both). To get posterior to prior it uses the coefficient of variation to get the variance of the posterior that becomes the prior at the next level. At the leaf nodes to get final posteriors, it does sample asymptotic. If sample asymptotic fails due to bad constraints it does sample simulate instead.

Parameters
  • model_version_id (int) – The model version ID

  • split_sex (bool) – Whether or not to split sex

  • dag (LocationDAG) – A location dag that specifies the structure of the cascade hierarchy

  • n_sim (int) – The number of simulations to do to get uncertainty at the leaf nodes

  • n_pool (int) – The number of threads to create in a multiprocessing pool. If this is 1, then it will not do multiprocessing.

  • location_start (Optional[int]) – Which location to start the cascade from (typically 1 = Global)

  • sex (Optional[int]) – Which sex to run the cascade for. (If it’s 3 = Both, then it will split sex; if it’s 1 or 2, then it will only run for that sex.)

  • skip_configure (bool) – Use this option to skip the initial inputs pulling; should only be used in debugging cases by developers.

Each of these is described briefly below.

  • Scripts: All potential work starts out as a script.

  • Cascade Operations: We start with building wrappers around the scripts, and we call these wrappers cascade operations. These wrappers are helpful because they define the command-line string that will be executed in order to perform the work by calling the script with particular arguments, the name of the job if it is submitted through a qsub, etc. They also directly interface with jobmon, an IHME package that submits and tracks parallel jobs.

  • Cascade Operation Sequences: There are some sequences of work that often go together, for example running a fit fixed, then a sample, then a predict. These types of sequences are called stacks, because they are “stacks” of cascade operations.

  • Cascade Job Graphs: Once we take many sequences and form them into a tree-like structure that traverses a location hierarchy, that’s called a DAG or a job graph. The structure of this DAG is based off of an IHME location hierarchy, and it defines the work for the entire cascade. The DAGs module provides functions to, for example, recursively create stacks going down a tree.

  • Cascade Commands: This is the most “macro” type of work. You say, “I want to do a cascade” or “I want to do a drill” by creating a cascade command, and then it works its way through DAGs –> Stacks –> Operations –> Scripts to define all of the work, with arguments based off of the model version ID’s settings that you pass to the cascade command.

Arguments

Each of the scripts takes some arguments that are pre-defined using the tools documented in Argument Parsing.

Argument Parsing

Each of the scripts from Defining and Sequencing the Work uses argument utilities that are described here. Arguments are single command-line args, passed as flags like --do-this-thing or --location-id 101. We use the argparse package to interpret these arguments and to define which arguments are allowed for which scripts.

Arguments are building blocks for argument lists. Each script has an argument list, included at the top of the script, that defines the arguments that can be passed to it.

Arguments

There are general arguments and specific arguments that we define here so we don’t have to use them over and over.

exception cascade_at.executor.args.args.CascadeArgError[source]

Bases: cascade_at.core.CascadeATError

exception cascade_at.executor.args.args.StaticArgError[source]

Bases: cascade_at.executor.args.args.CascadeArgError

class cascade_at.executor.args.args.IntArg(*args, **kwargs)[source]

Bases: cascade_at.executor.args.args._Argument

An integer argument.

class cascade_at.executor.args.args.FloatArg(*args, **kwargs)[source]

Bases: cascade_at.executor.args.args._Argument

A float argument.

class cascade_at.executor.args.args.StrArg(*args, **kwargs)[source]

Bases: cascade_at.executor.args.args._Argument

A string argument.

class cascade_at.executor.args.args.BoolArg(*args, **kwargs)[source]

Bases: cascade_at.executor.args.args._Argument

A boolean argument.

class cascade_at.executor.args.args.ListArg(*args, **kwargs)[source]

Bases: cascade_at.executor.args.args._Argument

A list argument. Passed in as an nargs + type of argument to argparse.

class cascade_at.executor.args.args.ModelVersionID[source]

Bases: cascade_at.executor.args.args.IntArg

The Model Version ID argument is the only task argument, meaning an argument that makes the commands that it is used in unique across workflows.

class cascade_at.executor.args.args.ParentLocationID[source]

Bases: cascade_at.executor.args.args.IntArg

A parent location ID argument.

class cascade_at.executor.args.args.SexID[source]

Bases: cascade_at.executor.args.args.IntArg

A sex ID argument.

class cascade_at.executor.args.args.DmCommands[source]

Bases: cascade_at.executor.args.args.ListArg

A dismod commands argument, based off of the list argument.

class cascade_at.executor.args.args.DmOptions[source]

Bases: cascade_at.executor.args.args.ListArg

A dismod options argument, based off of the list argument. Arguments need to be passed in as a list, but then look like KEY=VALUE=TYPE. So, if you wanted the options to look like this {'kind': 'random'}, you would pass on the command-line kind=random=str.

class cascade_at.executor.args.args.NSim[source]

Bases: cascade_at.executor.args.args.IntArg

Number of simulations argument. Defaults to 1.

class cascade_at.executor.args.args.NPool[source]

Bases: cascade_at.executor.args.args.IntArg

Number of threads for a multiprocessing pool argument, defaults to 1, which is no multiprocessing.

class cascade_at.executor.args.args.LogLevel[source]

Bases: cascade_at.executor.args.args.StrArg

Logging level argument. Defaults to “info”.

Argument List

Argument lists are made up of arguments and are defined at the top of each of the Defining and Sequencing the Work scripts. They are helpful because we can use them both to parse command-line arguments and to validate arguments in Cascade Operations, which makes building new cascade operations much less error-prone. An argument list also has a method to convert itself into a task template command for Utilizing Jobmon.

class cascade_at.executor.args.arg_utils.ArgumentList(arg_list)[source]

A class that does operations on a list of arguments.

Parameters

arg_list (List[_Argument]) –
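
A sketch of how a script might define and parse its argument list. The parse_args method name is an assumption about ArgumentList's interface; the IDs are hypothetical:

>>> from cascade_at.executor.args.args import ModelVersionID, ParentLocationID, SexID
>>> from cascade_at.executor.args.arg_utils import ArgumentList
>>> ARG_LIST = ArgumentList([ModelVersionID(), ParentLocationID(), SexID()])
>>> args = ARG_LIST.parse_args([
...     '--model-version-id', '12345', '--parent-location-id', '101', '--sex-id', '2'])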

Argument Encoding

When we define arguments to an operation, we don’t want to write them as we would on the command line, especially for things like dictionaries and lists of dismod database commands.

The following functions are helpful for encoding and decoding dismod option dictionaries to be used with the dismod database and dismod commands to run on a dismod database.

cascade_at.executor.args.arg_utils.encode_options(options)[source]

Encode an option dict into a command line string that cascade_at can understand.

Returns

Return type

List of strings that can be passed to the command line.

cascade_at.executor.args.arg_utils.parse_options(option_list)[source]

Parse a key=value=type command line arg that comes in a list.

Returns

Return type

Dictionary of options with the correct types.

cascade_at.executor.args.arg_utils.encode_commands(command_list)[source]

Encode the commands to a DisMod database so they can be passed to the command line.

Return type

List[str]

cascade_at.executor.args.arg_utils.parse_commands(command_list)[source]

Parse the dismod commands that come from command line arguments in a list.

Returns

Return type

list of commands that dismod can understand
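
A sketch of the round trip, based on the KEY=VALUE=TYPE format described above (the exact encoded type token is an assumption):

>>> from cascade_at.executor.args.arg_utils import encode_options, parse_options
>>> encode_options({'kind': 'random'})  # e.g. ['kind=random=str']
>>> parse_options(['kind=random=str'])
{'kind': 'random'}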

Jobmon

The submitting and tracking of the distributed jobs to do a cascade is done by the IHME package jobmon. Cascade Operations are roughly jobmon tasks and Cascade Commands are roughly jobmon workflows.

We have to convert between cascade operations and tasks and cascade commands and workflows. Helper functions to do these conversions are documented in Utilizing Jobmon.

Jobmon uses information from cascade operations and cascade commands to interface directly with the IHME cluster and the Jobmon databases. See Utilizing Jobmon.

Utilizing Jobmon

Unfortunately, we can’t document these functions because jobmon is not yet open source and the sphinx-autodoc extension won’t work. To be continued once it’s released… but for now please see the source code directly here.

Jobmon Workflows

At the highest level, we need to make a workflow from a Cascade Command. This utilizes the Jobmon Guppy version, which allows us to create “task templates”. In the Guppy terminology, a Cascade-AT workflow is considered to come from a dismod-at “tool”.

Resources

Using jobmon requires some knowledge of the amount of cluster resources that a job will use. Right now, there is no resource prediction algorithm implemented in Cascade-AT. The base resources are the same for all jobs, and then some are increased or decreased depending on the specific task, as options passed to _CascadeOperation.

Entry Points

Each of these scripts takes arguments, defined at the top of the script. Here we list the different types of work that are done; each section contains two things:

  1. The main function in the script, with documentation

  2. The cascade operation associated with that script

They are listed in the order that they typically occur to run a Cascade-AT model from start to finish, with the exception of Run a Cascade-AT Model, which is how all of this work is kicked off in the first place.

Run a Cascade-AT Model

Run a Cascade-AT model from start to finish using the run cascade function. All of the tasks that it constructs can be found in each of the scripts linked to in Defining and Sequencing the Work.

cascade_at.executor.run.run(model_version_id, jobmon=True, make=True, n_sim=10, n_pool=10, addl_workflow_args=None, skip_configure=False, json_file=None, test_dir=None, execute_dag=True)[source]

Runs the whole cascade or drill for a model version (whichever one is specified in the model version settings).

Creates a cascade command and a bunch of cascade operations based on the model version settings. More information on this structure is in Defining and Sequencing the Work.

Parameters
  • model_version_id (int) – The model version to run

  • jobmon (bool) – Whether or not to use Jobmon. If not using Jobmon, executes the commands in sequence in this session.

  • make (bool) – Whether or not to make the directory structure for the databases, inputs, and outputs.

  • n_sim (int) – Number of simulations to do going down the cascade

  • addl_workflow_args (Optional[str]) – Additional workflow args to add to the jobmon workflow name so that it is unique if you’re testing

  • skip_configure (bool) – Skip configuring the inputs. Only do this if they have already been configured.

Return type

None
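
For example, kicking off a model version end-to-end from Python (the model version ID here is hypothetical):

>>> from cascade_at.executor.run import run
>>> run(model_version_id=12345, jobmon=True, make=True, n_sim=100, n_pool=20)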

Configure Inputs

Configure inputs for a Cascade-AT model.

Inputs Script
cascade_at.executor.configure_inputs.configure_inputs(model_version_id, make, configure, test_dir=None, json_file=None)[source]

Grabs the inputs for a specific model version ID, sets up the folder structure, and pickles the inputs object plus writes the settings json for use later on. Also uploads CSMR to the database attached to the model version, if applicable.

Optionally use a json file for settings instead of a model version ID’s json file.

Parameters
  • model_version_id (int) – The model version ID to configure inputs for

  • make (bool) – Whether or not to make the directory structure for the model version ID

  • configure (bool) – Whether to configure the application for the IHME cluster; if not, it will use test_dir for the directory tree instead.

  • test_dir (Optional[str]) – A test directory to use rather than the directory specified by the model version context in the IHME file system.

  • json_file (Optional[str]) – An optional filepath pointing to a different json than is attached to the model_version_id. Will use this instead for settings.

Return type

None
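
A usage sketch with a hypothetical model version ID:

>>> from cascade_at.executor.configure_inputs import configure_inputs
>>> configure_inputs(model_version_id=12345, make=True, configure=True)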

Inputs Cascade Operation
class cascade_at.cascade.cascade_operations.ConfigureInputs(model_version_id, **kwargs)[source]

Configure the inputs for a model version ID.

Parameters

model_version_id (int) – The model version to configure inputs for

Dismod Database Creation and Commands

When we want to fill a dismod database with some data for a model, and then run some commands on it, this is the script that we use.

We fill and extract dismod databases using Fill and Extract Helpers classes and functions. Then the databases are filled according to their settings and the arguments passed to these scripts, like whether to override the prior in the settings with a parent prior (this is called “posterior to prior”) or whether to add a covariate multiplier prior.

Dismod Database Script
cascade_at.executor.dismod_db.dismod_db(model_version_id, parent_location_id, sex_id=None, dm_commands=[], dm_options={}, prior_samples=False, prior_parent=None, prior_sex=None, prior_mulcov_model_version_id=None, test_dir=None, fill=False, save_fit=True, save_prior=True)[source]

Creates a dismod database using the saved inputs and the file structure specified in the context. Alternatively, it will skip the filling stage and move straight to the command stage if you don’t pass --fill.

Then runs an optional set of commands on the database, passed in the --commands argument.

Also passes an optional --options argument as a dictionary to the dismod database to fill or modify the options table.

Parameters
  • model_version_id (int) – The model version ID

  • parent_location_id (int) – The parent location for the database

  • sex_id (Optional[int]) – The parent sex for the database

  • dm_commands (List[str]) – A list of commands to pass to the run_dismod_commands function, executed directly on the dismod database

  • dm_options (Dict[str, Union[int, float, str]]) – A dictionary of options to pass to the dismod option table

  • prior_samples (bool) – Whether the prior was derived from samples or not

  • prior_mulcov_model_version_id (Optional[int]) – The model version ID to use for pulling covariate multiplier statistics as priors for this fit

  • prior_parent (Optional[int]) – An optional parent location ID that specifies where to pull the prior information from.

  • prior_sex (Optional[int]) – An optional parent sex ID that specifies where to pull the prior information from.

  • test_dir (Optional[str]) – A test directory to create the database in rather than the database specified by the IHME file system context.

  • fill (bool) – Whether or not to fill the database with new inputs based on the model_version_id, parent_location_id, and sex_id. If not filling, this script can be used to just execute commands on the database instead.

  • save_fit (bool) – Whether or not to save the fit from this database as the parent fit.

  • save_prior (bool) – Whether or not to save the prior for the parent as the parent’s prior.

Return type

None
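
For example, filling a database and then running an init and a fixed-effects fit on it (IDs are hypothetical; the command strings follow Dismod-AT conventions, and the option name is one of Dismod-AT's standard options):

>>> from cascade_at.executor.dismod_db import dismod_db
>>> dismod_db(model_version_id=12345, parent_location_id=101, sex_id=2,
...           fill=True, dm_commands=['init', 'fit fixed'],
...           dm_options={'max_num_iter_fixed': 100})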

cascade_at.executor.dismod_db.save_predictions(db_file, model_version_id, gbd_round_id, out_dir, locations=None, sexes=None, sample=False, predictions=None)[source]

Save the fit from this dismod database for a specific location and sex to be uploaded later on.

Return type

None

cascade_at.executor.dismod_db.fill_database(path, settings, inputs, alchemy, parent_location_id, sex_id, child_prior, mulcov_prior, options)[source]

Fill a DisMod database at the specified path with the inputs, model, and settings specified, for a specific parent and sex ID, with options to override the priors.

Return type

DismodFiller

cascade_at.executor.dismod_db.get_mulcov_priors(model_version_id)[source]

Read in covariate multiplier statistics from a specific model version ID and returns a dictionary with a prior object for that covariate multiplier type, covariate name, and rate or integrand.

Parameters

model_version_id (int) – The model version ID to pull covariate multiplier statistics from

Return type

Dict[Tuple[str, str, str], _Prior]

cascade_at.executor.dismod_db.get_prior(path, location_id, sex_id, rates, samples=True)[source]

Gets priors from a path to a database for a given location ID and sex ID.

Return type

Dict[str, Dict[str, ndarray]]

Dismod Database Cascade Operation
class cascade_at.cascade.cascade_operations._DismodDB(model_version_id, parent_location_id, sex_id, fill, prior_samples=False, prior_mulcov=False, prior_parent=None, prior_sex=None, dm_options=None, dm_commands=None, save_prior=False, save_fit=False, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._CascadeOperation

Base class for creating an operation that interfaces with the dismod database.

Parameters
  • model_version_id (int) – The model version to run the model for.

  • parent_location_id (int) – The parent location for this dismod database.

  • sex_id (int) – The reference sex for this dismod database.

  • fill (bool) – Whether or not to fill this database with new data based on the cached inputs for this model version.

  • prior_samples (bool) – Whether or not the prior came from samples or just a mean fit

  • prior_mulcov (bool) – Whether or not to add a prior for the covariate multiplier(s), pulled from a model version where covariate multiplier statistics were saved.

  • prior_parent (Optional[int]) – The location ID of the parent database to grab the prior for.

  • prior_sex (Optional[int]) – The sex ID of the parent database to grab the prior for.

  • dm_options (Optional[Dict[str, Union[int, float, str]]]) – Additional options to pass to the dismod database, outside of those that would be passed based on the model settings.

  • dm_commands (Optional[List[str]]) – Commands to run on the dismod database.

  • save_prior (bool) – Whether or not to save the prior as the prior for this parent location.

  • save_fit (bool) – Whether or not to save the fit as the fit for this parent location.

  • kwargs

class cascade_at.cascade.cascade_operations.Fit(model_version_id, parent_location_id, sex_id, predict=True, fill=True, both=False, save_fit=False, save_prior=False, ode_fit_strategy=False, ode_init=False, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._DismodDB

Perform a fit on the dismod database for this model version ID, parent location, and sex ID. (See _DismodDB for the arguments not documented here.)

Parameters
  • model_version_id (int) –

  • parent_location_id (int) –

  • sex_id (int) –

  • predict (bool) – Whether or not to run a predict on this database. Will predict for the avgint table that is based on the IHME-GBD demographics grid.

  • fill (bool) –

  • both (bool) – Whether or not to run a fit both (True) or a fit fixed only (False)

  • save_fit (bool) –

  • save_prior (bool) –

  • kwargs

Create Samples of Variables

After we’ve run a fit on a database, then we can make posterior samples of the variables.

Sample Script
cascade_at.executor.sample.simulate(path, n_sim)[source]

Simulate from a database, within a database.

Parameters
  • path (Union[str, Path]) – A path to the database object to create simulations in.

  • n_sim (int) – Number of simulations to create.

class cascade_at.executor.sample.FitSample(fit_type, **kwargs)[source]

Bases: cascade_at.dismod.api.multithreading._DismodThread

Fit Sample for a database in parallel. Copies the sample table and fits for just one sample index. Will use the __call__ method from _DismodThread.

Parameters
  • main_db – Path to the main database to sample from.

  • index_file_pattern – File pattern to create the index databases with different samples.

  • fit_type (str) – The type of fit to run, one of “fixed” or “both”.

cascade_at.executor.sample.sample_simulate_pool(main_db, index_file_pattern, fit_type, n_sim, n_pool)[source]

Fit the samples in a database in parallel by making copies of the database, fitting them separately, and then combining them back together in the sample table of main_db.

Parameters
  • main_db (Union[str, Path]) – Path to the main database that will be spawned.

  • index_file_pattern (str) – File pattern for the new databases that will have index equal to the simulation number.

  • fit_type (str) – The type of fit to run, one of “fixed” or “both”.

  • n_sim (int) – Number of simulations that will be fit.

  • n_pool (int) – Number of pools for the multiprocessing.

cascade_at.executor.sample.sample_simulate_sequence(path, n_sim, fit_type)[source]

Fit the samples in a database in sequence.

Parameters
  • path (Union[str, Path]) – A path to the database object to create simulations in.

  • n_sim (int) – Number of simulations to create.

  • fit_type (str) – Type of fit – fixed or both

cascade_at.executor.sample.sample(model_version_id, parent_location_id, sex_id, n_sim, n_pool, fit_type, asymptotic=False)[source]

Creates variable samples from a dismod database that has already had a fit run on it. Does so optionally in parallel. Defaults to doing stochastic samples (this is like the parametric bootstrap). If you want asymptotic samples, it will try to do that but if it fails, it will do stochastic samples instead.

Parameters
  • model_version_id (int) – The model version ID

  • parent_location_id (int) – The parent location ID specifying location of database

  • sex_id (int) – The sex ID specifying location of database

  • n_sim (int) – The number of simulations to do

  • n_pool (int) – The number of multiprocessing pools to create. If 1, then will not run with pools but just run all simulations together in one dmdismod command.

  • fit_type (str) – The type of fit that was performed on this database, one of fixed or both.

  • asymptotic (bool) – Whether or not to do asymptotic samples or fit-refit

Return type

None
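
A usage sketch with hypothetical IDs: this asks for 100 asymptotic samples in a pool of 20, falling back to fit-refit simulation if the asymptotic sampling fails:

>>> from cascade_at.executor.sample import sample
>>> sample(model_version_id=12345, parent_location_id=101, sex_id=2,
...        n_sim=100, n_pool=20, fit_type='both', asymptotic=True)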

Sample Cascade Operation
class cascade_at.cascade.cascade_operations.Sample(model_version_id, parent_location_id, sex_id, n_sim, fit_type, asymptotic, n_pool=1, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._CascadeOperation

Create posterior samples from a dismod database that has already had a fit run on it. This may be done in parallel with a multiprocessing pool. The samples can either be asymptotic (sampling from a multivariate normal distribution) or stochastic simulations. If you choose to sample asymptotic, and it fails (it may fail because of issues with the constraints), then it will automatically do a sample simulate.

Parameters
  • model_version_id (int) – The model version ID

  • parent_location_id (int) – The parent location ID

  • sex_id (int) – The reference sex ID for the database

  • n_sim (int) – The number of posterior samples to create

  • fit_type (str) – The original fit type for this database. Should be either ‘fixed’ or ‘both’ (could also be ‘random’ but we don’t use that).

  • asymptotic (bool) – Whether or not to do asymptotic samples or simulation-based samples.

  • n_pool (int) – The number of threads to create in a multiprocessing pool. If this is 1, then it will not do multiprocessing.

  • kwargs

Compute Covariate Multiplier Statistics
Mulcov Statistics Script

(Note: mulcov is a short name for “covariate multiplier”)

Once we’ve done a sample on a database to get posteriors, we can compute statistics of the covariate multipliers.

This is useful because we often like to use covariate multiplier statistics at one level of the cascade as a prior for the covariate multiplier estimation in another level of the cascade.

cascade_at.executor.mulcov_statistics.get_mulcovs(dbs, covs, table='fit_var')[source]

Get mulcov values from all of the dbs, with all of the common covariates.

Parameters
  • dbs – A list of dismod i/o objects

  • covs – A list of covariate names

  • table – Name of the table to pull from (can be fit_var or sample)

Return type

DataFrame

cascade_at.executor.mulcov_statistics.compute_statistics(df, mean=True, std=True, quantile=None)[source]

Compute statistics on a data frame with covariate multipliers.

Parameters
  • df (pd.DataFrame) – A data frame of covariate multipliers

  • mean (bool) – Whether or not to compute the mean

  • std (bool) – Whether or not to compute the standard deviation

  • quantile (Optional[List[float]]) – An optional list of quantiles to compute

Returns

Dictionary with the requested statistics

cascade_at.executor.mulcov_statistics.mulcov_statistics(model_version_id, locations, sexes, outfile_name, sample=True, mean=True, std=True, quantile=None)[source]

Compute statistics for the covariate multipliers on a dismod database, and save them to a file.

Parameters
  • model_version_id (int) – The model version ID

  • locations (List[int]) – A list of locations that, when used in combination with sexes, point to the databases to pull covariate multiplier estimates from

  • sexes (List[int]) – A list of sexes that, when used in combination with locations, point to the databases to pull covariate multiplier estimates from

  • outfile_name (str) – A filepath specifying where to save the covariate multiplier statistics.

  • sample (bool) – Whether or not the results are stored in the sample table or the fit_var table.

  • mean (bool) – Whether or not to compute the mean

  • std (bool) – Whether or not to compute the standard deviation

  • quantile (Optional[List[float]]) – An optional list of quantiles to compute

Return type

None
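
For example (hypothetical IDs and output filename), computing mean, standard deviation, and 95% interval quantiles from the sample tables:

>>> from cascade_at.executor.mulcov_statistics import mulcov_statistics
>>> mulcov_statistics(model_version_id=12345, locations=[1], sexes=[1, 2],
...                   outfile_name='mulcov_stats.csv', sample=True,
...                   mean=True, std=True, quantile=[0.025, 0.975])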

Mulcov Statistics Cascade Operation
class cascade_at.cascade.cascade_operations.MulcovStatistics(model_version_id, locations, sexes, sample, mean, std, quantile, outfile_name=None, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._CascadeOperation

Compute statistics on the covariate multipliers from a set of dismod databases and save them to a file. (See the mulcov_statistics script above; the parameters below come from the base class.)

Parameters
  • upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before this operation.

  • executor_parameters – Optional dictionary of execution parameters that overrides the defaults in DEFAULT_EXECUTOR_PARAMETERS

Make Predictions of Integrands

Once we’ve fit to a database and/or made posterior samples, we can make predictions using the fit or sampled variables on the average integrand grid. This is how we make predictions for age groups and times on the IHME grid.

Predict Script
cascade_at.executor.predict.fill_avgint_with_priors_grid(inputs, alchemy, settings, source_db_path, child_locations, child_sexes)[source]

Fill the average integrand table with the grid that the priors are on. This is so that we can “predict” the prior for the next level of the cascade.

Parameters
  • inputs (MeasurementInputs) – An inputs object

  • alchemy (Alchemy) – A grid alchemy object

  • settings (SettingsConfig) – A settings configuration object

  • source_db_path (Union[str, Path]) – The path of the source database that has had a fit on it

  • child_locations (List[int]) – The child locations to predict for

  • child_sexes (List[int]) – The child sexes to predict for

class cascade_at.executor.predict.Predict(**kwargs)[source]

Bases: cascade_at.dismod.api.multithreading._DismodThread

Predicts for a database in parallel. Chops up the sample table into a bunch of copies, each with only one sample.

cascade_at.executor.predict.predict_sample_sequence(path, table)[source]

Runs predict for either fit_var or sample, based on the table.

cascade_at.executor.predict.predict_sample_pool(main_db, index_file_pattern, n_sim, n_pool)[source]

Run predict sample in a pool by making copies of the existing database and splitting out the sample table into n_sim databases, running predict sample on each of them, and combining the results back into the main database.

cascade_at.executor.predict.predict_sample(model_version_id, parent_location_id, sex_id, child_locations, child_sexes, prior_grid=True, save_fit=False, save_final=False, sample=False, n_sim=1, n_pool=1)[source]

Takes a database that has already had a fit and simulate sample run on it, fills the avgint table for the child_locations and child_sexes you want to make predictions for, and then predicts on that grid. Makes predictions on the grid that is specified for the primary rates in the model, for the primary rates only.

Parameters
  • model_version_id (int) – The model version ID

  • parent_location_id (int) – The parent location ID that specifies where the database is stored

  • sex_id (int) – The sex ID that specifies where the database is stored

  • child_locations (List[int]) – The child locations to make predictions for on the rate grid

  • child_sexes (List[int]) – The child sexes to make predictions for on the rate grid

  • prior_grid (bool) – Whether or not to replace the default gbd-avgint grid with a prior grid for the rates.

  • save_fit (bool) – Whether or not to save the fit for upload later.

  • save_final (bool) – Whether or not to save the final for upload later.

  • sample (bool) – Whether to predict from the sample table or the fit_var table

  • n_sim (int) – The number of simulations to predict for

  • n_pool (int) – The number of multiprocessing pools to create. If 1, then will not run with pools but just run all simulations together in one dmdismod command.

Return type

None
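
A usage sketch predicting on the prior grid for two hypothetical child locations, using the sample table:

>>> from cascade_at.executor.predict import predict_sample
>>> predict_sample(model_version_id=12345, parent_location_id=101, sex_id=2,
...                child_locations=[102, 103], child_sexes=[1, 2],
...                prior_grid=True, sample=True, n_sim=100, n_pool=20)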

Predict Cascade Operation
class cascade_at.cascade.cascade_operations.Predict(model_version_id, parent_location_id, sex_id, child_locations=None, child_sexes=None, prior_grid=True, save_fit=False, save_final=False, sample=True, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._CascadeOperation

Run a predict on a dismod database that has already had a fit (and optionally a sample) run on it, either on the default avgint grid or on the prior grid. (See the predict_sample script above; the parameters below come from the base class.)

Parameters
  • upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before this operation.

  • executor_parameters – Optional dictionary of execution parameters that overrides the defaults in DEFAULT_EXECUTOR_PARAMETERS

Upload Results

After a Cascade-AT model has finished running, we can upload the results to the IHME epi database.

Upload Script
cascade_at.executor.upload.upload_prior(context, rh)[source]

Uploads the saved priors to the epi database in the table epi.model_prior.

Parameters
  • context (Context) – A context object

  • rh (ResultsHandler) – A results handler object

Return type

None

cascade_at.executor.upload.upload_fit(context, rh)[source]

Uploads the saved fit results to the epi database in the table epi.model_estimate_fit.

Parameters
  • context (Context) – A context object

  • rh (ResultsHandler) – A results handler object

Return type

None

cascade_at.executor.upload.upload_final(context, rh)[source]

Uploads the saved final results to the epi database in the table epi.model_estimate_final.

Parameters
  • context (Context) – A context object

  • rh (ResultsHandler) – A results handler object

Return type

None

cascade_at.executor.upload.format_upload(model_version_id, final=False, fit=False, prior=False)[source]
Return type

None
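
For example, formatting and uploading all three result types for a hypothetical model version:

>>> from cascade_at.executor.upload import format_upload
>>> format_upload(model_version_id=12345, final=True, fit=True, prior=True)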

Upload Cascade Operation
class cascade_at.cascade.cascade_operations.Upload(model_version_id, final=False, fit=False, prior=False, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._CascadeOperation

Upload the results (final estimates, fits, and/or priors) for a model version to the IHME epi database. (See the upload script above; the parameters below come from the base class.)

Parameters
  • upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before this operation.

  • executor_parameters – Optional dictionary of execution parameters that overrides the defaults in DEFAULT_EXECUTOR_PARAMETERS

Clean Up Files

The cleanup script is used to delete unnecessary databases after we already have final results for a model.

Cleanup Script
cascade_at.executor.cleanup.cleanup(model_version_id)[source]

Delete all database (.db) files attached to a model version.

Parameters

model_version_id (int) – The model version ID to delete databases for

Return type

None

Cleanup Cascade Operation
class cascade_at.cascade.cascade_operations.CleanUp(model_version_id, **kwargs)[source]

Bases: cascade_at.cascade.cascade_operations._CascadeOperation

Delete unnecessary database files for a model version once the final results have been saved. (See the cleanup script above; the parameters below come from the base class.)

Parameters
  • upstream_commands – A list of commands that are upstream of this operation, meaning they will be run before this operation.

  • executor_parameters – Optional dictionary of execution parameters that overrides the defaults in DEFAULT_EXECUTOR_PARAMETERS

EpiViz-AT Settings

The EpiViz-AT Settings are the set of all choices a user makes in the EpiViz-AT user interface. This is how the interface sends those choices to the command-line EpiViz-AT.

The list of all possible settings is in https://github.com/ihmeuw/cascade/blob/develop/src/cascade-at/input_data/configuration/form.py, where any setting containing the word Dummy is ignored.

Any setting that is unset, meaning the user has used the close box to ensure it is greyed-out in the EpiViz-AT user interface, will be missing from the EpiViz-AT settings sent to the command-line program, and the program understands that it should use a default.

Settings Configuration

Helper Functions
cascade_at.settings.settings.load_settings(settings_json)[source]

Loads settings from a settings_json.

Parameters

settings_json (Dict[str, Any]) – dictionary of settings

Examples

>>> from cascade_at.settings.base_case import BASE_CASE
>>> settings = load_settings(BASE_CASE)
Return type

SettingsConfig

cascade_at.settings.settings.settings_json_from_model_version_id(model_version_id, conn_def)[source]

Loads settings for a specific model version ID into a json.

Parameters
  • model_version_id (int) – the model version ID

  • conn_def (str) – the connection definition like ‘dismod-at-dev’

Return type

Dict[str, Any]

cascade_at.settings.settings.settings_from_model_version_id(model_version_id, conn_def)[source]

Loads settings for a specific model version ID.

Parameters
  • model_version_id (int) – the model version ID

  • conn_def (str) – the connection definition like ‘dismod-at-dev’

Examples

>>> settings = settings_from_model_version_id(model_version_id=395837,
...                                           conn_def='dismod-at-dev')
Return type

SettingsConfig

Settings Configuration Form

All available options from EpiViz-AT.

class cascade_at.settings.settings_config.SettingsConfig(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

The root Form of the whole settings inputs tree. This collects all settings from EpiViz-AT and adds default values when they are missing.

A representation of the configuration form we expect to receive from EpiViz. The hope is that this form will do as much validation and precondition checking as is feasible within the constraint that it must be able to validate a full EpiViz parameter document in significantly less than one second. This is because it will be used as part of a web service which gates EpiViz submissions and must return in near real time.

The Configuration class is the root of the form.

Example

>>> import json
>>> input_data = json.loads(json_blob)
>>> form = SettingsConfig(input_data)
>>> errors = form.validate_and_normalize()
model: cascade_at.settings.settings_config.Model = None
policies: cascade_at.settings.settings_config.Policies = None
gbd_round_id: cascade_at.core.form.fields.IntField = None
random_effect: cascade_at.core.form.fields.FormList = None
rate: cascade_at.core.form.fields.FormList = None
country_covariate: cascade_at.core.form.fields.FormList = None
study_covariate: cascade_at.core.form.fields.FormList = None
eta: cascade_at.settings.settings_config.Eta = None
students_dof: cascade_at.settings.settings_config.StudentsDOF = None
log_students_dof: cascade_at.settings.settings_config.StudentsDOF = None
location_set_version_id: cascade_at.core.form.fields.IntField = None
csmr_cod_output_version_id: cascade_at.core.form.fields.IntField = None
csmr_mortality_output_version_id: cascade_at.core.form.fields.Dummy = NO_VALUE
min_cv: cascade_at.core.form.fields.FormList = None
min_cv_by_rate: cascade_at.core.form.fields.FormList = None
re_bound_location: cascade_at.core.form.fields.FormList = None
derivative_test: cascade_at.settings.settings_config.DerivativeTest = None
max_num_iter: cascade_at.settings.settings_config.FixedRandomInt = None
print_level: cascade_at.settings.settings_config.FixedRandomInt = None
accept_after_max_steps: cascade_at.settings.settings_config.FixedRandomInt = None
tolerance: cascade_at.settings.settings_config.FixedRandomFloat = None
data_cv_by_integrand: cascade_at.core.form.fields.FormList = None
data_eta_by_integrand: cascade_at.core.form.fields.FormList = None
data_density_by_integrand: cascade_at.core.form.fields.FormList = None
config_version: cascade_at.core.form.fields.StrField = None
class cascade_at.settings.settings_config.SmoothingPrior(*args, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

Priors for smoothing.

prior_type: cascade_at.core.form.fields.OptionField = None
age_lower: cascade_at.core.form.fields.FloatField = None
age_upper: cascade_at.core.form.fields.FloatField = None
time_lower: cascade_at.core.form.fields.FloatField = None
time_upper: cascade_at.core.form.fields.FloatField = None
born_lower: cascade_at.core.form.fields.FloatField = None
born_upper: cascade_at.core.form.fields.FloatField = None
density: cascade_at.core.form.fields.OptionField = None
min: cascade_at.core.form.fields.FloatField = None
mean: cascade_at.core.form.fields.FloatField = None
max: cascade_at.core.form.fields.FloatField = None
std: cascade_at.core.form.fields.FloatField = None
nu: cascade_at.core.form.fields.FloatField = None
eta: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.SmoothingPriorGroup(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

dage: cascade_at.settings.settings_config.SmoothingPrior = None
dtime: cascade_at.settings.settings_config.SmoothingPrior = None
value: cascade_at.settings.settings_config.SmoothingPrior = None
class cascade_at.settings.settings_config.Smoothing(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

rate: cascade_at.core.form.fields.OptionField = None
location: cascade_at.core.form.fields.IntField = None
age_grid: cascade_at.core.form.fields.StringListField = None
time_grid: cascade_at.core.form.fields.StringListField = None
default: cascade_at.settings.settings_config.SmoothingPriorGroup = None
mulstd: cascade_at.settings.settings_config.SmoothingPriorGroup = None
detail: cascade_at.core.form.fields.FormList = None
age_time_specific: cascade_at.core.form.fields.IntField = None
custom_age_grid: cascade_at.core.form.fields.Dummy = NO_VALUE
custom_time_grid: cascade_at.core.form.fields.Dummy = NO_VALUE
class cascade_at.settings.settings_config.StudyCovariate(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

study_covariate_id: cascade_at.core.form.fields.IntField = None
measure_id: cascade_at.core.form.fields.StrField = None
mulcov_type: cascade_at.core.form.fields.OptionField = None
transformation: cascade_at.core.form.fields.IntField = None
age_time_specific: cascade_at.core.form.fields.IntField = None
age_grid: cascade_at.core.form.fields.StringListField = None
time_grid: cascade_at.core.form.fields.StringListField = None
default: cascade_at.settings.settings_config.SmoothingPriorGroup = None
mulstd: cascade_at.settings.settings_config.SmoothingPriorGroup = None
detail: cascade_at.core.form.fields.FormList = None
custom_age_grid: cascade_at.core.form.fields.Dummy = NO_VALUE
custom_time_grid: cascade_at.core.form.fields.Dummy = NO_VALUE
class cascade_at.settings.settings_config.CountryCovariate(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

country_covariate_id: cascade_at.core.form.fields.IntField = None
measure_id: cascade_at.core.form.fields.StrField = None
mulcov_type: cascade_at.core.form.fields.OptionField = None
transformation: cascade_at.core.form.fields.IntField = None
age_time_specific: cascade_at.core.form.fields.IntField = None
age_grid: cascade_at.core.form.fields.StringListField = None
time_grid: cascade_at.core.form.fields.StringListField = None
default: cascade_at.settings.settings_config.SmoothingPriorGroup = None
mulstd: cascade_at.settings.settings_config.SmoothingPriorGroup = None
detail: cascade_at.settings.settings_config.SmoothingPrior = None
custom_age_grid: cascade_at.core.form.fields.Dummy = NO_VALUE
custom_time_grid: cascade_at.core.form.fields.Dummy = NO_VALUE
class cascade_at.settings.settings_config.Model(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

modelable_entity_id: cascade_at.core.form.fields.IntField = None
decomp_step_id: cascade_at.core.form.fields.IntField = None
model_version_id: cascade_at.core.form.fields.IntField = None
random_seed: cascade_at.core.form.fields.IntField = None
minimum_meas_cv: cascade_at.core.form.fields.FloatField = None
add_csmr_cause: cascade_at.core.form.fields.IntField = None
title: cascade_at.core.form.fields.StrField = None
description: cascade_at.core.form.fields.StrField = None
crosswalk_version_id: cascade_at.core.form.fields.IntField = None
bundle_id: cascade_at.core.form.fields.IntField = None
drill: cascade_at.core.form.fields.OptionField = None
drill_location: cascade_at.core.form.fields.IntField = None
drill_location_start: cascade_at.core.form.fields.IntField = None
drill_location_end: cascade_at.core.form.fields.NativeListField = None
drill_sex: cascade_at.core.form.fields.OptionField = None
birth_prev: cascade_at.core.form.fields.OptionField = None
default_age_grid: cascade_at.core.form.fields.StringListField = None
default_time_grid: cascade_at.core.form.fields.StringListField = None
constrain_omega: cascade_at.core.form.fields.OptionField = None
exclude_data_for_param: cascade_at.core.form.fields.ListField = None
ode_step_size: cascade_at.core.form.fields.FloatField = None
addl_ode_stpes: cascade_at.core.form.fields.StringListField = None
split_sex: cascade_at.core.form.fields.OptionField = None
quasi_fixed: cascade_at.core.form.fields.OptionField = None
zero_sum_random: cascade_at.core.form.fields.ListField = None
bound_frac_fixed: cascade_at.core.form.fields.FloatField = None
bound_random: cascade_at.core.form.fields.FloatField = None
rate_case: cascade_at.core.form.fields.StrField = None
data_density: cascade_at.core.form.fields.StrField = None
relabel_incidence: cascade_at.core.form.fields.IntField = None
midpoint_approximation: cascade_at.core.form.fields.NativeListField = None
class cascade_at.settings.settings_config.Eta(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

priors: cascade_at.core.form.fields.FloatField = None
data: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.DataCV(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

integrand_measure_id: cascade_at.core.form.fields.IntField = None
value: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.MinCV(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

cascade_level_id: cascade_at.core.form.fields.StrField = None
value: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.MinCVRate(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

cascade_level_id: cascade_at.core.form.fields.StrField = None
rate_measure_id: cascade_at.core.form.fields.StrField = None
value: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.DataEta(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

integrand_measure_id: cascade_at.core.form.fields.IntField = None
value: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.DataDensity(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

value: cascade_at.core.form.fields.StrField = None
integrand_measure_id: cascade_at.core.form.fields.IntField = None
class cascade_at.settings.settings_config.StudentsDOF(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

priors: cascade_at.core.form.fields.FloatField = None
data: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.DerivativeTest(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

fixed: cascade_at.core.form.fields.OptionField = None
random: cascade_at.core.form.fields.OptionField = None
class cascade_at.settings.settings_config.FixedRandomInt(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

fixed: cascade_at.core.form.fields.IntField = None
random: cascade_at.core.form.fields.IntField = None
class cascade_at.settings.settings_config.FixedRandomFloat(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

fixed: cascade_at.core.form.fields.FloatField = None
random: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.RandomEffectBound(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

location: cascade_at.core.form.fields.IntField = None
value: cascade_at.core.form.fields.FloatField = None
class cascade_at.settings.settings_config.Policies(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

Bases: cascade_at.core.form.abstract_form.Form

estimate_emr_from_prevalence: cascade_at.core.form.fields.OptionField = None
use_weighted_age_group_midpoints: cascade_at.core.form.fields.OptionField = None
number_of_fixed_effect_samples: cascade_at.core.form.fields.IntField = None
with_hiv: cascade_at.core.form.fields.BoolField = None
age_group_set_id: cascade_at.core.form.fields.IntField = None
exclude_relative_risk: cascade_at.core.form.fields.OptionField = None
meas_noise_effect: cascade_at.core.form.fields.OptionField = None
limited_memory_max_history_fixed: cascade_at.core.form.fields.IntField = None
gbd_round_id: cascade_at.core.form.fields.IntField = None
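A minimal sketch of how these settings forms are used, assuming the fields above live on the model sub-form of the loaded settings (load_settings and BASE_CASE appear in the examples later in this document):

from cascade_at.settings.base_case import BASE_CASE
from cascade_at.settings.settings import load_settings

settings = load_settings(BASE_CASE)

# Form fields that are absent from the raw settings default to None.
print(settings.model.ode_step_size)
print(settings.model.rate_case)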

Converting Settings

These functions are used to convert settings that have missing data into dictionaries filled in with default values for things like the data coefficient of variation, eta (the log offset), and so on.

cascade_at.settings.convert.midpoint_list_from_settings(settings)[source]

Takes the settings configuration for which integrands to midpoint, which comes in as measure IDs, and translates them to integrand enums.

Parameters

settings (SettingsConfig) – The settings configuration to convert from

Return type

List[str]

cascade_at.settings.convert.measures_to_exclude_from_settings(settings)[source]

Gets the measures to exclude from the data from the model settings configuration.

Parameters

settings (SettingsConfig) – The settings configuration to convert from

Return type

List[str]

cascade_at.settings.convert.data_eta_from_settings(settings, default=nan)[source]

Gets the data eta from the settings Configuration. The default data eta is np.nan.

Parameters
  • settings (SettingsConfig) – The settings configuration to convert from

  • default (float) – The default eta to use

Return type

Dict[str, float]

cascade_at.settings.convert.density_from_settings(settings, default='gaussian')[source]

Gets the density from the settings Configuration. The default density is “gaussian”.

Parameters
  • settings (SettingsConfig) – The settings configuration to convert from

  • default (str) – The default data density to use

Return type

Dict[str, str]

cascade_at.settings.convert.data_cv_from_settings(settings, default=0.0)[source]

Gets the data coefficient of variation from the settings Configuration.

Parameters
  • settings (SettingsConfig) – The settings configuration to convert from

  • default (float) – The default data coefficient of variation

Return type

Dict[str, float]

cascade_at.settings.convert.min_cv_from_settings(settings, default=0.0)[source]

Gets the minimum coefficient of variation by rate and level of the cascade from settings. The first key is the cascade level, the second is the rate.

Parameters
  • settings (SettingsConfig) – The settings configuration from which to pull

  • default (float) – The default min CV to use when not specified

Return type

defaultdict

cascade_at.settings.convert.nu_from_settings(settings, default=nan)[source]

Gets nu from the settings Configuration. The default nu is np.nan.

Parameters
  • settings (SettingsConfig) – The settings configuration from which to pull

  • default (float) – The default nu to use when not specified in the settings

Return type

Dict[str, float]
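A minimal sketch of how these converters are typically used together, assuming settings loaded from the base case as in the examples elsewhere in this document:

from cascade_at.settings.base_case import BASE_CASE
from cascade_at.settings.settings import load_settings
from cascade_at.settings.convert import (
    data_eta_from_settings,
    density_from_settings,
    min_cv_from_settings,
    nu_from_settings,
)

settings = load_settings(BASE_CASE)

data_eta = data_eta_from_settings(settings)   # Dict[str, float], np.nan by default
density = density_from_settings(settings)     # Dict[str, str], 'gaussian' by default
min_cv = min_cv_from_settings(settings)       # defaultdict keyed by cascade level, then rate
nu = nu_from_settings(settings)               # Dict[str, float], np.nan by default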

Data Inputs for Cascade-AT

Wrangling the inputs for a Cascade-AT model is a very important first step. All of the inputs at this time come from the IHME epi databases. In the future we’d like to create input data classes that don’t depend on the epi databases.

Input Components documents the inputs that are pulled for a model run. Input Demographics describes the demographic and location inputs that need to be set for a model. Covariates describes how covariates are pulled and transformed. Other-Cause Mortality describes how we calculate other cause mortality from the mortality inputs and use them as a constraint. Measurement Inputs documents how each of those inputs works together to create one large object that stores all of the input data for a model run (including each of the input components).

Input Demographics

There are two main demographic objects needed to pull data from the IHME databases, and more generally for building the cascade model.

Demographics
class cascade_at.inputs.demographics.Demographics(gbd_round_id, location_set_version_id=None)[source]

Bases: object

Grabs and stores demographic information needed for shared functions. Will also make a location hierarchy dag.

Parameters
  • gbd_round_id (int) – The GBD round

  • location_set_version_id (Optional[int]) – The location set version to use (right now EpiViz-AT is passing dismod location set versions, but this will eventually switch to the cause of death hierarchy that is more extensive).

Location Hierarchy
class cascade_at.inputs.locations.LocationDAG(location_set_version_id=None, gbd_round_id=None, df=None, root=None)[source]

Bases: object

Create a location DAG from the GBD location hierarchy, using a networkx graph where each node is a location ID and its properties are all properties from db_queries.

The root of this dag is the global location ID.

Parameters
  • location_set_version_id (Optional[int]) – The location set version corresponding to the hierarchy to pull from the IHME databases

  • gbd_round_id (Optional[int]) – Which gbd round the location set version is coming from

  • df (Optional[DataFrame]) – An optional df to pass instead of location sets and gbd rounds if you’d rather construct the DAG from a pandas data frame.

depth(location_id)[source]

Gets the depth of the hierarchy at this location.

Return type

int

descendants(location_id)[source]

Gets all descendants (not just direct children) for a location ID.

Parameters

location_id (int) – The location ID to get descendants for

Return type

List[int]

children(location_id)[source]

Gets the child location IDs.

Return type

List[int]

parent_children(location_id)[source]

Gets the parent and the child location IDs.

Return type

List[int]

is_leaf(location_id)[source]

Checks if a location is a leaf node in the tree.

Return type

bool

to_dataframe()[source]

Converts the location DAG to a data frame with location ID and parent ID and name. Helpful for debugging, and putting into the dismod database.

Return type

DataFrame

Returns

pd.DataFrame
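A short sketch of how the DAG is typically queried; the location set version ID here is only illustrative (it matches the one used in the Model example later in this document), and location 1 is the GBD global location:

from cascade_at.inputs.locations import LocationDAG

dag = LocationDAG(location_set_version_id=429)  # illustrative location set version

kids = dag.children(1)        # immediate children of the global location (ID 1)
below = dag.descendants(1)    # every location underneath it, not just direct children
leaf = dag.is_leaf(kids[0])   # True if that location has no children
df = dag.to_dataframe()       # location ID, parent ID, and name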

Input Components

These are all of the inputs that are pulled for a model run. Some may not be pulled depending on the settings (for example, some models don’t have cause-specific mortality data).

Crosswalk Version
class cascade_at.inputs.data.CrosswalkVersion(crosswalk_version_id, exclude_outliers, demographics, conn_def, gbd_round_id)[source]

Bases: cascade_at.inputs.base_input.BaseInput

Pulls and formats all of the data from a crosswalk version in the epi database.

Parameters
  • crosswalk_version_id (int) – The crosswalk version to pull from

  • exclude_outliers (bool) – whether to exclude outliers

  • conn_def (str) – database connection definition

  • gbd_round_id (int) – The GBD round

  • demographics (Demographics) – The demographics object

get_raw()[source]

Pulls the raw crosswalk version from the database. These are the observations that will be used in the bundle.

configure_for_dismod(relabel_incidence, measures_to_exclude=None)[source]

Configures the crosswalk version for DisMod.

Parameters
  • measures_to_exclude (Optional[List[str]]) – list of parameters to exclude, by name

  • relabel_incidence (int) – how to label incidence – see RELABEL_INCIDENCE_MAP

Return type

DataFrame

static map_to_integrands(df, relabel_incidence)[source]

Maps the data from the IHME databases to the integrands expected by DisMod AT.

Parameters
  • df (DataFrame) – A data frame to map to integrands

  • relabel_incidence (int) – A relabel incidence code. Can be found in RELABEL_INCIDENCE_MAP
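A sketch of the typical workflow for this class. The crosswalk version ID is hypothetical, and relabel_incidence=1 is only an illustrative code from RELABEL_INCIDENCE_MAP:

from cascade_at.inputs.demographics import Demographics
from cascade_at.inputs.data import CrosswalkVersion

demographics = Demographics(gbd_round_id=6)
cv = CrosswalkVersion(
    crosswalk_version_id=12345,  # hypothetical crosswalk version ID
    exclude_outliers=True,
    demographics=demographics,
    conn_def='epi',
    gbd_round_id=6,
)
cv.get_raw()
df = cv.configure_for_dismod(relabel_incidence=1)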

Cause-Specific Mortality Rate
class cascade_at.inputs.csmr.CSMR(cause_id, demographics, decomp_step, gbd_round_id)[source]

Bases: cascade_at.inputs.base_input.BaseInput

Get cause-specific mortality rate for demographic groups from a specific CodCorrect output version.

Parameters
  • cause_id (int) – The GBD cause of death to pull mortality from

  • demographics (Demographics) –

  • decomp_step (str) –

  • gbd_round_id (int) –

get_raw()[source]

Pulls the raw CSMR and assigns it to this class.

attach_to_model_version_in_db(model_version_id, conn_def)[source]

Uploads the CSMR for this model and attaches it to the model version so that it can be viewed in EpiViz.

configure_for_dismod(hold_out=0)[source]

Configures CSMR for DisMod.

Parameters

hold_out (int) – hold-out value for Dismod. 0 means it will be fit, 1 means held out

Return type

DataFrame

cascade_at.inputs.csmr.get_best_cod_correct(gbd_round_id)[source]

Gets the best codcorrect version for a given GBD round.

Parameters

gbd_round_id (int) –

Returns

The process_version_id to be used with a db_queries.get_outputs call.

All-Cause Mortality Rate
class cascade_at.inputs.asdr.ASDR(demographics, decomp_step, gbd_round_id)[source]

Bases: cascade_at.inputs.base_input.BaseInput

Gets age-specific all-cause death rate for all demographic groups.

Parameters
  • demographics (Demographics) –

  • decomp_step (str) –

  • gbd_round_id (int) –

get_raw()[source]

Pulls the raw ASDR and assigns them to this class.

configure_for_dismod(hold_out=0)[source]

Configures ASDR for DisMod.

Parameters

hold_out (int) – hold-out value for Dismod. 0 means it will be fit, 1 means held out

Return type

DataFrame

Population
class cascade_at.inputs.population.Population(demographics, decomp_step, gbd_round_id)[source]

Bases: cascade_at.inputs.base_input.BaseInput

Gets population for all demographic groups. This is not an input for DisMod-AT itself; it is just used to do covariate interpolation over non-standard age groups and years.

Parameters
  • demographics (Demographics) – A demographics object

  • decomp_step (str) – The decomp step

  • gbd_round_id (int) – The gbd round

get_population()[source]

Gets the population counts from the database for the specified demographic group.

configure_for_dismod()[source]

Configures population inputs for use in dismod by converting to age lower and upper from GBD age groups.

Return type

DataFrame

Covariates

Covariates Design from EpiViz

EpiViz-AT classifies covariates as country and study types. The study covariates are 0 or 1 and are specific to the bundle. The country covariates are floating-point values defined for every age / location / sex / year.

The strategy for parsing these and putting them into the model is to split the data download and normalization from construction of model priors. The EpiVizCovariate is the information part. The EpiVizCovariateMultiplier is the model prior part.

_images/covariate-movement.png

For reading data, the main complication is that covariates have several IDs and names.

  • study_covariate_id and country_covariate_id may be equal for different covariates. That is, they are two sets of IDs. We have no guarantee this is not the case (even if someone tells us it is not the case).

  • In the inputs, each covariate has a short_name, which is what we use. The short name, in other inputs, can contain spaces. I don’t know that study and country short names are guaranteed to be distinct. Therefore…

  • We prefix study and country covariates with s_ and c_.

  • Covariates are often transformed into log space, exponential space, or others. These get _log, _exp, or whatever appended.

  • When covariates are put into the model, they have English names, but inside Dismod-AT, they get renamed to x_0, x_1, x_....

class cascade_at.inputs.utilities.covariate_specifications.EpiVizCovariate(study_country, covariate_id, transformation_id)[source]

Bases: object

This specifies covariate data from settings. It is separate from the cascade.model.Covariate, which is a Dismod-AT covariate. EpiViz-AT distinguishes study and country covariates and encodes them into the Dismod-AT covariate names.

transformation_id

Which function to apply to this covariate column (log, exp, etc)

untransformed_covariate_name

The name for this covariate before transformation.

property spec

Unique identifier for a covariate because two multipliers may refer to the same covariate.

property name

The name for this covariate in the final data.

class cascade_at.inputs.utilities.covariate_specifications.EpiVizCovariateMultiplier(covariate, settings)[source]

Bases: object

Parameters
  • covariate (EpiVizCovariate) – The covariate

  • settings (StudyCovariate|CountryCovariate) – Section of the form.

property group

The name of the DismodGroups group, so it’s alpha, beta, or gamma.

property key

Key for the DismodGroups object, so it is a tuple of (covariate name, rate) or (covariate name, integrand) where rate and integrand are strings.

cascade_at.inputs.utilities.covariate_specifications.kind_and_id(covariate_setting)[source]
cascade_at.inputs.utilities.covariate_specifications.create_covariate_specifications(country_covariate, study_covariate)[source]

Parses EpiViz-AT settings to create two data structures for Covariate creation.

Covariate multipliers will only contain country covariates. Covariate specifications will contain both the country and study covariates; the study covariates are only the ‘sex’ and ‘one’ covariates.

>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> multipliers, data_spec = create_covariate_specifications(settings.country_covariate, settings.study_covariate)
Parameters
  • country_covariate (List[CountryCovariate]) – The country_covariate member of the EpiViz-AT settings.

  • study_covariate (List[StudyCovariate]) – The study_covariate member of the EpiViz-AT settings.

Return type

(typing.List[cascade_at.inputs.utilities.covariate_specifications.EpiVizCovariateMultiplier], typing.List[cascade_at.inputs.utilities.covariate_specifications.EpiVizCovariate])

Returns

  • The multipliers are specifications for making SmoothGrids.

  • The covariates are specifications for downloading data and attaching it to the crosswalk version and average integrand tables. The multipliers use the covariates in order to know the name of the covariate.

The following class is a wrapper around the covariate specifications that makes them easier to work with and provides helpful metadata.

class cascade_at.inputs.covariate_specs.CovariateSpecs(country_covariates, study_covariates)[source]

Bases: object

create_covariate_list()[source]

Creates a list of Covariate objects with the current reference value and max difference.

Definition of Study and Country

There are several reasons to use a covariate.

Country Covariate

We believe this covariate predicts disease behavior.

Study Covariate

THIS IS DEPRECATED: the only study covariates are sex and one, described below.

The covariate marks a set of studies that behave differently. For instance, different sets of measurements may have different criteria for when a person is said to have the disease. We assign a covariate to the set of studies to account for bias from study design.

Sex Covariate

This is usually used to select a subset of data by sex, but this could be done based on any covariate associated with observation data. In addition to being used to subset data, the sex covariate is a covariate multiplier applied the same way as a study covariate.

One Covariate

The “one covariate” is a covariate of all ones. It’s treated within the bundle management system as a study covariate. It’s used as a covariate on measurement standard deviations, in order to account for between-study heterogeneity. A paper that might be a jumping-off point for understanding this is [Serghiou2019].

A covariate column that is used just for exclusion doesn’t need a covariate multiplier. In practice, the sex covariate is used at global or super-region level as a study covariate. Then the adjustments determined at the upper level are applied as constraints down the hierarchy. This means there is a covariate multiplier for sex, and its smooth is a grid of constraints, not typical priors.

Dismod-AT applies covariate effects to one of three different variables. It either uses the covariate to predict the underlying rate, or it applies the covariate to predict the measured data. It can be an effect on either the measured data value or the observation data standard deviation. Dismod-AT calls these, respectively, the alpha, beta, and gamma covariates.

As a rule of thumb, the three uses of covariates apply to different variables, as shown in the table below.

Use of Covariate     Rate     Measured Value    Measured Stddev
Country              Yes      Maybe             Maybe
Study                Maybe    Yes               Yes
Sex (exclusion)      No       Yes               No

Country and study covariates can optionally use outliering. The sex covariate is defined by its use of regular outliering. Female and male data are assigned values of -0.5 and 0.5, respectively, and the mean and maximum difference are adjusted to include one, the other, or both sexes.

Policies for Study and Country Covariates
  • Sex is added as a covariate called s_sex, which Dismod-AT translates to x_0 for its db file format. It is -0.5 for women, 0.5 for men, and 0 for both or neither. This covariate is used to exclude data by setting a reference value equal to -0.5 or 0.5 and a max allowed difference to 0.75, so that the “both” category is included and the other sex is excluded.

  • The s_one covariate is a study covariate of ones. This can be selected in the user interface and is usually used as a gamma covariate, meaning it is a covariate multiplier on the standard deviation of measurement data. Its covariate id is 1604, and it appears in the db file as x_1 with a reference value of 0 and no max difference.

Serghiou2019

Serghiou, Stylianos, and Steven N. Goodman. “Random-Effects Meta-analysis: Summarizing Evidence With Caveats.” JAMA 321.3 (2019): 301-302.

Country Covariate Data

To grab the data for the covariates, we use this class that is part of the core data inputs.

class cascade_at.inputs.covariate_data.CovariateData(covariate_id, demographics, decomp_step, gbd_round_id)[source]

Bases: cascade_at.inputs.base_input.BaseInput

Get covariate estimates, and map them to the necessary demographic ages and sexes. If only one age group is present in the covariate data then that means that it’s not age-specific and we want to copy the values over to all the other age groups we’re working with in demographics. Same with sex.

get_raw()[source]

Pulls the raw covariate data from the database.

configure_for_dismod(pop_df, loc_df)[source]

Configures covariates for DisMod. Completes covariate ages, sexes, and locations based on what covariate data is already available.

To fill in ages, it copies over all age or age standardized covariates into each of the specific age groups.

To fill in sexes, it copies over any both sex covariates to the sex specific groups.

To fill in locations, it takes a population-weighted average of child locations for parent locations all the way up the location hierarchy.

Parameters
  • pop_df (DataFrame) – A data frame with population info for all ages, sexes, locations, and years

  • loc_df (DataFrame) – A data frame with location hierarchy information

Because study covariates are deprecated, we don’t need to get data for those. Instead, in the MeasurementInputs class we just assign the sex and one covariate values on the fly.

Covariate Interpolation

When we attach covariate values to data points, we often need to interpolate across ages or times because the data points don’t fit nicely into the covariate age and time groups that come from the GBD database.

The interpolation happens inside of MeasurementInputs, using the following function that creates a CovariateInterpolator for each covariate.

cascade_at.inputs.utilities.covariate_weighting.get_interpolated_covariate_values(data_df, covariate_dict, population_df)[source]

Gets the unique age-time combinations from the data_df, and creates interpolated covariate values for each of these combinations by population-weighting the standard GBD age-years that span the non-standard combinations.

Parameters
  • data_df (DataFrame) – A data frame with data observations in it

  • covariate_dict (Dict[str, DataFrame]) – A dictionary of covariate data frames with covariate names as keys

  • population_df (DataFrame) – A data frame with population in it

Return type

DataFrame

class cascade_at.inputs.utilities.covariate_weighting.CovariateInterpolator(covariate, population)[source]

Bases: object

Interpolates a covariate by population weighting.

Parameters
  • covariate (DataFrame) – Data frame with covariate information

  • population (DataFrame) – Data frame with population information

interpolate(loc_id, sex_id, age_lower, age_upper, time_lower, time_upper)[source]

Main interpolation function.
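A sketch of the interpolation call, where cov_df and pop_df are placeholders for covariate and population data frames pulled as described above, and the IDs and age/time bounds are illustrative:

from cascade_at.inputs.utilities.covariate_weighting import CovariateInterpolator

interp = CovariateInterpolator(covariate=cov_df, population=pop_df)
value = interp.interpolate(
    loc_id=102, sex_id=2,
    age_lower=15.0, age_upper=49.0,
    time_lower=1995.0, time_upper=2005.0,
)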

Covariate Multipliers

All of the above sections involve pre-processing of the EpiViz-AT settings and covariate data. This is all so that we can make a covariate correctly in the dismod model specifications.

For the “covariate multiplier” that uses all of this information and converts it into something that dismod can understand, see Covariate Multipliers.

Other-Cause Mortality

The IHME databases supply all-cause mortality, but Dismod-AT uses other-cause mortality. It can impute what it needs to know using all-cause mortality, but it is helpful to add other-cause mortality not just as input data but as a constraint to the model.

We use total mortality as other-cause mortality. The correct formulae to use are for “cause-deleted lifetables” or “cause deletion.”

Omega Constraint

This constrains other-cause mortality using data from mtother, which is the integrand for other-cause mortality.

The choice to use an omega constraint is set in EpiViz-AT, and this is obeyed. If the user does choose to constrain omega, then it is included with the following function.

cascade_at.inputs.utilities.data.calculate_omega(asdr, csmr)[source]

Calculates other-cause mortality (omega) from ASDR (mtall – all-cause mortality) and CSMR (mtspecific – cause-specific mortality). For most diseases, mtall is a good approximation to omega, but we calculate omega = mtall - mtspecific in case it isn’t. For diseases without CSMR (csmr_cause_id = None), omega = mtall.

Parameters
  • asdr (DataFrame) – data frame with age-specific all-cause mortality rates

  • csmr (DataFrame) – data frame with age-specific cause-specific mortality rates

Return type

DataFrame
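For example, with ASDR and CSMR frames configured for dismod as above (asdr_df and csmr_df are placeholders):

from cascade_at.inputs.utilities.data import calculate_omega

# mtother = mtall - mtspecific, used as the omega constraint
omega_df = calculate_omega(asdr=asdr_df, csmr=csmr_df)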

Measurement Inputs

The measurement inputs object collects all of the pieces from Input Components and has a number of helper functions to format and combine them in accordance with the model settings for dismod.

class cascade_at.inputs.measurement_inputs.MeasurementInputs(model_version_id, gbd_round_id, decomp_step_id, conn_def, country_covariate_id, csmr_cause_id, crosswalk_version_id, location_set_version_id=None, drill_location_start=None, drill_location_end=None)[source]

Bases: object

The class that constructs all of the measurement inputs. Pulls ASDR, CSMR, crosswalk versions, and country covariates, and puts them into one data frame that then formats itself for the dismod database. Performs covariate value interpolation if age and year ranges don’t match up with GBD age and year ranges.

Parameters
  • model_version_id (int) – the model version ID

  • gbd_round_id (int) – the GBD round ID

  • decomp_step_id (int) – the decomp step ID

  • csmr_cause_id (int) – cause to pull CSMR from

  • crosswalk_version_id (int) – crosswalk version to use

  • country_covariate_id (List[int]) – list of covariate IDs

  • conn_def (str) – connection definition from .odbc file (e.g. ‘epi’) to connect to the IHME databases

  • location_set_version_id (Optional[int]) – can be None; if it is None, the best location_set_version_id for the estimation hierarchy of this GBD round is used

  • drill_location_start (Optional[int]) – which location ID to drill from as the parent

  • drill_location_end (Optional[List[int]]) – which immediate children of the drill_location_start parent to include in the drill

self.decomp_step

the decomp step in string form

Type

str

self.demographics

a demographics object that specifies the age group, sex, location, and year IDs to grab

Type

cascade_at.inputs.demographics.Demographics

self.integrand_map

dictionary mapping from GBD measure IDs to DisMod IDs

Type

Dict[int, int]

self.asdr

all-cause mortality input object

Type

cascade_at.inputs.asdr.ASDR

self.csmr

cause-specific mortality input object from cause csmr_cause_id

Type

cascade_at.inputs.csmr.CSMR

self.data

crosswalk version data from IHME database

Type

cascade_at.inputs.data.CrosswalkVersion

self.covariate_data

list of covariate data objects that contains the raw covariate data mapped to IDs

Type

List[cascade_at.inputs.covariate_data.CovariateData]

self.location_dag

DAG of locations to be used

Type

cascade_at.inputs.locations.LocationDAG

self.population

population object that is used for covariate weighting

Type

cascade_at.inputs.population.Population

self.data_eta

dictionary of eta values to be applied to each measure

Type

Dict[str, float]

self.density

dictionary of densities to be applied to each measure

Type

Dict[str, str]

self.nu

dictionary of nu values to be applied to each measure

Type

Dict[str, float]

self.dismod_data

resulting dismod data formatted to be used in the dismod database

Type

pd.DataFrame

Examples

>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>>
>>> settings = load_settings(BASE_CASE)
>>> covariate_id = [i.country_covariate_id for i in settings.country_covariate]
>>>
>>> i = MeasurementInputs(
>>>    model_version_id=settings.model.model_version_id,
>>>    gbd_round_id=settings.gbd_round_id,
>>>    decomp_step_id=settings.model.decomp_step_id,
>>>    csmr_cause_id=settings.model.add_csmr_cause,
>>>    crosswalk_version_id=settings.model.crosswalk_version_id,
>>>    country_covariate_id=covariate_id,
>>>    conn_def='epi',
>>>    location_set_version_id=settings.location_set_version_id
>>> )
>>> i.get_raw_inputs()
>>> i.configure_inputs_for_dismod(settings)
get_raw_inputs()[source]

Get the raw inputs that need to be used in the modeling.

configure_inputs_for_dismod(settings, mortality_year_reduction=5)[source]

Modifies the inputs for DisMod based on model-specific settings.

Parameters
  • settings (SettingsConfig) – Settings for the model

  • mortality_year_reduction (int) – number of years to decimate csmr and asdr

prune_mortality_data(parent_location_id)[source]

Remove mortality data for descendants that are not children of parent_location_id from the configured dismod data before it gets filled into the dismod database.

Return type

DataFrame

add_covariates_to_data(df)[source]

Add on covariates to a data frame that has age_group_id, year_id or time-age upper / lower, and location_id and sex_id. Adds both country-level and study-level covariates.

Return type

DataFrame

to_gbd_avgint(parent_location_id, sex_id)[source]

Converts the demographics of the model to the avgint table.

Return type

DataFrame

interpolate_country_covariate_values(df, cov_dict)[source]

Interpolates the covariate values onto the data so that the non-standard ages and years match up to meaningful covariate values.

transform_country_covariates(df)[source]

Transforms the covariate data with the transformation ID.

Parameters

df (pd.DataFrame) – Data frame of covariate values to transform

Returns

self

calculate_country_covariate_reference_values(parent_location_id, sex_id)[source]

Gets the country covariate reference value for a covariate ID and a parent location ID. Also gets the maximum difference between the reference value and covariate values observed.

Run this when you’re going to make a DisMod AT database for a specific parent location and sex ID.

Parameters
  • parent_location_id (int) – The parent location ID

  • sex_id (int) – The sex ID

Return type

CovariateSpecs

Returns

List[CovariateSpec] list of the covariate specs with the correct reference values and max diff.

reset_index(drop, inplace)[source]
class cascade_at.inputs.measurement_inputs.MeasurementInputsFromSettings(settings)[source]

Bases: cascade_at.inputs.measurement_inputs.MeasurementInputs

Wrapper for MeasurementInputs that takes a settings object rather than the individual arguments. For convenience.

Examples

>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> i = MeasurementInputsFromSettings(settings)
>>> i.get_raw_inputs()
>>> i.configure_inputs_for_dismod(settings)

Modeling

The model module provides tools to build a Dismod-AT model with variables, constraints, and priors, in the grid structure that Dismod-AT requires.

The main model object is documented here Model Class. The model object has two levels maximum (parents and children). To build that model object with “global” settings from an EpiViz-AT model, we have a wrapper around the model object, described below in Grid Alchemy that builds a two-level model at any parent location ID in a model hierarchy.

Var Class

class cascade_at.model.var.Var(ages, times, column_name='mean')[source]

A Var is a function of age and time, defined by values on a grid. It linearly interpolates over values defined at grid points in a rectangular grid of age and time.

This is a single age-time grid. It is usually found in cascade.model.DismodGroups object which is a set of age-time grids. The following are DismodGroups containing cascade.model.Var: the fit, initial guess, truth var, and scale var.

Parameters
  • ages (List[float]) – Points along the age axis.

  • times (List[float]) – Points in time.

  • column_name (str) – A var has an internal Pandas DataFrame representation, and this column name can be mean or meas_value, depending on which Var is needed.

__setitem__(at_slice, value)[source]

To set a value on a Var instance, set it on ranges of age and time or at specific ages and times.

>>> var = Var([0, 10, 20], [2000])
>>> var[:, :] = 0.001
>>> var[5:50, 2000] = 0.01
>>> var[10, :] = 0.02
Parameters
  • at_slice (slice, slice) – What to change, as integer offset into ages and times.

  • value (float) – A float or integer.

__getitem__(age_and_time)[source]

Gets the value of a Var at a single point. The point has to be one of the ages and times defined when the var was created.

>>> var = Var([0, 50, 100], [1990, 2000, 2010])
>>> var[:, :] = 1e-4
>>> assert var[50, 2000] == 1e-4

Trying to read from an age and time not in the ages and times of the grid will result in a KeyError.

An easy way to set values is to use the age_time iterator, which loops through the ages and times in the underlying grid.

>>> for age, time in var.age_time():
>>>    var[age, time] = 0.01 * age
Parameters

age_and_time (age, time) – A two-dimensional index of age and time.

Returns

The value at this age and time.

Return type

float

set_mulstd(kind, value)[source]

Set the value of the multiplier on the standard deviation. Kind must be one of “value”, “dage”, or “dtime”. The value should be convertible to a float.

>>> var = Var([50], [2000, 2001, 2002])
>>> var.set_mulstd("value", 0.4)
get_mulstd(kind)[source]

Get the value of a standard deviation multiplier for a Var.

>>> var = Var([50], [2000, 2001, 2002])
>>> var.set_mulstd("value", 0.4)
>>> assert var.get_mulstd("value") == 0.4

If the standard deviation multiplier wasn’t set, then this will return a nan.

>>> assert np.isnan(var.get_mulstd("dage"))
__call__(age, time)[source]

A Var is a function of age and time, and this is how to call it.

>>> var = Var([0, 100], [1990, 2000])
>>> var[0, 1990] = 0
>>> var[0, 2000] = 1
>>> var[100, 1990] = 2
>>> var[100, 2000] = 3
>>> for a, t in var.age_time():
>>>     print(f"At corner ({a}, {t}), {var(a, t)}")
>>> for a, t in [[53, 1997], [-5, 2000], [120, 2000], [0, 1900], [0, 2010]]:
>>>     print(f"Anywhere ({a}, {t}), {var(a, t)}")
At corner (0.0, 1990.0), 0.0
At corner (0.0, 2000.0), 1.0
At corner (100.0, 1990.0), 2.0
At corner (100.0, 2000.0), 3.0
Anywhere (53, 1997), 1.76
Anywhere (-5, 2000), 1.0
Anywhere (120, 2000), 3.0
Anywhere (0, 1900), 0.0
Anywhere (0, 2010), 1.0

The grid points in a Var represent a continuous function, determined by bivariate interpolation. All points outside the grid are equal to the nearest point inside the grid.

Age Time Grid

class cascade_at.model.age_time_grid.AgeTimeGrid(ages, times, columns)[source]

Bases: object

The AgeTime grid holds rows of a table at each age and time value.

At each age and time point is a DataFrame consisting of the columns given in the constructor. So getting an item returns a dataframe with those columns. Setting a DataFrame sets those columns. Each AgeTimeGrid has three possible mulstds, for value, dage, dtime.

>>> atg = AgeTimeGrid([0, 10, 20], [1990, 2000, 2010], ["height", "weight"])
>>> atg[:, :] = [6.1, 195]
>>> atg[:, :].height = [5.9]
>>> atg[10, 2000] = [5.7, 180]
>>> atg[5:17, 1980:1990].weight = 125
>>> assert (atg[20, 2000].weight == 195).all()
>>> assert isinstance(atg[0, 1990], pd.DataFrame)

If the column has the same name as a function (mean), then access it with getitem,

>>> atg[:, :]["mean"] = [5.9]

Why is this in Pandas, when it’s a regular array of data with an index, which makes it better suited to XArray, or even a Numpy array? It needs to interface with a database representation, and Pandas is a better match there.

property mulstd
age_time()[source]
variable_count()[source]

DismodGroups Class

class cascade.model.DismodGroups

A DismodGroups contains Var instances or contains SmoothGrid instances. It gives them the shape of the whole model, so it expresses what rates are nonzero, what random effects are defined, and on which of these there are covariate multipliers.

The DismodGroups structure will appear in lots of places. The fit returned by Dismod will be a DismodGroups containing Var objects. The Model itself is a DismodGroups containing SmoothGrid objects.

A classic use of this is to create a new DismodGroups of Var. The first loop is over the rate, random effect, and covariate group names. The inner loop is over particular sets of keys, which are composed of tuples of the primary rate, covariate name, and location IDs.

var_groups = DismodGroups()
for group_name, group in var_ids.items():
    for key, var_id_mapping in group.items():
        var_groups[group_name][key] = var_builder(table, var_id_mapping)
rate[primary_rate]

This is a dictionary of rates. They are always one of the five underlying rates: iota, chi, omega, rho, pini:

dg = DismodGroups()
dg.rate["iota"] = Var([0, 1, 50], [2000])
random_effect[(primary_rate, child_location)]

A dictionary of random effects on the rates, so the keys are a rate and the ID of the child for which this is a random effect. When constructing a Model, we typically want to make one SmoothGrid of priors for all child random effects on a particular rate. In that case, specify the child ID as None:

model = Model()  # A Model is a DismodGroups object, too.
model.random_effect[("iota", None)] = SmoothGrid([0, 100], [1990])

scale = DismodGroups()
scale.random_effect[("omega", 2)] = Var([0, 100], [1990, 2000])
scale.random_effect[("omega", 3)] = Var([0, 100], [1990, 2000])
alpha[(covariate_name, rate_name)]

Alpha are covariate multipliers on the rates. The key is the name of the Covariate, which should match the name in the class given as an argument to the Session object. The rate name is one of the five underlying rates.

beta[(covariate_name, integrand_name)]

Beta are covariate multipliers on the measured value of the integrands. The integrand name is one of the canonical values.

gamma[(covariate_name, integrand_name)]

Gamma are covariate multipliers on the measured standard deviation of the integrands.

Priors

These are classes for the priors.

class cascade_at.model.priors._Prior

All priors have these methods.

parameters()

Returns a dictionary of all parameters for this prior, including the prior type as “density”.

assign(parameter=value, parameter=value...)

Creates a new Prior object with the same parameters as this Prior, except for the requested changes.

class cascade_at.model.priors.Uniform(lower, upper, mean=None, eta=None, name=None)[source]
Parameters
  • lower (float) – Lower bound

  • upper (float) – Upper bound

  • mean (float) – Not meaningful for a uniform distribution, but used to seed the solver.

  • eta (float) – Used for logarithmic distributions.

  • name (str) – A name in case this is a pet prior.

mle(draws)[source]

Using draws, assign a new mean, guaranteed between lower and upper.

Parameters

draws (np.ndarray) – 1D array of floats.

Returns

A new distribution with the mean set to the mean of draws.

Return type

Uniform

class cascade_at.model.priors.Constant(mean, name=None)[source]
Parameters
  • mean (float) – The const value.

  • name (str) – A name for this prior, e.g. Susan.

mle(_=None)[source]

Don’t change the const value. It is unaffected by this call.

class cascade_at.model.priors.Gaussian(mean, standard_deviation, lower=- inf, upper=inf, eta=None, name=None)[source]

A Gaussian is

\[f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}\]

where \(\sigma\) is the standard deviation and \(\mu\) the mean.

Parameters
  • mean (float) – This is \(\mu\).

  • standard_deviation (float) – This is \(\sigma\).

  • lower (float) – lower limit.

  • upper (float) – upper limit.

  • eta (float) – Offset for calculating standard deviation.

  • name (str) – Name for this prior.

mle(draws)[source]

Assign new mean and stdev, with mean clamped between upper and lower.

Parameters

draws (np.ndarray) – A 1D array of floats.

Returns

With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.

Return type

Gaussian
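A small sketch of mle in use; the prior parameters and the synthetic draws are illustrative:

import numpy as np

from cascade_at.model.priors import Gaussian

prior = Gaussian(mean=0.01, standard_deviation=0.1, lower=0.0, upper=1.0)
# Synthetic draws standing in for posterior samples of a rate.
draws = np.random.default_rng(0).lognormal(mean=-4.0, sigma=0.5, size=1000)
refit = prior.mle(draws)  # a new Gaussian; its mean is clamped to [lower, upper]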

class cascade_at.model.priors.Laplace(mean, standard_deviation, lower=- inf, upper=inf, eta=None, name=None)[source]

This version of the Laplace distribution is parametrized by its variance instead of by scaling of the axis. Usually, the Laplace distribution is

\[f(x) = \frac{1}{2b}e^{-|x-\mu|/b}\]

where \(\mu\) is the mean and \(b\) is the scale, but the variance is \(\sigma^2=2b^2\), so the Dismod-AT version looks like

\[f(x) = \frac{1}{\sqrt{2\sigma^2}}e^{-\sqrt{2}|x-\mu|/\sigma}.\]

The standard deviation assigned is \(\sigma\).

mle(draws)[source]

Assign new mean and stdev, with mean clamped between upper and lower.

Parameters

draws (np.ndarray) – A 1D array of floats.

Returns

With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.

Return type

Gaussian

class cascade_at.model.priors.StudentsT(mean, standard_deviation, nu, lower=- inf, upper=inf, eta=None, name=None)[source]

This Students-t must have \(\nu>2\). The Students-t distribution is usually

\[f(x,\nu) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}(1+x^2/\nu)^{-(\nu+1)/2}\]

with mean 0 for \(\nu>1\). The variance is \(\nu/(\nu-2)\) for \(\nu>2\). Dismod-AT rewrites this using \(\sigma^2=\nu/(\nu-2)\) to get

\[f(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)} \left(1 + (x-\mu)^2/(\sigma^2(\nu-2))\right)^{-(\nu+1)/2}\]
mle(draws)[source]

Assign new mean and stdev, with mean clamped between upper and lower.

Parameters

draws (np.ndarray) – A 1D array of floats.

Returns

With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.

Return type

Gaussian

class cascade_at.model.priors.LogGaussian(mean, standard_deviation, eta, lower=- inf, upper=inf, name=None)[source]

Dismod-AT parametrizes the Log-Gaussian with the standard deviation as

\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\log((x-\mu)/\sigma)^2/2}\]
mle(draws)[source]

Assign new mean and stdev, with mean clamped between upper and lower. This does a fit using a normal distribution.

Parameters

draws (np.ndarray) – A 1D array of floats.

Returns

With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.

Return type

Gaussian

class cascade_at.model.priors.LogLaplace(mean, standard_deviation, eta, lower=- inf, upper=inf, name=None)[source]
class cascade_at.model.priors.LogStudentsT(mean, standard_deviation, nu, eta, lower=- inf, upper=inf, name=None)[source]
mle(draws)[source]

Assign new mean and stdev, with mean clamped between upper and lower. This does a fit using a normal distribution.

Parameters

draws (np.ndarray) – A 1D array of floats.

Returns

With mean and stdev set, where mean is between upper and lower, by force. Upper and lower are unchanged.

Return type

Gaussian

Covariate Multipliers

See more information about how covariate settings and data are pulled in Covariates.

class cascade_at.model.covariate.Covariate(column_name, reference=None, max_difference=None)[source]

Bases: object

Establishes a reference value for a covariate column on input data and in output data. It is possible to create a covariate column with nothing but a name, but it must have a reference value before it can be used in a model.

Parameters
  • column_name (str) – Name of the column in the input data.

  • reference (Optional[float]) – Reference where covariate has no effect.

  • max_difference (Optional[float]) – If a data point’s covariate is farther than max_difference from the reference value, then this data point is excluded from the calculation. Must be greater than or equal to zero.

property name
property reference
property max_difference
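Following the sex-covariate policy described earlier (reference -0.5 or 0.5, max difference 0.75), a covariate might be constructed like this; the choice of reference here is only illustrative:

from cascade_at.model.covariate import Covariate

# Keep female (-0.5) and "both" (0.0) data; exclude male (0.5) data.
sex = Covariate(column_name='s_sex', reference=-0.5, max_difference=0.75)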

SmoothGrid Class

A SmoothGrid represents model priors (as opposed to data priors) in a Dismod-AT model. A Model is a bunch of SmoothGrids, one for each rate, random effect, and covariate multiplier.

For instance, in order to set priors on underlying incidence rate, iota, create a SmoothGrid, set its priors, and add it to the Model:

smooth = SmoothGrid([0, 5, 10, 50, 100], [1990, 2015])
smooth.value[:, :] = Uniform(mean=0.01, lower=1e-6, upper=5)
smooth.dage[:, :] = Gaussian(mean=0, standard_deviation=10)
smooth.dtime[:, :] = Gaussian(mean=0, standard_deviation=0.1)

All of the priors in a SmoothGrid need to be defined. There is a value prior at each age and time, but the prior for difference in age and time are forward differences, so there is no prior for the largest age point and largest time point. That means you’ll notice examples with no dtime priors when the underlying grid is defined for only one year.

If you want more control over exact priors, iterate over them. The age_time_diff iterator returns the age and time at the age points but also the difference in age and time to the next age point:

for age, time, age_diff, time_diff in smooth.age_time_diff():
    if not isinf(age_diff):
        smooth.dage[age, time] = \
            Gaussian(mean=0, standard_deviation=1 + 5 * age_diff)

This would change the standard deviation as the age interval changes, which could be helpful when age intervals change greatly. The check for isinf catches the last age difference, which returns the value inf because there is no next age point.

It is also possible to see what priors are set. This gets the prior at each age and time. Then it sets a new value for the prior with twice-as-large a standard deviation but the same density:

for age, time in smooth.age_time():
    prior = smooth.value[age, time]
    print(f"prior mean {prior.mean} std {prior.standard_deviation}")
    smooth.value[age, time] = prior.assign(standard_deviation=2 * prior.standard_deviation)
class cascade_at.model.smooth_grid.SmoothGrid(ages, times)[source]

The Smooth Grid is a set of priors on an age-time grid.

Parameters
  • ages

  • times

Smooth Grid Priors

For the Rate tab, Random Effect tab, Study tab and Country Covariate tab, the interface sets priors. This describes how those settings are interpreted. Most of this work happens in the function cascade.executor.construct_model.smooth_grid_from_smoothing_form(), and you can check its source there.

The default value, dage, and dtime priors are used to initialize those parts of the smooth grid. For smooth grids with only one age, the dage priors aren’t meaningful, and the same is true for dtime priors when there is only one year.

After that, the detailed priors are applied in the order they appear in the settings, and note that the order may or may not reflect the order in the user interface. There are three ways to specify which age and time points each detailed prior applies to:

  • age_lower and age_upper - A missing value here (one that’s not filled-in in the UI) is treated as -infinity or infinity, respectively.

  • time_lower and time_upper - Missing values similarly set to include all points on that side.

  • born_lower and born_upper - Each line for the born limit corresponds to \(a \le t - b\) or \(a \ge t - b\), respectively.

For each prior, all three of these sieves are applied to the grid of ages and times defined by the age values and time values for that smooth grid. If a detailed prior doesn’t match any of the age and time points in this grid, there will be a statement in the log that says “No ages and times match prior with extents <lower and upper extents>.”

Model Class

The Model holds all of the SmoothGrids that define priors on rates, random effects, and covariates. It also has a few other properties necessary to define a complete model.

  • Which of the rates are nonzero. This is a list of, for instance, [“iota”, “omega”, “chi”].

  • The parent location as an integer ID. These correspond to the IDs supplied to the Dismod-AT session.

  • A list of child locations. Not children and grandchildren, but the direct child locations as integer IDs.

  • A list of covariates, supplied as Covariate objects.

  • Weight functions, that are used to compute integrands. Each weight function is a Var.

  • A scaling function, which sets the scale for every model variable. If this isn’t set, it will be calculated by Dismod-AT from the mean of value priors. It is used to ensure different terms in the likelihood have similar importance.

class cascade_at.model.model.Model(nonzero_rates, parent_location, child_location=None, covariates=None, weights=None)[source]
>>> from cascade_at.inputs.locations import LocationDAG
>>> locations = LocationDAG(location_set_version_id=429)
>>> m = Model(["chi", "omega", "iota"], 6, locations.dag.successors(6))
Parameters
  • nonzero_rates (List[str]) – A list of rates, using the Dismod-AT terms for the rates, so they are “iota”, “chi”, “omega”, “rho”, and “pini”.

  • parent_location (int) – The location ID for the parent.

  • child_location (Optional[List[int]]) – List of the children.

  • covariates (Optional[List[Covariate]]) – A list of covariate objects. This supplies the reference values and max differences, used to exclude data by covariate value.

  • weights (Optional[Dict[str, Var]]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.

Grid Alchemy

In order to build two-level models with the settings from EpiViz-AT but at different parent locations, extracting the correct information from the measurement inputs, we use a wrapper around all of the modeling components with a method called construct_two_level_model.

This alchemy object is one of the three things that is read in each time we grab a Context object.

class cascade_at.model.grid_alchemy.Alchemy(settings)[source]

Bases: object

An object initialized with model settings from cascade.settings.configuration.Configuration that can be used to construct parent-child location-specific models with the construct_two_level_model() method.

Examples

>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> settings = load_settings(BASE_CASE)
>>> mc = Alchemy(settings)
>>> i = MeasurementInputsFromSettings(settings)
>>> i.get_raw_inputs()
>>> mc.construct_two_level_model(location_dag=i.location_dag,
>>>                              parent_location_id=102,
>>>                              covariate_specs=i.covariate_specs)
construct_age_time_grid()[source]

Construct a DEFAULT age-time grid, to be updated when we initialize the model.

Return type

Dict[str, ndarray]

construct_single_age_time_grid()[source]

Construct a single age-time grid. Use this age and time when a smooth grid doesn’t depend on age and time.

Return type

Tuple[ndarray, ndarray]

get_smoothing_grid(rate)[source]

Construct a smoothing grid for any rate in the model.

Parameters

rate (Smoothing) – Some smoothing form for a rate.

Return type

SmoothGrid

Returns

The rate translated into a SmoothGrid based on the model settings’ default age and time grids.

get_all_rates_grids()[source]

Get a dictionary of all the rates and their grids in the model.

Return type

Dict[str, SmoothGrid]

static override_priors(rate_grid, update_dict: Dict[str, numpy.ndarray], new_prior_distribution='gaussian')[source]

Override priors for rates. This is used when we want to do posterior to prior, so we are overriding the global settings with location-specific settings based on parent posteriors.

Parameters
  • rate_grid (SmoothGrid) – SmoothGrid object for a rate

  • update_dict – Dictionary with ages and times vectors and draws for values, dage, and dtime to use in overriding the prior.

  • new_prior_distribution (Optional[str]) – The new prior distribution to override the existing priors.

static apply_min_cv_to_prior_grid(prior_grid, min_cv, min_std=1e-10)[source]

Applies the minimum coefficient of variation to a _PriorGrid to enforce that minCV across all variables in the grid. Updates the _PriorGrid in place.

Return type

None

construct_two_level_model(location_dag, parent_location_id, covariate_specs, weights=None, omega_df=None, update_prior=None, min_cv=None, update_mulcov_prior=None)[source]

Construct a Model object for a parent location and its children.

Parameters
  • location_dag (LocationDAG) – Location DAG specifying the location hierarchy

  • parent_location_id (int) – Parent location to build the model for

  • covariate_specs (CovariateSpecs) – covariate specifications, specifically will use covariate_specs.covariate_multipliers

  • weights (Optional[Dict[str, Var]]) –

  • omega_df (Optional[DataFrame]) – data frame with omega values in it (other cause mortality)

  • update_prior (Optional[Dict[str, Dict[str, ndarray]]]) – dictionary of dictionaries for prior updates to rates

  • update_mulcov_prior (Optional[Dict[Tuple[str, str, str], _Prior]]) – dictionary of mulcov prior updates

  • min_cv (Optional[Dict[str, Dict[str, float]]]) – dictionary (can be defaultdict) for minimum coefficient of variation keyed by cascade level, then by rate

Dismod Database API

This module describes the interface for reading and writing from dismod databases. Dismod-AT works on SQLite databases, and we need a user-friendly way to write data to and extract data from these databases. It is also important to make sure we have all of the correct columns and column types.

The input tables and column types are explained here, and the output tables and column types are explained here.

We mimic that table metadata here, and then build an interface on top of it for easy reading and writing.

Interface

The base interface is DismodSQLite, and the input and output class has getters and setters for each of the tables (DismodIO, not documented here).

To use a DismodIO(DismodSQLite) interface, you can do

import pandas as pd

from cascade_at.dismod.api.dismod_io import DismodIO

file = 'my_database.db'
db = DismodIO(file)

# Tables are stored as attributes, e.g.
db.data
db.age
db.time
db.prior

# Tables can be set with
db.data = pd.DataFrame(...)
class cascade_at.dismod.api.dismod_sqlite.DismodSQLite(path)[source]

Bases: object

Initiates an SQLite reader from the path.

Parameters

path (Union[str, Path]) – A string or Path pointing to the DisMod database file.

create_tables(tables=None)[source]

Make all of the tables in the metadata.

update_table_columns(table_name, table)[source]

Updates the table columns with additional columns like “c_” which are comments and “x_” which are covariates.

read_table(table_name)[source]

Read a table from the database using the specified engine.

write_table(table_name, table)[source]

Writes a table to the database using the specified engine.

Parameters
  • table_name (str) – the name of the table to write to

  • table (pd.DataFrame) – data frame to write

empty_table(table_name, extra_columns=None)[source]

Initializes an empty table for table_name.
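A minimal sketch of the lower-level interface; the database path is a placeholder, and the 'age' table name follows the Dismod-AT schema:

from cascade_at.dismod.api.dismod_sqlite import DismodSQLite

db = DismodSQLite('my_database.db')
db.create_tables()             # build every table defined in the metadata
age = db.read_table('age')     # read one table as a pandas DataFrame
db.write_table('age', age)     # write a DataFrame back to that table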

Run Dismod Commands

To run dismod commands on a database (all possible options are here), you can use the following helper functions. They will figure out where your dmdismod executable is, whether it is installed on your computer or pulled from Docker, based on the installation of cascade_at_scripts.

cascade_at.dismod.api.run_dismod.run_dismod(dm_file, command)[source]

Executes a command on a dismod file.

Parameters
  • dm_file (str) – the dismod db filepath

  • command (str) – a command to run

cascade_at.dismod.api.run_dismod.run_dismod_commands(dm_file, commands, sys_exit=True)[source]

Runs multiple commands on a dismod file and returns the exit statuses. Will raise an exception if it runs into an error.

Parameters
  • dm_file (str) – the dismod db filepath

  • commands (List[str]) – a list of strings

  • sys_exit – whether to exit the code altogether if there is an error. If False, then it will pass the error string back to the original python process.
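For example, a typical init-then-fit sequence on a filled database might look like this; the database path is a placeholder, and 'init' and 'fit fixed' are standard Dismod-AT commands:

from cascade_at.dismod.api.run_dismod import run_dismod_commands

run_dismod_commands(
    dm_file='temp.db',
    commands=['init', 'fit fixed'],
    sys_exit=False,  # hand any error string back instead of exiting
)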

Fill and Extract Helpers

In order to fill data into the dismod databases in a meaningful way for the cascade, we have two classes that are subclasses of DismodIO and provide easy functionality for filling tables based on a model version’s settings.

Dismod Filler
class cascade_at.dismod.api.dismod_filler.DismodFiller(path, settings_configuration, measurement_inputs, grid_alchemy, parent_location_id, sex_id, child_prior=None, mulcov_prior=None)[source]

Bases: cascade_at.dismod.api.dismod_io.DismodIO

Sits on top of the DismodIO class; takes everything from the collector module and puts it into the Dismod database tables in the correct construction.

Dismod Filler wraps a dismod database and fills all of the tables using the measurement inputs object, settings, and the grid alchemy constructor.

It optionally includes rate priors and covariate multiplier priors.

Parameters
  • path (Union[str, Path]) – the path of the dismod database

  • settings_configuration (SettingsConfig) – the settings configuration object

  • measurement_inputs (MeasurementInputs) – the measurement inputs object

  • grid_alchemy (Alchemy) – the grid alchemy object

  • parent_location_id (int) – the parent location ID for this database

  • sex_id (int) – the reference sex for this database

  • child_prior (Optional[Dict[str, Dict[str, ndarray]]]) – a dictionary of child rate priors to use. The first level of the dictionary is the rate name, and the second is the type of prior: value, dage, or dtime.

self.parent_child_model

Model that was constructed from grid_alchemy parameters for one specific parent and its descendants

Examples

>>> from pathlib import Path
>>> from cascade_at.model.grid_alchemy import Alchemy
>>> from cascade_at.inputs.measurement_inputs import MeasurementInputsFromSettings
>>> from cascade_at.settings.base_case import BASE_CASE
>>> from cascade_at.settings.settings import load_settings
>>> settings = load_settings(BASE_CASE)
>>> inputs = MeasurementInputsFromSettings(settings)
>>> inputs.demographics.location_id = [102, 555] # subset the locations to make it go faster
>>> inputs.get_raw_inputs()
>>> inputs.configure_inputs_for_dismod(settings)
>>> alchemy = Alchemy(settings)
>>> da = DismodFiller(path=Path('temp.db'),
...                   settings_configuration=settings,
...                   measurement_inputs=inputs,
...                   grid_alchemy=alchemy,
...                   parent_location_id=1,
...                   sex_id=3)
>>> da.fill_for_parent_child()
get_omega_df()[source]

Get the correct omega data frame for this two-level model.

Return type

DataFrame

get_parent_child_model()[source]

Construct a two-level model that corresponds to this parent location ID and its children.

Return type

Model

calculate_reference_covariates()[source]

Calculates reference covariate values based on the input object and the parent/sex we have in the two-level model. Modifies the baseline covariate specs object.

Return type

CovariateSpecs

fill_for_parent_child(**options)[source]

Fills the Dismod database with inputs and a model construction for a parent location and its descendants.

Pass in optional keyword arguments to fill the option table with additional entries or to override the defaults.

Return type

None

node_id_from_location_id(location_id)[source]

Get the node ID from a location ID in an already created node table.

Return type

int

fill_reference_tables()[source]

Fills all of the reference tables including density, node, covariate, age, and time.

fill_data_tables()[source]

Fills the data tables including data and avgint.

fill_grid_tables()[source]

Fills the grid-like tables including weight, rate, smooth, smooth_grid, prior, integrand, mulcov, nslist, nslist_pair.

construct_option_table(**kwargs)[source]

Constructs the option table with the default arguments; if needed, pass in kwargs to add new options or override old ones.

Return type

DataFrame
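Continuing the example above, a hedged sketch of overriding an option while filling (ode_step_size is a Dismod-AT option name; the value is illustrative):

>>> da.fill_for_parent_child(ode_step_size=1.0)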

Dismod Extractor
class cascade_at.dismod.api.dismod_extractor.DismodExtractor(path)[source]

Bases: cascade_at.dismod.api.dismod_io.DismodIO

Sits on top of the DismodIO class, and extracts helpful data frames from the dismod database tables.

Parameters

path (str) – The database filepath

get_predictions(locations=None, sexes=None, samples=False, predictions=None)[source]

Gets the predictions from the predict table for the given locations and sexes. Returns a 'mean' column when samples is False, otherwise 'draw' columns, which can then be reshaped wide if necessary.

Return type

DataFrame

gather_draws_for_prior_grid(location_id, sex_id, rates, value=True, dage=False, dtime=False, samples=True)[source]

Takes draws and formats them as a prior grid for value, dage, and dtime. Assumes that age_lower == age_upper and time_lower == time_upper for all data rows. If you do not need all of value, dage, and dtime, pass False for the ones to skip.

Parameters
  • location_id (int) –

  • sex_id (int) –

  • rates (List[str]) – list of rates to get the draws for

  • value (bool) – whether to calculate value priors

  • dage (bool) – whether to calculate dage priors

  • dtime (bool) – whether to calculate dtime priors

  • samples (bool) – whether the prior came from samples

Returns

Return type

Dictionary of 3-d arrays of value, dage, and dtime draws over age and time for this location and sex

format_predictions_for_ihme(gbd_round_id, locations=None, sexes=None, samples=False, predictions=None)[source]

Formats predictions from the prediction table and returns either the mean or draws, based on whether samples is True.

Parameters
  • locations (Optional[List[int]]) – A list of locations to extract from the predictions

  • sexes (Optional[List[int]]) – A list of sexes to extract from the predictions

  • gbd_round_id (int) – The GBD round ID to format the predictions for

  • samples (bool) – Whether or not the predictions have draws (samples) or whether it is just one fit.

  • predictions (Optional[DataFrame]) – An optional data frame with the predictions to use rather than reading them directly from the database.

Returns

Return type

Data frame with predictions formatted for the IHME databases.
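A hedged sketch of pulling predictions out of a fitted database (the path, location, sex, and GBD round are illustrative):

from cascade_at.dismod.api.dismod_extractor import DismodExtractor

ext = DismodExtractor(path="temp.db")
# One 'mean' column because samples=False.
preds = ext.get_predictions(locations=[102], sexes=[2], samples=False)
# Reshaped for the IHME databases.
ihme = ext.format_predictions_for_ihme(
    gbd_round_id=6, locations=[102], sexes=[2], samples=False,
)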

Table Creation

The DismodFiller uses the following table creation functions internally.

Formatting Reference Tables

The dismod database needs some standard reference tables. These are made with the following functions.

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_integrand_table(data_cv_from_settings=None, default_data_cv=0.0)[source]

Constructs the integrand table and adds data CV in the minimum_meas_cv column.

Parameters
  • data_cv_from_settings (Optional[Dict]) – key, value pairs mapping integrands to data CV

  • default_data_cv (float) – default value for data CV to use

Return type

DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.default_rate_table()[source]

Constructs the default rate table with rate names and ids.

Return type

DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_node_table(location_dag)[source]

Constructs the node table from a location DAG’s to_dataframe() method.

Parameters

location_dag (LocationDAG) – location hierarchy object

Return type

DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_covariate_table(covariates)[source]

Constructs the covariate table from a list of Covariate objects.

Return type

DataFrame

cascade_at.dismod.api.fill_extract_helpers.reference_tables.construct_density_table()[source]

Constructs the default density table.

Return type

DataFrame
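A minimal sketch of constructing these reference tables (the data CV value is illustrative):

from cascade_at.dismod.api.fill_extract_helpers.reference_tables import (
    construct_density_table,
    construct_integrand_table,
    default_rate_table,
)

integrand = construct_integrand_table(default_data_cv=0.2)  # fills minimum_meas_cv
rate = default_rate_table()
density = construct_density_table()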

Formatting Dismod Data Tables

There are helper functions to create the data tables. They are broken into small functions to make unit testing easier.

cascade_at.dismod.api.fill_extract_helpers.data_tables.prep_data_avgint(df, node_df, covariate_df)[source]

Preps both the data table and the avgint table by mapping locations to nodes and covariates to names.

Both are handled by the same function because the processing is identical, but data and avgint need to be prepped separately because dismod requires different columns for each.

Parameters
  • df (DataFrame) – The data frame to map

  • node_df (DataFrame) – The node table from dismod db

  • covariate_df (DataFrame) – The covariate table from dismod db

cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_data_table(df, node_df, covariate_df, ages, times)[source]

Constructs the data table from input df.

Parameters
  • df (DataFrame) – data frame of inputs that have been prepped for dismod

  • node_df (DataFrame) – the dismod node table

  • covariate_df (DataFrame) – the dismod covariate table

  • ages (ndarray) –

  • times (ndarray) –

cascade_at.dismod.api.fill_extract_helpers.data_tables.construct_gbd_avgint_table(df, node_df, covariate_df, integrand_df, ages, times)[source]

Constructs the avgint table using the output df from the inputs.to_avgint() method.

Parameters
  • df (DataFrame) – The data frame to construct the avgint table from, that has things like ages, times, nodes (locations), sexes, etc.

  • node_df (DataFrame) – dismod node data frame

  • covariate_df (DataFrame) – dismod covariate data frame

  • integrand_df (DataFrame) – dismod integrand data frame

  • ages (ndarray) – array of ages for the model

  • times (ndarray) – array of times for the model

Return type

DataFrame
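A hedged sketch of the flow through these helpers, where measurements is a hypothetical input data frame already prepped for dismod and db is a hypothetical DismodIO-like database whose node and covariate tables are filled; it also assumes prep_data_avgint returns the mapped data frame:

import numpy as np
from cascade_at.dismod.api.fill_extract_helpers.data_tables import (
    construct_data_table,
    prep_data_avgint,
)

# Map locations to nodes and covariates to names (hypothetical inputs).
prepped = prep_data_avgint(df=measurements, node_df=db.node, covariate_df=db.covariate)
data = construct_data_table(
    df=prepped,
    node_df=db.node,
    covariate_df=db.covariate,
    ages=np.array([0.0, 50.0, 100.0]),
    times=np.array([1990.0, 2020.0]),
)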

Formatting Grid Tables

There are helper functions to create grid tables in the dismod database. These are things like WeightGrid and SmoothGrid.

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_model_tables(model, location_df, age_df, time_df, covariate_df)[source]

Main function that loops through the items from a model object (rate, random_effect, alpha, beta, and gamma) and constructs the modeling tables in dismod db.

Each of these is a "grid" variable, so it needs entries in the prior, smooth, and smooth_grid tables. This function returns those tables.

It also constructs the rate, integrand, and mulcov tables (alpha, beta, gamma), plus the nslist and nslist_pair tables.

Parameters
  • model (Model) – A model object that has rate information

  • location_df (DataFrame) – A location / node data frame

  • age_df (DataFrame) – An age data frame for dismod

  • time_df (DataFrame) – A time data frame for dismod

  • covariate_df (DataFrame) – A covariate data frame for dismod

Returns

rate, prior, smooth, smooth_grid, mulcov, nslist, nslist_pair, and subgroup tables

Return type

A dictionary of data frames keyed by table name

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_weight_grid_tables(weights, age_df, time_df)[source]

Constructs the weight and weight_grid tables.

Parameters
  • weights (Dict[str, Var]) – There are four kinds of weights: “constant”, “susceptible”, “with_condition”, and “total”. No other weights are used.

  • age_df – Age data frame from dismod db

  • time_df – Time data frame from dismod db

Returns

Return type

Tuple of the weight table and the weight grid table

cascade_at.dismod.api.fill_extract_helpers.grid_tables.construct_subgroup_table()[source]

Constructs the default subgroup table. If we want to make real use of the subgroup table, that support still needs to be built.

Return type

DataFrame

Helper Functions
Posterior to Prior

"Posterior to prior" means taking the fit from a parent database and using the rate posteriors as the priors for the child fits. This happens in DismodFiller when it builds the two-level model with Alchemy, because it replaces the default priors with the ones passed in.

The posterior is passed down by predicting the parent model on the rate grid for the children. To construct the rate grid, we use the following function:

cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.get_prior_avgint_grid(grids, sexes, locations, midpoint=False)[source]

Get a data frame to use for setting up posterior predictions on a grid. The grids are specified in the grids parameter.

Covariates will still need to be added to it, and it must be run through prep_data_avgint (documented above) to convert nodes and covariate names before it can be used as input to the avgint table in a database.

Parameters
  • grids (Dict[str, Dict[str, ndarray]]) – A dictionary of grids with keys for each integrand, which are dictionaries for “age” and “time”.

  • sexes (List[int]) – A list of sexes

  • locations (List[int]) – A list of locations

  • midpoint (bool) – Whether to midpoint the grid lower and upper values (recommended for rates).

Returns

Data frame with columns "avgint_id", "integrand_id", "location_id", "weight_id", "subgroup_id", "age_lower", "age_upper", "time_lower", "time_upper", and "sex_id"
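A minimal sketch of building that grid (the ages, times, and IDs are illustrative):

import numpy as np
from cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior import (
    get_prior_avgint_grid,
)

grids = {
    "Sincidence": {
        "age": np.array([0.0, 5.0, 10.0]),
        "time": np.array([1990.0, 2000.0, 2010.0]),
    },
}
avgint = get_prior_avgint_grid(grids=grids, sexes=[1, 2], locations=[102], midpoint=True)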

To upload those priors from the rate grid to the IHME databases, which require standard GBD ages and times, we use the following function. This is for visualization purposes only:

cascade_at.dismod.api.fill_extract_helpers.posterior_to_prior.format_rate_grid_for_ihme(rates, gbd_round_id, location_id, sex_id)[source]

Formats a grid of mean, upper, and lower for a prior rate for the IHME database. Only does this for Gaussian priors.

Parameters
  • rates (Dict[str, SmoothGrid]) – A dictionary of SmoothGrids, keyed by primary rates like “iota”

  • gbd_round_id (int) – the GBD round

  • location_id (int) – the location ID to append to this data frame

  • sex_id (int) – the sex ID to append to this data frame

Returns

Return type

A data frame formatted for the IHME databases

Multithreading

When we want to do multithreading on a dismod database, we can define some process that works, for example, on only a subset of a database's data or samples. To support this, there is a base class here that is subclassed for sample and predict, since those tasks can be done in parallel on one database.

class cascade_at.dismod.api.multithreading._DismodThread(main_db, index_file_pattern)[source]

Splits a dismod database into multiple databases to run parallel processes on the database. The work happens when you call an instantiated _DismodThread.

cascade_at.dismod.api.multithreading.dmdismod_in_parallel(dm_thread, sims, n_pool)[source]

Run a dismod thread in parallel by constructing a multiprocessing pool. A dismod thread is anything derived from _DismodThread, so it has a __call__ method with an overridden _process method.
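A hedged sketch; fit_thread stands in for a hypothetical _DismodThread subclass instance that processes one simulation index per call:

from cascade_at.dismod.api.multithreading import dmdismod_in_parallel

# Fan 10 simulation indices out across 4 worker processes.
dmdismod_in_parallel(dm_thread=fit_thread, sims=list(range(10)), n_pool=4)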

Constants

Dismod-AT makes assumptions about the order of variables. In some cases, it has relaxed those assumptions over time, but we retain these as conventions.

class cascade_at.dismod.constants.RateEnum(value)[source]

These are the five underlying rates.

pini = 0

Initial prevalence of the condition at birth, as a fraction of one.

iota = 1

Incidence rate for leaving susceptible to become diseased.

rho = 2

Remission from disease to susceptible.

chi = 3

Excess mortality rate.

omega = 4

Other-cause mortality rate.

class cascade_at.dismod.constants.IntegrandEnum(value)[source]

These are all of the integrands Dismod-AT supports, and they will have exactly these IDs when serialized.

Sincidence = 0

Susceptible incidence, where the denominator is the number of susceptibles. Corresponds to iota.

remission = 1

Remission rate, corresponds to rho.

mtexcess = 2

Excess mortality rate, corresponds to chi.

mtother = 3

Other-cause mortality, corresponds to omega.

mtwith = 4

Mortality rate for those with condition.

susceptible = 5

Fraction of susceptibles out of total population.

withC = 6

Fraction of population with the disease. Total pop is the denominator.

prevalence = 7

Fraction of those alive with the disease, so S+C is denominator.

Tincidence = 8

Total-incidence, where denominator is susceptibles and with-condition.

mtspecific = 9

Cause-specific mortality rate, so mx_c.

mtall = 10

All-cause mortality rate, mx.

mtstandard = 11

Standardized mortality ratio.

relrisk = 12

Relative risk.

incidence = -99

This integrand should never be used, but we need it when we are initially converting from the epi database measures.

class cascade_at.dismod.constants.DensityEnum(value)[source]

The distributions supported by Dismod-AT. They always have these ids.

uniform = 0

Uniform Distribution

gaussian = 1

Gaussian Distribution

laplace = 2

Laplace Distribution

students = 3

Students-t Distribution

log_gaussian = 4

Log-Gaussian Distribution

log_laplace = 5

Log-Laplace Distribution

log_students = 6

Log-Students-t Distribution

class cascade_at.dismod.constants.WeightEnum(value)[source]

Dismod-AT allows arbitrary weights, which are functions of space and time, defined by bilinear interpolations on grids. These weights are used to average rates over age and time intervals. Given this problem, there are four kinds of weights that are relevant.

constant = 0

This weight is constant everywhere at 1. This is the no-weight weight.

susceptible = 1

For measures that are integrals over population without the condition.

with_condition = 2

For measures that are integrals over those with the disease.

total = 3

For measures where the denominator is the whole population.

constants.INTEGRAND_TO_WEIGHT = {'Sincidence': <WeightEnum.susceptible: 1>, 'Tincidence': <WeightEnum.total: 3>, 'mtall': <WeightEnum.total: 3>, 'mtexcess': <WeightEnum.with_condition: 2>, 'mtother': <WeightEnum.total: 3>, 'mtspecific': <WeightEnum.total: 3>, 'mtstandard': <WeightEnum.constant: 0>, 'mtwith': <WeightEnum.with_condition: 2>, 'prevalence': <WeightEnum.total: 3>, 'relrisk': <WeightEnum.constant: 0>, 'remission': <WeightEnum.with_condition: 2>, 'susceptible': <WeightEnum.constant: 0>, 'withC': <WeightEnum.constant: 0>}

Each integrand has a natural association with a particular weight because it is a count of events with one of four denominators: constant, susceptibles, with-condition, or the total population.
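For example, the enums and this mapping can be used to look up serialized IDs and natural weights:

from cascade_at.dismod.constants import INTEGRAND_TO_WEIGHT, IntegrandEnum, RateEnum

RateEnum.iota.value                # 1
IntegrandEnum.prevalence.value     # 7
INTEGRAND_TO_WEIGHT["prevalence"]  # WeightEnum.total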

Dismod Integrand Mappings

There is a fair amount of mapping that has to occur between GBD measures and Dismod-AT integrands and rates. Functions and dictionaries aid in this mapping, for example:

cascade_at.dismod.integrand_mappings.integrand_to_gbd_measures(df, integrand_col)[source]

Maps the integrand column to measure IDs and adds in filler measures where necessary (e.g. copies over Sincidence to incidence).

Parameters
  • df (DataFrame) – data frame with integrand

  • integrand_col (str) – column name for integrand column

Returns

Return type

data frame with integrands mapped to measures
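A minimal sketch with an illustrative data frame:

import pandas as pd
from cascade_at.dismod.integrand_mappings import integrand_to_gbd_measures

predictions = pd.DataFrame({
    "integrand": ["Sincidence", "prevalence"],
    "mean": [0.01, 0.10],
})
# Adds GBD measure IDs, copying Sincidence over to incidence where needed.
mapped = integrand_to_gbd_measures(df=predictions, integrand_col="integrand")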

See here for more details.

Core Functions

The core module has the following, miscellaneous components. Form Validation is the building blocks for the Settings Configuration Form coming from EpiViz. Shared Functions allows us to import internal shared functions into open source environments (like Travis CI).

Form Validation

The form classes are the building blocks for the Settings Configuration Form.

Fields

This module defines specializations of the general tools in abstract_form, mostly useful field types.

class cascade_at.core.form.fields.BoolField(*args, **kwargs)[source]
class cascade_at.core.form.fields.IntField(*args, **kwargs)[source]
class cascade_at.core.form.fields.FloatField(*args, **kwargs)[source]
class cascade_at.core.form.fields.StrField(*args, **kwargs)[source]
class cascade_at.core.form.fields.NativeListField(*args, **kwargs)[source]

Because we already have a ListField for space-separated strings which become lists, this field type should be used when the .json config returns a native Python list.

class cascade_at.core.form.fields.FormList(inner_form_constructor, *args, **kwargs)[source]

This represents a homogeneous list of forms. For example, it might be used to contain a list of priors within a smoothing grid.

Parameters
  • inner_form_constructor – A factory which produces an instance of a Form subclass. Most often it will just be the Form subclass itself.

class cascade_at.core.form.fields.Dummy(*args, **kwargs)[source]

A black hole which consumes all values without error. Use to mark sections of the configuration which have yet to be implemented and should be ignored.

validate_and_normalize(instance, root=None)[source]

Validates the data for this field on the given parent instance and transforms the data into its normalized form. The actual details of validating and transforming are delegated to subclasses, except for checking for missing data, which is handled here.

Parameters
  • instance (Form) – the instance of the form for which this field should be validated.

  • root (Form) – pointer back to the base of the form hierarchy.

Returns

a list of error messages with path strings showing where in this object they occurred. For most fields the path will always be empty.

Return type

[(str, str, str)]

class cascade_at.core.form.fields.OptionField(options, *args, constructor=<class 'str'>, **kwargs)[source]

A field which will only accept values from a predefined set.

Parameters
  • options (list) – The list of options to choose from

  • constructor – A function which takes a string and returns the expected type. Behaves as the constructor for SimpleTypeField. Defaults to str

class cascade_at.core.form.fields.ListField(*args, constructor=<class 'str'>, separator=' ', **kwargs)[source]

A field which takes a string containing values demarcated by some separator and transforms them into a homogeneous list of items of an expected type.

Parameters
  • constructor – A function which takes a string and returns the expected type. Behaves as the constructor for SimpleTypeField. Defaults to str

  • separator (str) – The string to split by. Defaults to a single space.

class cascade_at.core.form.fields.StringListField(*args, constructor=<class 'str'>, separator=' ', **kwargs)[source]
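A hedged sketch of how these field types compose inside a Form (the form and its field names are hypothetical):

from cascade_at.core.form.abstract_form import Form
from cascade_at.core.form.fields import IntField, ListField, OptionField

class ExampleForm(Form):
    count = IntField()
    method = OptionField(["fixed", "random"])           # only these values validate
    ages = ListField(constructor=float, separator=" ")  # "0 5 10" -> [0.0, 5.0, 10.0]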
Abstract Form

This module defines general tools for building validators for messy hierarchical parameter data. It provides a declarative API for creating form validators. It tries to follow conventions from form validation systems in the web application world since that is a very similar problem.

Example

Validators are defined as classes with attributes which correspond to the values they expect to receive. For example, consider this JSON blob:

{"field_a": "10", "field_b": "22.4", "nested": {"field_c": "Some Text"}}

A validator for that document would look like this:

class NestedValidator(Form):
    field_c = SimpleTypeField(str)

class BlobValidator(Form):
    field_a = SimpleTypeField(int)
    field_b = SimpleTypeField(float)
    nested = NestedValidator()

And could be used as follows:

>>> form = BlobValidator(json.loads(document))
>>> form.validate_and_normalize()
>>> form.field_a
10
>>> form.nested.field_c
"Some Text"
class cascade_at.core.form.abstract_form.NoValue[source]

Represents an unset value, which is distinct from None because None may actually appear in input data.

class cascade_at.core.form.abstract_form.FormComponent(nullable=False, default=None, display=None, validation_priority=100)[source]

Base class for all form components. It bundles up behavior shared by both (sub)Forms and Fields.

Note

FormComponent, Form and Field all make heavy use of the descriptor protocol (https://docs.python.org/3/howto/descriptor.html). That means that the relationship between objects and the data they operate on is more complex than usual. Read up on descriptors, if you aren’t familiar, and pay close attention to how __set__ and __get__ access data.

Parameters
  • nullable (bool) – If False then missing data for this node is considered an error. Defaults to False.

  • default – Default value to return if unset

  • display (str) – The name used in the EpiViz interface.

  • validation_priority (int) – Sort order for validation.

class cascade_at.core.form.abstract_form.Field(*args, **kwargs)[source]

A field within a form. Fields are responsible for validating the data they contain (without respect to data in other fields) and transforming it into a normalized form.

validate_and_normalize(instance, root=None)[source]

Validates the data for this field on the given parent instance and transforms the data into its normalized form. The actual details of validating and transforming are delegated to subclasses, except for checking for missing data, which is handled here.

Parameters
  • instance (Form) – the instance of the form for which this field should be validated.

  • root (Form) – pointer back to the base of the form hierarchy.

Returns

a list of error messages with path strings showing where in this object they occurred. For most fields the path will always be empty.

Return type

[(str, str, str)]

class cascade_at.core.form.abstract_form.SimpleTypeField(constructor, *args, **kwargs)[source]

A field which transforms input data using a constructor function and emits errors if that transformation fails.

In general this is used to convert to simple types like int or float. Because it emits only very simple messages it is not appropriate for cases where the cause of any error isn’t obvious from knowing the name of the constructor function and a string representation of the input value.

Parameters

constructor – a function which takes one argument and returns a normalized version of that argument. It must raise ValueError, TypeError or OverflowError if transformation is not possible.

class cascade_at.core.form.abstract_form.Form(source=None, name_field=None, nullable=False, display=None, **kwargs)[source]

The parent class of all forms.

Validation for forms happens in two stages. First all the form's fields and sub forms are validated. If none of those have errors, then the form is known to be in a consistent state and its _full_form_validation method is run to finalize validation. If any field or sub form is invalid, then this form's _full_form_validation method will not be run, because the form may be in an inconsistent state.

Simple forms will be valid if all their fields are valid but more complex forms will require additional checks across multiple fields which are handled by _full_form_validation.

Note

A nested form may be marked nullable. It is considered null if all of its children are null. If a nullable form is null, then it is not an error for non-nullable fields in it to be null. If any of the form's fields are non-null, then the whole form is considered non-null, at which point missing data for non-nullable fields becomes an error again.

Parameters
  • source (dict) – The input data to parse. If None, it can be supplied later by calling process_source

  • name_field (str) – If supplied, then a field of the same name must be present on the subclass. That field will always have the name of the attribute this class is assigned to in its parent rather than the value, if any, that the field had in the input data.

Shared Functions

The central computation team and the scientific computing team at IHME maintain several packages that Cascade-AT relies on. These are not open source, so we can't use them in Travis CI or when building the docs. To get around this, there is a class that wraps a module and will only use it if it is importable.

class cascade_at.core.db.ModuleProxy(module_name)[source]

This class acts like a module. It's meant to be imported into an __init__ module. It exists in order to actively turn off modules during testing, ensuring that tests which claim not to use database functions really don't use them, so that they also pass outside IHME.

Examples

>>> # db-queries and db-tools are IHME internal packages
>>>
>>> db_queries = ModuleProxy("db_queries")
>>> ezfuncs = ModuleProxy("db_tools.ezfuncs")
__init__(module_name)[source]

Initialize self. See help(type(self)) for accurate signature.

__dir__()[source]

Default dir() implementation.

Error-Handling Plans

The following is a proposal for how to handle errors. This is not implemented.

Exception-Handling Proposal

If we look at the layers of the code, we can handle errors in different ways between and within the layers.

  • EpiViz is one version of the top of this chain.

  • At the top, catch all exceptions and return as strings to EpiViz on initial call.

  • Within processing the settings, any settings that don’t make sense are returned as a list of errors, not through exception-handling.

  • Below this, assume we are working inside of a UGE job. Failure of one job does not kill all jobs, because people can often use whatever data they do get.

  • The UGE job catches all exceptions and sends them to logs. This includes both random exceptions and exceptions that are about the more complex construction of the model. An example of such an exception is that you created a covariate but never set its reference value.

  • Within the code to setup the model, throw exceptions from our custom hierarchy when there is something a modeler could do differently.

  • Dismod-AT errors… maybe these are returned as exceptions?

That would be a hierarchy that looks like:

  • CascadeModelError (this is the one that catches more complicated model setup faults), which is for the modelers.

    • Data selection problems.

    • Algorithm selection problems.

    • Settings selection problems.

and it is only used within the model construction and serialization, not during checking of settings.

Logging

The modelers should be able to see statistical choices, and those can be separate from debugging statements. Those logs would have separate lifetimes on disk, too.

  • Code log This records regular debugging statements, such as function entry and exit. It is kept on the disk.

  • Math log This has information about choices the code makes with the data. It is shown to the users in EpiViz. All of the Math log is always kept.

Code Log

Debug

Up to coder. Will be turned off during production runs.

Info

Kept on in production runs.

Warn

Kept on in production runs. Any warning that fires requires action to disable it.

Error

Kept on in production runs, and we read all of these.

Math Log

Debug

About choices that are built-in to model logic.

Info

About choices where a switch decides what to do.

Warn

A problem that needs to be fixed, possibly with another run, but it doesn’t make this run completely fail.

Error

Has to be addressed in order to complete this Cascade run.

MATHLOG statements should follow these guidelines.

  1. Put MATHLOG statements in places where you have context on the data the function affects. This often means the log statement is in the caller.

  2. Include in the log statement summary stats like the number of rows, names of variables, things that inform about this run.

  3. If something was a choice, indicate how a modeler made that choice, and hence how she could unmake it, so refer to the EpiViz selection.

Alec proposes we could construct a hierarchical and narrative MATHLOG which reads, for the modeler, like:

Preparing model:
   Downloading input data:
       ...
   Constructing model representation:
       Adding mortality data from GBD:
           Assigning standard error based on bounds
           ...
   Running dismodat

We could write this as a streamable HTML document.

Faults and Failures

Classify failures by the faults that caused them. Highlight to the modeler the ones they can fix.

  • Model configuration

    • settings don’t meet needs and can be changed.

    • bundle values don’t make sense in some way.

  • IHME Data, maybe modelers know these.

    • Database doesn’t have data we think it should.

    • IHME Database not responding or otherwise having a problem.

  • All the other faults, which modelers are unlikely to be able to fix.

    • Logic errors

    • Environment errors regarding directories, writability.

All errors go to the math log, which also goes to the code log.

Logging Usability

Messages to the GUI user should include

  • The line in the code, with a link to that line in Github.

  • A link to the exception description in the help on docs.

  • A link to the function in which the exception occurred.

These would require, for the link to Github, knowing the git commit so that it links to the right line. For the URL, it would mean having the refs from the objects.inv file that sphinx makes when it makes the docs. It has the mapping from Python entity to its URL and tag in the documentation.

Application Context

Each model run needs to have an object that determines the file structure, connections to the IHME databases, etc.

This context can be modified for a local environment, but that’s not currently implemented in an intuitive or user-friendly way. When we want to enable local runs of an entire cascade, this configuration is what we need to do design work on.

Configuration

There is an additional repository that stores application information for the IHME configuration.

cascade_at.context.configuration.application_config()[source]

Returns a configuration dictionary based on the configuration that is installed into the environment.

Returns

This is a mapping type.

Return type

ConfigParser.SectionProxy

Context

Based on the configuration above, and a model version ID from the epi database, we define a context object that keeps track of database connections and file structures.

class cascade_at.context.model_context.Context(model_version_id, make=False, configure_application=True, root_directory=None)[source]

Bases: object

Context for running a model.

Parameters
  • model_version_id (int) – The model version ID for this context. If you’re not configuring the application, doesn’t matter what this is.

  • make (bool) – Whether or not to make the directory tree for the model.

  • configure_application (bool) – Configure the production application. If False, this can be used for testing on a local machine.

update_status(status)[source]

Updates status in the database.

db_folder(location_id, sex_id)[source]
db_file(location_id, sex_id)[source]

Gets the database file for a given location and sex.

Parameters
  • location_id (int) – Location ID for the database (parent).

  • sex_id (int) – Sex ID for the database, as the reference.

Return type

Path

db_index_file_pattern(location_id, sex_id)[source]

Gets the database file pattern for databases with indices. Used in sample simulate when it’s done in parallel.

Parameters
  • location_id (int) – Location ID for the database (parent).

  • sex_id (int) – Sex ID for the database, as the reference.

Returns

Return type

String representing the absolute path to the index database.

write_inputs(inputs=None, settings=None)[source]

Write the inputs objects to disk.

read_inputs()[source]

Read the inputs from disk.

Return type

Tuple[MeasurementInputs, Alchemy, SettingsConfig]

It also provides methods to read in the three objects that are always needed to construct models, as in the sketch below.
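A minimal sketch (the model version ID is illustrative; configure_application=False avoids needing the IHME production configuration):

from cascade_at.context.model_context import Context

context = Context(model_version_id=0, make=False, configure_application=False)
inputs, alchemy, settings = context.read_inputs()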

Saver and Uploader

The saver module takes results from a Cascade-AT model, saves them in the correct format to the IHME file system, and also uploads them to the epi databases.

The results of a Cascade-AT model need to be saved to the IHME epi databases. This module wrangles the draw files from a completed model and uploads summaries to the epi databases for visualization in EpiViz.

Eventually, this module should be replaced by something like save_results_at.

exception cascade_at.saver.results_handler.ResultsError[source]

Raised when there is an error with uploading or validating the results.

class cascade_at.saver.results_handler.ResultsHandler[source]
self.draw_keys

The keys of the draw data frames

self.summary_cols

The columns that need to be present in all summary files

summarize_results(df)[source]

Summarizes results from either mean or draw columns to get mean, upper, and lower columns.

Parameters

df (DataFrame) – A data frame with draw columns or just a mean column

Return type

DataFrame

save_draw_files(df, model_version_id, directory, add_summaries)[source]

Saves a data frame by location and sex in .csv files. This currently saves the summaries, but when we get save_results working it will save draws and then summaries as part of that.

Parameters
  • df (DataFrame) –

    Data frame with the following columns:

    [‘location_id’, ‘year_id’, ‘age_group_id’, ‘sex_id’, ‘measure_id’, ‘mean’ OR ‘draw’]

  • model_version_id (int) – The model version to attach to the data

  • directory (Path) – Path to save the files to

  • add_summaries (bool) – Save an additional file with summaries to upload

Return type

None

save_summary_files(df, model_version_id, directory)[source]

Saves a data frame with summaries by location and sex in summary.csv files.

Parameters
  • df (DataFrame) –

    Data frame with the following columns:

    [‘location_id’, ‘year_id’, ‘age_group_id’, ‘sex_id’, ‘measure_id’, ‘mean’, ‘lower’, and ‘upper’]

  • model_version_id (int) – The model version to attach to the data

  • directory (Path) – Path to save the files to

Return type

None

static upload_summaries(directory, conn_def, table)[source]

Uploads results from a directory to the model_estimate_final table in the Epi database specified by the conn_def argument.

In the future, this will probably be replaced by save_results_dismod, but we don't have draws to work with, so for now we upload summaries directly.

Parameters
  • directory (Path) – Directory where files are saved

  • conn_def (str) – Connection to a database to be used with db_tools.ezfuncs

  • table (str) – which table to upload to

Return type

None
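A hedged sketch of the saving flow, where results stands in for a hypothetical data frame with the draw columns listed above:

from pathlib import Path
from cascade_at.saver.results_handler import ResultsHandler

handler = ResultsHandler()
summaries = handler.summarize_results(df=results)  # adds mean, upper, lower
handler.save_draw_files(
    df=results, model_version_id=0,
    directory=Path("outputs"), add_summaries=True,
)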

Dismod-AT Concepts

The following pages explain some helpful concepts in Dismod-AT.

Measurement, Rate, Integrand

The Dismod-AT program has its own documentation, which serves well for specifics about database tables, definitions of distributions, and other details. This documentation is a high-level view of what Dismod-AT does in order to explain what you can do with the Cascade.

Dismod-AT does statistical estimation. It is a nonlinear, multi-level regression. The two hierarchical levels are the measurements, at the micro level, and the locations, at the macro level.

Measurements are input data from data bundles. Every measurement has a positive, non-zero standard deviation. A measurement may or may not have the same upper and lower age or the same upper and lower time. All measurements are associated with locations.

Dismod-AT’s central feature is that it estimates rates of a disease process. The disease process is nonlinear and described by a differential equation. We can discuss the behavior of that model in detail later. For this differential equation,

  1. Rates go in.

  2. Prevalence and death come out.

A Rate is incidence, remission, excess mortality, other-cause mortality, or initial prevalence. A rate is a continuous function of age and time. It’s specified as a set of points, and interpolated between those points, but it’s continuous. Even the initial prevalence is continuous across time but defined only for the youngest age. The data associated with rates is defined at points of age and time, so it isn’t associated with age or time ranges. It also doesn’t have standard deviations.

If we think of a typical linear regression,

\[y = a + bx + \epsilon\]

we can draw an equivalence for Dismod-AT where \(x\) are the covariates, \(b\) are the covariate multipliers, \(\epsilon\) are distributions of priors, \(a\) are the rates, and \(y\) are the observations. How Dismod-AT connects rates to observations is much more complicated than a typical linear regression.

In order to relate a rate to an observation, Dismod-AT has to do a few steps.

  1. Use the ODE to predict prevalence and death.

  2. Construct a function of rates, prevalence, and death to form the desired observation.

  3. Integrate that function over the requested age and time range to get a single value for the observation.

Integrands are outputs from Dismod-AT that are predictions of either measurements or rates. Because studies observe participants with ranges of ages over periods of time, they are generally associated with the integral of the continuous rates underlying the disease process. For this reason, Dismod-AT calls its predictions of observations integrands. It supports a wide variety of integrands.

Flow of Commands in Dismod-AT

There are a few different ways to use Dismod-AT to examine data. They correspond to different sequences of Dismod-AT commands.

Stream Out Prevalence The simplest use of Dismod-AT is to ask it to run the ordinary differential equation on known rates and produce prevalence, death, and integrands derived from these.

  1. Precondition Provide known values for all rates over the whole domain. List the integrands desired for the output.

  2. Run predict on those rates.

  3. Postcondition Dismod-AT places any requested integrands in its predict table. These can be rates, prevalence, death, or any of the integrands.

Simple Fit to a Dataset This describes a fit with the simplest way to determine uncertainty.

  1. Precondition The input data is observations, with standard deviations, of any of the known integrands.

  2. Run fit on those observations to produce rates and covariate multipliers.

  3. Run predict on the rates to produce integrands.

  4. Postcondition Integrands are in the predict table.

Fit with Asymptotic Uncertainty This fit produces some values of uncertainty.

  1. Precondition The input data is observations, with standard deviations, of any of the known integrands.

  2. Run fit on those observations to produce rates and covariate multipliers.

  3. Run sample asymptotic.

  4. Postcondition Integrands are in the predict table.

Fit with Simulated Uncertainty This uses multiple predictions in order to obtain a better estimate of uncertainty.

  1. Precondition The input data is observations, with standard deviations, of any of the known integrands.

  2. Run fit on those observations to produce rates and covariate multipliers.

  3. Run simulate to generate simulations of measurements data and priors.

  4. Run sample simulate.

  5. Postcondition Integrands are in the predict table.
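As a sketch, that last sequence maps onto the run_dismod_commands helper documented earlier (the command spellings follow the Dismod-AT documentation; the sample count is illustrative):

from cascade_at.dismod.api.run_dismod import run_dismod_commands

run_dismod_commands("temp.db", [
    "init",                # set up model variables from the input tables
    "fit both",            # fit fixed and random effects
    "simulate 10",         # simulate measurement data and priors
    "sample simulate 10",  # posterior samples from the simulations
    "predict sample",      # fill the predict table from the samples
])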

Smoothing Continuous Functions

We said that rates and covariate multipliers are continuous functions of age and time. It takes a little work to parametrize an interpolated function of age and time.

  • You have to tell it where the control points are. In Cascade, we call this the AgeTimeGrid. It’s a list of ages and a list of times that define a rectangular grid.

  • At each of the control points of the age time grid, Dismod-AT will evaluate how close the rate or covariate multiplier is to some reference value. At these points, we define prior distributions. Cascade makes these value priors part of the PriorGrid.

  • It’s rare to have data points that are dense across all of age and time. Dismod-AT needs to take a data point at one end, a data point at the other end, and draw a line that connects them. We help it by introducing constraints on how quickly a value can change over age and time. These are a kind of regularization of the problem, called age-time difference priors. They apply to the difference in value between one age-time point and the next greater in age and the next-greater in time. As with value priors, these are specified in the Cascade as part of the PriorGrid.

The random effect for locations is also a continuous quantity.

Hierarchical Model

The hierarchical part of Dismod-AT does one thing, estimate how locations affect rates. If the rate at grid point \((i,k)\) is \(q_{ik}(a,t)\), and the covariate multiplier is \(\alpha_{ik}(a,t)\), then the adjusted rate is

\[r_{ik}(a,t) = q_{ik}(a,t) \exp\left(u_{ik}(a,t) + \sum_j x_{ikj}\alpha_{jik}(a,t)\right).\]

The offset, \(u\), is linear with the covariates, but it is inside the exponential, which guarantees that all rates remain positive. This offset is the only random effect in the problem, and it is called the child rate effect because each location, or node in Dismod-AT’s language, is considered a child of a parent.

Because the child rate effect is continuous, you can conclude that it must be defined on a smoothing grid. Dismod-AT will either define one smoothing grid for each child rate effect (one for each of the five rates) or let you define a smoothing grid for every location and every child rate effect, should that be necessary.

Model Variables - The Unknowns

When we ask Dismod-AT to do a fit, what unknowns will it solve for? If we do a fit to a linear regression, \(y \sim b_0 + b_1 x\), then it tells us the parameters \(b_i\). It also tells us the uncertainty, as determined by residuals between predicted and actual \(y\). In the case of Dismod-AT, the model variables are equivalent to those parameters \(b_i\). Dismod-AT documentation lists all of the model variables, but let's cover the most common ones here.

First are the five disease rates, which are inputs to the ODE. Each rate is a continuous function of age and time, specified by an interpolation among points on an age-time grid. Therefore, the model variables from a rate are its value at each of the age-time points.

The covariate multipliers are also continuous functions of age and time. Each of the covariate multipliers has model variables for every point in its smoothing. There can be a covariate multiplier for each combination of covariate column and application to rate value, measurement value, or measurement standard deviation, so that’s a possible \(3c\) covariate multipliers, where \(c\) is the number of covariate columns.

The child rate effects also are variables. Because there is one for each location, and there is a smoothing grid for child rate effects, this creates many model variables.

Covariates

Covariates are the independent variables in the statistical model. They appear as columns in observation data, associated with each measurement. The word covariate is overloaded, so we will refer to a covariate column, a covariate use, a covariate multiplier, and applying a covariate.

A covariate column has a unique name and a reference value for which the observed data is considered unadjusted. All priors on covariates are with respect to this unadjusted value.

Outliering by Covariates

Each covariate column has an optional maximum difference to set. If the covariate is beyond the maximum difference from its reference value, then the data point is outliered. As a consequence, that data point will not be in the data subset table. Nor will it subsequently appear in the avgint table.

If there is a need to use two different references or maximum differences for the same covariate column, then duplicate the column.

Usage

Covariate data appears as columns in the input DataFrame and in the average integrand DataFrame. We will not discuss here how to obtain this covariate data, but rather what Dismod-AT needs to know about those covariate columns in order to use them for a fit.

In order to use a covariate column as a country covariate, specify

  • its reference value

  • an optional maximum difference, beyond which covariate value the data which it predicts will be considered an outlier,

  • one of the five rates (iota, rho, chi, omega, pini), to which it will apply

  • a smoothing grid, as a support on which the covariate effect is solved. This grid defines a mean prior and elastic priors on age and time, as usual for smoothing grids.

We give Dismod-AT measured data with associated covariates. Dismod-AT treats the covariates as a continuous function of age and time, which we call the covariate multiplier. It solves for that continuous function, much like it solves for the rates. Therefore, each application of a covariate column to a rate or measured value or standard deviation requires a smoothing grid.

Applying a study covariate is much the same, except that it usually applies not to a rate but to the value or standard deviation of an integrand.

For instance:

# Assume smooth = Smooth() exists.
income = Covariate("income", 1000)                # covariate column, reference 1000
income_cov = CovariateMultiplier(income, smooth)  # multiplier on that column

# Apply the multiplier to a rate, a measurement value,
# and a measurement standard deviation.
model.rates.iota.covariate_multipliers.append(income_cov)
model.outputs.integrands.prevalence.value_covariate_multipliers.append(income_cov)
model.outputs.integrands.prevalence.std_covariate_multipliers.append(income_cov)

Covariates are unique combinations of the covariate column, and the rate or measured value or standard deviation, so they can be accessed that way.

Missing Values

Were a covariate value to be missing, Dismod-AT would assume it has the reference value. In this sense, every measurement always has a covariate. Therefore, the interface requires every measurement explicitly have every covariate.

Hazard Rates

The hazard rate is defined first for an individual:

A hazard rate is the probability, per unit time, that an event will happen given that it has not yet happened.

For a population, the hazard rate is the sum of the hazard rates for all individuals in that population. For instance, the remission rate, as a function of age, averages over all the different times someone may have entered the with-condition state.

The Dismod-AT compartmental model has four Dismod-AT primary rates, all of which are hazard rates,

  • Susceptible Incidence rate, \(\iota\)

  • Remission rate, \(\rho\)

  • Excess mortality rate, \(\chi\)

  • Other-cause mortality rate, \(\omega\)

and an initial condition, birth prevalence, \(p_{ini}\). We call the primary rates hazard rates because they are the probability per unit time that an individual, age \(x\), moves from one compartment to another, given that they have not yet left their current compartment. Note that birth prevalence for a cohort is, when we look at it across years, a birth rate. That is why you will see birth prevalence called one of the Dismod-AT primary rates.

These primary rates are exactly the parameters in the Dismod-AT differential equation,

\[\begin{aligned}
\frac{dS(x)}{dx} &= -\iota(x) S(x) + \rho(x) C(x) - \omega(x) S(x)\\
\frac{dC(x)}{dx} &= \iota(x) S(x) - \rho(x) C(x) - \left(\omega(x) + \chi(x)\right) C(x)
\end{aligned}\]

where \(S(x)\) are susceptibles as a function of cohort age and \(C(x)\) are with-condition as a function of cohort age.

S-Incidence and T-Incidence

We distinguish susceptible incidence rate from total incidence rate. These are also called s-incidence and t-incidence. Total incidence rate is the number of new observations of a disease per person in the population, where both people with and without the disease are counted. Because hazard rates are the probability per unit time of a transition given that the transition has not happened, we wouldn't call t-incidence a hazard rate, because it includes people for whom the transition to the disease state has already happened. Both, however, can be population rates.

Population Rates

Measurements of a population count events that happen to some set of people. They take the form

\[\frac{\mbox{number of events}}{\mbox{people exposed to that event}}\]

Different measurements have different denominators, and those denominators become weight functions in Dismod-AT. If you get the weight function wrong, then you get the comparison from hazard rates to population measurements wrong. This section lists various measurements and their denominators. People in the \(S\) state are exposed to incidence and death. People in the \(C\) state are exposed to remission and death.

Some population rates are estimates of hazard rates. The population rate for s-incidence is an estimate of a hazard rate. As the age-extent and time-extent for the measurement gets closer to a point estimate, the population rate and the hazard rate become the same value.

We can be exact about the relationship between population rates and hazard rates by following the example of mortality rate in Preston’s Demography. The mortality rate is

\[{}_nm_x = \frac{\int_x^{x+n}l(a)\mu(a)da }{\int_x^{x+n}l(a)da}\]

where \(l(x)=S(x)+C(x)\) is the remaining fraction of those alive and \(\mu(x)\) is the total mortality rate. The numerator in that equation is the age-specific death rate and the denominator is the exposure, as person-years lived, or \({}_nL_x\). When we look at these numbers over age and time, instead of over cohort age, \(x\), the integral changes to

\[{}_nm_a(t) = \frac{\int_t^{t+n}\int_a^{a+n}l(a,t)\mu(a,t)da\:dt }{\int_t^{t+n}\int_a^{a+n}l(a,t)da\:dt}\]

Let’s not write out the double-integral for all examples below, but Dismod-AT does perform its integration over both age and time. Instead, write the following short-hand,

\[{}_nm_a(t) = \frac{\mbox{death events}}{\mbox{Susceptible + With-Condition life-years}}.\]

for the integral.

Similarly, the population susceptible incidence rate is

\[\frac{\mbox{incidence events}}{\mbox{Susceptible life-years}}\]

The population remission rate has the same problem as the incidence, in that it can be counted as a percentage of those with condition who remit or a percentage of the population that remits. If we consider the remission hazard rate, which is the former, then it is

\[\frac{\mbox{remission events}}{\mbox{With-Condition life-years}}\]

Note

We could define a t-remission as

\[\frac{\mbox{remission events}}{\mbox{Susceptible + With-Condition life-years}}.\]

but we don’t. Is that because all remission is of one type or another? Which type?

The population excess mortality rate is

\[\frac{\mbox{excess death events}}{\mbox{With-Condition life-years}}.\]

Other-cause mortality is just like mortality, but only for susceptibles,

\[\frac{\mbox{death events just from susceptible}}{\mbox{Susceptible life-years}}.\]

The population rate for mtall and mtspecific both use \(S(x)+C(x)\) as their weight. The same is true of standardized mortality ratio and relative risk.

Note

Dismod-AT expects the user to provide weight functions. The GBD provides weight functions, which should correspond to \(S(x)+C(x)\). These should also be close enough for \(S(x)\). It would make sense to create and refine the weight corresponding to \(C(x)\) as we solve down the location hierarchy.

Crude Population Rates

Dismod-AT works with life table rates, not crude rates. A crude rate is the number of deaths divided by the number of people exposed to that event. If \(k(t)\) is the birth rate over time, then a crude mortality rate is

\[{}_nM_x = \frac{\int_x^{x+n}k(t-a)l(a)\mu(a)da }{\int_x^{x+n}k(t-a)l(a)da}\]

The life table rate adjusts the crude rate to remove the effect of varying birth rates. In Dismod-AT, the birth rate is normalized to a rate of 1 for all populations. In demographic textbooks, \({}_nm_x\) is called the lifetable mortality rate, and \({}_nM_x\) is called the crude mortality rate.

Note

The bundles aggregate measurements from many sources. Do they use crude population rates or lifetable population rates?

This matters when there is a birth pulse that skews data towards younger or older sides of an age interval. Dismod-AT assumes that the average over an age interval is determined by the lifetable person-years lived.

Testing

Running Tests

Running Unit Tests

Unit tests can run from either the root directory or tests subdirectory using pytest. Note the following useful options for pytest. The first couple are custom flags we created.

  • pytest --ihme This is a flag we created that enables those tests which we would run within the IHME environment. If you write a test that calls IHME databases, you must include the ihme fixture in order for that unit test to run. This guarantees that when Jenkins runs without the --ihme flag, none of the tests it runs require the IHME databases.

  • pytest --dismod This is a flag we created that enables those tests which require having a command-line Dismod-AT running. Using ihme turns on dismod.

  • pytest --signals This is a flag we created that enables those tests that send UNIX signals to test failure modes; they are off by default. This is useful on the Mac, which helpfully offers to inform Apple of application failure.

The rest are standard options, but they are so important that I’m listing them here.

  • pytest --log-cli-level debug Captures log messages nicely within unit tests.

  • pytest --pdb This flag asks to drop into the debugger when an exception happens in a unit test. Very helpful when using tests for test-driven development.

  • pytest --capture=no This allows stdout, stderr, and logging to be printed when running tests.

  • pytest -k partial_name This picks out all tests whose names contain the letters “partial_name”.

  • pytest --lf Run the last set of failing tests.

  • pytest --ff Run the last set of failing tests, and then run the rest of the tests.

  • pytest -x Die on the first failure.

In order to make a test that relies on IHME databases, use the global fixture called ihme:

def test_get_ages(ihme):
    gbd_round = 12
    ages = get_gbd_age_groups(gbd_round)

This test will automatically be disabled until the --ihme flag is on.

Running Acceptance Tests

There is a separate directory for acceptance tests. It’s called acceptance in the unit directory. Here, too, run pytest, but it will take longer and do thread tests, which are tests from one interaction to a response.

Unit and acceptance tests are run with the --ihme flag turned on, just before the end of installation. If they fail, then installation fails. Be sure to run unit tests on the cluster with --ihme, even if they pass in Tox, which runs a subset of tests.

Structure of Tests

Testing structure follows the component structure of the code, but there are a few tests that outweigh others in importance because they are system integration tests. If we look at the larger architectural parts, those system integration tests mock out different pieces. The larger architectural parts are:

  1. Main success scenario (MSS), that does a fit and simulate with Dismod-AT.

  2. Input data of various kinds

    1. Bundle data records

    2. IHME databases of mortality.

    3. EpiViz-AT settings.

  3. Interface with the Dismod-AT db file

Main Success Scenario

There is a single file that runs the core set of steps for the wrapper, using no inputs from external sources. It does the first two of these steps. As we work through the main success scenario, we should make it do all of the steps.

  1. Generate input data with Dismod-AT predict.

  2. Fit that data.

  3. Generate simulations.

  4. Fit those simulations.

  5. Summarize simulation outputs.

  6. Create posterior data.

It’s in tests/model/main_success_scenario.py. It’s set up to run through different types of models and different combinations of input parameters. It does a fractional-factorial experiment on those parameters, working up to seeing how two parameters interact, and whether the code still runs.

This same script generates files with timings on how long it takes Dismod-AT to do a fit, for a given set of parameters and data.

Test Settings Parsing

This mocks the creation of EpiViz-AT settings and then runs stochastically-generated settings through the builder for models, all the way to writing a Dismod-AT db file. It’s in test_construct_model.py

Live Tests against Database

These use a real MVID, and pull settings and data for it in order to build a database, in test_estimate_locations.py.

Testing the Dismod-AT DB File Interface

These tests skip any IHME database interaction. They redo the extensive tests included with Dismod-AT, but using the internal interface. This is what tells us our internal interface works. In test_dismod_examples.py.

What’s Missing

There should be a test that creates settings and input data, and runs completely through the main scenario. This would save us from waiting for the IHME databases to send data and would exercise the later part of the main success scenario, which isn’t covered enough yet.
