Documentation

`msm_we.modelWE`	History-augmented Markov state model estimation from WE data
`msm_we.optimization`	Discrepancy calculations and WE binning/allocation optimization
`msm_we.fpt`	First-passage time (FPT) calculations from trajectories or matrices
`msm_we.ensembles`	Implements Ensemble class for managing trajectories
`msm_we.nmm`	"Non-Markovian" trajectory analysis
`msm_we.utils`	Miscellaneous convenience functions
`msm_we.westpa_plugins.augmentation_driver`
`msm_we.westpa_plugins.hamsm_driver`	Plugin for automated haMSM construction.
`msm_we.westpa_plugins.restart_driver`
`msm_we.westpa_plugins.optimization_driver`	Plugin for automated WE hyperparameter optimization.

haMSM model building and analysis (msm_we.modelWE)

class msm_we.modelWE[source]

History-augmented Markov state model estimation from WE data

Implementation of haMSM model building, particularly for steady-state estimation from recycling WE sampling with basis (source) and target (sink) states.

Set up for typical west.h5 file structure, with coordinates to be stored in west.h5 /iterations/auxdata/coord and basis and target definitions in progress coordinate space.

References

Copperman and Zuckerman, Accelerated estimation of long-timescale kinetics by combining weighted ensemble simulation with Markov model microstategs using non-Markovian theory, arXiv (2020).

Construction

modelWE.build_analyze_model(file_paths, ref_struct, modelName, basis_pcoord_bounds, target_pcoord_bounds, dimreduce_method, tau, n_clusters, ray_kwargs={}, max_coord_iter=-1, stratified=True, streaming=True, use_ray=True, fluxmatrix_iters=[1, -1], fluxmatrix_iters_to_use=None, cross_validation_groups=2, cross_validation_blocks=4, show_live_display=True, allow_validation_failure=False, step_kwargs={})[source]

One-shot function to build the model and analyze all at once. This provides a convenient interface for running the blockwise estimation.

This may not be desirable for very long workflows, or workflows still being debugged, where it might make sense to run the individual steps one by one.

Parameters:

file_paths (list) – List of paths to H5 files to analyze.
ref_struct (string) – Path to PDB file that defines topology.
modelName (string) – Name to use in output filenames.
basis_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the basis state
target_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the target state
dimreduce_method (str) – Dimensionality reduction method. “pca”, “vamp”, or “none”.
tau (float) – Resampling time (i.e. time of 1 WE iteration). Used to map fluxes to physical times.
n_clusters (int) – Number of clusters to use when clustering. This is clusters per bin for stratified, or total clusters for aggregate.
ray_kwargs (dict) – Keyword arguments passed to ray.init(). Useful for specifying num_cpus. You could also use this to connect to an existing Ray cluster.
max_coord_iter (int, optional (Default = model.maxIter, so all)) – Last iteration to obtain coordinates from. Useful for excluding the end of some data.
stratified (bool, optional (Default = True)) – Enables stratified clustering, where clustering is performed independently within each WE bin.
streaming (bool, optional (Default = True)) – Enables streaming over input data, rather than batch processing. Substantially improves memory efficiency, at a potential small performance hit.
use_ray (bool, optional (Default = True)) – Enable parallelization, using Ray. This provides substantial speedup in discretization and fluxmatrix calculations.
fluxmatrix_iters (list, optional (Default = [1, -1])) – List of [first, last] iteration to use when calculating fluxmatrix. Defaults to using all iterations.
fluxmatrix_iters_to_use (list, optional (Default = None)) – Specific range of iterations to use, as opposed to bounds like fluxmatrix_iters. Note that either this OR fluxmatrix_iters
cross_validation_groups (int, optional (Default = 2)) – Number of independent models to build when doing cross-validation. Each group contains (blocks / groups) blocks.
cross_validation_blocks (int, optional (Default = 4)) – Number of blocks to split your data into, before building the independent models.
show_live_display –
allow_validation_failure (bool, optional (Default = False)) – If True, then raise a warning but don’t fail if construction of a cross-validation model fails.

modelWE.initialize(fileSpecifier, refPDBfile, modelName, basis_pcoord_bounds=None, target_pcoord_bounds=None, dim_reduce_method='none', tau=None, pcoord_ndim=1, auxpath='coord', _suppress_boundary_warning=False, use_weights_in_clustering=False)[source]

Initialize the model-builder.

Parameters:

fileSpecifier (list) – List of paths to H5 files to analyze.
refPDBfile (string) – Path to PDB file that defines topology.
modelName (string) – Name to use in output filenames.
basis_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the basis state
target_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the target state
dim_reduce_method (str, default="none") – Dimensionality reduction method. “pca”, “vamp”, or “none”.
tau (float) – Resampling time (i.e. time of 1 WE iteration). Used to map fluxes to physical times.
pcoord_ndim (int, default=1) – Defaults to 1. Dimensionality of progress coordinates.
auxpath (str, default="coord") – Augmented coordinates used for MSM construction are stored in west.h5 under auxdata/auxpath

Return type:

None

modelWE.get_coordSet(last_iter, streaming=None, progress_bar=None)

Loads all coordinates and progress coordinates into memory for later usage.

If streaming, then this only loads pcoords

Parameters:

last_iter –
self (modelWE) –

modelWE.dimReduce(first_iter=1, first_rough_iter=None, last_iter=None, rough_stride=10, fine_stride=1, variance_cutoff=0.95, use_weights=True, progress_bar=None)

Dimensionality reduction using the scheme specified in initialization.

This just defines the dimensionality reduction scheme and builds the model – it does NOT actually transform the data!

Transforming the data is performed via reduceCoordinates(), which uses self.coordinates as set

by this.

Updates:

self.coordinates
self.ndim

Return type:: None
Parameters:: self (modelWE) –

modelWE.cluster_coordinates(n_clusters, streaming=False, first_cluster_iter=None, use_ray=False, stratified=True, iters_to_use=None, store_validation_model=False, progress_bar=None, **_cluster_args)

Parameters:: self (modelWE) –

modelWE.get_fluxMatrix(n_lag, first_iter=1, last_iter=None, iters_to_use=None, use_ray=False, result_batch_size=5, progress_bar=None)

Compute the matrix of fluxes at a given lag time, for a range of iterations.

Checks if a file has been written named “<self.modelName>_s<first_iter>_e<last_iter>_lag<n_lag>_clust<self.n_clusters>.h5”. If this file exists, load it and recalculate if it was calculated at an earlier iteration. Otherwise, write it.

Updates:

self.n_lag
self.errorWeight
self.errorCount
self.fluxMatrixRaw

Parameters:

n_lag (int) – Number of lags to use.
first_iter (int) – First iteration to use.
last_iter (int) – Last iteration to use.
self (modelWE) –

Return type:

None

modelWE.organize_fluxMatrix(use_ray=False, progress_bar=None, **args)

This cleaning step removes all clusters that aren’t in the largest connected set, then rediscretizes all the trajectories according to the new reduced set of clusters.

Parameters:: self (modelWE) –

modelWE.do_block_validation(cross_validation_groups, cross_validation_blocks, use_ray=True, progress_bar=None)[source]

One way to estimate the uncertainty of your model is to split your data into blocks, compute models over groups of the blocks, and assess consistency between the groups.

The procedure here chops your data up into uniform blocks, by iteration. For example, with 100 iterations and 4 blocks, the blocks consist of iterations [0-25), [25-50), [50-75), [75-100). If the above blocks were assigned to 2 groups, the groups would consist of iterations ( [0-25), [50-75) ) and ( [25-50), [75-100) )

Parameters:

cross_validation_groups (int) – Number of groups to assign blocks over
cross_validation_blocks (int) – Number of blocks to split data into

Analysis

modelWE.get_Tmatrix()

Compute the transition matrix from the flux matrix. Corrects the “target” states to be true sink states.

More specifically:

row-normalizes the flux matrix,
sets any states with 0 flux ot (i.e. sinks) to have 1.0
sets target bins to uniformly recycle into basis bins

Updates:

self.Tmatrix

Return type:: None
Parameters:: self (modelWE) –

modelWE.get_steady_state(flux_fractional_convergence=0.0001, max_iters=10)

” Get the steady-state distribution for the transition matrix. Uses scipy eigensolver to obtain an initial guess, then refines that using inverse iteration.

Parameters:

flux_fractional_convergence ((optional, default=1e-4) float) – Convergence of the
max_iters ((optional, default=100) int) –
self (modelWE) –

Notes

Transition matrices generated from WE data may have entries spanning many orders of magnitude, and may have extremely high condition numbers. Furthermore, the smallest entries may be those near the target state, which are also the most important for computing target fluxes, meaning values near machine precision can’t just be truncated. All this means that floating-point error may substantially affect the results of eigensolvers, and may produce bad/negative/inaccurate values for small probability bins.

In order to obtain better estimates, sparse matrices are used to reduce the number of floating point operations being performed. A stationary distribution is first estimated using scipy’s eigensolver for sparse matrices. This is then used as an initial guess for the inverse iteration method, to further refine it. Convergence of the inverse iteration is determined using change in the flux estimate.

modelWE.get_steady_state_target_flux(pSS=None, _set=True)

Get the total flux into the target state(s).

Updates:

self.lagtime
self.JtargetSS

Parameters:

pSS ((optional) array-like) – Steady-state distribution. If nothing provided, then use self.pSS
_set ((optional) boolean) – If True, then update self.JtargetSS and self.lagtime. If False, then just return the value of JtargetSS.
self (modelWE) –

Return type:

None

modelWE.get_committor(conv=1e-05)

Iteratively obtain an estimate of the committor.

Take the flux matrix, and normalize it into a transition matrix.
Apply two-sided absorbing boundary conditions by setting self-transition probabilities for the basis and
target states to 1.0, and all transitions out to 0.0.
Starting with an initial committor “guess” of all 1s, iteratively multiply the guess by the transition matrix
until convergence is below conv.

Updates:

self.q

Parameters:

conv (numerical) – Convergence criteria for committor calculation. Calculation stops when the total difference between q_p and q is less than this.
self (modelWE) –

modelWE.get_flux()

Get the measured flux (i.e. from the flux matrix) into the target.

Parameters:: self (modelWE) –

modelWE.get_cluster_centers()

Standalone method to obtain average pcoords of all segments in each cluster.

This functionality is contained in organize_fluxMatrix.

Parameters:: self (modelWE) –

modelWE.update_cluster_structures(build_pcoord_cache=False)

Find structures (i.e. sets of coordinates) corresponding to each clusters.

Parameters:

build_pcoord_cache (bool) – If True, builds self.pcoord_cache, which has elements [cluster_idx][seg_idx] holding the pcoord for the seg_idx’th segment in MSM cluster cluster_idx.
self (modelWE) –

Returns:

in that cluster.

Return type:

A dictionary where the keys are cluster indices, and the values are lists of coordinates (structures)

Plotting

modelWE.plot_flux(custom_name=None, ax=None, save=False, suppress_validation=False, _from_colors=None, _to_colors=None, pcoord_to_use=0, **_plot_args)

Make, and save, a plot of the fluxes along the RMSD. get_flux() must be run before this.

Parameters:

custom_name (str (optional)) – The name for the saved plot. Defaults to flux_s<first iter>_e<last iter>.png
self (modelWE) –

modelWE.plot_flux_committor(nwin=1, ax=None, save=False, suppress_validation=False, _from_colors=None, _to_colors=None, **_plot_args)

Parameters:: self (modelWE) –

modelWE.plot_flux_committor_pcoordcolor(nwin=1, ax=None, pcoord_to_use=0, **_plot_args)

Parameters:: self (modelWE) –

Optimization (msm_we.optimization)

Discrepancy calculations and WE binning/allocation optimization

msm_we.optimization.solve_discrepancy(tmatrix, pi, B)[source]

Given a transition matrix, solves for the discrepancy function.

The Poisson equation for the discrepancy function is

\[(I - K)h = 1_B - \pi(B), \:\: h \cdot \pi = 0\]

however, since $I-K$ is singular, we instead solve

\[(I - K + \pi \pi^T / || \pi ||^2_2)h = 1_B - \pi(B), \:\: h \cdot \pi = 0\]

where $h$ is a volumn vector, 1_B is an indicator function which is 1 in B and 0 everywhere else, $\pi$ is the steady-state solution of $K$, and pi(B) is a column vector with the steady-state value of $\pi(B)$ in every element.

Parameters:

tmatrix (2D array-like,) – Transition matrix
pi (array-like,) – Steady-state distribution for the input transition matrix
B (array-like,) – Indices of target states B

Return type:

(discrepancy, variance)

msm_we.optimization.get_uniform_mfpt_bins(variance, discrepancy, steady_state, n_desired_we_bins)[source]

Implements the MFPT-binning strategy described in [1], where bins are groups of microstates that are uniformly spaced in the integral of pi * v

Parameters:

variance (Variance function) –
array-like (Steady-state distribution) –
discrepancy (Discrepancy function) –
array-like –
steady_state (Steady-state distribution) –
array-like –
int (n_desired_we_bins) – less any recycling or basis bins

References

[1] Aristoff, D., Copperman, J., Simpson, G., Webber, R. J. & Zuckerman, D. M. Weighted ensemble: Recent mathematical developments. Arxiv (2022).

msm_we.optimization.get_clustered_mfpt_bins(variance, discrepancy, steady_state, n_desired_we_bins, seed=None)[source]

Implements the MFPT-binning strategy described in [1], where bins are groups of microstates that are obtained by: k-means clustering on the integral of pi * v

Parameters:

variance (Variance function) –
array-like (Steady-state distribution) –
discrepancy (Discrepancy function) –
array-like –
steady_state (Steady-state distribution) –
array-like –
int (n_desired_we_bins) –

Return type:

An array where each element is the WE bin index assigned to an haMSM microstate.

References

[1] Aristoff, D., Copperman, J., Simpson, G., Webber, R. J. & Zuckerman, D. M. Weighted ensemble: Recent mathematical developments. Arxiv (2022).

class msm_we.optimization.OptimizedBinMapper(*args, **kwargs)[source]

create_new(nbins, n_original_pcoord_dims, target_pcoord_bounds, basis_pcoord_bounds, previous_binmapper, microstate_mapper, stratified_clusterer, cluster_on_pcoord=False, *args, **kwargs)[source]

Creates an OptimizedBinMapper, suitable for use with the optimization workflow

Parameters:

nbins (int, Number of WE bins) –
n_original_pcoord_dims (int, Number of dimensions in the original user-supplied progress coordinate) –
microstate_mapper (dict, Mapping of microstates to WE bins) –
stratified_clusterer (StratifiedClusters) –
cluster_on_pcoord (bool) –

WESTPA Plugins (msm_we.westpa_plugins)

class msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver(sim_manager, plugin_config)[source]

WESTPA plugin to augment west.h5 with auxiliary coordinates.

After each iteration, appends coordinates to iter_XXX/auxdata/coord, for later usage with haMSM construction.

Can be used by including the following entries in your west.cfg:

west:
    plugins:
    - plugin: msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver
          topology_file: path/to/topology.pdb
          child_traj_filename: name of segment trajectory file in traj_segs/<iter>/<seg>/
          parent_traj_filename: name of parent trajectory file in traj_segs/<iter>/<seg>/

augment_coordinates()[source]

After propagation completes in a WE iteration, this populates auxdata/coord with the coordinates.

Looks for parent/child trajectory files in the segment data_ref path defined in west.cfg, named according to parent_traj_filename and child_traj_filename in the plugin config.

Plugin for automated haMSM construction.

class msm_we.westpa_plugins.hamsm_driver.HAMSMDriver(sim_manager, plugin_config)[source]

WESTPA plugin to construct an haMSM.

Can be used by including the following entries in your west.cfg:

west:
    plugins:
    # - plugin: An augmentation plugin is also required, such as
    #           msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver
    - plugin: msm_we.westpa_plugins.hamsm_driver.HAMSMDriver
          model_name: Name for the model
          n_clusters: Number of clusters to place in each WE bin (see stratified clustering for more details)
          tau: WESTPA resampling time in physical units
          basis_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...]
          target_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...]
          first_analysis_iter: Integer of the first iteration of the prior marathon to use for analysis. I.e.,
            setting this to 10 would use only iterations 10-end for haMSM construction. This can be useful for
            omitting some burn-in after a restart.
          dim_reduce_method: A string specifying a dimensionality reduction method for
            :meth:`msm_we.msm_we.modelWE.dimReduce`
          featurization: An importable python method implementing a featurization
            for :meth:`msm_we.msm_we.modelWE.processCoordinates`
          n_cpus: Number of CPUs to use with Ray

construct_hamsm()[source]: Build an haMSM, for use with later plugins. The final constructed haMSM is stored on the data manager.

class msm_we.westpa_plugins.restart_driver.RestartDriver(sim_manager, plugin_config)[source]

WESTPA plugin to automatically handle estimating steady-state from a WE run, re-initializing a new WE run in that steady-state, and then running that initialized WE run.

Data from the previous run will be stored in the restart<restart_number>/ subdirectory of $WEST_SIM_ROOT.

This plugin depends on having the start-states implementation in the main WESTPA code, which allows initializing a WE run using states that are NOT later used for recycling.

These are used so that when the new WE run is initialized, initial structure selection is chosen by w_init, using weights assigned to the start-states based on MSM bin weight and WE segment weight.

Since it closes out the current WE run and starts a new one, this plugin should run LAST, after all other plugins.

Can be used by including the following entries in your west.cfg:

west:
    plugins:
    # - plugin: An augmentation plugin is also required, such as
    #           msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver
    - plugin: msm_we.westpa_plugins.restart_driver.RestartDriver
          n_restarts: Number of total restarts to do
          extension_iters: Amount of iterations to extend runs by if no runs have reached the target by the
            first restart.
          n_runs: Number of runs to do between restarts
          n_restarts_to_use: Number of restarts to use. Can be a fraction, to use the last fraction amount, or
            a negative integer, to use the last N.
          initialization_file: restart_initialization.json
          model_name: Name for the model
          n_clusters: Number of clusters to place in each WE bin (see stratified clustering for more details)
          basis_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...]
          target_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...]
          dim_reduce_method: A string specifying a dimensionality reduction method for
            :meth:`msm_we.msm_we.modelWE.dimReduce`
          featurization: A python module implementing a featurization
            for msm_we.msm_we.modelWE.processCoordinates
          n_cpus: Number of CPUs to use with Ray

get_original_bins()[source]

Obtains the WE bins and their probabilities at the end of the previous iteration.

Returns:

bins (np.ndarray) – Array of WE bins
binprobs (np.ndarray) – WE bin weights

property cur_iter

Get the current WE iteration.

Returns:: int
Return type:: The current iteration. Subtract one, because in finalize_run the iter has been incremented

property is_last_iteration

Get whether this is, or is past, the last iteration in this WE run.

Returns:: bool
Return type:: Whether the current iteration is the final iteration

prepare_extension_run(run_number, restart_state, first_extension=False)[source]

Copy the necessary files for an extension run (versus initializing a fresh run)

Parameters:

run_number (int) – The index of this run (should be 1-indexed!)
restart_state (dict) – Dictionary holding the current state of the restarting procedure
first_extension (bool) – True if this is the first run of an extension set. If True, then back up west.cfg, and write the extended west.cfg file.

prepare_new_we()[source]

This function prepares a new WESTPA simulation using haMSM analysis to accelerate convergence.

The marathon functionality does re-implement some of the functionality of w_multi_west. However, w_multi_west merges independent WE simulations, which may or may not be desirable. I think for the purposes of this, it’s good to keep the runs completely independent until haMSM model building. Either that, or I’m just justifying not having known about w_multi_west when I wrote this. TBD.

The algorithm is as follows:

Check to see if we’ve just completed the final iteration

Handle launching multiple runs, if desired

Build haMSM

Obtain structures for each haMSM bin

Make each structure a start-state, with probability set by (MSM-bin SS prob / # structures in bin)

Potentially some renormalization?

Start new WE simulation

Plugin for automated WE hyperparameter optimization.

class msm_we.westpa_plugins.optimization_driver.OptimizationDriver(sim_manager, plugin_config)[source]

WESTPA plugin to automatically handle performing optimization.

Using an haMSM, updates binning and allocation according to user-specified optimization algorithms. An OptimizedBinMapper is constructed from the optimized binning and allocation, and WE is continued with the new mapper.

Can be used by including the following entries in your west.cfg:

west:
    plugins:
    - plugin: msm_we.westpa_plugins.optimization_driver.OptimizationDriver
      full_coord_map: A pickled dictionary mapping discrete states to full-coordinate structures
      max_iters: Number of total iterations. WE will run for west.system.max_iters, perform optimization, and
                continue for another west.system.max_iters, up to this value.
      # The following parameters are optional, and provided as an example.
      binning_strategy: An arbitrary python function defining a bin optimization scheme.
            Takes in an msm_we.msm_we.modelWE and returns an array-like of length n_msm_states, where each
            element is the index of the WE bin that MSM state will be assigned to by the OptimizedMapper.
      allocation_strategy: An arbitrary python function defining an allocation optimization scheme.
            Takes in an msm_we.msm_we.modelWE and returns an array of integer walker allocations for the WE bins.

do_optimization()[source]: Update WESTPA with an optimized bin mapper, bin allocation, and extend the progress coordinate. Then, continue the WE for more iterations.

static default_allocation_optimizer(model)[source]: A (trivial) example allocation optimization function, which returns an array with the target number of walkers in each bin.

compute_optimized_allocation()[source]

Compute the optimal allocation.

If plugin.allocation_strategy is None or not provided, the allocation is not updated.

Otherwise, the constructed haMSM is passed to an arbitrary function that returns an array-like describing the new walker allocation over the WE bins.

static default_bin_optimizer(model)[source]: Example bin optimization function, which assigns microstates to WE bins.

compute_optimized_bins()[source]

Computes discrepancy and variance, and returns the resulting optimized bin mapper.

If plugin.binning_strategy is None or not provided, optimization.get_clustered_mfpt_bins() is used.

Otherwise, the constructed haMSM is passed to an arbitrary function that returns an array-like with the WE bin index of all MSM microbins excluding the basis/target (model.indBasis and model.indTargets).

Return type:: An OptimizedBinMapper

compute_new_pcoord_map()[source]

SynD specific: Compute a new progress coordinate mapping.

Returns:: A dictionary of {state indices
Return type:: extended progress coordinates}

update_westpa_pcoord(new_pcoord_map)[source]

Changing a progress coordinate during a WE run requires a number of changes in WESTPA’s internal state. This handles making those, so you can call w_run and continue with the new, changed pcoord

Parameters:: new_pcoord_map (A dictionary mapping discrete states to the new, extended pcoord) –

FPT Calculations

msm_we.fpt

First-passage time (FPT) calculations from trajectories or matrices

Adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis

class msm_we.fpt.MatrixFPT[source]

Define a base class for calculating FPTs using transition matrix

classmethod mean_fpts(tmatrix, stateA, stateB, lag_time=1)[source]

Calculting mean-first passave time for at transition matrix

Derived class will implement this function differently.

classmethod directional_mfpt(transition_matrix, stateA, stateB, ini_probs=None, lag_time=1)[source]

Computes the mean-first passage time in a single direction using a recursive procedure

This method is useful when there is no B->A ensemble but only A->B transitions, for instance when B is absorbing.

Parameters:

transition_matrix (Numpy 2D array) –
stateA (List of integers) – Both states are a list of indexes.
stateB (List of integers) – Both states are a list of indexes.
ini_probs (List of float, default is None) – initial probabilities in stateA
lag_time (integer) – Lag time used, the trajectory is “observed” every lag_time time steps

Return type:

mean-first passage time from A->B

classmethod mfpts_to_target_microstate(transition_matrix, target, lag_time=1)[source]

Computes all the mean-first passage to a target microstate (k)

Returns a list where the i-element is mfpt(i->k). This function is useful to compute the mfpt matrix.

Parameters:

transition_matrix (Numpy 2D array) –
target (Integer number that specifies the index of the state. The indexes) – should be consistent with the transition matrix and python (i.e. starting from 0)
lag_time (Integer) – Lag time used, the trajectory is “observed” every lag_time time steps

Returns:

a list where the i-element is mfpt(i->k). This function is
useful to compute the mfpt matrix.

classmethod mfpts_matrix(transition_matrix, lag_time=1)[source]

Calculate MFPT matrix, i.e., the matrix where the ij-element is MFPT(i->j)

Parameters:

transition_matrix (Numpy 2D array) –
lag_time (Integer) – Lag time used, the trajectory is “observed” every lag_time time steps

Return type:

mean-first passage time matrix with ij-element of MFPT(i->j)

classmethod min_commute_time(matrix_of_mfpts)[source]

Calculate minimum commuting time (round trip time) between all pairs

of microstates from the matrix of mfpts. It also returns the indexes of the pair of microstates involved.

Parameters:

matrix_of_mfpts (Numpy 2D array) – matrix of MFPTs with ij-element of MFPT(i->j)

Returns:

Minimum commuting time (round trip time) between all pairs
of microstates and the indexes of the pair of microstates involved.

classmethod max_commute_time(matrix_of_mfpts)[source]

Calculate maximum commuting time (round trip time) between all pairs

of microstates from the matrix of mfpts. It also returns the indexes of the pair of microstates involved.

Parameters:

matrix_of_mfpts (Numpy 2D array) – matrix of MFPTs with ij-element of MFPT(i->j)

Returns:

Maximum commuting time (round trip time) between all pairs
of microstates and the indexes of the pair of microstates involved.

classmethod fpt_distribution(t_matrix, initial_state, final_state, initial_distrib, min_power=1, max_power=12, max_n_lags=100, lag_time=1, dt=1.0, clean_recycling=False, logscale=False)[source]

Calculated distribution of first passage times from transition matrix

Parameters:

t_matrix (Numpy 2D array) –
initial_state (List of integer numbers) – Specifies the indexes of initial and final states.
final_state (List of integer numbers) – Specifies the indexes of initial and final states.
initial_distrib (List of float, default is None) – initial probabilities for initial states
min_power (Integer) – The minimum and maximum power when the FPT distribution is shown in logscale such as (10^min_power, 10^max_power)*lag_time*dt.
max_power (Integer) – The minimum and maximum power when the FPT distribution is shown in logscale such as (10^min_power, 10^max_power)*lag_time*dt.
max_n_lags (Integer) – maximum number of lags when the FPT distribution is shown in linear scale such as (0, max_n_logs)*lag_time*dt. When in logscale, this is number of points to shown in the range of (10^min_power, 10^max_power)*lag_time*dt.
lag_time (Integer) – Lag time used, the trajectory is “observed” every lag_time time steps
dt (Float) – Time step
clean_recycling (Bool) – Cleaning the recycling of steady state simulation if True
logscale (Bool) – Option to use logscale for FPT time in the distribution

Return type:

Distributions of first passage times

static adaptive_fpt_distribution(Tmatrix, initial_states, initial_state_probs, target_states, tau=1, increment=5, fine_increment=1.2, relevant_thresh=0.0001, max_steps=1000000, max_time=inf, explicit_renormalization=False, verbose=False)[source]

Adaptively computes a first-passage time distribution.

Starting at t=tau, compute the probability flowing into the target at t. Then, increment t by multiplying it by the coarse increment. When relevant_thresh probability has entered the target state, step back to the previous coarse state, and swap over to incrementing with the fine increment. This allows you to efficiently sweep log-space.

Procedurally, this starts probability in specified initial_states according to initial_state_probs, and then propagates that probability through the transition matrix. The FPT distribution is measured by tracking new probability entering the target state at each time.

Note that absorbing boundary conditions are stripped from the transition matrix – if this is not done, then the result is like a probability CDF, not a probability distribution.

Parameters:

Tmatrix (array-like) – Transition matrix
initial_states (array-like of ints) – List of initial states to start probability in
initial_state_probs (array-like) – Probability distribution across the initial states.
target_states (array-like) – Target states for MFPT.
tau –
increment (float) – Multiplicative increment for coarse steps
fine_increment (float) – Multiplicative increment for fine steps, once the minimum probability in the target has been reached.
relevant_thresh (float) – Amount of probability that must be in the target before switching to fine increments.
max_steps (int) – Maximum number of steps to run
max_time (float) – Maximum time to run to
explicit_renormalization (bool) – Whether to explicitly renormalize the transition matrix. This should not be necessary – if it is, there’s probably some numerical instability you should be careful of.
verbose (bool) – Produce verbose text output.

Returns:

FPT distribution,
probability distribution at each time,
last step index,
times at which FPT distribution was evaluated

class msm_we.fpt.MarkovFPT[source]

Derived a class for calculating FPTs using Markovian transition matrix

classmethod mean_fpts(markov_tmatrix, stateA, stateB, lag_time=1)[source]

Computes mean first passage times using Markovian transition matrix

in both directions A->B and B->A from a markov model. The MFPTs computed in this way are directly comparable with the values obtained by a long back and forth simulation between the target states.

Parameters:

markov_matrix (Numpy 2D array) – Markovian transition matrix
stateA (List of integers) – Both states are a list of indexes.
stateB (List of integers) – Both states are a list of indexes.
lag_time (integer) – Lag time used, the trajectory is “observed” every lag_time time steps

Return type:

mean-first passage times from A->B and B->A

classmethod markov_commute_time(transition_matrix, stateA, stateB, lag_time=1)[source]

Computing commute time for Markovian Model

Parameters:

transition_matrix (Numpy 2D array) – Markovian transition matrix
stateA (List of integers) – Both states are a list of indexes.
stateB (List of integers) – Both states are a list of indexes.
lag_time (integer) – Lag time used, the trajectory is “observed” every lag_time time steps

Return type:

Commute time from mean-first passage times

class msm_we.fpt.NonMarkovFPT[source]

Derived a class for calculating FPTs using Non Markov transition matrix

classmethod mean_fpts(nm_transition_matrix, stateA, stateB, lag_time=1)[source]

Computes the mean first passage times from a non-markovian model

in both directions of A->B and B->A. The shape of the transition matrix should be (2*n_states, 2*n_states). :param nm_transition_matrix: Non-Markovian transition matrix :type nm_transition_matrix: Numpy 2D array :param stateA: Both states are a list of indexes. :type stateA: List of integers :param stateB: Both states are a list of indexes. :type stateB: List of integers :param lag_time: Lag time used, the trajectory is “observed” every lag_time time steps :type lag_time: integer

Return type:: mean-first passage times from A->B and B->A

msm_we.ensembles

Implements Ensemble class for managing trajectories

Adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis

class msm_we.ensembles.Ensemble(trajectories=None, verbose=False, dtype='float32', discrete=False, lag_time=1, **kwargs)[source]

Define a base class to store a list of space-continuous trajectories

e.g.: [ trajectory1, trajectory2,…], each trajectory is just a matrix: where each row is a snapshot, and each column represents the evolution of the corresponding variable.

add_trajectory(trajectory)[source]: Add a single trajectory to the ensemble

empirical_mfpts(stateA, stateB)[source]: Calculate mean first passage time using direct counting method

empirical_corr_function(stateA, stateB, times, symmetric=True)[source]: Calculate correlation function for trajectories

class msm_we.ensembles.PathEnsemble(trajectories=None, verbose=False, dtype='float32', discrete=False, lag_time=1, stateA=None, stateB=None, **kwargs)[source]

Derive a class to store trajectories for path analysis

classmethod from_ensemble(ensemble, stateA=None, stateB=None, map_function=None, discrete=False, dtype='float32')[source]: Build a PathEnsemble from an ensemble object or a set of trajectories

class msm_we.ensembles.DiscreteEnsemble(trajectories=None, verbose=False, dtype='int32', discrete=True, lag_time=1, **kwargs)[source]

Define a base class to store a list of space-discrete trajectories

classmethod from_ensemble(ens, map_function=None, dtype='int32')[source]: Build a DiscreteEnsemble from an ensemble object or a set of trajectories

classmethod from_transition_matrix(transition_matrix, sim_length=None, initial_state=0)[source]: Generates a DiscreteEnsemble from the transition matrix

class msm_we.ensembles.DiscretePathEnsemble(trajectories=None, verbose=False, dtype='int32', discrete=True, lag_time=1, stateA=None, stateB=None, **kwargs)[source]

Derive a class to store a list of discrete trajectories for path analysis

classmethod from_transition_matrix(transition_matrix, stateA=None, stateB=None, n_paths=1000, ini_pops=None, max_iters=1000000000)[source]

Construct a path ensemble from a transition matrix

stateA: list, intitial state

stateB: list, final state

n_paths: integer, with default value of 1000

number of paths to generate

ini_pops: list or label, probability distribution over the

initial state used to generate the path

possible values: a) None Use a uniform distribution over the states in stateA c) list A list with the explicit values of the populations in stateA that should be used to generate the ensemble

max_iters: integer, with default value of 1000000000

maximum of iterations for generating path trajectories

classmethod from_ensemble(ensemble, stateA, stateB, map_function=None)[source]: Build a DiscreteEnsemble from an ensemble object or a set of trajectories

nm_mfpt(ini_probs=None, n_states=None)[source]: Compute the mean-first passage time from the transition matrix

weighted_fundamental_sequences(transition_matrix=None, symmetric=True)[source]: Generate sorted fundamental sequences with weights

msm_we.nmm

“Non-Markovian” trajectory analysis

Adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis

class msm_we.nmm.NonMarkovModel(trajectories, stateA, stateB, lag_time=1, clean_traj=False, sliding_window=True, reversible=True, markovian=False, coarse_macrostates=False, **kwargs)[source]

Define a class for analyzing MD trajectories using Markovian or non-Markovian Model

from a list of 1D trajectories of integers representing macrostates

For example:

trajectories = [ [1 , 2, 0, …], [2, 2, 1, …], [3, 1, 2, …], …]

If only one sequence is given in trajectories, the format is the same:

trajectories = [ [1 , 2, 0, …] ]

Parameters:

(integer (lag_time) – Lag time of the model.
default – Lag time of the model.
(boolean) (sliding_window) – Use a sliding window of length lag_time to compute the count matrix
stateA – Define the initial and final macrostates in form of python lists for example: stateA=[0,2,5], stateB = [1]
lists) (stateB (python) – Define the initial and final macrostates in form of python lists for example: stateA=[0,2,5], stateB = [1]

n_states

Type:: int

nm_cmatrix

Stores the number of transitions between states, the i,j element cij stores the number of transitions observed from i to j.

Type:: array, with shape (2 n_states, 2 n_states)

populations[source]

Equilibrium population, the steady state solution of of the transition matrix

Type:: array, shape (n_states,)

fit()[source]: Fits the non-Markovian model from a list of sequences

classmethod from_nm_tmatrix(transition_matrix, stateA, stateB, sim_length=None, initial_state=0)[source]: Generates a discrete ensemble from the transition matrix

empirical_mfpts()[source]: Calculate mean first passage time using direct counting method

corr_function(times)[source]

Compute the correlation function for a set of times.

Parameters:: integers) (times (list of) – List of dt values used to compute the correlation function.
Return type:: List of floats with the correlation values for the dt given in times

class msm_we.nmm.MarkovPlusColorModel(trajectories, stateA, stateB, lag_time=1, clean_traj=False, sliding_window=True, hist_length=0, **kwargs)[source]

Define a class for analyzing MD trajectories using Markovian Plus Color Model

fit()[source]: Fits the markov plus color model from a list of sequences

msm_we.utils

Miscellaneous convenience functions

Some functionality was adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis

msm_we.utils.find_connected_sets(C, directed=True)[source]

This implementation is taken from msmtools.estimation.sparse.connectivity, at commit 9312660. See the original at https://github.com/markovmodel/msmtools/blob/devel/msmtools/estimation/sparse/connectivity.py#L30

Compute connected components for a directed graph with weights represented by the given count matrix.

Parameters:

C (scipy.sparse matrix or numpy ndarray) – square matrix specifying edge weights.
directed (bool, optional) – Whether to compute connected components for a directed or undirected graph. Default is True.

Returns:

cc – Each entry is an array containing all vertices (states) in the corresponding connected component.

Return type:

list of arrays of integers

msm_we.utils.is_connected(matrix, source_states, target_states, directed=True)[source]

Check for connectivity between two states. If directed is True, then checks for directional connectivity from source_states to target_states.

Parameters:

matrix (np.array, NxN) – Transition matrix
source_states (array-like, N) – Source states
target_states (array-like, N) – Target states
directed (bool, default=True) – Compute directional connectivity

Return type:

bool

msm_we.utils.inverse_iteration(guess, matrix, mu=1)[source]

Do one iteration of inverse iteration.

Parameters:

guess (array-like (N elements)) – Vector of weights to be used as the initial guess.
matrix (array-like (NxN elements)) – Transition matrix to use for inverse iteration.

Return type:

The new vector of weights after one iteration of inverse iteration.

class msm_we.utils.Interval(interval_set, n_variables)[source]

Intervals are in general defined as half-open interval [start,end).

in any case, in each dimension the interval is specified using a list [a,b] where a < b

For 1D (single) interval a single list in the form [a,b] has to be given
The union of multiple (1D) intervals can be specified as:
[[a,b],[c,d],…]
A list of lists [[a,b],[c,d],…] are used for a n-dimensional
intervals, one for each dimension, i.e, len(interval) = n_variables
A list-of-lists-of-lists for the mathematical union of n-dimensional
intervals’

[ [[a,b],[c,d],…], [[e,f],[g,h],…], … ]

msm_we.utils.reverse_sort_lists(list_1, list_2)[source]: Reverse sorting two list based on the first one

msm_we.utils.weighted_choice(list_, weights=None)[source]: Select an element from a list with probability from weights

msm_we.utils.get_shape(trajectory)[source]: Get the shape of a trajectory array in tuple (n_snapshots, n_variables)

msm_we.utils.num_of_nonzero_elements(my_vector)[source]: Returns the number of non-zero elements in a vector

msm_we.utils.normalize_markov_matrix(transition_matrix, reversible=False)[source]

Transform a matrix of positive elements to a markov-like matrix

by dividing each row by the sum of the elements of the row.

msm_we.utils.normalize(my_vector)[source]

Normalize a vector

by dividing each element by the total sum of all its elements

msm_we.utils.random_markov_matrix(n_states=5, seed=None)[source]: Returns a random transition markov matrix

msm_we.utils.check_tmatrix(t_matrix, accept_null_rows=True)[source]

Check if the given matrix is actually a row-stochastic transition matrix

i.e, all the elements are non-negative and the rows add to one.: If the keyword argument accept_null_rows is True, is going to accept rows where all the elements are zero. Those “problematic” states are going to be removed later if necessary by clean_tmatrix.

msm_we.utils.clean_tmatrix(transition_matrix, rm_absorbing=True)[source]

Removes the states/indexes with no transitions and that are absorbing

if the the keyword argument rm_absorbing is true Returns the “clean” transition matrix and a list with the removed states/indexes (clean_tmatrix, removed_states)

msm_we.utils.pops_from_tmatrix(transition_matrix)[source]

Calculate the eigen values and eigen vectors of the transposed transition matrix

Parameters:: transition_matrix (ndarray with shape = (n_states, n_states)) –
Return type:: the solution, p, of K.T p = p where K.T is the transposed transition matrix

msm_we.utils.pops_from_nm_tmatrix(transition_matrix)[source]

Computes the populations of the real/physical states

from a non-Markovian transtion matrix with shape (2*n_states, 2*n_states)

msm_we.utils.map_to_integers(sequence, mapping_dict=None)[source]: Map a sequence of elements to a sequence of integers for intance, maps [1, ‘a’, 1, ‘b’, 2.2] to [0, 1, 0, 2, 3]

msm_we.utils.pseudo_nm_tmatrix(markovian_tmatrix, stateA, stateB)[source]

Obtain a pseudo non-Markovian transition matrix from a Markovian transiton matrix

The pseudo Markovian matrix has a shape of (2 n_states, 2 n_states)