Documentation
History-augmented Markov state model estimation from WE data |
|
Discrepancy calculations and WE binning/allocation optimization |
|
First-passage time (FPT) calculations from trajectories or matrices |
|
Implements Ensemble class for managing trajectories |
|
"Non-Markovian" trajectory analysis |
|
Miscellaneous convenience functions |
|
Plugin for automated haMSM construction. |
|
Plugin for automated WE hyperparameter optimization. |
haMSM model building and analysis (msm_we.modelWE)
- class msm_we.modelWE[source]
History-augmented Markov state model estimation from WE data
Implementation of haMSM model building, particularly for steady-state estimation from recycling WE sampling with basis (source) and target (sink) states.
Set up for typical west.h5 file structure, with coordinates to be stored in west.h5 /iterations/auxdata/coord and basis and target definitions in progress coordinate space.
References
Copperman and Zuckerman, Accelerated estimation of long-timescale kinetics by combining weighted ensemble simulation with Markov model microstategs using non-Markovian theory, arXiv (2020).
Construction
- modelWE.build_analyze_model(file_paths, ref_struct, modelName, basis_pcoord_bounds, target_pcoord_bounds, dimreduce_method, tau, n_clusters, ray_kwargs={}, max_coord_iter=-1, stratified=True, streaming=True, use_ray=True, fluxmatrix_iters=[1, -1], fluxmatrix_iters_to_use=None, cross_validation_groups=2, cross_validation_blocks=4, show_live_display=True, allow_validation_failure=False, step_kwargs={})[source]
One-shot function to build the model and analyze all at once. This provides a convenient interface for running the blockwise estimation.
This may not be desirable for very long workflows, or workflows still being debugged, where it might make sense to run the individual steps one by one.
- Parameters:
file_paths (list) – List of paths to H5 files to analyze.
ref_struct (string) – Path to PDB file that defines topology.
modelName (string) – Name to use in output filenames.
basis_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the basis state
target_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the target state
dimreduce_method (str) – Dimensionality reduction method. “pca”, “vamp”, or “none”.
tau (float) – Resampling time (i.e. time of 1 WE iteration). Used to map fluxes to physical times.
n_clusters (int) – Number of clusters to use when clustering. This is clusters per bin for stratified, or total clusters for aggregate.
ray_kwargs (dict) – Keyword arguments passed to ray.init(). Useful for specifying num_cpus. You could also use this to connect to an existing Ray cluster.
max_coord_iter (int, optional (Default = model.maxIter, so all)) – Last iteration to obtain coordinates from. Useful for excluding the end of some data.
stratified (bool, optional (Default = True)) – Enables stratified clustering, where clustering is performed independently within each WE bin.
streaming (bool, optional (Default = True)) – Enables streaming over input data, rather than batch processing. Substantially improves memory efficiency, at a potential small performance hit.
use_ray (bool, optional (Default = True)) – Enable parallelization, using Ray. This provides substantial speedup in discretization and fluxmatrix calculations.
fluxmatrix_iters (list, optional (Default = [1, -1])) – List of [first, last] iteration to use when calculating fluxmatrix. Defaults to using all iterations.
fluxmatrix_iters_to_use (list, optional (Default = None)) – Specific range of iterations to use, as opposed to bounds like fluxmatrix_iters. Note that either this OR fluxmatrix_iters
cross_validation_groups (int, optional (Default = 2)) – Number of independent models to build when doing cross-validation. Each group contains (blocks / groups) blocks.
cross_validation_blocks (int, optional (Default = 4)) – Number of blocks to split your data into, before building the independent models.
show_live_display –
allow_validation_failure (bool, optional (Default = False)) – If True, then raise a warning but don’t fail if construction of a cross-validation model fails.
- modelWE.initialize(fileSpecifier, refPDBfile, modelName, basis_pcoord_bounds=None, target_pcoord_bounds=None, dim_reduce_method='none', tau=None, pcoord_ndim=1, auxpath='coord', _suppress_boundary_warning=False, use_weights_in_clustering=False)[source]
Initialize the model-builder.
- Parameters:
fileSpecifier (list) – List of paths to H5 files to analyze.
refPDBfile (string) – Path to PDB file that defines topology.
modelName (string) – Name to use in output filenames.
basis_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the basis state
target_pcoord_bounds (list) – List of [[pcoord0 lower bound, pcoord1 upper bound], [pcoord1 lower bound, pcoord1 upper bound], …] in pcoord-space for the target state
dim_reduce_method (str, default="none") – Dimensionality reduction method. “pca”, “vamp”, or “none”.
tau (float) – Resampling time (i.e. time of 1 WE iteration). Used to map fluxes to physical times.
pcoord_ndim (int, default=1) – Defaults to 1. Dimensionality of progress coordinates.
auxpath (str, default="coord") – Augmented coordinates used for MSM construction are stored in west.h5 under auxdata/auxpath
- Return type:
None
- modelWE.get_coordSet(last_iter, streaming=None, progress_bar=None)
Loads all coordinates and progress coordinates into memory for later usage.
If streaming, then this only loads pcoords
- Parameters:
last_iter –
self (modelWE) –
- modelWE.dimReduce(first_iter=1, first_rough_iter=None, last_iter=None, rough_stride=10, fine_stride=1, variance_cutoff=0.95, use_weights=True, progress_bar=None)
Dimensionality reduction using the scheme specified in initialization.
This just defines the dimensionality reduction scheme and builds the model – it does NOT actually transform the data!
- Transforming the data is performed via reduceCoordinates(), which uses self.coordinates as set
by this.
- Updates:
self.coordinates
self.ndim
- Return type:
None
- Parameters:
self (modelWE) –
- modelWE.cluster_coordinates(n_clusters, streaming=False, first_cluster_iter=None, use_ray=False, stratified=True, iters_to_use=None, store_validation_model=False, progress_bar=None, **_cluster_args)
- Parameters:
self (modelWE) –
- modelWE.get_fluxMatrix(n_lag, first_iter=1, last_iter=None, iters_to_use=None, use_ray=False, result_batch_size=5, progress_bar=None)
Compute the matrix of fluxes at a given lag time, for a range of iterations.
Checks if a file has been written named “<self.modelName>_s<first_iter>_e<last_iter>_lag<n_lag>_clust<self.n_clusters>.h5”. If this file exists, load it and recalculate if it was calculated at an earlier iteration. Otherwise, write it.
- Updates:
self.n_lag
self.errorWeight
self.errorCount
self.fluxMatrixRaw
- Parameters:
n_lag (int) – Number of lags to use.
first_iter (int) – First iteration to use.
last_iter (int) – Last iteration to use.
self (modelWE) –
- Return type:
None
- modelWE.organize_fluxMatrix(use_ray=False, progress_bar=None, **args)
This cleaning step removes all clusters that aren’t in the largest connected set, then rediscretizes all the trajectories according to the new reduced set of clusters.
- Parameters:
self (modelWE) –
- modelWE.do_block_validation(cross_validation_groups, cross_validation_blocks, use_ray=True, progress_bar=None)[source]
One way to estimate the uncertainty of your model is to split your data into blocks, compute models over groups of the blocks, and assess consistency between the groups.
The procedure here chops your data up into uniform blocks, by iteration. For example, with 100 iterations and 4 blocks, the blocks consist of iterations [0-25), [25-50), [50-75), [75-100). If the above blocks were assigned to 2 groups, the groups would consist of iterations ( [0-25), [50-75) ) and ( [25-50), [75-100) )
- Parameters:
cross_validation_groups (int) – Number of groups to assign blocks over
cross_validation_blocks (int) – Number of blocks to split data into
Analysis
- modelWE.get_Tmatrix()
Compute the transition matrix from the flux matrix. Corrects the “target” states to be true sink states.
- More specifically:
row-normalizes the flux matrix,
sets any states with 0 flux ot (i.e. sinks) to have 1.0
sets target bins to uniformly recycle into basis bins
- Updates:
self.Tmatrix
- Return type:
None
- Parameters:
self (modelWE) –
- modelWE.get_steady_state(flux_fractional_convergence=0.0001, max_iters=10)
” Get the steady-state distribution for the transition matrix. Uses scipy eigensolver to obtain an initial guess, then refines that using inverse iteration.
- Parameters:
flux_fractional_convergence ((optional, default=1e-4) float) – Convergence of the
max_iters ((optional, default=100) int) –
self (modelWE) –
Notes
Transition matrices generated from WE data may have entries spanning many orders of magnitude, and may have extremely high condition numbers. Furthermore, the smallest entries may be those near the target state, which are also the most important for computing target fluxes, meaning values near machine precision can’t just be truncated. All this means that floating-point error may substantially affect the results of eigensolvers, and may produce bad/negative/inaccurate values for small probability bins.
In order to obtain better estimates, sparse matrices are used to reduce the number of floating point operations being performed. A stationary distribution is first estimated using scipy’s eigensolver for sparse matrices. This is then used as an initial guess for the inverse iteration method, to further refine it. Convergence of the inverse iteration is determined using change in the flux estimate.
- modelWE.get_steady_state_target_flux(pSS=None, _set=True)
Get the total flux into the target state(s).
- Updates:
self.lagtime
self.JtargetSS
- Parameters:
pSS ((optional) array-like) – Steady-state distribution. If nothing provided, then use self.pSS
_set ((optional) boolean) – If True, then update self.JtargetSS and self.lagtime. If False, then just return the value of JtargetSS.
self (modelWE) –
- Return type:
None
- modelWE.get_committor(conv=1e-05)
Iteratively obtain an estimate of the committor.
Take the flux matrix, and normalize it into a transition matrix.
- Apply two-sided absorbing boundary conditions by setting self-transition probabilities for the basis and
target states to 1.0, and all transitions out to 0.0.
- Starting with an initial committor “guess” of all 1s, iteratively multiply the guess by the transition matrix
until convergence is below conv.
- Updates:
self.q
- Parameters:
conv (numerical) – Convergence criteria for committor calculation. Calculation stops when the total difference between q_p and q is less than this.
self (modelWE) –
- modelWE.get_flux()
Get the measured flux (i.e. from the flux matrix) into the target.
- Parameters:
self (modelWE) –
- modelWE.get_cluster_centers()
Standalone method to obtain average pcoords of all segments in each cluster.
This functionality is contained in organize_fluxMatrix.
- Parameters:
self (modelWE) –
- modelWE.update_cluster_structures(build_pcoord_cache=False)
Find structures (i.e. sets of coordinates) corresponding to each clusters.
- Parameters:
build_pcoord_cache (bool) – If True, builds self.pcoord_cache, which has elements [cluster_idx][seg_idx] holding the pcoord for the seg_idx’th segment in MSM cluster cluster_idx.
self (modelWE) –
- Returns:
in that cluster.
- Return type:
A dictionary where the keys are cluster indices, and the values are lists of coordinates (structures)
Plotting
- modelWE.plot_flux(custom_name=None, ax=None, save=False, suppress_validation=False, _from_colors=None, _to_colors=None, pcoord_to_use=0, **_plot_args)
Make, and save, a plot of the fluxes along the RMSD. get_flux() must be run before this.
- Parameters:
custom_name (str (optional)) – The name for the saved plot. Defaults to flux_s<first iter>_e<last iter>.png
self (modelWE) –
Optimization (msm_we.optimization)
Discrepancy calculations and WE binning/allocation optimization
- msm_we.optimization.solve_discrepancy(tmatrix, pi, B)[source]
Given a transition matrix, solves for the discrepancy function.
The Poisson equation for the discrepancy function is
\[(I - K)h = 1_B - \pi(B), \:\: h \cdot \pi = 0\]however, since \(I-K\) is singular, we instead solve
\[(I - K + \pi \pi^T / || \pi ||^2_2)h = 1_B - \pi(B), \:\: h \cdot \pi = 0\]where \(h\) is a volumn vector, 1_B is an indicator function which is 1 in B and 0 everywhere else, \(\pi\) is the steady-state solution of \(K\), and pi(B) is a column vector with the steady-state value of \(\pi(B)\) in every element.
- Parameters:
tmatrix (2D array-like,) – Transition matrix
pi (array-like,) – Steady-state distribution for the input transition matrix
B (array-like,) – Indices of target states B
- Return type:
(discrepancy, variance)
- msm_we.optimization.get_uniform_mfpt_bins(variance, discrepancy, steady_state, n_desired_we_bins)[source]
Implements the MFPT-binning strategy described in [1], where bins are groups of microstates that are uniformly spaced in the integral of pi * v
- Parameters:
variance (Variance function) –
array-like (Steady-state distribution) –
discrepancy (Discrepancy function) –
array-like –
steady_state (Steady-state distribution) –
array-like –
int (n_desired_we_bins) – less any recycling or basis bins
References
[1] Aristoff, D., Copperman, J., Simpson, G., Webber, R. J. & Zuckerman, D. M. Weighted ensemble: Recent mathematical developments. Arxiv (2022).
- msm_we.optimization.get_clustered_mfpt_bins(variance, discrepancy, steady_state, n_desired_we_bins, seed=None)[source]
- Implements the MFPT-binning strategy described in [1], where bins are groups of microstates that are obtained by
k-means clustering on the integral of pi * v
- Parameters:
variance (Variance function) –
array-like (Steady-state distribution) –
discrepancy (Discrepancy function) –
array-like –
steady_state (Steady-state distribution) –
array-like –
int (n_desired_we_bins) –
- Return type:
An array where each element is the WE bin index assigned to an haMSM microstate.
References
[1] Aristoff, D., Copperman, J., Simpson, G., Webber, R. J. & Zuckerman, D. M. Weighted ensemble: Recent mathematical developments. Arxiv (2022).
- class msm_we.optimization.OptimizedBinMapper(*args, **kwargs)[source]
- create_new(nbins, n_original_pcoord_dims, target_pcoord_bounds, basis_pcoord_bounds, previous_binmapper, microstate_mapper, stratified_clusterer, cluster_on_pcoord=False, *args, **kwargs)[source]
Creates an OptimizedBinMapper, suitable for use with the optimization workflow
- Parameters:
nbins (int, Number of WE bins) –
n_original_pcoord_dims (int, Number of dimensions in the original user-supplied progress coordinate) –
microstate_mapper (dict, Mapping of microstates to WE bins) –
stratified_clusterer (StratifiedClusters) –
cluster_on_pcoord (bool) –
WESTPA Plugins (msm_we.westpa_plugins)
- class msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver(sim_manager, plugin_config)[source]
WESTPA plugin to augment west.h5 with auxiliary coordinates.
After each iteration, appends coordinates to iter_XXX/auxdata/coord, for later usage with haMSM construction.
Can be used by including the following entries in your west.cfg:
west: plugins: - plugin: msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver topology_file: path/to/topology.pdb child_traj_filename: name of segment trajectory file in traj_segs/<iter>/<seg>/ parent_traj_filename: name of parent trajectory file in traj_segs/<iter>/<seg>/
- augment_coordinates()[source]
After propagation completes in a WE iteration, this populates auxdata/coord with the coordinates.
Looks for parent/child trajectory files in the segment data_ref path defined in west.cfg, named according to parent_traj_filename and child_traj_filename in the plugin config.
Plugin for automated haMSM construction.
- class msm_we.westpa_plugins.hamsm_driver.HAMSMDriver(sim_manager, plugin_config)[source]
WESTPA plugin to construct an haMSM.
Can be used by including the following entries in your west.cfg:
west: plugins: # - plugin: An augmentation plugin is also required, such as # msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver - plugin: msm_we.westpa_plugins.hamsm_driver.HAMSMDriver model_name: Name for the model n_clusters: Number of clusters to place in each WE bin (see stratified clustering for more details) tau: WESTPA resampling time in physical units basis_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...] target_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...] first_analysis_iter: Integer of the first iteration of the prior marathon to use for analysis. I.e., setting this to 10 would use only iterations 10-end for haMSM construction. This can be useful for omitting some burn-in after a restart. dim_reduce_method: A string specifying a dimensionality reduction method for :meth:`msm_we.msm_we.modelWE.dimReduce` featurization: An importable python method implementing a featurization for :meth:`msm_we.msm_we.modelWE.processCoordinates` n_cpus: Number of CPUs to use with Ray
- class msm_we.westpa_plugins.restart_driver.RestartDriver(sim_manager, plugin_config)[source]
WESTPA plugin to automatically handle estimating steady-state from a WE run, re-initializing a new WE run in that steady-state, and then running that initialized WE run.
Data from the previous run will be stored in the restart<restart_number>/ subdirectory of $WEST_SIM_ROOT.
This plugin depends on having the start-states implementation in the main WESTPA code, which allows initializing a WE run using states that are NOT later used for recycling.
These are used so that when the new WE run is initialized, initial structure selection is chosen by w_init, using weights assigned to the start-states based on MSM bin weight and WE segment weight.
Since it closes out the current WE run and starts a new one, this plugin should run LAST, after all other plugins.
Can be used by including the following entries in your west.cfg:
west: plugins: # - plugin: An augmentation plugin is also required, such as # msm_we.westpa_plugins.augmentation_driver.MDAugmentationDriver - plugin: msm_we.westpa_plugins.restart_driver.RestartDriver n_restarts: Number of total restarts to do extension_iters: Amount of iterations to extend runs by if no runs have reached the target by the first restart. n_runs: Number of runs to do between restarts n_restarts_to_use: Number of restarts to use. Can be a fraction, to use the last fraction amount, or a negative integer, to use the last N. initialization_file: restart_initialization.json model_name: Name for the model n_clusters: Number of clusters to place in each WE bin (see stratified clustering for more details) basis_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...] target_pcoord_bounds: [[pcoord dim 0 lower bound, upper bound], [pcoord dim 1 lower, upper], ...] dim_reduce_method: A string specifying a dimensionality reduction method for :meth:`msm_we.msm_we.modelWE.dimReduce` featurization: A python module implementing a featurization for msm_we.msm_we.modelWE.processCoordinates n_cpus: Number of CPUs to use with Ray- get_original_bins()[source]
Obtains the WE bins and their probabilities at the end of the previous iteration.
- Returns:
bins (np.ndarray) – Array of WE bins
binprobs (np.ndarray) – WE bin weights
- property cur_iter
Get the current WE iteration.
- Returns:
int
- Return type:
The current iteration. Subtract one, because in finalize_run the iter has been incremented
- property is_last_iteration
Get whether this is, or is past, the last iteration in this WE run.
- Returns:
bool
- Return type:
Whether the current iteration is the final iteration
- prepare_extension_run(run_number, restart_state, first_extension=False)[source]
Copy the necessary files for an extension run (versus initializing a fresh run)
- Parameters:
run_number (int) – The index of this run (should be 1-indexed!)
restart_state (dict) – Dictionary holding the current state of the restarting procedure
first_extension (bool) – True if this is the first run of an extension set. If True, then back up west.cfg, and write the extended west.cfg file.
- prepare_new_we()[source]
This function prepares a new WESTPA simulation using haMSM analysis to accelerate convergence.
The marathon functionality does re-implement some of the functionality of w_multi_west. However, w_multi_west merges independent WE simulations, which may or may not be desirable. I think for the purposes of this, it’s good to keep the runs completely independent until haMSM model building. Either that, or I’m just justifying not having known about w_multi_west when I wrote this. TBD.
The algorithm is as follows:
Check to see if we’ve just completed the final iteration
Handle launching multiple runs, if desired
Build haMSM
Obtain structures for each haMSM bin
Make each structure a start-state, with probability set by (MSM-bin SS prob / # structures in bin)
Potentially some renormalization?
Start new WE simulation
Plugin for automated WE hyperparameter optimization.
- class msm_we.westpa_plugins.optimization_driver.OptimizationDriver(sim_manager, plugin_config)[source]
WESTPA plugin to automatically handle performing optimization.
Using an haMSM, updates binning and allocation according to user-specified optimization algorithms. An OptimizedBinMapper is constructed from the optimized binning and allocation, and WE is continued with the new mapper.
Can be used by including the following entries in your west.cfg:
west: plugins: - plugin: msm_we.westpa_plugins.optimization_driver.OptimizationDriver full_coord_map: A pickled dictionary mapping discrete states to full-coordinate structures max_iters: Number of total iterations. WE will run for west.system.max_iters, perform optimization, and continue for another west.system.max_iters, up to this value. # The following parameters are optional, and provided as an example. binning_strategy: An arbitrary python function defining a bin optimization scheme. Takes in an msm_we.msm_we.modelWE and returns an array-like of length n_msm_states, where each element is the index of the WE bin that MSM state will be assigned to by the OptimizedMapper. allocation_strategy: An arbitrary python function defining an allocation optimization scheme. Takes in an msm_we.msm_we.modelWE and returns an array of integer walker allocations for the WE bins.
- do_optimization()[source]
Update WESTPA with an optimized bin mapper, bin allocation, and extend the progress coordinate. Then, continue the WE for more iterations.
- static default_allocation_optimizer(model)[source]
A (trivial) example allocation optimization function, which returns an array with the target number of walkers in each bin.
- compute_optimized_allocation()[source]
Compute the optimal allocation.
If plugin.allocation_strategy is None or not provided, the allocation is not updated.
Otherwise, the constructed haMSM is passed to an arbitrary function that returns an array-like describing the new walker allocation over the WE bins.
- static default_bin_optimizer(model)[source]
Example bin optimization function, which assigns microstates to WE bins.
- compute_optimized_bins()[source]
Computes discrepancy and variance, and returns the resulting optimized bin mapper.
If plugin.binning_strategy is None or not provided,
optimization.get_clustered_mfpt_bins()is used.Otherwise, the constructed haMSM is passed to an arbitrary function that returns an array-like with the WE bin index of all MSM microbins excluding the basis/target (model.indBasis and model.indTargets).
- Return type:
An OptimizedBinMapper
- compute_new_pcoord_map()[source]
SynD specific: Compute a new progress coordinate mapping.
- Returns:
A dictionary of {state indices
- Return type:
extended progress coordinates}
- update_westpa_pcoord(new_pcoord_map)[source]
Changing a progress coordinate during a WE run requires a number of changes in WESTPA’s internal state. This handles making those, so you can call w_run and continue with the new, changed pcoord
- Parameters:
new_pcoord_map (A dictionary mapping discrete states to the new, extended pcoord) –
FPT Calculations
msm_we.fpt
First-passage time (FPT) calculations from trajectories or matrices
Adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis
- class msm_we.fpt.MatrixFPT[source]
Define a base class for calculating FPTs using transition matrix
- classmethod mean_fpts(tmatrix, stateA, stateB, lag_time=1)[source]
Calculting mean-first passave time for at transition matrix
Derived class will implement this function differently.
- classmethod directional_mfpt(transition_matrix, stateA, stateB, ini_probs=None, lag_time=1)[source]
Computes the mean-first passage time in a single direction using a recursive procedure
This method is useful when there is no B->A ensemble but only A->B transitions, for instance when B is absorbing.
- Parameters:
transition_matrix (Numpy 2D array) –
stateA (List of integers) – Both states are a list of indexes.
stateB (List of integers) – Both states are a list of indexes.
ini_probs (List of float, default is None) – initial probabilities in stateA
lag_time (integer) – Lag time used, the trajectory is “observed” every lag_time time steps
- Return type:
mean-first passage time from A->B
- classmethod mfpts_to_target_microstate(transition_matrix, target, lag_time=1)[source]
Computes all the mean-first passage to a target microstate (k)
Returns a list where the i-element is mfpt(i->k). This function is useful to compute the mfpt matrix.
- Parameters:
transition_matrix (Numpy 2D array) –
target (Integer number that specifies the index of the state. The indexes) – should be consistent with the transition matrix and python (i.e. starting from 0)
lag_time (Integer) – Lag time used, the trajectory is “observed” every lag_time time steps
- Returns:
a list where the i-element is mfpt(i->k). This function is
useful to compute the mfpt matrix.
- classmethod mfpts_matrix(transition_matrix, lag_time=1)[source]
Calculate MFPT matrix, i.e., the matrix where the ij-element is MFPT(i->j)
- Parameters:
transition_matrix (Numpy 2D array) –
lag_time (Integer) – Lag time used, the trajectory is “observed” every lag_time time steps
- Return type:
mean-first passage time matrix with ij-element of MFPT(i->j)
- classmethod min_commute_time(matrix_of_mfpts)[source]
Calculate minimum commuting time (round trip time) between all pairs
of microstates from the matrix of mfpts. It also returns the indexes of the pair of microstates involved.
- Parameters:
matrix_of_mfpts (Numpy 2D array) – matrix of MFPTs with ij-element of MFPT(i->j)
- Returns:
Minimum commuting time (round trip time) between all pairs
of microstates and the indexes of the pair of microstates involved.
- classmethod max_commute_time(matrix_of_mfpts)[source]
Calculate maximum commuting time (round trip time) between all pairs
of microstates from the matrix of mfpts. It also returns the indexes of the pair of microstates involved.
- Parameters:
matrix_of_mfpts (Numpy 2D array) – matrix of MFPTs with ij-element of MFPT(i->j)
- Returns:
Maximum commuting time (round trip time) between all pairs
of microstates and the indexes of the pair of microstates involved.
- classmethod fpt_distribution(t_matrix, initial_state, final_state, initial_distrib, min_power=1, max_power=12, max_n_lags=100, lag_time=1, dt=1.0, clean_recycling=False, logscale=False)[source]
Calculated distribution of first passage times from transition matrix
- Parameters:
t_matrix (Numpy 2D array) –
initial_state (List of integer numbers) – Specifies the indexes of initial and final states.
final_state (List of integer numbers) – Specifies the indexes of initial and final states.
initial_distrib (List of float, default is None) – initial probabilities for initial states
min_power (Integer) – The minimum and maximum power when the FPT distribution is shown in logscale such as (10^min_power, 10^max_power)*lag_time*dt.
max_power (Integer) – The minimum and maximum power when the FPT distribution is shown in logscale such as (10^min_power, 10^max_power)*lag_time*dt.
max_n_lags (Integer) – maximum number of lags when the FPT distribution is shown in linear scale such as (0, max_n_logs)*lag_time*dt. When in logscale, this is number of points to shown in the range of (10^min_power, 10^max_power)*lag_time*dt.
lag_time (Integer) – Lag time used, the trajectory is “observed” every lag_time time steps
dt (Float) – Time step
clean_recycling (Bool) – Cleaning the recycling of steady state simulation if True
logscale (Bool) – Option to use logscale for FPT time in the distribution
- Return type:
Distributions of first passage times
- static adaptive_fpt_distribution(Tmatrix, initial_states, initial_state_probs, target_states, tau=1, increment=5, fine_increment=1.2, relevant_thresh=0.0001, max_steps=1000000, max_time=inf, explicit_renormalization=False, verbose=False)[source]
Adaptively computes a first-passage time distribution.
Starting at t=tau, compute the probability flowing into the target at t. Then, increment t by multiplying it by the coarse increment. When relevant_thresh probability has entered the target state, step back to the previous coarse state, and swap over to incrementing with the fine increment. This allows you to efficiently sweep log-space.
Procedurally, this starts probability in specified initial_states according to initial_state_probs, and then propagates that probability through the transition matrix. The FPT distribution is measured by tracking new probability entering the target state at each time.
Note that absorbing boundary conditions are stripped from the transition matrix – if this is not done, then the result is like a probability CDF, not a probability distribution.
- Parameters:
Tmatrix (array-like) – Transition matrix
initial_states (array-like of ints) – List of initial states to start probability in
initial_state_probs (array-like) – Probability distribution across the initial states.
target_states (array-like) – Target states for MFPT.
tau –
increment (float) – Multiplicative increment for coarse steps
fine_increment (float) – Multiplicative increment for fine steps, once the minimum probability in the target has been reached.
relevant_thresh (float) – Amount of probability that must be in the target before switching to fine increments.
max_steps (int) – Maximum number of steps to run
max_time (float) – Maximum time to run to
explicit_renormalization (bool) – Whether to explicitly renormalize the transition matrix. This should not be necessary – if it is, there’s probably some numerical instability you should be careful of.
verbose (bool) – Produce verbose text output.
- Returns:
FPT distribution,
probability distribution at each time,
last step index,
times at which FPT distribution was evaluated
- class msm_we.fpt.MarkovFPT[source]
Derived a class for calculating FPTs using Markovian transition matrix
- classmethod mean_fpts(markov_tmatrix, stateA, stateB, lag_time=1)[source]
Computes mean first passage times using Markovian transition matrix
in both directions A->B and B->A from a markov model. The MFPTs computed in this way are directly comparable with the values obtained by a long back and forth simulation between the target states.
- Parameters:
markov_matrix (Numpy 2D array) – Markovian transition matrix
stateA (List of integers) – Both states are a list of indexes.
stateB (List of integers) – Both states are a list of indexes.
lag_time (integer) – Lag time used, the trajectory is “observed” every lag_time time steps
- Return type:
mean-first passage times from A->B and B->A
- classmethod markov_commute_time(transition_matrix, stateA, stateB, lag_time=1)[source]
Computing commute time for Markovian Model
- Parameters:
transition_matrix (Numpy 2D array) – Markovian transition matrix
stateA (List of integers) – Both states are a list of indexes.
stateB (List of integers) – Both states are a list of indexes.
lag_time (integer) – Lag time used, the trajectory is “observed” every lag_time time steps
- Return type:
Commute time from mean-first passage times
- class msm_we.fpt.NonMarkovFPT[source]
Derived a class for calculating FPTs using Non Markov transition matrix
- classmethod mean_fpts(nm_transition_matrix, stateA, stateB, lag_time=1)[source]
Computes the mean first passage times from a non-markovian model
in both directions of A->B and B->A. The shape of the transition matrix should be (2*n_states, 2*n_states). :param nm_transition_matrix: Non-Markovian transition matrix :type nm_transition_matrix: Numpy 2D array :param stateA: Both states are a list of indexes. :type stateA: List of integers :param stateB: Both states are a list of indexes. :type stateB: List of integers :param lag_time: Lag time used, the trajectory is “observed” every lag_time time steps :type lag_time: integer
- Return type:
mean-first passage times from A->B and B->A
msm_we.ensembles
Implements Ensemble class for managing trajectories
Adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis
- class msm_we.ensembles.Ensemble(trajectories=None, verbose=False, dtype='float32', discrete=False, lag_time=1, **kwargs)[source]
Define a base class to store a list of space-continuous trajectories
- e.g.: [ trajectory1, trajectory2,…], each trajectory is just a matrix
where each row is a snapshot, and each column represents the evolution of the corresponding variable.
- class msm_we.ensembles.PathEnsemble(trajectories=None, verbose=False, dtype='float32', discrete=False, lag_time=1, stateA=None, stateB=None, **kwargs)[source]
Derive a class to store trajectories for path analysis
- class msm_we.ensembles.DiscreteEnsemble(trajectories=None, verbose=False, dtype='int32', discrete=True, lag_time=1, **kwargs)[source]
Define a base class to store a list of space-discrete trajectories
- class msm_we.ensembles.DiscretePathEnsemble(trajectories=None, verbose=False, dtype='int32', discrete=True, lag_time=1, stateA=None, stateB=None, **kwargs)[source]
Derive a class to store a list of discrete trajectories for path analysis
- classmethod from_transition_matrix(transition_matrix, stateA=None, stateB=None, n_paths=1000, ini_pops=None, max_iters=1000000000)[source]
Construct a path ensemble from a transition matrix
stateA: list, intitial state
stateB: list, final state
- n_paths: integer, with default value of 1000
number of paths to generate
- ini_pops: list or label, probability distribution over the
initial state used to generate the path
possible values: a) None Use a uniform distribution over the states in stateA c) list A list with the explicit values of the populations in stateA that should be used to generate the ensemble
- max_iters: integer, with default value of 1000000000
maximum of iterations for generating path trajectories
- classmethod from_ensemble(ensemble, stateA, stateB, map_function=None)[source]
Build a DiscreteEnsemble from an ensemble object or a set of trajectories
msm_we.nmm
“Non-Markovian” trajectory analysis
Adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis
- class msm_we.nmm.NonMarkovModel(trajectories, stateA, stateB, lag_time=1, clean_traj=False, sliding_window=True, reversible=True, markovian=False, coarse_macrostates=False, **kwargs)[source]
Define a class for analyzing MD trajectories using Markovian or non-Markovian Model
from a list of 1D trajectories of integers representing macrostates
For example:
trajectories = [ [1 , 2, 0, …], [2, 2, 1, …], [3, 1, 2, …], …]
If only one sequence is given in trajectories, the format is the same:
trajectories = [ [1 , 2, 0, …] ]
- Parameters:
(integer (lag_time) – Lag time of the model.
default – Lag time of the model.
(boolean) (sliding_window) – Use a sliding window of length lag_time to compute the count matrix
stateA – Define the initial and final macrostates in form of python lists for example: stateA=[0,2,5], stateB = [1]
lists) (stateB (python) – Define the initial and final macrostates in form of python lists for example: stateA=[0,2,5], stateB = [1]
- n_states
- Type:
int
- nm_cmatrix
Stores the number of transitions between states, the i,j element cij stores the number of transitions observed from i to j.
- Type:
array, with shape (2 n_states, 2 n_states)
- populations[source]
Equilibrium population, the steady state solution of of the transition matrix
- Type:
array, shape (n_states,)
msm_we.utils
Miscellaneous convenience functions
Some functionality was adapted from the original NMpathAnalysis package, https://github.com/ZuckermanLab/NMpathAnalysis
- msm_we.utils.find_connected_sets(C, directed=True)[source]
This implementation is taken from msmtools.estimation.sparse.connectivity, at commit 9312660. See the original at https://github.com/markovmodel/msmtools/blob/devel/msmtools/estimation/sparse/connectivity.py#L30
Compute connected components for a directed graph with weights represented by the given count matrix.
- Parameters:
C (scipy.sparse matrix or numpy ndarray) – square matrix specifying edge weights.
directed (bool, optional) – Whether to compute connected components for a directed or undirected graph. Default is True.
- Returns:
cc – Each entry is an array containing all vertices (states) in the corresponding connected component.
- Return type:
list of arrays of integers
- msm_we.utils.is_connected(matrix, source_states, target_states, directed=True)[source]
Check for connectivity between two states. If directed is True, then checks for directional connectivity from source_states to target_states.
- Parameters:
matrix (np.array, NxN) – Transition matrix
source_states (array-like, N) – Source states
target_states (array-like, N) – Target states
directed (bool, default=True) – Compute directional connectivity
- Return type:
bool
- msm_we.utils.inverse_iteration(guess, matrix, mu=1)[source]
Do one iteration of inverse iteration.
- Parameters:
guess (array-like (N elements)) – Vector of weights to be used as the initial guess.
matrix (array-like (NxN elements)) – Transition matrix to use for inverse iteration.
- Return type:
The new vector of weights after one iteration of inverse iteration.
- class msm_we.utils.Interval(interval_set, n_variables)[source]
Intervals are in general defined as half-open interval [start,end).
in any case, in each dimension the interval is specified using a list [a,b] where a < b
For 1D (single) interval a single list in the form [a,b] has to be given
- The union of multiple (1D) intervals can be specified as:
[[a,b],[c,d],…]
- A list of lists [[a,b],[c,d],…] are used for a n-dimensional
intervals, one for each dimension, i.e, len(interval) = n_variables
- A list-of-lists-of-lists for the mathematical union of n-dimensional
intervals’
[ [[a,b],[c,d],…], [[e,f],[g,h],…], … ]
- msm_we.utils.reverse_sort_lists(list_1, list_2)[source]
Reverse sorting two list based on the first one
- msm_we.utils.weighted_choice(list_, weights=None)[source]
Select an element from a list with probability from weights
- msm_we.utils.get_shape(trajectory)[source]
Get the shape of a trajectory array in tuple (n_snapshots, n_variables)
- msm_we.utils.num_of_nonzero_elements(my_vector)[source]
Returns the number of non-zero elements in a vector
- msm_we.utils.normalize_markov_matrix(transition_matrix, reversible=False)[source]
Transform a matrix of positive elements to a markov-like matrix
by dividing each row by the sum of the elements of the row.
- msm_we.utils.normalize(my_vector)[source]
Normalize a vector
by dividing each element by the total sum of all its elements
- msm_we.utils.random_markov_matrix(n_states=5, seed=None)[source]
Returns a random transition markov matrix
- msm_we.utils.check_tmatrix(t_matrix, accept_null_rows=True)[source]
Check if the given matrix is actually a row-stochastic transition matrix
- i.e, all the elements are non-negative and the rows add to one.
If the keyword argument accept_null_rows is True, is going to accept rows where all the elements are zero. Those “problematic” states are going to be removed later if necessary by clean_tmatrix.
- msm_we.utils.clean_tmatrix(transition_matrix, rm_absorbing=True)[source]
Removes the states/indexes with no transitions and that are absorbing
if the the keyword argument rm_absorbing is true Returns the “clean” transition matrix and a list with the removed states/indexes (clean_tmatrix, removed_states)
- msm_we.utils.pops_from_tmatrix(transition_matrix)[source]
Calculate the eigen values and eigen vectors of the transposed transition matrix
- Parameters:
transition_matrix (ndarray with shape = (n_states, n_states)) –
- Return type:
the solution, p, of K.T p = p where K.T is the transposed transition matrix
- msm_we.utils.pops_from_nm_tmatrix(transition_matrix)[source]
Computes the populations of the real/physical states
from a non-Markovian transtion matrix with shape (2*n_states, 2*n_states)