API Documentation
- class domhmm.PropertyCalculation(universe_or_atomgroup: Universe | AtomGroup, membrane_select: str = 'all', gmm_kwargs: None | dict = None, hmm_kwargs: None | dict = None, leaflet_kwargs: Dict[str, Any] = {}, leaflet_select: None | AtomGroup | str | list = None, heads: Dict[str, Any] = {}, tails: Dict[str, Any] = {}, sterol_heads: Dict[str, Any] = {}, sterol_tails: Dict[str, Any] = {}, tmd_protein_list: None | AtomGroup | str | list = None, frac: float = 0.5, p_value: float = 0.05, leaflet_frame_rate: None | int = None, sterol_frame_rate: int = 1, asymmetric_membrane: bool = False, verbose: bool = False, result_plots: bool = False, trained_hmms: Dict[str, Any] = {}, n_init_hmm: int = 2, save_plots: bool = False, do_clustering=True, **kwargs)[source]
Bases:
LeafletAnalysisBaseThe DirectorOrder class calculates an order parameter for each selected lipid according to the formula:
P2 = 0.5 * (3 * cos(a)^2 - 1), (1)
where an is the angle between a pre-defined director in the lipid (e.g. C-H or CC) and a reference axis.
- GMM(gmm_kwargs)[source]
Fit Gaussian Mixture
- Parameters:
gmm_kwargs (dict) – Additional parameters for mixture.GaussianMixture
- HMM(hmm_kwargs)[source]
Create Gaussian based Hidden Markov Models for each residue type
- Parameters:
hmm_kwargs (dict) – Additional parameters for hmmlearn.hmm.GaussianHMM
- static area_per_lipid_vor(coor_xy, boxdim, frac)[source]
Calculation of the area per lipid employing Voronoi tessellation on coordinates mapped to the xy plane. The function takes also the periodic boundary conditions of the box into account.
- Parameters:
coor_xy (numpy.ndarray) – Coordinates of upper/lower leaflet
boxdim (array) – Length of box vectors in all directions
frac (float) – Fraction of box length in x and y outside the unit cell considered for Voronoi calculation
- Returns:
vor (Voronoi Tesselation) – Scipy’s Voronoi Diagram object
apl (Area per lipid) – Area per lipid based on Scipy’s Voronoi Diagram
pbc_idx (Indices) – Unit cell indices for periodic image coordinates
- static assign_core_lipids(weight_matrix_f, g_star_i_f, order_states_f, w_ii_f, z_score)[source]
Assign lipids as core members (aka lipids with a high positive autocorrelation) depending on the Getis-Ord spatial local autocorrelation statistic.
- Parameters:
weight_matrix_f (numpy.ndarray) – Matrix containing the weight factors between all lipid pairs at one time step
g_star_i_f (numpy.ndarray) – Getis-Ord spatial local autocorrelation statistic for every lipid at one time step
order_states_f (numpy.ndarray) – Order states for every lipid at one time step
w_ii_f (numpy.ndarray) – Self-influence weight factor for every lipid at one time step
z_score (dict) – Contains boundary of the reaction region
- Returns:
core_lipids – Contains a TRUE value if the lipid is a core member, otherwise it FALSE
- Return type:
numpy.ndarray (bool)
- static calc_order_parameter(chain)[source]
Calculate average Scc order parameters per acyl chain according to the equation:
S_cc = (3 * cos( theta )^2 - 1) / 2,
where theta describes the angle between the z-axis of the system and the vector between two subsequent tail beads.
- Parameters:
chain (Selection) – MDAnalysis Selection object
- Returns:
s_cc – Mean S_cc parameter for the selected chain of each residue
- Return type:
numpy.ndarray
- clustering_plot()[source]
Runs hierarchical clustering and plots clustering results in different frames.
- fit_hmm(data, gmm, hmm_kwargs, n_repeats=10)[source]
Fit several HMM models to the data and return the best one.
- Parameters:
data (numpy.ndarray) – Data of the single lipid properties for one lipid type at each time step
gmm (GaussianMixture Model) – Scikit-learn object
hmm_kwargs (dict) – Additional parameters for Hidden Markov Model
n_repeats (int) – Number of independent fits
- Returns:
best_ghmm – hmmlearn object
- Return type:
GaussianHMM
- get_leaflet_step_index_to_resid(leaflet, step)[source]
Returns leaflet index to residue id map for a specific leaflet at specific frame of the trajectory
- Parameters:
leaflet (int) – leaflet index
step (int) – step index
- Returns:
result_map – dictionary contains leaflet index of residue as key and residue id as value
- Return type:
- get_leaflet_step_order(leaflet, step)[source]
Receive residue’s order state with respect to the leaflet
- Parameters:
leaflet (int) – leaflet index
step (int) – step index
- Returns:
order_states – Numpy array contains order state results of the leaflet at step in order of system’s residues
- Return type:
numpy.ndarray
- get_leaflet_step_order_index(leaflet, step)[source]
Receive residue’s indexes and positions for a specific leaflet at any frame of the trajectory.
- Parameters:
leaflet (int) – leaflet index
step (int) – step index
- Returns:
indexes (dict) – dictionary contains numpy.arrays containing residue’s indexes for each unique residue type
positions (dict) – dictionary contains numpy.arrays containing residue’s positions for each unique residue type
- static get_residue_idx(resids_index_map, resids)[source]
Checks resids index map (lookup table) to return corresponding indexes to use in DomHMM result section.
- Parameters:
resids_index_map (dict) – Lookup table where each residue id’s index in self.membrane_unique_resids is kepts
resids (numpy.ndarray) – Residue id list for indexing
- Returns:
idx – Indexes in the same order with resids
- Return type:
numpy.ndarray
- getis_ord_stat(weight_matrix_all, leaflet)[source]
Getis-Ord Local Spatial Autocorrelation Statistics calculation based on the predicted order states of each lipid and the weighting factors between neighbored lipids.
- Parameters:
weight_matrix_all (sparse.csr_array) – Weight matrix for all lipid in a leaflet at current time step
leaflet (int) – 0 = upper leaflet, 1 = lower leafet
- Returns:
g_star_i (list of numpy.ndarrays) – G*i values for each lipid at each time step
w_ii_all (list of numpy.ndarrays) – Self-influence of the lipids
- static hierarchical_clustering(weight_matrix_f, w_ii_f, core_lipids)[source]
Hierarchical clustering approach to identify spatial related Lo domains.
- Parameters:
weight_matrix_f (numpy.ndarray) – Matrix containing the weight factors between all lipid pairs at one time step
w_ii_f (numpy.ndarray) – Self-influence weight factor for every lipid at one time step
core_lipids (numpy.ndarray) – Array contains information which lipid is assigned as core member
- Returns:
clusters – The dictionary contains each found cluster
- Return type:
- static hmm_diff_checker(means, prediction_results)[source]
- Checks if prediction assignments correct with respect to means. Since HMM is unsupervised, model can assign 0 to
disordered domains and 1 to ordered domains. In these cases needs to be changed to vice versa.
- Parameters:
means (np.ndarray) – Table of model which contains means of area per lipid and order parameters
prediction_results (np.ndarray) – Initial prediction results which is 1s and 0s
- Returns:
prediction_results – Same or flipped prediction results with respect to means table
- Return type:
np.ndarray
- mixture_plot(resname, gmm_model, hmm_model, leaflet=None)[source]
Plot results of the Gaussian Mixture Model for each residue type. The function should help to spot flaws in the results of the Gaussian Mixture Model.
The Gaussian Mixture Model in DomHMM is trained on a three-dimensional space (1 APL + 2 Acyl chains), hence also its ouput (mean vectors and covariance matrices) are three0-dimensional. However, plots in 3 dimensions are rather hard to understand, therefore we plot here the projections of the raw/fitted probability densities on the xz, yz, and xy plane.
- Parameters:
resname (str) – Resname of the actual residue type
gmm_model (GaussianMixture Model) – Scikit-learn object
hmm_model (Hidden Markow Model) – hmmlearn object
leaflet (int or None) – If membrane is asymmetric ensure that plots are made for upper and lower leaflet
- permut_getis_ord_stat(weight_matrix_all, leaflet)[source]
Getis-Ord Local Spatial Autocorrelation Statistics calculation based on the predicted order states of each lipid and the weighting factors between neighbored lipids.
- Parameters:
weight_matrix_all (sparse.csr_array) – Weight matrices for all lipid in a leaflet at each time step
leaflet (int) – 0 = upper leaflet, 1 = lower leafet
- Returns:
g_star_i – G*i values for each lipid at each time step
- Return type:
list of numpy.ndarrays
- prepare_train_data()[source]
Prepare train data for GMM and HMM while separating self.results.train with respect to each residue
- state_validate()[source]
Validate state assignments of HMM model by checking means of the model of each residue.
- static weight_matrix(vor, pbc_idx, coor_xy)[source]
Calculate the weight factors between neighbored lipid pairs based on a Voronoi tessellation.
- Parameters:
vor (Voronoi Tesselation) – Scipy’s Voronoi Diagram object
pbc_idx (Indices) – Unit cell indices of periodic image coordinates
coor_xy (numpy.ndarray) – Coordinates of upper/lower leaflet
- Returns:
weight_matrix – Weight factors wij between neighbored lipid pairs. Is 0 if lipids are not directly neighbored.
- Return type:
numpy.ndarray
- class domhmm.analysis.base.LeafletAnalysisBase(universe_or_atomgroup: Universe | AtomGroup, membrane_select: str = 'all', gmm_kwargs: None | dict = None, hmm_kwargs: None | dict = None, leaflet_kwargs: Dict[str, Any] = {}, leaflet_select: None | AtomGroup | str | list = None, heads: Dict[str, Any] = {}, tails: Dict[str, Any] = {}, sterol_heads: Dict[str, Any] = {}, sterol_tails: Dict[str, Any] = {}, tmd_protein_list: None | AtomGroup | str | list = None, frac: float = 0.5, p_value: float = 0.05, leaflet_frame_rate: None | int = None, sterol_frame_rate: int = 1, asymmetric_membrane: bool = False, verbose: bool = False, result_plots: bool = False, trained_hmms: Dict[str, Any] = {}, n_init_hmm: int = 2, save_plots: bool = False, do_clustering=True, **kwargs)[source]
Bases:
AnalysisBaseLeafletAnalysisBase class.
This class marks the start point for each analysis method.
It connects to the MDAnalysis core library, setups the MDAnalysis universe, and provides coordinates for each membrane leaflet.
- Parameters:
universe_or_atomgroup (
UniverseorAtomGroup) – Universe or group of atoms to apply this analysis to. If a trajectory is associated with the atoms, then the computation iterates over the trajectory.membrane_select (str) – Membrane selection query for analysis of simulation trajectories.
gmm_kwargs (Optional[Dict]) – Optional parameter for Gaussian mixture model function.
hmm_kwargs (Optional[Dict]) – Optional parameter for Hidden Markov model function.
leaflet_kwargs (Optional[dict]) – dictionary containing additional arguments for the MDAnalysis LeafletFinder
leaflet_select (Union[“auto”, List[AtomGroup], List[str]]) –
- Leaflet selection options for lipids which can be automatic by finding Leafletfinder, atomgroup, string query or
list
heads (Dict[str, Any]) – dictionary containing residue name and atom selection for lipid head groups
tails (Dict[str, Any]) – dictionary containing residue name and atom selection for lipid tail groups
sterol_heads (Dict[str, Any]) – dictionary containing residue name and atom selection for sterol head groups
sterol_tails (Dict[str, Any]) – dictionary containing residue name and atom selection for sterol tail groups (head as first, tail beginning as second)
tmd_protein_list (Union[“auto”, List[AtomGroup], List[str]]) – Transmembrane domain protein list to include area per lipid calculation
frac (float) – fraction of box length in x and y outside the unit cell considered for Voronoi calculation
p_value (float) – p_value for z_score calculation
leaflet_frame_rate (Union[None, int]) – Frame rate for checking lipids leaflet assignments via LeafletFinder
sterol_frame_rate (int) – Frame rate for checking sterols leaflet assignments via LeafletFinder
asymmetric_membrane (bool) – Asymmetric membrane option to train models by separated data w.r.t. leaflets
verbose (bool) – Debug option to print step progress, warnings and errors
result_plots (bool) – Plotting intermediate result option
save_plots (bool) – Option for saving intermediate plots in pdf format
trained_hmms (dict) – User-specific HMM (e.g., pre-trained on another simulation)
do_clustering (bool) – Perform the hierarchical clustering for each frame
- Variables:
universe_or_atomgroup (
Universe) – The universe to which this analysis is appliedatomgroup (
AtomGroup) – The atoms to which this analysis is appliedresults (
Results) – results of calculation are stored here, after callingrun()start (Optional[int]) – The first frame of the trajectory used to compute the analysis
stop (Optional[int]) – The frame to stop at for the analysis
step (Optional[int]) – Number of frames to skip between each analyzed frame
n_frames (int) – Number of frames analysed in the trajectory
times (numpy.ndarray) – array of Timestep times. Only exists after calling
run()frames (numpy.ndarray) – array of Timestep frame indices. Only exists after calling
run()
- get_heads()[source]
Make an atomgroup containing all head groups (lipids + sterols).
- Returns:
all_heads – AtomGroup containing headgroups of all lipids and sterols
- Return type:
MDAnalysis.AtomGroup
- get_leaflets()[source]
Automatically assign non-sterol compounds to the upper and lower leaflet of a bilayer using the MDAnalysis.LeafletFinder
- get_leaflets_sterol()[source]
Assign sterol compounds to the upper and lower leaflet of a bilayer using a distance cut-off.
- get_lipid_tails()[source]
Make atomgroups for tailgroups for each chain of residues
- Variables:
self.resid_tails_selection (dict) – dictionary containing the tail atomgroups for all residues each tail