API Documentation

class domhmm.PropertyCalculation(universe_or_atomgroup: Universe | AtomGroup, membrane_select: str = 'all', gmm_kwargs: None | dict = None, hmm_kwargs: None | dict = None, leaflet_kwargs: Dict[str, Any] = {}, leaflet_select: None | AtomGroup | str | list = None, heads: Dict[str, Any] = {}, tails: Dict[str, Any] = {}, sterol_heads: Dict[str, Any] = {}, sterol_tails: Dict[str, Any] = {}, tmd_protein_list: None | AtomGroup | str | list = None, frac: float = 0.5, p_value: float = 0.05, lipid_leaflet_rate: None | int = None, sterol_leaflet_rate: int = 1, asymmetric_membrane: bool = False, verbose: bool = False, result_plots: bool = False, trained_hmms: Dict[str, Any] = {}, n_init_hmm: int = 2, save_plots: bool = False, do_clustering: bool = True, **kwargs)[source]

Bases: LeafletAnalysisBase

The DirectorOrder class calculates an order parameter for each selected lipid according to the formula:

P2 = 0.5 * (3 * cos(a)^2 - 1), (1)

where an is the angle between a pre-defined director in the lipid (e.g. C-H or CC) and a reference axis.

GMM(gmm_kwargs)[source]

Fit Gaussian Mixture

Parameters:

gmm_kwargs (dict) – Additional parameters for mixture.GaussianMixture

HMM(hmm_kwargs)[source]

Create Gaussian based Hidden Markov Models for each residue type

Parameters:

hmm_kwargs (dict) – Additional parameters for hmmlearn.hmm.GaussianHMM

static area_per_lipid_vor(coor_xy, boxdim, frac)[source]

Calculation of the area per lipid employing Voronoi tessellation on coordinates mapped to the xy plane. The function takes also the periodic boundary conditions of the box into account.

Parameters:
  • coor_xy (numpy.ndarray) – Coordinates of upper/lower leaflet

  • boxdim (array) – Length of box vectors in all directions

  • frac (float) – Fraction of box length in x and y outside the unit cell considered for Voronoi calculation

Returns:

  • vor (Voronoi Tesselation) – Scipy’s Voronoi Diagram object

  • apl (Area per lipid) – Area per lipid based on Scipy’s Voronoi Diagram

  • pbc_idx (Indices) – Unit cell indices for periodic image coordinates

static assign_core_lipids(weight_matrix_f, g_star_i_f, order_states_f, w_ii_f, z_score)[source]

Assign lipids as core members (aka lipids with a high positive autocorrelation) depending on the Getis-Ord spatial local autocorrelation statistic.

Parameters:
  • weight_matrix_f (numpy.ndarray) – Matrix containing the weight factors between all lipid pairs at one time step

  • g_star_i_f (numpy.ndarray) – Getis-Ord spatial local autocorrelation statistic for every lipid at one time step

  • order_states_f (numpy.ndarray) – Order states for every lipid at one time step

  • w_ii_f (numpy.ndarray) – Self-influence weight factor for every lipid at one time step

  • z_score (dict) – Contains boundary of the reaction region

Returns:

core_lipids – Contains a TRUE value if the lipid is a core member, otherwise it FALSE

Return type:

numpy.ndarray (bool)

static calc_order_parameter(chain)[source]

Calculate average Scc order parameters per acyl chain according to the equation:

S_cc = (3 * cos( theta )^2 - 1) / 2,

where theta describes the angle between the z-axis of the system and the vector between two subsequent tail beads.

Parameters:

chain (Selection) – MDAnalysis Selection object

Returns:

s_cc – Mean S_cc parameter for the selected chain of each residue

Return type:

numpy.ndarray

clustering_plot()[source]

Runs hierarchical clustering and plots clustering results in different frames.

fit_hmm(data, gmm, hmm_kwargs, n_repeats=10)[source]

Fit several HMM models to the data and return the best one.

Parameters:
  • data (numpy.ndarray) – Data of the single lipid properties for one lipid type at each time step

  • gmm (GaussianMixture Model) – Scikit-learn object

  • hmm_kwargs (dict) – Additional parameters for Hidden Markov Model

  • n_repeats (int) – Number of independent fits

Returns:

best_ghmm – hmmlearn object

Return type:

GaussianHMM

get_leaflet_step_index_to_resid(leaflet, step)[source]

Returns leaflet index to residue id map for a specific leaflet at specific frame of the trajectory

Parameters:
  • leaflet (int) – leaflet index

  • step (int) – step index

Returns:

result_map – dictionary contains leaflet index of residue as key and residue id as value

Return type:

dict

get_leaflet_step_order(leaflet, step)[source]

Receive residue’s order state with respect to the leaflet

Parameters:
  • leaflet (int) – leaflet index

  • step (int) – step index

Returns:

order_states – Numpy array contains order state results of the leaflet at step in order of system’s residues

Return type:

numpy.ndarray

get_leaflet_step_order_index(leaflet, step)[source]

Receive residue’s indexes and positions for a specific leaflet at any frame of the trajectory.

Parameters:
  • leaflet (int) – leaflet index

  • step (int) – step index

Returns:

  • indexes (dict) – dictionary contains numpy.arrays containing residue’s indexes for each unique residue type

  • positions (dict) – dictionary contains numpy.arrays containing residue’s positions for each unique residue type

static get_residue_idx(resids_index_map, resids)[source]

Checks resids index map (lookup table) to return corresponding indexes to use in DomHMM result section.

Parameters:
  • resids_index_map (dict) – Lookup table where each residue id’s index in self.membrane_unique_resids is kepts

  • resids (numpy.ndarray) – Residue id list for indexing

Returns:

idx – Indexes in the same order with resids

Return type:

numpy.ndarray

getis_ord()[source]

Calculates Getis-Ord statistic for identification model

getis_ord_plot()[source]

Plots average order state for each lipid type per frame

getis_ord_stat(weight_matrix_all, leaflet)[source]

Getis-Ord Local Spatial Autocorrelation Statistics calculation based on the predicted order states of each lipid and the weighting factors between neighbored lipids.

Parameters:
  • weight_matrix_all (sparse.csr_array) – Weight matrix for all lipid in a leaflet at current time step

  • leaflet (int) – 0 = upper leaflet, 1 = lower leafet

Returns:

  • g_star_i (list of numpy.ndarrays) – G*i values for each lipid at each time step

  • w_ii_all (list of numpy.ndarrays) – Self-influence of the lipids

static hierarchical_clustering(weight_matrix_f, w_ii_f, core_lipids)[source]

Hierarchical clustering approach to identify spatial related Lo domains.

Parameters:
  • weight_matrix_f (numpy.ndarray) – Matrix containing the weight factors between all lipid pairs at one time step

  • w_ii_f (numpy.ndarray) – Self-influence weight factor for every lipid at one time step

  • core_lipids (numpy.ndarray) – Array contains information which lipid is assigned as core member

Returns:

clusters – The dictionary contains each found cluster

Return type:

dict

static hmm_diff_checker(means, prediction_results)[source]
Checks if prediction assignments correct with respect to means. Since HMM is unsupervised, model can assign 0 to

disordered domains and 1 to ordered domains. In these cases needs to be changed to vice versa.

Parameters:
  • means (np.ndarray) – Table of model which contains means of area per lipid and order parameters

  • prediction_results (np.ndarray) – Initial prediction results which is 1s and 0s

Returns:

prediction_results – Same or flipped prediction results with respect to means table

Return type:

np.ndarray

mixture_plot(resname, gmm_model, hmm_model, leaflet=None)[source]

Plot results of the Gaussian Mixture Model for each residue type. The function should help to spot flaws in the results of the Gaussian Mixture Model.

The Gaussian Mixture Model in DomHMM is trained on a three-dimensional space (1 APL + 2 Acyl chains), hence also its ouput (mean vectors and covariance matrices) are three0-dimensional. However, plots in 3 dimensions are rather hard to understand, therefore we plot here the projections of the raw/fitted probability densities on the xz, yz, and xy plane.

Parameters:
  • resname (str) – Resname of the actual residue type

  • gmm_model (GaussianMixture Model) – Scikit-learn object

  • hmm_model (Hidden Markow Model) – hmmlearn object

  • leaflet (int or None) – If membrane is asymmetric ensure that plots are made for upper and lower leaflet

order_parameter()[source]

Calculation of scc order parameter for each chain of each residue.

permut_getis_ord_stat(weight_matrix_all, leaflet)[source]

Getis-Ord Local Spatial Autocorrelation Statistics calculation based on the predicted order states of each lipid and the weighting factors between neighbored lipids.

Parameters:
  • weight_matrix_all (sparse.csr_array) – Weight matrices for all lipid in a leaflet at each time step

  • leaflet (int) – 0 = upper leaflet, 1 = lower leafet

Returns:

g_star_i – G*i values for each lipid at each time step

Return type:

list of numpy.ndarrays

plot_hmm_result()[source]

Plots convergence of HMM model training process.

predict_plot()[source]

Plots average ordered lipids of HMM prediction for each lipid type.

predict_states()[source]

Each residues prediction assignment to ordered or disordered domains.

prepare_train_data()[source]

Prepare train data for GMM and HMM while separating self.results.train with respect to each residue

result_clustering()[source]

Runs hierarchical clustering for each frame and saves result

state_validate()[source]

Validate state assignments of HMM model by checking means of the model of each residue.

static weight_matrix(vor, pbc_idx, coor_xy)[source]

Calculate the weight factors between neighbored lipid pairs based on a Voronoi tessellation.

Parameters:
  • vor (Voronoi Tesselation) – Scipy’s Voronoi Diagram object

  • pbc_idx (Indices) – Unit cell indices of periodic image coordinates

  • coor_xy (numpy.ndarray) – Coordinates of upper/lower leaflet

Returns:

weight_matrix – Weight factors wij between neighbored lipid pairs. Is 0 if lipids are not directly neighbored.

Return type:

numpy.ndarray

z_score_calc()[source]

Z score calculation for Getis-Ord statistics

class domhmm.analysis.base.LeafletAnalysisBase(universe_or_atomgroup: Universe | AtomGroup, membrane_select: str = 'all', gmm_kwargs: None | dict = None, hmm_kwargs: None | dict = None, leaflet_kwargs: Dict[str, Any] = {}, leaflet_select: None | AtomGroup | str | list = None, heads: Dict[str, Any] = {}, tails: Dict[str, Any] = {}, sterol_heads: Dict[str, Any] = {}, sterol_tails: Dict[str, Any] = {}, tmd_protein_list: None | AtomGroup | str | list = None, frac: float = 0.5, p_value: float = 0.05, lipid_leaflet_rate: None | int = None, sterol_leaflet_rate: int = 1, asymmetric_membrane: bool = False, verbose: bool = False, result_plots: bool = False, trained_hmms: Dict[str, Any] = {}, n_init_hmm: int = 2, save_plots: bool = False, do_clustering: bool = True, **kwargs)[source]

Bases: AnalysisBase

LeafletAnalysisBase class.

This class marks the start point for each analysis method.

It connects to the MDAnalysis core library, setups the MDAnalysis universe, and provides coordinates for each membrane leaflet.

Parameters:
  • universe_or_atomgroup (Universe or AtomGroup) – Universe or group of atoms to apply this analysis to. If a trajectory is associated with the atoms, then the computation iterates over the trajectory.

  • membrane_select (str) – Membrane selection query for analysis of simulation trajectories.

  • gmm_kwargs (Optional[Dict]) – Optional parameter for Gaussian mixture model function.

  • hmm_kwargs (Optional[Dict]) – Optional parameter for Hidden Markov model function.

  • leaflet_kwargs (Optional[dict]) – dictionary containing additional arguments for the MDAnalysis LeafletFinder

  • leaflet_select (Union[“auto”, List[AtomGroup], List[str]]) –

    Leaflet selection options for lipids which can be automatic by finding Leafletfinder, atomgroup, string query or

    list

  • heads (Dict[str, Any]) – dictionary containing residue name and atom selection for lipid head groups

  • tails (Dict[str, Any]) – dictionary containing residue name and atom selection for lipid tail groups

  • sterol_heads (Dict[str, Any]) – dictionary containing residue name and atom selection for sterol head groups

  • sterol_tails (Dict[str, Any]) – dictionary containing residue name and atom selection for sterol tail groups (head as first, tail beginning as second)

  • tmd_protein_list (Union[“auto”, List[AtomGroup], List[str]]) – Transmembrane domain protein list to include area per lipid calculation

  • frac (float) – fraction of box length in x and y outside the unit cell considered for Voronoi calculation

  • p_value (float) – p_value for z_score calculation

  • lipid_leaflet_rate (Union[None, int]) – Frame rate for checking lipids leaflet assignments via LeafletFinder

  • sterol_leaflet_rate (int) – Frame rate for checking sterols leaflet assignments via LeafletFinder

  • asymmetric_membrane (bool) – Asymmetric membrane option to train models by separated data w.r.t. leaflets

  • verbose (bool) – Debug option to print step progress, warnings and errors

  • result_plots (bool) – Plotting intermediate result option

  • save_plots (bool) – Option for saving intermediate plots in pdf format

  • trained_hmms (dict) – User-specific HMM (e.g., pre-trained on another simulation)

  • do_clustering (bool) – Perform the hierarchical clustering for each frame

  • n_init_hmm (int) – Number of repeats for HMM model trainings

Variables:
  • universe_or_atomgroup (Universe) – The universe to which this analysis is applied

  • atomgroup (AtomGroup) – The atoms to which this analysis is applied

  • results (Results) – results of calculation are stored here, after calling run()

  • start (Optional[int]) – The first frame of the trajectory used to compute the analysis

  • stop (Optional[int]) – The frame to stop at for the analysis

  • step (Optional[int]) – Number of frames to skip between each analyzed frame

  • n_frames (int) – Number of frames analysed in the trajectory

  • times (numpy.ndarray) – array of Timestep times. Only exists after calling run()

  • frames (numpy.ndarray) – array of Timestep frame indices. Only exists after calling run()

get_heads()[source]

Make an atomgroup containing all head groups (lipids + sterols).

Returns:

all_heads – AtomGroup containing headgroups of all lipids and sterols

Return type:

MDAnalysis.AtomGroup

get_leaflets()[source]

Automatically assign non-sterol compounds to the upper and lower leaflet of a bilayer using the MDAnalysis.LeafletFinder

get_leaflets_sterol()[source]

Assign sterol compounds to the upper and lower leaflet of a bilayer using a distance cut-off.

get_lipid_tails()[source]

Make atomgroups for tailgroups for each chain of residues

Variables:

self.resid_tails_selection (dict) – dictionary containing the tail atomgroups for all residues each tail

get_resids()[source]

Retrieve the residue indices for each residue and store it in a new dictionary.

Returns:

residue_ids – dictionary for each residue containing residue ids of it.

Return type:

dict

get_sterol_tails()[source]

Make atomgroups for sterols

Variables:

sterols_tail (dict) – dictionary containing the tail atomgroups for each sterol