sample

class mics.sample(dataset, potential, acfun=None, batchsize=None, **constants)

A sample of configurations distributed according to a PDF proportional to exp(-u(x)). Each configuration x is represented by a set of collective variables from which one can evaluate the reduced potential u(x), as well as other properties of interest.

Parameters:
  • dataset (pandas.DataFrame) – A data frame whose column names are collective variables used to represent the sampled configurations. The rows must contain a time series of these variables, obtained by simulating the system at a state with known reduced potential.
  • potential (str) – A mathematical expression defining the reduced potential of the simulated state. This must be a function of the column names in dataset and can also depend on external parameters passed as keyword arguments (see below).
  • acfun (str, optional, default=potential) – A mathematical expression defining a property to be used for overlapping batch means (OBM) autocorrelation analysis and effective sample size calculation. It must be a function of the column names in dataset and can also depend on external parameters passed as keyword arguments (see below).
  • batchsize (int, optional, default=sqrt(len(dataset))) – The size of each batch (window) to be used in the OBM analysis. If omitted, then the batch size will be the integer part of the square root of the sample size.
  • **constants (keyword arguments) – A set of keyword arguments passed as name=value, which define external parameter values for evaluating the mathematical expressions in potential and acfun. They can also be used as labels to distinguish samples from each other, in which case they need not appear in those expressions.
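
As an illustration of how a string expression in column names and keyword constants can be turned into numbers, the sketch below evaluates a harmonic reduced potential with pandas and computes the default batch size described above. The helper names evaluate_expression and default_batchsize, the column x, and the constants k and x0 are hypothetical and chosen only for this example; this is not necessarily how mics evaluates expressions internally.

```python
import math
import pandas as pd

def evaluate_expression(dataset, expression, **constants):
    """Evaluate a string expression of column names and named constants.

    Hypothetical helper for illustration; not part of the mics API.
    """
    # The constants are exposed as an extra namespace, so they can appear
    # in the expression by name, next to the column names.
    return dataset.eval(expression, resolvers=[constants])

def default_batchsize(dataset):
    """Integer part of the square root of the sample size."""
    return math.isqrt(len(dataset))

# A toy time series of a single collective variable x (made-up values)
dataset = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]})

# Reduced potential of a harmonic restraint, with k and x0 passed as constants
u = evaluate_expression(dataset, "0.5*k*(x - x0)**2", k=2.0, x0=1.0)
b = default_batchsize(dataset)
```

With a real simulation, the same dataset, expression, and constants would be the arguments handed to mics.sample.
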
averaging(properties, combinations={}, **constants)

Computes averages and uncertainties of configurational properties. In addition, computes combinations among these averages while automatically handling uncertainty propagation.

Parameters:
  • properties (dict(str: str)) – A dictionary associating names to mathematical expressions. This is used to define functions of the collective variables included in the samples. Then, averages of these functions will be evaluated at all sampled states, along with their uncertainties. The expressions might also depend on parameters passed as keyword arguments (see below).
  • combinations (dict(str: str), optional, default={}) – A dictionary associating names to mathematical expressions. This is used to define functions of the names passed as keys in the properties dictionary. The expressions might also depend on parameters passed as keyword arguments (see below).
  • **constants (optional keyword arguments) – A set of arguments passed as name=value, used to define parameter values for evaluating the mathematical expressions in both properties and combinations.
Returns:

pandas.DataFrame – A data frame containing the computed averages and combinations, as well as their estimated standard errors.
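
The idea of propagating uncertainties from averages to a combination of them can be sketched with first-order (delta-method) error propagation. The example below is a simplified, single-sample illustration that assumes IID data; the synthetic data, the two properties (x and x squared), and the combination (a variance) are assumptions made for this sketch, and the estimator used by mics may differ.

```python
import numpy as np

# Synthetic IID sample of one collective variable (assumed data)
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=5000)

# "Properties": functions of the collective variable to be averaged
props = np.column_stack([x, x**2])      # columns: x1 = x, x2 = x**2
means = props.mean(axis=0)

# Covariance matrix of the two sample means (IID assumption: divide by n)
cov_means = np.cov(props, rowvar=False) / len(x)
errors = np.sqrt(np.diag(cov_means))    # standard errors of the averages

# "Combination": variance = x2 - x1**2, whose gradient with respect to
# (x1, x2) is (-2*x1, 1)
variance = means[1] - means[0]**2
grad = np.array([-2.0 * means[0], 1.0])

# First-order propagation: the error of the combination is
# sqrt(grad^T C grad), which accounts for the correlation between averages
variance_error = np.sqrt(grad @ cov_means @ grad)
```

Using the full covariance matrix rather than only the individual standard errors matters here, because the averages of x and x squared are computed from the same configurations and are therefore correlated.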

subsampling(integratedACF=True)

Performs in-place subsampling based on the statistical inefficiency g of the property specified by the acfun attribute of the sample, aiming to obtain a sample of independent and identically distributed (IID) configurations. Subsampling is done via jumps of varying sizes around g, so that the sample size shrinks to approximately 1/g of its original value.

Parameters: integratedACF (bool, optional, default=True) – If True, the integrated ACF method [2] is used to compute the statistical inefficiency. Otherwise, the OBM method is used instead.
Returns: sample – Although the subsampling is done inline, the new sample is returned for chaining purposes.
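
A minimal sketch of the OBM route is given below, assuming the standard overlapping batch means estimator of the asymptotic variance and a rounded fractional stride for the jumps. The functions obm_inefficiency and subsample and the synthetic AR(1) series are hypothetical; they illustrate the technique and are not taken from the mics source code.

```python
import numpy as np

def obm_inefficiency(y, b):
    """Estimate g = (asymptotic variance) / (sample variance) via OBM."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Means of all n-b+1 overlapping windows of length b, via cumulative sums
    csum = np.concatenate(([0.0], np.cumsum(y)))
    batch_means = (csum[b:] - csum[:-b]) / b
    # Standard OBM estimate of the asymptotic variance of the series
    sigma2 = n * b / ((n - b) * (n - b + 1)) * np.sum((batch_means - y.mean())**2)
    # g cannot be smaller than 1 (uncorrelated data)
    return max(1.0, sigma2 / y.var(ddof=1))

def subsample(y, g):
    """Keep about one point in every g, using jumps of floor(g) or ceil(g)."""
    m = int(len(y) / g)                           # target sample size
    idx = np.round(np.arange(m) * g).astype(int)  # rounded fractional stride
    return np.asarray(y)[idx]

# Synthetic correlated series: AR(1) process with coefficient phi (assumed data)
rng = np.random.default_rng(1)
n, phi = 20000, 0.9
noise = rng.normal(size=n)
y = np.empty(n)
y[0] = noise[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + noise[t]

g = obm_inefficiency(y, b=int(np.sqrt(n)))
sub = subsample(y, g)
```

For this AR(1) process the exact statistical inefficiency is (1 + phi)/(1 - phi) = 19, so the estimate should land in that neighborhood and the subsample should hold roughly n/19 points.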