papylio.analysis.dwell_time_analysis

Dwell-time distribution analysis and fitting utilities.

Provides classes and functions to model distributions of dwell times (e.g., mixtures of exponentials), perform maximum likelihood estimation, compute BIC/AIC, and visualize fits and residuals for single-molecule dwell-time analysis.

Functions

analyze_dwells(dwells[, method, ...])

Analyze dwell times for different states using a fitting method.

auto_bin_size_for_discrete_data(dwell_times)

Calculate the optimal bin edges for a histogram of discrete data using the Freedman-Diaconis rule.

empirical_cdf(dwell_times, sampling_interval)

Compute the empirical cumulative distribution function (ECDF) of dwell times.

fit_dwell_times(dwell_times[, method, ...])

Fit dwell times using a specified fitting method.

plot_dwell_analysis(dwell_analysis, dwells)

Plot the dwell time analysis results for multiple states.

plot_dwell_analysis_state(dwell_analysis, ...)

Plot the results of a dwell time analysis, including the model fit and empirical data.

plot_dwell_time_histogram(dwell_times[, ...])

Plot a histogram of dwell times with automatic bin sizing or specified bin edges.

plot_empirical_cdf(dwell_times, ...[, ax])

Plot the empirical cumulative distribution function (ECDF) of dwell times.

Classes

ExponentialDistribution(number_of_exponentials)

A class representing a mixture of exponential distributions.

class papylio.analysis.dwell_time_analysis.ExponentialDistribution(number_of_exponentials, P_bounds=(-1, 1), k_bounds=(1e-09, inf), truncation=None, sampling_interval=None)

A class representing a mixture of exponential distributions.

BIC(dwell_times, optimal_parameters)

Compute the Bayesian Information Criterion (BIC) for the given dwell times and optimal parameters.

Parameters:
  • dwell_times (array-like) – The dwell times data used to calculate the BIC.

  • optimal_parameters (array-like) – The optimal parameters obtained from fitting the model.

Returns:

BIC (float) – The Bayesian Information Criterion value.
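As a point of reference, the standard BIC definition is `k * ln(n) - 2 * ln(L)`, where `k` is the number of free parameters, `n` the number of observations and `L` the maximized likelihood. The sketch below illustrates that formula only; the function name and the numbers are hypothetical, and how papylio counts free parameters internally is not stated here.

```python
import math

def bic(log_likelihood, number_of_parameters, number_of_observations):
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower values are preferred."""
    return number_of_parameters * math.log(number_of_observations) - 2.0 * log_likelihood

# Hypothetical log-likelihoods for two fits to the same 500 dwell times.
bic_one_exponential = bic(log_likelihood=-620.0, number_of_parameters=1,
                          number_of_observations=500)
bic_two_exponentials = bic(log_likelihood=-618.5, number_of_parameters=3,
                           number_of_observations=500)

# The small gain in likelihood does not offset the two extra parameters here.
preferred = min(("one", bic_one_exponential), ("two", bic_two_exponentials),
                key=lambda item: item[1])[0]
```

In this made-up case the single-exponential model has the lower BIC and would be preferred.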

BIC_histogram(bin_centers, counts, optimal_parameters)

Compute the Bayesian Information Criterion (BIC) for the binned data using the optimal parameters.

Parameters:
  • bin_centers (array-like) – The centers of the histogram bins.

  • counts (array-like) – The counts corresponding to each histogram bin.

  • optimal_parameters (array-like) – The optimal parameters obtained from fitting the model.

Returns:

BIC (float) – The Bayesian Information Criterion value for the binned data.

P_and_k_from_parameters(parameters)

Extracts probabilities and rate constants from parameter list.

Parameters:

parameters (list or array) – Input parameter values.

Returns:

tuple – P (array): Probability values. k (array): Rate constants.

cdf(t, *parameters)

Computes the cumulative distribution function (CDF), applying truncation if needed.

Parameters:
  • t (array-like) – Time values.

  • parameters (list) – Model parameters.

Returns:

array – Computed CDF values.
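One common way to apply truncation to a CDF is to renormalize it over the observable window, so that it runs from 0 at the lower limit to 1 at the upper limit. The sketch below shows this for a single exponential. It assumes truncation is given as a `(t_min, t_max)` pair and is not taken from the papylio source; the helper names are hypothetical.

```python
import math

def exponential_cdf(t, k):
    """Untruncated CDF of a single exponential with rate k."""
    return 1.0 - math.exp(-k * t)

def truncated_cdf(t, k, t_min, t_max):
    """CDF renormalized to the window [t_min, t_max] (assumed convention)."""
    lower = exponential_cdf(t_min, k)
    upper = exponential_cdf(t_max, k)
    return (exponential_cdf(t, k) - lower) / (upper - lower)

# Dwell times shorter than 0.5 or longer than 10 are assumed unobservable.
values = [truncated_cdf(t, k=1.0, t_min=0.5, t_max=10.0) for t in (0.5, 1.0, 5.0, 10.0)]
```

An open upper limit can be expressed with `math.inf`, in which case only the lower bound changes the shape of the curve.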

cdf_fit(dwell_times, free_truncation_min=True, **kwargs)

Fit an exponential distribution to the empirical cumulative distribution function (CDF).

The method uses curve fitting to match the observed CDF of the dwell times to the theoretical CDF of an exponential distribution.

Parameters:
  • dwell_times (array-like) – The dwell times used for model fitting.

  • free_truncation_min (bool, optional) – If True, an additional truncation parameter is included in the fitting process (default: True).

  • **kwargs (dict) – Additional arguments passed to scipy.optimize.curve_fit.

Returns:

xarray.Dataset – A dataset containing the optimal parameters, parameter uncertainties, Bayesian Information Criterion (BIC), and fitting metadata.

cdf_untruncated(t, *parameters)

Computes the cumulative distribution function (CDF) without truncation.

Parameters:
  • t (array-like) – Time values.

  • parameters (list) – Model parameters.

Returns:

array – Computed CDF values.
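For orientation, the untruncated CDF of a mixture of exponentials is the weighted sum of the component CDFs, `sum_i P_i * (1 - exp(-k_i * t))`. The sketch below is a standalone illustration of that formula; it takes `P` and `k` as separate lists, which is an assumption and not the flat parameter layout used by this class.

```python
import math

def mixture_cdf(t, P, k):
    """CDF of a mixture of exponentials with weights P and rates k."""
    return sum(P_i * (1.0 - math.exp(-k_i * t)) for P_i, k_i in zip(P, k))

# Hypothetical two-component mixture: a fast and a slow population.
P = [0.7, 0.3]
k = [2.0, 0.1]
cdf_values = [mixture_cdf(t, P, k) for t in (0.0, 1.0, 10.0, 100.0)]
```

With weights that sum to 1, the curve starts at 0 and approaches 1 for long times.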

dataset_to_parameters(dataset)

Extract the model parameters from an xarray dataset.

Parameters:

dataset (xarray.Dataset) – The dataset containing the model parameters (P and k values).

Returns:

parameters (list) – The extracted model parameters, excluding the last P value.

hist_fit(*args, **kwargs)

Fit an exponential distribution to binned dwell-time data.

This method acts as an alias for histogram_fit, forwarding all arguments to it.

Parameters:
  • *args (tuple) – Positional arguments passed to histogram_fit.

  • **kwargs (dict) – Keyword arguments passed to histogram_fit.

Returns:

xarray.Dataset – A dataset containing the fitted parameters and metadata.

histogram_fit(dwell_times, bins='auto_discrete', free_truncation_min=True, remove_first_bins=0, **kwargs)

Fit an exponential distribution to a histogram of dwell-time data.

The method constructs a histogram of the dwell times and fits a probability density function to the binned data.

Parameters:
  • dwell_times (array-like) – Observed dwell times used for model fitting.

  • bins (int, str, or array-like, optional) – Method for binning the histogram. If ‘auto_discrete’, an automatic binning strategy is applied (default: ‘auto_discrete’).

  • free_truncation_min (bool, optional) – If True, an additional truncation parameter is included in the fitting process (default: True).

  • remove_first_bins (int, optional) – Number of initial bins to exclude from the fitting (default: 0).

  • **kwargs (dict) – Additional arguments passed to scipy.optimize.curve_fit.

Returns:

xarray.Dataset – A dataset containing the estimated parameters, their uncertainties, Bayesian Information Criterion (BIC), and metadata about the fitting procedure.

likelihood(parameters, t)

Computes the likelihood function for given parameters.

Parameters:
  • parameters (list) – Model parameters.

  • t (array-like) – Time values.

Returns:

float – Computed likelihood value.

loglikelihood(parameters, t)

Computes the log-likelihood function.

Parameters:
  • parameters (list) – Model parameters.

  • t (array-like) – Time values.

Returns:

float – Computed log-likelihood value.
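As an illustration of the quantity involved, the log-likelihood of independent dwell times under a mixture of exponentials is the sum of the log of the mixture PDF at each observation. The sketch below ignores truncation and sampling effects and uses separate `P` and `k` lists. Both are simplifying assumptions, so it may differ from what this method computes.

```python
import math

def mixture_pdf(t, P, k):
    """PDF of a mixture of exponentials: sum_i P_i * k_i * exp(-k_i * t)."""
    return sum(P_i * k_i * math.exp(-k_i * t) for P_i, k_i in zip(P, k))

def mixture_loglikelihood(dwell_times, P, k):
    """Sum of log PDF values over all observed dwell times."""
    return sum(math.log(mixture_pdf(t, P, k)) for t in dwell_times)

dwell_times = [0.2, 0.5, 1.1, 3.0]

# For a single exponential with k = 1 the log-likelihood reduces to -sum(t).
value = mixture_loglikelihood(dwell_times, P=[1.0], k=[1.0])
```

Summing logarithms instead of multiplying densities avoids numerical underflow for large data sets.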

loglikelihood_binned(parameters, bin_centers, counts)

Computes the log-likelihood for binned data.

Parameters:
  • parameters (array-like) – Model parameters used for computing the likelihood.

  • bin_centers (array-like) – Centers of histogram bins.

  • counts (array-like) – Observed counts in each bin.

Returns:

float – The log-likelihood value.

maximum_likelihood_estimation(dwell_times, scipy_optimize_method='minimize', free_truncation_min=False, **kwargs)

Estimate the best-fit parameters using maximum likelihood estimation (MLE).

This method applies numerical optimization to maximize the likelihood function for the given dwell-time data.

Parameters:
  • dwell_times (array-like) – Observed dwell times to be used for model fitting.

  • scipy_optimize_method (str, optional) – The optimization method from scipy.optimize (default: ‘minimize’).

  • free_truncation_min (bool, optional) – If True, an additional truncation parameter is included in the optimization (default: False).

  • **kwargs (dict) – Additional arguments passed to the optimization function.

Returns:

xarray.Dataset – A dataset containing the estimated parameters, Bayesian Information Criterion (BIC), and optimization metadata.
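To make the idea concrete: for a single, untruncated exponential the maximum of the likelihood has a closed form, `k = n / sum(t)`, so no numerical optimizer is needed. The sketch below checks this on simulated data. It only illustrates the principle; mixtures and truncated data need numerical optimization, and the function names are hypothetical.

```python
import math
import random

def exponential_loglikelihood(k, dwell_times):
    """ln L(k) = n * ln(k) - k * sum(t) for a single exponential."""
    return len(dwell_times) * math.log(k) - k * sum(dwell_times)

def mle_rate(dwell_times):
    """Closed-form maximum likelihood estimate of the rate: n / sum(t)."""
    return len(dwell_times) / sum(dwell_times)

# Simulate dwell times with a known rate and recover it.
random.seed(0)
true_rate = 2.0
dwell_times = [random.expovariate(true_rate) for _ in range(20000)]
k_hat = mle_rate(dwell_times)

# The estimate should score at least as well as nearby rates.
best = exponential_loglikelihood(k_hat, dwell_times)
worse_low = exponential_loglikelihood(0.9 * k_hat, dwell_times)
worse_high = exponential_loglikelihood(1.1 * k_hat, dwell_times)
```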

mle(*args, **kwargs)

Perform maximum likelihood estimation (MLE) for fitting an exponential distribution.

This method acts as an alias for maximum_likelihood_estimation, forwarding all arguments to it.

Parameters:
  • *args (tuple) – Positional arguments passed to maximum_likelihood_estimation.

  • **kwargs (dict) – Keyword arguments passed to maximum_likelihood_estimation.

Returns:

xarray.Dataset – A dataset containing the fitted parameters and metadata.

negative_loglikelihood(parameters, t)

Computes the negative log-likelihood for given parameters and time data.

Parameters:
  • parameters (array-like) – Model parameters used for computing the log-likelihood.

  • t (array-like) – The time data.

Returns:

float – The negative log-likelihood value.

negative_loglikelihood_binned(parameters, bin_centers, counts)

Computes the negative log-likelihood for binned data.

Parameters:
  • parameters (array-like) – Model parameters used for computing the likelihood.

  • bin_centers (array-like) – Centers of histogram bins.

  • counts (array-like) – Observed counts in each bin.

Returns:

float – The negative log-likelihood value.

parameter_guess(dwell_times)

Generate an initial guess for the parameters based on the dwell times.

Parameters:

dwell_times (array-like) – The dwell times data to be used for generating the parameter guess.

Returns:

parameters (list) – A list of initial guesses for the parameters.

property parameter_names

Returns the parameter names excluding the last probability term.

Returns:

list – List of parameter names.

property parameter_names_full

Returns all parameter names including the last probability term.

Returns:

list – List of all parameter names.

parameters_full(parameters)

Modify the parameters by ensuring that the sum of the first number_of_exponentials - 1 parameters is less than or equal to 1 and adjusting the last parameter accordingly.

Parameters:

parameters (list) – A list of parameters where the first number_of_exponentials - 1 are P values.

Returns:

parameters (list) – The modified list of parameters with the adjusted last parameter.
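A plausible reading of this description is that the mixture weights must sum to 1, so only `number_of_exponentials - 1` of them are free and the last one follows from the others. The sketch below illustrates that idea. The ordering of the returned list (all P values first, then all k values) and the helper name are assumptions, not confirmed details of this method.

```python
def complete_parameters(parameters, number_of_exponentials):
    """Append the dependent last weight so that all P values sum to 1.

    Assumed input layout: [P_1, ..., P_(N-1), k_1, ..., k_N].
    Assumed output layout: [P_1, ..., P_N, k_1, ..., k_N].
    """
    n_free_P = number_of_exponentials - 1
    P = list(parameters[:n_free_P])
    k = list(parameters[n_free_P:])
    P.append(1.0 - sum(P))  # the last weight is fixed by the others
    return P + k

# Two exponentials: one free weight and two rates.
full = complete_parameters([0.7, 2.0, 0.1], number_of_exponentials=2)

# One exponential: no free weight, the single weight is 1.
single = complete_parameters([1.5], number_of_exponentials=1)
```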

parameters_to_dataset(parameters, parameter_errors=None, BIC=None)

Convert model parameters and associated information into an xarray dataset.

Parameters:
  • parameters (list) – The model parameters (P and k values) to be included in the dataset.

  • parameter_errors (list, optional) – The errors for the model parameters, used if available to include in the dataset.

  • BIC (float, optional) – The Bayesian Information Criterion value, used if available to include in the dataset.

Returns:

dwell_analysis (xarray.Dataset) – The dataset containing the parameters, errors, BIC value, and other associated metadata.

pdf(t, *parameters)

Computes the probability density function (PDF).

Parameters:
  • t (array-like) – Time values.

  • parameters (list) – Model parameters.

Returns:

array – Computed PDF values.

pdf_binned(t, *parameters)

Computes the probability density function (PDF) for binned data.

Parameters:
  • t (array-like) – The time points at which to evaluate the binned PDF.

  • *parameters (tuple) – Model parameters used for computing the PDF.

Returns:

array-like – The computed binned PDF values.

papylio.analysis.dwell_time_analysis.analyze_dwells(dwells, method='maximum_likelihood_estimation', number_of_exponentials=[1, 2, 3], state_names=None, P_bounds=(-1, 1), k_bounds=(1e-09, inf), sampling_interval=None, truncation=None, fit_dwell_times_kwargs={})

Analyze dwell times for different states using a fitting method.

This function fits the dwell times for each state, allowing for the number of exponentials to be different for each state. The fitting is done using a specified method, such as maximum likelihood estimation. The resulting fit parameters are returned in an xarray dataset.

Parameters:
  • dwells (xarray.DataArray) – An xarray DataArray containing the dwell times and states to be analyzed.

  • method (str, optional) – The method used for fitting the dwell times (default is ‘maximum_likelihood_estimation’).

  • number_of_exponentials (list of int or dict, optional) – A list of integers or a dictionary specifying the number of exponentials to fit for each state. If a dictionary is provided, it maps each state to a specific number of exponentials (default is [1, 2, 3]).

  • state_names (dict, optional) – A dictionary mapping state indices to state names (default is None).

  • P_bounds (tuple, optional) – The bounds for the parameter P (default is (-1, 1)).

  • k_bounds (tuple, optional) – The bounds for the parameter k (default is (1e-9, np.inf)).

  • sampling_interval (float, optional) – The sampling interval for the dwell times (default is None, which uses the minimum dwell time).

  • truncation (tuple, optional) – A tuple specifying the truncation limits (default is None, which applies no truncation).

  • fit_dwell_times_kwargs (dict, optional) – Additional keyword arguments to be passed to the fit_dwell_times function.

Returns:

xarray.Dataset – An xarray dataset containing the fitted parameters for each state, including the number of components, P, k, and BIC values.

papylio.analysis.dwell_time_analysis.auto_bin_size_for_discrete_data(dwell_times, sampling_interval=None)

Calculate the optimal bin edges for a histogram of discrete data using the Freedman-Diaconis rule.

Parameters:

dwell_times (array-like) – The input dwell times, which are sorted and used to calculate the optimal bin edges.

Returns:

bin_edges (numpy.ndarray) – The calculated bin edges for the histogram.
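The Freedman-Diaconis rule sets the bin width to `2 * IQR / n^(1/3)`. For dwell times that are multiples of a sampling interval, it is natural to round that width to a whole number of intervals so that every bin covers the same number of possible values. The sketch below illustrates this. The rounding step, the half-interval offset of the first edge and the function names are assumptions, not a description of papylio's implementation.

```python
import math
import statistics

def freedman_diaconis_width(values):
    """Bin width from the Freedman-Diaconis rule: 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2.0 * (q3 - q1) / len(values) ** (1.0 / 3.0)

def discrete_bin_edges(values, sampling_interval):
    """Equal-width bin edges whose width is a multiple of the sampling interval."""
    width = freedman_diaconis_width(values)
    # Assumption: use a whole number of sampling intervals per bin (at least one).
    steps = max(1, round(width / sampling_interval))
    bin_width = steps * sampling_interval
    # Assumption: start half an interval below the smallest value.
    start = min(values) - sampling_interval / 2.0
    stop = max(values) + sampling_interval / 2.0
    n_bins = math.ceil((stop - start) / bin_width)
    return [start + i * bin_width for i in range(n_bins + 1)]

# Hypothetical dwell times recorded with a 0.1 s sampling interval.
dwell_times = [0.1 * n for n in (1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12)]
edges = discrete_bin_edges(dwell_times, sampling_interval=0.1)
```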

papylio.analysis.dwell_time_analysis.empirical_cdf(dwell_times, sampling_interval)

Compute the empirical cumulative distribution function (ECDF) of dwell times.

Parameters:
  • dwell_times (array-like) – The input dwell times for which the ECDF is calculated.

  • sampling_interval (float) – The sampling interval used to adjust the ECDF calculation.

Returns:

  • t (numpy.ndarray) – The time points at which the ECDF is evaluated.

  • empirical_cdf (numpy.ndarray) – The values of the empirical cumulative distribution function.
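In its basic form, an ECDF gives, for each observed time, the fraction of dwell times that are less than or equal to it. The sketch below shows only this basic form. It does not reproduce the adjustment for `sampling_interval` mentioned above, whose details are not given here, and the function name is hypothetical.

```python
def basic_ecdf(dwell_times):
    """Return unique sorted times and the fraction of observations <= each time."""
    ordered = sorted(dwell_times)
    n = len(ordered)
    t, F = [], []
    for i, value in enumerate(ordered, start=1):
        if t and t[-1] == value:
            F[-1] = i / n  # repeated value: raise the step at the same time point
        else:
            t.append(value)
            F.append(i / n)
    return t, F

t, F = basic_ecdf([0.3, 0.1, 0.2, 0.1])
# t is [0.1, 0.2, 0.3] and F is [0.5, 0.75, 1.0]
```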

papylio.analysis.dwell_time_analysis.fit_dwell_times(dwell_times, method='maximum_likelihood_estimation', number_of_exponentials=[1, 2], P_bounds=(-1, 1), k_bounds=(0, inf), sampling_interval=None, truncation=None, fit_dwell_times_kwargs={})

Fit dwell times using a specified fitting method.

Parameters:
  • dwell_times (array-like) – The input dwell times to be fitted.

  • method (str, optional) – The fitting method to use (e.g., ‘maximum_likelihood_estimation’).

  • number_of_exponentials (list of int, optional) – The number of exponentials to fit.

  • P_bounds (tuple, optional) – The bounds for the P parameters.

  • k_bounds (tuple, optional) – The bounds for the k parameters.

  • sampling_interval (float, optional) – The sampling interval used for analysis.

  • truncation (tuple, optional) – The truncation limits for the fitting.

  • fit_dwell_times_kwargs (dict, optional) – Additional arguments to pass to the fitting method.

Returns:

dwell_analysis (xarray.Dataset) – The dataset containing the fitted parameters and analysis results.

papylio.analysis.dwell_time_analysis.plot_dwell_analysis(dwell_analysis, dwells, plot_type='pdf_binned', plot_range=None, axes=None, bins='auto_discrete', log=False, sharey=True, name=None, save_path=None)

Plot the dwell time analysis results for multiple states.

This function generates plots for the dwell time distributions of each state, showing either the probability density function (PDF) or cumulative distribution function (CDF). The results are plotted in a single figure, with options for customization, including binning, log scaling, and saving the figure.

Parameters:
  • dwell_analysis (xarray.Dataset) – An xarray dataset containing the dwell time analysis results, with states as one of the dimensions.

  • dwells (xarray.DataArray) – An xarray DataArray containing the dwell times and states to be plotted.

  • plot_type (str or list of str, optional) – The type of plot to generate for each state. Options are ‘pdf_binned’ or ‘cdf’. Can be a list if different types are needed for different states (default is ‘pdf_binned’).

  • plot_range (tuple or list of tuples, optional) – The range of the plot for each state. If None, the range is set automatically (default is None).

  • axes (matplotlib.Axes or array-like, optional) – The axes on which to plot the results. If None, new axes will be created (default is None).

  • bins (str or int, optional) – The binning strategy for the histogram. Options are ‘auto_discrete’ or an integer specifying the number of bins (default is ‘auto_discrete’).

  • log (bool, optional) – Whether to use a logarithmic scale for the y-axis (default is False).

  • sharey (bool, optional) – Whether to share the y-axis across all subplots (default is True).

  • name (str, optional) – The base name for the saved figure (default is None).

  • save_path (pathlib.Path, optional) – The directory path where the figure will be saved (default is None, meaning the figure will not be saved).

Returns:

  • matplotlib.figure.Figure – The figure containing the plots.

  • matplotlib.Axes – The axes containing the individual plots for each state.

papylio.analysis.dwell_time_analysis.plot_dwell_analysis_state(dwell_analysis, dwell_times, plot_type='pdf_binned', plot_range=None, bins='auto_discrete', log=False, ax=None)

Plot the results of a dwell time analysis, including the model fit and empirical data.

Parameters:
  • dwell_analysis (xarray.Dataset) – The dataset containing the dwell time analysis results, including fitted parameters and BIC values.

  • dwell_times (array-like) – The input dwell times used for the analysis.

  • plot_type (str, optional) – The type of plot to generate, e.g., ‘pdf_binned’, ‘cdf’.

  • plot_range (tuple, optional) – The range of values to plot.

  • bins (str or int or sequence, optional) – The binning method for the histogram plot.

  • log (bool, optional) – If True, the y-axis is plotted on a logarithmic scale.

  • ax (matplotlib.axes.Axes, optional) – The axes on which to plot the results. If None, a new figure and axes will be created.

Returns:

  • figure (matplotlib.figure.Figure) – The figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The axes object containing the plot.

papylio.analysis.dwell_time_analysis.plot_dwell_time_histogram(dwell_times, bins='auto_discrete', range=None, sampling_interval=None, ax=None, **hist_kwargs)

Plot a histogram of dwell times with automatic bin sizing or specified bin edges.

Parameters:
  • dwell_times (array-like) – The input dwell times to be used for plotting the histogram.

  • bins (str or int or sequence, optional) – The binning method, either ‘auto_discrete’ for automatic bin sizing or a specific bin configuration.

  • range (tuple, optional) – The range of values to be used for the histogram.

  • ax (matplotlib.axes.Axes, optional) – The axes on which to plot the histogram. If None, a new figure and axes will be created.

  • **hist_kwargs (keyword arguments) – Additional arguments to pass to matplotlib.pyplot.hist.

Returns:

  • counts (numpy.ndarray) – The counts for each bin in the histogram.

  • bin_centers (numpy.ndarray) – The center positions of each bin.

papylio.analysis.dwell_time_analysis.plot_empirical_cdf(dwell_times, sampling_interval, ax=None, **plot_kwargs)

Plot the empirical cumulative distribution function (ECDF) of dwell times.

Parameters:
  • dwell_times (array-like) – The input dwell times to be used for the ECDF plot.

  • sampling_interval (float) – The sampling interval used to adjust the ECDF calculation.

  • ax (matplotlib.axes.Axes, optional) – The axes on which to plot the ECDF. If None, a new figure and axes will be created.

  • **plot_kwargs (keyword arguments) – Additional arguments to pass to matplotlib.pyplot.plot.

Returns:

None