papylio.File

class papylio.File(*args, **kwargs)#
__init__(*args, **kwargs)#

Initialize a File object.

Parameters:

relativeFilePath (str or Path): Path to the file relative to the experiment root.
extensions (set, optional): Set of file extensions associated with this file.
experiment (Experiment, optional): The experiment object this file belongs to.
perform_logging (bool, optional): Whether to log activities for this file. Default is True.

Methods

__init__(*args, **kwargs)

Initialize a File object.

add_extensions(extensions[, load])

Add extensions to this file and optionally load the data.

analyze_dwells([method, ...])

Analyze dwell times using MLE or other methods.

apply_classifications([add_to_current])

Apply the specified classifications to the dataset.

apply_selections(*selection_names[, ...])

Apply the specified selections to the dataset.

apply_trace_corrections([...])

Apply corrections (background, alpha, gamma) to existing intensity traces.

autoThreshold(trace_name[, threshold, ...])

Automatically set thresholds for traces.

average_image()

Return the average projection image.

calculate_FRET()

Calculate and save FRET values for the intensity traces.

classification_binary([...])

Get a binary representation of the classification.

classification_configurations([...])

Get the configurations for the specified classifications.

classify_hmm(variable[, seed, n_states, ...])

Create an HMM-based classification.

clear_classifications()

Clear all classifications and reset the active classification state.

clear_selections()

Clear all selections and reset the active selection state.

coordinates_from_channel(channel)

Get coordinates for a specific channel.

copy_coordinates_to_selected_files()

Copy the current coordinates to all selected files in the experiment.

copy_selections_to_selected_files()

Copy current selections and active selection state to all selected files in the experiment.

create_classification(classification_type, ...)

Create a new classification.

create_selection(variable, channel, ...[, name])

Create a new selection based on a threshold applied to a variable.

determine_dwells_from_classification([...])

Extract dwell times from the current classification.

determine_psf_size([method, ...])

Determine the Point Spread Function (PSF) size by fitting Gaussians to detected peaks.

determine_sequences_at_current_coordinates([...])

determine_trace_correction()

Open a GUI window to determine trace correction parameters.

export_coeff_file()

Export the mapping to a coefficient file (deprecated, use export_mapping instead).

export_map_file()

Export the mapping to a map file (deprecated, use export_mapping instead).

export_mapping([filetype])

Export the current coordinate mapping.

export_pks_file()

Export current molecule coordinates and background to a .pks file.

export_sequencing_match()

export_traces_file()

Export current intensity traces to a .traces file.

extract_background([configuration])

Extract background intensity for each molecule.

extract_traces([mask_size, ...])

Extract intensity traces for each molecule from the movie.

find_and_add_extensions()

Find associated file extensions and add them to this File object.

find_coordinates(**configuration)

Find and set the locations of all molecules within the movie's images.

find_extensions()

Scan the experiment directory for files matching this file's name and return their extensions.

find_sequences_using_stage_coordinates([...])

generate_sequencing_match([...])

get_data(key)

Retrieve data from the netCDF dataset.

get_dataset_attribute(attribute_name)

get_projection_image([load])

Get or generate a projection image.

get_sequencing_data([margin, mapping_name])

get_traces([selected])

Get all data variables that have a 'frame' dimension (traces).

get_variable(variable[, selected, ...])

Get a variable.

histogram_2D_FRET_intensity_total([...])

Generates a 2D histogram plot of FRET vs. total intensity with optional marginal histograms.

histogram_2D_intensity_per_channel([...])

Generates a 2D histogram plot of intensity between two specified channels, with optional marginal histograms.

import_coeff_file(extension)

Import a coefficient file for linear coordinate mapping.

import_excel_file([filename])

Import data from an Excel file.

import_map_file(extension)

Import a map file for nonlinear coordinate mapping.

import_mapping_file(extension)

Import a mapping file using MatchPoint.

import_movie(extension)

Import movie data associated with the given extension.

import_pks_file(extension)

Import molecule coordinates and background from a .pks file.

import_sequencing_match()

import_traces_file(extension)

Import intensity traces from a .traces file.

insert_sequencing_data_into_file_dataset([...])

maximum_projection_image()

Return the maximum projection image.

noneFunction(*args, **kwargs)

A placeholder function that does nothing.

perform_mapping(**configuration)

Perform coordinate mapping between channels.

plot_dwell_analysis([name, plot_type, ...])

Plot the results of dwell time analysis.

plot_hmm_rates([name])

Plot histograms of HMM transition rates for molecules with 2 states.

plot_sequencing_match()

projection_image()

Return the default projection image.

save_dataset_selected()

Save the dataset containing only selected molecules to a new netCDF file.

savetoExcel([filename, save])

Save the dataset to an Excel file.

selection_configurations(*selection_names)

Get the configurations for the specified selections.

set_coordinates_of_channel(coordinates, channel)

Set coordinates for a specific channel and update other channels using mapping.

set_variable(data, **kwargs)

Save data as a variable in the netCDF dataset.

show_average_image([figure])

Show the average projection image.

show_coordinates([figure, annotate, unit])

Show detected molecule coordinates on a plot.

show_coordinates_in_image([figure])

Show projection image with overlaid molecule coordinates.

show_histogram(variable[, selected, ...])

Show a histogram of a variable.

show_image([projection_type, figure, unit])

Show a projection image of the movie.

show_mapping_in_image([axis, save])

Visualize the coordinate mapping on a projection image.

show_traces([split_illuminations])

Open a GUI window to visualize intensity traces.

state_count([selected, states])

Count the number of molecules in each state.

state_fraction(**state_count_kwargs)

Calculate the fraction of molecules in each state.

use_for_darkfield_correction()

Use the average projection of this file as a darkfield correction image for the experiment.

use_mapping_for_all_files([perform_logging])

Apply the current coordinate mapping to all files in the experiment.

use_sequences_as_molecules()

Attributes

property absoluteFilePath#
add_extensions(extensions, load=True)#

Add extensions to this file and optionally load the data.

Parameters:

extensions (str or list): One or more extensions to add.
load (bool, optional): Whether to import the data associated with the extensions. Default is True.

analyze_dwells(method='maximum_likelihood_estimation', number_of_exponentials=[1, 2], state_names=None, truncation=None, P_bounds=(-1, 1), k_bounds=(1e-09, inf), plot=False, fit_dwell_times_kwargs={}, plot_dwell_analysis_kwargs={}, save_file_path=None)#

Analyze dwell times using MLE or other methods.

Parameters:

method (str, optional): The analysis method. Default is 'maximum_likelihood_estimation'.
number_of_exponentials (list, optional): Number of exponentials to fit. Default is [1, 2].
state_names (list, optional): Names of the states.
truncation (float, optional): Truncation time for fitting.
P_bounds (tuple, optional): Bounds for the amplitudes.
k_bounds (tuple, optional): Bounds for the rates.
plot (bool, optional): Whether to plot the results. Default is False.
fit_dwell_times_kwargs (dict, optional): Additional arguments for dwell time fitting.
plot_dwell_analysis_kwargs (dict, optional): Additional arguments for plotting.
save_file_path (str, optional): Path to save the analysis results.

Returns:

xarray.Dataset: The dwell analysis results.
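
As a rough illustration of what maximum likelihood estimation means here: for a single-exponential dwell-time model without truncation, the maximum likelihood estimate of the rate k is the reciprocal of the mean dwell time. The sketch below uses only the standard library. The helper names and the example data are hypothetical, and this is not papylio's implementation, which additionally handles multiple exponentials, truncation, and parameter bounds.

```python
import math


def mle_single_exponential_rate(dwell_times):
    """Return the MLE of the rate k for the model p(t) = k * exp(-k * t).

    Hypothetical helper for illustration; assumes untruncated, positive dwells.
    """
    if not dwell_times:
        raise ValueError("at least one dwell time is required")
    if any(t <= 0 for t in dwell_times):
        raise ValueError("dwell times must be positive")
    # Setting d/dk [n * log(k) - k * sum(t)] = 0 gives k = n / sum(t).
    return len(dwell_times) / sum(dwell_times)


def log_likelihood(dwell_times, k):
    """Log-likelihood of the dwells under the single-exponential model."""
    return sum(math.log(k) - k * t for t in dwell_times)


dwells = [0.5, 1.5, 1.0, 3.0, 2.0]  # made-up dwell times in seconds
k_hat = mle_single_exponential_rate(dwells)
```

Evaluating `log_likelihood` at `k_hat` and at nearby rates is a quick way to confirm that the estimate sits at the maximum.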

apply_classifications(add_to_current=False, **classification_assignment)#

Apply the specified classifications to the dataset.

Parameters:

add_to_current (bool, optional): Whether to add to the currently active classification. Default is False.
**classification_assignment: Mapping of classification names to state indices.

apply_selections(*selection_names, add_to_current=False)#

Apply the specified selections to the dataset.

Parameters:

*selection_names: The names of the selections to apply.
add_to_current (bool, optional): Whether to add to the currently active selections. Default is False.

apply_trace_corrections(background_correction=None, alpha_correction=None, gamma_correction=None)#

Apply corrections (background, alpha, gamma) to existing intensity traces.

Parameters:

background_correction (optional): Background correction method.
alpha_correction (optional): Alpha correction factor.
gamma_correction (optional): Gamma correction factor.
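
In single-molecule FRET work these three corrections are commonly defined as follows: background is subtracted per channel, alpha removes donor leakage into the acceptor channel, and gamma rescales the donor signal to account for differences in detection efficiency. The sketch below applies that convention to plain lists. The formulas are an assumption about the intent of this method, not a copy of papylio's implementation, and `correct_traces` is a hypothetical helper.

```python
def correct_traces(donor, acceptor, background=(0.0, 0.0), alpha=0.0, gamma=1.0):
    """Apply background, alpha (leakage) and gamma corrections frame by frame.

    Assumed convention (not taken from papylio):
        donor_bg       = donor - background_donor
        acceptor_bg    = acceptor - background_acceptor
        acceptor_corr  = acceptor_bg - alpha * donor_bg
        donor_corr     = gamma * donor_bg
    """
    background_donor, background_acceptor = background
    donor_corrected = []
    acceptor_corrected = []
    for d, a in zip(donor, acceptor):
        d_bg = d - background_donor
        a_bg = a - background_acceptor
        acceptor_corrected.append(a_bg - alpha * d_bg)
        donor_corrected.append(gamma * d_bg)
    return donor_corrected, acceptor_corrected


# Made-up two-frame traces for one molecule.
donor_c, acceptor_c = correct_traces(
    [110.0, 210.0], [60.0, 40.0], background=(10.0, 10.0), alpha=0.1, gamma=2.0
)
```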

autoThreshold(trace_name, threshold=100, max_steps=20, only_selected=False, kon_str='000000000')#

Automatically set thresholds for traces.

Parameters:

trace_name (str): The name of the trace variable to threshold.
threshold (float, optional): Initial threshold value. Default is 100.
max_steps (int, optional): Maximum number of steps for the auto-thresholding. Default is 20.
only_selected (bool, optional): Whether to only process selected molecules. Default is False.
kon_str (str, optional): A string representing 'kon' boolean states. Default is '000000000'.

average_image()#

Return the average projection image.

calculate_FRET()#

Calculate and save FRET values for the intensity traces.
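
For orientation, the apparent FRET efficiency is conventionally computed per frame as E = I_A / (I_D + I_A), where I_D and I_A are the donor and acceptor intensities. The sketch below implements that textbook formula on plain lists. Whether papylio uses exactly this expression, and how it treats frames with zero total intensity, is not stated on this page, so both are assumptions. `apparent_fret` is a hypothetical helper.

```python
import math


def apparent_fret(donor, acceptor):
    """Return E = acceptor / (donor + acceptor) for each frame.

    Frames with zero total intensity yield NaN (an assumed convention).
    """
    fret = []
    for d, a in zip(donor, acceptor):
        total = d + a
        fret.append(a / total if total != 0 else math.nan)
    return fret


# Made-up three-frame donor and acceptor traces.
fret = apparent_fret([300.0, 100.0, 0.0], [100.0, 300.0, 0.0])
```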

property classification#
classification_binary(positive_states_only=False, selected=False)#

Get a binary representation of the classification.

Parameters:

positive_states_only (bool, optional): Whether to include only positive states. Default is False.
selected (bool, optional): Whether to use only selected molecules. Default is False.

Returns:

xarray.DataArray: The binary classification.

classification_configurations(classification_names='all')#

Get the configurations for the specified classifications.

Parameters:

classification_names (str or list, optional): The names of the classifications to get configurations for. Default is 'all'.

Returns:

dict: A dictionary of classification names and their configurations.

property classification_names#
property classifications#
classify_hmm(variable, seed=0, n_states=2, threshold_state_mean=None, level='molecule')#

Create an HMM-based classification.

Parameters:

variable (str): The name of the variable to classify.
seed (int, optional): Random seed for HMM. Default is 0.
n_states (int, optional): Number of states for HMM. Default is 2.
threshold_state_mean (float, optional): Threshold for state mean.
level (str, optional): The level at which to perform HMM ('molecule' or 'frame'). Default is 'molecule'.

clear_classifications()#

Clear all classifications and reset the active classification state.

clear_selections()#

Clear all selections and reset the active selection state.

property configuration#
property coordinates#
coordinates_from_channel(channel)#

Get coordinates for a specific channel.

Parameters:

channel (int or str): The channel index or name (‘d’, ‘a’, ‘g’, ‘r’).

Returns:

xarray.DataArray: The coordinates for the specified channel.

property coordinates_metric#
property coordinates_stage#
copy_coordinates_to_selected_files()#

Copy the current coordinates to all selected files in the experiment.

copy_selections_to_selected_files()#

Copy current selections and active selection state to all selected files in the experiment.

create_classification(classification_type: Literal['threshold', 'hmm'], variable, select=None, name=None, classification_kwargs=None, apply=None)#

Create a new classification.

Parameters:

classification_type (str): The type of classification ('threshold' or 'hmm').
variable (str): The name of the variable to classify.
select (optional): A selection to apply before classification.
name (str, optional): The name of the classification.
classification_kwargs (dict, optional): Additional arguments for the classification method.
apply (bool, optional): Whether to apply the classification immediately.
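
To make the 'threshold' classification type concrete, the sketch below assigns a state to every frame of a trace depending on whether the value lies above a threshold. The state labels (0 at or below the threshold, 1 above it) and the helper name `classify_by_threshold` are assumptions for illustration. They are not taken from papylio.

```python
def classify_by_threshold(trace, threshold):
    """Return one state per frame: 1 if the value exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in trace]


# Made-up FRET trace of one molecule.
states = classify_by_threshold([0.1, 0.2, 0.8, 0.9, 0.3], threshold=0.5)
```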

create_selection(variable, channel, aggregator, operator, threshold, name=None)#

Create a new selection based on a threshold applied to a variable.

Parameters:

variable (str): The name of the variable to use.
channel (int or str): The channel to use.
aggregator (str): The aggregator to apply over frames (e.g., 'mean', 'max').
operator (str): The operator for thresholding ('<' or '>').
threshold (float): The threshold value.
name (str, optional): The name of the selection.
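
The parameters above describe a three-step rule: aggregate each molecule's trace over frames, compare the aggregate with the threshold, and mark the molecule as selected when the comparison holds. The sketch below reproduces that rule with the standard library for a single channel. `select_molecules` and the example traces are hypothetical, and channel handling is left out.

```python
import operator
import statistics

# Only the aggregators and operators named in the parameter list are included.
AGGREGATORS = {"mean": statistics.mean, "max": max}
OPERATORS = {"<": operator.lt, ">": operator.gt}


def select_molecules(traces, aggregator, comparison, threshold):
    """Return one boolean per molecule: does the aggregated trace pass the threshold?"""
    aggregate = AGGREGATORS[aggregator]
    compare = OPERATORS[comparison]
    return [compare(aggregate(trace), threshold) for trace in traces]


# Made-up intensity traces: three molecules, three frames each.
traces = [
    [100.0, 120.0, 110.0],
    [20.0, 30.0, 25.0],
    [400.0, 10.0, 10.0],
]
selected_by_mean = select_molecules(traces, "mean", ">", 50.0)
selected_by_max = select_molecules(traces, "max", ">", 150.0)
```

The choice of aggregator matters: the third molecule passes both rules, while the first passes only the mean-based one.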

property cycle_time#
property data_vars#
property dataset#
property dataset_attributes#
property dataset_selected#
determine_dwells_from_classification(variable='FRET', selected=False, inactivate_start_and_end_states=True)#

Extract dwell times from the current classification.

Parameters:

variable (str, optional): The trace variable to use for dwell time extraction. Default is 'FRET'.
selected (bool, optional): Whether to use only selected molecules. Default is False.
inactivate_start_and_end_states (bool, optional): Whether to ignore the first and last dwells. Default is True.
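
Extracting dwells from a classification amounts to run-length encoding the per-frame state sequence. The first and last runs are usually excluded because their true start or end lies outside the observation window. The sketch below shows this for one molecule. `dwells_from_states` is a hypothetical helper, and dropping the edge dwells outright is a simplification: papylio may mark them as inactive rather than remove them.

```python
from itertools import groupby


def dwells_from_states(states, frame_time=1.0, drop_first_and_last=True):
    """Return (state, duration) pairs for consecutive runs of equal states."""
    dwells = [
        (state, sum(1 for _ in run) * frame_time)
        for state, run in groupby(states)
    ]
    if drop_first_and_last:
        # Edge dwells are cut off by the start or end of the recording.
        dwells = dwells[1:-1]
    return dwells


# Made-up state sequence for one molecule, sampled every 0.5 s.
states = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
dwells = dwells_from_states(states, frame_time=0.5)
all_dwells = dwells_from_states(states, frame_time=0.5, drop_first_and_last=False)
```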

determine_psf_size(method='gaussian_fit', projection_type='average', frame_range=(0, 20), channel_index=0, illumination_index=0, peak_finding_kwargs={'minimum_intensity_difference': 150}, maximum_radius=5)#

Determine the Point Spread Function (PSF) size by fitting Gaussians to detected peaks.

Parameters:

method (str, optional): Method to determine PSF size ('gaussian_fit' or 'median'). Default is 'gaussian_fit'.
projection_type (str, optional): Type of image projection to use. Default is 'average'.
frame_range (tuple, optional): Range of frames to use for projection. Default is (0, 20).
channel_index (int, optional): Index of the channel to use. Default is 0.
illumination_index (int, optional): Index of the illumination to use. Default is 0.
peak_finding_kwargs (dict, optional): Arguments for peak finding.
maximum_radius (int, optional): Maximum radius for PSF size. Default is 5.

Returns:

float: The determined PSF size.

determine_sequences_at_current_coordinates(visible_sequence_names='', distance_threshold=None)#
determine_trace_correction()#

Open a GUI window to determine trace correction parameters.

property dwell_analysis#
property dwells#
export_coeff_file()#

Export the mapping to a coefficient file (deprecated, use export_mapping instead).

export_map_file()#

Export the mapping to a map file (deprecated, use export_mapping instead).

export_mapping(filetype='yml')#

Export the current coordinate mapping.

Parameters:

filetype (str, optional): The format to save the mapping in (e.g., ‘yml’, ‘classic’). Default is ‘yml’.

export_pks_file()#

Export current molecule coordinates and background to a .pks file.

export_sequencing_match()#
export_traces_file()#

Export current intensity traces to a .traces file.

extract_background(configuration=None)#

Extract background intensity for each molecule.

Parameters:

configuration (dict, optional): Configuration overrides for background extraction.

extract_traces(mask_size=None, neighbourhood_size=None, background_correction=None, alpha_correction=None, gamma_correction=None)#

Extract intensity traces for each molecule from the movie.

Parameters:

mask_size (int, optional): Size of the mask for intensity extraction.
neighbourhood_size (int, optional): Size of the neighbourhood for background extraction.
background_correction (optional): Background correction method.
alpha_correction (optional): Alpha correction factor.
gamma_correction (optional): Gamma correction factor.

find_and_add_extensions()#

Find associated file extensions and add them to this File object.

find_coordinates(**configuration)#

Find and set the locations of all molecules within the movie’s images.

This function performs peak finding on projection images, handles multiple channels, and manages coordinate sets across different frames if sliding windows are used.

For configuration options see the “find_coordinates” section in the default configuration file.

Parameters:

**configuration: Configuration overrides for coordinate finding.
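
The method descriptions on this page imply a basic processing order: coordinates have to be found before traces can be extracted, and traces have to exist before FRET can be calculated. The sketch below encodes that order using only method names documented here. The order itself, the helper `run_basic_analysis`, the stand-in class `RecordingFile`, and the configuration key `hypothetical_option` are assumptions for illustration. With papylio installed, a real File object would be passed in place of the stand-in.

```python
def run_basic_analysis(file, coordinate_configuration=None):
    """Call documented File methods in a plausible order (an assumption)."""
    file.find_coordinates(**(coordinate_configuration or {}))
    file.extract_traces()
    file.calculate_FRET()
    return file


class RecordingFile:
    """Hypothetical stand-in for papylio.File that only records method calls."""

    def __init__(self):
        self.calls = []

    def find_coordinates(self, **configuration):
        self.calls.append(("find_coordinates", configuration))

    def extract_traces(self):
        self.calls.append(("extract_traces", {}))

    def calculate_FRET(self):
        self.calls.append(("calculate_FRET", {}))


# The configuration key below is made up; real keys come from the
# "find_coordinates" section of the default configuration file.
stub = run_basic_analysis(RecordingFile(), {"hypothetical_option": 1})
```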

find_extensions()#

Scan the experiment directory for files matching this file’s name and return their extensions.

Returns:

list: A list of found file extensions.
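
The idea of scanning a directory for files that share a name and collecting what follows that name can be sketched with pathlib. The matching rule used here, where everything after the shared stem counts as the extension (so a suffix such as `_ave.tif` is included), is an assumption, as are the helper name `scan_extensions` and the example file names. papylio's actual matching rules are not described on this page.

```python
import tempfile
from pathlib import Path


def scan_extensions(directory, file_stem):
    """Return the sorted name remainders of all files that start with file_stem."""
    extensions = set()
    for path in Path(directory).iterdir():
        if path.is_file() and path.name.startswith(file_stem):
            # Keep everything after the shared stem, e.g. ".tif" or "_ave.tif".
            extensions.add(path.name[len(file_stem):])
    return sorted(extensions)


# Build a throwaway directory with made-up file names and scan it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("movie_001.tif", "movie_001.pks", "movie_001_ave.tif", "movie_002.tif"):
        (Path(tmp) / name).touch()
    found = scan_extensions(tmp, "movie_001")
```

A plain prefix match like this would also pick up a file named `movie_0010.tif`, so a production version would need a stricter rule.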

find_sequences_using_stage_coordinates(channel=0, show=False, save=True)#
property frame_rate#
generate_sequencing_match(overlapping_points_threshold=25, excluded_sequence_names=None, plot=False)#
get_data(key)#

Retrieve data from the netCDF dataset.

Parameters:

key (str): The name of the data variable to retrieve.

Returns:

xarray.DataArray: The retrieved data.

get_dataset_attribute(attribute_name)#
get_projection_image(load=True, **kwargs)#

Get or generate a projection image.

Parameters:

load (bool, optional): Whether to try loading an existing image from disk. Default is True.
**kwargs: Additional configuration parameters for image projection.

Returns:

numpy.ndarray: The projection image.

get_sequencing_data(margin=1, mapping_name='All files')#
get_traces(selected=False)#

Get all data variables that have a ‘frame’ dimension (traces).

Parameters:

selected (bool, optional): Whether to return traces only for selected molecules. Default is False.

Returns:

xarray.Dataset: The traces dataset.

get_variable(variable, selected=False, frame_range=None, average=False, return_none_if_nonexistent=False)#

Get a variable.

Parameters:

variable (str): The name of the variable to retrieve.
selected (bool, optional): Whether to return only selected molecules. Default is False.
frame_range (tuple, optional): If the returned variable has a 'frame' dimension, frame_range can be used to select the desired frames. Default is None.
average (bool or str, optional): Whether to calculate the average of the variable over a specific dimension. If a string is provided, it represents the dimension to average over. Default is False.
return_none_if_nonexistent (bool, optional): Whether to return None if the variable does not exist in the object. Default is False.

Returns:

xarray.DataArray: The requested variable.

property has_sequencing_match#
histogram_2D_FRET_intensity_total(selected=False, frame_range=None, average=False, **marginal_hist2d_kwargs)#

Generates a 2D histogram plot of FRET vs. total intensity with optional marginal histograms.

This function retrieves the ‘FRET’ and ‘intensity_total’ variables from the File object, then plots their relationship in a 2D histogram, with optional marginal histograms along the axes.

Parameters:

selected (bool, optional): If True, only selected molecules will be used for plotting. Default is False.
frame_range (tuple of two ints, optional): The range of frames to use. If None, all frames are used. Default is None.
average (bool, optional): If True, the function averages the data over the specified frame range. Default is False.
axis (matplotlib.axes.Axes, optional): The axes object to plot on. If None, a new plot will be created. Default is None.
**marginal_hist2d_kwargs (dict, optional): Additional keyword arguments passed to the marginal_hist2d function for customizing the plot. Default arguments are used for the 2D histogram's range.

Returns:

list of matplotlib.axes.Axes: A list of axes objects corresponding to the 2D histogram plot and optional marginal histograms.

The function utilizes the marginal_hist2d function from the papylio.plotting module to create the plot. The default range for the FRET values is (-0.05, 1.05) for the x-axis and no limit for the y-axis.

histogram_2D_intensity_per_channel(selected=False, frame_range=None, average=False, channel_x=0, channel_y=1, **marginal_hist2d_kwargs)#

Generates a 2D histogram plot of intensity between two specified channels, with optional marginal histograms.

This function retrieves intensity data for the specified channels from the File object and generates a 2D histogram to visualize the relationship between intensities in the selected channels. Marginal histograms along the axes can optionally be included for additional insight.

Parameters:

selected (bool, optional): If True, only selected molecules are used for the plot. Default is False.
frame_range (tuple of two ints, optional): Specifies the range of frames to use. If None, all frames are included. Default is None.
average (bool, optional): If True, averages the intensity data over the specified frame range. Default is False.
channel_x (int, optional): The index of the channel for the x-axis data. Default is 0.
channel_y (int, optional): The index of the channel for the y-axis data. Default is 1.
**marginal_hist2d_kwargs (dict, optional): Additional keyword arguments passed to the marginal_hist2d function to customize the plot. Defaults include no specific range for the histogram axes.

Returns:

list of matplotlib.axes.Axes: A list of axes objects corresponding to the 2D histogram plot and optional marginal histograms.

The function uses the marginal_hist2d function from the papylio.plotting module for visualization.

import_coeff_file(extension)#

Import a coefficient file for linear coordinate mapping.

Parameters:

extension (str): The file extension (usually ‘.coeff’).

import_excel_file(filename=None)#

Import data from an Excel file.

Parameters:

filename (str or Path, optional): Path to the Excel file. If None, looks for default steps data file.

import_map_file(extension)#

Import a map file for nonlinear coordinate mapping.

Parameters:

extension (str): The file extension (usually ‘.map’).

import_mapping_file(extension)#

Import a mapping file using MatchPoint.

Parameters:

extension (str): The file extension.

import_movie(extension)#

Import movie data associated with the given extension.

Parameters:

extension (str): The file extension to import from.

import_pks_file(extension)#

Import molecule coordinates and background from a .pks file.

Parameters:

extension (str): The file extension.

import_sequencing_match()#
import_traces_file(extension)#

Import intensity traces from a .traces file.

Parameters:

extension (str): The file extension.

insert_sequencing_data_into_file_dataset(include_raw_sequences=False, include_aligned_sequences=True, include_sequence_subset=True, determine_matched_pairs=True, include_aligned_position=False)#
property intensity_total#
maximum_projection_image()#

Return the maximum projection image.

noneFunction(*args, **kwargs)#

A placeholder function that does nothing.

property number_of_channels#
property number_of_molecules#
property number_of_selected_molecules#
property number_of_states_from_classification#
perform_mapping(**configuration)#

Perform coordinate mapping between channels.

Parameters:

**configuration: Configuration overrides for mapping.

plot_dwell_analysis(name=None, plot_type='pdf', plot_range=None, axes=None, bins='auto_discrete', log=False, sharey=False, save_path=None)#

Plot the results of dwell time analysis.

Parameters:

name (str, optional): Name for the plot title.
plot_type (str, optional): Type of plot ('pdf', 'cdf', etc.). Default is 'pdf'.
plot_range (tuple, optional): Range for the x-axis.
axes (optional): Matplotlib axes to plot on.
bins (optional): Binning strategy. Default is 'auto_discrete'.
log (bool, optional): Whether to use a log scale for the x-axis. Default is False.
sharey (bool, optional): Whether to share the y-axis across plots. Default is False.
save_path (str or Path, optional): Directory to save the plot.

Returns:

tuple: (figure, axes)

plot_hmm_rates(name=None)#

Plot histograms of HMM transition rates for molecules with 2 states.

Parameters:

name (str, optional): Name to use for the plot title and filename.

plot_sequencing_match()#
projection_image()#

Return the default projection image.

property relativeFilePath#
property sampling_interval#
save_dataset_selected()#

Save the dataset containing only selected molecules to a new netCDF file.

savetoExcel(filename=None, save=True)#

Save the dataset to an Excel file.

Parameters:

filename (str or Path, optional): The filename to save to.
save (bool, optional): Whether to actually save the file. Default is True.

Returns:

pandas.DataFrame: The data exported to Excel.

property selected_molecules#
selection_configurations(*selection_names)#

Get the configurations for the specified selections.

Parameters:

*selection_names: The names of the selections to get configurations for.

Returns:

dict: A dictionary of selection names and their configurations.

property selection_names#
property selection_names_active#
property selections#
property sequencing_data#
property sequencing_match#
set_coordinates_of_channel(coordinates, channel)#

Set coordinates for a specific channel and update other channels using mapping.

Parameters:

coordinates (numpy.ndarray): The coordinates to set.
channel (int or str): The channel index or name.

set_variable(data, **kwargs)#

Save data as a variable in the netCDF dataset.

Parameters:

data (numpy.ndarray or xarray.DataArray): The data to save.
**kwargs: Additional arguments for xarray.DataArray.

show_average_image(figure=None, **kwargs)#

Show the average projection image.

show_coordinates(figure=None, annotate=None, unit='pixel', **kwargs)#

Show detected molecule coordinates on a plot.

Parameters:

figure (optional): Matplotlib figure to plot on.
annotate (bool, optional): Whether to enable interactive annotations.
unit (str, optional): Unit for coordinates ('pixel' or 'metric'). Default is 'pixel'.
**kwargs: Additional arguments for scatter plot.

show_coordinates_in_image(figure=None, **kwargs)#

Show projection image with overlaid molecule coordinates.

Parameters:

figure (optional): Matplotlib figure to plot on.
**kwargs: Additional arguments for show_image.

show_histogram(variable, selected=False, frame_range=None, average=False, axis=None, **hist_kwargs)#

Show a histogram of a variable.

Parameters:

variable (str): The name of the variable.
selected (bool, optional): Whether to use only selected molecules. Default is False.
frame_range (tuple, optional): Range of frames to include.
average (bool or str, optional): Whether to average the variable.
axis (optional): Matplotlib axis to plot on.
**hist_kwargs: Additional arguments for the histogram plot.

Returns:

tuple: (figure, axis)

show_image(projection_type='default', figure=None, unit='pixel', **kwargs)#

Show a projection image of the movie.

Parameters:

projection_type (str, optional): The type of projection ('average', 'maximum', or 'default').
figure (optional): Matplotlib figure to plot on.
unit (str, optional): Unit for axes ('pixel' or 'metric'). Default is 'pixel'.
**kwargs: Additional arguments for imshow.

Returns:

tuple: (figure, axis)

show_mapping_in_image(axis=None, save=True)#

Visualize the coordinate mapping on a projection image.

Parameters:

axis (matplotlib.axes.Axes, optional): The axis to plot on.
save (bool, optional): Whether to save the plot as an image. Default is True.

show_traces(split_illuminations=True, **kwargs)#

Open a GUI window to visualize intensity traces.

Parameters:

split_illuminations (bool, optional): Whether to split traces by illumination. Default is True.
**kwargs: Additional arguments for TracePlotWindow.

state_count(selected=True, states=None)#

Count the number of molecules in each state.

Parameters:

selected (bool, optional): Whether to use only selected molecules. Default is True.
states (list, optional): The states to count.

Returns:

xarray.DataArray: The counts for each state.

state_fraction(**state_count_kwargs)#

Calculate the fraction of molecules in each state.

Parameters:

**state_count_kwargs: Arguments passed to state_count.

Returns:

xarray.DataArray: The fraction of molecules in each state.
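
The relationship between the two methods can be illustrated with plain Python: count how many molecules carry each state label, then divide by the total. The sketch assumes one state label per molecule and normalizes over the requested states only. Both points are assumptions, the helper names are hypothetical, and papylio returns xarray.DataArray objects rather than dictionaries.

```python
from collections import Counter


def count_states(molecule_states, states=None):
    """Return {state: number of molecules in that state}."""
    counts = Counter(molecule_states)
    if states is None:
        states = sorted(counts)
    return {state: counts.get(state, 0) for state in states}


def fraction_of_states(molecule_states, states=None):
    """Return {state: fraction of molecules}, normalized over the counted states."""
    counts = count_states(molecule_states, states)
    total = sum(counts.values())
    return {
        state: (count / total if total else float("nan"))
        for state, count in counts.items()
    }


# Made-up state labels for eight molecules.
molecule_states = [0, 1, 1, 0, 1, 2, 1, 1]
counts = count_states(molecule_states)
fractions = fraction_of_states(molecule_states)
```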

property traces_names#
use_for_darkfield_correction()#

Use the average projection of this file as a darkfield correction image for the experiment.

use_mapping_for_all_files(perform_logging=True)#

Apply the current coordinate mapping to all files in the experiment.

use_sequences_as_molecules()#