4. Coupling datasets
[1]:
from pathlib2 import Path
import papylio as pp
import matplotlib.pyplot as plt
%matplotlib inline
Experiment import
Note that the sequencing data is imported automatically when the experiment is initialized.
[2]:
experiment_path = Path(r'C:\Users\user\Desktop\SPARXS example dataset')
[3]:
exp = pp.Experiment(experiment_path)
Import files: 100%|██████████████████████████████████████████████████████████████| 4190/4190 [00:00<00:00, 6337.52it/s]
File(Single-molecule data - bead slide\Bead slide TIRF 561 001) used as mapping
Initialize experiment:
C:\Users\user\Desktop\SPARXS example dataset
Import sequencing data:
Sequencing data\sequencing_data.nc
[4]:
# Group the files by file name or by relative path within the experiment folder
files_bead_slide = exp.files.select('Bead slide', 'name')
files_green_laser = exp.files.select('Single-molecule data - green laser', 'relativePath')
files_red_laser_before = exp.files.select('Single-molecule data - red laser before', 'relativePath')
files_red_laser_after = exp.files.select('Single-molecule data - red laser after', 'relativePath')
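The selection cell above picks files by matching a value against a file attribute (`name` or `relativePath`). The sketch below shows one way such attribute-based selection could work, assuming a substring match. `FileRecord` and `select` are hypothetical stand-ins, not papylio's implementation, and the file names are made up.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical stand-in for a papylio file object; only the two
    # attributes used in the selection cell are modelled here.
    name: str
    relativePath: str


def select(files, value, attribute):
    """Return the files whose `attribute` contains `value` (assumed substring match)."""
    return [f for f in files if value in getattr(f, attribute)]


# Hypothetical file list for illustration
files = [
    FileRecord("Bead slide TIRF 561 001", "Single-molecule data - bead slide"),
    FileRecord("movie 001", "Single-molecule data - green laser"),
    FileRecord("movie 001", "Single-molecule data - red laser before"),
    FileRecord("movie 001", "Single-molecule data - red laser after"),
]

print([f.name for f in select(files, "Bead slide", "name")])
print([f.relativePath for f in select(files, "red laser", "relativePath")])
```

If matching works this way, a shorter value such as `'red laser'` would select both the "before" and "after" groups. That is why the cell above spells out the full folder names.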
[5]:
files_red_laser_before.movie.illumination_arrangement = [1]
files_red_laser_after.movie.illumination_arrangement = [1]
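Each file dataset stores one illumination value per frame (the `illumination (frame)` coordinate in the dataset output further down). One plausible reading of `illumination_arrangement` is a pattern that repeats over the frames. The sketch below makes that assumption, and also assumes that index 0 is the green laser and index 1 the red laser. `illumination_per_frame` is a hypothetical helper and not part of papylio.

```python
import numpy as np


def illumination_per_frame(illumination_arrangement, n_frames):
    """Expand a repeating illumination pattern to one laser index per frame.

    Assumption: the arrangement repeats cyclically over the frames
    (0 = green laser, 1 = red laser). Hypothetical helper, not papylio code.
    """
    # np.resize fills the output with repeated copies of the input pattern
    return np.resize(np.asarray(illumination_arrangement, dtype=int), n_frames)


print(illumination_per_frame([1], 5))     # → [1 1 1 1 1]
print(illumination_per_frame([0, 1], 5))  # → [0 1 0 1 0]
```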
Add sequencing data to file datasets
Here we insert the matched sequences into the dataset of each field of view.
[6]:
files_green_laser.insert_sequencing_data_into_file_dataset(include_raw_sequences=True,
                                                           include_aligned_sequences=True,
                                                           include_sequence_subset=True)
0%| | 0/896 [00:00<?, ?it/s]
Serial processing
100%|████████████████████████████████████████████████████████████████████████████████| 896/896 [02:11<00:00, 6.82it/s]
The keyword arguments determine which sequence variables are added:
include_raw_sequences: if True, include the ‘raw’ unaligned sequences in the file dataset.
include_aligned_sequences: if True, include the aligned sequences in the file dataset.
include_sequence_subset: if True, include the sequence subset in the file dataset.
[7]:
files_green_laser[500].dataset
[7]:
<xarray.Dataset>
Dimensions: (molecule: 1715, channel: 2, dimension: 2,
frame: 400)
Coordinates:
molecule_in_file (molecule) int32 0 1 2 3 4 ... 1711 1712 1713 1714
file (molecule) |S58 b'Single-molecule data - green ...
* channel (channel) int64 0 1
* dimension (dimension) |S1 b'x' b'y'
time (frame) float64 0.0 0.121 0.244 ... 48.75 48.87
* frame (frame) int32 0 1 2 3 4 5 ... 395 396 397 398 399
illumination (frame) int32 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
sequence_in_file (molecule) int32 -1 -1 -1 -1 744 ... -1 -1 -1 -1
Dimensions without coordinates: molecule
Data variables: (12/16)
selected (molecule) bool False False False ... False False
coordinates (molecule, channel, dimension) float64 243.9 .....
intensity (molecule, channel, frame) float64 2.898e+04 .....
intensity_raw (molecule, channel, frame) float64 1.81e+04 ......
FRET (molecule, frame) float64 0.2133 0.1391 ... 0.6694
intensity_red_before (molecule, channel) float64 52.21 ... -318.6
... ...
sequence_quality (molecule) |S147 b' ...
sequence_aligned (molecule) |S147 b'----------------------------...
sequence_quality_aligned (molecule) |S147 b' ...
sequence_subset (molecule) |S8 b'--------' ... b'--------'
sequence_quality_subset (molecule) |S8 b' ' ... b' '
sequence_coordinates (molecule, dimension) int64 0 0 0 0 0 ... 0 0 0 0
The dataset of each movie is expanded with several sequence variables.
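In the output above, most molecules have `sequence_in_file` equal to -1 and a `sequence_subset` of `--------`. This suggests that -1 marks molecules without a matched sequence; that reading is an assumption here. The sketch below uses xarray to keep only the matched molecules. It runs on a small, made-up dataset with the same layout, and `molecules_with_sequence` is a hypothetical helper.

```python
import numpy as np
import xarray as xr

# Made-up miniature file dataset with the same layout as above: one entry per
# molecule, sequence_in_file stored as a coordinate along "molecule".
ds = xr.Dataset(
    {"sequence_subset": ("molecule", np.array([b"--------", b"ACGTACGT", b"--------", b"TTGCAACG"]))},
    coords={"sequence_in_file": ("molecule", np.array([-1, 0, -1, 1], dtype=np.int32))},
)


def molecules_with_sequence(dataset):
    """Keep only the molecules that have a matched sequence.

    Assumption: sequence_in_file == -1 marks molecules without a match.
    """
    matched_indices = np.flatnonzero(dataset["sequence_in_file"].values >= 0)
    # "molecule" has no index coordinate, so select by integer position
    return dataset.isel(molecule=matched_indices)


matched = molecules_with_sequence(ds)
print(matched.sizes["molecule"])  # → 2
print(matched["sequence_subset"].values)
```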
Merge and reorder datasets
Select the files that have a sequencing match:
[8]:
files = files_green_laser[files_green_laser.has_sequencing_match]
To merge all selected files into a single large dataset (commented out in this example):
[9]:
# files.merge_datasets(filepath_out=exp.main_path / 'Analysis' / 'complete_dataset.nc', init_file_index=-1,
#                      with_sequence_only=True)
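A merge like the one above stacks the molecules of all files along the `molecule` dimension. The sketch below shows the idea with `xarray.concat`, using the `with_sequence_only` name from the cell above to drop unmatched molecules first. It is not how `merge_datasets` is implemented. `make_file_dataset`, `merge_file_datasets` and the `FRET_mean` variable are hypothetical, and -1 is assumed to mark molecules without a sequence.

```python
import numpy as np
import xarray as xr


def make_file_dataset(file_name, sequence_in_file):
    # Hypothetical miniature file dataset with one made-up value per molecule
    n = len(sequence_in_file)
    return xr.Dataset(
        {"FRET_mean": ("molecule", np.linspace(0.1, 0.9, n))},
        coords={
            "sequence_in_file": ("molecule", np.asarray(sequence_in_file, dtype=np.int32)),
            "file": ("molecule", np.array([file_name] * n)),
        },
    )


def merge_file_datasets(datasets, with_sequence_only=True):
    """Concatenate per-file datasets along the molecule dimension (sketch)."""
    if with_sequence_only:
        # Assumption: sequence_in_file == -1 marks molecules without a sequence
        datasets = [
            ds.isel(molecule=np.flatnonzero(ds["sequence_in_file"].values >= 0))
            for ds in datasets
        ]
    return xr.concat(datasets, dim="molecule")


a = make_file_dataset("file A", [-1, 3, -1])
b = make_file_dataset("file B", [0, -1])
merged = merge_file_datasets([a, b])
print(merged.sizes["molecule"])  # → 2
```

Keeping `file` as a per-molecule coordinate means each molecule in the merged dataset can still be traced back to its movie. The real file datasets shown above have the same `file (molecule)` coordinate.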
To create a separate dataset for each sequence subset:
[10]:
files.reorder_datasets_using_sequence_subset(folderpath_out=exp.main_path / 'Analysis' / 'Datasets per sequence')
100%|████████████████████████████████████████████████████████████████████████████████████████| 744/744 [00:00<?, ?it/s]
Serial processing
100%|████████████████████████████████████████████████████████████████████████████████| 744/744 [20:52<00:00, 1.68s/it]
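Conceptually, reordering by sequence subset gathers the molecules that share the same `sequence_subset` value into one dataset. The sketch below shows this on a small, made-up dataset that is assumed to already combine all files. `split_by_sequence_subset` is a hypothetical helper, not papylio's implementation. It reuses the assumption that `sequence_in_file == -1` marks unmatched molecules.

```python
import numpy as np
import xarray as xr


def split_by_sequence_subset(dataset):
    """Return one dataset per sequence subset (sketch, not papylio's code)."""
    # Drop molecules without a matched sequence (assumed: sequence_in_file == -1)
    dataset = dataset.isel(molecule=np.flatnonzero(dataset["sequence_in_file"].values >= 0))
    subsets = dataset["sequence_subset"].values
    # np.unique returns the distinct byte strings in sorted order
    return {
        subset.decode(): dataset.isel(molecule=np.flatnonzero(subsets == subset))
        for subset in np.unique(subsets)
    }


# Made-up dataset combining molecules from several files
ds = xr.Dataset(
    {"sequence_subset": ("molecule", np.array([b"ACGTACGT", b"--------", b"TTGCAACG", b"ACGTACGT"]))},
    coords={"sequence_in_file": ("molecule", np.array([0, -1, 1, 2], dtype=np.int32))},
)

per_subset = split_by_sequence_subset(ds)
print({k: v.sizes["molecule"] for k, v in per_subset.items()})
# → {'ACGTACGT': 2, 'TTGCAACG': 1}
```

Each dataset in the resulting dictionary could then be written to its own file with xarray's `Dataset.to_netcdf`, giving one file per sequence subset.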
[11]:
files[395].dataset
[11]:
<xarray.Dataset>
Dimensions: (molecule: 1715, channel: 2, dimension: 2,
frame: 400)
Coordinates:
molecule_in_file (molecule) int32 0 1 2 3 4 ... 1711 1712 1713 1714
file (molecule) |S58 b'Single-molecule data - green ...
* channel (channel) int64 0 1
* dimension (dimension) |S1 b'x' b'y'
time (frame) float64 0.0 0.121 0.244 ... 48.75 48.87
* frame (frame) int32 0 1 2 3 4 5 ... 395 396 397 398 399
illumination (frame) int32 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
sequence_in_file (molecule) int32 -1 -1 -1 -1 744 ... -1 -1 -1 -1
Dimensions without coordinates: molecule
Data variables: (12/16)
selected (molecule) bool False False False ... False False
coordinates (molecule, channel, dimension) float64 243.9 .....
intensity (molecule, channel, frame) float64 2.898e+04 .....
intensity_raw (molecule, channel, frame) float64 1.81e+04 ......
FRET (molecule, frame) float64 0.2133 0.1391 ... 0.6694
intensity_red_before (molecule, channel) float64 52.21 ... -318.6
... ...
sequence_quality (molecule) |S147 b' ...
sequence_aligned (molecule) |S147 b'----------------------------...
sequence_quality_aligned (molecule) |S147 b' ...
sequence_subset (molecule) |S8 b'--------' ... b'--------'
sequence_quality_subset (molecule) |S8 b' ' ... b' '
sequence_coordinates (molecule, dimension) int64 0 0 0 0 0 ... 0 0 0 0