4. Coupling datasets#

[1]:
from pathlib2 import Path
import papylio as pp
import matplotlib.pyplot as plt

%matplotlib inline

Experiment import#

Note that the sequencing data is automatically imported.

[2]:
experiment_path = Path(r'C:\Users\user\Desktop\SPARXS example dataset')
[3]:
exp = pp.Experiment(experiment_path)
Import files: 100%|██████████████████████████████████████████████████████████████| 4190/4190 [00:00<00:00, 6337.52it/s]

File(Single-molecule data - bead slide\Bead slide TIRF 561 001) used as mapping

Initialize experiment:
C:\Users\user\Desktop\SPARXS example dataset

Import sequencing data:
Sequencing data\sequencing_data.nc
[4]:
files_bead_slide = exp.files.select('Bead slide', 'name')
files_green_laser = exp.files.select('Single-molecule data - green laser', 'relativePath')
files_red_laser_before = exp.files.select('Single-molecule data - red laser before', 'relativePath')
files_red_laser_after = exp.files.select('Single-molecule data - red laser after', 'relativePath')
[5]:
files_red_laser_before.movie.illumination_arrangement = [1]
files_red_laser_after.movie.illumination_arrangement = [1]

Add sequencing data to file datasets#

Here we import the matched sequences into the dataset of each field of view.

[6]:
files_green_laser.insert_sequencing_data_into_file_dataset(include_raw_sequences=True, include_aligned_sequences=True,
                                                           include_sequence_subset=True)
  0%|                                                                                          | 0/896 [00:00<?, ?it/s]
Serial processing
100%|████████████████████████████████████████████████████████████████████████████████| 896/896 [02:11<00:00,  6.82it/s]

include_raw_sequences: If True, include the ‘raw’ (unaligned) sequences in the file dataset.

include_aligned_sequences: If True, include the aligned sequences in the file dataset.

include_sequence_subset: If True, include the sequence subset in the file dataset.

[7]:
files_green_laser[500].dataset
[7]:
<xarray.Dataset>
Dimensions:                   (molecule: 1715, channel: 2, dimension: 2,
                               frame: 400)
Coordinates:
    molecule_in_file          (molecule) int32 0 1 2 3 4 ... 1711 1712 1713 1714
    file                      (molecule) |S58 b'Single-molecule data - green ...
  * channel                   (channel) int64 0 1
  * dimension                 (dimension) |S1 b'x' b'y'
    time                      (frame) float64 0.0 0.121 0.244 ... 48.75 48.87
  * frame                     (frame) int32 0 1 2 3 4 5 ... 395 396 397 398 399
    illumination              (frame) int32 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    sequence_in_file          (molecule) int32 -1 -1 -1 -1 744 ... -1 -1 -1 -1
Dimensions without coordinates: molecule
Data variables: (12/16)
    selected                  (molecule) bool False False False ... False False
    coordinates               (molecule, channel, dimension) float64 243.9 .....
    intensity                 (molecule, channel, frame) float64 2.898e+04 .....
    intensity_raw             (molecule, channel, frame) float64 1.81e+04 ......
    FRET                      (molecule, frame) float64 0.2133 0.1391 ... 0.6694
    intensity_red_before      (molecule, channel) float64 52.21 ... -318.6
    ...                        ...
    sequence_quality          (molecule) |S147 b'                            ...
    sequence_aligned          (molecule) |S147 b'----------------------------...
    sequence_quality_aligned  (molecule) |S147 b'                            ...
    sequence_subset           (molecule) |S8 b'--------' ... b'--------'
    sequence_quality_subset   (molecule) |S8 b'        ' ... b'        '
    sequence_coordinates      (molecule, dimension) int64 0 0 0 0 0 ... 0 0 0 0

The dataset of each movie now contains several additional sequence variables, such as sequence_quality, sequence_aligned, sequence_subset and sequence_coordinates.
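To make the per-molecule sequence variables more concrete, here is a minimal sketch on a toy stand-in, using plain Python lists instead of the xarray dataset. It assumes that a value of -1 in sequence_in_file marks a molecule without a sequencing match, which is what the output above suggests but is not confirmed here. The sequences and the helper name `matched_molecules` are hypothetical and not part of papylio.

```python
# Toy stand-in for two per-molecule variables from the dataset printed above.
# Assumption: -1 in sequence_in_file marks a molecule without a sequencing match.
# The sequence values below are made up for illustration.
sequence_in_file = [-1, -1, 744, -1, 12]
sequence_subset = [b'--------', b'--------', b'ACGTACGT', b'--------', b'TTGACCAA']


def matched_molecules(seq_index, subsets):
    """Return (molecule index, decoded subset) for molecules with a match."""
    return [
        (i, subset.decode('ascii'))  # fixed-width byte strings -> str
        for i, (idx, subset) in enumerate(zip(seq_index, subsets))
        if idx != -1
    ]


matched = matched_molecules(sequence_in_file, sequence_subset)
print(matched)
```

On the real dataset the same idea would be expressed as a boolean selection along the molecule dimension rather than a list comprehension.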

Merge and reorder datasets#

Select the files that have a sequencing alignment:

[8]:
files = files_green_laser[files_green_laser.has_sequencing_match]
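The selection above appears to be boolean-mask indexing: one flag per file, and only the files whose flag is True are kept. The sketch below shows the same pattern with the standard library only. The file names and flags are hypothetical and are not taken from the example dataset.

```python
from itertools import compress

# Hypothetical file names and match flags, for illustration only.
file_names = ['movie 0001', 'movie 0002', 'movie 0003']
has_sequencing_match = [True, False, True]

# Keep only the entries whose flag is True, preserving the original order.
files_with_match = list(compress(file_names, has_sequencing_match))
print(files_with_match)
```

Because filtering drops entries, positions shift: the item at index 2 in the original list sits at index 1 in the filtered one. This may be why the same dataset can appear under different indices before and after the selection.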

To create a single large dataset (the call is commented out here):

[9]:
# files.merge_datasets(filepath_out=exp.main_path / 'Analysis' / 'complete_dataset.nc', init_file_index=-1,
#                               with_sequence_only=True)

To create a dataset for each sequence subset:

[10]:
files.reorder_datasets_using_sequence_subset(folderpath_out=exp.main_path / 'Analysis' / 'Datasets per sequence')
100%|████████████████████████████████████████████████████████████████████████████████████████| 744/744 [00:00<?, ?it/s]
Serial processing
100%|████████████████████████████████████████████████████████████████████████████████| 744/744 [20:52<00:00,  1.68s/it]
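Conceptually, reordering by sequence subset means regrouping molecules that are stored per file so that they are stored per sequence instead. The sketch below illustrates that regrouping with plain Python. It is an assumption about what the step does and not papylio's implementation; the records and the helper name `group_by_subset` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (file name, molecule index, sequence subset) records.
molecules = [
    ('file_a', 0, 'ACGTACGT'),
    ('file_a', 1, 'TTGACCAA'),
    ('file_b', 0, 'ACGTACGT'),
    ('file_b', 3, 'TTGACCAA'),
    ('file_b', 5, 'ACGTACGT'),
]


def group_by_subset(records):
    """Collect (file name, molecule index) pairs under their sequence subset."""
    groups = defaultdict(list)
    for file_name, molecule_index, subset in records:
        groups[subset].append((file_name, molecule_index))
    return dict(groups)


per_sequence = group_by_subset(molecules)
for subset, members in sorted(per_sequence.items()):
    print(subset, len(members))
```

Each key of `per_sequence` would then correspond to one output dataset in the 'Datasets per sequence' folder.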
[11]:
files[395].dataset
[11]:
<xarray.Dataset>
Dimensions:                   (molecule: 1715, channel: 2, dimension: 2,
                               frame: 400)
Coordinates:
    molecule_in_file          (molecule) int32 0 1 2 3 4 ... 1711 1712 1713 1714
    file                      (molecule) |S58 b'Single-molecule data - green ...
  * channel                   (channel) int64 0 1
  * dimension                 (dimension) |S1 b'x' b'y'
    time                      (frame) float64 0.0 0.121 0.244 ... 48.75 48.87
  * frame                     (frame) int32 0 1 2 3 4 5 ... 395 396 397 398 399
    illumination              (frame) int32 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    sequence_in_file          (molecule) int32 -1 -1 -1 -1 744 ... -1 -1 -1 -1
Dimensions without coordinates: molecule
Data variables: (12/16)
    selected                  (molecule) bool False False False ... False False
    coordinates               (molecule, channel, dimension) float64 243.9 .....
    intensity                 (molecule, channel, frame) float64 2.898e+04 .....
    intensity_raw             (molecule, channel, frame) float64 1.81e+04 ......
    FRET                      (molecule, frame) float64 0.2133 0.1391 ... 0.6694
    intensity_red_before      (molecule, channel) float64 52.21 ... -318.6
    ...                        ...
    sequence_quality          (molecule) |S147 b'                            ...
    sequence_aligned          (molecule) |S147 b'----------------------------...
    sequence_quality_aligned  (molecule) |S147 b'                            ...
    sequence_subset           (molecule) |S8 b'--------' ... b'--------'
    sequence_quality_subset   (molecule) |S8 b'        ' ... b'        '
    sequence_coordinates      (molecule, dimension) int64 0 0 0 0 0 ... 0 0 0 0