split

Split fragment files by cell type.

Usage and options

scatac_fragment_tools split \
    -f <PATH_TO_SAMPLE_TO_FRAGMENT_DEFINITION> \
    -b <PATH_TO_CELL_TYPE_TO_CELL_BARCODE_DEFINITION> \
    -c <CHROM_SIZES_FILENAME> \
    -o <PATH_TO_OUTPUT_FOLDER>

Required arguments

-f, --sample_fragments

Path to a text file mapping sample names to fragment files.

-b, --cell_type_barcodes

Path to a text file mapping samples to cell types and cell types to cell barcodes.

-c, --chrom

Path to a file with chromosome sizes (for example *.chrom.sizes or *.fa.fai).
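A `.fa.fai` index (as written by `samtools faidx`) stores each sequence name and its length in its first two columns. Those two columns are exactly what a `.chrom.sizes` file contains. The minimal sketch below derives one from the other. It uses a toy two-sequence index, and the filenames `genome.fa.fai` and `genome.chrom.sizes` are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy FASTA index (hypothetical): name, length, offset, line bases, line width.
printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1029\t60\t61\n' > genome.fa.fai

# Keep only the name and length columns to obtain a chrom.sizes file.
cut -f1,2 genome.fa.fai > genome.chrom.sizes

cat genome.chrom.sizes
```

Because `-c` accepts a `.fa.fai` directly, this conversion is only needed when some other step in your workflow expects a `.chrom.sizes` file.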

-o, --output

Path to output folder.

Optional arguments

-t, --temp

Path to a temporary folder. Default: /tmp

-n, --n_cpu

Number of cores to use. Default: 1

-v, --verbose

Whether to print progress. Default: False

--clear_temp

Whether to clear the temporary folder. Default: False

-s, --sep

Column separator used in the input text files. Default: '\t' (tab)

--sample_column

Name of the column that holds the sample name. Default: sample

--fragment_column

Name of the column that holds the path to the fragment file. Default: path_to_fragment_file

--cell_type_column

Name of the column that holds the cell type. Default: cell_type

--cell_barcode_column

Name of the column that holds the cell barcode. Default: cell_barcode
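Before running `split`, it can help to confirm that the separator and column names you plan to pass actually match the header of each input file. The sketch below uses a hypothetical bash helper, `has_columns`, that checks a file's first line for exact column-name matches. It is not part of the tool. The comma-separated file and its column names (`sample_id`, `fragments`) are made up. Such a file would presumably be passed with `-s ','` together with `--sample_column sample_id --fragment_column fragments`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the tool): succeed only if the header line
# of FILE, split on SEP, contains every listed column name exactly.
# Usage: has_columns FILE SEP COLUMN...
has_columns() {
    local file=$1 sep=$2 col
    shift 2
    for col in "$@"; do
        # Scan the first line only, then let END set the exit status.
        if ! awk -F"$sep" -v want="$col" \
            'NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) found = 1; exit }
             END { exit !found }' "$file"; then
            echo "missing column '$col' in $file" >&2
            return 1
        fi
    done
}

# Default layout: tab-separated, default column names.
printf 'sample\tpath_to_fragment_file\nA\ta.fragments.tsv.gz\n' > default_layout.tsv
has_columns default_layout.tsv $'\t' sample path_to_fragment_file && echo "default_layout.tsv: ok"

# Custom layout: comma-separated, with non-default (hypothetical) column names.
printf 'sample_id,fragments\nA,a.fragments.tsv.gz\n' > custom_layout.csv
has_columns custom_layout.csv ',' sample_id fragments && echo "custom_layout.csv: ok"
```

Passing a literal tab (`$'\t'`) rather than the two characters `\t` keeps the awk field separator unambiguous across awk implementations.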

Examples of input files

sample_to_fragment.tsv

sample  path_to_fragment_file
A       a.fragments.tsv.gz
B       b.fragments.tsv.gz
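The sample-to-fragment table is a plain tab-separated file with a header row. The minimal sketch below writes the example above and reports any listed fragment file that does not exist. Relative paths are checked against the current directory. That this matches how the tool resolves relative paths is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the example sample-to-fragment table (tab-separated, with header).
printf 'sample\tpath_to_fragment_file\nA\ta.fragments.tsv.gz\nB\tb.fragments.tsv.gz\n' > sample_to_fragment.tsv

# Skip the header and warn about fragment files that cannot be found.
tail -n +2 sample_to_fragment.tsv | while IFS=$'\t' read -r sample path; do
    [ -e "$path" ] || echo "sample $sample: fragment file not found: $path" >&2
done
```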

cell_type_to_cell_barcode.tsv

sample  cell_type  cell_barcode
A       type_1     TTAGCTTAGGAGAACA-1
A       type_1     ATATTCCTCTTGTACT-1
A       type_2     TGTGACAGTACAACGG-1
A       type_2     CATGCCTTCTCTGACC-1
A       type_2     ATCGAGTAGGTTCGAG-1
A       type_3     CTCTCAGGTCCCTTTG-1
A       type_3     TTCGGTCTCACGTGTA-1
A       type_3     GTGACATCATTGTTCT-1
A       type_4     AAGGAGCCATCGACCG-1
A       type_4     ACCAAACTCTTAAGCG-1
A       type_4     CATTGGATCTCTTCCT-1
A       type_5     AGGCGAAAGGTCTTTG-1
A       type_5     AACGAGGCATCATGTG-1
A       type_5     CTACTTAGTCATGAGG-1
B       type_1     ATTACCTGTGTGCTTA-1
B       type_1     CATAACGTCGGTTGTA-1
B       type_1     ATGTCTTTCGGTCCGA-1
B       type_2     CAATCCCGTAGCGTTT-1
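Each row ties one cell barcode to one sample and one cell type, so standard text tools are enough to sanity-check this table before splitting. The minimal sketch below writes a few rows in this format to a hypothetical `barcodes.tsv`, deliberately including one exact duplicate row. It then lists duplicated rows and counts distinct barcodes per sample and cell type. How the tool itself handles duplicate rows is not stated here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# A few rows in the cell_type_to_cell_barcode.tsv layout (one row repeated on purpose).
printf '%s\n' \
    $'sample\tcell_type\tcell_barcode' \
    $'A\ttype_1\tTTAGCTTAGGAGAACA-1' \
    $'A\ttype_1\tTTAGCTTAGGAGAACA-1' \
    $'A\ttype_1\tATATTCCTCTTGTACT-1' \
    $'A\ttype_2\tTGTGACAGTACAACGG-1' \
    $'B\ttype_1\tATTACCTGTGTGCTTA-1' \
    > barcodes.tsv

# Rows that occur more than once (header excluded).
echo "duplicate rows:"
tail -n +2 barcodes.tsv | sort | uniq -d

# Distinct barcodes per (sample, cell type) pair.
echo "barcodes per sample and cell type:"
tail -n +2 barcodes.tsv | sort -u | cut -f1,2 | sort | uniq -c
```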