Command line tools

_template_script

Template script.

usage: _template_script [-h] [-i input]

Named Arguments

-i Input.

add_errors

Add a specified number of errors to random sites for each input sequence.

usage: add_errors [-h] [-n nr_errors] [-t error_type]
                  [input_fasta] [output_fasta]

Positional Arguments

input_fasta

Input fasta (default: stdin).

Default: stdin

output_fasta

Output fasta (default: stdout).

Default: stdout

Named Arguments

-n

Number of errors to introduce (0).

Default: 0

-t

Possible choices: substitution, insertion, deletion

Error type: substitution, insertion or deletion.

Default: “substitution”
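
The behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name, the ACGT alphabet and the choice to sample sites with replacement (so two substitutions may hit the same site) are all assumptions.

```python
import random


def add_errors(seq, nr_errors, error_type="substitution", rng=None):
    """Introduce nr_errors errors of a single type at random sites of seq."""
    rng = rng or random.Random()
    bases = "ACGT"  # assumed alphabet
    seq = list(seq)
    for _ in range(nr_errors):
        if error_type == "insertion":
            # An insertion may land before any base or after the last one.
            pos = rng.randrange(len(seq) + 1)
            seq.insert(pos, rng.choice(bases))
            continue
        if not seq:
            break  # nothing left to substitute or delete
        pos = rng.randrange(len(seq))
        if error_type == "substitution":
            # Always pick a base that differs from the current one.
            seq[pos] = rng.choice([b for b in bases if b != seq[pos]])
        elif error_type == "deletion":
            del seq[pos]
        else:
            raise ValueError("unknown error type: %s" % error_type)
    return "".join(seq)
```

Passing a seeded random.Random instance as rng makes the result reproducible.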

annotate_length

Add sequence length to sequence record descriptions.

usage: annotate_length [-h] [-i in_format] [-o out_format]
                       [input_fastx] [output_fastx]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_fastx

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-o

Output format (fastq).

Default: “fastq”

bam_accuracy

Produce accuracy statistics of the input BAM file. Calculates global accuracy and identity and various per-read statistics.
The input BAM file must be sorted by coordinates and indexed.

usage: bam_accuracy [-h] [-c region] [-g global_tsv] [-l read_tsv]
                    [-t bam_tag] [-q aqual] [-e] [-r report_pdf]
                    [-p results_pickle] [-Q]
                    bam

Positional Arguments

bam Input BAM file.

Named Arguments

-c BAM region (None).
-g Tab separated file to save global statistics (None).
-l Tab separated file to save per-read statistics (None).
-t Dataset tag (BAM basename).
-q

Minimum alignment quality (0).

Default: 0

-e

Include hard and soft clips in alignment length when calculating accuracy (False).

Default: False

-r

Report PDF (bam_accuracy.pdf).

Default: “bam_accuracy.pdf”

-p Save pickled results in this file (None).
-Q

Be quiet and do not print progress bar (False).

Default: False

bam_alignment_length

Produce a tab separated file of alignment lengths and other information.
Rows are sorted by number of aligned reference bases unless the -x option is specified.

usage: bam_alignment_length [-h] [-t tsv_file] [-q aqual] [-x] [-Q] bam

Positional Arguments

bam Input BAM file.

Named Arguments

-t

Tab separated file to save alignment lengths (bam_alignment_length.tsv).

Default: “bam_alignment_length.tsv”

-q

Minimum alignment quality (0).

Default: 0

-x

Sort by number of read bases instead of number of aligned reference bases.

Default: False

-Q

Be quiet and do not print progress bar (False).

Default: False

bam_alignment_qc

Produce alignment based QC plots of the input BAM file.

The input BAM file must be sorted by coordinates and indexed.

It produces the following global plots:
  • Read statistics: number of mapped, unmapped and low mapping quality reads.
  • Distribution of mean quality values in the mapped and unmapped fractions.
  • Distribution of read lengths in the unmapped fraction.
  • Distribution of read lengths in the mapped fraction.
  • Distribution of read lengths in the fraction mapped with quality less than -q.
  • Distribution of alignment lengths.
  • Distribution of mapping qualities.
  • Plot of alignment lengths vs. mean base qualities.
  • Basewise statistics: total alignment length, number of insertions, deletions, matches and mismatches.
  • Precision statistics: accuracy and identity.
  • Frequency of errors in the context specified by the left and right context sizes (-n). Definition of context: for substitutions the event happens at the “central base”; for indels the event is located between the central base and the base before it. The columns of the heatmap are normalised to sum to one and then the diagonal elements are set to zero.
  • Distribution of deletion lengths.
  • Distribution of insertion lengths.
  • Base composition of insertions.
The following plots are produced for every reference unless disabled via -x:
  • Distribution of quality values across the reference as a heatmap.
  • Mean quality values across the reference.
  • Base coverage across the reference.

The tool saves the gathered statistics in a pickle file, which can be fed to bam_multi_qc.py to compare different samples.

usage: bam_alignment_qc [-h] -f reference [-c region] [-n context_sizes] [-x]
                        [-t bam_tag] [-q aqual] [-i qual_ints] [-r report_pdf]
                        [-p results_pickle] [-Q]
                        bam

Positional Arguments

bam Input BAM file.

Named Arguments

-f Reference fasta.
-c BAM region (None).
-n

Left and right context sizes (1,1).

Default: “1,1”

-x

Do not plot per-reference information.

Default: False

-t Dataset tag (BAM basename).
-q

Minimum alignment quality (0).

Default: 0

-i

Number of quality intervals (6).

Default: 6

-r

Report PDF (bam_alignment_qc.pdf).

Default: “bam_alignment_qc.pdf”

-p

Save pickled results in this file (bam_alignment_qc.pk).

Default: “bam_alignment_qc.pk”

-Q

Be quiet and do not show progress bars.

Default: False
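
The normalisation of the error context heatmap described above (columns scaled to sum to one, then the diagonal zeroed) can be sketched in plain Python. The function name and the nested-list representation of a square count matrix are assumptions made for illustration only.

```python
def normalise_context_matrix(counts):
    """Scale each column of a square matrix to sum to one, then zero the diagonal."""
    n = len(counts)
    col_sums = [sum(counts[r][c] for r in range(n)) for c in range(n)]
    norm = [
        [counts[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]
    # The diagonal is removed only after normalisation, so the remaining
    # entries of a column no longer sum to one.
    for i in range(n):
        norm[i][i] = 0.0
    return norm
```

Because the diagonal is zeroed after the scaling step, each column then sums to the fraction of off-diagonal events for that context.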

bam_alignments_compare

Compare alignments stored in two BAM files.
The two BAM files must have the same set of reads in the same order (name sorted).

usage: bam_alignments_compare [-h] [-w coarse_tolerance] [-g] [-r report_pdf]
                              [-p results_pickle] [-t tsv_file] [-f format]
                              [-Q]
                              bam_one bam_two

Positional Arguments

bam_one First input BAM file.
bam_two Second input BAM file.

Named Arguments

-w

Tolerance when performing coarse comparison of alignments (50).

Default: 50

-g

Do strict comparison of alignment flags.

Default: False

-r

Report PDF (bam_alignments_compare.pdf).

Default: “bam_alignments_compare.pdf”

-p

Save pickled results in this file (bam_alignments_compare.pk).

Default: “bam_alignments_compare.pk”

-t Save results in tsv format in this file (None).
-f

Input format (BAM).

Default: “BAM”

-Q

Be quiet and do not print progress bar (False).

Default: False

bam_count_reads

Count reads mapping to each reference in a BAM file.

usage: bam_count_reads [-h] [-a min_aqual] [-f in_format] [-z ref_fasta]
                       [-k words] [-g] [-p results_pickle] [-t tsv_file] [-Q]
                       [-R] [-F yield_freq]
                       [bam]

Positional Arguments

bam

Input file (default: stdin).

Default: stdin

Named Arguments

-a

Minimum mapping quality (0).

Default: 0

-f

Input format (BAM).

Default: “BAM”

-z Reference fasta. GC content and length columns are added if present (None).
-k Include word frequencies of specified length in output (None).
-g

Include mean GC content of reads mapped to each reference (False).

Default: False

-p Save pickled results in this file (None).
-t

Save results in tsv format in this file (bam_count_reads.tsv).

Default: “bam_count_reads.tsv”

-Q

Be quiet and do not print progress bar (False).

Default: False

-R

Count reads from SAM stream in stdin. Only read count fields are written. Header required! (False).

Default: False

-F

Yield counts after every -Fth mapped record when doing online counting (100).

Default: 100

bam_cov

Produce reference coverage table.

usage: bam_cov [-h] -f reference [-c region] [-t tsv] [-q aqual] [-Q] bam

Positional Arguments

bam Input BAM file.

Named Arguments

-f Reference fasta.
-c BAM region (None).
-t

Output TSV (bam_cov.tsv).

Default: “bam_cov.tsv”

-q

Minimum alignment quality (0).

Default: 0

-Q

Be quiet and do not show progress bars.

Default: False

bam_fill_unaligned

Generate SAM records for the reads present in the input fastq but missing from
the input SAM/BAM.

usage: bam_fill_unaligned [-h] [-f format] -q fastq input_file output_file

Positional Arguments

input_file Input file.
output_file Output SAM file.

Named Arguments

-f

Input/output format (SAM).

Default: “SAM”

-q Input fastq.

bam_frag_coverage

Produce aggregated and individual plots of fragment coverage.

usage: bam_frag_coverage [-h] -f reference [-c region] [-i intervals]
                         [-b bins] [-x] [-o] [-t bam_tag] [-q aqual]
                         [-l cov80_tsv] [-g glob_cov80_tsv] [-r report_pdf]
                         [-p results_pickle] [-Q]
                         bam

Positional Arguments

bam Input BAM file.

Named Arguments

-f Reference fasta.
-c BAM region (None).
-i

Length intervals ().

Default: “”

-b Number of bins (None = auto).
-x

Plot per-reference information.

Default: False

-o

Do not take log of coverage.

Default: False

-t Dataset tag (BAM basename).
-q

Minimum alignment quality (0).

Default: 0

-l Tab separated file with per-chromosome cov80 scores (None). Requires the -x option to be specified.
-g Tab separated file with global cov80 score (None).
-r

Report PDF (bam_frag_coverage.pdf).

Default: “bam_frag_coverage.pdf”

-p Save pickled results in this file (None).
-Q

Be quiet and do not show progress bars.

Default: False

bam_gc_vs_qual

Produce a plot of GC content of aligned read and reference portion versus their mean quality values.

usage: bam_gc_vs_qual [-h] -f reference [-q aqual] [-r report_pdf] [-t tsv]
                      [-Q]
                      bam

Positional Arguments

bam Input BAM file.

Named Arguments

-f Reference fasta.
-q

Minimum alignment quality (0).

Default: 0

-r

Report PDF (bam_gc_vs_qual.pdf).

Default: “bam_gc_vs_qual.pdf”

-t

Tab separated file to save results (bam_gc_vs_qual.tsv).

Default: “bam_gc_vs_qual.tsv”

-Q

Be quiet and do not show progress bars.

Default: False

bam_multi_qc

Compare alignment QC statistics of multiple samples.

It takes a list of pickle files produced by bam_alignment_qc.py and produces plots comparing the following properties of the input samples:

  • Number of mapped reads.
  • Number of unmapped reads.
  • Distribution of mean quality values in the unaligned fraction.
  • Distribution of mean quality values in the aligned fraction.
  • Distribution of read lengths in the unaligned fraction.
  • Distribution of read lengths in the aligned fraction.
  • Distribution of alignment lengths.
  • Distribution of mapping qualities.
  • Alignment accuracy.
  • Alignment identity.
  • Distribution of deletion lengths.
  • Distribution of insertion lengths.
Per reference plots (can be disabled by -x):
  • Relative coverage across reference.
  • Mean qualities per position.

usage: bam_multi_qc [-h] [-r report_pdf] [-x]
                    [input_pickles [input_pickles ...]]

Positional Arguments

input_pickles Input pickles.

Named Arguments

-r

Report PDF (bam_multi_qc.pdf).

Default: “bam_multi_qc.pdf”

-x

Do not plot reference statistics.

Default: False

bam_ref_base_coverage

Calculate percent covered reference lengths.

usage: bam_ref_base_coverage [-h] -f reference [-c region] [-t tsv]
                             [-m min_cov] [-Q]
                             bam

Positional Arguments

bam Input BAM file.

Named Arguments

-f Reference fasta.
-c BAM region (None).
-t

Output tab separated file (bam_ref_base_coverage.tsv).

Default: “bam_ref_base_coverage.tsv”

-m

Minimum base coverage for a position to be counted (1).

Default: 1

-Q

Be quiet and do not show progress bars.

Default: False
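
As an illustration of the statistic, the sketch below computes the percentage of reference positions whose depth reaches the -m threshold. It assumes the per-position depths are already available as a list; the function name is hypothetical and the tool itself may compute this differently.

```python
def percent_covered(depths, min_cov=1):
    """Return the percentage of positions with depth of at least min_cov."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_cov)
    return 100.0 * covered / len(depths)
```

For example, a reference with depths [0, 1, 2, 5] is 75% covered at the default threshold and 50% covered with a threshold of 2.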

bam_ref_tab

Produce a tab separated file with read identifiers and the corresponding references, sorted by reference.

usage: bam_ref_tab [-h] [-t read_tsv] [-Q] [-s] bam

Positional Arguments

bam Input BAM file.

Named Arguments

-t

Tab separated file to save reference table.

Default: “bam_ref_tab.tsv”

-Q

Be quiet and do not print progress bar (False).

Default: False

-s

Save read strand in output (False).

Default: False

bam_score_filter

Filter SAM/BAM records by score or other criteria.
WARNING: the input records must be sorted by name or the filtering will not work as expected.

usage: bam_score_filter [-h] [-f format] [-s strategy] [-q query_cover]
                        input_file output_file

Positional Arguments

input_file Input file.
output_file Output SAM file.

Named Arguments

-f

Input/output format (SAM).

Default: “SAM”

-s

Possible choices: top_per_query, query_coverage, ref_coverage

Filtering strategy: top_per_query, query_coverage, ref_coverage (top_per_query).

Default: “top_per_query”

-q

Minimum query coverage fraction (0.8).

Default: 0.8
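
The warning about name sorting can be illustrated with a sketch of a top_per_query style filter. Records are modelled here as hypothetical (query_name, score) tuples; how the tool actually scores records is not stated in this document. Grouping only looks at consecutive records, which is why unsorted input gives unexpected results.

```python
from itertools import groupby
from operator import itemgetter


def top_per_query(records):
    """Yield the best scoring record of each consecutive run of a query name."""
    for _, group in groupby(records, key=itemgetter(0)):
        yield max(group, key=itemgetter(1))


name_sorted = [("r1", 10), ("r1", 30), ("r2", 5)]
not_sorted = [("r1", 10), ("r2", 5), ("r1", 30)]

kept_sorted = list(top_per_query(name_sorted))
kept_unsorted = list(top_per_query(not_sorted))
```

With name sorted input one record is kept per query; with the unsorted input the query r1 is split into two runs and both of its records survive.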

bam_soft_clips_tab

Produce a tab separated file with read identifiers and number of soft clipped bases at each end (relative to the original sequence in the fastq).

usage: bam_soft_clips_tab [-h] [-t tsv] [-Q] bam

Positional Arguments

bam Input BAM file.

Named Arguments

-t

Output tab separated file.

Default: “bam_soft_clips_tab.tsv”

-Q

Be quiet and do not print progress bar (False).

Default: False

bias_explorer

Simple tool for exploring biases in transcript counts. Takes as input count files generated by bam_count_reads.py (with the -z flag) and performs linear regression of log counts against transcript length and GC content.

usage: bias_explorer [-h] [-r report_pdf] [-x] count_file

Positional Arguments

count_file Input counts file with length and GC content features.

Named Arguments

-r

Report PDF (bias_explorer.pdf).

Default: “bias_explorer.pdf”

-x

Exclude transcripts with zero counts.

Default: False

calculate_coverage

Calculate total number of bases and genome coverage if genome size is given.

usage: calculate_coverage [-h] [-f format] [-s genome_size]
                          [-p results_pickle]
                          [input_fastx]

Positional Arguments

input_fastx

Input (default: stdin).

Default: stdin

Named Arguments

-f

Input format (fastq).

Default: “fastq”

-s Genome size (None).
-p Save pickled results in this file.
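
The calculation amounts to summing sequence lengths and, when a genome size is given, dividing by it. A minimal sketch, with a hypothetical function name and sequences passed as plain strings:

```python
def calculate_coverage(sequences, genome_size=None):
    """Return (total_bases, coverage); coverage is None without a genome size."""
    total_bases = sum(len(seq) for seq in sequences)
    coverage = total_bases / genome_size if genome_size else None
    return total_bases, coverage
```

Ten bases sequenced from a genome of size 5 therefore correspond to a coverage of 2.0.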

compare_genomes_dnadiff

Compare a set of reference sequences (genome) to another set (target assembly) using mummer’s dnadiff.
It prints the alignment results to stdout. All parsed results can be saved in a pickle file.

usage: compare_genomes_dnadiff [-h] [-p results_pickle] [-r raw_file]
                               [-d work_dir] [-k] [-v]
                               reference_fasta target_fasta

Positional Arguments

reference_fasta Reference fasta.
target_fasta Target fasta.

Named Arguments

-p Save pickled results in this file (None).
-r Save dnadiff report in this file (None).
-d Use this working directory instead of a temporary directory (None).
-k

Keep dnadiff result files (False).

Default: False

-v

Print out dnadiff output (False).

Default: False

compare_genomes_lastal

Compare a set of reference sequences (genome) to another set (target assembly) using lastal alignment.

Accuracy is the total number of matched bases divided by total alignment length. Coverage is total reference covered by alignment divided by total length of reference.

Caveats:
  • The lastal alignments are filtered by default (use -f to disable) so only the best scoring alignment is kept per query. Hence some shorter valid alignments might be discarded, causing an underestimation of coverage.
  • The estimated accuracy is dependent on the scoring of gaps and mismatches. By default the gap open and gap extend penalties are set to be equal.

usage: compare_genomes_lastal [-h] [-p results_pickle] [-l lastal_args]
                              [-t details_tsv] [-f] [-r report_pdf]
                              reference_fasta target_fasta

Positional Arguments

reference_fasta Reference fasta.
target_fasta Target fasta.

Named Arguments

-p Save pickled results in this file (None).
-l

Parameters passed to lastal in the <arg>:value,… format (a:1,b:1).

Default: “a:1,b:1”

-t Save details of lastal alignment in this tab-separated file (None).
-f

Do not filter for best alignment per query.

Default: False

-r Report with alignment details plot (None).
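
The definitions of accuracy and coverage given in the description can be sketched as below. The tuple layout, the function name, the restriction to a single reference sequence and the merging of overlapping reference intervals are assumptions for illustration; the tool's own bookkeeping may differ.

```python
def accuracy_and_coverage(alignments, ref_length):
    """Compute accuracy and coverage from alignments on one reference.

    Each alignment is (ref_start, ref_end, matches, aln_length), with
    ref_end exclusive.
    """
    total_matches = sum(a[2] for a in alignments)
    total_aln_len = sum(a[3] for a in alignments)
    accuracy = total_matches / total_aln_len if total_aln_len else 0.0

    # Count each reference position once, even if alignments overlap.
    covered = 0
    last_end = 0
    for start, end in sorted((a[0], a[1]) for a in alignments):
        start = max(start, last_end)
        if end > start:
            covered += end - start
            last_end = end
    return accuracy, covered / ref_length
```

Two alignments spanning reference positions 0-100 and 50-150 of a 200 base reference give a coverage of 0.75, since the overlapping 50 bases are counted once.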

convert_alphabet

Convert between DNA and RNA alphabets.

usage: convert_alphabet [-h] [-i in_format] [-o out_format] [-D] [-R]
                        [input_fastx] [output_fastx]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_fastx

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-o

Output format (fastq).

Default: “fastq”

-D

RNA->DNA alphabet conversion.

Default: False

-R

DNA->RNA alphabet conversion.

Default: False
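
The two conversions amount to swapping T and U. A minimal sketch, assuming case is preserved and all other characters are left untouched (the function name and the to_rna flag are illustrative):

```python
DNA_TO_RNA = str.maketrans("Tt", "Uu")
RNA_TO_DNA = str.maketrans("Uu", "Tt")


def convert_alphabet(seq, to_rna):
    """Convert DNA to RNA when to_rna is true, otherwise RNA to DNA."""
    return seq.translate(DNA_TO_RNA if to_rna else RNA_TO_DNA)
```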

correlate_counts

Correlate counts produced by multiple runs of bam_count_reads.py.

usage: correlate_counts [-h] [-r report_pdf] [-c corr_type] [-L] [-o]
                        [input_counts [input_counts ...]]

Positional Arguments

input_counts Input counts as tab separated files.

Named Arguments

-r

Report PDF (correlate_counts.pdf).

Default: “correlate_counts.pdf”

-c

Correlation statistic - spearman or pearson (spearman).

Default: “spearman”

-L

Log transform data.

Default: False

-o

Omit lower diagonal.

Default: False

fasta_to_mock_fastq

Convert fasta file to fastq with mock qualities.

usage: fasta_to_mock_fastq [-h] [-q mock_quals] [input_fasta] [output_fastq]

Positional Arguments

input_fasta

Input fasta (default: stdin).

Default: stdin

output_fastq

Output fastq (default: stdout).

Default: stdout

Named Arguments

-q

Mock quality value (40).

Default: 40
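
A sketch of the conversion for a single record is shown below. It assumes the Phred+33 (Sanger) quality encoding, which this document does not state explicitly; the function name is hypothetical.

```python
def mock_fastq_record(name, seq, mock_qual=40):
    """Format one fastq record in which every base has the same quality."""
    # Phred+33 encoding (assumed): quality 40 is written as "I".
    qual = chr(mock_qual + 33) * len(seq)
    return "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```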

fastq_qual_tab

Generate a table of read names and mean quality values.

usage: fastq_qual_tab [-h] [-t tsv] [input_fastq]

Positional Arguments

input_fastq

Input fastq (default: stdin).

Default: stdin

Named Arguments

-t

Output tab separated file.

Default: “fastq_qual_tab.tsv”

fastq_time_slice

Filter a fastq file by starting time.

usage: fastq_time_slice [-h] -t time_tsv [-s start_perc] [-e end_perc]
                        [input_fastq] [output_fastq]

Positional Arguments

input_fastq

Input fastq (default: stdin).

Default: stdin

output_fastq

Output fastq (default: stdout).

Default: stdout

Named Arguments

-t Tab separated file produced by fastq_time_tab.py.
-s

Start of slice as percent of total time.

Default: 0.0

-e

End of slice as percent of total time.

Default: 100.0

fastq_time_tab

Produce a tab separated file with read start times, read and channel numbers sorted by start time.

usage: fastq_time_tab [-h] [-t read_tsv] fastq

Positional Arguments

fastq Input fastq file.

Named Arguments

-t

Tab separated file to save read time table.

Default: “fastq_time_tab.tsv”

fastx_ends_tab

Generate a tab separated file with the first and last -n bases of the sequences.

usage: fastx_ends_tab [-h] [-i in_format] [-n nr_bases]
                      [input_fastx] [output_tsv]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_tsv

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-n

Number of bases to report from each end of the sequences (100).

Default: 100

fastx_grep

Filter sequence files by read name.

usage: fastx_grep [-h] [-i in_format] [-o out_format] [-n read_names]
                  [input_fastx] [output_fastx]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_fastx

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-o

Output format (fastq).

Default: “fastq”

-n

Comma separated list of read names to select.

Default: “”

fastx_length_tab

Generate a tab separated file with the sequence lengths in the input file.

usage: fastx_length_tab [-h] [-i in_format] [input_fastx] [output_tsv]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_tsv

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fasta).

Default: “fasta”

length_normalise_counts

Calculate RPKM values from raw counts and a transcriptome reference.

usage: length_normalise_counts [-h] -f in_trs input_counts output_count

Positional Arguments

input_counts Input count file.
output_count Output RPKM file.

Named Arguments

-f Input transcriptome.
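
For reference, RPKM (reads per kilobase of transcript per million mapped reads) is conventionally computed as count * 1e9 / (transcript_length * total_counts). The sketch below applies that standard formula to dictionaries keyed by transcript name; the function name and data layout are assumptions, and the tool may differ in details such as how the total is defined.

```python
def rpkm(counts, lengths):
    """Return RPKM values for raw counts given transcript lengths in bases."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {
        name: count * 1e9 / (lengths[name] * total)
        for name, count in counts.items()
    }
```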

merge_tsvs

Merge tab separated files on a given field using pandas.

usage: merge_tsvs [-h] [-j join] [-f field] [-o out_tsv] [-z]
                  [input_tsvs [input_tsvs ...]]

Positional Arguments

input_tsvs Input tab separated files.

Named Arguments

-j

Join type (outer).

Default: “outer”

-f

Join on this field (Read).

Default: “Read”

-o

Output tsv (merge_tsvs.tsv).

Default: “merge_tsvs.tsv”

-z

Fill NA values with zero.

Default: False

multi_length_hist

Plot histograms of length distributions from multiple sequence files.

usage: multi_length_hist [-h] [-r report_pdf] [-f in_format] [-b nr_bins]
                         [-l min_len] [-u max_len] [-L]
                         [input_counts [input_counts ...]]

Positional Arguments

input_counts Input sequence files.

Named Arguments

-r

Report PDF.

Default: “multi_length_hist.pdf”

-f

Input format (fastq).

Default: “fastq”

-b

Number of bins (50).

Default: 50

-l Minimum read length (None).
-u Maximum read length (None).
-L

Log transform lengths.

Default: False

pickle_cat

Pretty print the contents of a pickle file.

usage: pickle_cat [-h] pickle_file

Positional Arguments

pickle_file Input pickle file.

plot_counts_correlation

Scatter plot of two sets of counts.

usage: plot_counts_correlation [-h] [-r report_pdf] [-T tags] [-t merged_data]
                               [-o Correlation_tsv]
                               counts_one counts_two

Positional Arguments

counts_one Input tab separated file.
counts_two Input tab separated file.

Named Arguments

-r

Report PDF.

Default: “plot_counts_correlation.pdf”

-T Data tags: tag1,tag2.
-t Merged data TSV.
-o Correlation TSV.

plot_gffcmp_stats

Plot a gffcompare stats file.

usage: plot_gffcmp_stats [-h] [-r report_pdf] [-p pickle_out] input_txt

Positional Arguments

input_txt Input gffcompare stats file.

Named Arguments

-r

Report PDF (plot_gffcmp_stats.pdf).

Default: “plot_gffcmp_stats.pdf”

-p

Output pickle file.

Default: “plot_gffcmp_stats.pk”

plot_qualities

Plot the mean quality values across non-overlapping windows in the input sequences.

usage: plot_qualities [-h] [-w win_size] [-r report_pdf] [input_fastx]

Positional Arguments

input_fastx

Input (default: stdin).

Default: stdin

Named Arguments

-w

Window size (50).

Default: 50

-r

Report PDF (plot_qualities.pdf).

Default: “plot_qualities.pdf”
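
The windowing described above can be sketched for a single read as follows. Keeping a shorter final window is an assumption, as the document does not say how a trailing partial window is handled; the function name is hypothetical.

```python
def window_means(quals, win_size=50):
    """Mean quality in consecutive non-overlapping windows of win_size bases."""
    means = []
    for start in range(0, len(quals), win_size):
        window = quals[start:start + win_size]
        means.append(sum(window) / len(window))
    return means
```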

plot_sequence_properties

Plot histograms of lengths and quality values.

usage: plot_sequence_properties [-h] [-f format] [-b bins] [-r report_pdf]
                                [-j]
                                [input_fastx]

Positional Arguments

input_fastx

Input (default: stdin).

Default: stdin

Named Arguments

-f

Input format (fastq).

Default: “fastq”

-b

Number of bins on histograms (50).

Default: 50

-r

Report PDF (plot_sequence_properties.pdf).

Default: “plot_sequence_properties.pdf”

-j

Produce joint plot of lengths and mean quality values (False).

Default: False

reads_across_time

Plot read and alignment properties across time.

usage: reads_across_time [-h] -i time_tab -a aln_tab [-w res_freq]
                         [-r report_pdf] [-t out_tsv]

Named Arguments

-i Tab separated file generated by fastq_time_tab.py
-a Tab separated file generated by bam_alignment_length.py
-w

Resampling frequency in minutes.

Default: 5

-r

Report PDF (reads_across_time.pdf).

Default: “reads_across_time.pdf”

-t

Output tsv (reads_across_time.tsv).

Default: “reads_across_time.tsv”

reads_stats

No documentation available.

reverse_fastq

Reverse (but not complement!) sequences and qualities in fastq file.

usage: reverse_fastq [-h] [input_fastq] [output_fastq]

Positional Arguments

input_fastq

Input fastq (default: stdin).

Default: stdin

output_fastq

Output fastq (default: stdout).

Default: stdout
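
To make the distinction from reverse complementing explicit: the bases keep their identity and only their order changes, and the quality string is reversed in step so each quality stays attached to its base. A minimal sketch with a hypothetical function name:

```python
def reverse_record(seq, qual):
    """Reverse a sequence and its quality string without complementing."""
    return seq[::-1], qual[::-1]
```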

sequence_filter

Filter sequences by length and mean quality value.

usage: sequence_filter [-h] [-i in_format] [-o out_format] [-q min_qual]
                       [-l min_length] [-c] [-u max_length]
                       [input_fastx] [output_fastx]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_fastx

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-o

Output format (fastq).

Default: “fastq”

-q

Minimum mean quality value (0.0).

Default: 0.0

-l

Minimum length (0).

Default: 0

-c

Reverse complement sequences.

Default: False

-u Maximum length (None).

sequence_subtract

Filter out sequences present in the first file from the second file.

usage: sequence_subtract [-h] [-i in_format] [-o out_format]
                         [input_fastx_bait] [input_fastx_target]
                         [output_fastx]

Positional Arguments

input_fastx_bait
 

First input file (default: stdin).

Default: stdin

input_fastx_target
 

Second input file.

Default: stdin

output_fastx

Output file (default: stdout).

Default: stdout

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-o

Output format (fastq).

Default: “fastq”

simulate_errors

Simulate sequencing errors for each input sequence.

usage: simulate_errors [-h] [-e error_rate] [-w error_weights]
                       [-z random_seed]
                       [input_fasta] [output_fasta]

Positional Arguments

input_fasta

Input fasta (default: stdin).

Default: stdin

output_fasta

Output fasta (default: stdout).

Default: stdout

Named Arguments

-e

Total rate of substitutions, insertions and deletions (0.1).

Default: 0.1

-w

Relative frequency of substitutions,insertions,deletions (1,1,4).

Default: “1,1,4”

-z Random seed (None).
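
One plausible reading of -e and -w is that the total error rate is divided among the three error types in proportion to the weights. The sketch below implements that reading only; it is an assumption about how the two options combine, and the function name is hypothetical.

```python
def error_probabilities(error_rate=0.1, error_weights="1,1,4"):
    """Split a total error rate among error types according to the weights."""
    weights = [float(w) for w in error_weights.split(",")]
    total = sum(weights)
    names = ("substitution", "insertion", "deletion")
    return {name: error_rate * w / total for name, w in zip(names, weights)}
```

Under this reading the defaults make deletions four times as frequent as substitutions or insertions, while the three rates still add up to 0.1.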

simulate_genome

Simulate genome sequence with the specified number of chromosomes,
length distribution (truncated gamma) and base composition.

usage: simulate_genome [-h] [-n nr_chrom] [-m mean_length] [-a gamma_shape]
                       [-l low_trunc] [-u high_trunc] [-b base_freqs]
                       [-z random_seed]
                       [output_fasta]

Positional Arguments

output_fasta

Output fasta (default: stdout).

Default: stdout

Named Arguments

-n

Number of chromosomes (23).

Default: 23

-m

Mean length of chromosomes (5000000).

Default: 5000000

-a

Gamma shape parameter (1).

Default: 1.0

-l Lower truncation point (None).
-u Upper truncation point (None).
-b

Relative base frequencies in A,C,G,T order (1,1,1,1) or “random”.

Default: “1,1,1,1”

-z Random seed (None).
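
Sampling chromosome lengths from a truncated gamma distribution can be sketched with rejection sampling using the standard library. The scale is derived from the mean and shape (the mean of a gamma distribution is shape * scale). The function name and the rejection approach are assumptions; note that truncation shifts the realised mean away from the requested one.

```python
import random


def sample_length(mean_length, gamma_shape=1.0, low=None, high=None, rng=None):
    """Draw one integer length from a gamma distribution truncated to [low, high]."""
    rng = rng or random.Random()
    scale = mean_length / gamma_shape
    while True:
        length = int(rng.gammavariate(gamma_shape, scale))
        # Reject draws outside the truncation points and try again.
        if (low is None or length >= low) and (high is None or length <= high):
            return length
```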

simulate_sequences

Simulate sequences of fixed length and specified base composition.

usage: simulate_sequences [-h] [-n nr_seq] [-m length] [-b base_freqs]
                          [-z random_seed]
                          [output_fasta]

Positional Arguments

output_fasta

Output fasta (default: stdout).

Default: stdout

Named Arguments

-n

Number of sequences (1).

Default: 1

-m

Length of simulated sequences (3000).

Default: 3000

-b

Relative base frequencies in A,C,G,T order (1,1,1,1).

Default: “1,1,1,1”

-z Random seed (None).

simulate_sequencing_simple

Sample fragments from the input genome and simulate sequencing errors.

Read lengths are drawn from the specified truncated gamma distribution. Chromosomes are sampled randomly for each read.

The format of the read names is the following: r<unique_id>_<chromosome>_<frag_start>_<frag_end>_<strand>/q<realised_quality>/s<realised_substitutions>/d<realised_deletions>/i<realised_insertions>

usage: simulate_sequencing_simple [-h] [-n nr_reads] [-m mean_length]
                                  [-a gamma_shape] [-l low_trunc]
                                  [-u high_trunc] [-e error_rate]
                                  [-w error_weights] [-b strand_bias]
                                  [-q mock_quality] [-s true_sam] [-Q]
                                  [-z random_seed]
                                  [input_fasta] [output_fastq]

Positional Arguments

input_fasta

Input genome in fasta format (default: stdin).

Default: stdin

output_fastq

Output fastq (default: stdout).

Default: stdout

Named Arguments

-n

Number of simulated reads (1).

Default: 1

-m

Mean read length (5000).

Default: 5000

-a

Read length distribution: gamma shape parameter (1).

Default: 1.0

-l

Read length distribution: lower truncation point (100).

Default: 100

-u Read length distribution: upper truncation point (None).
-e

Total rate of substitutions, insertions and deletions (0.1).

Default: 0.1

-w

Relative frequency of substitutions,insertions,deletions (1,1,4).

Default: “1,1,4”

-b

Strand bias: the ratio of forward and reverse reads (0.5).

Default: 0.5

-q

Mock base quality for fastq output (40).

Default: 40

-s Save true alignments in this SAM file (None).
-Q

Be quiet and do not print progress bar (False).

Default: False

-z Random seed (None).
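
Since the read names encode the true origin of each read, they can be parsed back when evaluating an aligner. The sketch below follows the read name format given in the description. The example name in the usage note is made up, and the encodings of the strand and quality fields are not specified in this document, so both are returned as raw strings. Splitting from the right allows chromosome names that contain underscores.

```python
def parse_read_name(name):
    """Split a simulated read name into its fields."""
    head, qual, subs, dels, ins = name.split("/")
    # The last three underscore-separated fields are start, end and strand.
    id_and_chrom, start, end, strand = head.rsplit("_", 3)
    read_id, chrom = id_and_chrom.split("_", 1)
    return {
        "id": read_id[1:],  # drop the leading "r"
        "chromosome": chrom,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "quality": qual[1:],
        "substitutions": int(subs[1:]),
        "deletions": int(dels[1:]),
        "insertions": int(ins[1:]),
    }
```

For a hypothetical name such as r12_chr_1_100_600_+/q10.5/s3/d7/i2 this yields chromosome chr_1, start 100 and end 600.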

split_fastx

Split the sequence records in a file into one record per file or into batches of records.

usage: split_fastx [-h] [-i in_format] [-o out_format] [-b batch_size]
                   [input_fastx] [output_dir]

Positional Arguments

input_fastx

Input file (default: stdin).

Default: stdin

output_dir

Output directory (default: .)

Default: “.”

Named Arguments

-i

Input format (fastq).

Default: “fastq”

-o

Output format (fastq).

Default: “fastq”

-b Batch size (None).
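
The batching logic can be sketched as a generator that groups records into lists of at most batch_size items. Treating a missing batch size as one record per batch is an assumption based on the description above; the function name is hypothetical and writing the output files is left out.

```python
from itertools import islice


def batches(records, batch_size=None):
    """Yield lists of at most batch_size records (one record each if None)."""
    size = batch_size or 1
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

The final batch may be shorter than batch_size when the number of records is not a multiple of it.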