Command line tools¶
_template_script¶
Template script.
usage: _template_script [-h] [-i input]
Named Arguments¶
-i | Input. |
add_errors¶
Add a specified number of errors to random sites for each input sequence.
usage: add_errors [-h] [-n nr_errors] [-t error_type]
[input_fasta] [output_fasta]
Positional Arguments¶
input_fasta | Input fasta (default: stdin). |
output_fasta | Output fasta (default: stdout). |
Named Arguments¶
-n | Number of errors to introduce (0). Default: 0 |
-t | Error type. Possible choices: substitution, insertion, deletion. Default: “substitution” |
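The substitution case can be pictured with a minimal sketch like the one below. The helper name `add_substitutions`, the restriction to the A/C/G/T alphabet and the uniform choice of the replacement base are assumptions made for illustration only; they are not taken from the tool's source.

```python
import random


def add_substitutions(seq, nr_errors, rng=None):
    """Return seq with nr_errors substitutions at distinct random sites.

    Hypothetical helper: illustrates the idea, not the tool's own code.
    """
    if rng is None:
        rng = random.Random()
    bases = "ACGT"
    seq = list(seq)
    # Sample distinct positions so every error lands on a different site.
    for pos in rng.sample(range(len(seq)), nr_errors):
        # Pick a base different from the current one so the site really changes.
        seq[pos] = rng.choice([b for b in bases if b != seq[pos]])
    return "".join(seq)


mutated = add_substitutions("ACGTACGTAC", 3, random.Random(42))
```

Passing a seeded `random.Random` instance makes the result reproducible, which is convenient when building test data.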
annotate_length¶
Add sequence length to sequence record descriptions.
usage: annotate_length [-h] [-i in_format] [-o out_format]
[input_fastx] [output_fastx]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_fastx | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-o | Output format (fastq). Default: “fastq” |
bam_accuracy¶
- Produce accuracy statistics of the input BAM file. Calculates global accuracy and identity and various per-read statistics.
- The input BAM file must be sorted by coordinates and indexed.
usage: bam_accuracy [-h] [-c region] [-g global_tsv] [-l read_tsv]
[-t bam_tag] [-q aqual] [-e] [-r report_pdf]
[-p results_pickle] [-Q]
bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-c | BAM region (None). |
-g | Tab separated file to save global statistics (None). |
-l | Tab separated file to save per-read statistics (None). |
-t | Dataset tag (BAM basename). |
-q | Minimum alignment quality (0). Default: 0 |
-e | Include hard and soft clips in alignment length when calculating accuracy (False). Default: False |
-r | Report PDF (bam_accuracy.pdf). Default: “bam_accuracy.pdf” |
-p | Save pickled results in this file (None). |
-Q | Be quiet and do not print progress bar (False). Default: False |
bam_alignment_length¶
- Produce a tab separated file of alignment lengths and other information.
- Rows are sorted by number of aligned reference bases unless the -x option is specified.
usage: bam_alignment_length [-h] [-t tsv_file] [-q aqual] [-x] [-Q] bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-t | Tab separated file to save alignment lengths (bam_alignment_length.tsv). Default: “bam_alignment_length.tsv” |
-q | Minimum alignment quality (0). Default: 0 |
-x | Sort by number of read bases instead of number of aligned reference bases. Default: False |
-Q | Be quiet and do not print progress bar (False). Default: False |
bam_alignment_qc¶
- Produce alignment based QC plots of the input BAM file.
The input BAM file must be sorted by coordinates and indexed.
- It produces the following global plots:
- Read statistics: number of mapped, unmapped and low mapping quality reads.
- Distribution of mean quality values in the mapped and unmapped fractions.
- Distribution of read lengths in the unmapped fraction.
- Distribution of read lengths in the mapped fraction.
- Distribution of read lengths in the fraction with mapping quality less than -q.
- Distribution of alignment lengths.
- Distribution of mapping qualities.
- Plot of alignment lengths vs. mean base qualities.
- Basewise statistics: total alignment length, number of insertions, deletions, matches and mismatches.
- Precision statistics: accuracy and identity.
- Frequency of errors in the context specified by the left and right context sizes (-n). Definition of context: for substitutions the event happens at the “central base”; for indels the event is located between the central base and the base before it. The columns of the heatmap are normalised to sum to one and then the diagonal elements are set to zero.
- Distribution of deletion lengths.
- Distribution of insertion lengths.
- Base composition of insertions.
- The following plots are produced for every reference unless disabled via -x:
- Distribution of quality values across the reference as a heatmap.
- Mean quality values across the reference.
- Base coverage across the reference.
The tool saves the gathered statistics in a pickle file, which can be fed to bam_multi_qc.py to compare different samples.
usage: bam_alignment_qc [-h] -f reference [-c region] [-n context_sizes] [-x]
[-t bam_tag] [-q aqual] [-i qual_ints] [-r report_pdf]
[-p results_pickle] [-Q]
bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-f | Reference fasta. |
-c | BAM region (None). |
-n | Left and right context sizes (1,1). Default: “1,1” |
-x | Do not plot per-reference information. Default: False |
-t | Dataset tag (BAM basename). |
-q | Minimum alignment quality (0). Default: 0 |
-i | Number of quality intervals (6). Default: 6 |
-r | Report PDF (bam_alignment_qc.pdf). Default: “bam_alignment_qc.pdf” |
-p | Save pickled results in this file (bam_alignment_qc.pk). Default: “bam_alignment_qc.pk” |
-Q | Be quiet and do not show progress bars. Default: False |
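The order of operations described for the error context heatmap (normalise columns first, then blank the diagonal) can be sketched as follows. This is a plain-Python illustration on a nested list; the function name `normalise_columns` and the handling of all-zero columns are assumptions, and the tool's own implementation may differ.

```python
def normalise_columns(matrix):
    """Scale every column to sum to one, then set the diagonal to zero.

    Hypothetical helper; all-zero columns are left as zeros (an assumption).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    out = [
        [matrix[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]
    # The diagonal is zeroed only AFTER normalisation, so the remaining
    # entries of a column no longer sum to one.
    for i in range(min(n_rows, n_cols)):
        out[i][i] = 0.0
    return out


heatmap = normalise_columns([[8, 1], [2, 3]])
```

Because the diagonal is removed after normalisation, each off-diagonal value is still expressed as a fraction of all events in its column, including the diagonal ones.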
bam_alignments_compare¶
- Compare alignments stored in two BAM files.
- The two BAM files must have the same set of reads in the same order (name sorted).
usage: bam_alignments_compare [-h] [-w coarse_tolerance] [-g] [-r report_pdf]
[-p results_pickle] [-t tsv_file] [-f format]
[-Q]
bam_one bam_two
Positional Arguments¶
bam_one | First input BAM file. |
bam_two | Second input BAM file. |
Named Arguments¶
-w | Tolerance when performing coarse comparison of alignments (50). Default: 50 |
-g | Do strict comparison of alignment flags. Default: False |
-r | Report PDF (bam_alignments_compare.pdf). Default: “bam_alignments_compare.pdf” |
-p | Save pickled results in this file (bam_alignments_compare.pk). Default: “bam_alignments_compare.pk” |
-t | Save results in tsv format in this file (None). |
-f | Input format (BAM). Default: “BAM” |
-Q | Be quiet and do not print progress bar (False). Default: False |
bam_count_reads¶
Count reads mapping to each reference in a BAM file.
usage: bam_count_reads [-h] [-a min_aqual] [-f in_format] [-z ref_fasta]
[-k words] [-g] [-p results_pickle] [-t tsv_file] [-Q]
[-R] [-F yield_freq]
[bam]
Positional Arguments¶
bam | Input file (default: stdin). |
Named Arguments¶
-a | Minimum mapping quality (0). Default: 0 |
-f | Input format (BAM). Default: “BAM” |
-z | Reference fasta. GC content and length columns are added if present (None). |
-k | Include word frequencies of specified length in output (None). |
-g | Include mean GC content of reads mapped to each reference (False). Default: False |
-p | Save pickled results in this file (None). |
-t | Save results in tsv format in this file (bam_count_reads.tsv). Default: “bam_count_reads.tsv” |
-Q | Be quiet and do not print progress bar (False). Default: False |
-R | Count reads from SAM stream in stdin. Only read count fields are written. Header required! (False). Default: False |
-F | Yield counts after every -Fth mapped record when doing online counting (100). Default: 100 |
bam_cov¶
Produce reference coverage table.
usage: bam_cov [-h] -f reference [-c region] [-t tsv] [-q aqual] [-Q] bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-f | Reference fasta. |
-c | BAM region (None). |
-t | Output TSV (bam_cov.tsv). Default: “bam_cov.tsv” |
-q | Minimum alignment quality (0). Default: 0 |
-Q | Be quiet and do not show progress bars. Default: False |
bam_fill_unaligned¶
- Generate SAM records for the reads present in the input fastq but missing from the input SAM/BAM.
usage: bam_fill_unaligned [-h] [-f format] -q fastq input_file output_file
Positional Arguments¶
input_file | Input file. |
output_file | Output SAM file. |
Named Arguments¶
-f | Input/output format (SAM). Default: “SAM” |
-q | Input fastq. |
bam_frag_coverage¶
Produce aggregated and individual plots of fragment coverage.
usage: bam_frag_coverage [-h] -f reference [-c region] [-i intervals]
[-b bins] [-x] [-o] [-t bam_tag] [-q aqual]
[-l cov80_tsv] [-g glob_cov80_tsv] [-r report_pdf]
[-p results_pickle] [-Q]
bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-f | Reference fasta. |
-c | BAM region (None). |
-i | Length intervals (). Default: “” |
-b | Number of bins (None = auto). |
-x | Plot per-reference information. Default: False |
-o | Do not take log of coverage. Default: False |
-t | Dataset tag (BAM basename). |
-q | Minimum alignment quality (0). Default: 0 |
-l | Tab separated file with per-chromosome cov80 scores (None). Requires the -x option to be specified. |
-g | Tab separated file with global cov80 score (None). |
-r | Report PDF (bam_frag_coverage.pdf). Default: “bam_frag_coverage.pdf” |
-p | Save pickled results in this file (None). |
-Q | Be quiet and do not show progress bars. Default: False |
bam_gc_vs_qual¶
Produce a plot of GC content of aligned read and reference portion versus their mean quality values.
usage: bam_gc_vs_qual [-h] -f reference [-q aqual] [-r report_pdf] [-t tsv]
[-Q]
bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-f | Reference fasta. |
-q | Minimum alignment quality (0). Default: 0 |
-r | Report PDF (bam_gc_vs_qual.pdf). Default: “bam_gc_vs_qual.pdf” |
-t | Tab separated file to save results (bam_gc_vs_qual.tsv). Default: “bam_gc_vs_qual.tsv” |
-Q | Be quiet and do not show progress bars. Default: False |
bam_multi_qc¶
Compare alignment QC statistics of multiple samples.
It takes a list of pickle files produced by bam_alignment_qc.py and produces plots comparing the following properties of the input samples:
- Number of mapped reads.
- Number of unmapped reads.
- Distribution of mean quality values in the unaligned fraction.
- Distribution of mean quality values in the aligned fraction.
- Distribution of read lengths in the unaligned fraction.
- Distribution of read lengths in the aligned fraction.
- Distribution of alignment lengths.
- Distribution of mapping qualities.
- Alignment accuracy.
- Alignment identity.
- Distribution of deletion lengths.
- Distribution of insertion lengths.
- Per reference plots (can be disabled by -x):
- Relative coverage across reference.
- Mean qualities per position.
usage: bam_multi_qc [-h] [-r report_pdf] [-x]
[input_pickles [input_pickles ...]]
Positional Arguments¶
input_pickles | Input pickles. |
Named Arguments¶
-r | Report PDF (bam_multi_qc.pdf). Default: “bam_multi_qc.pdf” |
-x | Do not plot reference statistics. Default: False |
bam_ref_base_coverage¶
Calculate percent covered reference lengths.
usage: bam_ref_base_coverage [-h] -f reference [-c region] [-t tsv]
[-m min_cov] [-Q]
bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-f | Reference fasta. |
-c | BAM region (None). |
-t | Output tab separated file (bam_ref_base_coverage.tsv). Default: “bam_ref_base_coverage.tsv” |
-m | Minimum base coverage for a position to be counted (1). Default: 1 |
-Q | Be quiet and do not show progress bars. Default: False |
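The effect of the -m threshold can be illustrated with a small sketch: a position counts as covered only if its depth reaches the minimum. The function name `percent_covered` and the use of a plain list of per-base depths are assumptions for illustration; the tool itself reads depths from the BAM file.

```python
def percent_covered(depths, min_cov=1):
    """Percent of reference positions whose depth is at least min_cov.

    Hypothetical helper operating on a list of per-base depths.
    """
    covered = sum(1 for depth in depths if depth >= min_cov)
    return 100.0 * covered / len(depths)


depths = [0, 1, 2, 0]
at_least_one = percent_covered(depths, min_cov=1)
at_least_two = percent_covered(depths, min_cov=2)
```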
bam_ref_tab¶
Produce a tab separated file with read identifiers and the corresponding references, sorted by reference.
usage: bam_ref_tab [-h] [-t read_tsv] [-Q] [-s] bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-t | Tab separated file to save reference table. Default: “bam_ref_tab.tsv” |
-Q | Be quiet and do not print progress bar (False). Default: False |
-s | Save read strand in output (False). Default: False |
bam_score_filter¶
- Filter SAM/BAM records by score or other criteria.
- WARNING: the input records must be sorted by name or the filtering will not work as expected.
usage: bam_score_filter [-h] [-f format] [-s strategy] [-q query_cover]
input_file output_file
Positional Arguments¶
input_file | Input file. |
output_file | Output SAM file. |
Named Arguments¶
-f | Input/output format (SAM). Default: “SAM” |
-s | Filtering strategy. Possible choices: top_per_query, query_coverage, ref_coverage. Default: “top_per_query” |
-q | Minimum query coverage fraction (0.8). Default: 0.8 |
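The name-sorting requirement is easiest to see in a streaming sketch of a "top per query" filter. The example below is an assumption-laden illustration: records are modelled as hypothetical `(query_name, score, reference)` tuples, and which alignment score the tool actually compares is not stated here.

```python
from itertools import groupby


def top_per_query(records):
    """Yield the best-scoring record for each run of identical query names.

    Records are hypothetical (query_name, score, reference) tuples.
    """
    # groupby only merges CONSECUTIVE records with the same key, which is
    # why the input has to be sorted by name.
    for _, group in groupby(records, key=lambda rec: rec[0]):
        yield max(group, key=lambda rec: rec[1])


name_sorted = [("r1", 10, "chr1"), ("r1", 30, "chr2"), ("r2", 5, "chr1")]
kept = list(top_per_query(name_sorted))
```

If the records of one read are not adjacent, each separate run is treated as its own query and more than one alignment per read survives, which matches the warning above.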
bam_soft_clips_tab¶
Produce a tab separated file with read identifiers and number of soft clipped bases at each end (relative to the original sequence in the fastq).
usage: bam_soft_clips_tab [-h] [-t tsv] [-Q] bam
Positional Arguments¶
bam | Input BAM file. |
Named Arguments¶
-t | Output tab separated file. Default: “bam_soft_clips_tab.tsv” |
-Q | Be quiet and do not print progress bar (False). Default: False |
bias_explorer¶
Simple tool for exploring biases in transcript counts. Takes as input count files generated by bam_count_reads.py (with the -z flag) and performs linear regression of log counts against transcript length and GC content.
usage: bias_explorer [-h] [-r report_pdf] [-x] count_file
Positional Arguments¶
count_file | Input counts file with length and GC content features. |
Named Arguments¶
-r | Report PDF (bias_explorer.pdf). Default: “bias_explorer.pdf” |
-x | Exclude transcripts with zero counts. Default: False |
calculate_coverage¶
Calculate total number of bases and genome coverage if genome size is given.
usage: calculate_coverage [-h] [-f format] [-s genome_size]
[-p results_pickle]
[input_fastx]
Positional Arguments¶
input_fastx | Input (default: stdin). |
Named Arguments¶
-f | Input format (fastq). Default: “fastq” |
-s | Genome size (None). |
-p | Save pickled results in this file. |
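The calculation amounts to summing read lengths and, when a genome size is supplied, dividing by it. A minimal sketch, assuming a hypothetical helper `genome_coverage` that works on a list of read lengths rather than on a sequence file:

```python
def genome_coverage(read_lengths, genome_size=None):
    """Return (total_bases, coverage); coverage is None without a genome size.

    Hypothetical helper for illustration only.
    """
    total_bases = sum(read_lengths)
    coverage = total_bases / genome_size if genome_size else None
    return total_bases, coverage


total, cov = genome_coverage([100, 200, 300], genome_size=200)
```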
compare_genomes_dnadiff¶
- Compare a set of reference sequences (genome) to another set (target assembly) using mummer’s dnadiff.
- It prints the alignment results to stdout. All parsed results can be saved in a pickle file.
usage: compare_genomes_dnadiff [-h] [-p results_pickle] [-r raw_file]
[-d work_dir] [-k] [-v]
reference_fasta target_fasta
Positional Arguments¶
reference_fasta | Reference fasta. |
target_fasta | Target fasta. |
Named Arguments¶
-p | Save pickled results in this file (None). |
-r | Save dnadiff report in this file (None). |
-d | Use this working directory instead of a temporary directory (None). |
-k | Keep dnadiff result files (False). Default: False |
-v | Print out dnadiff output (False). Default: False |
compare_genomes_lastal¶
- Compare a set of reference sequences (genome) to another set (target assembly) using lastal alignment.
Accuracy is the total number of matched bases divided by total alignment length. Coverage is total reference covered by alignment divided by total length of reference.
- Caveats:
- The lastal alignments are filtered by default (use -f to disable) so only the best scoring alignment is kept per query. Hence some shorter valid alignments might be discarded, causing an underestimation of coverage.
- The estimated accuracy is dependent on the scoring of gaps and mismatches. By default, the gap open and gap extend penalties are set to be equal.
usage: compare_genomes_lastal [-h] [-p results_pickle] [-l lastal_args]
[-t details_tsv] [-f] [-r report_pdf]
reference_fasta target_fasta
Positional Arguments¶
reference_fasta | Reference fasta. |
target_fasta | Target fasta. |
Named Arguments¶
-p | Save pickled results in this file (None). |
-l | Parameters passed to lastal in the <arg>:value,… format (a:1,b:1). Default: “a:1,b:1” |
-t | Save details of lastal alignment in this tab-separated file (None). |
-f | Do not filter for best alignment per query. Default: False |
-r | Report with alignment details plot (None). |
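The two definitions given above (accuracy as matched bases over alignment length, coverage as covered reference over reference length) can be sketched as below. Alignments are modelled as hypothetical `(matches, alignment_length, ref_start, ref_end)` tuples on a single reference with an exclusive end coordinate, and overlapping intervals are merged before counting; all of these are assumptions, since the tool's handling of overlaps is not described here.

```python
def accuracy_and_coverage(alignments, ref_length):
    """Compute (accuracy, coverage) from simplified alignment tuples.

    Each alignment is a hypothetical (matches, aln_length, ref_start, ref_end)
    tuple; ref_end is exclusive and all alignments hit the same reference.
    """
    total_matches = sum(aln[0] for aln in alignments)
    total_aln_length = sum(aln[1] for aln in alignments)
    accuracy = total_matches / total_aln_length

    # Count every reference base once, even where alignments overlap.
    covered = 0
    last_end = 0
    for start, end in sorted((aln[2], aln[3]) for aln in alignments):
        start = max(start, last_end)
        if end > start:
            covered += end - start
            last_end = end
    return accuracy, covered / ref_length


acc, cov = accuracy_and_coverage([(90, 100, 0, 95), (45, 50, 90, 140)], 200)
```

Since alignment length includes gap columns, the accuracy value shifts with the gap penalties passed to lastal, as the caveat above points out.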
convert_alphabet¶
Convert between DNA and RNA alphabets.
usage: convert_alphabet [-h] [-i in_format] [-o out_format] [-D] [-R]
[input_fastx] [output_fastx]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_fastx | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-o | Output format (fastq). Default: “fastq” |
-D | RNA->DNA alphabet conversion. Default: False |
-R | DNA->RNA alphabet conversion. Default: False |
correlate_counts¶
Correlate counts produced by multiple runs of bam_count_reads.py.
usage: correlate_counts [-h] [-r report_pdf] [-c corr_type] [-L] [-o]
[input_counts [input_counts ...]]
Positional Arguments¶
input_counts | Input counts as tab separated files. |
Named Arguments¶
-r | Report PDF (correlate_counts.pdf). Default: “correlate_counts.pdf” |
-c | Correlation statistic - spearman or pearson (spearman). Default: “spearman” |
-L | Log transform data. Default: False |
-o | Omit lower diagonal. Default: False |
fasta_to_mock_fastq¶
Convert fasta file to fastq with mock qualities.
usage: fasta_to_mock_fastq [-h] [-q mock_quals] [input_fasta] [output_fastq]
Positional Arguments¶
input_fasta | Input fasta (default: stdin). |
output_fastq | Output fastq (default: stdout). |
Named Arguments¶
-q | Mock quality value (40). Default: 40 |
fastq_qual_tab¶
Generate a table of read names and mean quality values.
usage: fastq_qual_tab [-h] [-t tsv] [input_fastq]
Positional Arguments¶
input_fastq | Input fastq (default: stdin). |
Named Arguments¶
-t | Output tab separated file. Default: “fastq_qual_tab.tsv” |
fastq_time_slice¶
Filter a fastq file by starting time.
usage: fastq_time_slice [-h] -t time_tsv [-s start_perc] [-e end_perc]
[input_fastq] [output_fastq]
Positional Arguments¶
input_fastq | Input fastq (default: stdin). |
output_fastq | Output fastq (default: stdout). |
Named Arguments¶
-t | Tab separated file produced by fastq_time_tab.py. |
-s | Start of slice as percent of total time. Default: 0.0 |
-e | End of slice as percent of total time. Default: 100.0 |
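One way to read the -s and -e options is as a window over the span between the earliest and latest read start times. The sketch below follows that reading; the helper name `time_slice`, the dictionary input and the inclusive window boundaries are assumptions, and the tool may define "total time" differently.

```python
def time_slice(start_times, start_perc=0.0, end_perc=100.0):
    """Return names of reads starting within a percent window of the run.

    start_times is a hypothetical {read_name: start_time} mapping; the run
    is assumed to span from the earliest to the latest start time.
    """
    t_min = min(start_times.values())
    t_max = max(start_times.values())
    span = t_max - t_min
    window_start = t_min + span * start_perc / 100.0
    window_end = t_min + span * end_perc / 100.0
    return [
        name
        for name, start in start_times.items()
        if window_start <= start <= window_end
    ]


first_half = time_slice({"a": 0, "b": 50, "c": 100}, 0.0, 50.0)
```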
fastq_time_tab¶
Produce a tab separated file with read start times, read and channel numbers sorted by start time.
usage: fastq_time_tab [-h] [-t read_tsv] fastq
Positional Arguments¶
fastq | Input fastq file. |
Named Arguments¶
-t | Tab separated file to save read time table. Default: “fastq_time_tab.tsv” |
fastx_ends_tab¶
Generate a tab separated file with the first and last -n bases of the sequences.
usage: fastx_ends_tab [-h] [-i in_format] [-n nr_bases]
[input_fastx] [output_tsv]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_tsv | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-n | Number of bases to include from each end (100). Default: 100 |
fastx_grep¶
Filter sequence files by read name.
usage: fastx_grep [-h] [-i in_format] [-o out_format] [-n read_names]
[input_fastx] [output_fastx]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_fastx | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-o | Output format (fastq). Default: “fastq” |
-n | Comma separated list of read names to select. Default: “” |
fastx_length_tab¶
Generate a tab separated file with the sequence lengths in the input file.
usage: fastx_length_tab [-h] [-i in_format] [input_fastx] [output_tsv]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_tsv | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fasta). Default: “fasta” |
length_normalise_counts¶
Calculate RPKM values from raw counts and a transcriptome reference.
usage: length_normalise_counts [-h] -f in_trs input_counts output_count
Positional Arguments¶
input_counts | Input count file. |
output_count | Output RPKM file. |
Named Arguments¶
-f | Input transcriptome. |
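For reference, the standard RPKM definition is reads per kilobase of transcript per million mapped reads, that is `count * 1e9 / (total_counts * transcript_length)`. A minimal sketch under that definition, using hypothetical dictionaries in place of the count file and the transcriptome fasta:

```python
def rpkm(counts, lengths):
    """Compute RPKM per transcript from raw counts and transcript lengths.

    counts and lengths are hypothetical {transcript_id: value} mappings.
    """
    total = sum(counts.values())
    return {
        ref: count * 1e9 / (total * lengths[ref])
        for ref, count in counts.items()
    }


values = rpkm({"t1": 100, "t2": 300}, {"t1": 1000, "t2": 2000})
```

Here `t1` receives a quarter of the reads on a 1 kb transcript and `t2` three quarters on a 2 kb transcript, so their RPKM values differ by a factor of 1.5 rather than 3.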
merge_tsvs¶
Merge tab separated files on a given field using pandas.
usage: merge_tsvs [-h] [-j join] [-f field] [-o out_tsv] [-z]
[input_tsvs [input_tsvs ...]]
Positional Arguments¶
input_tsvs | Input tab separated files. |
Named Arguments¶
-j | Join type (outer). Default: “outer” |
-f | Join on this field (Read). Default: “Read” |
-o | Output tsv (merge_tsvs.tsv). Default: “merge_tsvs.tsv” |
-z | Fill NA values with zero. Default: False |
multi_length_hist¶
Plot histograms of length distributions from multiple sequence files.
usage: multi_length_hist [-h] [-r report_pdf] [-f in_format] [-b nr_bins]
[-l min_len] [-u max_len] [-L]
[input_counts [input_counts ...]]
Positional Arguments¶
input_counts | Input sequence files. |
Named Arguments¶
-r | Report PDF. Default: “multi_length_hist.pdf” |
-f | Input format (fastq). Default: “fastq” |
-b | Number of bins (50). Default: 50 |
-l | Minimum read length (None). |
-u | Maximum read length (None). |
-L | Log transform lengths. Default: False |
pickle_cat¶
Pretty print the contents of a pickle file.
usage: pickle_cat [-h] pickle_file
Positional Arguments¶
pickle_file | Input pickle file. |
plot_counts_correlation¶
Scatter plot of two sets of counts.
usage: plot_counts_correlation [-h] [-r report_pdf] [-T tags] [-t merged_data]
[-o Correlation_tsv]
counts_one counts_two
Positional Arguments¶
counts_one | Input tab separated file. |
counts_two | Input tab separated file. |
Named Arguments¶
-r | Report PDF. Default: “plot_counts_correlation.pdf” |
-T | Data tags: tag1,tag2. |
-t | Merged data TSV. |
-o | Correlation TSV. |
plot_gffcmp_stats¶
Plot a gffcompare stats file.
usage: plot_gffcmp_stats [-h] [-r report_pdf] [-p pickle_out] input_txt
Positional Arguments¶
input_txt | Input gffcompare stats file. |
Named Arguments¶
-r | Report PDF (plot_gffcmp_stats.pdf). Default: “plot_gffcmp_stats.pdf” |
-p | Output pickle file. Default: “plot_gffcmp_stats.pk” |
plot_qualities¶
Plot the mean quality values across non-overlapping windows in the input sequences.
usage: plot_qualities [-h] [-w win_size] [-r report_pdf] [input_fastx]
Positional Arguments¶
input_fastx | Input (default: stdin). |
Named Arguments¶
-w | Window size (50). Default: 50 |
-r | Report pdf (plot_qualities.pdf). Default: “plot_qualities.pdf” |
plot_sequence_properties¶
Plot histograms of lengths and quality values.
usage: plot_sequence_properties [-h] [-f format] [-b bins] [-r report_pdf]
[-j]
[input_fastx]
Positional Arguments¶
input_fastx | Input (default: stdin). |
Named Arguments¶
-f | Input format (fastq). Default: “fastq” |
-b | Number of bins on histograms (50). Default: 50 |
-r | Report pdf (plot_sequence_properties.pdf). Default: “plot_sequence_properties.pdf” |
-j | Produce joint plot of lengths and mean quality values (False). Default: False |
reads_across_time¶
Plot read and alignment properties across time.
usage: reads_across_time [-h] -i time_tab -a aln_tab [-w res_freq]
[-r report_pdf] [-t out_tsv]
Named Arguments¶
-i | Tab separated file generated by fastq_time_tab.py |
-a | Tab separated file generated by bam_alignment_length.py |
-w | Resampling frequency in minutes. Default: 5 |
-r | Report PDF (reads_across_time.pdf). Default: “reads_across_time.pdf” |
-t | Output tsv (reads_across_time.tsv). Default: “reads_across_time.tsv” |
reads_stats¶
No documentation available.
reverse_fastq¶
Reverse (but not complement!) sequences and qualities in fastq file.
usage: reverse_fastq [-h] [input_fastq] [output_fastq]
Positional Arguments¶
input_fastq | Input fastq (default: stdin). |
output_fastq | Output fastq (default: stdout). |
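The distinction between reversing and reverse complementing is small in code but large in effect. The sketch below shows both side by side; the function names are hypothetical and the complement table covers only A, C, G, T and N.

```python
def reverse_record(seq, qual):
    """Reverse sequence and qualities without complementing the bases."""
    return seq[::-1], qual[::-1]


def reverse_complement(seq):
    """Shown only for contrast: reverse the sequence AND complement each base."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


reversed_only = reverse_record("AACG", "IIH#")
revcomp = reverse_complement("AACG")
```

A plain reversal keeps the base identities and only flips their order, so the result does not correspond to the opposite strand.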
sequence_filter¶
Filter sequences by length and mean quality value.
usage: sequence_filter [-h] [-i in_format] [-o out_format] [-q min_qual]
[-l min_length] [-c] [-u max_length]
[input_fastx] [output_fastx]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_fastx | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-o | Output format (fastq). Default: “fastq” |
-q | Minimum mean quality value (0.0). Default: 0.0 |
-l | Minimum length (0). Default: 0 |
-c | Reverse complement sequences. Default: False |
-u | Maximum length (None). |
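The length and quality criteria can be combined as in the sketch below. It assumes Phred+33 encoded quality strings and an arithmetic mean of the Phred scores; the tool may compute mean quality differently (for example via error probabilities), so treat `mean_quality` and `passes_filter` as hypothetical helpers.

```python
def mean_quality(qual, offset=33):
    """Arithmetic mean of Phred scores in a quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def passes_filter(seq, qual, min_qual=0.0, min_length=0, max_length=None):
    """Apply minimum/maximum length and minimum mean quality thresholds."""
    if len(seq) < min_length:
        return False
    if max_length is not None and len(seq) > max_length:
        return False
    return mean_quality(qual) >= min_qual


# "I" encodes Q40 and "5" encodes Q20 under Phred+33.
keep = passes_filter("ACG", "II5", min_qual=30.0, min_length=3)
```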
sequence_subtract¶
Filter out sequences present in the first file from the second file.
usage: sequence_subtract [-h] [-i in_format] [-o out_format]
[input_fastx_bait] [input_fastx_target]
[output_fastx]
Positional Arguments¶
input_fastx_bait | First input file (default: stdin). |
input_fastx_target | Second input file (default: stdin). |
output_fastx | Output file (default: stdout). |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-o | Output format (fastq). Default: “fastq” |
simulate_errors¶
Simulate sequencing errors for each input sequence.
usage: simulate_errors [-h] [-e error_rate] [-w error_weights]
[-z random_seed]
[input_fasta] [output_fasta]
Positional Arguments¶
input_fasta | Input fasta (default: stdin). |
output_fasta | Output fasta (default: stdout). |
Named Arguments¶
-e | Total rate of substitutions, insertions and deletions (0.1). Default: 0.1 |
-w | Relative frequency of substitutions,insertions,deletions (1,1,4). Default: “1,1,4” |
-z | Random seed (None). |
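The interplay of the total error rate (-e) and the relative weights (-w) can be sketched as a two-step draw: first decide whether a position carries an error, then pick the error type in proportion to the weights. The per-base Bernoulli model and the helper name `sample_error_types` are assumptions for illustration, not the tool's documented algorithm.

```python
import random


def sample_error_types(length, error_rate=0.1, weights=(1, 1, 4), seed=None):
    """Return a list of (position, error_type) events for a sequence.

    Hypothetical model: each position errs independently with error_rate,
    and the type is drawn with relative weights
    (substitution, insertion, deletion).
    """
    rng = random.Random(seed)
    kinds = ("substitution", "insertion", "deletion")
    events = []
    for pos in range(length):
        if rng.random() < error_rate:
            events.append((pos, rng.choices(kinds, weights=weights)[0]))
    return events


events = sample_error_types(1000, error_rate=0.1, weights=(1, 1, 4), seed=42)
```

With weights of 1,1,4, deletions are expected to make up about two thirds of the events. Fixing the seed makes the draw reproducible.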
simulate_genome¶
- Simulate genome sequence with the specified number of chromosomes, length distribution (truncated gamma) and base composition.
usage: simulate_genome [-h] [-n nr_chrom] [-m mean_length] [-a gamma_shape]
[-l low_trunc] [-u high_trunc] [-b base_freqs]
[-z random_seed]
[output_fasta]
Positional Arguments¶
output_fasta | Output fasta (default: stdout). |
Named Arguments¶
-n | Number of chromosomes (23). Default: 23 |
-m | Mean length of chromosomes (5000000). Default: 5000000 |
-a | Gamma shape parameter (1). Default: 1.0 |
-l | Lower truncation point (None). |
-u | Upper truncation point (None). |
-b | Relative base frequencies in A,C,G,T order (1,1,1,1) or “random”. Default: “1,1,1,1” |
-z | Random seed (None). |
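A truncated gamma length distribution can be sampled by simple rejection: draw from the untruncated gamma and retry until the value falls inside the bounds. The sketch below takes that approach using the standard library; the helper name `truncated_gamma` and the parameterisation (scale derived as mean divided by shape) are assumptions, and the tool may sample differently.

```python
import random


def truncated_gamma(mean, shape=1.0, low=None, high=None, rng=None):
    """Draw one integer length from a gamma distribution truncated to [low, high].

    Hypothetical helper using rejection sampling.
    """
    if rng is None:
        rng = random.Random()
    # For a gamma distribution, mean = shape * scale.
    scale = mean / shape
    while True:
        value = rng.gammavariate(shape, scale)
        if (low is None or value >= low) and (high is None or value <= high):
            return int(round(value))


rng = random.Random(0)
lengths = [truncated_gamma(5000, 1.0, low=100, high=20000, rng=rng) for _ in range(5)]
```

Note that truncation shifts the realised mean away from the requested mean, and that narrow bounds far from the mean make rejection sampling slow.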
simulate_sequences¶
Simulate sequences of fixed length and specified base composition.
usage: simulate_sequences [-h] [-n nr_seq] [-m length] [-b base_freqs]
[-z random_seed]
[output_fasta]
Positional Arguments¶
output_fasta | Output fasta (default: stdout). |
Named Arguments¶
-n | Number of sequences (1). Default: 1 |
-m | Length of simulated sequences (3000). Default: 3000 |
-b | Relative base frequencies in A,C,G,T order (1,1,1,1). Default: “1,1,1,1” |
-z | Random seed (None). |
simulate_sequencing_simple¶
- Sample fragments from the input genome and simulate sequencing errors.
Read lengths are drawn from the specified truncated gamma distribution. Chromosomes are sampled randomly for each read.
The format of the read names is the following: r<unique_id>_<chromosome>_<frag_start>_<frag_end>_<strand>/q<realised_quality>/s<realised_substitutions>/d<realised_deletions>/i<realised_insertions>
usage: simulate_sequencing_simple [-h] [-n nr_reads] [-m mean_length]
[-a gamma_shape] [-l low_trunc]
[-u high_trunc] [-e error_rate]
[-w error_weights] [-b strand_bias]
[-q mock_quality] [-s true_sam] [-Q]
[-z random_seed]
[input_fasta] [output_fastq]
Positional Arguments¶
input_fasta | Input genome in fasta format (default: stdin). |
output_fastq | Output fastq (default: stdout). |
Named Arguments¶
-n | Number of simulated reads (1). Default: 1 |
-m | Mean read length (5000). Default: 5000 |
-a | Read length distribution: gamma shape parameter (1). Default: 1.0 |
-l | Read length distribution: lower truncation point (100). Default: 100 |
-u | Read length distribution: upper truncation point (None). |
-e | Total rate of substitutions, insertions and deletions (0.1). Default: 0.1 |
-w | Relative frequency of substitutions,insertions,deletions (1,1,4). Default: “1,1,4” |
-b | Strand bias: the ratio of forward and reverse reads (0.5). Default: 0.5 |
-q | Mock base quality for fastq output (40). Default: 40 |
-s | Save true alignments in this SAM file (None). |
-Q | Be quiet and do not print progress bar (False). Default: False |
-z | Random seed (None). |
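Because the read names encode the true origin of each read, they can be parsed back when evaluating an aligner. The sketch below is one possible parser for the format given above. The example name is made up, the strand field is kept as an opaque string because its encoding is not specified here, and the parser assumes that the unique id contains no underscore and that no field contains a slash.

```python
def parse_read_name(name):
    """Split a simulated read name into its fields (hypothetical parser)."""
    head, qual, subs, dels, ins = name.split("/")
    # Split from the right so chromosome names with underscores survive.
    left, frag_start, frag_end, strand = head.rsplit("_", 3)
    # Drop the leading "r", then separate the id from the chromosome name.
    unique_id, chromosome = left[1:].split("_", 1)
    return {
        "id": unique_id,
        "chromosome": chromosome,
        "frag_start": int(frag_start),
        "frag_end": int(frag_end),
        "strand": strand,
        "quality": float(qual[1:]),
        "substitutions": int(subs[1:]),
        "deletions": int(dels[1:]),
        "insertions": int(ins[1:]),
    }


# Made-up read name following the documented pattern.
fields = parse_read_name("r7_chr_1_100_600_+/q9.5/s20/d30/i10")
```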
split_fastx¶
Split sequence records in file to one record per file or batches of records.
usage: split_fastx [-h] [-i in_format] [-o out_format] [-b batch_size]
[input_fastx] [output_dir]
Positional Arguments¶
input_fastx | Input file (default: stdin). |
output_dir | Output directory (default: .) Default: “.” |
Named Arguments¶
-i | Input format (fastq). Default: “fastq” |
-o | Output format (fastq). Default: “fastq” |
-b | Batch size (None). |