wub.bam package¶
Submodules¶
wub.bam.common module¶
wub.bam.compare module¶
Compares alignments in two BAM files.
-
wub.bam.compare.
aligned_pairs_to_matches
(aligned_pairs, offset)[source]¶ Convert aligned pairs into a sequence of reference positions.
Parameters: - aligned_pairs – Iterator of aligned pairs.
- offset – Offset at the beginning of the sequences.
Returns: Iterator of reference positions aligned to the sequence positions.
Return type: generator
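The sketch below illustrates one way such a conversion could work. It is not the wub implementation. It assumes aligned pairs are (query_pos, ref_pos) tuples in the style of pysam's AlignedSegment.get_aligned_pairs(), with None marking an unaligned side. The name ref_positions and the choice to pad the output with `offset` placeholders are assumptions made for illustration.

```python
def ref_positions(aligned_pairs, offset=0):
    """Yield one reference position (or None) per query base.

    Hypothetical helper: `offset` leading bases (e.g. hard-clipped ones
    that are absent from the record) are represented by None.
    """
    for _ in range(offset):
        yield None
    for query_pos, ref_pos in aligned_pairs:
        if query_pos is None:
            # Deletion in the query: no query base to report.
            continue
        # ref_pos is None for insertions and soft-clipped bases.
        yield ref_pos


pairs = [(0, 100), (1, 101), (2, None), (None, 102), (3, 103)]
positions = list(ref_positions(pairs, offset=2))
```

With this layout, the index into `positions` is the base's position in the original (unclipped) read.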
-
wub.bam.compare.
bam_compare
(aln_one, aln_two, coarse_tolerance=50, strict_flags=False, in_format='BAM', verbose=False)[source]¶ Compare alignments in two BAM files.
Parameters: - aln_one – First alignment file.
- aln_two – Second alignment file.
- coarse_tolerance – Tolerance used for the coarse comparison of alignments.
- strict_flags – Compare alignment flags strictly if True.
- in_format – Input format (BAM by default).
- verbose – Show progress bar.
Returns: Dictionary with comparison statistics.
Return type: dict
-
wub.bam.compare.
calc_consistency_score
(segment_one, segment_two, offset_one, offset_two)[source]¶ Calculate the number of bases aligned to the same reference bases in two alignments.
Parameters: - segment_one – Pysam aligned segment.
- segment_two – Pysam aligned segment.
- offset_one – Hard clipping offset for the first alignment.
- offset_two – Hard clipping offset for the second alignment.
Returns: Number of matching base alignments.
Return type: int
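The idea of counting bases aligned to the same reference base can be sketched as follows. This is an illustration under stated assumptions, not the wub code. Each alignment is represented as a list with one entry per read base (in original read coordinates, so hard clipping offsets are already accounted for), holding the reference position or None if the base is unaligned. The name consistency_score is hypothetical.

```python
def consistency_score(ref_pos_one, ref_pos_two):
    """Count read bases placed on the same reference position in both alignments."""
    return sum(
        1
        for a, b in zip(ref_pos_one, ref_pos_two)
        if a is not None and a == b
    )


# Per-base reference positions of the same read in two alignments.
one = [None, None, 100, 101, None, 103]
two = [98, 99, 100, 101, 102, 104]
score = consistency_score(one, two)
```

Only the third and fourth bases agree in this example, so the score is 2.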
-
wub.bam.compare.
compare_alignments
(segment_one, segment_two, strict_flags=False)[source]¶ Compare two alignments.
Parameters: - segment_one – Pysam aligned segment.
- segment_two – Pysam aligned segment.
- strict_flags – Compare alignment flags strictly if True.
Returns: Result of the comparison.
-
wub.bam.compare.
count_clipped
(aln, target_op)[source]¶ Count hard clipped bases in aligned segment.
Parameters: - aln – Pysam aligned segment.
- target_op – CIGAR operation.
Returns: Number of hard clipped bases in segment.
Return type: int
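As a rough illustration of counting bases for a given CIGAR operation, the sketch below works on a list of (operation, length) tuples in the format of pysam's AlignedSegment.cigartuples, where soft clipping is operation 4 and hard clipping is operation 5. The function name count_op_bases is hypothetical, and summing over every matching operation is an assumption about the intended behaviour.

```python
CIGAR_SOFT_CLIP = 4  # 'S' in the SAM specification
CIGAR_HARD_CLIP = 5  # 'H' in the SAM specification


def count_op_bases(cigartuples, target_op):
    """Sum the lengths of all CIGAR operations equal to target_op."""
    if cigartuples is None:  # pysam reports None when no CIGAR is present
        return 0
    return sum(length for op, length in cigartuples if op == target_op)


# 10H 80M 2I 18M 4H
cigar = [(5, 10), (0, 80), (1, 2), (0, 18), (5, 4)]
hard_clipped = count_op_bases(cigar, CIGAR_HARD_CLIP)
soft_clipped = count_op_bases(cigar, CIGAR_SOFT_CLIP)
```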
wub.bam.filter module¶
Filter SAM/BAM records by various criteria.
-
wub.bam.filter.
filter_query_coverage
(records_iter, minimum_coverage)[source]¶ Filter pysam records keeping the ones with sufficient query coverage.
Parameters: - records_iter – Iterator of pysam aligned segments.
- minimum_coverage – Minimum fraction of covered query.
Returns: Generator of filtered records.
Return type: generator
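A minimal sketch of this kind of filter is shown below. It uses a namedtuple stand-in rather than real pysam records, borrowing the attribute names query_alignment_length and query_length from pysam's AlignedSegment. Defining query coverage as the ratio of those two values is an assumption, and keep_by_query_coverage is a hypothetical name.

```python
from collections import namedtuple

# Stand-in for a pysam aligned segment (illustration only).
Record = namedtuple("Record", "query_name query_alignment_length query_length")


def keep_by_query_coverage(records, minimum_coverage):
    """Yield records whose aligned fraction of the query is large enough."""
    for rec in records:
        if rec.query_length == 0:
            continue  # no sequence stored, coverage is undefined
        if rec.query_alignment_length / rec.query_length >= minimum_coverage:
            yield rec


records = [
    Record("r1", 90, 100),
    Record("r2", 40, 100),
    Record("r3", 80, 100),
]
kept = [rec.query_name for rec in keep_by_query_coverage(records, 0.8)]
```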
-
wub.bam.filter.
filter_ref_coverage
(records_iter, minimum_coverage, header)[source]¶ Filter pysam records keeping the ones with sufficient reference coverage.
Parameters: - records_iter – Iterator of pysam aligned segments.
- minimum_coverage – Minimum fraction of covered reference.
- header – SAM header with reference lengths.
Returns: Generator of filtered records.
Return type: generator
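Filtering by reference coverage follows the same pattern, with the reference length looked up per record. In the sketch below the header is simplified to a plain dictionary of reference name to length, and the records are namedtuple stand-ins using the pysam attribute names reference_name and reference_length. Both simplifications, and the name keep_by_ref_coverage, are assumptions for illustration.

```python
from collections import namedtuple

# Stand-in for a pysam aligned segment (illustration only).
Record = namedtuple("Record", "query_name reference_name reference_length")


def keep_by_ref_coverage(records, minimum_coverage, ref_lengths):
    """Yield records covering at least minimum_coverage of their reference."""
    for rec in records:
        ref_len = ref_lengths[rec.reference_name]
        if rec.reference_length / ref_len >= minimum_coverage:
            yield rec


ref_lengths = {"tx1": 1000, "tx2": 200}
records = [
    Record("r1", "tx1", 950),
    Record("r2", "tx1", 300),
    Record("r3", "tx2", 180),
]
kept = [rec.query_name for rec in keep_by_ref_coverage(records, 0.9, ref_lengths)]
```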
wub.bam.read_counter module¶
Count reads per reference in BAM/SAM file.
-
wub.bam.read_counter.
count_reads
(alignment_file, in_format='BAM', min_aln_qual=0, verbose=False, reads_gc=False)[source]¶ Count reads mapping to references in a BAM file.
Parameters: - alignment_file – BAM file.
- min_aln_qual – Minimum mapping quality.
- verbose – Verbose if True.
- reads_gc – Calculate mean GC content of reads for each reference.
Returns: Dictionaries with read counts per reference and mean read GC content per reference.
Return type: tuple of dicts
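The counting logic can be sketched without reading a real file. The example below is an illustration, not the wub implementation: records are namedtuple stand-ins with pysam-style attribute names, and count_per_reference and gc_fraction are hypothetical helpers. Skipping unmapped reads is an assumption.

```python
from collections import Counter, defaultdict, namedtuple

# Stand-in for a pysam aligned segment (illustration only).
Record = namedtuple(
    "Record", "reference_name mapping_quality query_sequence is_unmapped"
)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def count_per_reference(records, min_aln_qual=0, reads_gc=False):
    """Return (read counts per reference, mean read GC per reference)."""
    counts = Counter()
    gc_values = defaultdict(list)
    for rec in records:
        if rec.is_unmapped or rec.mapping_quality < min_aln_qual:
            continue
        counts[rec.reference_name] += 1
        if reads_gc:
            gc_values[rec.reference_name].append(gc_fraction(rec.query_sequence))
    mean_gc = {ref: sum(vals) / len(vals) for ref, vals in gc_values.items()}
    return dict(counts), mean_gc


records = [
    Record("chr1", 60, "GGCC", False),
    Record("chr1", 5, "ATAT", False),   # below the quality threshold
    Record("chr2", 30, "ATGC", False),
    Record(None, 0, "AAAA", True),      # unmapped
]
counts, mean_gc = count_per_reference(records, min_aln_qual=10, reads_gc=True)
```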
-
wub.bam.read_counter.
count_reads_realtime
(alignment_file='-', in_format='SAM', min_aln_qual=0, yield_freq=1, verbose=False)[source]¶ Online counting of reads mapping to references in a SAM/BAM stream from stdin.
Parameters: - alignment_file – BAM file (stdin).
- min_aln_qual – Minimum mapping quality.
- yield_freq – Yield frequency.
- verbose – Verbose if True.
Returns: Generator of dictionaries with read counts per reference.
Return type: generator
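Online counting with a yield frequency can be sketched as a generator that emits a snapshot of the running counts every yield_freq records. This is an illustration only: the input is reduced to an iterable of reference names, count_streaming is a hypothetical name, and whether a final partial batch is reported is left out as an open assumption.

```python
from collections import Counter


def count_streaming(reference_names, yield_freq=1):
    """Yield a copy of the running counts after every yield_freq records."""
    counts = Counter()
    for i, ref in enumerate(reference_names, start=1):
        counts[ref] += 1
        if i % yield_freq == 0:
            # Copy so that later updates do not alter earlier snapshots.
            yield dict(counts)


snapshots = list(count_streaming(["a", "b", "a", "a"], yield_freq=2))
```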
wub.bam.sam_writer module¶
-
class
wub.bam.sam_writer.
SamWriter
(out_file, header=None)[source]¶ Simple class to write SAM files.
Initialise SAM writer object
-
new_sam_record
(qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual, tags)[source]¶ Create new SAM record structure.
Parameters: - self – object
- qname – Read name.
- flag – Bitwise flag.
- rname – Reference name.
- pos – Position in reference.
- mapq – Mapping quality.
- cigar – CIGAR string.
- rnext – Reference of next read.
- pnext – Position of next read.
- tlen – Template length.
- seq – Read sequence.
- qual – Base qualities.
- tags – Optional tags.
Returns: SAM record.
Return type: OrderedDict
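To show how an ordered record structure maps onto a SAM line, the sketch below builds an OrderedDict with the eleven mandatory SAM fields in specification order, plus optional tags, and joins it into a tab-separated line. The key names, the tags representation (a pre-formatted string) and the helpers make_record and to_sam_line are assumptions for illustration. They are not the SamWriter API.

```python
from collections import OrderedDict

# The eleven mandatory SAM columns, in specification order.
SAM_FIELDS = (
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
)


def make_record(qname, flag, rname, pos, mapq, cigar,
                rnext, pnext, tlen, seq, qual, tags=""):
    """Build an ordered record; `tags` is a pre-formatted string here."""
    values = (qname, flag, rname, pos, mapq, cigar,
              rnext, pnext, tlen, seq, qual)
    record = OrderedDict(zip(SAM_FIELDS, values))
    record["TAGS"] = tags
    return record


def to_sam_line(record):
    """Join the fields with tabs, appending tags only when present."""
    fields = [str(value) for key, value in record.items() if key != "TAGS"]
    if record["TAGS"]:
        fields.append(record["TAGS"])
    return "\t".join(fields)


rec = make_record("read1", 0, "chr1", 100, 60, "4M", "*", 0, 0,
                  "ACGT", "IIII", "NM:i:0")
line = to_sam_line(rec)
```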
wub.bam.stats module¶
-
wub.bam.stats.
error_and_read_stats
(bam, refs, context_sizes=(1, 1), region=None, min_aqual=0, verbose=True)[source]¶ Gather read statistics and context-dependent error statistics from BAM file. WARNING: contexts overstepping reference start/end boundaries are not registered.
Definition of context: for substitutions, the event is happening from the “central base”; in the case of indels, the events are located between the central base and the base before it.
Parameters: - bam – Input BAM file.
- refs – Dictionary of references.
- context_sizes – The size of the left and right contexts.
- region – samtools region.
- min_aqual – Minimum mapping quality.
- verbose – Show progress bar.
Returns: Dictionary with read and error statistics.
Return type: dict
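The notion of a context around a central base, and of contexts that overstep the reference boundaries being dropped, can be illustrated with a small helper. This is a sketch under assumptions, not the wub code: base_context is a hypothetical name, positions are 0-based, and context_sizes is read as (left size, right size).

```python
def base_context(reference, pos, context_sizes=(1, 1)):
    """Return (left context, central base, right context), or None if
    the context would overstep the start or end of the reference."""
    left, right = context_sizes
    if pos - left < 0 or pos + right >= len(reference):
        return None
    return (
        reference[pos - left:pos],
        reference[pos],
        reference[pos + 1:pos + right + 1],
    )


ref = "ACGTAC"
inner = base_context(ref, 2)                        # central base 'G'
wide = base_context(ref, 2, context_sizes=(2, 3))   # asymmetric context
at_start = base_context(ref, 0)                     # oversteps the start
at_end = base_context(ref, 5)                       # oversteps the end
```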
-
wub.bam.stats.
frag_coverage
(bam, chrom_lengths, region=None, min_aqual=0, ref_cov=True, verbose=True)[source]¶ Calculate fragment coverage vectors on the forward and reverse strands.
Parameters: - bam – Input BAM file.
- chrom_lengths – Dictionary of chromosome names and lengths.
- region – Restrict parsing to the specified region.
- min_aqual – Minimum mapping quality.
- verbose – Display progress bar.
Returns: Forward and reverse fragment coverage vectors.
Return type: dict
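Strand-specific coverage vectors can be sketched as below. The example is illustrative and makes several assumptions: alignments are reduced to (start, end, is_reverse) tuples with 0-based, end-exclusive coordinates on a single chromosome, plain lists stand in for whatever array type the library uses, and strand_coverage and the 'fwd'/'rev' keys are hypothetical names.

```python
def strand_coverage(alignments, chrom_length):
    """Return per-base coverage for the forward and reverse strands."""
    cov = {"fwd": [0] * chrom_length, "rev": [0] * chrom_length}
    for start, end, is_reverse in alignments:
        strand = "rev" if is_reverse else "fwd"
        for pos in range(start, end):
            cov[strand][pos] += 1
    return cov


alignments = [(0, 3, False), (2, 5, False), (1, 4, True)]
cov = strand_coverage(alignments, chrom_length=6)
```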
-
wub.bam.stats.
pileup_stats
(bam, region=None, verbose=True, with_quals=True)[source]¶ Parse pileup columns and extract quality values.
Parameters: - bam – Input BAM file.
- region – samtools region.
- verbose – Show progress bar.
- with_quals – Return quality values per position.
Returns: Dictionaries per reference with per-base coverage and quality values.
Return type: dict
-
wub.bam.stats.
read_stats
(bam, min_aqual=0, region=None, with_clipps=False, verbose=True)[source]¶ Parse reads in BAM file and record various statistics.
Parameters: - bam – BAM file.
- min_aqual – Minimum mapping quality, skip read if mapping quality is lower.
- region – samtools region.
- with_clipps – Take clipping into account when calculating accuracy.
- verbose – Show progress bar.
Returns: A dictionary with various global and per-read statistics.
Return type: dict