wub.bam package

Submodules

wub.bam.common module

wub.bam.common.pysam_open(alignment_file, in_format='BAM')

Open SAM/BAM file using pysam.

Parameters:
  • alignment_file – Input file.
  • in_format – Format (SAM or BAM).
Returns:

pysam.AlignmentFile

Return type:

pysam.AlignmentFile
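As an illustrative sketch only, not wub's code, the in_format argument plausibly maps onto a pysam.AlignmentFile read mode: "rb" for BAM and "r" for SAM. Both helper names below are hypothetical. pysam is imported inside the opener so that the mode mapping can be used without pysam installed.

```python
def alignment_mode(in_format: str) -> str:
    """Map a format name to a pysam read mode (hypothetical helper)."""
    modes = {"BAM": "rb", "SAM": "r"}
    try:
        return modes[in_format.upper()]
    except KeyError:
        raise ValueError(f"Unsupported format: {in_format!r}") from None


def open_alignment(path: str, in_format: str = "BAM"):
    """Open a SAM/BAM file with pysam using the mapped mode."""
    import pysam  # imported lazily so alignment_mode() works without pysam

    return pysam.AlignmentFile(path, alignment_mode(in_format))
```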

wub.bam.compare module

Compares alignments in two BAM files.

wub.bam.compare.aligned_pairs_to_matches(aligned_pairs, offset)

Convert aligned pairs into a sequence of reference positions.

Parameters:
  • aligned_pairs – Iterator of aligned pairs.
  • offset – Offset at the beginning of the sequences.
Returns:

Iterator of reference positions aligned to the sequence positions.

Return type:

generator
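Here is a minimal sketch of the idea. It assumes that aligned_pairs is a list of (query_pos, ref_pos) tuples, as returned by pysam's AlignedSegment.get_aligned_pairs() with None marking gaps. It also assumes that offset counts hard-clipped bases, which are padded with None. The function name is hypothetical, and wub may implement this differently.

```python
from typing import Iterable, Iterator, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def pairs_to_ref_positions(aligned_pairs: Iterable[Pair], offset: int) -> Iterator[Optional[int]]:
    """Yield one reference position (or None) per query base."""
    # Hard-clipped bases are absent from the aligned pairs, so pad them.
    for _ in range(offset):
        yield None
    for query_pos, ref_pos in aligned_pairs:
        # Deletions have no query base, so they do not produce an entry.
        if query_pos is not None:
            yield ref_pos  # None for insertions and soft clips
```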

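wub.bam.compare.bam_compare(aln_one, aln_two, coarse_tolerance=50, strict_flags=False, in_format='BAM', verbose=False)

Compare alignments in two BAM files.

Parameters:
  • aln_one – First alignment file.
  • aln_two – Second alignment file.
  • coarse_tolerance – Tolerance used when checking whether the start and end positions of two alignments match coarsely.
  • strict_flags – Use strict flag comparison (see compare_alignments).
  • in_format – Format (SAM or BAM).
  • verbose – Show progress bar.
Returns:

Dictionary with comparison statistics.

Return type:

dict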

wub.bam.compare.calc_consistency_score(segment_one, segment_two, offset_one, offset_two)

Calculate the number of bases aligned to the same reference bases in two alignments.

Parameters:
  • segment_one – Pysam aligned segment.
  • segment_two – Pysam aligned segment.
  • offset_one – Hard clipping offset for the first alignment.
  • offset_two – Hard clipping offset for the second alignment.
Returns:

Number of matching base alignments.

Return type:

int
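The sketch below shows one way to compute the score. It assumes each alignment has already been converted into a per-query-base list of reference positions, such as those produced by aligned_pairs_to_matches, with None for bases not aligned to the reference. The score then becomes a position-wise comparison. The helper name is hypothetical, and this version ignores which reference sequence the positions belong to.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def consistency_score(ref_pos_one: Sequence[Optional[int]], ref_pos_two: Sequence[Optional[int]]) -> int:
    """Count query bases aligned to the same reference base in both alignments."""
    score = 0
    for pos_one, pos_two in zip_longest(ref_pos_one, ref_pos_two):
        # Only aligned bases (not None) at identical reference positions count.
        if pos_one is not None and pos_one == pos_two:
            score += 1
    return score
```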

wub.bam.compare.compare_alignments(segment_one, segment_two, strict_flags=False)

Compare two aligned segments.

Parameters:
  • segment_one – First pysam aligned segment.
  • segment_two – Second pysam aligned segment.
  • strict_flags – Compare SAM flags strictly.
Returns:

Alignment diff structure (as used by is_coarse_match).

Return type:

dict

wub.bam.compare.count_clipped(aln, target_op)

Count hard-clipped bases in an aligned segment.

Parameters:
  • aln – Pysam aligned segment.
  • target_op – CIGAR operation.
Returns:

Number of hard-clipped bases in the segment.

Return type:

int
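Here is a sketch of counting bases for one CIGAR operation. It assumes the CIGAR is given as (operation, length) tuples, in the form pysam exposes through AlignedSegment.cigartuples. In that encoding, 4 is a soft clip and 5 is a hard clip. The function name is hypothetical.

```python
from typing import Iterable, Tuple

BAM_CSOFT_CLIP = 4  # same numeric codes as pysam's CIGAR constants
BAM_CHARD_CLIP = 5


def count_op_bases(cigartuples: Iterable[Tuple[int, int]], target_op: int) -> int:
    """Sum the lengths of all CIGAR operations equal to target_op."""
    return sum(length for op, length in cigartuples if op == target_op)
```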

wub.bam.compare.get_hard_clip_offset(aln)

Get hard clipping offset from alignment.

Parameters:
  • aln – Pysam aligned segment.
Returns:

Hard clipping offset.

Return type:

int
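The following sketch shows one plausible reading of the hard clipping offset. This is an assumption, not wub's confirmed rule: the offset is taken as the length of the leading hard clip in the CIGAR, i.e. the number of query bases missing before the first stored base. CIGAR tuples follow pysam's cigartuples layout, and the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

BAM_CHARD_CLIP = 5  # pysam's numeric code for the H operation


def leading_hard_clip(cigartuples: Optional[Sequence[Tuple[int, int]]]) -> int:
    """Return the length of a leading hard clip, or 0 if there is none."""
    if not cigartuples:  # unmapped reads have no CIGAR
        return 0
    op, length = cigartuples[0]
    return length if op == BAM_CHARD_CLIP else 0
```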

wub.bam.compare.is_coarse_match(aln_diff, tolerance)

Determine whether the start and end positions of two alignments are within the specified tolerance.

Parameters:
  • aln_diff – Alignment diff structure as returned by compare_alignments.
  • tolerance – Maximum allowed difference in start and end positions.
Returns:

True if both positions are within the tolerance, False otherwise.

Return type:

bool
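Here is a sketch of the tolerance check. It assumes the diff structure is a dict with hypothetical keys start_diff and end_diff, which hold the signed differences between the alignment start and end positions. The real keys produced by compare_alignments may differ.

```python
from typing import Mapping


def within_tolerance(aln_diff: Mapping[str, int], tolerance: int) -> bool:
    """True if both start and end differences are at most `tolerance`."""
    # Keys are hypothetical; differences may be negative, so compare absolute values.
    return abs(aln_diff["start_diff"]) <= tolerance and abs(aln_diff["end_diff"]) <= tolerance
```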

wub.bam.filter module

Filter SAM/BAM records by various criteria.

wub.bam.filter.filter_query_coverage(records_iter, minimum_coverage)

Filter pysam records keeping the ones with sufficient query coverage.

Parameters:
  • records_iter – Iterator of pysam aligned segments.
  • minimum_coverage – Minimum fraction of covered query.
Returns:

Generator of filtered records.

Return type:

generator
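Here is a sketch of the filtering idea. It assumes query coverage means the fraction of the read covered by the alignment, i.e. query_alignment_length / query_length. Both are real pysam AlignedSegment attributes. A small stand-in class replaces pysam records so that the example runs on its own; its name and the filter's name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeSegment:
    """Stand-in exposing the pysam attributes the filter reads."""
    query_name: str
    query_alignment_length: int  # aligned query bases (excludes soft clips)
    query_length: int  # query length stored in the record (excludes hard clips)


def keep_query_covered(records: Iterable[FakeSegment], minimum_coverage: float) -> Iterator[FakeSegment]:
    """Lazily yield records whose aligned fraction of the query is high enough."""
    for rec in records:
        if rec.query_length > 0 and rec.query_alignment_length / rec.query_length >= minimum_coverage:
            yield rec
```

Because this is a generator, records are filtered one at a time and never all held in memory.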

wub.bam.filter.filter_ref_coverage(records_iter, minimum_coverage, header)

Filter pysam records keeping the ones with sufficient reference coverage.

Parameters:
  • records_iter – Iterator of pysam aligned segments.
  • minimum_coverage – Minimum fraction of covered reference.
  • header – SAM header with reference lengths.
Returns:

Generator of filtered records.

Return type:

generator
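The same approach works for reference coverage. This sketch assumes reference coverage is the fraction of the reference spanned by the alignment: the record's reference_length divided by the length of its reference. The header is simplified here to a plain dict of reference lengths. With pysam, AlignmentFile.get_reference_length() returns these lengths. The stand-in class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class FakeSegment:
    """Stand-in for a pysam record with the attributes used below."""
    query_name: str
    reference_name: str
    reference_length: int  # reference bases spanned by the alignment


def keep_ref_covered(records: Iterable[FakeSegment], minimum_coverage: float,
                     ref_lengths: Dict[str, int]) -> Iterator[FakeSegment]:
    """Yield records covering at least `minimum_coverage` of their reference."""
    for rec in records:
        if rec.reference_length / ref_lengths[rec.reference_name] >= minimum_coverage:
            yield rec
```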

wub.bam.filter.filter_top_per_query(records_iter)

Filter pysam records, keeping the top-scoring record per query. Assumes records are sorted by name.

Parameters:
  • records_iter – Iterator of pysam aligned segments.
Returns:

Generator of filtered records.

Return type:

generator
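Because the records are assumed to be sorted by name, all records of the same query are adjacent. itertools.groupby can therefore keep the best record per query in a single pass. This sketch reads scores from the AS tag, which is an assumption about how the alignment score is obtained. The stand-in class mimics pysam's get_tag() and is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import groupby
from operator import attrgetter
from typing import Dict, Iterable, Iterator


@dataclass
class FakeSegment:
    """Stand-in for a pysam record: a query name plus optional tags."""
    query_name: str
    tags: Dict[str, int] = field(default_factory=dict)

    def get_tag(self, tag: str) -> int:
        return self.tags[tag]


def top_per_query(records: Iterable[FakeSegment]) -> Iterator[FakeSegment]:
    """Yield the highest AS-scoring record for each run of identical query names."""
    for _, group in groupby(records, key=attrgetter("query_name")):
        yield max(group, key=lambda rec: rec.get_tag("AS"))
```

If the input is not sorted by name, one query can form several groups, so several of its records may survive.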

wub.bam.filter.get_alignment_score(segement)

Get alignment score from pysam segment.

Parameters:
  • segement – Pysam aligned segment.
Returns:

Alignment score.

Return type:

int
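This sketch assumes the score is stored in the standard AS optional tag, as written by aligners such as minimap2 and BWA. As a design choice of the sketch, reads without the tag return None. pysam's AlignedSegment provides has_tag() and get_tag(); the stand-in class below mimics only those two methods, and the function name is hypothetical.

```python
from typing import Dict, Optional


class FakeSegment:
    """Stand-in mimicking pysam's has_tag()/get_tag() methods."""

    def __init__(self, tags: Dict[str, int]):
        self._tags = tags

    def has_tag(self, tag: str) -> bool:
        return tag in self._tags

    def get_tag(self, tag: str) -> int:
        return self._tags[tag]


def alignment_score(segment) -> Optional[int]:
    """Return the AS tag value, or None when the record has no AS tag."""
    return segment.get_tag("AS") if segment.has_tag("AS") else None
```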

wub.bam.read_counter module

Count reads per reference in BAM/SAM file.

wub.bam.read_counter.count_reads(alignment_file, in_format='BAM', min_aln_qual=0, verbose=False, reads_gc=False)

Count reads mapping to references in a BAM file.

Parameters:
  • alignment_file – BAM file.
  • in_format – Format (SAM or BAM).
  • min_aln_qual – Minimum mapping quality.
  • verbose – Verbose if True.
  • reads_gc – Calculate mean GC content of reads for each reference.
Returns:

Dictionaries with read counts per reference and mean read GC content per reference.

Return type:

tuple of dicts
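Here is a simplified sketch of the counting logic on in-memory records. It assumes two rules:

* Reads are skipped when they are unmapped or have a mapping quality below min_aln_qual.
* GC content is the fraction of G/C bases in query_sequence, averaged per reference.

The record type is a hypothetical stand-in that uses pysam attribute names. With pysam, one would iterate over an opened AlignmentFile instead.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FakeSegment:
    """Stand-in with the pysam attribute names used below."""
    reference_name: str
    mapping_quality: int
    is_unmapped: bool
    query_sequence: str


def count_per_reference(records: Iterable[FakeSegment], min_aln_qual: int = 0
                        ) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return (read counts per reference, mean read GC fraction per reference)."""
    counts: Counter = Counter()
    gc_values: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        if rec.is_unmapped or rec.mapping_quality < min_aln_qual:
            continue
        counts[rec.reference_name] += 1
        seq = rec.query_sequence.upper()
        if seq:
            gc_values[rec.reference_name].append((seq.count("G") + seq.count("C")) / len(seq))
    mean_gc = {ref: sum(vals) / len(vals) for ref, vals in gc_values.items()}
    return dict(counts), mean_gc
```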

wub.bam.read_counter.count_reads_realtime(alignment_file='-', in_format='SAM', min_aln_qual=0, yield_freq=1, verbose=False)

Online counting of reads mapping to references in a SAM/BAM stream from stdin.

Parameters:
  • alignment_file – SAM/BAM file ('-' for stdin).
  • in_format – Format (SAM or BAM).
  • min_aln_qual – Minimum mapping quality.
  • yield_freq – Yield frequency.
  • verbose – Verbose if True.
Returns:

Generator of dictionaries with read counts per reference.

Return type:

generator
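Here is a sketch of the online pattern. It assumes yield_freq counts processed records: counts accumulate as records stream in, and a snapshot is yielded every yield_freq records. Records are reduced to hypothetical (reference_name, mapping_quality) tuples. With pysam, they would come from iterating an AlignmentFile opened on "-" (stdin).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def stream_counts(records: Iterable[Tuple[Optional[str], int]], min_aln_qual: int = 0,
                  yield_freq: int = 1) -> Iterator[Dict[str, int]]:
    """Yield a copy of the running per-reference counts every `yield_freq` records."""
    counts: Counter = Counter()
    for n, (ref_name, mapq) in enumerate(records, start=1):
        # None marks an unmapped record here; low-quality hits are skipped too.
        if ref_name is not None and mapq >= min_aln_qual:
            counts[ref_name] += 1
        if n % yield_freq == 0:
            yield dict(counts)  # copy, so later updates do not change yielded snapshots
```

In this sketch, records after the last multiple of yield_freq are counted but not reported.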

wub.bam.sam_writer module

class wub.bam.sam_writer.SamWriter(out_file, header=None)

Simple class to write SAM files.

Initialise SAM writer object.

close()

Close SAM file.

Returns:

None

Return type:

None

new_sam_record(qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual, tags)

Create new SAM record structure.

Parameters:
  • qname – Read name.
  • flag – Bitwise SAM flag.
  • rname – Reference name.
  • pos – Position in reference.
  • mapq – Mapping quality.
  • cigar – CIGAR string.
  • rnext – Reference of next read.
  • pnext – Position of next read.
  • tlen – Template length.
  • seq – Read sequence.
  • qual – Base qualities.
  • tags – Optional tags.
Returns:

SAM record.

Return type:

OrderedDict

write(record)

Write SAM record to file.

Parameters:
  • record – SAM record.
Returns:

None

Return type:

None
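To illustrate what a record structure like the one returned by new_sam_record can look like, and how it becomes a SAM line, here is a sketch. It assumes an OrderedDict keyed by the eleven mandatory SAM fields, with optional tags given as pre-formatted strings such as "NM:i:0". The function names are hypothetical, and SamWriter's tag handling may differ.

```python
from collections import OrderedDict
from typing import Iterable

SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def make_record(qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual,
                tags: Iterable[str] = ()) -> "OrderedDict[str, object]":
    """Bundle the mandatory SAM fields, in column order, plus optional tags."""
    values = (qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual)
    record = OrderedDict(zip(SAM_FIELDS, values))
    record["TAGS"] = list(tags)
    return record


def format_record(record: "OrderedDict[str, object]") -> str:
    """Render one record as a tab-separated SAM line (no trailing newline)."""
    columns = [str(record[name]) for name in SAM_FIELDS] + list(record["TAGS"])
    return "\t".join(columns)
```

A writer would then emit the header lines (those starting with @), followed by one such line per record.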

wub.bam.stats module

wub.bam.stats.error_and_read_stats(bam, refs, context_sizes=(1, 1), region=None, min_aqual=0, verbose=True)

Gather read statistics and context-dependent error statistics from a BAM file. WARNING: contexts overstepping reference start/end boundaries are not registered.

Definition of context: for substitutions, the event happens at the “central base”; for indels, the event is located between the central base and the base before it.

Parameters:
  • bam – Input BAM file.
  • refs – Dictionary of references.
  • context_sizes – The size of the left and right contexts.
  • region – samtools region.
  • min_aqual – Minimum mapping quality.
  • verbose – Show progress bar.
Returns:

Dictionary with read and error statistics.

Return type:

dict
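Here is a sketch of the context definition above for a single event. It assumes the reference is a Python string, positions are 0-based, and context_sizes is (left, right). For a substitution at position i, the context is centred on base i. For an indel placed between i - 1 and i, base i is also taken as the central base. Contexts that would overstep the reference ends return None, mirroring the warning above. The function name is hypothetical.

```python
from typing import Optional, Tuple


def event_context(ref: str, pos: int, context_sizes: Tuple[int, int] = (1, 1)) -> Optional[str]:
    """Return the reference context centred on `pos`, or None if it oversteps the ends.

    For substitutions `pos` is the substituted base; for indels `pos` is the
    base after the event, which lies between pos - 1 and pos.
    """
    left, right = context_sizes
    start, end = pos - left, pos + right + 1
    if start < 0 or end > len(ref):
        return None  # context would overstep the reference start/end
    return ref[start:end]
```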

wub.bam.stats.frag_coverage(bam, chrom_lengths, region=None, min_aqual=0, ref_cov=True, verbose=True)

Calculate fragment coverage vectors on the forward and reverse strands.

Parameters:
  • bam – Input BAM file.
  • chrom_lengths – Dictionary of chromosome names and lengths.
  • region – Restrict parsing to the specified region.
  • min_aqual – Minimum mapping quality.
  • verbose – Display progress bar.
Returns:

Forward and reverse fragment coverage vectors.

Return type:

dict
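Here is a sketch of strand-specific coverage vectors. It assumes a "fragment" is the reference span of an alignment, given as 0-based half-open (reference_start, reference_end) coordinates, which is how pysam reports them. Each fragment adds 1 to every position it spans on its strand. Plain lists stand in for numeric arrays. The function name and input tuples are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Fragment = Tuple[str, int, int, bool]  # (chrom, reference_start, reference_end, is_reverse)


def strand_coverage(fragments: Iterable[Fragment], chrom_lengths: Dict[str, int]
                    ) -> Dict[str, Dict[str, List[int]]]:
    """Return {chrom: {"+": coverage, "-": coverage}} with one count per base."""
    cov = {chrom: {"+": [0] * length, "-": [0] * length}
           for chrom, length in chrom_lengths.items()}
    for chrom, start, end, is_reverse in fragments:
        strand = "-" if is_reverse else "+"
        vector = cov[chrom][strand]
        for pos in range(start, end):  # half-open interval [start, end)
            vector[pos] += 1
    return cov
```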

wub.bam.stats.pileup_stats(bam, region=None, verbose=True, with_quals=True)

Parse pileup columns and extract quality values.

Parameters:
  • bam – Input BAM file.
  • region – samtools region.
  • verbose – Show progress bar.
  • with_quals – Return quality values per position.
Returns:

Dictionaries per reference with per-base coverage and quality values.

Return type:

dict

wub.bam.stats.read_stats(bam, min_aqual=0, region=None, with_clipps=False, verbose=True)

Parse reads in BAM file and record various statistics.

Parameters:
  • bam – BAM file.
  • min_aqual – Minimum mapping quality; skip read if mapping quality is lower.
  • region – samtools region.
  • with_clipps – Take clipped bases into account when calculating accuracy.
  • verbose – Show progress bar.
Returns:

A dictionary with various global and per-read statistics.

Return type:

dict

wub.bam.stats.stats_from_aligned_read(read, with_clipps=False)

Create summary information for an aligned read (modified from tang.util.bio).

Parameters:
  • read – pysam.AlignedSegment object.
  • with_clipps – Take clipped bases into account.
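To illustrate what a clip switch such as with_clipps can change, here is a sketch of one common accuracy definition. It is not necessarily the one wub uses. Accuracy is matches divided by M + I + D. Mismatches are derived from the NM tag (mismatches plus inserted and deleted bases). Clipped bases can optionally be added to the denominator. CIGAR lengths are passed as a plain dict of operation letters, and = and X operations are not handled. The function name is hypothetical.

```python
from typing import Dict


def read_accuracy(cigar_counts: Dict[str, int], nm: int, with_clips: bool = False) -> float:
    """Accuracy = matches / (M + I + D), optionally adding clipped bases to the denominator.

    cigar_counts maps CIGAR letters (M, I, D, S, H) to total lengths; nm is the
    NM tag, i.e. mismatches + inserted + deleted bases.
    """
    aligned = cigar_counts.get("M", 0)
    ins = cigar_counts.get("I", 0)
    dels = cigar_counts.get("D", 0)
    mismatches = nm - ins - dels
    matches = aligned - mismatches
    length = aligned + ins + dels
    if with_clips:
        # Count soft- and hard-clipped bases as unaligned, lowering accuracy.
        length += cigar_counts.get("S", 0) + cigar_counts.get("H", 0)
    return matches / length if length else 0.0
```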

Module contents