wub.util package

Submodules

wub.util.cmd module

Utilities related to running external commands.

wub.util.cmd.ensure_executable(command)

Find the executable corresponding to a command in PATH and abort if it is not found.

Parameters:command – Command.
Returns:None
Return type:object
wub.util.cmd.find_executable(command)

Find the executable corresponding to a command in PATH.

Parameters:command – Command.
Returns:Path to the executable, or False if it is not found.
Return type:str or bool
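A minimal sketch of how these two lookups could behave, assuming they can be approximated with the standard library's `shutil.which` and that "abort" means exiting through `sys.exit`. This is an illustrative re-implementation, not wub's own source.

```python
import shutil
import sys


def find_executable(command):
    """Return the path to `command` found in PATH, or False if it is missing."""
    path = shutil.which(command)
    return path if path is not None else False


def ensure_executable(command):
    """Abort (by raising SystemExit) if `command` cannot be found in PATH."""
    if not find_executable(command):
        sys.exit("Executable not found in PATH: {}".format(command))
```

Returning False rather than None keeps the result falsy while matching the documented "path or False" contract.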

wub.util.misc module

Utility functions that are not yet categorised.

wub.util.misc.get_extension(fname)

Get the file extension.

Parameters:fname – File name.
Returns:File extension.
Return type:str, of the form ‘.*’
wub.util.misc.get_fname(fname)

Get the file name without its extension.

Parameters:fname – File name.
Returns:File name without extension.
Return type:str
wub.util.misc.mkdir(path)

Create a directory if it does not already exist.

Parameters:path – Directory path.
Returns:The directory path.
Return type:str
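The three path helpers above can be sketched with `os.path.splitext` and `os.makedirs`. The sketch assumes that `get_fname` keeps any leading directory component and strips only the final extension, and that `mkdir` also creates missing parent directories; wub's actual behaviour may differ on both points.

```python
import os


def get_extension(fname):
    """Return the final extension of `fname`, including the leading dot."""
    return os.path.splitext(fname)[1]


def get_fname(fname):
    """Return `fname` with its final extension removed."""
    return os.path.splitext(fname)[0]


def mkdir(path):
    """Create `path` (and any missing parents) if it does not exist; return it."""
    os.makedirs(path, exist_ok=True)
    return path
```

Note that `os.path.splitext` only splits off the last suffix, so a compressed file such as `reads.fastq.gz` yields `.gz`.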
wub.util.misc.pickle_dump(obj, fname)

Pickle object to file.

Parameters:
  • obj – Object to be pickled.
  • fname – Output file name.
Returns:

The name of the output file.

Return type:

str

wub.util.misc.pickle_load(fname)

Load object from pickle.

Parameters:fname – Input pickle file name.
Returns:Object loaded from pickle file.
Return type:object
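A minimal round-trip sketch of the pickle helpers, assuming they simply wrap the standard `pickle` module with binary file handles; this is an illustration, not wub's own code.

```python
import pickle


def pickle_dump(obj, fname):
    """Pickle `obj` to the file `fname` and return the file name."""
    with open(fname, "wb") as fh:
        pickle.dump(obj, fh)
    return fname


def pickle_load(fname):
    """Load and return the object pickled in `fname`."""
    with open(fname, "rb") as fh:
        return pickle.load(fh)
```

Unpickling can execute arbitrary code, so only load pickle files from trusted sources.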

wub.util.parse module

wub.util.parse.args_string_to_dict(args_string, elements_separator=', ', keyvalue_separator=':')

Convert a two-level separated list into a dictionary.

Parameters:
  • args_string – Two-level separated string.
  • elements_separator – Separator between elements.
  • keyvalue_separator – Separator between the key and the value within each element.
Returns:

Dictionary of key/value pairs.

Return type:

dict
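A sketch of the two-level parsing, with defaults copied from the signature above. It assumes that values are kept as strings, that surrounding whitespace is stripped, and that an empty input yields an empty dictionary; these are illustrative choices rather than confirmed wub behaviour.

```python
def args_string_to_dict(args_string, elements_separator=', ', keyvalue_separator=':'):
    """Parse e.g. 'a:1, b:2' into {'a': '1', 'b': '2'} (values kept as strings)."""
    result = {}
    if not args_string:
        return result
    for element in args_string.split(elements_separator):
        # Split each element on the first key/value separator only.
        key, value = element.split(keyvalue_separator, 1)
        result[key.strip()] = value.strip()
    return result
```

For example, `args_string_to_dict("x=3;y=4", ";", "=")` uses custom separators at both levels.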

wub.util.parse.interval_string_to_tuples(interval_string, elements_separator='|', interval_separator=', ')

Convert a two-level separated list into a tuple of intervals.

Parameters:
  • interval_string – Two-level separated string.
  • elements_separator – Separator between elements.
  • interval_separator – Separator between interval boundaries.
Returns:

Tuple of interval tuples.

Return type:

tuple
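A sketch of the interval parsing with the documented defaults: the first-level separator splits intervals, the second splits each interval into its two boundaries. Converting the boundaries to float is an assumption made for illustration.

```python
def interval_string_to_tuples(interval_string, elements_separator='|', interval_separator=', '):
    """Parse e.g. '0, 10|20, 30' into ((0.0, 10.0), (20.0, 30.0))."""
    intervals = []
    for element in interval_string.split(elements_separator):
        # Each element must contain exactly two boundaries.
        start, end = element.split(interval_separator)
        intervals.append((float(start), float(end)))
    return tuple(intervals)
```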

wub.util.parse.normalise_array(array)

Normalise a numpy array so that its elements sum to 1.0.

Parameters:array – Input array.
Returns:Normalised array.
Return type:numpy.array
wub.util.parse.separated_list_to_floats(separated_list, separator=', ')

Convert a separated list into a list of floats.

Parameters:
  • separated_list – A separated list as string.
  • separator – List separator.
Returns:

List of floats.

Return type:

list
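The two numeric helpers combine naturally: parse a separated string into floats, then rescale the values into proportions. This is a minimal sketch assuming plain `float` conversion and division by the array sum; it is not wub's own implementation.

```python
import numpy as np


def separated_list_to_floats(separated_list, separator=', '):
    """Parse e.g. '1, 1, 2' into [1.0, 1.0, 2.0]."""
    return [float(x) for x in separated_list.split(separator)]


def normalise_array(array):
    """Scale a numpy array so that its elements sum to 1.0."""
    array = np.asarray(array, dtype=float)
    return array / np.sum(array)
```

With this approach, an all-zero input produces NaN values because the sum is zero.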

wub.util.seq module

wub.util.seq.alignment_stats(ref, query, gap_character='-')

Calculate statistics from two aligned sequences.

Parameters:
  • ref – Reference sequence.
  • query – Query sequence.
  • gap_character – Gap symbol.
Returns:

AlnStats namedtuple.

Return type:

namedtuple
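A sketch of column-wise statistics for two aligned rows. The fields of this `AlnStats` are hypothetical (matches, mismatches, insertions, deletions); the real namedtuple returned by wub may carry different or additional fields. A gap in the reference is counted as an insertion in the query, and a gap in the query as a deletion.

```python
from collections import namedtuple

# Hypothetical field set; wub's AlnStats may differ.
AlnStats = namedtuple("AlnStats", ["matches", "mismatches", "insertions", "deletions"])


def alignment_stats(ref, query, gap_character='-'):
    """Count matches, mismatches and gaps between two aligned sequences."""
    if len(ref) != len(query):
        raise ValueError("Aligned sequences must have the same length.")
    matches = mismatches = insertions = deletions = 0
    for r, q in zip(ref, query):
        if r == gap_character and q == gap_character:
            continue  # Gap in both rows: not informative.
        elif r == gap_character:
            insertions += 1  # Base present in the query only.
        elif q == gap_character:
            deletions += 1  # Base present in the reference only.
        elif r == q:
            matches += 1
        else:
            mismatches += 1
    return AlnStats(matches, mismatches, insertions, deletions)
```

For `ref="AC-GTA"` and `query="ACTG-C"` the columns give three matches, one mismatch, one insertion and one deletion.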

wub.util.seq.base_complement(k)

Return complement of base.

Performs the substitutions A<=>T, C<=>G and X=>X for both upper and lower case. Any other value is returned unchanged.

Parameters:k – A base.
Returns:Complement of base.
Return type:str
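The substitution rules above map directly onto a lookup table with a pass-through default. A minimal sketch, assuming a plain dictionary is an acceptable stand-in for wub's internal table:

```python
# Substitutions as documented: A<=>T, C<=>G, X=>X, in both cases.
_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "X": "X",
    "a": "t", "t": "a", "c": "g", "g": "c", "x": "x",
}


def base_complement(k):
    """Return the complement of base `k`; any other value is returned unchanged."""
    return _COMPLEMENT.get(k, k)
```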
wub.util.seq.base_composition(seq)

Return letter counts of a string (base) sequence.

Parameters:seq – Input sequence.
Returns:Letter counts.
Return type:dict
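Letter counting is a direct fit for `collections.Counter`. This sketch assumes counts are case-sensitive (so `a` and `A` are tallied separately), which may not match wub exactly.

```python
from collections import Counter


def base_composition(seq):
    """Return a dict mapping each letter in `seq` to its count."""
    return dict(Counter(seq))
```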
wub.util.seq.count_records(input_object, format='fasta')

Count SeqRecord objects in a file in the specified format.

Parameters:
  • input_object – A file object or a file name.
  • format – Input format (fasta by default).
Returns:

Number of records in input file.

Return type:

int

wub.util.seq.dna_record_to_rna(record)

Convert a DNA SeqRecord into RNA SeqRecord.

Parameters:record – DNA SeqRecord.
Returns:The RNA SeqRecord object.
Return type:SeqRecord
wub.util.seq.gc_content(seq)

Return fraction of GC bases in sequence.

Parameters:seq – Input sequence.
Returns:GC content.
Return type:float
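GC content is the number of G and C bases divided by the sequence length. The sketch below assumes case-insensitive counting and returns 0.0 for an empty sequence; both are illustrative choices rather than documented wub behaviour.

```python
def gc_content(seq):
    """Return the fraction of G/C bases in `seq` (case-insensitive)."""
    if len(seq) == 0:
        return 0.0  # Assumption: an empty sequence has GC content 0.
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```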
wub.util.seq.mean_qscore(scores, qround=True)

Return the phred score corresponding to the mean of the error probabilities associated with the phred scores provided.

Parameters:
  • scores – Iterable of phred scores.
  • qround – Round after calculating mean score.
Returns:

Phred score corresponding to the average error rate, as estimated from the input phred scores.
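The averaging happens in probability space, not on the phred scale: each score Q is converted to an error probability p = 10^(-Q/10), the probabilities are averaged, and the mean is converted back with Q = -10·log10(p). A minimal sketch of that procedure, assuming Python's built-in `round` for the `qround` step:

```python
import math


def mean_qscore(scores, qround=True):
    """Phred score of the mean error probability of the input phred scores."""
    scores = list(scores)
    # Phred score -> error probability: p = 10 ** (-Q / 10).
    probs = [10 ** (-q / 10.0) for q in scores]
    mean_prob = sum(probs) / len(probs)
    # Mean error probability -> phred score.
    q = -10.0 * math.log10(mean_prob)
    return int(round(q)) if qround else q
```

For scores `[10, 20]` this gives about 12.6 (13 after rounding), lower than the arithmetic mean of 15, because low-quality bases dominate the average error rate.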

wub.util.seq.mock_qualities(record, mock_qual)

Add mock quality values to SeqRecord object.

Parameters:
  • record – A SeqRecord object.
  • mock_qual – Mock quality value used for each base.
Returns:

The record augmented with mock quality values.

Return type:

object

wub.util.seq.new_dna_record(sequence, name, qualities=None)

Create a new SeqRecord object using IUPACUnambiguousDNA and the specified sequence.

Parameters:
  • sequence – The sequence.
  • name – Record identifier.
  • qualities – List of base qualities.
Returns:

The SeqRecord object.

Return type:

SeqRecord

wub.util.seq.phred_to_prob(phred)

Convert phred score into error probability.

Parameters:phred – Phred quality score.
Returns:Error probability.
Return type:float
wub.util.seq.prob_to_phred(error_prob, max_q=93, qround=True)

Convert error probability into phred score.

Parameters:
  • error_prob – Base error probability.
  • max_q – Maximum quality value.
  • qround – Round calculated score.
Returns:

Phred score.

Return type:

int
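The two conversions are inverses under the standard Phred definition Q = -10·log10(p). This sketch assumes that a zero error probability (where the logarithm is undefined) and very small probabilities are both capped at `max_q`. The default of 93 is the largest score representable in Phred+33 ASCII encoding (`'~'`, code 126).

```python
import math


def phred_to_prob(phred):
    """Convert a phred quality score into an error probability."""
    return 10 ** (-phred / 10.0)


def prob_to_phred(error_prob, max_q=93, qround=True):
    """Convert an error probability into a phred score capped at `max_q`."""
    if error_prob <= 0:
        q = max_q  # log10(0) is undefined; treat as the maximum quality.
    else:
        q = min(-10.0 * math.log10(error_prob), max_q)
    return int(round(q)) if qround else q
```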

wub.util.seq.quality_array_to_string(quality_list)

Convert list of phred quality values to string.

Parameters:quality_list – List of phred quality scores.
Returns:Quality string.
Return type:str
wub.util.seq.quality_string_to_array(quality_string)

Convert quality string into a list of phred scores.

Parameters:quality_string – Quality string.
Returns:Array of scores.
Return type:array
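A sketch of the two quality encodings, assuming the Sanger/FASTQ Phred+33 convention, where each score is stored as the character with code `Q + 33`. The decoder here returns a plain list; wub's documented return type is an array, so the concrete container may differ.

```python
def quality_array_to_string(quality_list):
    """Encode phred scores as a Phred+33 quality string."""
    return "".join(chr(q + 33) for q in quality_list)


def quality_string_to_array(quality_string):
    """Decode a Phred+33 quality string into a list of phred scores."""
    return [ord(c) - 33 for c in quality_string]
```

For example, scores `[0, 40]` encode to `"!I"`.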
wub.util.seq.read_alignment(input_file, format='fasta')

Load multiple alignment from file.

Parameters:
  • input_file – Input file name.
  • format – Input format (fasta by default).
Returns:

The alignment read from the input file.

Return type:

MultipleSeqAlignment

wub.util.seq.read_seq_records(input_object, format='fasta')

Read SeqRecord objects from a file in the specified format.

Parameters:
  • input_object – A file object or a file name.
  • format – Input format (fasta by default).
Returns:

An iterator of SeqRecord objects.

Return type:

generator

wub.util.seq.read_seq_records_dict(input_object, format='fasta')

Read SeqRecord objects to a dictionary from a file in the specified format.

Parameters:
  • input_object – A file object or a file name.
  • format – Input format (fasta by default).
Returns:

A dictionary with the parsed SeqRecord objects.

Return type:

dict

wub.util.seq.record_lengths(input_iter)

Return lengths of SeqRecord objects in the input iterator.

Parameters:input_iter – An iterator of SeqRecord objects.
Returns:An ordered dictionary with the lengths of the SeqRecord objects.
Return type:OrderedDict
wub.util.seq.reverse_complement(seq)

Return reverse complement of a string (base) sequence.

Parameters:seq – Input sequence.
Returns:Reverse complement of input sequence.
Return type:str
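A reverse complement is the complement of each base, read in reverse order. This sketch uses a `str.maketrans` table covering only A, C, G and T in both cases, so other characters pass through unchanged; it is an illustration rather than wub's implementation.

```python
# Complement table mirroring the A<=>T, C<=>G substitutions.
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string; other characters pass through."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]
```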
wub.util.seq.rna_record_to_dna(record)

Convert an RNA SeqRecord into DNA SeqRecord.

Parameters:record – RNA SeqRecord.
Returns:The DNA SeqRecord object.
Return type:SeqRecord
wub.util.seq.word_composition(seq, size)

Return word counts of a nucleotide sequence.

Parameters:
  • seq – Input sequence.
  • size – Word length.
Returns:

Word counts.

Return type:

OrderedDict
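A sketch of word (k-mer) counting with a sliding window that advances one base at a time, so words overlap. It assumes only observed words are reported, in order of first appearance; whether wub also lists unobserved words with zero counts is not stated above.

```python
from collections import OrderedDict


def word_composition(seq, size):
    """Count overlapping words of length `size` in `seq`."""
    counts = OrderedDict()
    # Slide a window of width `size` along the sequence, one base at a time.
    for i in range(len(seq) - size + 1):
        word = seq[i:i + size]
        counts[word] = counts.get(word, 0) + 1
    return counts
```

For example, `"AAAC"` with `size=2` contains the words AA, AA and AC.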

wub.util.seq.write_seq_records(records_iterator, output_object, format='fasta')

Write out SeqRecord objects to a file from an iterator in the specified format.

Parameters:
  • records_iterator – An iterator of SeqRecord objects.
  • output_object – Open file object or file name.
  • format – Output format (fasta by default).
Returns:

None

Return type:

object

Module contents