wub.util package

Submodules

wub.util.cmd module

Utilities related to running external commands.

wub.util.cmd.ensure_executable(command)

Find the executable corresponding to a command in PATH and abort if it is not found.

Parameters:	command – Command.
Returns:	None
Return type:	None

wub.util.cmd.find_executable(command)

Find the executable corresponding to a command in PATH.

Parameters:	command – Command.
Returns:	Path to the executable, or False if it is not found.
Return type:	str or bool
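
A minimal sketch of the behaviour described for find_executable and ensure_executable, built on the standard library's shutil.which. This is an illustration, not wub's own code; the exit message is hypothetical.

```python
import shutil
import sys


def find_executable(command):
    """Return the full path of `command` if it is on PATH, otherwise False."""
    path = shutil.which(command)
    # shutil.which returns None when nothing is found; map that to False
    return path if path is not None else False


def ensure_executable(command):
    """Abort the program with an error message if `command` is not on PATH."""
    if find_executable(command) is False:
        # hypothetical message; sys.exit with a string prints it and exits with status 1
        sys.exit("Executable for command '{}' not found!".format(command))
```

Because sys.exit raises SystemExit, callers that want to recover instead of aborting can use the find_executable form and test for False.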

wub.util.misc module

Miscellaneous utility functions that are not yet categorised.

wub.util.misc.get_extension(fname)

Get the file extension.

Parameters:	fname – File name.
Returns:	File extension.
Return type:	str of the form ‘.*’

wub.util.misc.get_fname(fname)

Get the file name without its extension.

Parameters:	fname – File name.
Returns:	File name without extension.
Return type:	str
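
A minimal sketch of get_extension and get_fname using os.path.splitext. It assumes, without confirmation from this page, that get_fname also drops the directory part of the path; it is not wub's own code.

```python
import os


def get_extension(fname):
    """Return the extension of a file name, including the leading dot."""
    return os.path.splitext(fname)[1]


def get_fname(fname):
    """Return the file name with its directory (assumption) and extension removed."""
    return os.path.splitext(os.path.basename(fname))[0]
```

Note that os.path.splitext only splits off the last extension, so "reads.fq.gz" yields ".gz", and a name without a dot yields an empty string.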

wub.util.misc.mkdir(path)

Create a directory if it does not already exist.

Parameters:	path – Directory path.
Returns:	The directory path.
Return type:	str
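
A minimal sketch of the documented mkdir behaviour. It assumes that missing parent directories should also be created, which this page does not state; it is not wub's own code.

```python
import os


def mkdir(path):
    """Create the directory `path` (and any missing parents) if it does not exist."""
    # exist_ok=True makes the call a no-op when the directory is already there
    os.makedirs(path, exist_ok=True)
    return path
```

Returning the path lets the call be used inline, for example when building an output file name inside a freshly created directory.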

wub.util.misc.pickle_dump(obj, fname)

Pickle object to file.

Parameters:	obj – Object to be pickled.
	fname – Output file name.
Returns:	The name of output file.
Return type:	str

wub.util.misc.pickle_load(fname)

Load object from pickle.

Parameters:	fname – Input pickle file name.
Returns:	Object loaded from pickle file.
Return type:	object
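
A minimal sketch of the pickle_dump / pickle_load pair using the standard pickle module, showing the round trip; it illustrates the documented behaviour and is not wub's own code.

```python
import pickle


def pickle_dump(obj, fname):
    """Pickle `obj` to the file `fname` and return the file name."""
    with open(fname, "wb") as fh:  # pickle data is binary
        pickle.dump(obj, fh)
    return fname


def pickle_load(fname):
    """Load and return the object stored in the pickle file `fname`."""
    with open(fname, "rb") as fh:
        return pickle.load(fh)
```

Loading a pickle can execute arbitrary code, so pickle_load should only be used on files from a trusted source.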

wub.util.parse module

wub.util.parse.args_string_to_dict(args_string, elements_separator=', ', keyvalue_separator=':')

Convert a two-level separated list into a dictionary.

Parameters:	args_string – Two-level separated string.
	elements_separator – Separator between elements.
	keyvalue_separator – Separator between a key and its value.
Returns:	Dictionary of key/value pairs.
Return type:	dict
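
A minimal sketch of the two-level parsing described above, using the default separators from the signature. This page does not say whether wub converts the values to numbers, so the sketch keeps them as strings; it is not wub's own code.

```python
def args_string_to_dict(args_string, elements_separator=", ", keyvalue_separator=":"):
    """Parse e.g. "a:1, b:2" into {"a": "1", "b": "2"} (values kept as strings)."""
    if not args_string:
        return {}
    result = {}
    for element in args_string.split(elements_separator):
        # split only on the first key/value separator, so values may contain it
        key, value = element.split(keyvalue_separator, 1)
        result[key.strip()] = value.strip()
    return result
```

For example, args_string_to_dict("x=3;y=4", ";", "=") parses the same structure with custom separators.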

wub.util.parse.interval_string_to_tuples(interval_string, elements_separator='|', interval_separator=', ')

Convert a two-level separated list into a tuple of intervals.

Parameters:	interval_string – Two-level separated string.
	elements_separator – Separator between elements.
	interval_separator – Separator between interval boundaries.
Returns:	Tuple of interval tuples.
Return type:	tuple
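
A minimal sketch of interval parsing with the default separators from the signature. It assumes interval boundaries are integers, which this page does not state; it is not wub's own code.

```python
def interval_string_to_tuples(interval_string, elements_separator="|", interval_separator=", "):
    """Parse e.g. "1, 10|20, 30" into ((1, 10), (20, 30))."""
    if not interval_string:
        return ()
    intervals = []
    for element in interval_string.split(elements_separator):
        # each element holds exactly two boundaries: start and end
        start, end = element.split(interval_separator)
        intervals.append((int(start), int(end)))
    return tuple(intervals)
```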

wub.util.parse.normalise_array(array)

Normalise a numpy array so that its elements sum to 1.0.

Parameters:	array – Input array.
Returns:	Normalised array.
Return type:	numpy.array

wub.util.parse.separated_list_to_floats(separated_list, separator=', ')

Convert a separated list into a list of floats.

Parameters:	separated_list – A separated list as a string.
	separator – List separator.
Returns:	List of floats.
Return type:	list
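
A minimal sketch of separated_list_to_floats with the default separator from the signature; it illustrates the documented behaviour and is not wub's own code.

```python
def separated_list_to_floats(separated_list, separator=", "):
    """Parse e.g. "0.1, 0.2, 0.7" into [0.1, 0.2, 0.7]."""
    # float() raises ValueError on any element that is not a number
    return [float(element) for element in separated_list.split(separator)]
```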

wub.util.seq module

wub.util.seq.alignment_stats(ref, query, gap_character='-')

Calculate statistics from two aligned sequences.

Parameters:	ref – Reference sequence.
	query – Query sequence.
	gap_character – Gap symbol.
Returns:	AlnStats namedtuple.
Return type:	namedtuple

wub.util.seq.base_complement(k)

Return the complement of a base.

Performs the substitutions A<=>T, C<=>G and X=>X for both upper and lower case. Any other value is returned unchanged.

Parameters:	k – A base.
Returns:	Complement of base.
Return type:	str
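
A minimal sketch of the substitution table described above, as a dictionary lookup that falls back to the input; it is an illustration, not wub's own code.

```python
# Complement table for A<=>T, C<=>G and X=>X, in upper and lower case
_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "X": "X",
    "a": "t", "t": "a", "c": "g", "g": "c", "x": "x",
}


def base_complement(k):
    """Return the complement of base `k`; any other value is returned unchanged."""
    return _COMPLEMENT.get(k, k)
```

With this fallback, ambiguity codes such as "N" and gap symbols such as "-" pass through untouched.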

wub.util.seq.base_composition(seq)

Return letter counts of a string (base) sequence.

Parameters:	seq – Input sequence.
Returns:	Letter counts.
Return type:	dict
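
A minimal sketch of letter counting with collections.Counter. It assumes counts are case-sensitive, which this page does not state; it is not wub's own code.

```python
from collections import Counter


def base_composition(seq):
    """Return a dictionary mapping each letter in `seq` to its count."""
    # Counter tallies every character; converting to dict matches the documented return type
    return dict(Counter(seq))
```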

wub.util.seq.count_records(input_object, format='fasta')

Count the SeqRecord objects in a file in the specified format.

Parameters:	input_object – A file object or a file name.
	format – Input format (fasta by default).
Returns:	Number of records in input file.
Return type:	int

wub.util.seq.dna_record_to_rna(record)

Convert a DNA SeqRecord into an RNA SeqRecord.

Parameters:	record – DNA SeqRecord.
Returns:	The RNA SeqRecord object.
Return type:	SeqRecord

wub.util.seq.gc_content(seq)

Return the fraction of GC bases in a sequence.

Parameters:	seq – Input sequence.
Returns:	GC content.
Return type:	float
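
A minimal sketch of the GC fraction: the number of G and C bases divided by the sequence length. It assumes case-insensitive counting and returns 0.0 for an empty sequence; neither choice is documented here, and this is not wub's own code.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in `seq` (case-insensitive)."""
    if len(seq) == 0:
        return 0.0  # assumption: define the GC content of an empty sequence as 0
    gc = sum(1 for base in seq.upper() if base in {"G", "C"})
    return gc / len(seq)
```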

wub.util.seq.mean_qscore(scores, qround=True)

Return the phred score corresponding to the mean of the error probabilities associated with the provided phred scores.

Parameters:	scores – Iterable of phred scores.
	qround – Round after calculating the mean score.
Returns:	Phred score corresponding to the average error rate, as estimated from the input phred scores.
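
A minimal sketch of the calculation described above: each score Q_i is converted to an error probability p_i = 10^(-Q_i/10), the probabilities are averaged, and the mean is converted back with Q = -10 log10(mean p). It does not apply any cap on the score, which wub may do; it is not wub's own code.

```python
import math


def mean_qscore(scores, qround=True):
    """Phred score of the mean error probability of `scores`."""
    scores = list(scores)
    # phred -> error probability: p = 10^(-Q/10)
    probs = [10 ** (-q / 10.0) for q in scores]
    mean_prob = sum(probs) / len(probs)  # an empty input raises ZeroDivisionError
    # error probability -> phred: Q = -10 * log10(p)
    q = -10.0 * math.log10(mean_prob)
    return int(round(q)) if qround else q
```

For example, mean_qscore([10, 20]) averages the error probabilities 0.1 and 0.01 to 0.055 and returns 13, whereas the arithmetic mean of the scores would be 15: the low-quality base dominates the average error rate.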

wub.util.seq.mock_qualities(record, mock_qual)

Add mock quality values to a SeqRecord object.

Parameters:	record – A SeqRecord object.
	mock_qual – Mock quality value used for each base.
Returns:	The record augmented with mock quality values.
Return type:	SeqRecord

wub.util.seq.new_dna_record(sequence, name, qualities=None)

Create a new SeqRecord object with the specified sequence, using the IUPACUnambiguousDNA alphabet.

Parameters:	sequence – The sequence.
	name – Record identifier.
	qualities – List of base qualities.
Returns:	The SeqRecord object.
Return type:	SeqRecord

wub.util.seq.phred_to_prob(phred)

Convert phred score into error probability.

Parameters:	phred – Phred quality score.
Returns:	Error probability.
Return type:	float

wub.util.seq.prob_to_phred(error_prob, max_q=93, qround=True)

Convert error probability into phred score.

Parameters:	error_prob – Base error probability.
	max_q – Maximum quality value.
	qround – Round calculated score.
Returns:	Phred score.
Return type:	int
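
A minimal sketch of the two phred conversions, p = 10^(-Q/10) and Q = -10 log10(p), with the max_q cap from the signature. Mapping a zero error probability to max_q is an assumption; this is not wub's own code.

```python
import math


def phred_to_prob(phred):
    """Error probability for a phred score: p = 10^(-Q/10)."""
    return 10 ** (-phred / 10.0)


def prob_to_phred(error_prob, max_q=93, qround=True):
    """Phred score for an error probability, capped at max_q."""
    if error_prob <= 0:
        return max_q  # assumption: a zero error probability maps to the cap
    q = min(-10.0 * math.log10(error_prob), max_q)
    return int(round(q)) if qround else q
```

The cap matters for very small probabilities: an error probability of 1e-12 corresponds to Q = 120, which is clipped to 93.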

wub.util.seq.quality_array_to_string(quality_list)

Convert list of phred quality values to string.

Parameters:	quality_list – List of phred quality scores.
Returns:	Quality string.
Return type:	str

wub.util.seq.quality_string_to_array(quality_string)

Convert quality string into a list of phred scores.

Parameters:	quality_string – Quality string.
Returns:	Array of scores.
Return type:	array
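
A minimal sketch of the two quality conversions, assuming the Sanger / Illumina 1.8+ FASTQ encoding where each character is chr(Q + 33). The offset is an assumption not stated on this page (it is consistent with the max_q=93 default of prob_to_phred, since 93 + 33 = 126 is "~"). The sketch returns a plain list rather than an array, and it is not wub's own code.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ FASTQ encoding


def quality_array_to_string(quality_list):
    """Encode phred scores as a FASTQ quality string."""
    return "".join(chr(q + PHRED_OFFSET) for q in quality_list)


def quality_string_to_array(quality_string):
    """Decode a FASTQ quality string into a list of phred scores."""
    return [ord(c) - PHRED_OFFSET for c in quality_string]
```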

wub.util.seq.read_alignment(input_file, format='fasta')

Load a multiple alignment from a file.

Parameters:	input_file – Input file name.
	format – Input format (fasta by default).
Returns:	The alignment read from the input file.
Return type:	MultipleSeqAlignment

wub.util.seq.read_seq_records(input_object, format='fasta')

Read SeqRecord objects from a file in the specified format.

Parameters:	input_object – A file object or a file name.
	format – Input format (fasta by default).
Returns:	An iterator of SeqRecord objects.
Return type:	generator

wub.util.seq.read_seq_records_dict(input_object, format='fasta')

Read SeqRecord objects from a file in the specified format into a dictionary.

Parameters:	input_object – A file object or a file name.
	format – Input format (fasta by default).
Returns:	A dictionary with the parsed SeqRecord objects.
Return type:	dict

wub.util.seq.record_lengths(input_iter)

Return the lengths of the SeqRecord objects in the input iterator.

Parameters:	input_iter – An iterator of SeqRecord objects.
Returns:	An ordered dictionary with the lengths of the SeqRecord objects.
Return type:	OrderedDict

wub.util.seq.reverse_complement(seq)

Return reverse complement of a string (base) sequence.

Parameters:	seq – Input sequence.
Returns:	Reverse complement of input sequence.
Return type:	str
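
A minimal sketch of the reverse complement using str.translate, assuming the same substitution table as base_complement (A<=>T, C<=>G, X=>X, both cases, other characters unchanged); it is not wub's own code.

```python
# Translation table mirroring base_complement: ACGTX -> TGCAX in both cases
_COMPLEMENT_TABLE = str.maketrans("ACGTXacgtx", "TGCAXtgcax")


def reverse_complement(seq):
    """Return the reverse complement of `seq`; unknown characters pass through unchanged."""
    # translate() complements each base, [::-1] reverses the result
    return seq.translate(_COMPLEMENT_TABLE)[::-1]
```

A translation table does the per-base substitution in a single pass, which avoids building an intermediate list of complemented characters.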

wub.util.seq.rna_record_to_dna(record)

Convert an RNA SeqRecord into a DNA SeqRecord.

Parameters:	record – RNA SeqRecord.
Returns:	The DNA SeqRecord object.
Return type:	SeqRecord

wub.util.seq.word_composition(seq, size)

Return word counts of a nucleotide sequence.

Parameters:	seq – Input sequence.
	size – Word length.
Returns:	Word counts.
Return type:	OrderedDict
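
A minimal sketch of word (k-mer) counting. It assumes overlapping words, read with a window that slides by one base, and keys ordered by first occurrence; the page only says the result is an OrderedDict, and this is not wub's own code.

```python
from collections import OrderedDict


def word_composition(seq, size):
    """Count overlapping words (k-mers) of length `size` in `seq`."""
    counts = OrderedDict()
    # slide a window of width `size` one position at a time
    for i in range(len(seq) - size + 1):
        word = seq[i:i + size]
        counts[word] = counts.get(word, 0) + 1
    return counts
```

A sequence of length n contains n - size + 1 overlapping words, so a sequence shorter than size yields an empty result.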

wub.util.seq.write_seq_records(records_iterator, output_object, format='fasta')

Write SeqRecord objects from an iterator to a file in the specified format.

Parameters:	records_iterator – An iterator of SeqRecord objects.
	output_object – Open file object or file name.
	format – Output format (fasta by default).
Returns:	None
Return type:	None