wub.util package¶
Submodules¶
wub.util.cmd module¶
Utilities related to running external commands.
wub.util.misc module¶
Utility functions that are not yet categorised.
-
wub.util.misc.
get_extension
(fname)[source]¶ Get the file extension.
Parameters: fname – file name Returns: file extension Return type: str format ‘.*’
-
wub.util.misc.
get_fname
(fname)[source]¶ Get the file name without extension.
Parameters: fname – file name Returns: file name Return type: str
-
wub.util.misc.
mkdir
(path)[source]¶ Create the directory if it does not exist.
Parameters: path – dir path Returns: path Return type: str
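The three helpers above can be approximated with the standard library. The following is a minimal sketch, not the wub implementation: the `*_sketch` names are hypothetical, and it assumes that get_fname also drops directory components, which the description does not state.

```python
import os


def get_extension_sketch(fname):
    """Return the extension of fname, including the leading dot."""
    # os.path.splitext returns (root, ext); ext is '' when there is no extension.
    return os.path.splitext(fname)[1]


def get_fname_sketch(fname):
    """Return the file name without its extension."""
    # Assumption: directory components are removed as well as the extension.
    return os.path.splitext(os.path.basename(fname))[0]


def mkdir_sketch(path):
    """Create the directory if it does not exist and return the path."""
    os.makedirs(path, exist_ok=True)
    return path
```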
wub.util.parse module¶
-
wub.util.parse.
args_string_to_dict
(args_string, elements_separator=', ', keyvalue_separator=':')[source]¶ Convert a two-level separated list into a dictionary.
Parameters: - args_string – Two-level separated string.
- elements_separator – Separator between elements.
- keyvalue_separator – Separator between key/value pairs.
Returns: dict
Return type: dict
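As an illustration of the two-level parsing described above, here is a minimal sketch. The function name is hypothetical, and the sketch assumes that whitespace around keys and values is ignored and that values are kept as strings; the actual wub function may behave differently.

```python
def args_string_to_dict_sketch(args_string, elements_separator=",",
                               keyvalue_separator=":"):
    """Parse a string such as 'a:1,b:2' into {'a': '1', 'b': '2'}."""
    result = {}
    if not args_string:
        return result
    # First level: split the string into elements.
    for element in args_string.split(elements_separator):
        # Second level: split each element into key and value (first separator only).
        key, value = element.split(keyvalue_separator, 1)
        # Assumption: surrounding whitespace is not significant.
        result[key.strip()] = value.strip()
    return result
```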
-
wub.util.parse.
interval_string_to_tuples
(interval_string, elements_separator='|', interval_separator=', ')[source]¶ Convert a two-level separated list into a tuple of interval tuples.
Parameters: - interval_string – Two-level separated string.
- elements_separator – Separator between elements.
- interval_separator – Separator between interval boundaries.
Returns: tuple
Return type: tuple
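A minimal sketch of the interval parsing described above follows. The function name is hypothetical, and the sketch assumes that interval boundaries are integers, which the description does not confirm.

```python
def interval_string_to_tuples_sketch(interval_string, elements_separator="|",
                                     interval_separator=","):
    """Parse a string such as '1,10|20,30' into ((1, 10), (20, 30))."""
    intervals = []
    # First level: split into individual intervals.
    for element in interval_string.split(elements_separator):
        # Second level: split each interval into its two boundaries.
        start, end = element.split(interval_separator)
        # Assumption: boundaries are integers; int() tolerates surrounding spaces.
        intervals.append((int(start), int(end)))
    return tuple(intervals)
```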
wub.util.seq module¶
-
wub.util.seq.
alignment_stats
(ref, query, gap_character='-')[source]¶ Calculate statistics from two aligned sequences.
Parameters: - ref – Reference sequence.
- query – Query sequence.
- gap_character – Gap symbol.
Returns: AlnStats namedtuple.
Return type: namedtuple
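The fields of the AlnStats namedtuple are not listed here, so the following sketch uses a hypothetical `AlnStatsSketch` with assumed fields (matches, mismatches, insertions, deletions, identity). It only illustrates how statistics might be derived from two gapped sequences of equal length.

```python
from collections import namedtuple

# Hypothetical result type; the real AlnStats fields may differ.
AlnStatsSketch = namedtuple(
    "AlnStatsSketch", "matches mismatches insertions deletions identity")


def alignment_stats_sketch(ref, query, gap_character="-"):
    """Count column types in a pairwise alignment given as two gapped strings."""
    if len(ref) != len(query):
        raise ValueError("Aligned sequences must have the same length.")
    matches = mismatches = insertions = deletions = 0
    for r, q in zip(ref, query):
        if r == gap_character and q == gap_character:
            continue  # Column with gaps in both sequences carries no information.
        elif r == gap_character:
            insertions += 1  # Base present in the query only.
        elif q == gap_character:
            deletions += 1  # Base present in the reference only.
        elif r == q:
            matches += 1
        else:
            mismatches += 1
    aligned = matches + mismatches + insertions + deletions
    # Assumption: identity is the fraction of matching columns.
    identity = matches / aligned if aligned else 0.0
    return AlnStatsSketch(matches, mismatches, insertions, deletions, identity)
```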
-
wub.util.seq.
base_complement
(k)[source]¶ Return complement of base.
Performs the substitutions: A<=>T, C<=>G, X=>X for both upper and lower case. The return value is identical to the argument for all other values.
Parameters: k – A base. Returns: Complement of base. Return type: str
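The substitution rule above maps naturally onto a dictionary lookup with the argument as the fallback value. This is a minimal sketch with a hypothetical function name, not the wub source.

```python
# Complement pairs for both cases; anything else (including X) maps to itself.
_COMPLEMENTS = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "a": "t", "t": "a", "c": "g", "g": "c",
}


def base_complement_sketch(k):
    """Return the complement of base k, or k itself if it has no complement."""
    return _COMPLEMENTS.get(k, k)
```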
-
wub.util.seq.
base_composition
(seq)[source]¶ Return letter counts of a string (base) sequence.
Parameters: seq – Input sequence. Returns: Letter counts. Return type: dict
-
wub.util.seq.
count_records
(input_object, format='fasta')[source]¶ Count SeqRecord objects from a file in the specified format.
Parameters: - input_object – A file object or a file name.
- format – Input format (fasta by default).
Returns: Number of records in input file.
Return type: int
-
wub.util.seq.
dna_record_to_rna
(record)[source]¶ Convert a DNA SeqRecord into RNA SeqRecord.
Parameters: record – DNA SeqRecord. Returns: The RNA SeqRecord object. Return type: SeqRecord
-
wub.util.seq.
gc_content
(seq)[source]¶ Return fraction of GC bases in sequence.
Parameters: seq – Input sequence. Returns: GC content. Return type: float
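Letter counts and GC fraction can both be computed with `collections.Counter`. The sketch below uses hypothetical function names and assumes that GC counting is case-insensitive and that an empty sequence yields 0.0; neither point is specified in the descriptions above.

```python
from collections import Counter


def base_composition_sketch(seq):
    """Return a dict mapping each letter in seq to its count."""
    return dict(Counter(seq))


def gc_content_sketch(seq):
    """Return the fraction of G and C bases in seq."""
    if not seq:
        return 0.0  # Assumption: avoid division by zero for empty input.
    counts = Counter(seq.upper())  # Assumption: case-insensitive counting.
    return (counts["G"] + counts["C"]) / len(seq)
```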
-
wub.util.seq.
mean_qscore
(scores, qround=True)[source]¶ Returns the phred score corresponding to the mean of the probabilities associated with the phred scores provided.
Parameters: - scores – Iterable of phred scores.
- qround – Round after calculating mean score.
Returns: Phred score corresponding to the average error rate, as estimated from the input phred scores.
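The key point of the description is that the mean is taken over error probabilities, not over the phred scores themselves. A minimal sketch under that reading, with a hypothetical function name and the standard relation p = 10^(-Q/10):

```python
import math


def mean_qscore_sketch(scores, qround=True):
    """Return the phred score of the mean error probability of the input scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("At least one score is required.")
    # Convert each phred score to an error probability.
    probs = [10 ** (-q / 10) for q in scores]
    mean_prob = sum(probs) / len(probs)
    # Convert the mean probability back to a phred score.
    mean_q = -10 * math.log10(mean_prob)
    return int(round(mean_q)) if qround else mean_q
```

For the scores 10 and 20 this gives roughly 12.6 (13 after rounding) rather than the arithmetic mean of 15, because the lower-quality base dominates the average error rate.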
-
wub.util.seq.
mock_qualities
(record, mock_qual)[source]¶ Add mock quality values to SeqRecord object.
Parameters: - record – A SeqRecord object.
- mock_qual – Mock quality value used for each base.
Returns: The record augmented with mock quality values.
Return type: object
-
wub.util.seq.
new_dna_record
(sequence, name, qualities=None)[source]¶ Create a new SeqRecord object using IUPACUnambiguousDNA and the specified sequence.
Parameters: - sequence – The sequence.
- name – Record identifier.
- qualities – List of base qualities.
Returns: The SeqRecord object.
Return type: SeqRecord
-
wub.util.seq.
phred_to_prob
(phred)[source]¶ Convert phred score into error probability.
Parameters: phred – Phred quality score. Returns: Error probability. Return type: float
-
wub.util.seq.
prob_to_phred
(error_prob, max_q=93, qround=True)[source]¶ Convert error probability into phred score.
Parameters: - error_prob – Base error probability.
- max_q – Maximum quality value.
- qround – Round calculated score.
Returns: Phred score.
Return type: int
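The two conversions above are inverses of each other under the standard phred definition Q = -10 log10(p). The sketch below uses hypothetical function names and assumes that a probability of zero, or any score above max_q, is reported as max_q.

```python
import math


def phred_to_prob_sketch(phred):
    """Convert a phred quality score into an error probability."""
    return 10 ** (-phred / 10)


def prob_to_phred_sketch(error_prob, max_q=93, qround=True):
    """Convert an error probability into a phred score capped at max_q."""
    if error_prob <= 0:
        return max_q  # Assumption: log10(0) is undefined, so report the cap.
    q = min(-10 * math.log10(error_prob), max_q)
    return int(round(q)) if qround else q
```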
-
wub.util.seq.
quality_array_to_string
(quality_list)[source]¶ Convert list of phred quality values to string.
Parameters: quality_list – List of phred quality scores. Returns: Quality string. Return type: str
-
wub.util.seq.
quality_string_to_array
(quality_string)[source]¶ Convert quality string into a list of phred scores.
Parameters: quality_string – Quality string. Returns: Array of scores. Return type: array
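Quality strings are commonly encoded with an ASCII offset of 33 (the Sanger/Phred+33 convention). The following sketch assumes that offset and uses hypothetical function names; the encoding used by wub is not stated above.

```python
def quality_string_to_array_sketch(quality_string, offset=33):
    """Decode a quality string into a list of phred scores."""
    # Assumption: Phred+33 encoding, so '!' decodes to 0.
    return [ord(ch) - offset for ch in quality_string]


def quality_array_to_string_sketch(quality_list, offset=33):
    """Encode a list of phred scores as a quality string."""
    return "".join(chr(q + offset) for q in quality_list)
```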
-
wub.util.seq.
read_alignment
(input_file, format='fasta')[source]¶ Load multiple alignment from file.
Parameters: - input_file – Input file name.
- format – Input format (fasta by default).
Returns: The alignment read from the input file.
Return type: MultipleSeqAlignment
-
wub.util.seq.
read_seq_records
(input_object, format='fasta')[source]¶ Read SeqRecord objects from a file in the specified format.
Parameters: - input_object – A file object or a file name.
- format – Input format (fasta by default).
Returns: A generator of the parsed SeqRecord objects.
Return type: generator
-
wub.util.seq.
read_seq_records_dict
(input_object, format='fasta')[source]¶ Read SeqRecord objects to a dictionary from a file in the specified format.
Parameters: - input_object – A file object or a file name.
- format – Input format (fasta by default).
Returns: A dictionary with the parsed SeqRecord objects.
Return type: dict
-
wub.util.seq.
record_lengths
(input_iter)[source]¶ Return lengths of SeqRecord objects in the input iterator.
Parameters: input_iter – An iterator of SeqRecord objects. Returns: An ordered dictionary with the lengths of the SeqRecord objects. Return type: OrderedDict
-
wub.util.seq.
reverse_complement
(seq)[source]¶ Return reverse complement of a string (base) sequence.
Parameters: seq – Input sequence. Returns: Reverse complement of input sequence. Return type: str
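A reverse complement of a plain string can be sketched with `str.translate` followed by a reversed slice. The function name is hypothetical, and characters other than A, C, G and T (in either case) are left unchanged, mirroring the rule given for base_complement.

```python
# Translation table for complementing bases in both cases.
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_sketch(seq):
    """Return the reverse complement of a base sequence given as a string."""
    # Complement every base, then reverse the result.
    return seq.translate(_COMPLEMENT_TABLE)[::-1]
```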
-
wub.util.seq.
rna_record_to_dna
(record)[source]¶ Convert an RNA SeqRecord into DNA SeqRecord.
Parameters: record – RNA SeqRecord. Returns: The DNA SeqRecord object. Return type: SeqRecord
-
wub.util.seq.
word_composition
(seq, size)[source]¶ Return word counts of a nucleotide sequence.
Parameters: - seq – Input sequence.
- size – Word length.
Returns: Word counts.
Return type: OrderedDict
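Word (k-mer) counting can be sketched with a sliding window over the sequence. The function name is hypothetical, and the sketch assumes that words overlap and that only words observed in the sequence are reported; the wub function may instead enumerate every possible word.

```python
from collections import OrderedDict


def word_composition_sketch(seq, size):
    """Count overlapping words of length size in seq, in order of first occurrence."""
    counts = OrderedDict()
    # Slide a window of width `size` along the sequence one base at a time.
    for i in range(len(seq) - size + 1):
        word = seq[i:i + size]
        counts[word] = counts.get(word, 0) + 1
    return counts
```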
-
wub.util.seq.
write_seq_records
(records_iterator, output_object, format='fasta')[source]¶ Write out SeqRecord objects to a file from an iterator in the specified format.
Parameters: - records_iterator – An iterator of SeqRecord objects.
- output_object – Open file object or file name.
- format – Output format (fasta by default).
Returns: None
Return type: object