wub.simulate package¶
Submodules¶
wub.simulate.dist module¶
-
wub.simulate.dist.
sample_truncated_gamma
(mean, shape, low=None, high=None)[source]¶ A naive rejection approach to sampling from a truncated gamma distribution. Note that the truncation points are included in the sample.
Parameters: - mean – Mean of the distribution.
- shape – Shape parameter.
- low – Lower truncation point.
- high – Upper truncation point.
Returns: Random sample from the specified distribution.
Return type: float
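The rejection approach described above can be sketched with the standard library alone. This is an illustrative sketch, not the wub implementation: the name `sample_truncated_gamma_sketch` and the `rng` argument are hypothetical, and it assumes that `mean` refers to the mean of the untruncated gamma distribution, so that scale = mean / shape.

```python
import random


def sample_truncated_gamma_sketch(mean, shape, low=None, high=None, rng=random):
    """Draw gamma variates until one falls inside [low, high]."""
    # Assumption: `mean` is the mean of the untruncated gamma,
    # for which mean = shape * scale.
    scale = mean / shape
    while True:
        x = rng.gammavariate(shape, scale)
        if low is not None and x < low:
            continue  # below the lower truncation point: reject
        if high is not None and x > high:
            continue  # above the upper truncation point: reject
        return x


rng = random.Random(42)
samples = [
    sample_truncated_gamma_sketch(5000, 2.0, low=1000, high=20000, rng=rng)
    for _ in range(1000)
]
```

Rejection sampling is simple but can be slow when the interval between `low` and `high` covers little of the distribution's probability mass, since most draws are then discarded.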
wub.simulate.genome module¶
-
class
wub.simulate.genome.
Fragment
(chrom, uid, start, end, seq)¶ Bases:
tuple
Create new instance of Fragment(chrom, uid, start, end, seq)
-
chrom
¶ Alias for field number 0
-
end
¶ Alias for field number 3
-
seq
¶ Alias for field number 4
-
start
¶ Alias for field number 2
-
uid
¶ Alias for field number 1
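The field numbers listed above are the tuple positions of each attribute. A minimal sketch of an equivalent named tuple, built here with `collections.namedtuple` purely for illustration (the values are made up):

```python
from collections import namedtuple

# Same field order as documented: chrom=0, uid=1, start=2, end=3, seq=4.
Fragment = namedtuple("Fragment", ["chrom", "uid", "start", "end", "seq"])

frag = Fragment(chrom="chr1", uid=0, start=10, end=14, seq="ACGT")

# Fields can be read by name or by position.
by_name = frag.end
by_index = frag[3]
```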
-
-
wub.simulate.genome.
sample_chromosome
(chromosomes)[source]¶ Sample a random chromosome.
Parameters: chromosomes – A collection of SeqRecord objects.
Returns: A randomly sampled element from the input collection.
Return type: SeqRecord
-
wub.simulate.genome.
simulate_fragment
(chromosome, mean_length, gamma_shape, low_truncation, high_truncation, fragment_number)[source]¶ Simulate a fragment from a chromosome.
Parameters: - chromosome – Chromosome to simulate fragment from, SeqRecord object.
- mean_length – Mean length of simulated fragment.
- gamma_shape – Shape parameter of length distribution.
- low_truncation – Minimum read length.
- high_truncation – Maximum read length.
- fragment_number – The unique identifier of fragment in simulation (number of fragment).
Returns: A named tuple with chromosome id, fragment number, start, end and sequence.
Return type: namedtuple
-
wub.simulate.genome.
simulate_fragments
(chromosomes, mean_length, gamma_shape, low_truncation, high_truncation, number_fragments)[source]¶ Simulate fragments from a set of chromosomes. Chromosomes are picked randomly for each fragment.
Parameters: - chromosomes – Chromosomes to simulate fragment from, a list of SeqRecord objects.
- mean_length – Mean length of simulated fragments.
- gamma_shape – Shape parameter of length distribution.
- low_truncation – Minimum read length.
- high_truncation – Maximum read length.
- number_fragments – Number of fragments to simulate.
Returns: An iterator of named tuples with chromosome id, fragment number, start, end and sequence.
Return type: generator
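The fragment simulation described above can be sketched as a generator. This is a rough illustration under several assumptions, not the wub implementation: chromosomes are plain strings in a dict rather than SeqRecord objects, the start position is drawn uniformly, and fragments longer than the chromosome are clamped to its length. All function names are hypothetical.

```python
import random
from collections import namedtuple

Fragment = namedtuple("Fragment", ["chrom", "uid", "start", "end", "seq"])


def truncated_gamma(mean, shape, low, high, rng):
    """Rejection-sample a gamma variate restricted to [low, high]."""
    scale = mean / shape  # assumes mean = shape * scale (untruncated)
    while True:
        x = rng.gammavariate(shape, scale)
        if low <= x <= high:
            return x


def simulate_fragments_sketch(chromosomes, mean_length, gamma_shape,
                              low_truncation, high_truncation,
                              number_fragments, rng):
    """Yield Fragment tuples, picking a random chromosome for each one."""
    names = sorted(chromosomes)
    for uid in range(number_fragments):
        chrom = rng.choice(names)
        seq = chromosomes[chrom]
        length = int(truncated_gamma(mean_length, gamma_shape,
                                     low_truncation, high_truncation, rng))
        length = min(length, len(seq))  # assumption: clamp to chromosome
        start = rng.randint(0, len(seq) - length)  # assumption: uniform start
        end = start + length
        yield Fragment(chrom, uid, start, end, seq[start:end])


rng = random.Random(1)
chroms = {"chr1": "ACGT" * 50, "chr2": "TTGCA" * 30}
frags = list(simulate_fragments_sketch(chroms, 40, 2.0, 10, 100, 5, rng))
```

Because the function is a generator, fragments are produced lazily; wrapping the call in `list` as above materialises them all at once.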
-
wub.simulate.genome.
simulate_genome
(number_chromosomes, mean_length, gamma_shape, low_truncation, high_truncation, base_frequencies)[source]¶ Generator function for simulating chromosomes in a genome. Chromosome lengths are sampled from a truncated gamma distribution.
Parameters: - number_chromosomes – Number of simulated chromosomes.
- mean_length – Mean length of simulated chromosomes.
- gamma_shape – Shape parameter of the chromosome length distribution.
- low_truncation – Minimum chromosome length.
- high_truncation – Maximum chromosome length.
- base_frequencies – Array of base frequencies in the ACGT order.
Returns: A generator of SeqRecord objects.
Return type: generator
wub.simulate.seq module¶
-
class
wub.simulate.seq.
MutatedSeq
(seq, real_qual, real_subst, real_del, real_ins, cigar)¶ Bases:
tuple
Create new instance of MutatedSeq(seq, real_qual, real_subst, real_del, real_ins, cigar)
-
cigar
¶ Alias for field number 5
-
real_del
¶ Alias for field number 3
-
real_ins
¶ Alias for field number 4
-
real_qual
¶ Alias for field number 1
-
real_subst
¶ Alias for field number 2
-
seq
¶ Alias for field number 0
-
-
wub.simulate.seq.
add_errors
(seq, nr_errors, error_type)[source]¶ Introduce a specified number of errors in the target sequence at random positions.
Parameters: - seq – Input DNA sequence.
- nr_errors – Number of errors to introduce.
- error_type – Type of errors to introduce.
Returns: Mutated sequence.
Return type: str
-
wub.simulate.seq.
cigar_list_to_string
(cigar_list)[source]¶ Convert a CIGAR list to a CIGAR string.
Parameters: cigar_list – A list of CIGAR operations.
Returns: CIGAR string.
Return type: str
-
wub.simulate.seq.
compress_raw_cigar_list
(raw_cigar)[source]¶ Compress a raw CIGAR list.
Parameters: raw_cigar – A raw list of CIGAR operations.
Returns: Compressed CIGAR list.
Return type: list
-
wub.simulate.seq.
random_base
(probs=[0.25, 0.25, 0.25, 0.25])[source]¶ Generate a random DNA base.
Parameters: probs – Probabilities of sampling a base, in the ACGT order.
Returns: A sampled base.
Return type: str
-
wub.simulate.seq.
random_base_except
(excluded, probs=[0.25, 0.25, 0.25, 0.25])[source]¶ Generate a random base according to the specified probabilities with the exclusion of the specified base.
Parameters: - excluded – Exclude this base from sampling.
- probs – Base sampling probabilities in the ACGT order.
Returns: A sampled base.
Return type: str
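One way to sample a base while excluding another is to drop the excluded base and let the remaining probabilities act as relative weights. The sketch below illustrates that idea with the standard library; the function name and the `rng` argument are hypothetical, and this is not necessarily how wub implements it.

```python
import random

BASES = "ACGT"


def random_base_except_sketch(excluded, probs=(0.25, 0.25, 0.25, 0.25), rng=random):
    """Sample a base other than `excluded`, weighted by `probs` (ACGT order)."""
    kept = [(b, p) for b, p in zip(BASES, probs) if b != excluded]
    bases = [b for b, _ in kept]
    weights = [p for _, p in kept]
    # random.choices treats weights as relative, so no renormalisation needed.
    return rng.choices(bases, weights=weights, k=1)[0]


rng = random.Random(7)
draws = [random_base_except_sketch("A", rng=rng) for _ in range(200)]
```

The sketch uses a tuple as the default for `probs` rather than a list, which avoids sharing a mutable default between calls.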
-
wub.simulate.seq.
sample_error_type
(error_weights)[source]¶ Sample error type from error weights dictionary.
Parameters: error_weights – A dictionary with (type, probability) pairs.
Returns: Error type
Return type: str
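Sampling an error type from such a dictionary amounts to a weighted choice over its keys. A minimal sketch using `random.choices`, with a hypothetical function name and made-up example weights:

```python
import random


def sample_error_type_sketch(error_weights, rng=random):
    """Pick one key of `error_weights` with probability given by its value."""
    types = list(error_weights)
    weights = [error_weights[t] for t in types]
    return rng.choices(types, weights=weights, k=1)[0]


rng = random.Random(3)
weights = {"substitution": 0.5, "deletion": 0.3, "insertion": 0.2}
picked = [sample_error_type_sketch(weights, rng) for _ in range(100)]
```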
-
wub.simulate.seq.
simulate_sequence
(length, probs=[0.25, 0.25, 0.25, 0.25])[source]¶ Simulate sequence of specified length and base composition.
Parameters: - length – Length of simulated sequence.
- probs – Base composition vector in the ACGT order.
Returns: Simulated sequence.
Return type: str
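Simulating a sequence with a given base composition can be sketched as `length` independent weighted draws from the four bases. The function name and the `rng` argument below are hypothetical, and the sketch is only an illustration of the idea:

```python
import random


def simulate_sequence_sketch(length, probs=(0.25, 0.25, 0.25, 0.25), rng=random):
    """Return a random DNA string of `length` bases; `probs` is in ACGT order."""
    return "".join(rng.choices("ACGT", weights=probs, k=length))


rng = random.Random(11)
seq = simulate_sequence_sketch(50, rng=rng)
# A skewed composition: only A and T, in equal proportion.
at_only = simulate_sequence_sketch(50, probs=(0.5, 0.0, 0.0, 0.5), rng=rng)
```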
-
wub.simulate.seq.
simulate_sequencing_errors
(sequence, error_rate, error_weights)[source]¶ Simulate substitutions, deletions and insertions.
Parameters: - sequence – Input sequence.
- error_rate – Total error rate.
- error_weights – A dictionary with error types as keys and probabilities as values. The possible error types are: substitution, deletion, insertion.
Returns: A named tuple with elements: mutated sequence, realised quality, number of realised substitutions, number of realised deletions, number of realised insertions, cigar string.
Return type: namedtuple