wub.simulate package

Submodules

wub.simulate.dist module

wub.simulate.dist.sample_truncated_gamma(mean, shape, low=None, high=None)

A naive rejection approach to sampling from a truncated gamma distribution. Note that the truncation points are included in the sample.

Parameters:	mean – Mean of the distribution.
	shape – Shape parameter.
	low – Lower truncation point.
	high – Upper truncation point.
Returns:	Random sample from the specified distribution.
Return type:	float
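
The rejection approach described above can be sketched with the standard library alone. This is an illustration, not the wub implementation: the name `truncated_gamma_sketch` is hypothetical, and the scale is assumed to be `mean / shape`, so that the untruncated distribution has the requested mean.

```python
import random


def truncated_gamma_sketch(mean, shape, low=None, high=None):
    """Draw gamma variates until one falls inside [low, high]."""
    scale = mean / shape  # assumption: gamma mean = shape * scale
    while True:
        value = random.gammavariate(shape, scale)
        if low is not None and value < low:
            continue  # below the lower truncation point: redraw
        if high is not None and value > high:
            continue  # above the upper truncation point: redraw
        return value


random.seed(42)
samples = [truncated_gamma_sketch(100.0, 2.0, low=50, high=200)
           for _ in range(1000)]
print(min(samples), max(samples))
```

After truncation the sample mean generally differs from `mean`, and the loop can take many draws when little probability mass lies between the truncation points.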

wub.simulate.genome module

class wub.simulate.genome.Fragment(chrom, uid, start, end, seq)

Bases: tuple

Create new instance of Fragment(chrom, uid, start, end, seq)

chrom: Alias for field number 0

end: Alias for field number 3

seq: Alias for field number 4

start: Alias for field number 2

uid: Alias for field number 1

wub.simulate.genome.sample_chromosome(chromosomes)

Sample a random chromosome.

Parameters:	chromosomes – A collection of SeqRecord objects.
Returns:	A randomly sampled element from the input collection.
Return type:	SeqRecord

wub.simulate.genome.simulate_fragment(chromosome, mean_length, gamma_shape, low_truncation, high_truncation, fragment_number)

Simulate a fragment from a chromosome.

Parameters:	chromosome – Chromosome to simulate fragment from, SeqRecord object.
	mean_length – Mean length of simulated fragment.
	gamma_shape – Shape parameter of length distribution.
	low_truncation – Minimum read length.
	high_truncation – Maximum read length.
	fragment_number – The unique identifier of fragment in simulation (number of fragment).
Returns:	A named tuple with chromosome id, fragment number, start, end and sequence.
Return type:	namedtuple
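
As a rough sketch of how a single fragment might be cut from a chromosome, the example below works on a plain string instead of a SeqRecord. The function name, the 0-based half-open coordinates, and the capping of the fragment length at the chromosome length are assumptions made for illustration, not documented wub behaviour.

```python
import random
from collections import namedtuple

Fragment = namedtuple("Fragment", "chrom uid start end seq")


def simulate_fragment_sketch(chrom_id, chrom_seq, mean_length, gamma_shape,
                             low_truncation, high_truncation, fragment_number):
    """Cut one fragment with a truncated-gamma length from a string."""
    # A fragment cannot be longer than the chromosome it comes from.
    upper = min(high_truncation, len(chrom_seq))
    if low_truncation > upper:
        raise ValueError("no valid fragment length for this chromosome")
    while True:
        length = int(random.gammavariate(gamma_shape,
                                         mean_length / gamma_shape))
        if low_truncation <= length <= upper:
            break
    # Uniform start position such that the fragment fits (end is exclusive).
    start = random.randint(0, len(chrom_seq) - length)
    end = start + length
    return Fragment(chrom_id, fragment_number, start, end,
                    chrom_seq[start:end])


random.seed(1)
chromosome = "ACGT" * 250
frag = simulate_fragment_sketch("chr1", chromosome, 100, 2.0, 20, 500, 0)
print(frag.chrom, frag.uid, frag.start, frag.end, len(frag.seq))
```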

wub.simulate.genome.simulate_fragments(chromosomes, mean_length, gamma_shape, low_truncation, high_truncation, number_fragments)

Simulate fragments from a set of chromosomes. A chromosome is picked at random for each fragment.

Parameters:	chromosomes – Chromosomes to simulate fragments from, a list of SeqRecord objects.
	mean_length – Mean length of simulated fragments.
	gamma_shape – Shape parameter of length distribution.
	low_truncation – Minimum read length.
	high_truncation – Maximum read length.
	number_fragments – Number of fragments to simulate.
Returns:	An iterator of named tuples with chromosome id, fragment number, start, end and sequence.
Return type:	generator

wub.simulate.genome.simulate_genome(number_chromosomes, mean_length, gamma_shape, low_truncation, high_truncation, base_frequencies)

Generator function for simulating chromosomes in a genome. Chromosome lengths are sampled from a truncated gamma distribution.

Parameters:	number_chromosomes – Number of simulated chromosomes.
	mean_length – Mean length of simulated chromosomes.
	gamma_shape – Shape parameter of the chromosome length distribution.
	low_truncation – Minimum chromosome length.
	high_truncation – Maximum chromosome length.
	base_frequencies – Array of base frequencies in the ACGT order.
Returns:	A generator of SeqRecord objects.
Return type:	generator
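
The generator pattern described above might look roughly like the following. This sketch yields plain `(name, sequence)` pairs rather than SeqRecord objects so that it needs only the standard library; the function name and the `chr0`, `chr1`, ... naming scheme are hypothetical.

```python
import random


def simulate_genome_sketch(number_chromosomes, mean_length, gamma_shape,
                           low_truncation, high_truncation,
                           base_frequencies=(0.25, 0.25, 0.25, 0.25)):
    """Yield (name, sequence) pairs with truncated-gamma lengths."""
    for i in range(number_chromosomes):
        # Redraw the length until it falls inside the truncation window.
        while True:
            length = int(random.gammavariate(gamma_shape,
                                             mean_length / gamma_shape))
            if low_truncation <= length <= high_truncation:
                break
        # Draw each base independently with the given ACGT frequencies.
        seq = "".join(random.choices("ACGT", weights=base_frequencies,
                                     k=length))
        yield "chr{}".format(i), seq


random.seed(3)
genome = list(simulate_genome_sketch(3, 1000, 2.0, 100, 5000))
for name, seq in genome:
    print(name, len(seq))
```

Because the function is a generator, chromosomes are produced lazily, one per iteration, rather than all being held in memory at once.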

wub.simulate.seq module

class wub.simulate.seq.MutatedSeq(seq, real_qual, real_subst, real_del, real_ins, cigar)

Bases: tuple

Create new instance of MutatedSeq(seq, real_qual, real_subst, real_del, real_ins, cigar)

cigar: Alias for field number 5

real_del: Alias for field number 3

real_ins: Alias for field number 4

real_qual: Alias for field number 1

real_subst: Alias for field number 2

seq: Alias for field number 0

wub.simulate.seq.add_errors(seq, nr_errors, error_type)

Introduce a specified number of errors in the target sequence at random positions.

Parameters:	seq – Input DNA sequence.
	nr_errors – Number of errors to introduce.
	error_type – Type of error to introduce.
Returns:	Mutated sequence.
Return type:	str

wub.simulate.seq.cigar_list_to_string(cigar_list)

Convert a list of CIGAR operations to a CIGAR string.

Parameters:	cigar_list – A list of CIGAR operations.
Returns:	CIGAR string.
Return type:	str

wub.simulate.seq.compress_raw_cigar_list(raw_cigar)

Compress a raw list of CIGAR operations.

Parameters:	raw_cigar – A raw list of CIGAR operations.
Returns:	Compressed CIGAR list.
Return type:	list
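
Judging by their names, cigar_list_to_string and compress_raw_cigar_list handle CIGAR strings. The sketch below shows one way a per-base list of operations could be run-length compressed and then rendered as a string. The input and output formats, and both function names, are assumptions for illustration and are not taken from wub.

```python
from itertools import groupby


def compress_cigar_sketch(raw_ops):
    """Run-length encode per-base operations into (count, op) pairs."""
    return [(len(list(group)), op) for op, group in groupby(raw_ops)]


def cigar_to_string_sketch(cigar):
    """Join (count, op) pairs into a CIGAR string."""
    return "".join("{}{}".format(count, op) for count, op in cigar)


raw = ["M", "M", "M", "D", "M", "M", "I", "I"]
compressed = compress_cigar_sketch(raw)
print(compressed)
print(cigar_to_string_sketch(compressed))  # 3M1D2M2I
```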

wub.simulate.seq.random_base(probs=[0.25, 0.25, 0.25, 0.25])

Generate a random DNA base.

Parameters:	probs – Probabilities of sampling a base, in the ACGT order.
Returns:	A sampled base.
Return type:	str

wub.simulate.seq.random_base_except(excluded, probs=[0.25, 0.25, 0.25, 0.25])

Generate a random base according to the specified probabilities, excluding the specified base.

Parameters:	excluded – Exclude this base from sampling.
	probs – Base sampling probabilities in the ACGT order.
Returns:	A sampled base.
Return type:	str
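
One way to sample while excluding a base is to drop it from the candidates and let the remaining weights be renormalised. Whether wub renormalises like this or redraws until a different base appears is not stated here, so the sketch below (with the hypothetical name `random_base_except_sketch`) is only one possible reading.

```python
import random

BASES = "ACGT"


def random_base_except_sketch(excluded, probs=(0.25, 0.25, 0.25, 0.25)):
    """Sample a base other than `excluded`, keeping relative weights."""
    candidates = [(b, p) for b, p in zip(BASES, probs) if b != excluded]
    bases, weights = zip(*candidates)
    # random.choices normalises the weights, so they need not sum to 1.
    return random.choices(bases, weights=weights, k=1)[0]


random.seed(5)
draws = [random_base_except_sketch("A") for _ in range(200)]
print(sorted(set(draws)))
```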

wub.simulate.seq.sample_direction(forward_prob)

wub.simulate.seq.sample_error_type(error_weights)

Sample error type from error weights dictionary.

Parameters:	error_weights – A dictionary with (type, probability) pairs.
Returns:	Error type
Return type:	str
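
Sampling a key from a dictionary of weights can be sketched with `random.choices`. The function name below is hypothetical, and the example weights are made up for illustration.

```python
import random


def sample_error_type_sketch(error_weights):
    """Pick one key with probability proportional to its weight."""
    types = list(error_weights)
    weights = [error_weights[t] for t in types]
    return random.choices(types, weights=weights, k=1)[0]


example_weights = {"substitution": 0.5, "deletion": 0.3, "insertion": 0.2}
random.seed(0)
counts = {t: 0 for t in example_weights}
for _ in range(10000):
    counts[sample_error_type_sketch(example_weights)] += 1
print(counts)
```

Over many draws the counts should approach the ratio of the weights.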

wub.simulate.seq.simulate_sequence(length, probs=[0.25, 0.25, 0.25, 0.25])

Simulate a sequence of specified length and base composition.

Parameters:	length – Length of simulated sequence.
	probs – Base composition vector in the ACGT order.
Returns:	Simulated sequence.
Return type:	str

wub.simulate.seq.simulate_sequencing_errors(sequence, error_rate, error_weights)

Simulate substitutions, deletions and insertions.

Parameters:	sequence – Input sequence.
	error_rate – Total error rate.
	error_weights – A dictionary with error types as keys and probabilities as values. The possible error types are: substitution, deletion, insertion.
Returns:	A named tuple with elements: mutated sequence, realised quality, number of realised substitutions, number of realised deletions, number of realised insertions, cigar string.
Return type:	namedtuple
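
To make the interplay of error rate, error weights and CIGAR output concrete, here is a simplified sketch. It is not the wub algorithm: it assumes at most one error per input base, places an inserted base after the current base, omits the realised quality field, and uses the hypothetical names `MutatedSketch` and `simulate_errors_sketch`.

```python
import random
from collections import namedtuple
from itertools import groupby

# Hypothetical result type; unlike MutatedSeq it has no quality field.
MutatedSketch = namedtuple("MutatedSketch", "seq n_subst n_del n_ins cigar")


def simulate_errors_sketch(sequence, error_rate, error_weights):
    """Apply at most one error per base and record CIGAR operations."""
    types = list(error_weights)
    weights = [error_weights[t] for t in types]
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    out, ops = [], []
    for base in sequence:
        if random.random() >= error_rate:
            out.append(base)  # base copied unchanged
            ops.append("M")
            continue
        kind = random.choices(types, weights=weights, k=1)[0]
        counts[kind] += 1
        if kind == "substitution":
            out.append(random.choice([b for b in "ACGT" if b != base]))
            ops.append("M")  # a mismatch still occupies an aligned column
        elif kind == "deletion":
            ops.append("D")  # base dropped from the output
        else:
            out.extend([base, random.choice("ACGT")])  # extra base added
            ops.extend(["M", "I"])
    cigar = "".join("{}{}".format(len(list(g)), op)
                    for op, g in groupby(ops))
    return MutatedSketch("".join(out), counts["substitution"],
                         counts["deletion"], counts["insertion"], cigar)


random.seed(7)
example_weights = {"substitution": 0.4, "deletion": 0.3, "insertion": 0.3}
original = "ACGTACGTACGTACGTACGT"
result = simulate_errors_sketch(original, 0.2, example_weights)
print(result.seq, result.cigar)
```

In this sketch the output length always equals the input length minus the realised deletions plus the realised insertions, which is a convenient sanity check on the counts.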

Module contents¶