wub.simulate package

Submodules

wub.simulate.dist module

wub.simulate.dist.sample_truncated_gamma(mean, shape, low=None, high=None)

A naive rejection approach to sampling from a truncated gamma distribution. Note that the truncation points are included in the sampled range.

Parameters:
  • mean – Mean of the distribution.
  • shape – Shape parameter.
  • low – Lower truncation point.
  • high – Upper truncation point.
Returns:

Random sample from the specified distribution.

Return type:

float
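The rejection approach described above can be sketched with the standard library alone. This is an illustration, not the wub implementation: the function name `sample_truncated_gamma_sketch` and the `rng` argument are assumptions made for the example. It relies on the fact that a gamma distribution with a given mean and shape has scale = mean / shape.

```python
import random


def sample_truncated_gamma_sketch(mean, shape, low=None, high=None, rng=random):
    """Draw gamma variates until one falls inside [low, high]."""
    # For a gamma distribution, mean = shape * scale.
    scale = mean / shape
    while True:
        value = rng.gammavariate(shape, scale)
        # Truncation points are inclusive; None means no bound on that side.
        if low is not None and value < low:
            continue
        if high is not None and value > high:
            continue
        return value


rng = random.Random(42)  # seeded so the example is repeatable
samples = [
    sample_truncated_gamma_sketch(1000, 2.0, low=500, high=2000, rng=rng)
    for _ in range(1000)
]
```

Rejection sampling is simple but can be slow when the interval between low and high covers little of the distribution's probability mass, because most draws are then discarded.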

wub.simulate.genome module

class wub.simulate.genome.Fragment(chrom, uid, start, end, seq)

Bases: tuple

Create new instance of Fragment(chrom, uid, start, end, seq)

chrom

Alias for field number 0

end

Alias for field number 3

seq

Alias for field number 4

start

Alias for field number 2

uid

Alias for field number 1
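The field numbering above follows the usual named tuple behaviour. The sketch below builds a stand-in with the same field order using collections.namedtuple; it does not import wub, and the field values are hypothetical.

```python
from collections import namedtuple

# Stand-in with the field order documented for wub.simulate.genome.Fragment.
Fragment = namedtuple("Fragment", ["chrom", "uid", "start", "end", "seq"])

# Hypothetical values, chosen only for illustration.
frag = Fragment(chrom="chr1", uid=0, start=10, end=14, seq="ACGT")

# Fields can be read by name, by position, or by unpacking.
by_name = frag.end
by_position = frag[3]
chrom, uid, start, end, seq = frag
```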

wub.simulate.genome.sample_chromosome(chromosomes)

Sample a random chromosome.

Parameters: chromosomes – A collection of SeqRecord objects.
Returns: A randomly sampled element from the input collection.
Return type: SeqRecord
wub.simulate.genome.simulate_fragment(chromosome, mean_length, gamma_shape, low_truncation, high_truncation, fragment_number)

Simulate a fragment from a chromosome.

Parameters:
  • chromosome – Chromosome to simulate the fragment from, a SeqRecord object.
  • mean_length – Mean length of simulated fragment.
  • gamma_shape – Shape parameter of length distribution.
  • low_truncation – Minimum read length.
  • high_truncation – Maximum read length.
  • fragment_number – Unique identifier of the fragment within the simulation (the fragment number).
Returns:

A named tuple with chromosome id, fragment number, start, end and sequence.

Return type:

namedtuple
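One plausible way to combine these parameters is sketched below: sample a length from the truncated gamma distribution, cap it at the chromosome length, pick a start position uniformly, and slice. This is an assumption about the procedure, not the wub source. The chromosome is a plain string rather than a SeqRecord, and the helper names and the `rng` argument are hypothetical.

```python
import random
from collections import namedtuple

Fragment = namedtuple("Fragment", ["chrom", "uid", "start", "end", "seq"])


def truncated_gamma(mean, shape, low, high, rng):
    """Rejection-sample a gamma variate with inclusive bounds [low, high]."""
    while True:
        value = rng.gammavariate(shape, mean / shape)
        if low <= value <= high:
            return value


def simulate_fragment_sketch(chrom_id, chrom_seq, mean_length, gamma_shape,
                             low_truncation, high_truncation, fragment_number, rng):
    """Cut one random fragment out of a chromosome given as a plain string."""
    length = int(truncated_gamma(mean_length, gamma_shape,
                                 low_truncation, high_truncation, rng))
    # Assumption: a fragment is never longer than its chromosome.
    length = min(length, len(chrom_seq))
    # Uniform start position chosen so that the fragment fits.
    start = rng.randint(0, len(chrom_seq) - length)
    end = start + length
    return Fragment(chrom_id, fragment_number, start, end, chrom_seq[start:end])


rng = random.Random(7)
chrom_seq = "".join(rng.choice("ACGT") for _ in range(500))
frag = simulate_fragment_sketch("chr1", chrom_seq, 100, 2.0, 20, 300, 0, rng)
```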

wub.simulate.genome.simulate_fragments(chromosomes, mean_length, gamma_shape, low_truncation, high_truncation, number_fragments)

Simulate fragments from a set of chromosomes. Chromosomes are picked randomly for each fragment.

Parameters:
  • chromosomes – Chromosomes to simulate fragments from, a list of SeqRecord objects.
  • mean_length – Mean length of simulated fragments.
  • gamma_shape – Shape parameter of length distribution.
  • low_truncation – Minimum read length.
  • high_truncation – Maximum read length.
  • number_fragments – Number of fragments to simulate.
Returns:

An iterator of named tuples with chromosome id, fragment number, start, end and sequence.

Return type:

generator

wub.simulate.genome.simulate_genome(number_chromosomes, mean_length, gamma_shape, low_truncation, high_truncation, base_frequencies)

Generator function for simulating chromosomes in a genome. Chromosome lengths are sampled from a truncated gamma distribution.

Parameters:
  • number_chromosomes – Number of simulated chromosomes.
  • mean_length – Mean length of simulated chromosomes.
  • gamma_shape – Shape parameter of the chromosome length distribution.
  • low_truncation – Minimum chromosome length.
  • high_truncation – Maximum chromosome length.
  • base_frequencies – Array of base frequencies in the ACGT order.
Returns:

A generator of SeqRecord objects.

Return type:

generator
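The generator behaviour can be illustrated with a self-contained sketch. It yields (name, sequence) pairs instead of SeqRecord objects so that it runs without Biopython. The function name, the `rng` argument and the `chr0`, `chr1`, ... naming scheme are assumptions made for the example.

```python
import random


def simulate_genome_sketch(number_chromosomes, mean_length, gamma_shape,
                           low_truncation, high_truncation, base_frequencies, rng):
    """Lazily yield (name, sequence) pairs, one per simulated chromosome."""
    for index in range(number_chromosomes):
        # Rejection-sample the chromosome length within the inclusive bounds.
        while True:
            length = rng.gammavariate(gamma_shape, mean_length / gamma_shape)
            if low_truncation <= length <= high_truncation:
                break
        # Base frequencies are given in the ACGT order.
        sequence = "".join(
            rng.choices("ACGT", weights=base_frequencies, k=int(length))
        )
        # Hypothetical naming scheme; the real record ids are not documented here.
        yield f"chr{index}", sequence


rng = random.Random(11)
genome = simulate_genome_sketch(3, 200, 2.0, 50, 1000, [0.1, 0.4, 0.4, 0.1], rng)
chromosomes = list(genome)  # nothing is simulated until the generator is consumed
```

Because a generator can be consumed only once, collect the result in a list when the chromosomes are needed more than once, for example when sampling many fragments from them.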

wub.simulate.seq module

class wub.simulate.seq.MutatedSeq(seq, real_qual, real_subst, real_del, real_ins, cigar)

Bases: tuple

Create new instance of MutatedSeq(seq, real_qual, real_subst, real_del, real_ins, cigar)

cigar

Alias for field number 5

real_del

Alias for field number 3

real_ins

Alias for field number 4

real_qual

Alias for field number 1

real_subst

Alias for field number 2

seq

Alias for field number 0

wub.simulate.seq.add_errors(seq, nr_errors, error_type)

Introduce a specified number of errors in the target sequence at random positions.

Parameters:
  • seq – Input DNA sequence.
  • nr_errors – Number of errors to introduce.
  • error_type – Type of error to introduce.
Returns:

Mutated sequence.

Return type:

str

wub.simulate.seq.cigar_list_to_string(cigar_list)

Convert a CIGAR list to a CIGAR string.

Parameters: cigar_list – A list of CIGAR operations.
Returns: CIGAR string.
Return type: str
wub.simulate.seq.compress_raw_cigar_list(raw_cigar)

Compress a raw CIGAR list.

Parameters: raw_cigar – Raw (uncompressed) CIGAR list.
Returns: Compressed CIGAR list.
Return type: list
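This reference does not describe the list layouts that the two CIGAR helpers work with. As an illustration of the general idea, the sketch below assumes a raw list with one operation letter per position, compresses it into (run length, operation) pairs, and joins those pairs into a CIGAR string. The function names and data layout are assumptions for the example, not the wub API.

```python
from itertools import groupby


def compress_raw_cigar(raw_cigar):
    """Collapse runs of identical operations into (run_length, operation) pairs."""
    return [(len(list(run)), op) for op, run in groupby(raw_cigar)]


def cigar_to_string(cigar_list):
    """Join (run_length, operation) pairs into a CIGAR string."""
    return "".join(f"{count}{op}" for count, op in cigar_list)


# Hypothetical per-position operations: M = match, I = insertion, D = deletion.
raw = ["M", "M", "M", "I", "M", "M", "D", "D"]
compressed = compress_raw_cigar(raw)   # [(3, 'M'), (1, 'I'), (2, 'M'), (2, 'D')]
cigar = cigar_to_string(compressed)    # '3M1I2M2D'
```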
wub.simulate.seq.random_base(probs=[0.25, 0.25, 0.25, 0.25])

Generate a random DNA base.

Parameters: probs – Probabilities of sampling a base, in the ACGT order.
Returns: A sampled base.
Return type: str
wub.simulate.seq.random_base_except(excluded, probs=[0.25, 0.25, 0.25, 0.25])

Generate a random base according to the specified probabilities with the exclusion of the specified base.

Parameters:
  • excluded – Exclude this base from sampling.
  • probs – Base sampling probabilities in the ACGT order.
Returns:

A sampled base.

Return type:

str
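Both base samplers amount to a weighted choice over the four bases. The sketch below uses random.choices from the standard library and handles the exclusion by setting the excluded base's weight to zero, on the assumption that the remaining probabilities should keep their relative proportions. The `_sketch` function names and the `rng` argument are hypothetical.

```python
import random

BASES = "ACGT"


def random_base_sketch(probs=(0.25, 0.25, 0.25, 0.25), rng=random):
    """Sample one base with the given probabilities (ACGT order)."""
    return rng.choices(BASES, weights=probs, k=1)[0]


def random_base_except_sketch(excluded, probs=(0.25, 0.25, 0.25, 0.25), rng=random):
    """Sample one base, never returning the excluded one."""
    # Zero out the excluded base; choices() normalises the remaining weights.
    weights = [0.0 if base == excluded else p for base, p in zip(BASES, probs)]
    return rng.choices(BASES, weights=weights, k=1)[0]


rng = random.Random(1)
base = random_base_sketch(rng=rng)
draws = [random_base_except_sketch("A", rng=rng) for _ in range(200)]
```

The sketch uses tuples for the default probabilities so that the defaults cannot be modified by accident between calls.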

wub.simulate.seq.sample_direction(forward_prob)
wub.simulate.seq.sample_error_type(error_weights)

Sample error type from error weights dictionary.

Parameters: error_weights – A dictionary with (type, probability) pairs.
Returns: Error type
Return type: str
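Sampling from such a dictionary is a weighted choice over its keys. The following sketch shows one way to do it with the standard library. The function name, the `rng` argument and the probability values are assumptions for the example; the three key names are the error types listed for simulate_sequencing_errors.

```python
import random


def sample_error_type_sketch(error_weights, rng=random):
    """Pick one key of error_weights with probability given by its value."""
    types = list(error_weights)
    weights = [error_weights[t] for t in types]
    return rng.choices(types, weights=weights, k=1)[0]


# Hypothetical weights; they sum to 1 here, although choices() would
# normalise them in any case.
error_weights = {"substitution": 0.5, "deletion": 0.3, "insertion": 0.2}

rng = random.Random(3)
counts = {error_type: 0 for error_type in error_weights}
for _ in range(10000):
    counts[sample_error_type_sketch(error_weights, rng)] += 1
```

Over many draws the counts should be roughly proportional to the weights.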
wub.simulate.seq.simulate_sequence(length, probs=[0.25, 0.25, 0.25, 0.25])

Simulate sequence of specified length and base composition.

Parameters:
  • length – Length of simulated sequence.
  • probs – Base composition vector in the ACGT order.
Returns:

Simulated sequence.

Return type:

str

wub.simulate.seq.simulate_sequencing_errors(sequence, error_rate, error_weights)

Simulate substitutions, deletions and insertions.

Parameters:
  • sequence – Input sequence.
  • error_rate – Total error rate.
  • error_weights – A dictionary with error types as keys and probabilities as values. The possible error types are: substitution, deletion, insertion.
Returns:

A named tuple with elements: mutated sequence, realised quality, number of realised substitutions, number of realised deletions, number of realised insertions, cigar string.

Return type:

namedtuple

Module contents