deepbiop.core

Functions

generate_kmers(base, k)

Generate all possible k-mers from a set of base characters.

generate_kmers_table(base, k)

Generate a lookup table mapping k-mers to unique IDs.

kmers_to_seq(kmers)

Convert k-mers back into a DNA sequence.

normalize_seq(seq, iupac)

Normalize a DNA sequence by converting any non-standard nucleotides to standard ones.

reverse_complement(seq)

Generate the reverse complement of a DNA sequence.

seq_to_kmers(seq, k, overlap)

Convert a DNA sequence into k-mers.

Module Contents

deepbiop.core.generate_kmers(base, k)

Generate all possible k-mers from a set of base characters.

This function takes a string of base characters and a k-mer length, and generates all possible k-mer combinations of that length.

# Arguments

  • base - A string containing the base characters to use (e.g. "ATCG")

  • k - The length of k-mers to generate

# Returns

A list containing every possible k-mer of length k, each as a string

Return type:

list[str]
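To make the enumeration concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not the library's implementation: `generate_kmers_sketch` is a hypothetical name, and the output order (that of `itertools.product`) is an assumption that may differ from what `generate_kmers` returns.

```python
from itertools import product


def generate_kmers_sketch(base: str, k: int) -> list[str]:
    """Return every length-k string over the characters in `base`.

    Hypothetical stand-in for illustration; ordering is an assumption.
    """
    # product(base, repeat=k) yields len(base) ** k tuples of characters.
    return ["".join(chars) for chars in product(base, repeat=k)]


kmers = generate_kmers_sketch("ATCG", 2)
print(len(kmers))  # 16
print(kmers[:4])   # ['AA', 'AT', 'AC', 'AG']
```

For an alphabet of n characters the result has n ** k entries, so the list grows quickly with k.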

deepbiop.core.generate_kmers_table(base, k)

Generate a lookup table mapping k-mers to unique IDs.

This function takes a string of base characters and a k-mer length, and generates a dictionary mapping each possible k-mer to a unique integer ID.

# Arguments

  • base - A string containing the base characters to use (e.g. "ATCG")

  • k - The length of k-mers to generate

# Returns

A dictionary mapping k-mer byte sequences to integer IDs

Return type:

dict[list[int], int]
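The shape of such a lookup table can be sketched in pure Python as follows. This is an illustration only: `generate_kmers_table_sketch` is a hypothetical name, the keys are `bytes` objects (a Python list is not hashable, so how the real binding represents the byte-sequence keys is not shown here), and assigning IDs in enumeration order starting at 0 is an assumption.

```python
from itertools import product


def generate_kmers_table_sketch(base: str, k: int) -> dict[bytes, int]:
    """Map each k-mer (encoded as bytes) to a unique integer ID.

    Hypothetical stand-in; the ID assignment order is an assumption.
    """
    return {
        "".join(chars).encode("ascii"): idx
        for idx, chars in enumerate(product(base, repeat=k))
    }


table = generate_kmers_table_sketch("ATCG", 2)
print(len(table))    # 16
print(table[b"AA"])  # 0
```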

deepbiop.core.kmers_to_seq(kmers)

Convert k-mers back into a DNA sequence.

This function takes a list of k-mers and reconstructs the original DNA sequence. The k-mers are assumed to be in order and overlapping.

# Arguments

  • kmers - A list of k-mers as strings

# Returns

The reconstructed DNA sequence as a string. The underlying Rust function returns a Result, so reconstruction can fail.

Parameters:

kmers (Sequence[str])

Return type:

str
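The reconstruction can be sketched in pure Python under one explicit assumption: consecutive k-mers overlap by k - 1 characters, as they would if produced by a sliding window with step 1. `kmers_to_seq_sketch` is a hypothetical name, not the library's implementation, and it performs no validation of the input.

```python
def kmers_to_seq_sketch(kmers: list[str]) -> str:
    """Rebuild a sequence from ordered k-mers that overlap by k - 1.

    Hypothetical stand-in; assumes the input really is a sliding window.
    """
    if not kmers:
        return ""
    # Keep the first k-mer whole, then append the last character of
    # every following k-mer (the only character not already covered).
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


print(kmers_to_seq_sketch(["ATC", "TCG", "CGA"]))  # ATCGA
```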

deepbiop.core.normalize_seq(seq, iupac)

Normalize a DNA sequence by converting any non-standard nucleotides to standard ones.

This function takes a DNA sequence as a String and a boolean flag iupac indicating whether to normalize using IUPAC ambiguity codes. It returns a normalized DNA sequence as a String.

# Arguments

  • seq - A DNA sequence as a String.

  • iupac - A boolean flag indicating whether to normalize using IUPAC ambiguity codes.

# Returns

A normalized DNA sequence as a String.

Return type:

str
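The description above does not spell out the exact normalization rules, so the following pure-Python sketch only illustrates one plausible reading. Everything in it is an assumption: the name `normalize_seq_sketch`, the upper-casing step, the choice of `N` as the replacement character, and the interpretation of `iupac=True` as "keep IUPAC ambiguity codes".

```python
# IUPAC nucleotide codes, including the ambiguity codes.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def normalize_seq_sketch(seq: str, iupac: bool) -> str:
    """Upper-case `seq` and replace unexpected characters with 'N'.

    Hypothetical stand-in; the real rules may differ.
    """
    # Assumption: with iupac=True ambiguity codes are kept as-is,
    # otherwise only A, C, G, T and N survive.
    allowed = IUPAC_CODES if iupac else set("ACGTN")
    return "".join(ch if ch in allowed else "N" for ch in seq.upper())


print(normalize_seq_sketch("acgtRYx", iupac=False))  # ACGTNNN
print(normalize_seq_sketch("acgtRYx", iupac=True))   # ACGTRYN
```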

deepbiop.core.reverse_complement(seq)

Generate the reverse complement of a DNA sequence.

This function takes a DNA sequence as a String and returns its reverse complement. The reverse complement is generated by reversing the sequence and replacing each nucleotide with its complement (A<->T, C<->G).

# Arguments

  • seq - A DNA sequence as a String

# Returns

The reverse complement sequence as a String

# Example

```rust
use deepbiop_core::seq::reverse_complement;

let seq = String::from("ATCG");
let rev_comp = reverse_complement(seq);
assert_eq!(rev_comp, "CGAT");
```

Parameters:

seq (str)

Return type:

str
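For readers working from Python, the same operation can be sketched without the library. `reverse_complement_sketch` is a hypothetical name used for illustration; it handles only upper-case A, T, C and G and leaves any other character unchanged, which may differ from the library's behaviour.

```python
# Translation table for the pairs A<->T and C<->G.
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement_sketch(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ATCG"))  # CGAT
```

Applying the function twice returns the original sequence, which is a convenient sanity check.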

deepbiop.core.seq_to_kmers(seq, k, overlap)

Convert a DNA sequence into k-mers.

This function takes a DNA sequence and splits it into k-mers of specified length. The sequence is first normalized to handle non-standard nucleotides.

# Arguments

  • seq - A DNA sequence as a String

  • k - The length of each k-mer

  • overlap - Whether to generate overlapping k-mers

# Returns

A list of k-mers as strings

Return type:

list[str]
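The difference between overlapping and non-overlapping splitting can be sketched in pure Python. This is an illustration under stated assumptions, not the library's implementation: `seq_to_kmers_sketch` is a hypothetical name, overlapping mode is taken to mean a sliding window with step 1, non-overlapping mode a step of k, a trailing fragment shorter than k is dropped, and the normalization step mentioned above is omitted.

```python
def seq_to_kmers_sketch(seq: str, k: int, overlap: bool) -> list[str]:
    """Split `seq` into k-mers with a sliding or a jumping window.

    Hypothetical stand-in; trailing fragments shorter than k are dropped.
    """
    # Step 1 gives overlapping windows, step k gives disjoint ones.
    step = 1 if overlap else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


print(seq_to_kmers_sketch("ATCGAT", 3, overlap=True))   # ['ATC', 'TCG', 'CGA', 'GAT']
print(seq_to_kmers_sketch("ATCGAT", 3, overlap=False))  # ['ATC', 'GAT']
```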