deepbiop.utils

Classes

GenomicInterval

A segment is a genomic interval defined by a chromosome, a start position and an end position.

PslAlignment

CompressedType

Represents different types of file compression formats.

SequenceFileType

Represents different types of sequence file formats.

Functions

check_compressed_type(path)

Check the compression type of a file.

generate_unmaped_intervals(input, total_length)

highlight_targets(sequence, targets[, text_width])

majority_voting(labels, window_size)

parse_psl_by_qname(file_path)

Parse PSL file by query name.

remove_intervals_and_keep_left(seq, intervals)

Module Contents

class deepbiop.utils.GenomicInterval

A segment is a genomic interval defined by a chromosome, a start position and an end position. The start position is inclusive and the end position is exclusive.

property start: int
Return type:

int

property end: int
Return type:

int

property chr: str
Return type:

str

overlap(other)
Parameters:

other (GenomicInterval)

Return type:

bool

__repr__()
Return type:

str
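The half-open convention described for GenomicInterval (start inclusive, end exclusive) determines when two intervals overlap. The following is a minimal pure-Python sketch of that test. The `Interval` dataclass and `overlaps` helper are hypothetical stand-ins for illustration, not the deepbiop implementation. It is an assumption that intervals on different chromosomes never overlap and that adjacent intervals (one ending where the next begins) do not count as overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a half-open genomic interval [start, end)."""

    chr: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    # Assumption: intervals on different chromosomes never overlap.
    if a.chr != b.chr:
        return False
    # Two half-open intervals share a base when each starts before the other ends.
    return a.start < b.end and b.start < a.end


print(overlaps(Interval("chr1", 0, 10), Interval("chr1", 5, 15)))
print(overlaps(Interval("chr1", 0, 10), Interval("chr1", 10, 20)))
```

Under these assumptions the first pair overlaps, while the second pair only touches at position 10 and therefore does not.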

class deepbiop.utils.PslAlignment
property qname: str
Return type:

str

property qsize: int
Return type:

int

property qstart: int
Return type:

int

property qend: int
Return type:

int

property qmatch: int
Return type:

int

property tname: str
Return type:

str

property tsize: int
Return type:

int

property tstart: int
Return type:

int

property tend: int
Return type:

int

property identity: float
Return type:

float

__repr__()
Return type:

str

class deepbiop.utils.CompressedType

Bases: enum.Enum

Represents different types of file compression formats.

This enum is used to identify and handle various compression formats commonly used for files. It can be used in Python through the deepbiop.utils module.

Variants:

  • Uncompress - Uncompressed/raw file format

  • Gzip - Standard gzip compression (.gz files)

  • Bgzip - Blocked gzip format, commonly used in bioinformatics

  • Zip - ZIP archive format

  • Bzip2 - bzip2 compression format

  • Xz - XZ compression format (LZMA2)

  • Zstd - Zstandard compression format

  • Unknown - Unknown or unrecognized compression format

Uncompress = Ellipsis
Gzip = Ellipsis
Bgzip = Ellipsis
Zip = Ellipsis
Bzip2 = Ellipsis
Xz = Ellipsis
Zstd = Ellipsis
Unknown = Ellipsis

class deepbiop.utils.SequenceFileType

Bases: enum.Enum

Represents different types of sequence file formats.

Fasta = Ellipsis
Fastq = Ellipsis
Unknown = Ellipsis

deepbiop.utils.check_compressed_type(path)

Check the compression type of a file.

Parameters:

path (str | os.PathLike | pathlib.Path) – Path to the file to check

Return type:

CompressedType – The detected compression type of the file (for example Uncompress, Gzip, Bzip2 or Xz)

Raises:

IOError – If the file cannot be opened or read.
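Compression formats are commonly told apart by the magic bytes at the start of a file. The sketch below illustrates that general technique using only the standard library. The `sniff_compression` helper is hypothetical, it returns the variant names as plain strings rather than CompressedType members, and it is not claimed to mirror how check_compressed_type works internally. Treating any unrecognized header as "Uncompress" is also an assumption made for brevity.

```python
import bz2
import gzip
import lzma


def sniff_compression(head: bytes) -> str:
    """Guess a compression format from the first bytes of a file (hypothetical helper)."""
    if head[:2] == b"\x1f\x8b":
        # BGZF is gzip with the FEXTRA flag set and, when it is the first
        # extra subfield, a "BC" identifier at bytes 12-13.
        if len(head) >= 14 and head[3] & 0x04 and head[12:14] == b"BC":
            return "Bgzip"
        return "Gzip"
    if head[:3] == b"BZh":
        return "Bzip2"
    if head[:6] == b"\xfd7zXZ\x00":
        return "Xz"
    if head[:4] == b"\x28\xb5\x2f\xfd":
        return "Zstd"
    if head[:4] == b"PK\x03\x04":
        return "Zip"
    # Assumption: anything without a known signature is treated as uncompressed.
    return "Uncompress"


payload = b">read1\nACGT\n"
for blob in (gzip.compress(payload), bz2.compress(payload), lzma.compress(payload), payload):
    print(sniff_compression(blob[:16]))
```

In practice only the first few bytes of the file need to be read, for example with `open(path, "rb").read(16)`.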

deepbiop.utils.generate_unmaped_intervals(input, total_length)
Parameters:
  • input

  • total_length

Return type:

list[tuple[int, int]]
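The signature and return type suggest that generate_unmaped_intervals computes the gaps left uncovered by a set of mapped intervals within a sequence of length total_length. The behavior is not documented here, so the sketch below is only one plausible reading. The helper name `unmapped_intervals_sketch`, the input type (a list of half-open `(start, end)` tuples) and the merging of overlapping inputs are all assumptions.

```python
def unmapped_intervals_sketch(
    mapped: list[tuple[int, int]], total_length: int
) -> list[tuple[int, int]]:
    """Return the half-open gaps in [0, total_length) not covered by `mapped` (hypothetical)."""
    gaps = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(mapped):
        if start > cursor:
            gaps.append((cursor, start))
        # max() keeps the cursor correct when intervals overlap or nest.
        cursor = max(cursor, end)
    if cursor < total_length:
        gaps.append((cursor, total_length))
    return gaps


print(unmapped_intervals_sketch([(2, 5), (8, 10)], 12))
# → [(0, 2), (5, 8), (10, 12)]
```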

deepbiop.utils.highlight_targets(sequence, targets, text_width=None)
Parameters:
  • sequence

  • targets

  • text_width (optional, defaults to None)

Return type:

str

deepbiop.utils.majority_voting(labels, window_size)
Parameters:
  • labels (Sequence[int])

  • window_size (int)

Return type:

list[int]
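The exact windowing rule used by majority_voting is not documented here. As an illustration of the general technique, the sketch below smooths a label sequence by replacing each label with the most common label in a window centered on it. The helper name `majority_vote_sketch`, the centered window, the truncation of the window at the sequence edges and the tie-breaking rule (the label seen first in the window wins) are all assumptions.

```python
from collections import Counter
from typing import Sequence


def majority_vote_sketch(labels: Sequence[int], window_size: int) -> list[int]:
    """Smooth labels with a centered sliding-window majority vote (hypothetical)."""
    half = window_size // 2
    smoothed = []
    for i in range(len(labels)):
        # The window is truncated at the sequence boundaries.
        window = labels[max(0, i - half): i + half + 1]
        # most_common(1) returns the label with the highest count in the window.
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


print(majority_vote_sketch([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1], 3))
# → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

In this example the isolated 1 at index 2 and the isolated 0 at index 8 are both smoothed away.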

deepbiop.utils.parse_psl_by_qname(file_path)

Parse PSL file by query name.

Parameters:

file_path (str | os.PathLike | pathlib.Path)

Return type:

dict[str, list[PslAlignment]]
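To make the shape of the result concrete, the sketch below groups PSL rows by query name in pure Python. It follows the standard 21-column PSL layout, where qName is column 10 and tName is column 14. The `group_psl_by_qname` helper and its plain-dict records are hypothetical. It is not the deepbiop parser and does not compute fields such as identity.

```python
from collections import defaultdict
from typing import Iterable


def group_psl_by_qname(lines: Iterable[str]) -> dict[str, list[dict]]:
    """Group PSL alignment rows by query name (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 columns and a numeric first field; this skips
        # blank lines and the optional "psLayout" header block.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        record = {
            "matches": int(fields[0]),
            "qname": fields[9],
            "qsize": int(fields[10]),
            "qstart": int(fields[11]),
            "qend": int(fields[12]),
            "tname": fields[13],
            "tsize": int(fields[14]),
            "tstart": int(fields[15]),
            "tend": int(fields[16]),
        }
        grouped[record["qname"]].append(record)
    return dict(grouped)


def make_row(qname: str, tname: str) -> str:
    # Builds a synthetic 21-column PSL row for demonstration only.
    cols = ["90", "5", "0", "0", "0", "0", "0", "0", "+", qname, "100", "0", "95",
            tname, "1000", "200", "295", "1", "95,", "0,", "200,"]
    return "\t".join(cols)


rows = ["psLayout version 3", "", make_row("read1", "chr1"),
        make_row("read1", "chr2"), make_row("read2", "chr1")]
result = group_psl_by_qname(rows)
print({name: len(alignments) for name, alignments in result.items()})
```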

deepbiop.utils.remove_intervals_and_keep_left(seq, intervals)
Parameters:
  • seq

  • intervals

Return type:

tuple[list[str], list[tuple[int, int]]]
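The return type, a list of strings paired with a list of coordinate tuples, suggests that this function cuts the given intervals out of seq and returns the pieces that remain together with their positions in the original sequence. That reading is an assumption, as is everything in the sketch below: the helper name `remove_intervals_sketch`, the half-open `(start, end)` coordinates and the choice to drop empty pieces.

```python
def remove_intervals_sketch(
    seq: str, intervals: list[tuple[int, int]]
) -> tuple[list[str], list[tuple[int, int]]]:
    """Cut half-open intervals out of seq and return what is left (hypothetical)."""
    kept_seqs: list[str] = []
    kept_coords: list[tuple[int, int]] = []
    cursor = 0  # start of the next piece that may be kept
    for start, end in sorted(intervals):
        if start > cursor:
            kept_seqs.append(seq[cursor:start])
            kept_coords.append((cursor, start))
        # max() handles overlapping or nested removal intervals.
        cursor = max(cursor, end)
    if cursor < len(seq):
        kept_seqs.append(seq[cursor:])
        kept_coords.append((cursor, len(seq)))
    return kept_seqs, kept_coords


print(remove_intervals_sketch("ACGTACGTAC", [(2, 4), (6, 8)]))
# → (['AC', 'AC', 'AC'], [(0, 2), (4, 6), (8, 10)])
```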