deepbiop.fa

Classes

EncoderOption

Options for configuring the FASTA sequence encoder.

ParquetEncoder

An encoder for converting FASTA records to Parquet format.

RecordData

Functions

convert_multiple_fas_to_one_fa(paths, result_path, ...)

encode_fa_path_to_parquet(fa_path, bases[, result_path])

encode_fa_path_to_parquet_chunk(fa_path, chunk_size, ...)

encode_fa_paths_to_parquet(fa_path, bases)

select_record_from_fa(selected_reads, fq, output)

select_record_from_fa_by_random(fq, number, output)

write_fa(records_data[, file_path])

write_fa_parallel(records_data, file_path, threads)

Module Contents

class deepbiop.fa.EncoderOption

Options for configuring the FASTA sequence encoder.

This struct provides configuration options for encoding FASTA sequences, such as which bases to consider during encoding.

# Fields

  • bases - A vector of valid bases (as bytes) to use for encoding. Defaults to “ATCGN”.

# Example

```
use deepbiop_fa::encode::option::EncoderOption;

let options = EncoderOption::default();
```

property bases: list[int]
Return type:

list[int]
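
Because `bases` is typed as `list[int]`, each base is presumably reported as its byte value rather than as a character. The sketch below uses plain Python only (no `deepbiop` call is made, and the variable names are illustrative) to show how the documented default "ATCGN" corresponds to such a list. That the property returns exactly these values is an assumption drawn from the field description above.

```python
# The documented default alphabet, written as a bytes literal.
default_bases = b"ATCGN"

# Iterating over bytes yields integers, matching a list[int] view.
bases_as_ints = list(default_bases)
print(bases_as_ints)  # [65, 84, 67, 71, 78]

# Convert the integers back to text for display.
bases_as_text = bytes(bases_as_ints).decode("ascii")
print(bases_as_text)  # ATCGN
```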

class deepbiop.fa.ParquetEncoder

An encoder for converting FASTA records to Parquet format.

This struct provides functionality to encode FASTA sequence data into Parquet, an efficient columnar storage format.

# Fields

  • option - Configuration options for the encoder, including which bases to consider.

# Example

```
use deepbiop_fa::encode::{option::EncoderOption, parquet::ParquetEncoder};

let options = EncoderOption::default();
let encoder = ParquetEncoder::new(options);
```

class deepbiop.fa.RecordData
property id: str
Return type:

str

property seq: str
Return type:

str
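
A record here pairs an identifier with a sequence. As a rough illustration of how such pairs correspond to FASTA text, the sketch below uses a hypothetical stand-in dataclass `Record` (not the `deepbiop` class) and a hypothetical helper `to_fasta`. It assumes one `>` header line per record followed by a single unwrapped sequence line, which may differ from what the library's writers produce.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in mirroring the id and seq properties."""

    id: str
    seq: str


def to_fasta(records):
    """Format records as FASTA text: a '>' header line, then the sequence."""
    return "".join(f">{r.id}\n{r.seq}\n" for r in records)


records = [Record("read1", "ACGT"), Record("read2", "TTGCA")]
print(to_fasta(records), end="")
```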

deepbiop.fa.convert_multiple_fas_to_one_fa(paths, result_path, parallel)
Parameters:
Return type:

None

deepbiop.fa.encode_fa_path_to_parquet(fa_path, bases, result_path=None)
Parameters:
Return type:

None

deepbiop.fa.encode_fa_path_to_parquet_chunk(fa_path, chunk_size, parallel, bases)
Parameters:
Return type:

None

deepbiop.fa.encode_fa_paths_to_parquet(fa_path, bases)
Parameters:
Return type:

None

deepbiop.fa.select_record_from_fa(selected_reads, fq, output)
Parameters:
Return type:

None

deepbiop.fa.select_record_from_fa_by_random(fq, number, output)
Parameters:
Return type:

None

deepbiop.fa.write_fa(records_data, file_path=None)
Parameters:
Return type:

None

deepbiop.fa.write_fa_parallel(records_data, file_path, threads)
Parameters:
Return type:

None