API

Lightweight interfaces for reading and writing BED records.

Examples of Parsing BED files

>>> import pybedlite as pybed
>>> from pathlib import Path
>>> with pybed.reader(path=Path("infile.bed")) as in_fh:
...     for record in in_fh:
...         # Do work with records
...         pass

Examples of Writing BED files

>>> import pybedlite as pybed
>>> from pathlib import Path
>>> # Get records from somewhere
>>> records = []
>>> with pybed.reader(path=Path("infile.bed")) as in_fh:
...     for record in in_fh:
...         records.append(record)
>>> # Write records to somewhere
>>> with pybed.writer(path=Path("outfile.bed"), num_fields=6) as out_fh:
...     for record in records:
...         out_fh.write(record)

Module Contents

The module contains the following public classes:

  • BedStrand – Enumeration of possible strands for a BED record

  • BedRecord – Lightweight class for storing information pertaining to a BED record

  • BedSource – Reader class for parsing BED files and iterating over their records

  • BedWriter – Writer class for writing BED files

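The module contains the following public functions:

  • reader – Returns a BedSource for reading a BED file

  • writer – Returns a BedWriter for writing a BED file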

class pybedlite.BedRecord(*, chrom: str, start: int, end: int, name: str | None = None, score: int | None = None, strand: BedStrand | None = None, thick_start: int | None = None, thick_end: int | None = None, item_rgb: Tuple[int, int, int] | None = None, block_count: int | None = None, block_sizes: List[int] | None = None, block_starts: List[int] | None = None)

Lightweight class for storing BED records.

A more comprehensive description of the BED format can be found at https://genome.ucsc.edu/FAQ/FAQformat.html#format1. Only chrom, start, and end are required.

chrom

the reference name of the interval described by the record

Type:

str

start

the start coordinate in 0-based, half-open coordinates (inclusive)

Type:

int

end

the end coordinate in 0-based, half-open coordinates (exclusive)

Type:

int

name

the name of the BED record

Type:

str | None

score

a score for the interval. The UCSC specification requires a score between 0 and 1000, but this constraint is not enforced; if defined, the score only needs to be an integer.

Type:

int | None

strand

defines the strand of the interval described by the record

Type:

pybedlite.bed_record.BedStrand | None

thick_start

the starting position at which the BED feature should be drawn thickly

Type:

int | None

thick_end

the ending position at which the BED feature should be drawn thickly

Type:

int | None

item_rgb

an RGB value of the form (R, G, B)

Type:

Tuple[int, int, int] | None

block_count

the number of blocks in the BED line

Type:

int | None

block_sizes

a list of block (exon) sizes. The number of items must equal block_count.

Type:

List[int] | None

block_starts

a list of block (exon) starts, relative to start (0-based, inclusive). The number of items must equal block_count.

Type:

List[int] | None
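The three block fields describe one feature together: each entry of block_starts is an offset from start, and the entry at the same index of block_sizes is that block's length. As a standalone sketch (plain Python, not pybedlite code; the helper name block_intervals is hypothetical), the absolute 0-based, half-open block coordinates can be computed like this:

```python
from typing import List, Tuple


def block_intervals(
    start: int, block_count: int, block_sizes: List[int], block_starts: List[int]
) -> List[Tuple[int, int]]:
    """Return absolute 0-based, half-open (start, end) pairs for each block.

    Hypothetical helper for illustration; not a pybedlite function.
    """
    if len(block_sizes) != block_count or len(block_starts) != block_count:
        raise ValueError("block_sizes and block_starts must each have block_count items")
    # Block starts are offsets from the record's start; each block ends `size` bases later.
    return [
        (start + offset, start + offset + size)
        for offset, size in zip(block_starts, block_sizes)
    ]


# A two-block feature on [1000, 1500): blocks [1000, 1100) and [1400, 1500).
print(block_intervals(1000, 2, [100, 100], [0, 400]))
# [(1000, 1100), (1400, 1500)]
```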

as_bed_line(number_of_output_fields: int | None = None) → str

Converts a BED record to an equivalent tab-delimited BED line containing up to the specified number of fields.

Parameters:

number_of_output_fields – the number of fields to include in the BED line, e.g. 6 for a BED6 line.

Raises:

ValueError – If number_of_output_fields is not between 3 and 12
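To make the effect of number_of_output_fields concrete, here is a minimal sketch assuming the record's columns are already available as strings in BED order, roughly what the bed_fields property describes. The function bed_line is hypothetical, not the library's implementation, and this sketch rejects a request for more columns than it was given.

```python
from typing import List


def bed_line(fields: List[str], number_of_output_fields: int) -> str:
    """Join the first `number_of_output_fields` BED columns with tabs.

    Hypothetical stand-in for BedRecord.as_bed_line(); `fields` holds the
    record's columns as strings, in BED column order.
    """
    if not 3 <= number_of_output_fields <= 12:
        raise ValueError(
            f"number_of_output_fields must be between 3 and 12, got {number_of_output_fields}"
        )
    if number_of_output_fields > len(fields):
        # A choice made by this sketch only: refuse to invent missing columns.
        raise ValueError(f"record only has {len(fields)} fields")
    return "\t".join(fields[:number_of_output_fields])


bed6 = ["chr1", "100", "200", "gene1", "0", "+"]
print(repr(bed_line(bed6, 3)))  # 'chr1\t100\t200'
```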

property bed_field_num: int

The number of BED fields that are defined in this record.

property bed_fields: List[str]

Converts a BED record to a list of its BED field string equivalents.

classmethod from_interval(interval: Interval) → BedRecord

Construct a BedRecord from an Interval instance.

Note that `Interval` cannot represent a `BedRecord` with a missing strand. Converting a record with no strand to Interval and then back to BedRecord will result in a record with positive strand.

Parameters:

interval – The Interval instance to convert.

Returns:

A BedRecord corresponding to the same region specified in the interval.
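The strand loss described above can be seen with a small stand-alone model. This sketch represents a BED strand as '+', '-', or None rather than as BedStrand, and both helper names are hypothetical; it only illustrates the documented mapping, under which a missing strand comes back as positive.

```python
from typing import Optional


def strand_to_negative(strand: Optional[str]) -> bool:
    """BED strand ('+', '-', or None) -> Interval-style `negative` flag."""
    return strand == "-"


def negative_to_strand(negative: bool) -> str:
    """Interval-style `negative` flag -> BED strand; None can never be produced."""
    return "-" if negative else "+"


for original in ("+", "-", None):
    # Prints "+ -> +", "- -> -", and "None -> +": the missing strand is not preserved.
    print(original, "->", negative_to_strand(strand_to_negative(original)))
```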

property negative: bool

True if the interval is negatively stranded, False if the interval is unstranded or positively stranded.

property refname: str

The reference name of the interval described by the record.

class pybedlite.BedSource(path: Path | str | IO[Any])

Reader for BED records stored in a BED file.

num_fields

the number of BED fields present for records in this file, set from the first record parsed by this class. Although the official BED specification states that all records in a file should have the same number of fields, this reader does not enforce that.
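One way to picture how num_fields is populated is the minimal reader below. MiniBedSource is a hypothetical class, not BedSource itself; it assumes tab-separated lines and, unlike a real reader might, makes no attempt to handle track, browser, or comment lines.

```python
import io
from typing import Iterator, List, Optional, TextIO


class MiniBedSource:
    """Hypothetical minimal reader mirroring how BedSource.num_fields is described."""

    def __init__(self, handle: TextIO) -> None:
        self._handle = handle
        self.num_fields: Optional[int] = None  # unknown until the first record is read

    def __iter__(self) -> Iterator[List[str]]:
        for line in self._handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if self.num_fields is None:
                self.num_fields = len(fields)  # taken from the first record only
            # Later records may have a different field count; this is not checked.
            yield fields


text = "chr1\t10\t20\tfeatA\nchr1\t30\t40\n"
source = MiniBedSource(io.StringIO(text))
records = list(source)
print(source.num_fields, [len(r) for r in records])  # 4 [4, 3]
```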

close() → None

Closes the BedSource file. Should be called after iterating over the file.

open() → BedSource

Open the BedSource file for reading.

Must be called before iterating over the file. Make sure to close when done.

class pybedlite.BedStrand(value)

Enumeration of strands for BED records.

property opposite: BedStrand

Return the opposite strand of the current strand.
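A minimal enum with an opposite property, for illustration only. The member names Positive and Negative and the values '+' and '-' are assumptions made for this sketch; check BedStrand's own members before relying on them.

```python
import enum


class Strand(enum.Enum):
    """Illustrative mirror of BedStrand; member names here are assumptions."""

    Positive = "+"
    Negative = "-"

    @property
    def opposite(self) -> "Strand":
        """Return the other strand."""
        return Strand.Negative if self is Strand.Positive else Strand.Positive


# Looking a member up by its BED value, then flipping it.
print(Strand("+").opposite)  # Strand.Negative
```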

class pybedlite.BedWriter(path: Path | str | IO[Any], num_fields: int | None = None)

Writer class for writing BED records to a file.

num_fields

The number of BED fields to report. Must be between 3 and 12.

close() → None

Close the BedWriter file.

Should be called after all records to write have been added.

open() → BedWriter

Open the BedWriter’s file handle.

write(record: BedRecord, truncate: bool = False, add_missing: bool = False) → None

Write a single BedRecord to the file.

Parameters:
  • record – the BED record to write to the file.

  • truncate – if False and a BED record is passed with more fields than the writer is set to output, a ValueError is raised. If True, such a record is truncated to the number of fields written by this writer.

  • add_missing – if False and a BED record is passed with fewer fields than the writer is set to output, a ValueError is raised. If True, such a record is padded with ‘.’ for each missing field, up to the number of fields written by this writer.
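The truncate and add_missing rules can be sketched as a function over a record's string fields. fit_fields is a hypothetical helper operating on plain lists, not part of BedWriter; it applies the documented behavior: raise by default, otherwise drop extra trailing columns or pad with '.'.

```python
from typing import List


def fit_fields(
    fields: List[str], num_fields: int, truncate: bool = False, add_missing: bool = False
) -> List[str]:
    """Adjust one record's fields to a writer's column count (hypothetical helper)."""
    if len(fields) > num_fields:
        if not truncate:
            raise ValueError(f"record has {len(fields)} fields; writer outputs {num_fields}")
        return fields[:num_fields]  # drop the extra trailing columns
    if len(fields) < num_fields:
        if not add_missing:
            raise ValueError(f"record has {len(fields)} fields; writer outputs {num_fields}")
        return fields + ["."] * (num_fields - len(fields))  # pad missing columns with '.'
    return fields


bed4 = ["chr1", "10", "20", "featA"]
print(fit_fields(bed4, 3, truncate=True))     # ['chr1', '10', '20']
print(fit_fields(bed4, 6, add_missing=True))  # ['chr1', '10', '20', 'featA', '.', '.']
```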

write_all(records: Iterable[BedRecord], truncate: bool = False, add_missing: bool = False) → None

Write multiple BedRecords to a file.

Parameters:
  • records – the BED records to write to the file (any iterable of BedRecord)

  • truncate – if False and a BED record is passed with more fields than the writer is set to output, a ValueError is raised. If True, such records are truncated to the number of fields written by this writer.

  • add_missing – if False and a BED record is passed with fewer fields than the writer is set to output, a ValueError is raised. If True, such records are padded with ‘.’ for each missing field, up to the number of fields written by this writer.

pybedlite.reader(path: Path | str | IO[Any]) → BedSource

Return a BedSource for reading the BED file.

Parameters:

path – a file handle or path to the BED file to read.

pybedlite.writer(path: Path | str | IO[Any], num_fields: int | None = None) → BedWriter

Return a BedWriter for writing the BED file.

Parameters:
  • path – a file handle or path to the BED file to write.

  • num_fields – the number of BED fields to write for each record. If None, this value is set to the number of fields in the first BED record written by this object.

Utility Classes for Querying Overlaps with Genomic Regions.

The OverlapDetector class detects and returns overlaps between a set of regions and another region on a reference.

The overlap detector may contain a collection of interval-like Python objects that have the following properties:

  • refname (str): The reference sequence name

  • start (int): A 0-based start position

  • end (int): A 0-based half-open end position

This contract is described by the Span protocol.

Interval-like Python objects may also carry strand information. If present, the following property is used to sort them in get_overlaps():

  • negative (bool): True if the interval is negatively stranded, False if the interval is unstranded or positively stranded

This contract is described by the StrandedSpan protocol.
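Because Span and StrandedSpan are structural types, an object satisfies them by having the right attributes, with no inheritance required. The sketch below restates the Span contract as a typing.Protocol (SpanLike and Gene are hypothetical names, not pybedlite classes) and checks a user-defined dataclass against it at runtime.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class SpanLike(Protocol):
    """Illustrative restatement of the Span contract (not pybedlite's own class)."""

    @property
    def refname(self) -> str: ...

    @property
    def start(self) -> int: ...

    @property
    def end(self) -> int: ...


@dataclass(frozen=True)
class Gene:
    """A hypothetical user type; it never inherits from SpanLike."""

    refname: str
    start: int
    end: int
    symbol: str


gene = Gene("chr1", 100, 200, "geneA")
print(isinstance(gene, SpanLike))  # True: it has refname, start and end
```

Note that the runtime check only confirms the attributes exist; it does not validate their types or that start <= end.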

Examples of Detecting Overlaps

>>> from pybedlite.overlap_detector import Interval, OverlapDetector
>>> detector = OverlapDetector()
>>> query = Interval("chr1", 2, 20)
>>> detector.overlaps_any(query)
False
>>> detector.add(Interval("chr2", 1, 100))
>>> detector.add(Interval("chr1", 21, 100))
>>> detector.overlaps_any(query)
False
>>> detector.add(Interval("chr1", 1, 5))
>>> detector.overlaps_any(query)
True
>>> detector.get_overlaps(query)
[Interval("chr1", 1, 5)]
>>> detector.add(Interval("chr1", 3, 10))
>>> detector.overlaps_any(query)
True
>>> detector.get_overlaps(query)
[Interval("chr1", 1, 5), Interval("chr1", 3, 10)]
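Overlap detection follows the half-open rule: a stored interval counts only if it shares at least one base with the query, so an interval that ends exactly where the query starts does not overlap it. The brute-force stand-in below (Iv and NaiveDetector are hypothetical names; it scans linearly and makes no attempt at the real detector's efficiency) reproduces that rule along with the documented de-duplication.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Iv:
    """Minimal 0-based, half-open interval (a stand-in for Interval)."""

    refname: str
    start: int
    end: int


def overlaps(a: Iv, b: Iv) -> bool:
    """Half-open intervals overlap iff they share a refname and at least one base."""
    return a.refname == b.refname and a.start < b.end and b.start < a.end


class NaiveDetector:
    """Linear-scan stand-in for OverlapDetector, useful for checking expected results."""

    def __init__(self) -> None:
        self._intervals: List[Iv] = []

    def add(self, interval: Iv) -> None:
        if interval not in self._intervals:  # the same interval is stored only once
            self._intervals.append(interval)

    def overlaps_any(self, query: Iv) -> bool:
        return any(overlaps(query, iv) for iv in self._intervals)

    def get_overlaps(self, query: Iv) -> List[Iv]:
        hits = [iv for iv in self._intervals if overlaps(query, iv)]
        # Iv carries no strand, so sort by start, end, then refname.
        return sorted(hits, key=lambda iv: (iv.start, iv.end, iv.refname))


query = Iv("chr1", 2, 20)
detector = NaiveDetector()
detector.add(Iv("chr1", 1, 2))  # ends where the query starts: no shared base
print(detector.overlaps_any(query))  # False
detector.add(Iv("chr1", 1, 5))
print(detector.get_overlaps(query))  # [Iv(refname='chr1', start=1, end=5)]
```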

Module Contents

The module contains the following public classes:

  • Interval – Represents a region mapping to a reference sequence that is 0-based and half-open

  • OverlapDetector – Detects and returns overlaps between a set of regions and another region on a reference

class pybedlite.overlap_detector.Interval(refname: str, start: int, end: int, negative: bool = False, name: str | None = None)

A region mapping to a reference sequence that is 0-based and half-open.

refname

the refname (or chromosome)

Type:

str

start

the 0-based start position

Type:

int

end

the 0-based half-open end position

Type:

int

negative

True if the interval is negatively stranded, False if the interval is unstranded or positively stranded

Type:

bool

name

an optional name assigned to the interval

Type:

Optional[str]

classmethod from_bedrecord(record: BedRecord) → Interval

Construct an Interval from a BedRecord instance.

Note that when the BedRecord does not have a specified strand, the Interval’s negative attribute is set to False. This mimics the behavior of OverlapDetector.from_bed() when reading a record that does not have a specified strand.

Parameters:

record – The BedRecord instance to convert.

Returns:

An Interval corresponding to the same region specified in the record.

classmethod from_ucsc(string: str, name: str | None = None) → Interval

Construct an Interval from a UCSC “position”-formatted string.

The “Position” format (referring to the “1-start, fully-closed” system as coordinates are “positioned” in the browser):

  • Written as: chr1:127140001-127140001

  • The location may optionally be followed by a parenthetically enclosed strand, e.g. chr1:127140001-127140001(+).

  • No spaces.

  • Includes punctuation: a colon after the chromosome, and a dash between the start and end coordinates.

  • When in this format, the assumption is that the coordinate is 1-start, fully-closed.

https://genome-blog.gi.ucsc.edu/blog/2016/12/12/the-ucsc-genome-browser-coordinate-counting-systems/

Note that when the string does not have a specified strand, the Interval’s negative attribute is set to False. This mimics the behavior of OverlapDetector.from_bed() when reading a record that does not have a specified strand.

Parameters:
  • string – The UCSC “position”-formatted string.

  • name – An optional name for the interval.

Returns:

An Interval corresponding to the same region specified in the string. Note that the Interval uses zero-based, half-open coordinates.

Raises:

ValueError – If the string is not a valid UCSC position-formatted string.
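A stand-alone sketch of the documented conversion, assuming a simple regular expression is enough to recognize the position format (parse_ucsc_position is hypothetical and is not how Interval.from_ucsc is implemented). The key step is that a 1-start, fully-closed range [s, e] becomes the 0-based, half-open range [s - 1, e).

```python
import re
from typing import Tuple

# refname ':' start '-' end, optionally followed by '(+)' or '(-)'; no spaces allowed.
# The greedy \S+ backtracks to the last ':' so refnames containing ':' still parse.
_POSITION = re.compile(r"(\S+):(\d+)-(\d+)(?:\(([+-])\))?")


def parse_ucsc_position(text: str) -> Tuple[str, int, int, bool]:
    """Convert a 1-start, fully-closed UCSC position to 0-based, half-open coordinates."""
    match = _POSITION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a UCSC position string: {text!r}")
    refname, start_1based, end_1based, strand = match.groups()
    if int(start_1based) < 1 or int(end_1based) < int(start_1based):
        raise ValueError(f"invalid coordinates in {text!r}")
    start = int(start_1based) - 1  # 1-based inclusive start -> 0-based inclusive start
    end = int(end_1based)          # 1-based inclusive end == 0-based exclusive end
    negative = strand == "-"       # a missing strand is treated as positive
    return refname, start, end, negative


print(parse_ucsc_position("chr1:127140001-127140001(+)"))
# ('chr1', 127140000, 127140001, False)
```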

length() → int

Returns the length of the interval.

overlap(other: Interval) → int

Returns the number of bases shared by this interval and the other, or zero if they do not overlap.

Parameters:

other (Interval) – the other interval to find the overlap with
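Under 0-based, half-open coordinates the shared length is min(end) - max(start), clamped at zero, so adjacent intervals overlap by 0. A minimal sketch on plain tuples follows; overlap_length is a hypothetical name, and returning 0 for different reference names is an assumption of this sketch.

```python
from typing import Tuple

SpanTuple = Tuple[str, int, int]  # (refname, start, end), 0-based, half-open


def overlap_length(a: SpanTuple, b: SpanTuple) -> int:
    """Number of bases shared by two half-open spans; 0 if disjoint or on different refs."""
    if a[0] != b[0]:
        return 0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return max(shared, 0)


print(overlap_length(("chr1", 10, 20), ("chr1", 15, 30)))  # 5
print(overlap_length(("chr1", 10, 20), ("chr1", 20, 30)))  # 0 (adjacent, no shared base)
```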

class pybedlite.overlap_detector.OverlapDetector(intervals: Iterable[SpanType] | None = None)

Detects and returns overlaps between a set of regions and another region on a reference.

The overlap detector may contain a collection of interval-like Python objects that have the following properties:

  • refname (str): The reference sequence name

  • start (int): A 0-based start position

  • end (int): A 0-based half-open end position

Interval-like Python objects may also carry strand information. If present, the following property is used to sort them in get_overlaps():

  • negative (bool): True if the interval is negatively stranded, False if the interval is unstranded or positively stranded

The same interval may be added multiple times, but only a single instance will be returned when querying for overlaps.

This detector is the most efficient when all intervals are added ahead of time.

add(interval: SpanType) → None

Adds an interval to this detector.

Parameters:

interval – the interval to add to this detector

add_all(intervals: Iterable[SpanType]) → None

Adds one or more intervals to this detector.

Parameters:

intervals – the intervals to add to this detector

classmethod from_bed(path: Path) → OverlapDetector[BedRecord]

Builds an OverlapDetector from a BED file.

Parameters:

path – the path to the BED file

Returns:

An overlap detector for the regions in the BED file.

get_enclosed(interval: Span) → List[SpanType]

Returns the set of intervals in this detector that are enclosed by the query interval, i.e. target.start >= query.start and target.end <= query.end.

Parameters:

interval – the query interval

Returns:

The list of intervals in this detector that are enclosed within the query interval. The intervals will be returned sorted using the following sort keys:

  • The interval’s start (ascending)

  • The interval’s end (ascending)

  • The interval’s strand, positive or negative (assumed to be positive if undefined)

  • The interval’s reference sequence name (lexicographically)

get_enclosing_intervals(interval: Span) → List[SpanType]

Returns the set of intervals in this detector that wholly enclose the query interval, i.e. query.start >= target.start and query.end <= target.end.

Parameters:

interval – the query interval

Returns:

The list of intervals in this detector that enclose the query interval. The intervals will be returned sorted using the following sort keys:

  • The interval’s start (ascending)

  • The interval’s end (ascending)

  • The interval’s strand, positive or negative (assumed to be positive if undefined)

  • The interval’s reference sequence name (lexicographically)
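get_enclosed() and get_enclosing_intervals() test the same containment relation in opposite directions. The sketch below spells out both predicates on plain (refname, start, end) tuples; the function names are hypothetical, and requiring matching reference names is an assumption the text above leaves implicit.

```python
from typing import List, Tuple

SpanTuple = Tuple[str, int, int]  # (refname, start, end), 0-based, half-open


def enclosed_by(target: SpanTuple, query: SpanTuple) -> bool:
    """get_enclosed() condition: target lies entirely within query."""
    return target[0] == query[0] and target[1] >= query[1] and target[2] <= query[2]


def encloses(target: SpanTuple, query: SpanTuple) -> bool:
    """get_enclosing_intervals() condition: target contains query (roles swapped)."""
    return enclosed_by(query, target)


targets: List[SpanTuple] = [("chr1", 0, 100), ("chr1", 20, 30), ("chr1", 40, 60)]
query = ("chr1", 10, 50)
print([t for t in targets if enclosed_by(t, query)])  # [('chr1', 20, 30)]
print([t for t in targets if encloses(t, query)])     # [('chr1', 0, 100)]
```

The interval ('chr1', 40, 60) overlaps the query but neither encloses it nor is enclosed by it, so it appears in neither list.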

get_overlaps(interval: Span) → List[SpanType]

Returns any intervals in this detector that overlap the given interval.

Parameters:

interval – the interval to check

Returns:

The list of intervals in this detector that overlap the given interval, or an empty list if no overlaps exist. The intervals will be returned sorted using the following sort keys:

  • The interval’s start (ascending)

  • The interval’s end (ascending)

  • The interval’s strand, positive or negative (assumed to be positive if undefined)

  • The interval’s reference sequence name (lexicographically)
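The documented ordering can be expressed as a tuple sort key. In this sketch (the dataclasses and overlap_sort_key are hypothetical), objects without a negative attribute are treated as positive, and positive is assumed to sort before negative because False < True; the library's exact tie-breaking may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Hit:
    """A span with no strand information."""

    refname: str
    start: int
    end: int


@dataclass(frozen=True)
class StrandedHit(Hit):
    """A span that also exposes `negative`."""

    negative: bool


def overlap_sort_key(span: Any) -> Tuple[int, int, bool, str]:
    """Sort key in the documented order: start, end, strand, refname."""
    # Objects without a `negative` attribute are treated as positive (False).
    return (span.start, span.end, getattr(span, "negative", False), span.refname)


hits: List[Hit] = [StrandedHit("chr1", 5, 9, True), Hit("chr1", 5, 9), Hit("chr1", 2, 9)]
for hit in sorted(hits, key=overlap_sort_key):
    print(hit)
```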

overlaps_any(interval: Span) → bool

Determines whether the given interval overlaps any interval in this detector.

Parameters:

interval – the interval to check

Returns:

True if and only if the given interval overlaps with any interval in this detector.

class pybedlite.overlap_detector.Span(*args, **kwargs)

A structural type for a span on a reference sequence with zero-based, half-open coordinates.

property end: int

A 0-based, exclusive end position.

property refname: str

A reference sequence name.

property start: int

A 0-based start position.

class pybedlite.overlap_detector.SpanType

A generic reference sequence span. This type variable is used for describing the generic type contained within the OverlapDetector.

alias of TypeVar('SpanType', bound=Span | StrandedSpan)

class pybedlite.overlap_detector.StrandedSpan(*args, **kwargs)

A structural type for a stranded span on a reference sequence with zero-based, half-open coordinates.

property negative: bool

True if the interval is negatively stranded, False if the interval is unstranded or positively stranded.