API
Lightweight interfaces for reading and writing BED records.
Examples of Parsing BED files
>>> import pybedlite as pybed
>>> from pathlib import Path
>>> with pybed.reader(path=Path("infile.bed")) as in_fh:
...     for record in in_fh:
...         # Do work with records
...         pass
Examples of Writing BED files
>>> import pybedlite as pybed
>>> from pathlib import Path
>>> # Get records from somewhere
>>> records = []
>>> with pybed.reader(path=Path("infile.bed")) as in_fh:
...     for record in in_fh:
...         records.append(record)
>>> # Write records to somewhere
>>> with pybed.writer(path=Path("outfile.bed"), num_fields=6) as out_fh:
...     for record in records:
...         out_fh.write(record)
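Module Contents
The module contains the following public classes:
BedRecord – Lightweight class for storing BED records.
BedSource – Reader for BED records stored in a BED file.
BedStrand – Enumeration of strands for BED records.
BedWriter – Writer class for writing BED records to a file.
The module contains the following methods:
pybedlite.reader() – opens a BED file for reading.
pybedlite.writer() – opens a BED file for writing.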
- class pybedlite.BedRecord(*, chrom: str, start: int, end: int, name: str | None = None, score: int | None = None, strand: BedStrand | None = None, thick_start: int | None = None, thick_end: int | None = None, item_rgb: Tuple[int, int, int] | None = None, block_count: int | None = None, block_sizes: List[int] | None = None, block_starts: List[int] | None = None)
Lightweight class for storing BED records.
A more comprehensive description of BED format can be found at https://genome.ucsc.edu/FAQ/FAQformat.html#format1. Only chrom, start, and end are required.
- score
a score for the interval. The UCSC specification restricts scores to the range 0 to 1000, but this constraint is not enforced here; the only requirement is that a score, if defined, is an integer.
- Type:
int | None
- strand
defines the strand of the interval described by the record
- Type:
BedStrand | None
- thick_start
the starting position at which the bed feature should be drawn thickly
- Type:
int | None
- block_sizes
a list of block (exon) sizes. The number of items must correspond to block_count
- Type:
List[int] | None
- block_starts
a list of block (exon) starts relative to start. 0-based inclusive. The number of items must correspond to block_count
- Type:
List[int] | None
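As a worked illustration of how block_starts and block_sizes relate to start, the sketch below converts the relative block fields into absolute, 0-based half-open coordinates. The helper name absolute_blocks is hypothetical and is not part of pybedlite; it assumes only the semantics described in the attribute notes above.

```python
from typing import List, Tuple


def absolute_blocks(
    start: int, block_sizes: List[int], block_starts: List[int]
) -> List[Tuple[int, int]]:
    """Hypothetical helper (not part of pybedlite).

    Converts block starts, which are relative to the record's start, into
    absolute 0-based half-open (start, end) pairs.
    """
    if len(block_sizes) != len(block_starts):
        raise ValueError("block_sizes and block_starts must have the same length")
    return [
        (start + rel_start, start + rel_start + size)
        for rel_start, size in zip(block_starts, block_sizes)
    ]


# A record starting at position 1000 with two blocks (exons).
blocks = absolute_blocks(start=1000, block_sizes=[100, 50], block_starts=[0, 400])
print(blocks)
```

The first block begins at the record's own start because its relative start is 0.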
- as_bed_line(number_of_output_fields: int | None = None) -> str
Converts a BED record to its tab-delimited BED line equivalent, including at most the specified number of fields in the output line.
- Parameters:
number_of_output_fields – the number of fields to output in the BED line. For example, set this to 6 to produce a BED6 line.
- Raises:
ValueError – If number_of_output_fields is not between 3 and 12
- property bed_fields: List[str]
Converts a BED record to a list of its BED field string equivalents.
- classmethod from_interval(interval: Interval) -> BedRecord
Construct a BedRecord from an Interval instance.
Note that Interval cannot represent a BedRecord with a missing strand. Converting a record with no strand to an Interval and then back to a BedRecord will result in a record with positive strand.
- Parameters:
interval – The Interval instance to convert.
- Returns:
A BedRecord corresponding to the same region specified in the interval.
- class pybedlite.BedSource(path: Path | str | IO[Any])
Reader for BED records stored in a BED file.
- num_fields
the number of BED fields present for records in this file. This will be set to the number of fields present in the first record parsed by this class. Note that while the official BED spec indicates that all BED records in the same file should be written with the same number of fields, we do not enforce that this is the case in the BED files this reader parses.
- class pybedlite.BedStrand(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Enumeration of strands for BED records.
- class pybedlite.BedWriter(path: Path | str | IO[Any], num_fields: int | None = None)
Writer class for writing BED records to a file.
- num_fields
The number of BED fields to report. Must be between 3 and 12.
- close() -> None
Close the BedWriter file.
Should be called after all records to write have been added.
- write(record: BedRecord, truncate: bool = False, add_missing: bool = False) -> None
Write a single BedRecord to the file.
- Parameters:
record – the BED record to write to the file.
truncate – if False and a BED record is passed with more fields than the writer is set to output, a ValueError will be raised. If True, such a record will be truncated to the number of fields written by this writer.
add_missing – if False and a BED record is passed with fewer fields than the writer is set to output, a ValueError will be raised. If True, such a record will be padded with ‘.’ for the missing fields, up to the number of fields written by this writer.
- write_all(records: Iterable[BedRecord], truncate: bool = False, add_missing: bool = False) -> None
Write multiple BedRecords to a file.
- Parameters:
records – the BED records to write to the file (must be iterable)
truncate – if False and a BED record is passed with more fields than the writer is set to output, a ValueError will be raised. If True, such records will be truncated to the number of fields written by this writer.
add_missing – if False and a BED record is passed with fewer fields than the writer is set to output, a ValueError will be raised. If True, such records will be padded with ‘.’ for the missing fields, up to the number of fields written by this writer.
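The truncate and add_missing rules can be sketched as a small standalone function. This illustrates the documented behavior and is not pybedlite's implementation; the name fit_fields and the plain list-of-strings representation of a record are assumptions made for the example.

```python
from typing import List


def fit_fields(
    fields: List[str],
    num_fields: int,
    truncate: bool = False,
    add_missing: bool = False,
) -> List[str]:
    """Hypothetical helper: fit a record's fields to the writer's field count."""
    if len(fields) > num_fields:
        if not truncate:
            raise ValueError(
                f"Record has {len(fields)} fields but the writer outputs {num_fields}"
            )
        # Keep only the leading fields the writer is configured to output.
        return fields[:num_fields]
    if len(fields) < num_fields:
        if not add_missing:
            raise ValueError(
                f"Record has {len(fields)} fields but the writer outputs {num_fields}"
            )
        # Pad the missing trailing fields with the '.' placeholder.
        return fields + ["."] * (num_fields - len(fields))
    return list(fields)


bed6 = ["chr1", "10", "20", "geneA", "0", "+"]
print("\t".join(fit_fields(bed6, num_fields=3, truncate=True)))
print("\t".join(fit_fields(bed6[:3], num_fields=6, add_missing=True)))
```

With both flags left at their defaults, any mismatch between the record and the writer raises a ValueError instead of silently changing the output.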
- pybedlite.reader(path: Path | str | IO[Any]) -> BedSource
Return a BedSource for reading the BED file.
- Parameters:
path – a file handle or path to the BED file to read.
- pybedlite.writer(path: Path | str | IO[Any], num_fields: int | None = None) -> BedWriter
Return a BedWriter for writing the BED file.
- Parameters:
path – a file handle or path to the BED file to write.
num_fields – the number of BED fields to write for each record. If None, this value will be set to the number of fields present in the first BED record written by this object.
Utility Classes for Querying Overlaps with Genomic Regions.
The OverlapDetector class detects and returns overlaps between
a set of regions and another region on a reference.
The overlap detector may contain a collection of interval-like Python objects that have the following properties:
refname (str): The reference sequence name
start (int): A 0-based start position
end (int): A 0-based half-open end position
This contract is described by the Span protocol.
Interval-like Python objects may also contain strandedness information which will be used
for sorting them in get_overlaps() using
the following property if it is present:
negative (bool): True if the interval is negatively stranded, False if the interval is unstranded or positively stranded
This contract is described by the StrandedSpan protocol.
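To make the structural contract concrete, the sketch below defines a stand-in protocol and a user-defined class that satisfies it without inheriting from anything. The names SpanLike, Gene, and is_negative are hypothetical and exist only for this example; the library's own protocols are Span and StrandedSpan, and their exact definitions may differ from what is shown here.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SpanLike(Protocol):
    """Hypothetical stand-in for the Span contract: refname, start, end."""

    @property
    def refname(self) -> str: ...

    @property
    def start(self) -> int: ...

    @property
    def end(self) -> int: ...


@dataclass(frozen=True)
class Gene:
    """A user-defined record that satisfies the contract structurally."""

    refname: str
    start: int  # 0-based
    end: int  # 0-based half-open
    symbol: str


def is_negative(span: Any) -> bool:
    """Treat a missing `negative` property as unstranded or positive."""
    return bool(getattr(span, "negative", False))


gene = Gene(refname="chr1", start=100, end=200, symbol="ABC1")
print(isinstance(gene, SpanLike), is_negative(gene))
```

Because the contract is structural, any object exposing the three properties qualifies, and the optional negative property only affects ordering.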
Examples of Detecting Overlaps
>>> from pybedlite.overlap_detector import Interval, OverlapDetector
>>> detector = OverlapDetector()
>>> query = Interval("chr1", 2, 20)
>>> detector.overlaps_any(query)
False
>>> detector.add(Interval("chr2", 1, 100))
>>> detector.add(Interval("chr1", 21, 100))
>>> detector.overlaps_any(query)
False
>>> detector.add(Interval("chr1", 1, 1))
>>> detector.overlaps_any(query)
True
>>> detector.get_overlaps(query)
[Interval("chr1", 1, 1)]
>>> detector.add(Interval("chr1", 3, 10))
>>> detector.overlaps_any(query)
True
>>> detector.get_overlaps(query)
[Interval("chr1", 1, 1), Interval("chr1", 3, 10)]
Module Contents
The module contains the following public classes:
Interval – Represents a region mapping to a reference sequence that is 0-based and open-ended.
OverlapDetector – Detects and returns overlaps between a set of regions and another region on a reference.
- class pybedlite.overlap_detector.Interval(refname: str, start: int, end: int, negative: bool = False, name: str | None = None)
A region mapping to a reference sequence that is 0-based and open-ended.
- negative
True if the interval is negatively stranded, False if the interval is unstranded or positively stranded
- Type:
bool
- classmethod from_bedrecord(record: BedRecord) -> Interval
Construct an Interval from a BedRecord instance.
Note that when the BedRecord does not have a specified strand, the Interval’s negative attribute is set to False. This mimics the behavior of OverlapDetector.from_bed() when reading a record that does not have a specified strand.
- Parameters:
record – The BedRecord instance to convert.
- Returns:
An Interval corresponding to the same region specified in the record.
- classmethod from_ucsc(string: str, name: str | None = None) -> Interval
Construct an Interval from a UCSC “position”-formatted string.
The “Position” format (referring to the “1-start, fully-closed” system as coordinates are “positioned” in the browser):
Written as: chr1:127140001-127140001
The location may optionally be followed by a parenthetically enclosed strand, e.g. chr1:127140001-127140001(+).
No spaces.
Includes punctuation: a colon after the chromosome, and a dash between the start and end coordinates.
When in this format, the assumption is that the coordinate is 1-start, fully-closed.
https://genome-blog.gi.ucsc.edu/blog/2016/12/12/the-ucsc-genome-browser-coordinate-counting-systems/
Note that when the string does not have a specified strand, the Interval’s negative attribute is set to False. This mimics the behavior of OverlapDetector.from_bed() when reading a record that does not have a specified strand.
- Parameters:
string – The UCSC “position”-formatted string.
name – An optional name for the interval.
- Returns:
An Interval corresponding to the same region specified in the string. Note that the Interval is zero-based open-ended.
- Raises:
ValueError – If the string is not a valid UCSC position-formatted string.
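The coordinate conversion described above (1-start, fully-closed input to 0-based, open-ended output) can be sketched with a regular expression. This is a simplified illustration, not the library's parser: the function name parse_ucsc_position, the returned tuple, and the exact pattern are assumptions, and the pattern assumes reference names that contain no colons or whitespace.

```python
import re
from typing import Tuple

# refname:start-end with an optional "(+)" or "(-)" strand suffix and no spaces.
_POSITION = re.compile(
    r"(?P<refname>[^\s:]+):(?P<start>\d+)-(?P<end>\d+)(?:\((?P<strand>[+-])\))?"
)


def parse_ucsc_position(string: str) -> Tuple[str, int, int, bool]:
    """Hypothetical helper returning (refname, start, end, negative)."""
    match = _POSITION.fullmatch(string)
    if match is None:
        raise ValueError(f"Not a valid UCSC position-formatted string: {string!r}")
    # A 1-based inclusive start becomes 0-based by subtracting one; the
    # 1-based inclusive end equals the 0-based exclusive end, so it is kept.
    start = int(match.group("start")) - 1
    end = int(match.group("end"))
    # A missing strand is treated as not negative.
    negative = match.group("strand") == "-"
    return match.group("refname"), start, end, negative


print(parse_ucsc_position("chr1:127140001-127140001"))
print(parse_ucsc_position("chr1:127140001-127140001(-)"))
```

A single-base position such as chr1:127140001-127140001 therefore maps to an interval of length one.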
- class pybedlite.overlap_detector.OverlapDetector(intervals: Iterable[SpanType] | None = None)
Detects and returns overlaps between a set of regions and another region on a reference.
The overlap detector may contain a collection of interval-like Python objects that have the following properties:
refname (str): The reference sequence name
start (int): A 0-based start position
end (int): A 0-based half-open end position
Interval-like Python objects may also contain strandedness information which will be used for sorting them in get_overlaps() using the following property if it is present:
negative (bool): True if the interval is negatively stranded, False if the interval is unstranded or positively stranded
The same interval may be added multiple times, but only a single instance will be returned when querying for overlaps.
This detector is the most efficient when all intervals are added ahead of time.
- add(interval: SpanType) -> None
Adds an interval to this detector.
- Parameters:
interval – the interval to add to this detector
- add_all(intervals: Iterable[SpanType]) -> None
Adds one or more intervals to this detector.
- Parameters:
intervals – the intervals to add to this detector
- classmethod from_bed(path: Path) -> OverlapDetector[BedRecord]
Builds an OverlapDetector from a BED file.
- Parameters:
path – the path to the BED file
- Returns:
An overlap detector for the regions in the BED file.
- get_enclosed(interval: Span) -> List[SpanType]
Returns the set of intervals in this detector that are enclosed by the query interval, i.e. target.start >= query.start and target.end <= query.end.
- Parameters:
interval – the query interval
- Returns:
The list of intervals in this detector that are enclosed within the query interval. The intervals will be returned sorted using the following sort keys:
The interval’s start (ascending)
The interval’s end (ascending)
The interval’s strand, positive or negative (assumed to be positive if undefined)
The interval’s reference sequence name (lexicographically)
- get_enclosing_intervals(interval: Span) -> List[SpanType]
Returns the set of intervals in this detector that wholly enclose the query interval, i.e. query.start >= target.start and query.end <= target.end.
- Parameters:
interval – the query interval
- Returns:
The list of intervals in this detector that enclose the query interval. The intervals will be returned sorted using the following sort keys:
The interval’s start (ascending)
The interval’s end (ascending)
The interval’s strand, positive or negative (assumed to be positive if undefined)
The interval’s reference sequence name (lexicographically)
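The two enclosure conditions quoted for get_enclosed() and get_enclosing_intervals() are mirror images of each other, which the following sketch makes explicit. Region, is_enclosed_by, and encloses are hypothetical names used only here, and the check that both spans share a refname is an assumption, since the conditions above mention only coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical span with 0-based, open-ended coordinates."""

    refname: str
    start: int
    end: int


def is_enclosed_by(target: Region, query: Region) -> bool:
    """True if the target lies entirely within the query."""
    return (
        target.refname == query.refname  # assumed: same reference sequence
        and target.start >= query.start
        and target.end <= query.end
    )


def encloses(target: Region, query: Region) -> bool:
    """True if the target wholly contains the query."""
    return is_enclosed_by(target=query, query=target)


query = Region("chr1", 10, 50)
print(is_enclosed_by(Region("chr1", 20, 30), query))
print(encloses(Region("chr1", 0, 100), query))
```

An interval with exactly the query's coordinates satisfies both conditions, because the comparisons are inclusive.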
- get_overlaps(interval: Span) -> List[SpanType]
Returns any intervals in this detector that overlap the given interval.
- Parameters:
interval – the interval to check
- Returns:
The list of intervals in this detector that overlap the given interval, or an empty list if no overlaps exist. The intervals will be returned sorted using the following sort keys:
The interval’s start (ascending)
The interval’s end (ascending)
The interval’s strand, positive or negative (assumed to be positive if undefined)
The interval’s reference sequence name (lexicographically)
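The four sort keys listed above can be expressed as a single tuple key. The sketch below is a standalone illustration and not the detector's code; Region and sort_key are hypothetical names, and placing positive strands before negative ones is an assumption, since the list does not state the direction of the strand ordering.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical stranded span with 0-based, open-ended coordinates."""

    refname: str
    start: int
    end: int
    negative: bool = False


def sort_key(span: Any) -> Tuple[int, int, bool, str]:
    """Order by start, end, strand, then reference sequence name."""
    # A span without a `negative` property is assumed to be positive.
    negative = bool(getattr(span, "negative", False))
    return (span.start, span.end, negative, span.refname)


regions = [
    Region("chr1", 5, 10, negative=True),
    Region("chr1", 5, 10),
    Region("chr1", 1, 20),
    Region("chr1", 5, 8),
]
ordered = sorted(regions, key=sort_key)
for region in ordered:
    print(region)
```

Python compares tuples element by element, so later keys only break ties left by earlier ones.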
- class pybedlite.overlap_detector.Span(*args, **kwargs)
A structural type for a span on a reference sequence with zero-based open-ended coordinates.
- class pybedlite.overlap_detector.SpanType
A generic reference sequence span. This type variable is used for describing the generic type contained within the OverlapDetector.
alias of TypeVar(‘SpanType’, bound=Span | StrandedSpan)