Utilities

ccanalyser.utils.invert_dict(d: dict) → Generator[Tuple[str, str], None, None]

Inverts key: value pairs into value: key pairs, yielding (value, key) tuples.
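Because the return type is a generator of tuples, the inverted pairs are typically consumed with dict(). The following is a minimal stdlib sketch of the documented behaviour, not the ccanalyser source; the enzyme/site mapping is illustrative data.

```python
from typing import Dict, Generator, Tuple


def invert_dict(d: Dict[str, str]) -> Generator[Tuple[str, str], None, None]:
    """Sketch: yield (value, key) for every (key, value) pair in d."""
    for key, value in d.items():
        yield value, key


# Passing the generator to dict() builds the inverted mapping.
# If two keys share a value, the later key wins.
sites = {"DpnII": "GATC", "NlaIII": "CATG"}
print(dict(invert_dict(sites)))  # {'GATC': 'DpnII', 'CATG': 'NlaIII'}
```

Note that duplicate values collapse when the pairs are turned back into a dictionary, so the inversion is only lossless for one-to-one mappings.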

ccanalyser.utils.is_on(param: str) → bool

Returns True if the parameter is one of the “on” values.

On values:

true
t
on
yes
y
1
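A minimal sketch of this membership test using the values listed above. Normalising case and surrounding whitespace is an assumption made here; the documentation does not say whether ccanalyser does so.

```python
ON_VALUES = {"true", "t", "on", "yes", "y", "1"}


def is_on(param: str) -> bool:
    # Assumption: comparison ignores case and surrounding whitespace.
    return str(param).strip().lower() in ON_VALUES


print(is_on("Yes"), is_on("off"))  # True False
```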

ccanalyser.utils.is_off(param: str)

Returns True if the parameter is one of the “off” values.

ccanalyser.utils.is_none(param: str) → bool

Returns True if the parameter is none.

ccanalyser.utils.get_human_readable_number_of_bp(bp: int) → str

Converts an integer into a human-readable number of base pairs.
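The exact output format is not documented. The sketch below assumes bp/kb/Mb/Gb suffixes with truncating (floor) division; the real function may round or format differently.

```python
def get_human_readable_number_of_bp(bp: int) -> str:
    # Assumed units and floor division; ccanalyser's actual format may differ.
    for divisor, suffix in ((10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")):
        if bp >= divisor:
            return f"{bp // divisor}{suffix}"
    return f"{bp}bp"


print(get_human_readable_number_of_bp(1_500))  # 1kb
```

Checking the largest unit first means each value is reported with the biggest suffix that keeps it at or above 1.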

ccanalyser.utils.is_valid_bed(bed: Union[str, pybedtools.bedtool.BedTool], verbose=True) → bool

Returns True if the BED file can be opened and has at least 3 columns.

ccanalyser.utils.bed_has_name(bed: Union[str, pybedtools.bedtool.BedTool]) → bool

Returns True if the BED file has at least 4 columns, i.e. a name column.
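Both checks above reduce to counting the fields of a BED record: at least 3 (chrom, start, end) for a valid file and at least 4 for a name column. This stdlib sketch uses a hypothetical helper, count_bed_columns, that is not part of ccanalyser; it assumes tab-separated fields and skips blank, "#", "track" and "browser" header lines.

```python
from typing import Iterable

HEADER_PREFIXES = ("#", "track", "browser")


def count_bed_columns(lines: Iterable[str]) -> int:
    """Hypothetical helper: number of tab-separated fields in the first data line (0 if none)."""
    for line in lines:
        if not line.strip() or line.startswith(HEADER_PREFIXES):
            continue  # skip blank lines and BED header lines
        return len(line.rstrip("\n").split("\t"))
    return 0


bed_lines = ["track name=captures\n", "chr1\t1000\t2000\tcapture_1\n"]
n_cols = count_bed_columns(bed_lines)
print(n_cols >= 3, n_cols >= 4)  # True True
```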

ccanalyser.utils.bed_has_duplicate_names(bed: Union[str, pybedtools.bedtool.BedTool]) → bool

Returns True if the BED file has no duplicated names.

ccanalyser.utils.get_re_site(recognition_site: Optional[str] = None) → str

Obtains the recognition sequence for a supplied restriction enzyme, or correctly formats a supplied recognition sequence.

Parameters

- cut_sequence – DNA sequence to use for FASTA digestion, e.g. “GATC”.
- restriction_enzyme – Name of a restriction enzyme, e.g. DpnII.

Returns

Recognition sequence, e.g. “GATC”.

Raises

ValueError – If restriction_enzyme is not one of the known enzymes.
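The lookup-or-normalise logic described above can be sketched as follows. The three-entry enzyme table, the case-insensitive name matching and the ACGTN check are assumptions for illustration; ccanalyser's own list of known enzymes is not shown here.

```python
from typing import Optional

# Illustrative table only; a real tool would draw on a complete enzyme database.
KNOWN_ENZYMES = {"dpnii": "GATC", "nlaiii": "CATG", "hindiii": "AAGCTT"}
VALID_BASES = set("ACGTN")


def get_re_site(recognition_site: Optional[str] = None) -> str:
    if recognition_site is None:
        raise ValueError("A restriction enzyme name or recognition sequence is required")
    key = recognition_site.strip()
    # Enzyme names are matched case-insensitively (an assumption).
    if key.lower() in KNOWN_ENZYMES:
        return KNOWN_ENZYMES[key.lower()]
    # Otherwise treat the input as a DNA sequence and normalise it to upper case.
    if key and set(key.upper()) <= VALID_BASES:
        return key.upper()
    raise ValueError(f"{recognition_site!r} is not a known enzyme or a valid DNA sequence")


print(get_re_site("DpnII"), get_re_site("catg"))  # GATC CATG
```

Checking the enzyme table first means a name is never misread as a sequence.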

ccanalyser.utils.hash_column(col: Iterable, hash_type=64) → list

Convenience function to perform hashing using xxhash on an iterable.

The function is not vectorised: each element is hashed individually.
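To show the element-by-element pattern without a third-party dependency, this sketch swaps xxhash for BLAKE2b from the standard library's hashlib, truncated to hash_type bits. The hash values will therefore differ from ccanalyser's; only the structure (one hash call per element, bit width chosen by hash_type) is meant to carry over.

```python
import hashlib
from typing import Iterable, List


def hash_column(col: Iterable, hash_type: int = 64) -> List[int]:
    # Stand-in: BLAKE2b instead of xxhash, with a digest of hash_type bits.
    n_bytes = hash_type // 8
    hashes = []
    for value in col:  # one call per element, i.e. not vectorised
        digest = hashlib.blake2b(str(value).encode(), digest_size=n_bytes).digest()
        hashes.append(int.from_bytes(digest, "big"))
    return hashes


print(hash_column(["read_1", "read_2", "read_1"]))
```

Identical inputs always map to identical integers, which is what makes hashed columns useful for deduplication.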

ccanalyser.utils.split_intervals_on_chrom(intervals: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) → dict

Creates a dictionary from a BED file, with chromosome names as keys.
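The grouping idea can be sketched with plain (chrom, start, end) tuples standing in for the BED, BedTool or DataFrame inputs the real function accepts. What the real dictionary values contain is not documented, so the lists of tuples here are an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]


def split_intervals_on_chrom(intervals: Iterable[Interval]) -> Dict[str, List[Interval]]:
    # Sketch: group intervals by their first field (the chromosome).
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for interval in intervals:
        by_chrom[interval[0]].append(interval)
    return dict(by_chrom)


bins = [("chr1", 0, 100), ("chr2", 0, 50), ("chr1", 100, 200)]
print(split_intervals_on_chrom(bins))
# {'chr1': [('chr1', 0, 100), ('chr1', 100, 200)], 'chr2': [('chr2', 0, 50)]}
```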

ccanalyser.utils.intersect_bins(bins_1: pandas.core.frame.DataFrame, bins_2: pandas.core.frame.DataFrame, **bedtools_kwargs) → pandas.core.frame.DataFrame

Intersects two sets of genomic intervals using bedtools intersect.

Formats the intersection with clearer column names than those pybedtools assigns automatically.
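The overlap rule behind an interval intersection can be sketched in pure Python. This naive O(n × m) version is not bedtools: it mimics the spirit of `bedtools intersect -wo` (both records plus the number of overlapping base pairs) on half-open intervals, and the column names chrom_1, start_2, overlap, etc. are hypothetical.

```python
from typing import Dict, List, Tuple

Bin = Tuple[str, int, int]


def intersect_bins(bins_1: List[Bin], bins_2: List[Bin]) -> List[Dict[str, object]]:
    """Sketch: naive pairwise overlap of half-open intervals with explicit column names."""
    rows = []
    for chrom_1, start_1, end_1 in bins_1:
        for chrom_2, start_2, end_2 in bins_2:
            # Same chromosome, and the half-open ranges share at least 1 bp.
            if chrom_1 == chrom_2 and start_1 < end_2 and start_2 < end_1:
                rows.append({
                    "chrom_1": chrom_1, "start_1": start_1, "end_1": end_1,
                    "chrom_2": chrom_2, "start_2": start_2, "end_2": end_2,
                    "overlap": min(end_1, end_2) - max(start_1, start_2),
                })
    return rows


rows = intersect_bins([("chr1", 0, 100)], [("chr1", 50, 150), ("chr1", 100, 200)])
print(rows[0]["overlap"], len(rows))  # 50 1
```

Intervals that merely touch (one ends where the other starts) do not overlap under half-open coordinates, which matches BED semantics.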

ccanalyser.utils.load_json(fn, dtype: str = 'int') → dict

Convenience function to load a gzipped JSON file using xopen.
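A stdlib sketch of the same loading step, with gzip.open standing in for xopen (xopen can also handle other compression formats, which this sketch does not). The dtype parameter is omitted because its effect is not documented; the file name and contents are illustrative.

```python
import gzip
import json
import os
import tempfile


def load_json(fn: str) -> dict:
    # Text mode ("rt") decompresses and decodes, so json.load can parse directly.
    with gzip.open(fn, "rt") as handle:
        return json.load(handle)


# Write a small gzipped JSON file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "counts.json.gz")
with gzip.open(path, "wt") as handle:
    json.dump({"chr1": 10}, handle)

print(load_json(path))  # {'chr1': 10}
```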

ccanalyser.utils.get_timing(task_name=None) → Callable

Decorator that measures the time taken by the wrapped function.
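Because get_timing takes a task_name argument and returns a Callable, it behaves as a decorator factory: it is applied as `@get_timing(...)`, with parentheses. This sketch reports the elapsed time with print; how ccanalyser actually reports timings (print, logging, or otherwise) is an assumption, as is the fallback to the function name.

```python
import functools
import time
from typing import Callable, Optional


def get_timing(task_name: Optional[str] = None) -> Callable:
    def decorator(func: Callable) -> Callable:
        label = task_name or func.__name__  # assumed fallback label

        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"{label} completed in {elapsed:.3f} s")  # assumed reporting
            return result

        return wrapper

    return decorator


@get_timing(task_name="digest")
def digest(n: int) -> int:
    return sum(range(n))


digest(1_000)  # returns 499500 and prints the elapsed time
```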

ccanalyser.utils.convert_to_bedtool(bed: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) → pybedtools.bedtool.BedTool

Converts a str or pd.DataFrame to a pybedtools.BedTool object.

ccanalyser.utils.categorise_tracks(ser: pandas.core.series.Series) → list

Gets a series for grouping tracks together

Parameters: ser (pd.Series) – File names to map
Returns: Mapping for grouping.
Return type: list

ccanalyser.utils.convert_bed_to_dataframe(bed: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame

Converts a BED-like object (including paths to BED files) to a pd.DataFrame.

ccanalyser.utils.format_coordinates(coordinates: Union[str, os.PathLike]) → pybedtools.bedtool.BedTool

Converts coordinates supplied as a string or as a .bed file to a BedTool.

Parameters: coordinates (Union[str, os.PathLike]) – Coordinates in the form chr:start-end or a path.
Raises: ValueError – Inputs must be supplied in the correct format.
Returns: BedTool object containing the required coordinates.
Return type: BedTool
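The string branch of this conversion comes down to parsing chr:start-end. The sketch below uses a hypothetical helper, parse_coordinates, that returns a plain tuple rather than a BedTool; the regular expression, the rejection of thousands separators and the end > start check are assumptions, not documented ccanalyser behaviour.

```python
import re
from typing import Tuple

COORD_PATTERN = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_coordinates(coordinates: str) -> Tuple[str, int, int]:
    """Hypothetical helper: parse 'chr:start-end' into (chrom, start, end)."""
    match = COORD_PATTERN.match(coordinates.strip())
    if match is None:
        raise ValueError(f"Coordinates must be in the form chr:start-end, got {coordinates!r}")
    chrom = match.group("chrom")
    start, end = int(match.group("start")), int(match.group("end"))
    if end <= start:
        raise ValueError("end must be greater than start")
    return chrom, start, end


print(parse_coordinates("chr1:1000-2000"))  # ('chr1', 1000, 2000)
```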

ccanalyser.utils.convert_interval_to_coords(interval: Union[pybedtools.cbedtools.Interval, dict], named=False) → Tuple[str]

Converts an interval object to standard genomic coordinates, e.g. chr1:1000-2000.

Parameters: interval (Union[pybedtools.Interval, dict]) – Interval to convert.
Returns: Genomic coordinates in the format chr:start-end
Return type: Tuple
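The formatting step itself is a single string template. This sketch uses a hypothetical function, interval_to_coords, that takes a dict whose keys are assumed to mirror the chrom, start and end attributes of a pybedtools Interval. It does not reproduce the named option or the tuple return type, whose exact behaviour is not documented here.

```python
from typing import Mapping, Union


def interval_to_coords(interval: Mapping[str, Union[str, int]]) -> str:
    # Hypothetical helper; keys assumed to match Interval.chrom/.start/.end.
    return f"{interval['chrom']}:{interval['start']}-{interval['end']}"


print(interval_to_coords({"chrom": "chr1", "start": 1000, "end": 2000}))
# chr1:1000-2000
```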

class ccanalyser.utils.PysamFakeEntry(name, sequence, quality)

Bases: object

Test helper class that supplies an object mimicking a pysam FastqProxy.
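A fake like this works through duck typing: code under test that only reads .name, .sequence and .quality cannot tell it apart from a real pysam record. The sketch below uses a hypothetical dataclass, FakeFastqEntry, and a hypothetical function, gc_fraction, to show the pattern; it is not the ccanalyser class.

```python
from dataclasses import dataclass


@dataclass
class FakeFastqEntry:
    """Hypothetical stand-in exposing the name/sequence/quality attributes of a FASTQ record."""

    name: str
    sequence: str
    quality: str


def gc_fraction(read) -> float:
    # Only .sequence is accessed, so the fake is sufficient for testing.
    seq = read.sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


read = FakeFastqEntry(name="read_1", sequence="GATCGATC", quality="IIIIIIII")
print(gc_fraction(read))  # 0.5
```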