Utilities

ccanalyser.utils.invert_dict(d: dict) -> Generator[Tuple[str, str], None, None]

Inverts key: value pairs into value: key pairs
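A minimal sketch of the inversion, assuming the function simply yields each pair with its elements swapped. `invert_pairs` is a hypothetical stand-in written for illustration, not the library's implementation.

```python
from typing import Dict, Generator, Tuple


def invert_pairs(d: Dict[str, str]) -> Generator[Tuple[str, str], None, None]:
    """Hypothetical stand-in: yield (value, key) for every key: value pair."""
    for key, value in d.items():
        yield value, key


# The result is a generator, so wrap it in dict() to get the inverted mapping.
inverted = dict(invert_pairs({"sample_1": "reads_1.fastq", "sample_2": "reads_2.fastq"}))
```

Because the signature returns a generator, duplicated values in the input would produce repeated keys once collected into a dict, and later pairs would overwrite earlier ones.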

ccanalyser.utils.is_on(param: str) -> bool

Returns True if the parameter is one of the “on” values

On values:
  • true

  • t

  • on

  • yes

  • y

  • 1
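The check above can be sketched as a set membership test. `looks_on` is a hypothetical stand-in, and lowercasing the input before comparison is an assumption that the listed values do not confirm.

```python
# The "on" values listed in the documentation.
ON_VALUES = {"true", "t", "on", "yes", "y", "1"}


def looks_on(param: str) -> bool:
    """Hypothetical stand-in; case-insensitive matching is an assumption."""
    return str(param).lower() in ON_VALUES
```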

ccanalyser.utils.is_off(param: str)

Returns True if the parameter is one of the “off” values

ccanalyser.utils.is_none(param: str) -> bool

Returns True if parameter is none

ccanalyser.utils.get_human_readable_number_of_bp(bp: int) -> str

Converts an integer into a human-readable number of base pairs
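A rough sketch of this kind of conversion. The thresholds, the `bp`/`kb`/`mb` suffixes and the number formatting are all assumptions, since the documentation does not state the output format; `human_readable_bp` is a hypothetical name.

```python
def human_readable_bp(bp: int) -> str:
    """Hypothetical stand-in; suffixes and thresholds are assumptions."""
    if bp < 1_000:
        return f"{bp}bp"
    if bp < 1_000_000:
        # ":g" drops trailing zeros, so 5000 becomes "5kb" rather than "5.0kb".
        return f"{bp / 1e3:g}kb"
    return f"{bp / 1e6:g}mb"
```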

ccanalyser.utils.is_valid_bed(bed: Union[str, pybedtools.bedtool.BedTool], verbose=True) -> bool

Returns true if bed file can be opened and has at least 3 columns
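To illustrate the two conditions (the file opens, and every record has at least 3 columns), here is a plain-Python sketch that avoids pybedtools entirely. `has_three_columns` is hypothetical, and treating an empty file as invalid is an assumption.

```python
import os
import tempfile


def has_three_columns(path: str) -> bool:
    """Hypothetical plain-Python check, not the pybedtools-based original."""
    try:
        with open(path) as handle:
            rows = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    except OSError:
        return False  # the file could not be opened
    # Assumption: a file with no records is treated as invalid.
    return bool(rows) and all(len(row) >= 3 for row in rows)


# Write a two-interval BED file to a temporary location and check it.
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
    tmp.write("chr1\t1000\t2000\nchr2\t500\t900\n")

valid = has_three_columns(tmp.name)
missing = has_three_columns(tmp.name + ".does_not_exist")
os.remove(tmp.name)
```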

ccanalyser.utils.bed_has_name(bed: Union[str, pybedtools.bedtool.BedTool]) -> bool

Returns true if bed file has at least 4 columns

ccanalyser.utils.bed_has_duplicate_names(bed: Union[str, pybedtools.bedtool.BedTool]) -> bool

Returns true if bed file has no duplicated names

ccanalyser.utils.get_re_site(recognition_site: Optional[str] = None) -> str

Obtains the recognition sequence for a supplied restriction enzyme, or correctly formats a supplied recognition sequence.

Parameters
  • cut_sequence – DNA sequence to use for fasta digestion e.g. "GATC"

  • restriction_enzyme – Name of restriction enzyme e.g. DpnII

Returns

recognition sequence e.g. “GATC”

Raises

ValueError – Error if restriction_enzyme is not in known enzymes
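The lookup described above might be sketched as follows. `lookup_re_site` and the `KNOWN_ENZYMES` table are hypothetical: the enzymes listed are a small illustrative subset, and deciding that any string made only of G, A, T and C is a sequence rather than an enzyme name is an assumption.

```python
# Illustrative subset of enzymes and their recognition sequences (assumed table).
KNOWN_ENZYMES = {
    "dpnii": "GATC",
    "mboi": "GATC",
    "hindiii": "AAGCTT",
    "nlaiii": "CATG",
}


def lookup_re_site(recognition_site: str) -> str:
    """Hypothetical stand-in: return an upper-case recognition sequence."""
    # Assumption: a string made only of nucleotide letters is already a sequence.
    if set(recognition_site.upper()) <= set("GATC"):
        return recognition_site.upper()
    try:
        return KNOWN_ENZYMES[recognition_site.lower()]
    except KeyError:
        raise ValueError(f"{recognition_site} is not a known restriction enzyme") from None
```

With this sketch, `lookup_re_site("DpnII")` and `lookup_re_site("gatc")` both give "GATC", while an unrecognised name raises ValueError.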

ccanalyser.utils.hash_column(col: Iterable, hash_type=64) -> list

Convenience function to perform hashing using xxhash on an iterable.

Function is not vectorised.
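A sketch of element-by-element hashing. To stay within the standard library it swaps in `hashlib.blake2b` with an 8-byte digest in place of xxhash, so the hash values will differ from the real function; `hash_items` is a hypothetical name.

```python
import hashlib
from typing import Iterable, List


def hash_items(col: Iterable) -> List[int]:
    """Hypothetical stand-in: 64-bit hash of each item, one at a time."""
    hashes = []
    for item in col:  # plain Python loop, i.e. not vectorised
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8)
        hashes.append(int(digest.hexdigest(), 16))
    return hashes


read_hashes = hash_items(["read_1", "read_2", "read_1"])
```

Identical items map to identical integers, which is what makes hashed read names usable as compact keys.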

ccanalyser.utils.split_intervals_on_chrom(intervals: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) -> dict

Creates dictionary from bed file with the chroms as keys
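The grouping idea can be sketched on plain tuples. `group_by_chrom` is hypothetical, and storing each chromosome's intervals as a list of (start, end) tuples is an assumption; the documentation does not say what type the dictionary values have.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_chrom(
    intervals: Iterable[Tuple[str, int, int]]
) -> Dict[str, List[Tuple[int, int]]]:
    """Hypothetical stand-in: collect (start, end) pairs under each chromosome."""
    grouped = defaultdict(list)
    for chrom, start, end in intervals:
        grouped[chrom].append((start, end))
    return dict(grouped)


by_chrom = group_by_chrom(
    [("chr1", 100, 200), ("chr2", 50, 80), ("chr1", 300, 400)]
)
```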

ccanalyser.utils.intersect_bins(bins_1: pandas.core.frame.DataFrame, bins_2: pandas.core.frame.DataFrame, **bedtools_kwargs) -> pandas.core.frame.DataFrame

Intersects two sets of genomic intervals using bedtools intersect.

Formats the intersection with clearer column names than those pybedtools assigns automatically.

ccanalyser.utils.load_json(fn, dtype: str = 'int') -> dict

Convenience function to load a gzipped JSON file using xopen.

ccanalyser.utils.get_timing(task_name=None) -> Callable

Decorator: Gets the time taken by the wrapped function
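A timing decorator that accepts a task name is usually written as a decorator factory. The sketch below shows that general pattern; `timed` is a hypothetical name, and printing the elapsed time is an assumption about how the measurement is reported.

```python
import functools
import time
from typing import Callable, Optional


def timed(task_name: Optional[str] = None) -> Callable:
    """Hypothetical stand-in: report how long the wrapped function takes."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Assumption: the timing is printed; the original may log it instead.
            print(f"Completed {task_name or func.__name__} in {elapsed:.4f} s")
            return result

        return wrapper

    return decorator


@timed(task_name="summing")
def add_up(values):
    return sum(values)


total = add_up(range(1000))
```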

ccanalyser.utils.convert_to_bedtool(bed: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) -> pybedtools.bedtool.BedTool

Converts a str or pd.DataFrame to a pybedtools.BedTool object

ccanalyser.utils.categorise_tracks(ser: pandas.core.series.Series) -> list

Gets a series for grouping tracks together

Parameters

ser (pd.Series) – File names to map

Returns

Mapping for grouping.

Return type

list

ccanalyser.utils.convert_bed_to_dataframe(bed: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) -> pandas.core.frame.DataFrame

Converts a bed-like object (including paths to bed files) to a pd.DataFrame

ccanalyser.utils.format_coordinates(coordinates: Union[str, os.PathLike]) -> pybedtools.bedtool.BedTool

Converts coordinates supplied in string format or a .bed file to a BedTool.

Parameters

coordinates (Union[str, os.PathLike]) – Coordinates in the form chr:start-end or a path.

Raises

ValueError – Inputs must be supplied in the correct format.

Returns

BedTool object containing the required coordinates.

Return type

BedTool
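The string branch of this conversion depends on parsing chr:start-end. A minimal sketch of that parsing step is shown below; `parse_coordinates` and its regular expression are hypothetical, and the sketch returns a plain tuple rather than building a BedTool.

```python
import re
from typing import Tuple

# chromosome name, a colon, then integer start and end separated by a hyphen
COORD_PATTERN = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_coordinates(coordinates: str) -> Tuple[str, int, int]:
    """Hypothetical stand-in: split 'chr:start-end' into its three parts."""
    match = COORD_PATTERN.match(coordinates)
    if match is None:
        raise ValueError(f"Expected chr:start-end, got {coordinates!r}")
    return match["chrom"], int(match["start"]), int(match["end"])
```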

ccanalyser.utils.convert_interval_to_coords(interval: Union[pybedtools.cbedtools.Interval, dict], named=False) -> Tuple[str]

Converts interval object to standard genomic coordinates.

e.g. chr1:1000-2000

Parameters

interval (Union[pybedtools.Interval, dict]) – Interval to convert.

Returns

Genomic coordinates in the format chr:start-end

Return type

Tuple
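For a dict input, the formatting step can be sketched as below. `interval_to_coords` is hypothetical, the `chrom`, `start` and `end` key names are assumptions, and the sketch covers only the coordinate string, not the `named` option.

```python
def interval_to_coords(interval: dict) -> str:
    """Hypothetical stand-in: format a dict interval as chrom:start-end."""
    # Assumed key names: "chrom", "start", "end".
    return f"{interval['chrom']}:{interval['start']}-{interval['end']}"


coords = interval_to_coords({"chrom": "chr1", "start": 1000, "end": 2000})
```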

class ccanalyser.utils.PysamFakeEntry(name, sequence, quality)

Bases: object

Testing class used to supply a pysam FastqProxy-like object
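A stand-in of this kind only needs to expose the attributes that the code under test reads. The sketch below is an assumption about what such a class might look like; `FakeFastqEntry` and its `__str__` output (a four-line FASTQ record) are hypothetical and not taken from the library.

```python
class FakeFastqEntry:
    """Hypothetical test double exposing name, sequence and quality attributes."""

    def __init__(self, name: str, sequence: str, quality: str):
        self.name = name
        self.sequence = sequence
        self.quality = quality

    def __str__(self) -> str:
        # Assumption: render as a standard four-line FASTQ record.
        return f"@{self.name}\n{self.sequence}\n+\n{self.quality}"


entry = FakeFastqEntry("read_1", "GATCGATC", "IIIIIIII")
record = str(entry)
```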