Utilities
- ccanalyser.utils.invert_dict(d: dict) → Generator[Tuple[str, str], None, None]
Inverts key: value pairs into value: key pairs
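A minimal sketch of the inversion described above, written from the signature alone rather than copied from the library. The name `invert_dict_sketch` and the sample data are hypothetical.

```python
from typing import Dict, Generator, Tuple


def invert_dict_sketch(d: Dict[str, str]) -> Generator[Tuple[str, str], None, None]:
    """Yield a (value, key) tuple for every key: value pair in d."""
    for key, value in d.items():
        yield value, key


# Hypothetical sample data for illustration only.
sample_to_group = {"sample_1": "control", "sample_2": "treated"}
inverted = dict(invert_dict_sketch(sample_to_group))
print(inverted)
```

Because the function is a generator, the caller decides how to collect the pairs. Collecting into a `dict`, as here, keeps only the last key for any value that occurs more than once.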
- ccanalyser.utils.is_on(param: str) → bool
Returns True if the parameter is one of the “on” values
- On values:
true
t
on
yes
y
1
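The membership test can be sketched as below. The set of values is taken from the list above; ignoring case and surrounding whitespace is an assumption, and `is_on_sketch` is a hypothetical name.

```python
ON_VALUES = {"true", "t", "on", "yes", "y", "1"}


def is_on_sketch(param: str) -> bool:
    """Return True if param is one of the listed "on" values."""
    # Assumption: the comparison ignores case and surrounding whitespace.
    return str(param).strip().lower() in ON_VALUES


print(is_on_sketch("Yes"))
print(is_on_sketch("off"))
```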
- ccanalyser.utils.is_off(param: str)
Returns True if the parameter is one of the “off” values
- ccanalyser.utils.is_none(param: str) → bool
Returns True if parameter is none
- ccanalyser.utils.get_human_readable_number_of_bp(bp: int) → str
Converts an integer into a human-readable number of base pairs
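The exact output format is not documented here, so the following is only one plausible sketch of such a conversion. The unit suffixes (`bp`, `kb`, `mb`), the thresholds and the name `human_readable_bp_sketch` are all assumptions.

```python
def human_readable_bp_sketch(bp: int) -> str:
    """Format a base pair count with an assumed bp/kb/mb suffix."""
    if bp < 1_000:
        return f"{bp}bp"
    if bp < 1_000_000:
        # The "g" format drops a trailing ".0", so 5000 becomes "5kb".
        return f"{bp / 1_000:g}kb"
    return f"{bp / 1_000_000:g}mb"


for size in (999, 5_000, 1_500_000):
    print(size, human_readable_bp_sketch(size))
```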
- ccanalyser.utils.is_valid_bed(bed: Union[str, pybedtools.bedtool.BedTool], verbose=True) → bool
Returns True if the bed file can be opened and has at least 3 columns
- ccanalyser.utils.bed_has_name(bed: Union[str, pybedtools.bedtool.BedTool]) → bool
Returns True if the bed file has at least 4 columns
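The column checks behind `is_valid_bed` and `bed_has_name` can be illustrated without pybedtools by counting tab-separated fields. This is a sketch on in-memory text, not the library's implementation; `count_bed_columns` and the sample records are hypothetical.

```python
def count_bed_columns(bed_text: str) -> int:
    """Return the smallest field count over all data lines, or 0 if there are none."""
    counts = [
        len(line.split("\t"))
        for line in bed_text.splitlines()
        # Skip blank lines and the usual BED header/comment lines.
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    return min(counts) if counts else 0


# Hypothetical BED records: chrom, start, end, name.
bed_text = "chr1\t1000\t2000\tviewpoint_a\nchr2\t500\t900\tviewpoint_b\n"
n_columns = count_bed_columns(bed_text)
print("valid bed:", n_columns >= 3)
print("has name column:", n_columns >= 4)
```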
- ccanalyser.utils.bed_has_duplicate_names(bed: Union[str, pybedtools.bedtool.BedTool]) → bool
Returns True if the bed file has no duplicated names
- ccanalyser.utils.get_re_site(recognition_site: Optional[str] = None) → str
Obtains the recognition sequence for a supplied restriction enzyme, or correctly formats a supplied recognition sequence.
- Parameters
- cut_sequence – DNA sequence to use for fasta digestion, e.g. "GATC"
- restriction_enzyme – Name of restriction enzyme, e.g. DpnII
- Returns
recognition sequence e.g. “GATC”
- Raises
ValueError – Error if restriction_enzyme is not in known enzymes
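The lookup-or-format behaviour described above can be sketched as follows. The enzyme table is a small illustrative subset and not the library's list, and the handling of a missing argument is an assumption; `get_re_site_sketch` and `KNOWN_ENZYMES` are hypothetical names.

```python
from typing import Optional

# Illustrative subset only; the library's table of known enzymes may differ.
KNOWN_ENZYMES = {"dpnii": "GATC", "mboi": "GATC", "hindiii": "AAGCTT", "ecori": "GAATTC"}


def get_re_site_sketch(recognition_site: Optional[str] = None) -> str:
    """Return an upper-case recognition sequence for an enzyme name or a raw sequence."""
    # Assumption: a missing or empty argument is treated as an error.
    if not recognition_site or not recognition_site.strip():
        raise ValueError("No restriction enzyme or recognition sequence supplied")

    cleaned = recognition_site.strip()
    # A string made up only of A, C, G and T is taken to be a sequence already.
    if set(cleaned.upper()) <= set("ACGT"):
        return cleaned.upper()

    try:
        return KNOWN_ENZYMES[cleaned.lower()]
    except KeyError:
        raise ValueError(f"{cleaned} is not a known restriction enzyme") from None


print(get_re_site_sketch("DpnII"))
print(get_re_site_sketch("gatc"))
```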
- ccanalyser.utils.hash_column(col: Iterable, hash_type=64) → list
Convenience function to perform hashing using xxhash on an iterable.
Function is not vectorised.
- ccanalyser.utils.split_intervals_on_chrom(intervals: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) → dict
Creates dictionary from bed file with the chroms as keys
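The grouping idea can be shown on plain tuples. The type of the dictionary values in the library is not stated here, so the lists of `(start, end)` pairs below are an assumption, as is the name `split_on_chrom_sketch`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_on_chrom_sketch(
    intervals: Iterable[Tuple[str, int, int]]
) -> Dict[str, List[Tuple[int, int]]]:
    """Group (chrom, start, end) records into a dictionary keyed by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    return dict(by_chrom)


# Hypothetical intervals for illustration only.
records = [("chr1", 100, 200), ("chr2", 50, 80), ("chr1", 300, 400)]
grouped = split_on_chrom_sketch(records)
print(grouped)
```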
- ccanalyser.utils.intersect_bins(bins_1: pandas.core.frame.DataFrame, bins_2: pandas.core.frame.DataFrame, **bedtools_kwargs) → pandas.core.frame.DataFrame
Intersects two sets of genomic intervals using bedtools intersect.
Names the columns of the intersection more clearly than the names pybedtools generates automatically.
- ccanalyser.utils.load_json(fn, dtype: str = 'int') → dict
Convenience function to load a gzipped JSON file using xopen.
- ccanalyser.utils.get_timing(task_name=None) → Callable
Decorator: Gets the time taken by the wrapped function
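A decorator that takes an optional task name, as the signature suggests, is usually written as a decorator factory. The sketch below shows that pattern; the printed message and the names `get_timing_sketch` and `sum_of_squares` are assumptions, and the library may report the timing differently (for example through logging).

```python
import functools
import time
from typing import Callable, Optional


def get_timing_sketch(task_name: Optional[str] = None) -> Callable:
    """Return a decorator that reports how long the wrapped function took."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Fall back to the function name when no task name is given.
            print(f"Completed {task_name or func.__name__} in {elapsed:.4f} s")
            return result

        return wrapper

    return decorator


@get_timing_sketch(task_name="sum of squares")
def sum_of_squares(n: int) -> int:
    return sum(i * i for i in range(n))


total = sum_of_squares(1000)
```

The wrapper returns the wrapped function's result unchanged, so adding the decorator does not alter the calling code.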
- ccanalyser.utils.convert_to_bedtool(bed: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) → pybedtools.bedtool.BedTool
Converts a str or pd.DataFrame to a pybedtools.BedTool object
- ccanalyser.utils.categorise_tracks(ser: pandas.core.series.Series) → list
Gets a series for grouping tracks together
- Parameters
ser (pd.Series) – File names to map
- Returns
Mapping for grouping.
- Return type
list
- ccanalyser.utils.convert_bed_to_dataframe(bed: Union[str, pybedtools.bedtool.BedTool, pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame
Converts a bed like object (including paths to bed files) to a pd.DataFrame
- ccanalyser.utils.format_coordinates(coordinates: Union[str, os.PathLike]) → pybedtools.bedtool.BedTool
Converts coordinates supplied in string format or a .bed file to a BedTool.
- Parameters
coordinates (Union[str, os.PathLike]) – Coordinates in the form chr:start-end or a path.
- Raises
ValueError – Inputs must be supplied in the correct format.
- Returns
BedTool object containing the required coordinates.
- Return type
BedTool
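The string branch of this function has to split `chr:start-end` into its parts and reject malformed input. The sketch below covers only that parsing step and returns a plain tuple instead of a BedTool; the pattern, the `start < end` check and the name `parse_coordinates_sketch` are assumptions.

```python
import re
from typing import Tuple

# Assumed pattern: chromosome name, a colon, then two integers joined by a hyphen.
COORD_PATTERN = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_coordinates_sketch(coordinates: str) -> Tuple[str, int, int]:
    """Parse 'chr:start-end' into (chrom, start, end), raising ValueError if malformed."""
    match = COORD_PATTERN.match(coordinates.strip())
    if match is None:
        raise ValueError(f"Expected coordinates as chr:start-end, got {coordinates!r}")

    start, end = int(match["start"]), int(match["end"])
    # Assumption: an interval must have start strictly before end.
    if start >= end:
        raise ValueError(f"Start must be smaller than end in {coordinates!r}")
    return match["chrom"], start, end


print(parse_coordinates_sketch("chr1:1000-2000"))
```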
- ccanalyser.utils.convert_interval_to_coords(interval: Union[pybedtools.cbedtools.Interval, dict], named=False) → Tuple[str]
Converts interval object to standard genomic coordinates.
e.g. chr1:1000-2000
- Parameters
interval (Union[pybedtools.Interval, dict]) – Interval to convert.
- Returns
Genomic coordinates in the format chr:start-end
- Return type
Tuple
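For the dictionary case, the formatting step can be sketched as below. The keys `chrom`, `start` and `end` are assumed, and the `named` parameter and tuple return type of the documented function are not modelled; `interval_to_coords_sketch` is a hypothetical name.

```python
from typing import Dict, Union


def interval_to_coords_sketch(interval: Dict[str, Union[str, int]]) -> str:
    """Format an interval dictionary as chr:start-end."""
    # Assumption: the dictionary uses the keys "chrom", "start" and "end".
    return f"{interval['chrom']}:{interval['start']}-{interval['end']}"


coords = interval_to_coords_sketch({"chrom": "chr1", "start": 1000, "end": 2000})
print(coords)
```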
- class ccanalyser.utils.PysamFakeEntry(name, sequence, quality)
Bases: object
Testing class used to supply a pysam FastqProxy-like object