Software

We develop and maintain a number of R/Python tools, hosted on the CDSLab GitHub page (caravagnalab) and listed here by year of development. Packages at the top of the list may still be in preparation.

GitHub

CNAqc - Copy Number Quality Check


A Copy Number Alteration (CNA) quality check package that assesses the concordance between copy number segments and somatic mutations called from bulk sequencing, using peak detection metrics. It supports clonal and subclonal CNAs, and can compute Cancer Cell Fraction (CCF) estimates along with a number of other analyses.
  • In preparation.
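
To illustrate the idea behind checking copy number segments against mutation peaks, here is a minimal Python sketch. It is not the CNAqc API: the function names `expected_vaf` and `peak_matches` and the `tolerance` value are hypothetical, and the formula is the standard expectation for the variant allele frequency (VAF) of a clonal mutation in a clonal segment, assuming a diploid normal.

```python
def expected_vaf(multiplicity, purity, major, minor):
    """Expected VAF of a clonal mutation carried by `multiplicity` copies.

    Assumes a diploid normal and a clonal segment with the given
    major/minor allele copy numbers (hypothetical helper, not CNAqc code).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 1 <= multiplicity <= major:
        raise ValueError("multiplicity must be between 1 and the major copy number")
    tumour_copies = major + minor
    # Mutant reads come from tumour cells only; total reads come from
    # tumour cells plus the diploid normal contamination.
    return multiplicity * purity / (2 * (1 - purity) + purity * tumour_copies)


def peak_matches(observed_peak, multiplicity, purity, major, minor, tolerance=0.05):
    """True if an observed VAF peak lies within `tolerance` of the expectation."""
    expected = expected_vaf(multiplicity, purity, major, minor)
    return abs(observed_peak - expected) <= tolerance


if __name__ == "__main__":
    # Heterozygous diploid segment (1:1) at 80% purity.
    print(expected_vaf(1, 0.8, 1, 1))
    # A 2:1 segment has two expected peaks, one per multiplicity.
    print([expected_vaf(m, 0.8, 2, 1) for m in (1, 2)])
```

A data peak that falls far from every expected position suggests that the purity or the copy number call for that segment is off, which is the kind of discordance a quality check of this sort is meant to flag.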

TINC - Tumor in normal contamination assessment from bulk sequencing


A method to determine the contamination of a tumour sample by normal cells, which can be used to quality control tumour-normal samples and confirm that normal samples are not contaminated by tumour variants.
  • In preparation.

lineaGT/pylineagt - Bayesian multi-lineage inference from gene therapy assays


Multi-lineage inference from Gene Therapy assays that use insertional mutagenesis and somatic mutations to track clonal expansions in the hematopoietic compartment. It leverages Bayesian models, with a Pyro implementation and stochastic variational inference (double package).
  • In preparation.

tapacloth - Classification of cancer mutation clonality and zygosity from target sequencing


A model that uses panel read count data to determine the clonality status of tumour mutations while estimating their zygosity, which is fundamental to understanding the complex interplay between somatic mutations and aneuploidy from panel data.
  • In preparation.

basilica/pybasilica - Bayesian somatic signature learning with a catalogue


A semi-supervised Bayesian hierarchical model to detect somatic signatures using a catalogue, with a Pyro implementation and stochastic variational inference (double package).
  • In preparation.

Rcongas/congas - Copy Number calling from single-cell RNA sequencing


A Bayesian method to cluster single-cell RNA sequencing data using Copy Number Alterations, written in Pyro with stochastic variational inference, together with R functions for data preprocessing and visualisation (double package).
  • Salvatore Milite, Riccardo Bergamin, Lucrezia Patruno, Nicola Calonaci, Giulio Caravagna. A Bayesian method to cluster single-cell RNA sequencing data using Copy Number Alterations. Bioinformatics, Volume 38, Issue 9, 1 May 2022, Pages 2512–2518

mobster - Model-based tumour subclonal deconvolution


Tumour subclonal deconvolution from whole-genome DNA sequencing, combining Machine Learning and Population Genetics to determine the number of clones that can be associated with selection forces.
  • G. Caravagna, T. Heide, M.J. Williams, L. Zapata, D. Nichol, K. Chkhaidze, W. Cross, G.D. Cresswell, B. Werner, A. Acar, L. Chesler, C.P. Barnes, G. Sanguinetti, T.A. Graham, A. Sottoriva. Subclonal reconstruction of tumors by using machine learning and population genetics. Nature Genetics 52, 898–907 (2020).

VIBER - Variational multivariate Binomial mixtures for read counts deconvolution


Subclonal deconvolution of read counts from multiple bulk sequencing biopsies, using a variational Bayesian approach.
  • G. Caravagna, T. Heide, M.J. Williams, L. Zapata, D. Nichol, K. Chkhaidze, W. Cross, G.D. Cresswell, B. Werner, A. Acar, L. Chesler, C.P. Barnes, G. Sanguinetti, T.A. Graham, A. Sottoriva. Subclonal reconstruction of tumors by using machine learning and population genetics. Nature Genetics 52, 898–907 (2020).

BMix - Binomial and Beta-Binomial univariate mixtures for read counts deconvolution


Univariate Binomial and Beta-Binomial mixture models for subclonal deconvolution of read counts from bulk assays. Unlike VIBER, this package works with a single biopsy, but provides a maximum-likelihood approach with Beta-Binomial distributions.
  • G. Caravagna, T. Heide, M.J. Williams, L. Zapata, D. Nichol, K. Chkhaidze, W. Cross, G.D. Cresswell, B. Werner, A. Acar, L. Chesler, C.P. Barnes, G. Sanguinetti, T.A. Graham, A. Sottoriva. Subclonal reconstruction of tumors by using machine learning and population genetics. Nature Genetics 52, 898–907 (2020).
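
As a rough illustration of a univariate mixture fitted to read counts by maximum likelihood, the sketch below runs expectation-maximisation (EM) for a plain Binomial mixture using only the Python standard library. It is not the BMix interface and may differ from how BMix fits its models: the function name `fit_binomial_mixture`, its arguments and the toy data are all hypothetical, and the Beta-Binomial case is left out for brevity.

```python
import math


def fit_binomial_mixture(successes, trials, init_probs, n_iter=200):
    """EM for a K-component Binomial mixture over (successes, trials) pairs.

    Hypothetical helper for illustration; returns (weights, probs).
    """
    k = len(init_probs)
    probs = list(init_probs)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        # The binomial coefficient is the same for every component, so it cancels.
        resp = []
        for s, n in zip(successes, trials):
            logs = [math.log(w) + s * math.log(p) + (n - s) * math.log(1 - p)
                    for w, p in zip(weights, probs)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and success probabilities.
        for j in range(k):
            rj = [r[j] for r in resp]
            weights[j] = max(sum(rj) / len(successes), 1e-12)
            num = sum(r * s for r, s in zip(rj, successes))
            den = sum(r * n for r, n in zip(rj, trials))
            if den > 0:
                # Clamp away from 0 and 1 so the logarithms stay finite.
                probs[j] = min(max(num / den, 1e-9), 1 - 1e-9)
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, probs


if __name__ == "__main__":
    # Toy data: variant reads out of 100 total reads per mutation.
    variant_reads = [25] * 20 + [50] * 20
    depth = [100] * 40
    print(fit_binomial_mixture(variant_reads, depth, init_probs=[0.2, 0.6]))
```

In practice the number of components would be chosen by a model selection score, and a Beta-Binomial likelihood would replace the Binomial one when read counts are overdispersed.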

REVOLVER - Repeated evolutionary trajectories in cancer via Transfer Learning


A model to compute repeated evolutionary trajectories from a cohort of multi-region samples from multiple tumour patients, with functions to stratify the cohort and determine evolutionary subgroups of tumours that evolve with different patterns.
  • G. Caravagna, Y. Giarratano, D. Ramazzotti, I. Tomlinson, T.A. Graham, G. Sanguinetti, A. Sottoriva. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nature Methods 15, 707–714 (2018).

ctree - Cancer Clone trees


Clone trees from multi-region bulk sequencing data, built from clusters of Cancer Cell Fractions (CCFs) computed by tumour subclonal deconvolution algorithms (MOBSTER etc.).
  • G. Caravagna, Y. Giarratano, D. Ramazzotti, I. Tomlinson, T.A. Graham, G. Sanguinetti, A. Sottoriva. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nature Methods 15, 707–714 (2018).

mtree - Cancer mutation trees


Mutation trees from multi-region bulk or single-cell sequencing data, built from the presence or absence of a somatic mutation, a Copy Number event, or any other event available in binary format.
  • G. Caravagna, Y. Giarratano, D. Ramazzotti, I. Tomlinson, T.A. Graham, G. Sanguinetti, A. Sottoriva. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nature Methods 15, 707–714 (2018).

evoverse.datasets - Data package


Input datasets and computation results from a number of packages that we have developed.