The RSS Glasgow Local Group are pleased to welcome Prof. Claire Gormley (School of Mathematics and Statistics, University College Dublin).
Abstract:
Background:
The DNA methylation process has been extensively studied for its role in cancer. Promoter
cytosine-guanine dinucleotide (CpG) island hypermethylation has been shown to silence tumour
suppressor genes. A CpG site is hypermethylated if both alleles are
methylated, hypomethylated if neither allele is methylated, and hemimethylated otherwise.
Identifying the differentially methylated CpG (DMC) sites between benign and tumour
samples can help understand the disease.
The Illumina MethylationEPIC BeadChip microarray quantifies the methylation level at a
CpG site as a beta value which lies within [0,1). There is a lack of suitable methods for modelling
the beta values in their innate form. For this reason, the beta values are usually transformed
into M-values for analysis. The DMCs are then identified by applying multiple t-tests to the
M-values or beta values, which can be computationally expensive. In addition, the thresholds
used to assign a methylation state to a CpG site are often chosen arbitrarily.
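
For readers unfamiliar with the transformation mentioned above, the M-value is commonly defined as the base-2 logit of the beta value, M = log2(beta / (1 - beta)). The short Python sketch below illustrates that mapping and its inverse. The function names and the clamping constant eps are illustrative assumptions, not part of the abstract or of any package named in it.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Map a beta value to an M-value via the base-2 logit."""
    # Clamp away from 0 and 1 so the logarithm stays finite.
    # The value of eps is an illustrative choice, not a recommendation.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse mapping from an M-value back to a beta value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A beta value of 0.5 maps to an M-value of 0, and values near 0 or 1 are stretched towards large negative or positive M-values, which is why the transform is convenient for methods that assume unbounded data.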
We propose a family of novel beta mixture models (BMMs) which use a model-based clustering
approach to cluster the CpG sites in their innate beta form to (i) objectively identify
methylation state thresholds and (ii) identify the DMCs between different samples. The family
of BMMs employs different parameter constraints that are applicable to different study settings.
Parameter estimation proceeds via an Expectation-Maximisation algorithm, with a novel approximation
during the M-step providing tractability and computational feasibility.
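
To make the clustering idea concrete, the following is a minimal Python sketch of Expectation-Maximisation for a K-component beta mixture fitted to values in (0, 1). It is not the authors' algorithm: the M-step here uses weighted method-of-moments estimates as a simple stand-in for the novel approximation described above, and it does not reflect the interface of betaclust. The function names, starting values, iteration count and simulated data are all assumptions made for illustration.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def fit_beta_mixture(values, init_means, n_iter=50, eps=1e-6):
    """EM for a beta mixture with a moment-matching M-step (illustrative)."""
    # Clamp to the open interval so log(x) and log(1 - x) stay finite.
    x = [min(max(v, eps), 1.0 - eps) for v in values]
    k = len(init_means)
    weights = [1.0 / k] * k
    # Assumed start: each component has a + b = 10 around its initial mean.
    params = [(10.0 * m, 10.0 * (1.0 - m)) for m in init_means]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in x:
            logs = [math.log(w) + beta_logpdf(v, a, b)
                    for w, (a, b) in zip(weights, params)]
            top = max(logs)
            unnorm = [math.exp(l - top) for l in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights, then match weighted mean and variance.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = max(n_j / len(x), 1e-12)
            if n_j < 1e-8:
                continue  # effectively empty component: keep old parameters
            mean = sum(r[j] * v for r, v in zip(resp, x)) / n_j
            var = sum(r[j] * (v - mean) ** 2 for r, v in zip(resp, x)) / n_j
            var = max(var, 1e-8)
            scale = max(mean * (1.0 - mean) / var - 1.0, 1e-3)
            params[j] = (mean * scale, (1.0 - mean) * scale)
    # resp holds the responsibilities from the final E-step.
    return weights, params, resp


# Simulated beta values from three well-separated groups (made-up data).
random.seed(1)
data = ([random.betavariate(2, 20) for _ in range(300)]
        + [random.betavariate(10, 10) for _ in range(300)]
        + [random.betavariate(20, 2) for _ in range(300)])
weights, params, resp = fit_beta_mixture(data, init_means=[0.1, 0.5, 0.9])
means = [a / (a + b) for a, b in params]
print("weights:", weights)
print("component means:", means)
```

In a three-component fit of this kind, the components with low, intermediate and high means would play the roles of the hypomethylated, hemimethylated and hypermethylated states, and a threshold between two states could be read off as the beta value at which the most probable component changes.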
Results:
Performance of the BMMs is assessed through a thorough simulation study, and the BMMs are
used to analyse a prostate cancer dataset and an esophageal squamous cell carcinoma dataset.
The BMM approach objectively identifies methylation state thresholds and identifies more DMCs
between the benign and tumour samples in the prostate and esophageal cancer data than conventional
methods, in a computationally efficient manner. The empirical cumulative distribution
function of the DMCs related to genes implicated in carcinogenesis indicates hypermethylation
of CpG sites in the tumour samples in both cancer settings.
Conclusion:
An R package, betaclust, is provided to facilitate widespread use of the developed BMMs, which
provide objective thresholds for determining methylation state and identify DMCs in a
computationally efficient manner by clustering DNA methylation data in its innate form.