Research Paper Volume 13, Issue 5 pp. 6442-6458
Male-specific age estimation based on Y-chromosomal DNA methylation
- 1 Department of Genetic Identification, Erasmus University Medical Center Rotterdam, Rotterdam 3000 CA, The Netherlands
Received: March 25, 2020 Accepted: February 25, 2021 Published: March 11, 2021
https://doi.org/10.18632/aging.202775
Copyright: © 2021 Vidaki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Although DNA methylation variation of autosomal CpGs provides robust age-predictive biomarkers, no male-specific age predictor based on Y-CpGs exists yet. Since sex chromosomes play an important role in aging, a Y-chromosome-based age predictor would allow studying male-specific aging effects and would also be useful in forensics. Here, we used blood-based DNA methylation microarray data of 1,057 males aged 15-87 years from six cohorts and identified 75 Y-CpGs with an interquartile range of ≥0.1. Of these, 22 were significantly hypermethylated and six significantly hypomethylated with age (p(cor)<0.05, Bonferroni). Among several machine learning algorithms, a model based on support vector machines with a radial kernel performed best in male-specific age prediction. We achieved a mean absolute deviation (MAD) between true and predicted age of 7.54 years (cor=0.81, validation) when using all 75 Y-CpGs, and a MAD of 8.46 years (cor=0.73, validation) based on the 19 most predictive Y-CpGs. The accuracy of both age predictors did not worsen with increased age, in contrast to autosomal CpG-based age predictors, which are known to predict age with reduced accuracy in the elderly. Overall, we introduce the first-of-its-kind male-specific epigenetic age predictor for future applications in aging research and forensics.
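As an illustration of two quantities named in the abstract, the following minimal sketch shows how a CpG could be retained when its interquartile range across samples is at least 0.1, and how the MAD between true and predicted ages can be computed. It is not the authors' pipeline: the function names, the CpG identifiers, the beta values, the ages, and the quartile convention (Python's "inclusive" method) are all assumptions made for this example.

```python
from statistics import mean, quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of methylation beta values.

    Assumption: quartiles use the 'inclusive' convention; the paper's
    exact convention is not stated in this section.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def select_variable_cpgs(beta_by_cpg, min_iqr=0.1):
    """Return the CpGs whose IQR across samples is >= min_iqr."""
    return [cpg for cpg, betas in beta_by_cpg.items() if iqr(betas) >= min_iqr]


def mean_absolute_deviation(true_ages, predicted_ages):
    """Mean of |true age - predicted age| over all samples, in years."""
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


# Toy input: hypothetical CpG names with made-up beta values per sample.
beta_by_cpg = {
    "cg_variable": [0.10, 0.20, 0.30, 0.40, 0.50],
    "cg_flat": [0.50, 0.51, 0.52, 0.53, 0.54],
}
selected = select_variable_cpgs(beta_by_cpg)

# Made-up true and predicted ages, only to show the MAD calculation.
mad = mean_absolute_deviation([30, 50, 70], [35, 45, 72])
```

With this toy input only the first CpG passes the variability filter, because the second one barely changes between samples. The MAD is reported in the same unit as the ages, which is why the abstract gives it in years.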
Abbreviations
BIC: Bayesian information criterion; BMIQ: beta mixture quantile; CpG: cytosine-phosphate-guanine site; CV: cross-validation; DNA: deoxyribonucleic acid; DNAm: DNA methylation age (Horvath clock); EWAS: epigenome-wide association study; FDP: forensic DNA phenotyping; GEO: Gene Expression Omnibus database; HIV: human immunodeficiency virus; IGV: Integrative Genomics Viewer; IQR: interquartile range; MAD: mean absolute deviation; MLR: multiple linear regression; MSE: mean square error; OLS: ordinary least squares; oob: out-of-bag; QC: quality control; RELIC: regression on logarithm of internal control probes; RFR: random forest regression; RMSE: root mean square error; RSS: residual sum of squares; SNP: single nucleotide polymorphism; SVM: support vector machine; Y-CpG: Y-chromosome-located CpG.