Research Paper Volume 8, Issue 5 pp 1021—1033

Deep biomarkers of human aging: Application of deep neural networks to biomarker development

Evgeny Putin1,2, Polina Mamoshina1,3, Alexander Aliper1, Mikhail Korzinkin1, Alexey Moskalev1,4, Alexey Kolosov5, Alexander Ostrovskiy5, Charles Cantor6, Jan Vijg7, Alex Zhavoronkov1,3

  • 1 Pharma.AI Department, Insilico Medicine, Inc, Baltimore, MD 21218, USA
  • 2 Computer Technologies Lab, ITMO University, St. Petersburg 197101, Russia
  • 3 The Biogerontology Research Foundation, Oxford, UK
  • 4 School of Systems Biology, George Mason University (GMU), Fairfax, VA 22030, USA
  • 5 Invitro Laboratory, Ltd, Moscow 125047, Russia
  • 6 Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
  • 7 Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA

Received: September 26, 2015       Accepted: May 9, 2016       Published: May 18, 2016      

https://doi.org/10.18632/aging.100968

Abstract

One of the major impediments in human aging research is the absence of a comprehensive and actionable set of biomarkers that may be targeted and measured to track the effectiveness of therapeutic interventions. In this study, we designed a modular ensemble of 21 deep neural networks (DNNs) of varying depth, structure and optimization to predict human chronological age using a basic blood test. To train the DNNs, we used over 60,000 samples from common blood biochemistry and cell count tests from routine health exams performed by a single laboratory and linked to chronological age and sex. The best-performing DNN in the ensemble achieved 81.5% epsilon-accuracy (r = 0.90, R² = 0.80, MAE = 6.07 years) in predicting chronological age within a 10-year frame, while the entire ensemble achieved 83.5% epsilon-accuracy (r = 0.91, R² = 0.82, MAE = 5.55 years). The ensemble also identified the 5 most important markers for predicting human chronological age: albumin, glucose, alkaline phosphatase, urea and erythrocytes. To allow for public testing and to evaluate the real-life performance of the predictor, we developed an online system available at http://www.aging.ai. The ensemble approach may facilitate integration of multi-modal data linked to chronological age and sex, which may lead to simple, minimally invasive, and affordable methods of tracking integrated biomarkers of aging in humans and of performing cross-species feature importance analysis.
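
The four figures reported above (epsilon-accuracy, Pearson r, R² and MAE) can be computed from paired true and predicted ages. The following is a minimal sketch and not the authors' evaluation code. It assumes that epsilon-accuracy is the fraction of samples whose predicted age lies within plus or minus epsilon years of the true age; the function name `age_metrics` and the default `epsilon=10.0` are hypothetical choices made for illustration.

```python
import math


def age_metrics(true_age, pred_age, epsilon=10.0):
    """Return epsilon-accuracy, Pearson r, R^2 and MAE for age predictions.

    Assumption: a prediction is "correct" for epsilon-accuracy when it lies
    within +/- epsilon years of the true chronological age.
    """
    n = len(true_age)
    if n == 0 or n != len(pred_age):
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed prediction errors in years.
    errors = [p - t for t, p in zip(true_age, pred_age)]

    eps_acc = sum(abs(e) <= epsilon for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_t = sum(true_age) / n
    mean_p = sum(pred_age) / n
    ss_tot = sum((t - mean_t) ** 2 for t in true_age)   # variance of true ages (unnormalized)
    ss_pred = sum((p - mean_p) ** 2 for p in pred_age)  # variance of predictions (unnormalized)
    ss_res = sum(e ** 2 for e in errors)                # residual sum of squares
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(true_age, pred_age))

    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("r and R^2 are undefined for constant inputs")

    r = cov / math.sqrt(ss_tot * ss_pred)  # Pearson correlation coefficient
    r2 = 1.0 - ss_res / ss_tot             # coefficient of determination
    return {"eps_acc": eps_acc, "r": r, "r2": r2, "mae": mae}
```

Note that R² computed this way is the coefficient of determination, which equals the square of Pearson r only when predictions are unbiased and optimally scaled, so the two values need not match exactly.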

Abbreviations

ML: Machine Learning; SVM: Support Vector Machine; DNN: Deep Neural Network; PFI: Permutation Feature Importance; RF: Random Forests; GBM: Gradient Boosting Machine; kNN: k-Nearest Neighbors; DT: Decision Trees; LR: Linear Regression.