Research Paper Volume 14, Issue 19 pp 8095—8109

Development and assessment of diabetic nephropathy prediction model using hub genes identified by weighted correlation network analysis

Xuelian Zhang1, , Yao Wang1, , Zhaojun Yang1, , Xiaoping Chen1, , Jinping Zhang1, , Xin Wang1, , Xian Jin1, , Lili Wu1, , Xiaoyan Xing1, , Wenying Yang1, , Bo Zhang1, ,

  • 1 Department of Endocrinology, China-Japan Friendship Hospital, Beijing 100029, People’s Republic of China

Received: June 9, 2022       Accepted: September 23, 2022       Published: October 14, 2022      

https://doi.org/10.18632/aging.204340
How to Cite

Copyright: © 2022 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Diabetic nephropathy (DN) is one microvascular complication of diabetes. About 30% of diabetic patients can develop DN, which is closely related to the high incidence and mortality of heart diseases, and then develop end-stage renal diseases. Therefore, early detection and screening of high-risk patients with DN is important. Herein, we explored the differences of serum transcriptomics between DN and non-DN in type II diabetes mellitus (T2DM) patients. We obtained 110 target genes using weighted correlation network analysis. Gene Ontology enrichment analysis indicates these target genes are mainly related to membrane adhesion, alpha-amino acid biosynthesis, metabolism, and binding, terminus, inhibitory synapse, clathrinid-sculpted vesicle, kinase activity, hormone binding, receptor activity, and transporter activity. Kyoto Encyclopedia of Genes and Genomes analysis indicates the process of DN in diabetic patients can involve synaptic vesicle cycle, cysteine and methionine metabolism, N-Glycan biosynthesis, osteoclast differentiation, and cAMP signaling pathway. Next, we detected the expression levels of hub genes in a retrospective cohort. Then, we developed a risk score tool included in the prediction model for early DN in T2DM patients. The prediction model was well applied into clinical practice, as confirmed by internal validation and several other methods. A novel DN risk model with relatively high prediction accuracy was established based on clinical characteristics and hub genes of serum detection. The estimated risk score can help clinicians develop individualized intervention programs for DN in T2DM. External validation data are required before individualized intervention measures.

Introduction

Type 2 diabetes mellitus (T2DM) is chronic metabolic disease caused by defective insulin secretion and/or insulin resistance [1]. The stimulation of chronic hyperglycemia activates various intracellular signal transduction pathways, and produces glucose toxic products, which in turn damage capillary endothelial cells, resulting in capillary basement membrane thickening, extracellular matrix deposition, and finally systemic microvascular diseases [2, 3]. Type 2 diabetic nephropathy (T2DN) is one of its main microvascular complications [4, 5]. The pathological changes of early nephropathy mainly involve the glomeruli distributed in the renal cortex, and the glomeruli are composed of many capillary clusters [6]. The onset of T2DN is insidious. Once it enters the massive proteinuria stage, the progression rate to end-stage renal disease is about 14 times that of other kidney diseases [7, 8]. Therefore, early diagnosis, prevention and delay of the occurrence and development of DN can improve the survival rate of T2DM patients. Improving their quality of life is of great significance.

In the present study, we first identified the hub genes that may be involved in the process of DN using weighted correlation network analysis (WGCNA). Next, we developed DN risk prediction models based on target genes and clinical parameters. Then, we evaluated and validated the prediction ability of the models. Finally, we built a nomogram risk tool to evaluate the DN risk of individuals. Our study will help clinical decision-making and improve the prognosis of T2DM patients.

Materials and Methods

GEO datasets

We downloaded datasets GSE142153 and GSE26168 (both expression profiling by array) from the GEO database (https://www.ncbi.nlm.nih.gov/gds). The GSE142153 based on the GPL6480 platform includes 3 condition experiments: Healthy control (n=10), DN (n=23), and end-stage renal disease (n=7). The tissue type was from peripheral blood. The GSE26168 based on the GPL6883 platform includes healthy controls (n=8), impaired fasting glucose group (n=7), and T2DM group (n=9). We used the DN group of GSE142153 and the T2DM group of GSE26168 for further analyses.

Patients

This is a retrospective cohort study design. The patients included here had T2DM in the hospital. We collected the baseline data from November 2013 to June 2014, and acquired the follow-data from September 2017 to June 2018. We identified the T2DM according to one or more of the following criteria: HbA1c ≥ 6.5%, fasting blood glucose ≥ 7.0 mmol/L, oral glucose tolerance test ≥ 11.1 mmol/L, and random blood glucose ≥ 11.1 mmol/L [9]. The criteria for inclusion were: age under 75 years old; no severe disease including malignant tumor, uremia dialysis, chronic inflammatory disease, autoimmune diseases, and cerebra-cardiovascular disease. Patients are conscious and can communicate with the investigator. Patients with >30 mg/g (albumin /creatinine ratio) at baseline, or incomplete data or information were excluded. This study was approved by the Ethics Committee of China-Japan Friendship Hospital, and all subjects provided written informed consent.

Bioinformatics analysis

With R packages “limma” and “sva”, we first merged the DN group of GSE142153 and the T2DM group of GSE26168. We used the “removeBatchEffect” function in the R package limma. Differential expression analyses were performed between the two groups. The differentially expressed genes (DEGs) were identified using the following criteria: |log2 fold change|>2 and adjusted P <0.05.

WGCNA is a system biology method used to describe gene association patterns between different samples [10, 11]. It can be used to identify highly covarying gene sets and to find out candidate biomarker genes or therapeutic targets based on the interconnectedness of gene sets and the association between gene sets and phenotypes. The WGCNA consists of gene co-expression network construction, and identification of modules, related modules to external information, relationships between modules, and key drives in interesting modules. We used the following criteria to identify the related modules: the module size of 10-20000, the cut height equal to 0.25 and verbose equal to 5. Highly related genes were identified with thresholds greater than 0.1 in the topological overlap matrix (TOM).

We made an overlap between DEGs and related module genes identified by WGCNA. The intersection genes were used for further analyses. Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genome (KEGG) pathway analyses were achieved using R packages “clusterProfiler”, “enrichplot” and “ggplot2”, and the results were presented in bar plots [12, 13]. We further established protein-protein interaction (PPI) using String (https://www.string-db.org/). Any gene without nodes with other genes was excluded. The PPI was inputted into Cytoscape [14]. Using the application of cytoHubba, we identified the top 10 hub genes using five algorithms: maximal clique centrality (MCC), maximum neighborhood (MN), maximum neighborhood component (MNC), Degree, and edge percolated component (EPC). Intersections among the five algorithms were made, which were considered as the key genes.

Clinical data collection

The following information was collected for each patient: age, gender, smoking status, drinking status, body mass index (BMI, weight/height2, kg/m2), systolic blood pressure, and diastolic blood pressure. The venous blood was extracted before receiving any treatment. The fasting blood glucose (FBG), HbA1c, total cholesterol (TC), triglyceride (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), blood urea nitrogen (BUN), and serum creatinine (SCr) were measured using the Jaffe rate-blanked compensated creatinine assay. Urine acid (UA) was detected using an automated biochemical analyzer. The normal and abnormal standards were from the Guidelines for Prevention and Treatment of Dyslipidemia in Chinese Adults [15]. Patients with urinary albumin creatinine ratio >30 mg/g were diagnosed with DN [16]. According the World Health Organization, people who have smoked continuously or cumulatively for 6 months or more were defined as smokers. Men and women who drank more than 60 g and 40 g of pure alcohol per day respectively were considered as drinking. BMI was calculated as BMI = weight (kg)/height (m)2. Hypertension was defined as systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg.

For evaluating the expression levels of seven hub genes, we extracted the total RNA from the peripheral blood of each T2DM patient using a modified RiboPure™-Blood kit (Ambion, Austin, USA) according to the manufacturer’s protocol. The cDNA was synthesized using a reverse transcription kit (Guangzhou Ribobio, Co., Ltd., China). Quantitative real-time polymerase chain reaction (qRT-PCR) was carried out using a Cobas Z480PCR system. The relative expressions of the hub genes were calculated using the 2-ΔΔCT method. The forward and reverse primer sequences were as follows: TTCTCAGAGGCTGACATGCGCT and CTCGTCCAGAAGGATGTTGGCT (ADRBK1); CATCGAGCAGAGTGTCTACAGC and TGTCGTCGCTATCCAGG TCATG (LMX1A); CACCACGCTCTCCAATGCCTTT and CATCACCAGGATGG ACACAAGC (HTR1B); GACAACCACTGTGAACATCACGC and TGACACTTC CAATGGCTGTGCC (CDH10); TTACTGGTCCGACTGGTACGAC and CAAAGA ACTGCTGAGGCTTGGG (TAC1); CTCCTCTTTGTCATCACGCTTCC and GGA TGAGGACACTGCTGTAGAG (NPY); CTGGAACGTGACCAACGCCATC and TCATTCTCCTCGTACAGGCACG (SLC32A1).

Statistical analysis

All continuous variables were transformed into categorical variables, and expressed using counts and percents like other original categorical variables. We first screened relevant genes from the seven hub genes using the LASSO method, and then calculated the risk score for each patient as follows: risk score =gene1 expression*β1+…+genen expression*βn. Then the risk score was inputted as a continuous variable into logistic regression. We established three models for the prediction of DN: model 1 with only clinical parameters, model 2 with only identified gene expression, and model 3 with combined clinical parameters and gene expression. Multivariate logistic regression was used to validate if the risk score was an independent predictor of DN among T2DM patients or not. The receiver’s operating characteristic (ROC) curves were used to assess the discrimination ability of the models. Nomograms were adopted to quantify the predictive ability of the models. Calibration curves were plotted to evaluate the calibration of DN risk, which was accompanied by a Hosmer-Lemeshow test. Decision curve analysis was conducted to evaluate the clinical practicability of the models based on the net benefit according to the different threshold probabilities among T2DM patients.

Data availability

The expression profiling of genes from diabetics and diabetics nephropathy can be available from the GEO (GSE142153 and GSE26168). The other data can be obtained from the corresponding author (Bo Zhang).

Results

Differential expression analysis of genes

Using 23 DN samples and 9 diabetic samples, we performed differential expression analysis to obtain the DEGs that can be associated with DN. First, we merged the expression data of the DN group of GSE142153 and the T2DM group of GSE26168 to exclude the bath effects. The boxplot showed the whole gene expression levels of the merged dataset were approximately the same, which indicated no batch effect (Figure 1A). Using the criteria of |log2-fold changes (FC)|>1.0 and adjusted P <0.05, we obtained 110 DEGs, including 9 upregulated DEGs and 101 downregulated DEGs between the two groups (Figure 1B). The top 60 DEGs were presented in the hierarchical clustering heatmap (Figure 1C), including the significantly upregulated or downregulated DEGs.

Differentially expressed genes between GSE30529 and GSE30528. (A) Normalized boxplot of control and DN groups. (B) Volcano plot of differentially expressed genes. (C) Heatmap of differentially expressed genes between control and DN groups.

Figure 1. Differentially expressed genes between GSE30529 and GSE30528. (A) Normalized boxplot of control and DN groups. (B) Volcano plot of differentially expressed genes. (C) Heatmap of differentially expressed genes between control and DN groups.

Weighted gene co-expression network analysis (WGCNA)

Using WGCNA that describes the expression correlation among multiple genes, we identified the genes of interest. Two analytical methods were applied to identify the target genes. First, we performed the cluster analysis to assess sample similarity. The outlier’s detection identified three samples that were removed (GSM4221586, GSM532842, GSM4221594, Figure 2A). To construct a scale-free network, we performed WGCNA, and the scale-free topological fitting index was 0.88 and the mean connectivity was 90 by setting the soft threshold power at 10 (Figure 2B). Furthermore, we carried out hierarchical clustering to identify different kinds of gene modules using weighted gene co-expression correlation, which can be found through the branches of clustering tree and different colors. We identified 18 modules in size from 20 to 10000 in the network, and merged cut highs of 0.25 (Figure 2C). The 18 modules can be separated into three clusters according to the correlations among the modules (Figure 2D). Besides, the weighted coexpression correlations of all genes were presented in a heatmap plot (Figure 2E). The module MEmidnightblue showed the highest correlations, the correlation coefficients between the control group and the DN group were 0.99. Finally, we obtained 136 highly-related genes in the TOM matrix with a threshold greater than 0.1. The results of the two analyses can be overlapped, and a list of 110 target genes were identified, which may be involved in DN (Figure 3A).

Weighted gene co-expression network analysis. (A) Sample clustering. (B) Analysis of soft-thresholding powers to fit the scale-free topology model and the mean connectivity of the soft-thresholding powers. (C) Dendrogram of the gene modules. (D) Clustering of 8 gene modules. (E) Module-trait relationships.

Figure 2. Weighted gene co-expression network analysis. (A) Sample clustering. (B) Analysis of soft-thresholding powers to fit the scale-free topology model and the mean connectivity of the soft-thresholding powers. (C) Dendrogram of the gene modules. (D) Clustering of 8 gene modules. (E) Module-trait relationships.

Enrichment and pathways analysis. (A) Venn diagrams of DEG list and highly-related gene list. (B) GO enrichment analysis of 130 gens. (C, D) KEGG pathway analysis of 130 genes.

Figure 3. Enrichment and pathways analysis. (A) Venn diagrams of DEG list and highly-related gene list. (B) GO enrichment analysis of 130 gens. (C, D) KEGG pathway analysis of 130 genes.

Function enrichment analysis of targeted genes

The occurrence of DN may involve complex pathogenesis and biological processes, and understanding the potential function of target genes will help with DN treatment. The GO enrichment analysis indicates these target genes are mainly related to membrane adhesion, alpha-amino acid biosynthesis, metabolism, and binding, terminus, inhibitory synapse, clathrinid-sculpted vesicle, kinase activity, hormone binding, receptor activity, and transporter activity (Figure 3B). KEGG analysis indicates synaptic vesicle cycle, cysteine and methionine metabolism, N-Glycan biosynthesis, osteoclast differentiation, and cAMP signaling pathway can be involved in the process of DN in diabetic patients (Figure 3C, 3D). We also built the PPI based on these target genes (Figure 4A), and identified the top 10 hub genes: LMX1A, TYRP1, NANOG, ADRBK1, HTR1B, TAC1, NPY, SLC32A1, GAD1, and CDH10 (Figure 4B). Finally, we identified the hub genes using five algorithms (Figure 4C). We overlapped the results of the five algorithms and obtained the final hub genes that were used for model building: ADRBK1, LMX1A, HTR1B, CDH10, TAC1, NPY, and SLC32A1.

PPI networks. (A) PPI network of combined genes. (B) PPI network of top 10 Hub genes. (C) Venn diagrams of Hub genes based on MCC, MNC, Degree, DMNC, EPC methods. (D) Cross-validation for identifying parameters in LASSO. (E) LASSO regression of 6 hub genes.

Figure 4. PPI networks. (A) PPI network of combined genes. (B) PPI network of top 10 Hub genes. (C) Venn diagrams of Hub genes based on MCC, MNC, Degree, DMNC, EPC methods. (D) Cross-validation for identifying parameters in LASSO. (E) LASSO regression of 6 hub genes.

Development of DN prediction model based on target genes

Based on 7 target genes, we performed LASSO regression and identified 6 genes (ADRBK1, LMX1A, HTR1B, TAC1, NPY, SLC32A1) included in the prediction model (Figure 4D). We calculated the risk score as follows: risk score = 1.466 * ADRBK1 - 0.627 * LMX1A + 0.199 * HTR1B + 0.130 * TAC1 + 0.477 * NPY + 1.005 * SLC32A1.

The general characteristics between the DN group and the non-DN group were compared. The risk scores were higher in the DN group than in the non-DN group (10.54 vs. 9.03, P<0.001; Table 1). The DN group had higher proportions of males (65.4% vs. 55.9%, P=0.042) and smokers (42.6% vs. 23.1%, P<0.001). The SBP (P<0.001) and FBG (P=0.016) levels were higher in the DN group than in the non-DN group. The TC, TG and BUN levels were also significantly elevated in the DN group (P<0.001). No significant differences were observed in age >60, drinking status, DBP>=90 rate, and HbA1c, HDL, LDL, SCR, or UC levels (P>0.05). The univariate logistic regression indicated the risk score was positively associated with DN (OR (95%CI): 2.72 (2.26-3.26), P<0.001). Besides, male (1.49 (1.03-2.17), P=0.034), FBG>6.9 (1.62 (1.11-2.38), P=0.013), overweight or obesity (1.84 (1.21-2.78), P=0.004), DBP>90 (1.55 (1.02-2.37), P=0.042), and smoking (2.47 (1.69-3.60)), SBP>140 (2.23 (1.55-3.22)), abnormal TC (2.68 (1.84-3.91)) and TG (2.16 (1.48-3.14)), and elevated BUN (2.45 (1.60-3.77)) (all P<0.001) were also risk factors of DN in diabetic patients (Table 2). Using these significant variables in the univariate logistic regression, we performed multiple logistic regression. Our results indicated that risk score was an independent risk factor of DN for diabetic patients (OR:2.76, 9%CI: 2.27-3.41, P<0.001). Besides, smoking, overweight, SBP>=140, abnormal TC, TG and BUN were also risk factors of DN (Figure 5).

Table 1. Comparisons of general characteristics between control and case group.

ParametersLevelControl, n (%)Case, n (%)P
Age (years)≤60199 (42.6)57 (35.2)0.118
>60268 (57.4)105 (64.8)
GenderFemale206 (44.1)56 (34.6)0.042
Male261 (55.9)106 (65.4)
SmokingNo359 (76.9)93 (57.4)<0.001
Yes108 (23.1)69 (42.6)
DrinkingNo299 (64.0)107 (66.0)0.712
Yes168 (36.0)55 (34.0)
SBP (mmHg)<140277 (59.3)64 (39.5)<0.001
≥140190 (40.7)98 (60.5)
DBP (mmHg)<90381 (81.6)120 (74.1)0.053
>=9086 (18.4)42 (25.9)
FBG (mmol/L)≤6.9193 (41.3)49 (30.2)0.016
>6.9274 (58.7)113 (69.8)
HbA1c (%)≤6.9%230 (49.3)71 (43.8)0.272
>6.9%237 (50.7)91 (56.2)
TC (mmol/L)≤4.5261 (55.9)52 (32.1)<0.001
>4.5206 (44.1)110 (67.9)
TG (mmol/L)<1.7239 (51.2)53 (32.7)<0.001
≥1.7228 (48.8)109 (67.3)
HDL (mmol/L)<1.023 (4.9)2 (1.2)0.066
≥1.0444 (95.1)160 (98.8)
LDL (mmol/L)<1.8307 (65.7)108 (66.7)0.906
≥1.8160 (34.3)54 (33.3)
BUN (mmol/L)≤7.1402 (86.1)116 (71.6)<0.001
>7.165 (13.9)46 (28.4)
SCR (μmol/L)≤106436 (93.4)148 (91.4)0.499
>10631 (6.6)14 (8.6)
UC (mmol/L)≤440425 (91.2)141 (87.0)0.168
>44041 (8.8)21 (13.0)
Risk Score (mean (SD))9.03 (1.18)10.54 (1.22)<0.001

Table 2. Results of univariate logistic regression for DN in T2DM patients.

VariableBSEWaldPORLowerUpper
Age (>60)0.3130.1892.7380.0981.370.941.98
Gender (Male)0.4010.1904.4780.0341.491.032.17
Smoking (Yes)0.9030.19321.852<0.0012.471.693.60
Drinking (Yes)-0.0890.1920.2150.6430.920.631.33
BMI (>28)0.6070.2128.1910.0041.841.212.78
SBP (>140)0.8030.18618.585<0.0012.231.553.22
DBP (>90)0.4390.2154.1490.0421.551.022.37
FBG (>6.9)0.4850.1956.1750.0131.621.112.38
HbA1c (6.9%)0.2180.1831.4150.2341.240.871.78
TC (>4.5)0.9860.19226.264<0.0012.681.843.91
TG (≥1.7)0.7680.19116.117<0.0012.161.483.14
HDL (≥1.0)1.4220.7433.6620.0564.140.9717.78
LDL≥ (1.8)-0.0410.1930.0460.8310.960.661.40
BUN (>7.1)0.8970.22016.686<0.0012.451.603.77
SCR (>106)0.2860.3360.7230.3951.330.692.57
UC (>440)0.0600.1590.1420.7071.060.781.45
Risk Score0.9990.093115.102<0.0012.722.263.26
Forest plot of multivariate logistic regression for DN risk.

Figure 5. Forest plot of multivariate logistic regression for DN risk.

Assessment of clinical application of prediction model

To validate the model, we built the following three models: model 1: risk score, mode 2: clinical parameters; model 3: risk score and clinical parameters. For models 1 and 2, the AUCs were 0.763 (95%CI: 0.721-0.805, Figure 6A) and 0.813 (95%CI: 0.775-0.851, Figure 6B) respectively. Model 3 showed the highest predicting ability (AUC:0.870, 9%CI: 0.837-0.902, Figure 6C). Based on model 3, we built a nomogram plot to evaluate the risk probability of DN for individuals (Figure 6D). We only included the significant variables in the univariate logistic regression into the nomogram. Specifically, we can get a point for each variable. We summarized the points of all variables in the nomogram. The summarized points can be projected onto the risk of nonadherence. Then we can know the risk probability of individuals. These descriptions were added in the manuscript.

Development of DN risk model and nomogram prediction based on serum mRNA of hub genes. ROC curves of DN risk models in T2DM based on (A) clinical parameters, (B) risk score of hub genes, and (C) clinical parameters and risk score of hug gens. (D) Nomogram of DN risk model in T2DM.

Figure 6. Development of DN risk model and nomogram prediction based on serum mRNA of hub genes. ROC curves of DN risk models in T2DM based on (A) clinical parameters, (B) risk score of hub genes, and (C) clinical parameters and risk score of hug gens. (D) Nomogram of DN risk model in T2DM.

Furthermore, we evaluated the agreement between actual diagnosed probability and nomogram-predicted probability. Results showed relatively high agreement (Figure 7A). The calculated C-index was 0.870, suggesting that the model has high prediction accuracy. Moreover, decision curve analysis for DN risk indicated that T2DM patients will benefit from this DN risk prediction model when the threshold probability of a patient and clinician was 25%, meaning the net benefit is relatively high based on the DN risk nomogram (Figure 7B). To further assess the model, we randomly extracted half of the whole study sample to validate the prediction model. The internal validation showed that the AUC was 0.821 (95%CI: 0.756-0.886, Figure 7C), and the calibration curve of this sample also indicated high performance (Figure 7D). These results suggest that the DN prediction model can help clinicians make decision for T2DM patients.

Assessment and validation of DN risk model. (A) Calibration curves of DN risk prediction in the whole cohort. (B) Decision curve analysis for DN risk model in whole cohort. (C) ROC curves of DN risk model in validated cohort. (D) Calibration curves of DN risk prediction model in the validated cohort.

Figure 7. Assessment and validation of DN risk model. (A) Calibration curves of DN risk prediction in the whole cohort. (B) Decision curve analysis for DN risk model in whole cohort. (C) ROC curves of DN risk model in validated cohort. (D) Calibration curves of DN risk prediction model in the validated cohort.

Discussion

According to the International Diabetes Federation, the global prevalence of diabetes is expected to rise to 10.2% (about 578 million people) by 2030 and to 10.9% (about 700 million people) by 2045 [17]. At present, the prevalence of diabetes in China is up to 11.2%. The awareness rate, control rate and treatment rate of diabetes have improved, but are still low [18]. About 30% of diabetic patients can develop DN, which is closely related to the high incidence and mortality of heart disease, and then develop end-stage renal disease. Our results show the DN incidence of T2DM patients is 25.7%, which falls into the scope. Therefore, early detection and screening of high-risk patients with DN is the key to prevention and treatment.

DN is the main cause of end-stage renal disease. In the early stage of DN, renal function damage can be alleviated or even reversed. Once at the stage of clinical macroalbuminuria, renal function damage will be progressively aggravated and become irreversible in the environment of continuous hyperglycemia [19]. Thus, it is of great important of establish an evaluation tool for predicting early-stage DN. We explored the differences of serum transcriptomics between DN and non-DN in T2DM patients, and obtained 110 target genes using WGCNA. GO enrichment analysis indicates that these target genes are mainly related to membrane adhesion, alpha-amino acid biosynthesis, metabolism, and binding, terminus, inhibitory synapse, clathrinid-sculpted vesicle, kinase activity, hormone binding, receptor activity, and transporter activity. KEGG analysis indicates synaptic vesicle cycle, cysteine and methionine metabolism, N-Glycan biosynthesis, osteoclast differentiation, and cAMP signaling pathway can be involved in the process of DN in diabetic patients. Next, we detected the expression levels of hub genes in a retrospective cohort, and developed an early DN prediction model for T2DM patients. Finally, we found the prediction model can be well applied in clinical practice, as confirmed by internal validation and several other methods.

A prospective investigation performed in 28 countries indicates that the prevalence of DN in T2DM is 27.9%. The prevalence differs among countries or areas. The prevalence of DN is the highest in Russia and Middle East, but is the lowest in South Asia [20]. The data from USA show that the prevalence of DN is 23.7% among diabetic patients [21]. In South Korea, the prevalence of DN is 26.7% in diabetic patients. Our study shows that the prevalence of DN is 25.8%, which is close to the level of South Korea [22]. However, the prevalence is 20% in European countries, which ranks the first in all complications of diabetes. These results indicate the prevalence of DN varies among different populations.

Nomograms have been extensively used in the prognosis prediction of tumor and non-tumor diseases. Nomograms can well express the relationships between variables and clinical outcomes [23]. There are already some DN prediction models in T2DM patients. Jiang S et al. enrolled 302 T2DM patients who underwent continuous renal biopsy in China-Japan Friendship Hospital, and randomly divided their data into a training set with 70% of patients and a validation set with the remaining 30% for model construction and external validation. Nine variables including gender, course of diabetes, diabetic retinopathy (DR), hematuria, HbA1c, hemoglobin, blood pressure, urinary protein excretion and eGFR were included to construct a Nomogram. The C index of the model was 0.934, and the C index of internal and external verification was 0.91 and 0.875 respectively [24]. Xi C et al. conducted a questionnaire including physical examination, blood routine examination and biochemical index evaluation of 1095 T2DM patients in Guilin. Risk factors such as gender, age, hypertension, duration of drug use, diabetes, BMI, blood urea nitrogen, serum creatinine levels, the ratio of neutrophils and lymphocytes and red blood cell distribution width were combined with selected risk factors and involved into the logistic regression analysis of a prediction nomogram model. The C index was 0.819. The area under the ROC curve (AUC) was 0.813, and the C index of internal verification was 0.796. Decision curve analysis showed the risk profile of DN was clinically appropriate when the risk threshold ranged from 1% to 83% [25]. These two studies show that nomograms play a certain role in the risk assessment of DN. However, the sample sizes of these two studies are small, and the second study has no external verification. Certainly, there are community-based studies with large sample sizes. Shi R et al. included 4219 T2DM patients, and divided them into a T2DM group, a DK group, a DR group and a DR+DN group. The predictive factors in the prediction model included course of disease, BMI, triglyceride, systolic blood pressure, postprandial blood glucose, HbA1c and urea nitrogen. The C index of the model is 0.807, the THE AUC is 0.807, the C index of internal validation is 0.804, and the analytical risk threshold of the decision curve is 16%-75%, indicating this model can predict DK risk [26]. In addition, Wang G et al. selected 2163 diabetic inpatients and established 4 screening equations based on Nomograms. The factors included in the four models were different, and 10 factors were included in the whole model: drinking status, hypertension, duration of diabetes, history of coronary heart disease, SBP, total cholesterol, fasting plasma C-peptide, uric acid, and diabetic retinopathy. Model 1 included 7 factors: sex, SBP, TC, alcohol consumption, hypertension, coronary heart disease and duration of diabetes. HbA1c was added to model 2 based on Model 1. There were 6 factors in the simplified model: sex, SBP, alcohol consumption, hypertension, coronary heart disease and duration of diabetes. The C index of the four models is 0.8450, 0.8149, 0.8171 and 0.8083, respectively, which are 3.2756, 7.749, 10.023 and 12.294 according to Hosmer-Lemeshow goodness of fit test [27]. Both model 1 and Model 2 have good predictive performance and validity, and can be used to screen DKD cases in China. Despite the large sample sizes, these two studies were not verified externally, so their applicability needs further consideration.

We built a risk score for DN in T2DM patients using six genes. The univariate and multivariate logistic regressions showed that risk score was an independent risk factor of DN. To develop a risk prediction tool of DN, we built three models: model 1 based on clinical characteristics; model 2 based on hub genes, and model 3 based on both clinical characteristics and hub genes. The values of the C-index are 0.781, 0.815, and 0.872, respectively. The AUCs are 0.763, 0.813 and 0.870. Like previous studies, our multivariate regression indicates similar risk factors, such as smoking status, overweight, elevated SBP, TC, TG, and BUN [2830]. There are several differences between our study and previous studies. Our study includes clinical characteristics and gene expression. Our data of gene expression were obtained from the serum detection, which is easily achieved compared with tissue biopsy. Our study also has several limitations: the model validation is based on interval data, but external validation is required. Some factors included the model may be inapplicable into community research.

In conclusion, we established a novel DN risk model with relatively high prediction accuracy based on clinical characteristics and hub genes of serum detection. The estimated risk score can help clinicians develop individualized intervention programs for DN in T2DM. External validation data are required before individualized intervention measures are taken.

Author Contributions

ZB designed this study and directed the research group in all aspects, including planning, execution, and analysis of the study. ZXL drafted the manuscript. WY, YZJ, CXP, and ZJP collected the data. WX provided the statistical software, JX performed the data analysis, ZXL arranged the Figures and Tables. ZB and YWY revised the manuscript. All authors have read and approved the final version of the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Ethical Statement and Consent

This study was approved by the Ethics Committee of China-Japan Friendship Hospital, and all subjects provided written informed consent.

Funding

No funding was provided for this study.

References

  • 1. DeFronzo RA, Ferrannini E, Groop L, Henry RR, Herman WH, Holst JJ, Hu FB, Kahn CR, Raz I, Shulman GI, Simonson DC, Testa MA, Weiss R. Type 2 diabetes mellitus. Nat Rev Dis Primers. 2015; 1:15019. https://doi.org/10.1038/nrdp.2015.19 [PubMed]
  • 2. DeFronzo RA. Pathogenesis of type 2 diabetes mellitus. Med Clin North Am. 2004; 88:787–835. https://doi.org/10.1016/j.mcna.2004.04.013 [PubMed]
  • 3. Solis-Herrera C, Triplitt C, Cersosimo E, DeFronzo RA. Pathogenesis of Type 2 Diabetes Mellitus. Endotext. 2021. [PubMed]
  • 4. Cheng LJ, Chen JH, Lin MY, Chen LC, Lao CH, Luh H, Hwang SJ. A competing risk analysis of sequential complication development in Asian type 2 diabetes mellitus patients. Sci Rep. 2015; 5:15687. https://doi.org/10.1038/srep15687 [PubMed]
  • 5. Herrera-Pombo JL, Aguilar-Diosdado M, Hawkins F, Campos MM, Moreno A, Garcia-Hernandez A, Castro E, García-Doncel LG, Serraclara A, Sánchez-Malo C, Escobar-Jiménez F. Is increasing urinary albumin a better marker for microvascular than for macrovascular complication of type 2 diabetes mellitus? Nephron Clin Pract. 2005; 101:c116–21. https://doi.org/10.1159/000086681 [PubMed]
  • 6. Aghadavod E, Soleimani A, Amirani E, Gholriz Khatami P, Akasheh N, Sharafati Chaleshtori R, Shafabakhsh R, Banikazemi Z, Asemi Z. Comparison Between Biomarkers of Kidney Injury, Inflammation, and Oxidative Stress in Patients with Diabetic Nephropathy and Type 2 Diabetes Mellitus. Iran J Kidney Dis. 2020; 14:31–5. [PubMed]
  • 7. Zhang XX, Kong J, Yun K. Prevalence of Diabetic Nephropathy among Patients with Type 2 Diabetes Mellitus in China: A Meta-Analysis of Observational Studies. J Diabetes Res. 2020; 2020:2315607. https://doi.org/10.1155/2020/2315607 [PubMed]
  • 8. Tzamaloukas AH, Murata GH. Prevention of nephropathy in patients with type 2 diabetes mellitus. Int Urol Nephrol. 2005; 37:655–63. https://doi.org/10.1007/s11255-005-2394-3 [PubMed]
  • 9. Zhang R, Li Y, Zhang S, Cai X, Zhou X, Ji L. The Association of Retinopathy and Plasma Glucose and HbA1c: A Validation of Diabetes Diagnostic Criteria in a Chinese Population. J Diabetes Res. 2016; 2016:4034129. https://doi.org/10.1155/2016/4034129 [PubMed]
  • 10. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008; 9:559. https://doi.org/10.1186/1471-2105-9-559 [PubMed]
  • 11. Zou X, Wang S, Zhang P, Lu L, Zou H. Quantitative Proteomics and Weighted Correlation Network Analysis of Tear Samples in Adults and Children With Diabetes and Dry Eye. Transl Vis Sci Technol. 2020; 9:8. https://doi.org/10.1167/tvst.9.13.8 [PubMed]
  • 12. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28:27–30. https://doi.org/10.1093/nar/28.1.27 [PubMed]
  • 13. Nota B. Gogadget: An R Package for Interpretation and Visualization of GO Enrichment Results. Mol Inform. 2017; 36. https://doi.org/10.1002/minf.201600132 [PubMed]
  • 14. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13:2498–504. https://doi.org/10.1101/gr.1239303 [PubMed]
  • 15. Opoku S, Gan Y, Yobo EA, Tenkorang-Twum D, Yue W, Wang Z, Lu Z. Awareness, treatment, control, and determinants of dyslipidemia among adults in China. Sci Rep. 2021; 11:10056. https://doi.org/10.1038/s41598-021-89401-2 [PubMed]
  • 16. American Diabetes Association. Diabetic nephropathy. Diabetes Care. 2000 (Suppl 1); 23:S69–72. [PubMed]
  • 17. Saeedi P, Petersohn I, Salpea P, Malanda B, Karuranga S, Unwin N, Colagiuri S, Guariguata L, Motala AA, Ogurtsova K, Shaw JE, Bright D, Williams R, and IDF Diabetes Atlas Committee. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract. 2019; 157:107843. https://doi.org/10.1016/j.diabres.2019.107843 [PubMed]
  • 18. Juan J, Yang H. Prevalence, Prevention, and Lifestyle Intervention of Gestational Diabetes Mellitus in China. Int J Environ Res Public Health. 2020; 17:9517. https://doi.org/10.3390/ijerph17249517 [PubMed]
  • 19. Qi C, Mao X, Zhang Z, Wu H. Classification and Differential Diagnosis of Diabetic Nephropathy. J Diabetes Res. 2017; 2017:8637138. https://doi.org/10.1155/2017/8637138 [PubMed]
  • 20. Litwak L, Goh SY, Hussein Z, Malek R, Prusty V, Khamseh ME. Prevalence of diabetes complications in people with type 2 diabetes mellitus and its association with baseline characteristics in the multinational A1chieve study. Diabetol Metab Syndr. 2013; 5:57. https://doi.org/10.1186/1758-5996-5-57 [PubMed]
  • 21. de Boer IH, Rue TC, Hall YN, Heagerty PJ, Weiss NS, Himmelfarb J. Temporal trends in the prevalence of diabetic kidney disease in the United States. JAMA. 2011; 305:2532–9. https://doi.org/10.1001/jama.2011.861 [PubMed]
  • 22. Ahn JH, Yu JH, Ko SH, Kwon HS, Kim DJ, Kim JH, Kim CS, Song KH, Won JC, Lim S, Choi SH, Han K, Cha BY, Kim NH, and Taskforce Team of Diabetes Fact Sheet of the Korean Diabetes Association. Prevalence and determinants of diabetic nephropathy in Korea: Korea national health and nutrition examination survey. Diabetes Metab J. 2014; 38:109–19. https://doi.org/10.4093/dmj.2014.38.2.109 [PubMed]
  • 23. Zhang X, Zhao X, Huo L, Yuan N, Sun J, Du J, Nan M, Ji L. Risk prediction model of gestational diabetes mellitus based on nomogram in a Chinese population cohort study. Sci Rep. 2020; 10:21223. https://doi.org/10.1038/s41598-020-78164-x [PubMed]
  • 24. Jiang S, Fang J, Yu T, Liu L, Zou G, Gao H, Zhuo L, Li W. Novel Model Predicts Diabetic Nephropathy in Type 2 Diabetes. Am J Nephrol. 2020; 51:130–8. https://doi.org/10.1159/000505145 [PubMed]
  • 25. Xi C, Wang C, Rong G, Deng J. A Nomogram Model that Predicts the Risk of Diabetic Nephropathy in Type 2 Diabetes Mellitus Patients: A Retrospective Study. Int J Endocrinol. 2021; 2021:6672444. https://doi.org/10.1155/2021/6672444 [PubMed]
  • 26. Shi R, Niu Z, Wu B, Zhang T, Cai D, Sun H, Hu Y, Mo R, Hu F. Nomogram for the Risk of Diabetic Nephropathy or Diabetic Retinopathy Among Patients with Type 2 Diabetes Mellitus Based on Questionnaire and Biochemical Indicators: A Cross-Sectional Study. Diabetes Metab Syndr Obes. 2020; 13:1215–29. https://doi.org/10.2147/DMSO.S244061 [PubMed]
  • 27. Wang G, Wang B, Qiao G, Lou H, Xu F, Chen Z, Chen S. Screening Tools Based on Nomogram for Diabetic Kidney Diseases in Chinese Type 2 Diabetes Mellitus Patients. Diabetes Metab J. 2021; 45:708–18. https://doi.org/10.4093/dmj.2020.0117 [PubMed]
  • 28. Hollenberg NK. Diabetes, nephropathy, and the renin system. J Hypertens Suppl. 2006; 24:S81–7. https://doi.org/10.1097/01.hjh.0000220411.76740.bf [PubMed]
  • 29. Quan KY, Yap CG, Jahan NK, Pillai N. Review of early circulating biomolecules associated with diabetes nephropathy - Ideal candidates for early biomarker array test for DN. Diabetes Res Clin Pract. 2021; 182:109122. https://doi.org/10.1016/j.diabres.2021.109122 [PubMed]
  • 30. Zhang DJ, Yang JC, Guan YF. [Proteinomics study in diabetes nephropathy]. Sheng Li Ke Xue Jin Zhan. 2010; 41:129–32. [PubMed]