Research Paper Volume 14, Issue 2 pp 845–868

Tumor microenvironment-related multigene prognostic prediction model for breast cancer

Kai Hong1, Yingjue Zhang2, Lingli Yao1, Jiabo Zhang3, Xianneng Sheng3, Yu Guo3

  • 1 Medicine School, Ningbo University, Jiangbei, Ningbo 315211, Zhejiang, China
  • 2 Department of Molecular Pathology, Division of Health Sciences, Graduate School of Medicine, Osaka University, Suita, Osaka 565-0871, Japan
  • 3 Department of Thyroid and Breast Surgery, Ningbo City First Hospital, Haishu, Ningbo 315010, Zhejiang, China

Received: October 28, 2021       Accepted: January 14, 2022       Published: January 20, 2022

Copyright: © 2022 Hong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Background: Breast cancer is an invasive disease with complex molecular mechanisms. Prognosis-related biomarkers are still urgently needed to predict outcomes of breast cancer patients.

Methods: Original data were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). Analyses were performed using Perl 5.32 and R 4.1.1 (x64).

Results: In this study, 1086 differentially expressed genes (DEGs) were identified in the TCGA cohort, and 523 DEGs were shared between the TCGA and GSE10886 cohorts. Non-negative matrix factorization (NMF) clustering identified eight subtypes with significant differences in overall survival (OS) and progression-free survival (PFS) (P < 0.01). Univariate Cox analysis and least absolute shrinkage and selection operator (LASSO) regression were performed to develop a risk score based on 17 DEGs. This score separated breast cancer patients into low- and high-risk groups with significantly different survival (P < 0.01) and showed good predictive performance (entire TCGA cohort: 1-year area under the curve [AUC] = 0.729, 3-year AUC = 0.778, 5-year AUC = 0.781). A nomogram prediction model was constructed by combining the NMF clusters, the risk score, and clinical characteristics. The model was confirmed to be associated with the tumor microenvironment. Furthermore, DEGs in the high-risk group were enriched in the histidine metabolism (normalized enrichment score [NES] = 1.49, P < 0.05), protein export (NES = 1.58, P < 0.05), and steroid hormone biosynthesis (NES = 1.56, P < 0.05) pathways.
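The abstract does not list the 17 genes, their LASSO coefficients, or the cut-off used to define risk groups. As a minimal sketch, assuming the common LASSO-Cox convention in which a patient's risk score is the coefficient-weighted sum of gene expression values and patients are split at the median score, the scoring step could look like the following. The gene names (`GENE_A`, ...), coefficients, patients, and the median cut-off are all hypothetical placeholders, not values from this study.

```python
from statistics import median

# Hypothetical LASSO-Cox coefficients: the study's 17 genes and their
# weights are not given in the abstract, so these are placeholders.
COEFFICIENTS = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}


def risk_score(expression: dict[str, float],
               coefs: dict[str, float] = COEFFICIENTS) -> float:
    """Risk score = sum over model genes of (coefficient * expression)."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def split_by_median(scores: dict[str, float]) -> dict[str, str]:
    """Label patients above the median score 'high', the rest 'low'.

    The median cut-off is an assumption; other cut-offs are possible.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Toy expression values for four hypothetical patients.
    patients = {
        "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
        "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
        "P3": {"GENE_A": 1.5, "GENE_B": 0.2, "GENE_C": 3.0},
        "P4": {"GENE_A": 0.1, "GENE_B": 1.5, "GENE_C": 0.2},
    }
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    groups = split_by_median(scores)
    for pid in patients:
        print(pid, round(scores[pid], 3), groups[pid])
```

Running the script prints each toy patient's score and assigned group. Survival differences between the resulting groups would then typically be assessed with Kaplan-Meier curves and a log-rank test, and time-dependent ROC curves would give the 1-, 3-, and 5-year AUCs; those steps are not reproduced here.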

Conclusions: We established a comprehensive model that can predict prognosis and guide treatment.


AUC: Area under the curve; CI: Confidence interval; DCA: Decision curve analysis; DEG: Differentially expressed gene; ER: Estrogen receptor; FC: Fold change; FDR: False discovery rate; GEO: Gene Expression Omnibus; GO: Gene ontology; GSEA: Gene set enrichment analysis; HER2: Human epidermal growth factor receptor 2; HR: Hazard ratio; KEGG: Kyoto Encyclopedia of Genes and Genomes; K-M: Kaplan-Meier; LASSO: Least absolute shrinkage and selection operator; MCP: Microenvironment cell population; NES: Normalized enrichment score; NMF: Non-negative matrix factorization; OS: Overall survival; PFS: Progression-free survival; PPI: Protein-protein interaction; PR: Progesterone receptor; ROC: Receiver operating characteristic; TAM: Tumor-associated macrophage; TCGA: The Cancer Genome Atlas; TCGA-BRCA: The Cancer Genome Atlas Breast Invasive Carcinoma; THPA: The Human Protein Atlas database; TMB: Tumor mutation burden; TME: Tumor microenvironment.