Research Paper Volume 13, Issue 19 pp 23284–23307

Transcriptome research identifies four hub genes related to primary myelofibrosis: a holistic research by weighted gene co-expression network analysis

Weihang Li1,*, Yingjing Zhao2,*, Dong Wang1, Ziyi Ding1, Chengfei Li3, Bo Wang5, Xiong Xue5, Jun Ma5, Yajun Deng5, Quancheng Liu5, Guohua Zhang5, Ying Zhang5, Kai Wang4, Bin Yuan5

  • 1 Department of Orthopaedics, Xijing Hospital, The Fourth Military Medical University, Xi’an 710032, People’s Republic of China
  • 2 Department of Intensive Care Unit, Nanjing First Hospital, Nanjing Medical University, Nanjing 210006, Jiangsu Province, China
  • 3 Department of Aerospace Medical Training, School of Aerospace Medicine, Fourth Military Medical University, Xi’an 710032, Shaanxi, China
  • 4 Department of Hematology, Daxing Hospital, Xi’an 710016, Shaanxi, China
  • 5 Department of Spine Surgery, Daxing Hospital, Xi’an 710016, Shaanxi, China
* Co-first author

Received: May 9, 2021       Accepted: September 29, 2021       Published: October 11, 2021

Copyright: © 2021 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Objectives: This study aimed to identify specific diagnostic and predictive targets for primary myelofibrosis (PMF).

Methods: The gene expression profiles of GSE26049 were obtained from the Gene Expression Omnibus (GEO) database, and a co-expression network was constructed with weighted gene co-expression network analysis (WGCNA) to identify the module most closely related to PMF. Subsequently, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Set Enrichment Analysis (GSEA) and protein-protein interaction (PPI) network analyses were conducted to characterize the green module of interest in detail. Machine learning, principal component analysis (PCA), and expression pattern analysis, including immunohistochemistry and immunofluorescence at the gene and protein levels, were performed to validate the reliability of the identified hub genes.
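The abstract does not report the WGCNA parameters, so the following is only a minimal sketch of the two core steps of the standard method, not the authors' pipeline: an unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^β and the topological overlap matrix (TOM), from which modules such as the green module are normally cut by hierarchical clustering on 1 - TOM. The function names and the value of β are illustrative assumptions; in practice the R package WGCNA performs these steps and chooses β with a scale-free topology criterion (e.g. `pickSoftThreshold`).

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|**beta.

    expr: list of genes, each a list of expression values across samples.
    The diagonal is set to 0, as in the usual WGCNA definition of connectivity.
    """
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij = sum_u a_iu * a_uj counts shared neighbours (u = i or u = j add
    nothing because the diagonal of adj is 0); k_i is the connectivity of gene i.
    """
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom
```

For example, `topological_overlap(adjacency(expr, 6))` returns gene-by-gene overlap scores for a list of per-gene expression vectors; β = 6 here is a placeholder, not a value reported in the study. The soft power keeps weak correlations small without discarding them, which is what distinguishes a weighted network from a hard-thresholded one.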

Results: WGCNA showed that the green module was strongly correlated with PMF. Twenty genes in the green module were identified as hub genes potentially involved in the progression of PMF. GO and KEGG analyses revealed that these hub genes were primarily enriched in erythrocyte differentiation, transcription factor binding, hemoglobin complex, transcription factor complex and cell cycle, among other terms. Among them, EPB42, CALR, SLC4A1 and MPL were most strongly correlated with PMF. Machine learning, PCA and expression pattern analysis supported these findings.
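Enrichment statements such as "enriched in erythrocyte differentiation" usually rest on an over-representation test. The abstract does not name the statistic (DAVID, listed among the abbreviations, reports a modified Fisher exact test known as the EASE score), so the sketch below shows only the plain one-sided hypergeometric p-value as an illustration: the probability of seeing at least k term-annotated genes among n hub genes drawn from N background genes, K of which carry the term. The function and variable names are assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes annotated to the term,
    n: genes in the query list (e.g. hub genes), k: annotated genes observed in it.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)
    # Sum the exact integer counts of draws with i >= k annotated genes;
    # comb() returns 0 when more genes are requested than exist.
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

Exact integer binomials are summed before dividing, which avoids the rounding drift of multiplying many small floating-point probabilities; a small p-value means the term appears among the hub genes more often than random sampling from the background would predict.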

Conclusions: EPB42, CALR, SLC4A1 and MPL were significantly upregulated in PMF samples. These four genes may be considered candidate prognostic biomarkers and potential therapeutic targets for early-stage PMF, and they merit further investigation both for early diagnosis and as therapeutic targets.


Abbreviations: BP: Biological Process; CC: Cellular Component; DAVID: Database for Annotation, Visualization and Integrated Discovery; ET: Essential Thrombocythemia; GO: Gene Ontology; GSEA: Gene Set Enrichment Analysis; IHC: Immunohistochemistry; IF: Immunofluorescence; KEGG: Kyoto Encyclopedia of Genes and Genomes; MF: Molecular Function; PPI: Protein-Protein Interaction; PMF: Primary Myelofibrosis; PV: Polycythemia Vera; STRING: Search Tool for the Retrieval of Interacting Genes/Proteins; WGCNA: Weighted Gene Co-expression Network Analysis.