| Downloads | Citations | Reads |
| 57 | 0 | 29 |
Abstract: Gene expression data can reveal the pathological mechanisms of disease under specific conditions and at specific times. However, the “curse of dimensionality”, that is, small sample sizes combined with high dimensionality, limits the performance of traditional machine learning classification methods, leading to low prediction accuracy, an inability to recognise small samples, and poor stability. This article proposes a new method, named CVAE-CWGNA-DAE, that combines data augmentation and gene selection to address the problems caused by the “curse of dimensionality”. First, to address the small sample size of gene expression data, a data augmentation method is proposed that combines a conditional variational autoencoder with a conditional Wasserstein generative adversarial network with gradient penalty. Comparison with existing methods demonstrates the advantages of this approach in classification performance and stability. Second, to address the high dimensionality of gene expression data and to verify the effectiveness of the generated data, a gene selection method based on a denoising autoencoder and support vector machine recursive feature elimination (SVM-RFE) is adopted. The results show that when gene selection is performed on the augmented dataset, the classification accuracy achieved with the selected genes improves on all five classification tasks. These results demonstrate that the proposed method is effective in mitigating the “curse of dimensionality” and yields a significant improvement in gene selection.
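The recursive feature elimination step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it leaves out the denoising autoencoder and the generative augmentation entirely, uses synthetic data, and replaces a library SVM solver with a small linear soft-margin SVM trained by batch subgradient descent. The function names (`train_linear_svm`, `svm_rfe`, `make_toy_data`) and all hyperparameters are hypothetical choices made for this example. The ranking rule, removing the feature with the smallest squared weight at each round, is the standard SVM-RFE criterion.

```python
import random


def train_linear_svm(X, y, lam=2.0, lr=0.05, n_iter=100):
    """Fit a soft-margin linear SVM by batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    lam, lr and n_iter are illustrative values, not taken from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge term is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Return (kept feature indices, eliminated indices in removal order)."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # SVM-RFE ranking criterion: drop the feature with the smallest w_j^2.
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated


def make_toy_data(n_samples=100, n_noise=6, seed=0):
    """Synthetic 'expression matrix': columns 0 and 1 carry the class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        informative = [2.0 * label + rng.gauss(0.0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, eliminated = svm_rfe(X, y, n_keep=2)
print("selected genes:", selected)
print("elimination order:", eliminated)
```

In this toy setting the two informative columns are built to dominate the weight ranking, so the noise columns are removed first. On real expression data with thousands of genes, one would more likely eliminate features in batches and use an established SVM implementation rather than this hand-written solver.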
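Basic information:
DOI: 10.13364/j.issn.1672-6510.20240105
CLC number: Q811.4
Citation:
[1] YU Qian, LI Yumeng, LUO Junwei, et al. Key gene screening based on generative models and gene expression data[J]. Journal of Tianjin University of Science & Technology, 2025, 40(06): 1-8+46. DOI: 10.13364/j.issn.1672-6510.20240105.
Funding:
Supported by the National Natural Science Foundation of China (62372156)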