Advanced Bayesian Survival Modeling for Lung Adenocarcinoma Prognosis: The afthd R Package and Shiny Application
Abstract
High-dimensional variable selection in time-to-event analysis is a critical area in biostatistics, especially in the context of complex diseases like lung adenocarcinoma (LUAD). LUAD, the most common subtype of lung cancer, presents unique diagnostic and prognostic challenges due to its molecular and genetic diversity. This study introduces an integrated framework for high-dimensional survival analysis, combining feature selection, advanced survival modeling, and robust missing data handling techniques. We developed the afthd R package, designed specifically for Bayesian survival analysis using the Accelerated Failure Time (AFT) model. This package facilitates efficient variable selection in high-dimensional settings, employing regularized methods such as LASSO and Elastic Net, as well as Bayesian approaches for model stability. An accompanying Shiny web application provides an accessible platform for non-programmers, allowing researchers to perform high-dimensional analysis and view results interactively. Using a LUAD dataset from The Cancer Genome Atlas (TCGA), our results identify key biomarkers associated with patient survival, highlighting the practical utility of this framework in LUAD prognosis. This integrated approach lays the groundwork for more precise prognostic modeling, with potential extensions to other cancers and high-dimensional biomedical datasets.
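For readers unfamiliar with the model class named above, the following is a textbook sketch of a penalized AFT formulation. It is not taken from the afthd package; the symbols ($T_i$, $x_i$, $\beta$, $\sigma$, $\delta_i$, $\lambda$, $\alpha$) and the elastic net parameterization are assumptions introduced here for illustration, and the abstract does not state which error distribution, penalty form, or priors the package uses.

```latex
% AFT model: covariates act linearly on the log survival time.
\[
  \log T_i = x_i^{\top}\beta + \sigma\,\varepsilon_i, \qquad i = 1,\dots,n,
\]
% where \varepsilon_i has a known baseline density f_0 and survival function S_0
% (for example, a standard normal baseline gives the log-normal AFT model).

% Right-censored log-likelihood, up to an additive constant, with event
% indicator \delta_i (1 = event observed, 0 = censored) and
% z_i = (\log t_i - x_i^{\top}\beta)/\sigma:
\[
  \ell(\beta,\sigma) = \sum_{i=1}^{n}
    \Big[ \delta_i \big( \log f_0(z_i) - \log\sigma \big)
        + (1-\delta_i)\,\log S_0(z_i) \Big].
\]

% Elastic net estimate: \alpha = 1 gives the LASSO, \alpha = 0 gives ridge.
\[
  (\hat\beta,\hat\sigma) = \arg\min_{\beta,\sigma}
    \Big\{ -\ell(\beta,\sigma)
      + \lambda \Big[ \alpha \lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2} \lVert\beta\rVert_2^2 \Big] \Big\},
  \qquad \lambda \ge 0,\; 0 \le \alpha \le 1.
\]
```

In a Bayesian reading, the LASSO case ($\alpha = 1$) corresponds to the posterior mode under independent Laplace priors on the coefficients $\beta_j$, which is one common way regularization and Bayesian variable selection are connected in high-dimensional settings where the number of covariates exceeds $n$.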
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.