Academic Journal of Computing & Information Science, 2023, 6(13); doi: 10.25236/AJCIS.2023.061326.

Generation of pathological descriptions with interpretable reasoning via sequential progressive attention network and knowledge based on location relations

Author(s)

Yunan Shi

Corresponding Author:
Yunan Shi
Affiliation(s)

Jiangsu University, Zhenjiang, China

Abstract

Deep learning tools have received tremendous attention when applied to the automatic diagnosis of gastric cancer from computed tomography (CT) scans. However, a computer cannot accurately describe tumor location and severity from the visual features of the tumor's largest cross-section alone, which makes it difficult to find a globally optimal solution. Moreover, healthcare is a high-risk decision-making domain with a strong demand for interpretable models and risk assessment. In this paper, we propose the Sequential Progressive Attention Network, which makes four main contributions: (1) A relative position encoding module is designed to align the gastric cavity and, using the cavity as a reference object, obtain the position of the tumor relative to the cavity. (2) A dynamically distributed dilated convolution method based on random directional field perturbations is proposed to model the uncertainty of the network; by locally perturbing the attention region of the dilated convolution, it evaluates how different components within a local region affect the decision. (3) A Long Short-Term Memory (LSTM) network is applied to analyze changes in tumor morphology across consecutive CT images. Specifically, a mask-based non-uniform coding module is put forward to reduce the weighting of non-tumor regions and thereby the LSTM's sensitivity to feature changes in non-target regions. (4) The location relations between the gastric lumen and the tumor are modeled by the LSTM to obtain a triadic external knowledge base with relative interpretability, making the model's decisions transparent. Finally, we conduct image-captioning experiments on a gastric CT image dataset and apply the BLEU metric to evaluate the results. The experimental results show a 4% improvement over the latest models of recent years.
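To make the mask-based non-uniform coding idea in contribution (3) concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: the class name MaskWeightedLSTM, the residual weight alpha, the tensor shapes, and the mean-pooling step are all hypothetical choices made for illustration. It shows per-slice features being down-weighted (rather than zeroed) outside a soft tumor mask before an LSTM models morphology changes across consecutive slices.

```python
import torch
import torch.nn as nn

class MaskWeightedLSTM(nn.Module):
    """Sketch of mask-based non-uniform coding feeding an LSTM.

    Non-tumor pixels keep only a small residual weight `alpha`
    (a hypothetical value; the paper's weighting scheme is not
    given in the abstract), so the LSTM is less sensitive to
    feature changes outside the target region.
    """

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 512, alpha: float = 0.2):
        super().__init__()
        self.alpha = alpha
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, feats: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) CNN features from T consecutive CT slices
        # masks: (B, T, 1, H, W) soft tumor masks in [0, 1]
        weights = masks + self.alpha * (1.0 - masks)  # reduce, not zero out, non-tumor weight
        pooled = (feats * weights).mean(dim=(3, 4))   # (B, T, C) per-slice descriptors
        out, _ = self.lstm(pooled)                    # temporal change in tumor morphology
        return out                                    # (B, T, hidden_dim)

# Toy usage: a batch of 2 studies with 8 consecutive slices each
feats = torch.randn(2, 8, 256, 32, 32)
masks = torch.rand(2, 8, 1, 32, 32)
hidden = MaskWeightedLSTM()(feats, masks)
print(hidden.shape)  # torch.Size([2, 8, 512])
```

Keeping a small residual weight for non-tumor regions, instead of masking them to zero, would preserve anatomical context such as the gastric cavity used as the reference object, while still reducing the LSTM's sensitivity to changes in non-target regions.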

Keywords

Sequential Progressive Attention Network, deep learning, Long Short-Term Memory (LSTM), computed tomography

Cite This Paper

Yunan Shi. Generation of pathological descriptions with interpretable reasoning via sequential progressive attention network and knowledge based on location relations. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 13: 186-195. https://doi.org/10.25236/AJCIS.2023.061326.
