Academic Journal of Computing & Information Science, 2024, 7(11); doi: 10.25236/AJCIS.2024.071103.
Zeyu Huang, Zhaoman Zhong, Xinru Cui
School of Computer Engineering, Jiangsu Ocean University, Lianyungang, China
Predicting event sequences is crucial across various domains. However, most existing Transformer-based point process models struggle with longer sequences due to their quadratic memory complexity. To address this, we propose the Retentive Hawkes Process (RHP) model. RHP uses a retention mechanism to simplify computation and enable a recurrent formulation, resulting in linear memory complexity and reduced inference latency, while still modeling the self-exciting nature of event sequences and capturing both temporal dynamics and long-range dependencies. Numerical experiments demonstrate that RHP significantly outperforms traditional Transformer-based models and Hawkes process variants across diverse datasets. Furthermore, RHP shows promising scaling results across computational paradigms.
Hawkes process; event prediction; retention mechanism
Zeyu Huang, Zhaoman Zhong, Xinru Cui. A Retentive Hawkes Process for Long Event Sequence Prediction. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 11: 20-26. https://doi.org/10.25236/AJCIS.2024.071103.
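The abstract's linear-memory claim rests on the recurrent formulation of retention: instead of attending over the full event history, each step updates a fixed-size decayed state. The snippet below is a minimal, hypothetical sketch of that recurrence (single head, real-valued decay, no positional rotation), not the authors' implementation; the decay value gamma and tensor shapes are illustrative assumptions.

```python
import torch

def recurrent_retention(q, k, v, gamma: float = 0.9):
    """Sketch of single-head recurrent retention.
    q, k, v: (seq_len, d_model) event representations.
    Keeps an O(d_model^2) state instead of an O(seq_len^2) attention map."""
    seq_len, d = q.shape
    state = torch.zeros(d, d)              # decayed summary of past events
    outputs = []
    for n in range(seq_len):
        # decay the accumulated history, then add the current key-value outer product
        state = gamma * state + torch.outer(k[n], v[n])
        outputs.append(q[n] @ state)        # read out with the current query
    return torch.stack(outputs)             # (seq_len, d_model)

# toy usage: 5 events embedded in 8 dimensions
q = k = v = torch.randn(5, 8)
hidden = recurrent_retention(q, k, v)
print(hidden.shape)  # torch.Size([5, 8])
```

Because the state has a fixed size, memory does not grow with sequence length at inference time, which is the property the abstract contrasts with the quadratic cost of standard self-attention.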