Academic Journal of Computing & Information Science, 2024, 7(11); doi: 10.25236/AJCIS.2024.071103.
Zeyu Huang, Zhaoman Zhong, Xinru Cui
School of Computer Engineering, Jiangsu Ocean University, Lianyungang, China
Predicting event sequences is crucial across various domains. However, most existing Transformer-based point process models struggle with longer sequences due to their quadratic memory complexity. To address this, we propose the Retentive Hawkes Process (RHP) model. RHP uses a retention mechanism to simplify computation and enable a recurrent formulation, resulting in linear memory complexity and reduced inference latency, while still modeling the self-exciting nature of event sequences and capturing both temporal dynamics and long-range dependencies. Numerical experiments demonstrate that RHP significantly outperforms traditional Transformer-based models and Hawkes process variants across diverse datasets. Furthermore, RHP shows promising scaling results across computational paradigms.
Hawkes process; event prediction; retention mechanism
Zeyu Huang, Zhaoman Zhong, Xinru Cui. A Retentive Hawkes Process for Long Event Sequence Prediction. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 11: 20-26. https://doi.org/10.25236/AJCIS.2024.071103.
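The abstract's linear-memory claim rests on the recurrent formulation of retention: instead of attending over the full event history, each step updates a fixed-size decayed state. The snippet below is a minimal, hypothetical sketch of that recurrence (single head, real-valued decay, no positional rotation), not the authors' implementation; the decay value gamma and tensor shapes are illustrative assumptions.

```python
import torch

def recurrent_retention(q, k, v, gamma: float = 0.9):
    """Sketch of single-head recurrent retention.
    q, k, v: (seq_len, d_model) event representations.
    Keeps an O(d_model^2) state instead of an O(seq_len^2) attention map."""
    seq_len, d = q.shape
    state = torch.zeros(d, d)              # decayed summary of past events
    outputs = []
    for n in range(seq_len):
        # decay the accumulated history, then add the current key-value outer product
        state = gamma * state + torch.outer(k[n], v[n])
        outputs.append(q[n] @ state)        # read out with the current query
    return torch.stack(outputs)             # (seq_len, d_model)

# toy usage: 5 events embedded in 8 dimensions
q = k = v = torch.randn(5, 8)
hidden = recurrent_retention(q, k, v)
print(hidden.shape)  # torch.Size([5, 8])
```

Because the state has a fixed size, memory does not grow with sequence length at inference time, which is the property the abstract contrasts with the quadratic cost of standard self-attention.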