Academic Journal of Computing & Information Science, 2023, 6(9); doi: 10.25236/AJCIS.2023.060904.
School of Information Engineering, Hubei University of Economics, Wuhan, China
This study analyzes data from Chinese movie websites to understand how movie genres and ratings are distributed and how they trend. Python third-party libraries and the Selenium tool were used to crawl data from movie websites and platforms, of which Douban Movies is one of the most prominent. To support data analysis of Douban movies, the crawler was designed from multiple perspectives and includes two data-capture channels: searching for movies by keyword, and browsing rankings through filtered search. Through the movie-details module, the crawler retrieves movie ratings, stars, online viewing addresses, cloud-disk search links, and film and television download resources. The results were visualized with Matplotlib, a third-party Python plotting library. They show that a film's rating and its total number of ratings are important factors that ordinary users consult when choosing what to watch. Drama is the genre most favored by producers and film companies, while adventure films are easily overlooked by viewers. These analyses reflect consumer-driven trends in public viewing.
Python; Web scraping; Visualization; Selenium; Movie websites
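The genre and rating analysis summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the records in `movies`, the field names, and the helpers `summarize_by_genre` and `plot_genre_counts` are hypothetical stand-ins for whatever the crawler actually stores, and the sample values are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical records standing in for scraped movie data.
# Titles, genres, and ratings are invented for illustration only.
movies = [
    {"title": "Movie A", "genres": ["Drama"], "rating": 8.5},
    {"title": "Movie B", "genres": ["Drama", "Comedy"], "rating": 7.5},
    {"title": "Movie C", "genres": ["Adventure"], "rating": 6.0},
    {"title": "Movie D", "genres": ["Comedy"], "rating": 7.0},
]


def summarize_by_genre(records):
    """Return {genre: (film_count, mean_rating)} for a list of records.

    A film with several genres is counted once under each of them.
    """
    totals = defaultdict(lambda: [0, 0.0])  # genre -> [count, rating sum]
    for rec in records:
        for genre in rec["genres"]:
            totals[genre][0] += 1
            totals[genre][1] += rec["rating"]
    return {g: (n, round(s / n, 2)) for g, (n, s) in totals.items()}


def plot_genre_counts(summary, path="genre_counts.png"):
    """Save a bar chart of films per genre (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    # Order genres from most to least frequent.
    genres = sorted(summary, key=lambda g: summary[g][0], reverse=True)
    counts = [summary[g][0] for g in genres]

    fig, ax = plt.subplots()
    ax.bar(genres, counts)
    ax.set_xlabel("Genre")
    ax.set_ylabel("Number of films")
    fig.savefig(path)
    plt.close(fig)


summary = summarize_by_genre(movies)
```

Calling `plot_genre_counts(summary)` would write the chart to `genre_counts.png`. Keeping the aggregation separate from the plotting is only one possible design; it lets the counts be checked without Matplotlib installed.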
Shujun Yuan. Design and Visualization of Python Web Scraping Based on Third-Party Libraries and Selenium Tools. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 9: 25-31. https://doi.org/10.25236/AJCIS.2023.060904.