While many business intelligence methods have been applied to predict movie box office revenue, studies that use an ensemble approach for this task are almost nonexistent. In this study, we propose decision trees, k-nearest-neighbors (k-NN), and linear regression combined with ensemble methods, and we compare the prediction performance of decision trees based on random forests, bagging, and boosting with that of k-NN and linear regression based on bagging and boosting, using a sample of 1439 movies. The results indicate that ensemble methods based on decision trees (random forests, bagging, boosting) outperform ensemble methods based on k-NN (bagging, boosting) in predicting box office revenue at weeks 1, 2, and 3 after release. Decision trees using ensemble methods also provide better prediction performance than ensemble methods based on linear regression analysis for box office revenue at week 1 after release. This is explained by a comparison of prediction performance between ensemble and non-ensemble methods: for decision trees, unlike the other methods, the prediction performance of the ensemble methods is greater than that of the non-ensemble methods. This shows that ensemble methods are applied more effectively to decision trees than to k-NN and linear regression analysis.
Predicting movie box office revenue after release has always been a challenging problem in the movie industry. The movie industry is growing rapidly, both locally and globally. Hundreds of movies are released every month, and thousands every year (Kim et al., 2015). Accurate prediction of box office revenue is important for the development of the movie industry and for reducing market risk. The success or failure of a movie depends on movie-related variables such as the timing of its release (i.e., whether it is released in the high or low season) and whether it is a sequel. eWOM (online word-of-mouth) has also been used to predict box office revenue, and it is provided in many forms, such as online reviews, discussion boards, video sites, blogs, microblogs, and social networks (Baek et al., 2017; Qin, 2001; Oh et al., 2017).
eWOM is becoming available in large amounts in this age of big data, and we are increasingly surrounded by enormous volumes of data about movies on the internet, which are better handled with business intelligence (BI) methods that can process, manage, and utilize these large data sets properly (Guille and Hacid, 2012). eWOM variables include having more positive than negative reviews (helpfulness), the number of reviews in the early stage of release or after release, the total helpful votes of the reviewers, and advertisement of a movie prior to release (Leenders and Eliashberg, 2011). In previous studies, many researchers have used eWOM and movie-related variables to predict box office revenue, mainly with statistical regression algorithms such as multiple linear regression (Asur and Huberman, 2010) or with machine learning algorithms such as the multi-layer perceptron neural network (Ru et al., 2018; Wang et al., 2020), the Bayesian belief network, and the backpropagation (BP) neural network, which perform robustly, overcome the limitations of the regression method, and achieve better prediction accuracy (Lee and Chang, 2009; Sharda and Delen, 2006; Zhang et al., 2009). First, because few studies have examined how ensemble methods can improve box office prediction, we intend to fill this void by proposing ensemble methods of decision trees, k-nearest-neighbors (k-NN), and linear regression to achieve improved predictive accuracy for box office earnings one, two, and three weeks after release. Although neural networks have been used effectively to predict box office revenue with satisfactory results (Zhang et al., 2009; Zhou et al., 2019), the movie dataset is not large and the number of eWOM and movie-related variables is not small, so we focus on other, non-deep-learning methods besides neural networks, such as decision trees and k-NN.
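To make the comparison between an ensemble learner and its non-ensemble counterpart concrete, the following is a minimal sketch of bagging applied to k-NN regression, written with the Python standard library only. It is an illustration under stated assumptions, not the procedure used in this study: the data are synthetic, the two features (normalised screen count and normalised review volume) and the revenue formula are hypothetical, and the function names `knn_predict`, `bagged_knn_predict`, `rmse`, and `make_movies` are introduced here for illustration.

```python
import math
import random
from statistics import mean


def knn_predict(train, x, k=3):
    """Average the targets of the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return mean(y for _, y in nearest)


def bagged_knn_predict(train, x, k=3, n_estimators=25, seed=0):
    """Bagging: run k-NN on bootstrap resamples of the data and average the predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        # Bootstrap resample: draw len(train) rows with replacement.
        resample = rng.choices(train, k=len(train))
        preds.append(knn_predict(resample, x, k))
    return mean(preds)


def rmse(predict, train, test):
    """Root mean squared error of a predictor on held-out rows."""
    return math.sqrt(mean((predict(train, x) - y) ** 2 for x, y in test))


def make_movies(n, seed):
    """Synthetic stand-in data (an assumption, not the 1439-movie sample)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        screens, reviews = rng.random(), rng.random()  # features already in [0, 1]
        revenue = 5.0 * screens + 3.0 * reviews + rng.gauss(0, 0.5)  # hypothetical relation
        rows.append(((screens, reviews), revenue))
    return rows


train = make_movies(200, seed=1)
test = make_movies(50, seed=2)

single_rmse = rmse(knn_predict, train, test)
bagged_rmse = rmse(bagged_knn_predict, train, test)
print(f"single k-NN RMSE: {single_rmse:.3f}")
print(f"bagged k-NN RMSE: {bagged_rmse:.3f}")
```

The same pattern (resample, fit a base learner, average) applies when the base learner is a decision tree or a linear regression; whether the bagged version actually improves on the single learner depends on the base learner and the data, which is the question the comparison in this study addresses.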