I. INTRODUCTION
Time series forecasting in the evolution of complex systems is considered one of the emerging challenges of modern science [1], and prediction accuracy is generally better with multivariate data analysis than with univariate data analysis. Predicting the trend of crude palm oil (CPO) price is very challenging, as CPO is one of the most volatile commodities and its volatility depends on various factors. The trading world cannot deny the influence of financial news, weather news, and political news on CPO price volatility. Therefore, text analytics needs to be considered in the design of a CPO price prediction model. Text sentiment analysis was first used predominantly on online reviews of merchandise such as books, movies, electronic products, and stocks; researchers later used it to explore textual information mining for predicting future market moves [2]. Sentiment analysis is a frequently used technique for extracting and retrieving opinions from text. It has attracted the interest and attention of researchers, given that it is widely used in many domains [3–5] and has demonstrated improved performance. Past research shows that traditional methods employ shallow neural networks consisting of few nonlinear operations, which are unable to model such complex data accurately. Thus, this study proposes and explores the long short-term memory (LSTM) recurrent network to enhance CPO price movement forecasting.
As can be seen from Fig. 1, the prices for Malaysian CPO typically increased in the second half of the year as exports of the product tended to increase, starting in August and continuing through November each year due to increased demand from India and China, the two largest CPO importing nations. These times coincided with their respective holiday seasons.
Fig. 1. Palm oil monthly price in MYR (Oil World (2019), ISTA Mielke GmbH).
However, in 2018, geopolitical events that influenced the market for CPO caused a reversal in this historical pricing trend. At the beginning of 2018 [6], the European Union (EU) Parliament decided to gradually phase out the use of palm oil as a biofuel in Europe, which caused demand to fall and likewise prices to fall. In June 2019 [7], prices also fell due to the political news that Malaysia condemned India regarding Kashmir issues and new citizenship laws. Nevertheless, accuracy in price prediction is especially crucial to facilitate effective decision-making, given the reasonable time lag between making output decisions and the actual production of the commodity in the market [8]. Therefore, price prediction mechanisms are important for market participants and stakeholders to guide them in their CPO production and consumption as well as in the financing decisions on CPO-related issues.
Traditional neural network structures and algorithms are not well suited to the time-varying patterns that exist in time series data, which makes temporal prediction a particularly challenging problem [9]. Thus, this study proposes and explores the LSTM recurrent network for enhancing time series forecasting of CPO prices. More specifically, this study aims to test the effect of news headline sentiment on CPO price movement and to determine the ideal delay between news sentiment and CPO price market reactions.
The rest of the paper is structured as follows. Section II presents the related works on CPO price predictions. Section III discusses the research methodology and the proposed research framework employed in this study. Section IV discusses the results in detail, while the conclusion is presented in Section V.
II. LITERATURE REVIEW
Text sentiment analysis, or opinion mining, has been used to analyze text content and the comments or feedback regarding a product, service, or commodity. Text mining has become increasingly important given the large volumes of data available via the internet. Biomedicine, customer service, and stock trading are some of the fields where text mining is widely used. Text sentiment analysis was first used predominantly on online reviews of merchandise such as books, movies, electronic products, and stocks. Later, researchers used it to investigate how textual information mining can forecast future market movements [2]. Several studies have also shown that news presented in national newspapers is associated with significant price movement [10–12], which has drawn the interest of many researchers. Importantly, recent research shows that information is frequently gathered from financial news articles [13,14], financial reports [15], and micro-blogs [16]. These are considered appropriate sources of information for predicting future market behavior. Financial news has always been a significant source of information for analyzing market conditions, and it is increasing enormously in both volume and broadcasting speed.
Models to make predictions based on news articles have been proposed to help investors to make the best decision at appropriate times. Refs. [17,18] applied Pearson’s correlation and Spearman’s rank analysis to examine the connection between real-time stock price movement and online news, and the findings were then validated. The existing system transforms the received or input text into a numerical value called a sentiment score [17] based on the influence of the news on the respective development of the stock. The impact of news sentiment on CPO prices has not been explored and analyzed in this domain. Thus, news headline sentiment is included as a predictor in this research.
Researchers have explored various techniques to tackle this problem, ranging from traditional statistical methods to more advanced artificial intelligence (AI) approaches [19,20]. During the 1970s, conventional statistical approaches were mostly employed in forecasting CPO prices. The most common univariate linear models for time series forecasting are Auto Regressive Integrated Moving Average (ARIMA) models, which have shown their potential in various forecasting problems in engineering, economics, and stocks over recent decades.
Indeed, different researchers have applied these models for modeling oil price changes [21]. The high performance and robustness of ARIMA models have attracted researchers generally. Econometric models using time series have also been widely used to forecast the CPO prices in the oil palm industry. Ref. [21] used a univariate ARIMA model to forecast the short-run monthly price of CPO. On the other hand, [22] used the Box–Jenkins technique that produced a satisfactory result for Malaysian palm oil production estimation. A Multiple Auto Regressive Moving Average (MARMA) model was also used for short-term CPO price forecasting by some researchers [23].
Additionally, [24] investigated the use of a structural model to describe the Malaysian palm oil industry between 1997 and 1999 by considering the total oil palm field, palm oil production, local consumption, exports, and imports. Compared to conventional statistical methods, soft computing processes have a number of advantages since they often have a high tolerance for error and show reliable performance in noisy data situations. Unlike conventional statistical models, soft computing methods are numerical, data-driven, nonparametric, and self-adaptive mechanisms that require less historical data [25]. Many of these soft computing methods can detect nonlinear relationships between important market variables without the need for prior knowledge or statistical presumptions regarding the input data [26].
Traditional time series prediction techniques such as Box–Jenkins or ARIMA assume that a time series is a linear process. However, this assumption is unrealistic because real-world problems are often nonlinear and complicated [20,27,28]. As such, deep learning techniques for multivariate time series seem to fit this description better. They offer an intriguing new strategy for the financial area as well as a challenging new application for the deep learning research community, which to the author’s knowledge has not yet been attempted in the prediction of CPO prices.
A recent academic survey [2] has shown that the use of deep learning techniques for financial forecasting remains relatively unexplored and may create further opportunities in predicting financial time series. Likewise, sequence modeling of time series has attracted many researchers to explore recurrent neural networks (RNNs) for better performance than a shallow network. Using a shallow network is similar to writing a program without the ability to call subroutines. Without this ability, wherever we would otherwise call a subroutine, we need to write its code out explicitly. In terms of lines of code, the program for a shallow network is therefore much longer than that for a deep network. Worse still, the execution time is also longer, given that the computation of subroutines is not reused [29,30].
Deep neural networks (DNNs) are powerful models that have been applied to difficult learning tasks and achieved excellent performance [31]. DNNs perform well whenever a large labeled training set is available. However, they do not perform well when used to map sequences to sequences. In this paper, a generic end-to-end method for sequence learning that makes minimal assumptions about the sequence structure is presented. Our method uses a multilayered LSTM to map the input sequence to a vector of fixed dimensionality. Even though DNNs are powerful and good learners, they can only be applied to problems whose inputs and targets can be sensibly encoded as vectors of fixed dimensionality. This is a major drawback since many important problems are best expressed using sequences whose lengths are not known in advance [32]. Moreover, due to their reliance on a fixed-size sliding window of acoustic frames, DNNs offer only a limited amount of temporal modeling capacity. Therefore, they are inadequate for handling longer-term dependencies since they can only model the data contained in the window. In contrast, RNNs contain cycles that feed the network activations from a previous time step back into the network as inputs, so that they influence predictions at the current time step [33]. These activations are stored in the network’s internal states, which can, in theory, maintain long-term temporal contextual information. This mechanism enables RNNs to exploit a dynamically changing contextual window over the input sequence, rather than a static one as in a feedforward network’s fixed-size window [34]. This limitation of feedforward networks has motivated the use of an LSTM RNN in this experiment.
LSTM networks, a special kind of RNN, are capable of learning long-term dependencies. First introduced by Hochreiter and Schmidhuber [35], and more recently improved and popularized by many researchers, they are widely used and perform very effectively on a wide range of problems. Beyond modeling fixed inputs, the LSTM sequence model has also been used to describe relationships between sequences and other sequences. Machine translation is one application that makes effective use of sequence-to-sequence learning [32,36]. Regrettably, only a few published papers have applied LSTMs to time series forecasting tasks, all of which, to our knowledge, are outside the CPO price prediction context. Ref. [37] utilized sigmoidal output units to formulate music composition as a multilabel classification task. More recently, [38] was able to recognize actions in videos using LSTM networks with multilabel outputs. While we could not locate any published papers using LSTMs for multivariate time series prediction of CPO prices, several papers have applied statistical methods and feedforward neural nets to this task [39,40]. Furthermore, LSTMs are designed to avoid the long-term dependency problem. In a traditional RNN, during the gradient backpropagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of time steps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of the weights in the transition matrix can have a significant impact on the learning process, leading to a problem known as vanishing or exploding gradients.
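The repeated multiplication described above can be illustrated with a scalar sketch. It assumes a hypothetical one-unit recurrent network with a linear activation, so the backpropagated gradient is scaled by the same recurrent weight `w` at every time step; the function name and the weight values are illustrative and not taken from the study.

```python
def gradient_scale(w: float, steps: int) -> float:
    """Factor by which a gradient is scaled after `steps` multiplications by w."""
    scale = 1.0
    for _ in range(steps):
        scale *= w
    return scale


# Hypothetical recurrent weights, chosen only to show the two regimes.
small = gradient_scale(0.9, 100)  # |w| < 1: the factor shrinks toward zero (vanishing)
large = gradient_scale(1.1, 100)  # |w| > 1: the factor grows rapidly (exploding)
print(f"w=0.9 -> {small:.3e}, w=1.1 -> {large:.3e}")
```

In a real RNN the scalar `w` is replaced by the recurrent weight matrix and the derivative of the activation, but the same geometric shrinkage or growth over time steps applies.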
As stated earlier, the CPO price has been predicted using conventional methods in past studies. Traditional statistical methods such as ARIMA and econometric methods cannot effectively solve nonlinear problems [41]. In addition, documented evidence in [42] reveals that AI techniques are superior to these conventional methodologies in forecasting nonlinear and volatile financial time series. Similarly, [39] suggested that artificial neural network (ANN) performs better than Autoregressive Fractionally Integrated Moving Average (ARFIMA) models in predicting CPO prices. However, according to [43], both models have limitations as they rely on substantial amounts of data. This is in line with [16] who stated that there are too many issues with neural network models compared to other traditional statistical models.
Despite their success in resolving complicated linear and nonlinear problems such as price prediction in financial time series, these techniques still have limitations. For example, the most frequently used ANN, the backpropagation ANN, may get trapped in local minima or overfit the training data. Unfortunately, there is no perfect framework for determining the optimal architecture of a neural network or for selecting the initial training parameters. ANN is unable to cater to nonlinear, complex real-world data [44,45] as it performs poorly in interpreting raw input patterns [46]. It is also unsuitable for patterns that vary over time [9]. Moreover, ANN and DNN cannot be used to map sequences to sequences as required in sequential data modeling [32]. In studying the issues with existing techniques and the behavior of time series data, especially multivariate time series data [47], we encounter a significant challenge in capturing and analyzing the intricate dynamic interdependencies between different series. These complex relationships are often significant yet difficult to discern with existing forecasting models, which underscores the need for a more comprehensive understanding of this data type. The RNN model is the most suitable for modeling time-dependent and sequential data [30]. However, RNNs suffer from the problem of vanishing gradients, which hampers learning of long data sequences. Thus, a deep learning approach using an LSTM network is proposed in this study. This is similar to the study by [46], which focused on predicting the volatility of the stock market, [48], which examined selected foreign exchange rates, and [49], which examined the financial market.
III. MATERIALS AND METHOD
In this study, John Rollins’ framework for descriptive Data Science Methodology [50] was adapted by undertaking three basic steps in the development of the predictive model. Understanding the question at hand is the first step in choosing an analytical strategy or method, which is followed by obtaining, comprehending, preparing, and modeling the data. The process starts with news headlines retrieval, analyzing the data for news sentiment, and finally explaining the detailed process of sentiment score calculation. The detailed research operation framework for deep learning CPO price forecasting model with news sentiments is shown in Fig. 2.
Fig. 2. Operation framework of CPO price prediction with LSTM.
In this study, CPO price movement behavior was predicted based on the sentiment of news headlines. In analyzing this behavior, three components were examined. The first component gathered news headlines from the internet using a web scraper. The second component prepared the news headlines through a number of document preprocessing steps and identified relevant features before passing them to a document representation process. The third component calculated the sentiment score of the news headlines [51]. Following the work of [5,52], this study focused on news headlines rather than the content of the news articles for prediction purposes, since headlines were found to produce better results. The impact of weather news and other news on CPO prices was analyzed using text analytics to determine whether there is a correlation between text sentiment and the CPO price. The process began by gathering news headlines from the Malaysian Palm Oil Board (MPOB) news portal and compiling them into a corpus for sentiment analysis of the news. Due to the high volume and diversity of data available on the internet, data collection and organization are difficult for a researcher to manage manually [53].
To achieve this objective and to compare the forecasting trend with and without sentiment analysis, the text sentiment analysis method was adopted, which was effective in finding the sentiment of news headlines. Here, a sentiment score was computed for each news headline, and the sentiment for each month was calculated by summing the total sentiment scores of its days. News headlines from the MPOB portal for the period between January 2001 and 2019 were collected using a web scraper. The scraper was developed in the R programming language to retrieve the news headlines from the portal for a specific period.
The news sentiment of the day was then calculated as the sum of the sentiments of separate news headlines belonging to the day, and the monthly sentiment was calculated by summing up the daily sentiment for a month. The formula for this calculation was adapted from [54] and is similar to the ones used in [55]. In this study, 20,050 news headline articles were used, representing the period between 2001 and 2019 as the test set for the network. Figure 3 displays the framework used for the forecasting model using LSTM through its three layers of processes. First, data preprocessing was done which includes data extraction and data cleansing for historical data of CPO prices and news headlines. This was followed by defining the model architecture and parameters and finally ending with the output layer for CPO price prediction.
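The two-level aggregation described above (headline scores summed per day, then daily totals summed per month) can be sketched as follows. This is a minimal illustration, not the study's R implementation; the function name `aggregate_sentiment` and the example headlines and scores are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_sentiment(scored_headlines):
    """Sum headline scores per day, then sum the daily totals per (year, month)."""
    daily = defaultdict(float)
    for day, score in scored_headlines:
        daily[day] += score

    monthly = defaultdict(float)
    for day, total in daily.items():
        monthly[(day.year, day.month)] += total

    return dict(daily), dict(monthly)


# Hypothetical scored headlines: (publication date, sentiment score).
headlines = [
    (date(2018, 1, 3), 1.0),
    (date(2018, 1, 3), -2.0),
    (date(2018, 1, 17), 1.0),
    (date(2018, 2, 5), -1.0),
]
daily, monthly = aggregate_sentiment(headlines)
print(monthly)
```

Keeping the daily totals as an intermediate step makes it straightforward to try other aggregation periods later without rescoring the headlines.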
Fig. 3. Detailed framework used for the CPO forecasting model using LSTM.
The monthly aggregated inputs retrieved from the sentiment mining and analysis stage were denoted as x, representing the negative, neutral, or positive sentiment score in normalized form. News does not affect the market instantly: its impact becomes evident only once it has been assimilated by market participants, so it takes time to influence the behavior of CPO price movement. Prior studies refer to this lag period as the window of influence [56,57] or the response time of news [58]. In this study, it is referred to as the news window, since this time lag helped to predict the CPO price trend with the probable impact of the news.
Ref. [59] found the optimal lag to be around 15 days for daily stock market movement. Indeed, the larger the window used to extract the news, the more confounding or fake news it is likely to contain.
Although a wider window of influence aids in encompassing a greater number of news records, thereby enriching the training dataset, this acquisition comes at the expense of introducing noise into the training data. Therefore, finding an optimal window of influence is a challenging task. This study intended to address the research objective, to test the effect of news headlines sentiment on CPO price movement. In addition, this study also sought to determine the optimal time lag in which the CPO price market reacts to the news.
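The search for a suitable news window can be framed as building supervised samples from the monthly series, where each sample pairs a fixed-length window of past values with the value a given number of months ahead. The sketch below illustrates that framing under stated assumptions; it is not the authors' implementation, and the function name `make_windows` and the example scores are hypothetical.

```python
def make_windows(series, window, horizon):
    """Pair each `window`-long slice with the value `horizon` steps after it."""
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical monthly values; each element could equally be a
# (price, sentiment) tuple in a multivariate setting.
monthly_scores = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1, 0.5, -0.2]
pairs = make_windows(monthly_scores, window=6, horizon=1)
print(len(pairs), pairs[0])
```

Repeating this construction for several `window` and `horizon` values and comparing the resulting forecast errors is one way to compare candidate news windows.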
“Web scraping” is a method that uses software tools to automatically extract and organize data from the web. It has been used in collecting news headlines [53]. Data cleansing, performed using R code, included mapping to lower case, removing punctuation, removing numbers, stripping white space, and stemming. Sentiment words were identified with an algorithm, proposed by [60], that counts the number of occurrences of “positive” and “negative” words in the news headlines. Based on this count, sentiment scores were assigned to the relevant sentences. News headlines were held in memory in the form of a document-term matrix, with the headlines in rows and the terms in columns, as in [45,61].
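The cleansing and word-counting steps above can be sketched in a few lines. This is an illustrative Python version rather than the study's R code: the word lists `POSITIVE` and `NEGATIVE` are tiny hypothetical stand-ins for a full sentiment lexicon, stemming is omitted for brevity, and the score is assumed to be the count of positive words minus the count of negative words.

```python
import re

# Tiny hypothetical lexicons; a real run would use a full sentiment lexicon.
POSITIVE = {"rise", "gain", "strong", "higher"}
NEGATIVE = {"fall", "drop", "weak", "lower"}


def clean(headline: str) -> list[str]:
    """Lower-case, drop punctuation and digits, strip white space, tokenize."""
    text = headline.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation and numbers become spaces
    return text.split()                    # split() also collapses extra white space


def sentiment_score(headline: str) -> int:
    """Positive-word count minus negative-word count (assumed scoring rule)."""
    tokens = clean(headline)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


# Hypothetical headlines used only for illustration.
print(sentiment_score("CPO prices rise 3% on strong exports!"))
print(sentiment_score("Palm oil futures fall, demand weak"))
```

Without stemming, inflected forms such as "rises" or "falling" would not match these lexicons, which is why the stemming step in the pipeline matters.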
The sentiment score for each word was calculated using SentiWordNet 3.0 as a reference, and the sum of the word scores in each news headline classified the headline as either positive or negative. The total sentiment score of the monthly news headlines was then compared with the CPO price movement and used to predict the CPO price using LSTM. The architecture of the LSTM is shown in Fig. 4. In a standard LSTM network, the basic entity is called an LSTM unit or a memory block. Each unit is composed of a cell, the memory part of the unit, and three gates: an input gate, an output gate, and a forget gate [62]. An LSTM unit can remember values over arbitrary time intervals, and the three gates control the flow of information through the cell.
An LSTM block comprising a single input layer and a single hidden layer is shown in Fig. 5. The LSTM took the time series sequence as input, one time step per LSTM cell, and created an encoding of the input sequence. This encoding is a vector consisting of the hidden and cell states of all the encoder LSTM cells. The encoding is then passed to the LSTM decoder as initial states along with other decoder inputs to produce our predictions. During model training, we set the target output sequence as the decoder outputs for the model to train against. In the first part of the method, regression forecasting, a linear activation function was used for the output layer. The proposed model retained the memory of precise input observations across thousands of time steps, enabling it to predict new sequences based on the patterns learned from previous time steps.
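For reference, the cell and the three gates are commonly written as follows. This is the widely used textbook formulation of an LSTM unit with a forget gate; the paper does not state which variant was used, so the notation is an assumption: $x_t$ is the input at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, $W$, $U$, and $b$ are learned weights and biases, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell state update is what lets information, and gradients, pass across many time steps when the forget gate stays close to one.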
Fig. 5. The architecture of the LSTM network with one hidden layer.
To enhance its robustness, the model incorporated the insertion of random time steps and strategically placed signal data within the input sequence. LSTM models are characterized by their configuration, which includes parameters such as the expected dimensionality of the input data, sequence length, number of features, and various other model parameters. In this setup, each single hidden layer consisted of 50 memory units, and the output layer was fully connected with one neuron representing the target CPO price. The activation function used for the output layer was rectified linear unit (ReLU). The networks were trained using the root mean square error (RMSE) loss function. An optimization algorithm was selected for efficiency, and the accuracy metric of RMSE was reported for each epoch to determine the model’s performance and convergence. The models were fitted on the sample sequences. The dataset was loaded into memory, and then the model was trained using the training data. After training, various parameters were tested and adjusted as outlined in the hyperparameter optimization section to optimize the model’s performance. The models were trained until a convergence over 100 epochs with a batch size of 20 samples. The models were then evaluated after being fitted, in which the accuracy was estimated on the new sequence data. Finally, predictions were made on the time intervals of interest.
The experiments were divided into two parts. The first part was designed to discover how the features would impact the performance of the predictions of CPO price movement. Therefore, we trained an LSTM network on the price and news features developed in the section above and generated prediction results from the learned features. The second part was designed to show how the correlation among the CPO prices could be used to further extend the results generated from the first part of the experiment. Here, we took the prediction outputs from the first part and propagated them through the correlation matrix to make predictions on the stocks that were unseen in the original samples. The data were divided into two separate sets, the training set and the test set. The training set was used to generate input samples for training the LSTM network, while the test set was a separate, unseen set used to test the final performance of the trained network. We divided both the historical price data and financial news data into two sets based on their associated timestamps. The data for the period between 2001-10-01 and 2010-12-31 were used as the training dataset, and the data for the period between 2011-01-01 and 2018-12-31 were used as the test dataset.
IV. RESULTS AND DISCUSSION
The forecasting outputs from the LSTM-based forecasting model were compared with the other methods. The comparison analysis was based on predictive performance and the complexity of the models. A common statistical error measure was applied to the output results to check the accuracy of prediction using the predicted values and the actual data. RMSE is a formal way to measure the error of a model in predicting quantitative data, and it is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2} \tag{1}$$
In Equation (1), $O_i$ is the observed (target) value and $P_i$ is the predicted value at time step $i$, and $n$ is the total number of observations. When testing the LSTM network with monthly sentiment scores and historical time series data of CPO pricing with various combinations of sliding window and forecast horizon, it was found that a six-month sliding window with a forecast horizon of one month outperformed the others with an RMSE value of 147.26, as shown in Fig. 6.
Fig. 6. Performance trends for CPO price forecasting using news headline sentiment.
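The RMSE used here, the square root of the mean squared difference between the observed values and the predicted values over all n observations, can be computed with a short helper. The sketch below is illustrative only; the function name `rmse` and the example prices are hypothetical and are not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical monthly prices in MYR, for illustration only.
observed = [2100.0, 2200.0, 2300.0]
predicted = [2110.0, 2190.0, 2320.0]
error = rmse(observed, predicted)
print(round(error, 2))
```

Because the errors are squared before averaging, RMSE is expressed in the same unit as the price and penalizes large individual misses more heavily than a mean absolute error would.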
Clearly, this showed a consistent increase in RMSE as the number of months to forecast increased. Analysis of the LSTM network with the news headline sentiment score under various combinations of sliding window and forecast horizon shows that the six-month sliding window with a forecast horizon of one month outperformed the other sliding window options, with an RMSE value of 147.26. This suggests that the market takes quite a long time, about six months, to react to news. Researchers can ensure the validity and reliability of their findings by selecting and applying appropriate statistical tests to confirm the results of their studies [63]. Thus, a t-test was used to measure the significance of the proposed LSTM model. We have two independent samples from two populations; thus, the two-sample t-test was employed to test the difference between the two means (predicted CPO price with news headline sentiment and actual CPO price). The t-test was performed under the hypothesis that the projected CPO price and the actual CPO price are equal, formulated as follows:
Ho4: There is no significant difference between the CPO price as forecasted by the LSTM model with news headline sentiment and the actual CPO price.
The decision is based on the p-value, which is compared against a significance level of 0.05 (a 95% confidence level). The t-test results shown in Table I indicate that the difference is not significant (p > 0.05). Thus, Ho4 is not rejected, and we conclude that there is no significant difference between the mean CPO price projected by the LSTM2 model and the mean actual CPO price. This means that the CPO price forecasted by the proposed model is not statistically distinguishable, on average, from the actual price.
Table I. The t-test results of the LSTM model
Statistic | Actual | Predicted |
---|---|---|
Mean | 2175.544 | 2173.874 |
Variance | 582925.6 | 557021.9 |
t Stat | –0.2189 | |
P | | |
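As an illustration of how a two-sample t statistic is obtained from two price series, the sketch below computes Student's t with pooled variance. The paper does not state whether a pooled or an unequal-variance (Welch) test was used, so the pooled form is an assumption, and the function name and sample values are hypothetical. The p-value would then be read from a t distribution with the returned degrees of freedom, which is not computed here.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # variance() returns the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, dof


# Hypothetical monthly prices in MYR, for illustration only.
actual = [2100.0, 2200.0, 2300.0, 2400.0]
predicted = [2110.0, 2190.0, 2320.0, 2380.0]
t_stat, dof = pooled_t_statistic(actual, predicted)
print(t_stat, dof)
```

A t statistic close to zero, as with these made-up samples whose means coincide, gives no evidence against the hypothesis of equal means; it does not by itself show that individual forecasts are close to the actual values, which is what RMSE measures.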
This experiment measured the sensitivity of CPO prices to news headline sentiment. Our results suggest that CPO price movement was relatively insensitive to news shocks in the form of macroeconomic news at monthly frequencies between 2001 and 2018. Overall, the findings show that the influence of news sentiment on CPO prices is insignificant and relatively weak in comparison with the results in the literature on other financial assets such as stocks and major exchange rates. The investigation demonstrated that the cumulative impact of news headline sentiment took six months to affect the CPO price. This finding on the relationship between crude oil prices and news sentiment is consistent with similar studies [57]. Furthermore, this experiment supports the hypothesis that the monthly news sentiment score contributes significantly to CPO price prediction. It also produced satisfactory prediction performance, demonstrating a well-interconnected model structure, when the LSTM network was tested with monthly sentiment scores and historical CPO price time series under various combinations of sliding window and forecast horizon.
V. CONCLUSION
The results of this study provide an important framework for future experiments. The primary finding was that LSTM models with a six-month sliding window for news headline sentiment (i.e., it took about six months for the news headline sentiment to have a significant or complete influence on the monthly price movement of CPO) achieved the highest accuracy in predicting the monthly price trend of Malaysia’s CPO. This finding is valuable because the model can become a reliable tool for stakeholders to anticipate future price fluctuations and optimize their strategies accordingly. To further enhance the LSTM forecasting model and maximize its benefits for the Malaysian palm oil industry, it is recommended that researchers explore integrating additional advanced techniques, such as transfer learning, which has demonstrated promising results in stock price forecasting. Including other factors, such as other commodity prices and weather elements, could potentially yield further improvements in the accuracy and reliability of palm oil price predictions.