Stock Market Price Trend Prediction Using Time Series Forecasting
This article was published as a part of the Data Science Blogathon.
Introduction

Time series forecasting is used to predict future values based on previously observed values, and it is one of the best tools for trend analysis and future prediction.
What is time-series data?

Time-series data is a sequence of observations recorded at regular time intervals, and the order of these data points is important. Therefore, any predictive model based on time series data will have time as an independent variable, and the output of the model is the predicted value or classification at a specific time.
Time series analysis vs time series forecasting

Let's clear up some possible confusion between time series analysis and time series forecasting. Time series forecasting is an example of predictive modeling, whereas time series analysis is a form of descriptive modeling.
For a new investor, general research about the stock or share market is not enough to make an investment decision. Simply following the common trend is highly risky, so most people cannot make sound decisions based on it. Understanding the seasonal variation and steady movement of an index helps both existing and new investors decide whether to invest in the share market.

Time series forecasting is well suited to this kind of problem.
Stock market

Stock markets are where individual and institutional investors come together to buy and sell shares in a public venue. Nowadays these exchanges exist as electronic marketplaces.
Supply and demand determine the price of each security, or the levels at which stock market participants — investors and traders — are willing to buy or sell.
The concept behind how the stock market works is pretty simple. Operating much like an auction house, the stock market enables buyers and sellers to negotiate prices and make trades.
Definition of 'Stock'

A stock or share (also known as a company's "equity") is a financial instrument that represents ownership in a company.
Machine learning in the stock market

The stock market is very unpredictable: any geopolitical change can impact the trend of stocks in the share market, and we have recently seen how COVID-19 affected stock prices. This is why reliable trend analysis on financial data is so difficult. Machine learning and deep learning are among the most effective ways to approach this kind of problem.
In this tutorial, we will solve this problem with the ARIMA model.
To learn about seasonality, please refer to my previous blog, and for a basic understanding of ARIMA I recommend going through this blog; it will help you better understand how time series analysis works.
Implementing stock price forecasting

I will be using the nsepy library to extract the historical data for SBIN.
Imports and Reading Data

Python Code:
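The code block itself was not preserved on this page, so here is a minimal sketch of what the imports and data pull could look like, assuming nsepy's get_history function; the date range below is only a placeholder and should be set to the period you want to study.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import date
from nsepy import get_history                        # NSE historical price data
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima_model import ARIMA         # older statsmodels API, matching the code used later in this article
from pmdarima.arima import auto_arima                 # auto_arima (the package was formerly named pyramid-arima)

# Pull daily price history for SBIN; the start and end dates are placeholders
sbin = get_history(symbol='SBIN', start=date(2000, 1, 1), end=date(2020, 11, 1))
sbin.index = pd.to_datetime(sbin.index)
print(sbin.head())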
The data shows the stock price of SBIN from 2023-1-1 to 2023-11-1. The goal is to create a model that will forecast the closing price of the stock. Let us create a visualization that shows the per-day closing price of the stock:
plt.figure(figsize=(10,6))
plt.grid(True)
plt.xlabel('Dates')
plt.ylabel('Close Prices')
plt.plot(sbin['Close'])
plt.title('SBIN closing price')
plt.show()

plt.figure(figsize=(10,6))
df_close = sbin['Close']
df_close.plot(style='k.')
plt.title('Scatter plot of closing price')
plt.show()

plt.figure(figsize=(10,6))
df_close = sbin['Close']
df_close.plot(style='k.', kind='hist')
plt.title('Histogram of closing price')
plt.show()

First, we need to check whether the series is stationary, because time series analysis only works with stationary data.
Testing For Stationarity:
To identify the nature of the data, we will be using the null hypothesis.
H0: The null hypothesis: It is a statement about the population that either is believed to be true or is used to put forth an argument unless it can be shown to be incorrect beyond a reasonable doubt.
H1: The alternative hypothesis: It is a claim about the population that is contradictory to H0 and what we conclude when we reject H0.
#Ho: It is non-stationary
#H1: It is stationary
If we fail to reject the null hypothesis, we conclude that the series is non-stationary; it may then need differencing or another transformation before modelling.
If both the rolling mean and the rolling standard deviation are flat lines (constant mean and constant variance), the series is stationary.
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    # Determining rolling statistics
    rolmean = timeseries.rolling(12).mean()
    rolstd = timeseries.rolling(12).std()
    # Plot rolling statistics
    plt.plot(timeseries, color='yellow', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean and Standard Deviation')
    plt.show(block=False)
    print("Results of Dickey-Fuller test")
    adft = adfuller(timeseries, autolag='AIC')
    # adfuller returns a plain tuple, so we label the values ourselves
    output = pd.Series(adft[0:4], index=['Test Statistics', 'p-value', 'No. of lags used', 'Number of observations used'])
    for key, values in adft[4].items():
        output['critical value (%s)' % key] = values
    print(output)

test_stationarity(sbin['Close'])

After analysing the above graph, we can see the increasing mean and standard deviation, so our series is not stationary.

Results of Dickey-Fuller test
Test Statistics                  -1.914523
p-value                           0.325260
No. of lags used                  3.000000
Number of observations used    5183.000000
critical value (1%)              -3.431612
critical value (5%)              -2.862098
critical value (10%)             -2.567067
dtype: float64

The p-value is greater than 0.05, so we cannot reject the null hypothesis. The test statistic is also greater than the critical values, so the data is non-stationary.
For time series analysis we separate Trend and Seasonality from the time series.
result = seasonal_decompose(df_close, model='multiplicative', freq=30)
fig = plt.figure()
fig = result.plot()
fig.set_size_inches(16, 9)

from pylab import rcParams
rcParams['figure.figsize'] = 10, 6

df_log = np.log(sbin['Close'])
moving_avg = df_log.rolling(12).mean()
std_dev = df_log.rolling(12).std()
plt.title('Moving Average')
plt.plot(std_dev, color='black', label='Standard Deviation')
plt.plot(moving_avg, color='red', label='Mean')
plt.legend(loc='best')
plt.show()

Now we are going to create an ARIMA model and train it with the closing price of the stock on the train data. So let us split the data into training and test sets and visualize it.
train_data, test_data = df_log[3:int(len(df_log)*0.9)], df_log[int(len(df_log)*0.9):]
plt.figure(figsize=(10,6))
plt.grid(True)
plt.xlabel('Dates')
plt.ylabel('Closing Prices')
plt.plot(df_log, 'green', label='Train data')
plt.plot(test_data, 'blue', label='Test data')
plt.legend()

model_autoARIMA = auto_arima(train_data, start_p=0, start_q=0,
                             test='adf',        # use the ADF test to find the optimal 'd'
                             max_p=3, max_q=3,  # maximum p and q
                             m=1,               # frequency of the series
                             d=None,            # let the model determine 'd'
                             seasonal=False,    # no seasonality
                             start_P=0, D=0,
                             trace=True,
                             error_action='ignore',
                             suppress_warnings=True,
                             stepwise=True)
print(model_autoARIMA.summary())

Performing stepwise search to minimize aic
ARIMA(0,1,0)(0,0,0)[0] intercept : AIC=-16607.561, Time=2.19 sec
ARIMA(1,1,0)(0,0,0)[0] intercept : AIC=-16607.961, Time=0.95 sec
ARIMA(0,1,1)(0,0,0)[0] intercept : AIC=-16608.035, Time=2.27 sec
ARIMA(0,1,0)(0,0,0)[0]           : AIC=-16609.560, Time=0.39 sec
ARIMA(1,1,1)(0,0,0)[0] intercept : AIC=-16606.477, Time=2.77 sec
Best model: ARIMA(0,1,0)(0,0,0)[0]
Total fit time: 9.079 seconds

SARIMAX Results
Dep. Variable: y                 No. Observations: 4665
Model: SARIMAX(0, 1, 0)          Log Likelihood: 8305.780
Date: Tue, 24 Nov 2023           AIC: -16609.560
Time: 20:08:50                   BIC: -16603.113
Sample: 0 - 4665                 HQIC: -16607.293
Covariance Type: opg

sigma2    0.0017    1.06e-06    1566.660    0.000    0.002    0.002

Ljung-Box (Q): 24.41             Jarque-Bera (JB): 859838819.58
Prob(Q): 0.98                    Prob(JB): 0.00
Heteroskedasticity (H): 7.16     Skew: -37.54
Prob(H) (two-sided): 0.00        Kurtosis: 2105.12

model_autoARIMA.plot_diagnostics(figsize=(15,8))
plt.show()

model = ARIMA(train_data, order=(3, 1, 2))
fitted = model.fit(disp=-1)
print(fitted.summary())

ARIMA Model Results
Dep. Variable: D.Close           No. Observations: 4664
Model: ARIMA(3, 1, 2)            Log Likelihood: 8309.178
Method: css-mle                  S.D. of innovations: 0.041
Date: Tue, 24 Nov 2023           AIC: -16604.355
Time: 20:09:37                   BIC: -16559.222
Sample: 1                        HQIC: -16588.481

const            8.761e-06   0.001    0.015   0.988   -0.001   0.001
ar.L1.D.Close    1.3689      0.251    5.460   0.000    0.877    1.860
ar.L2.D.Close   -0.7118      0.277   -2.567   0.010   -1.255  -0.168
ar.L3.D.Close    0.0094      0.021    0.445   0.657   -0.032   0.051
ma.L1.D.Close   -1.3468      0.250   -5.382   0.000   -1.837  -0.856
ma.L2.D.Close    0.6738      0.282    2.391   0.017    0.122   1.226

Roots
        Real      Imaginary    Modulus    Frequency
AR.1    0.9772    -0.6979j     1.2008     -0.0987
AR.2    0.9772    +0.6979j     1.2008      0.0987
AR.3   74.0622    -0.0000j    74.0622     -0.0000
MA.1    0.9994    -0.6966j     1.2183     -0.0969
MA.2    0.9994    +0.6966j     1.2183      0.0969

# Forecast
fc, se, conf = fitted.forecast(519, alpha=0.05)  # 95% confidence
fc_series = pd.Series(fc, index=test_data.index)
lower_series = pd.Series(conf[:, 0], index=test_data.index)
upper_series = pd.Series(conf[:, 1], index=test_data.index)

plt.figure(figsize=(12,5), dpi=100)
plt.plot(train_data, label='training')
plt.plot(test_data, color='blue', label='Actual Stock Price')
plt.plot(fc_series, color='orange', label='Predicted Stock Price')
plt.fill_between(lower_series.index, lower_series, upper_series, color='k', alpha=.10)
plt.title('SBIN Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Actual Stock Price')
plt.legend(loc='upper left', fontsize=8)
plt.show()
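The article stops at the plot; if you also want numeric accuracy figures for this forecast, a short sketch like the following could be used, assuming the test_data and fc_series objects defined above and scikit-learn's regression metrics.

from sklearn.metrics import mean_squared_error, mean_absolute_error

# Report the error of the forecast on the held-out (log-transformed) test set
mse = mean_squared_error(test_data, fc_series)
mae = mean_absolute_error(test_data, fc_series)
rmse = np.sqrt(mse)
mape = np.mean(np.abs(fc_series - test_data) / np.abs(test_data))  # mean absolute percentage error
print('MSE: ', mse)
print('MAE: ', mae)
print('RMSE:', rmse)
print('MAPE:', mape)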
Time series forecasting is really useful when we have to make future decisions or perform analysis, and ARIMA lets us do this quickly. There are many other models for time series forecasting, but ARIMA is one of the easiest to understand.
I hope this article will help you and save a good amount of time. Let me know if you have any suggestions.
HAPPY CODING.
Prabhat Pathak (LinkedIn profile) is a Senior Analyst and innovation enthusiast.
Stock Prices Prediction Using Machine Learning And Deep Learning
Introduction

Predicting how the stock market will perform is one of the most difficult things to do. There are so many factors involved in the prediction – physical factors vs. psychological, rational and irrational behavior, etc. All these aspects combine to make share prices volatile and very difficult to predict with a high degree of accuracy.
Can we use machine learning as a game-changer in this domain? Using features like the latest announcements about an organization, their quarterly revenue results, etc., machine learning techniques have the potential to unearth patterns and insights we didn’t see before, and these can be used to make unerringly accurate predictions.
The core idea behind this article is to showcase how these algorithms are implemented. I will briefly describe the technique and provide relevant links to brush up on the concepts as and when necessary. In case you’re a newcomer to the world of time series, I suggest going through the following articles first:
Understanding the Problem Statement

We'll dive into the implementation part of this article soon, but first it's important to establish what we're aiming to solve. Broadly, stock market analysis is divided into two parts – Fundamental Analysis and Technical Analysis.
Fundamental Analysis involves analyzing the company’s future profitability on the basis of its current business environment and financial performance.
Technical Analysis, on the other hand, includes reading the charts and using statistical figures to identify the trends in the stock market.
As you might have guessed, our focus will be on the technical analysis part. We’ll be using a dataset from Quandl (you can find historical data for various stocks here) and for this particular project, I have used the data for ‘Tata Global Beverages’. Time to dive in!
Note: Here is the dataset I used for the code: Download
We will first load the dataset and define the target variable for the problem:
Python Code:
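The code block itself was lost from this page; below is a minimal sketch of what it likely contained, assuming the Quandl export for Tata Global Beverages mentioned above (the file name is a placeholder).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the Quandl export for Tata Global Beverages
# (the file name below is a placeholder for the dataset linked above)
df = pd.read_csv('NSE-TATAGLOBAL.csv')

# Inspect the first few rows; the closing price will be our target variable
print(df.head())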
There are multiple variables in the dataset – date, open, high, low, last, close, total_trade_quantity, and turnover.
The columns Open and Close represent the starting and final price at which the stock is traded on a particular day.
High, Low and Last represent the maximum, minimum, and last price of the share for the day.
Total Trade Quantity is the number of shares bought or sold in the day and Turnover (Lacs) is the turnover of the particular company on a given date.
Another important thing to note is that the market is closed on weekends and public holidays. Notice the above table again, some date values are missing – 2/10/2023, 6/10/2023, 7/10/2023. Of these dates, 2nd is a national holiday while 6th and 7th fall on a weekend.
The profit or loss calculation is usually determined by the closing price of a stock for the day, hence we will consider the closing price as the target variable. Let’s plot the target variable to understand how it’s shaping up in our data:
#setting index as date
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']

#plot
plt.figure(figsize=(16,8))
plt.plot(df['Close'], label='Close Price history')

In the upcoming sections, we will explore these variables and use different techniques to predict the daily closing price of the stock.
Moving Average

Introduction

'Average' is easily one of the most common things we use in our day-to-day lives. For instance, calculating the average marks to determine overall performance, or finding the average temperature of the past few days to get an idea about today's temperature – these all are routine tasks we do on a regular basis. So this is a good starting point to use on our dataset for making predictions.
The predicted closing price for each day will be the average of a set of previously observed values. Instead of using the simple average, we will be using the moving average technique which uses the latest set of values for each prediction. In other words, for each subsequent step, the predicted values are taken into consideration while removing the oldest observed value from the set. Here is a simple figure that will help you understand this with more clarity.
We will implement this technique on our dataset. The first step is to create a dataframe that contains only the Date and Close price columns, then split it into train and validation sets to verify our predictions.
Implementation
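The implementation code did not survive on this page; the following is a minimal sketch of the approach just described, predicting each validation-day close as the mean of the latest 248 observed and predicted values. The 987-row training split and the 248-observation window are assumptions consistent with the splits and RMSE reported later in the article.

#creating a dataframe with only the Date and Close columns
data = df.sort_index(ascending=True, axis=0)
new_data = pd.DataFrame(index=range(0, len(df)), columns=['Date', 'Close'])
for i in range(0, len(data)):
    new_data['Date'][i] = data['Date'][i]
    new_data['Close'][i] = data['Close'][i]

#splitting into train and validation
train = new_data[:987]
valid = new_data[987:]

#making predictions: each prediction is the mean of the latest 248 values,
#where earlier predictions replace the oldest observations as we move forward
preds = []
for i in range(0, valid.shape[0]):
    a = train['Close'][len(train) - 248 + i:].sum() + sum(preds)
    b = a / 248
    preds.append(b)

#checking the RMSE on the validation set
rms = np.sqrt(np.mean(np.power((np.array(valid['Close']) - preds), 2)))
print(rms)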
Just checking the RMSE does not help us in understanding how the model performed. Let’s visualize this to get a more intuitive understanding. So here is a plot of the predicted values along with the actual values.
#plot
valid['Predictions'] = 0
valid['Predictions'] = preds
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

Inference

The RMSE value is close to 105 but the results are not very promising (as you can gather from the plot). The predicted values are in the same range as the observed values in the train set (there is an increasing trend initially and then a slow decrease).
In the next section, we will look at two commonly used machine learning techniques – Linear Regression and kNN, and see how they perform on our stock market data.
Linear Regression

Introduction

The most basic machine learning algorithm that can be implemented on this data is linear regression. The linear regression model returns an equation that determines the relationship between the independent variables and the dependent variable.
The equation for linear regression can be written as:

Y = θ1·x1 + θ2·x2 + … + θn·xn

Here, x1, x2, …, xn represent the independent variables while the coefficients θ1, θ2, …, θn represent the weights. You can refer to the following article to study linear regression in more detail:
For our problem statement, we do not have a set of independent variables. We have only the dates instead. Let us use the date column to extract features like – day, month, year, mon/fri etc. and then fit a linear regression model.
Implementation

We will first sort the dataset in ascending order and then create a separate dataset so that any new feature created does not affect the original data.
#setting index as date values
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']

#sorting
data = df.sort_index(ascending=True, axis=0)

#creating a separate dataset
new_data = pd.DataFrame(index=range(0, len(df)), columns=['Date', 'Close'])
for i in range(0, len(data)):
    new_data['Date'][i] = data['Date'][i]
    new_data['Close'][i] = data['Close'][i]

#create features
from fastai.structured import add_datepart
add_datepart(new_data, 'Date')
new_data.drop('Elapsed', axis=1, inplace=True)  #elapsed will be the time stamp

This creates features such as:
‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, ‘Is_month_end’, ‘Is_month_start’, ‘Is_quarter_end’, ‘Is_quarter_start’, ‘Is_year_end’, and ‘Is_year_start’.
Note: I have used add_datepart from the fastai library. If you do not have it installed, you can simply use the command pip install fastai. Otherwise, you can create these features with plain pandas in Python; a sketch is shown below.
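For readers without fastai, here is a small sketch of how the same date features could be built by hand with pandas, instead of the add_datepart call above; the exact feature set mirrors the list in the previous paragraph and is otherwise an assumption.

# Manual alternative to add_datepart, assuming new_data still has its 'Date' column
new_data['Date'] = pd.to_datetime(new_data['Date'])
new_data['Year'] = new_data['Date'].dt.year
new_data['Month'] = new_data['Date'].dt.month
new_data['Week'] = new_data['Date'].dt.isocalendar().week
new_data['Day'] = new_data['Date'].dt.day
new_data['Dayofweek'] = new_data['Date'].dt.dayofweek
new_data['Dayofyear'] = new_data['Date'].dt.dayofyear
new_data['Is_month_end'] = new_data['Date'].dt.is_month_end
new_data['Is_month_start'] = new_data['Date'].dt.is_month_start
new_data['Is_quarter_end'] = new_data['Date'].dt.is_quarter_end
new_data['Is_quarter_start'] = new_data['Date'].dt.is_quarter_start
new_data['Is_year_end'] = new_data['Date'].dt.is_year_end
new_data['Is_year_start'] = new_data['Date'].dt.is_year_start
new_data = new_data.drop('Date', axis=1)  # drop the raw date, as add_datepart would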
Apart from this, we can add our own set of features that we believe would be relevant for the predictions. For instance, my hypothesis is that the first and last days of the week could potentially affect the closing price of the stock far more than the other days. So I have created a feature that identifies whether a given day is Monday/Friday or Tuesday/Wednesday/Thursday. This can be done using the following lines of code:
new_data['mon_fri'] = 0
for i in range(0, len(new_data)):
    if (new_data['Dayofweek'][i] == 0 or new_data['Dayofweek'][i] == 4):
        new_data['mon_fri'][i] = 1
    else:
        new_data['mon_fri'][i] = 0

We will now split the data into train and validation sets to check the performance of the model.
#split into train and validation
train = new_data[:987]
valid = new_data[987:]

x_train = train.drop('Close', axis=1)
y_train = train['Close']
x_valid = valid.drop('Close', axis=1)
y_valid = valid['Close']

#implement linear regression
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train, y_train)

Results

#make predictions and find the rmse
preds = model.predict(x_valid)
rms = np.sqrt(np.mean(np.power((np.array(y_valid) - np.array(preds)), 2)))
rms

121.16291596523156

The RMSE value is higher than the previous technique, which clearly shows that linear regression has performed poorly. Let's look at the plot and understand why linear regression has not done well:
#plot
valid['Predictions'] = 0
valid['Predictions'] = preds
valid.index = new_data[987:].index
train.index = new_data[:987].index
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

Inference

As seen from the plot above, the stock price dropped in January in the earlier years, and the model has predicted a similar drop for January in the validation period. A linear regression technique can perform well for problems such as Big Mart sales where the independent features are useful for determining the target value.
k-Nearest Neighbours

Introduction

Another interesting ML algorithm that one can use here is kNN (k nearest neighbours). Based on the independent variables, kNN finds the similarity between new data points and old data points. Let me explain this with a simple example.
Consider the height and age for 11 people. On the basis of given features (‘Age’ and ‘Height’), the table can be represented in a graphical format as shown below:
To determine the weight for ID #11, kNN considers the weight of the nearest neighbours of this ID. The weight of ID #11 is predicted to be the average of its neighbours. If we consider three neighbours (k=3) for now, the weight for ID #11 would be (77 + 72 + 60)/3 = 69.66 kg.
For a detailed understanding of kNN, you can refer to the following articles:
Implementation

#importing libraries
from sklearn import neighbors
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

Using the same train and validation set from the last section:
#scaling data
x_train_scaled = scaler.fit_transform(x_train)
x_train = pd.DataFrame(x_train_scaled)
x_valid_scaled = scaler.fit_transform(x_valid)
x_valid = pd.DataFrame(x_valid_scaled)

#using gridsearch to find the best parameter
params = {'n_neighbors': [2,3,4,5,6,7,8,9]}
knn = neighbors.KNeighborsRegressor()
model = GridSearchCV(knn, params, cv=5)

#fit the model and make predictions
model.fit(x_train, y_train)
preds = model.predict(x_valid)

Results

#rmse
rms = np.sqrt(np.mean(np.power((np.array(y_valid) - np.array(preds)), 2)))
rms

115.17086550026721

There is not a huge difference in the RMSE value, but a plot of the predicted and actual values should provide a clearer understanding.
#plot
valid['Predictions'] = 0
valid['Predictions'] = preds
plt.plot(valid[['Close', 'Predictions']])
plt.plot(train['Close'])

Inference

The RMSE value is almost the same as for the linear regression model and the plot shows the same pattern. Like linear regression, kNN also predicted a drop in January, since that has been the pattern in past years. We can safely say that regression algorithms have not performed well on this dataset.
Let’s go ahead and look at some time series forecasting techniques to find out how they perform when faced with this stock prices prediction challenge.
Auto ARIMA

Introduction

ARIMA is a very popular statistical method for time series forecasting. ARIMA models take into account the past values to predict the future values. There are three important parameters in ARIMA:
p (past values used for forecasting the next value)
q (past forecast errors used to predict the future values)
d (order of differencing)
Parameter tuning for ARIMA consumes a lot of time. So we will use auto ARIMA which automatically selects the best combination of (p,q,d) that provides the least error. To read more about how auto ARIMA works, refer to this article:
Implementation

from pyramid.arima import auto_arima
data = df.sort_index(ascending=True, axis=0)

train = data[:987]
valid = data[987:]

training = train['Close']
validation = valid['Close']

model = auto_arima(training, start_p=1, start_q=1, max_p=3, max_q=3, m=12,
                   start_P=0, seasonal=True, d=1, D=1, trace=True,
                   error_action='ignore', suppress_warnings=True)
model.fit(training)

forecast = model.predict(n_periods=248)
forecast = pd.DataFrame(forecast, index=valid.index, columns=['Prediction'])

Results

rms = np.sqrt(np.mean(np.power((np.array(valid['Close']) - np.array(forecast['Prediction'])), 2)))
rms

44.954584993246954

#plot
plt.plot(train['Close'])
plt.plot(valid['Close'])
plt.plot(forecast['Prediction'])

Inference

As we saw earlier, an auto ARIMA model uses past data to understand the pattern in the time series. Using these values, the model captured an increasing trend in the series. Although the predictions using this technique are far better than that of the previously implemented machine learning models, these predictions are still not close to the real values.
As is evident from the plot, the model has captured a trend in the series but does not focus on the seasonal part. In the next section, we will implement a time series model that takes both trend and seasonality of a series into account.
Prophet

Introduction

There are a number of time series techniques that can be implemented on the stock prediction dataset, but most of these techniques require a lot of data preprocessing before fitting the model. Prophet, designed and pioneered by Facebook, is a time series forecasting library that requires no data preprocessing and is extremely simple to implement. The input for Prophet is a dataframe with two columns: date and target (ds and y).
Prophet tries to capture the seasonality in the past data and works well when the dataset is large. Here is an interesting article that explains Prophet in a simple and intuitive manner:
Implementation

#importing prophet
from fbprophet import Prophet

#creating dataframe
new_data = pd.DataFrame(index=range(0, len(df)), columns=['Date', 'Close'])
for i in range(0, len(data)):
    new_data['Date'][i] = data['Date'][i]
    new_data['Close'][i] = data['Close'][i]

new_data['Date'] = pd.to_datetime(new_data.Date, format='%Y-%m-%d')
new_data.index = new_data['Date']

#preparing data
new_data.rename(columns={'Close': 'y', 'Date': 'ds'}, inplace=True)

#train and validation
train = new_data[:987]
valid = new_data[987:]

#fit the model
model = Prophet()
model.fit(train)

#predictions
close_prices = model.make_future_dataframe(periods=len(valid))
forecast = model.predict(close_prices)

Results

#rmse
forecast_valid = forecast['yhat'][987:]
rms = np.sqrt(np.mean(np.power((np.array(valid['y']) - np.array(forecast_valid)), 2)))
rms

57.494461930575149

#plot
valid['Predictions'] = 0
valid['Predictions'] = forecast_valid.values
plt.plot(train['y'])
plt.plot(valid[['y', 'Predictions']])

Inference

Prophet (like most time series forecasting techniques) tries to capture the trend and seasonality from past data. This model usually performs well on time series datasets, but fails to live up to its reputation in this case.
As it turns out, stock prices do not have a particular trend or seasonality. It highly depends on what is currently going on in the market and thus the prices rise and fall. Hence forecasting techniques like ARIMA, SARIMA and Prophet would not show good results for this particular problem.
Long Short Term Memory (LSTM)

Introduction

LSTMs are widely used for sequence prediction problems and have proven to be extremely effective. The reason they work so well is that an LSTM is able to store past information that is important and forget the information that is not. An LSTM has three gates:
The input gate: The input gate adds information to the cell state
The forget gate: It removes the information that is no longer required by the model
The output gate: Output Gate at LSTM selects the information to be shown as output
For a more detailed understanding of LSTM and its architecture, you can go through the below article:
For now, let us implement LSTM as a black box and check its performance on our particular data.
Implementation

#importing required libraries
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM

#creating dataframe
data = df.sort_index(ascending=True, axis=0)
new_data = pd.DataFrame(index=range(0, len(df)), columns=['Date', 'Close'])
for i in range(0, len(data)):
    new_data['Date'][i] = data['Date'][i]
    new_data['Close'][i] = data['Close'][i]

#setting index
new_data.index = new_data.Date
new_data.drop('Date', axis=1, inplace=True)

#creating train and test sets
dataset = new_data.values
train = dataset[0:987, :]
valid = dataset[987:, :]

#converting dataset into x_train and y_train
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

x_train, y_train = [], []
for i in range(60, len(train)):
    x_train.append(scaled_data[i-60:i, 0])
    y_train.append(scaled_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(units=50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')  # compile step needed before fitting
model.fit(x_train, y_train, epochs=1, batch_size=1, verbose=2)

#predicting 246 values, using past 60 from the train data
inputs = new_data[len(new_data) - len(valid) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = scaler.transform(inputs)

X_test = []
for i in range(60, inputs.shape[0]):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
closing_price = model.predict(X_test)
closing_price = scaler.inverse_transform(closing_price)

Results

rms = np.sqrt(np.mean(np.power((valid - closing_price), 2)))
rms

11.772259608962642

#for plotting
train = new_data[:987]
valid = new_data[987:]
valid['Predictions'] = closing_price
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

Inference

Wow! The LSTM model can be tuned for various parameters such as changing the number of LSTM layers, adding dropout or increasing the number of epochs. But are the predictions from LSTM enough to identify whether the stock price will increase or decrease? Certainly not!
As I mentioned at the start of the article, stock price is affected by the news about the company and other factors like demonetization or merger/demerger of the companies. There are certain intangible factors as well which can often be impossible to predict beforehand.
Conclusion

Time series forecasting is a very intriguing field to work with, as I have realized during my time writing these articles. There is a perception in the community that it's a complex field, and while there is a grain of truth in there, it's not so difficult once you get the hang of the basic techniques.
Frequently Asked Questions

Q1. Is it possible to predict the stock market with Deep Learning?
A. Yes, it is possible to model the stock market with machine learning and deep learning techniques such as moving average, linear regression, Auto ARIMA, LSTM, and more.
Q2. What can you use to predict stock prices in Deep Learning?
A. Moving average, linear regression, kNN (k-nearest neighbours), Auto ARIMA, and LSTM (Long Short Term Memory) are some of the most common techniques used to predict stock prices; of these, LSTM is the deep learning approach.
Q3. What are the two methods to predict stock price?
A. Fundamental Analysis and Technical Analysis are the two ways of analyzing and predicting stock prices.
Granger Causality In Time Series – Explained Using Chicken And Egg Problem
This article was published as a part of the Data Science Blogathon
Introduction
Time series analysis provides insight into the pattern or characteristics of time series data. Time series data can be decomposed into three components:
Trend – This shows the tendency of the data over a long period of time, it may be upward, downward, or stable.
Seasonality – It is the variation that occurs in a periodic manner and repeats each year.
Noise or Random – Fluctuations in the data that are erratic.
Forecasting the data for a particular time in the future is known as time series forecasting. It is a powerful modelling technique widely used in fields like finance, weather forecasting, the health sector, environmental studies, business, and retail to arrive at strategic decisions.
In real life, most data consist of multiple variables, and independent variables may depend on other independent variables; these relationships affect predictions and forecasts. People are often misled in such cases and build multilinear regression models, where a high R-squared value further misleads them into poor predictions.
Spurious Regression

Linear regression might indicate a strong relationship between two or more variables, but these variables may be totally unrelated in reality. Predictions then fail when confronted with domain knowledge; this scenario is known as spurious regression.
There is a strong relationship between chicken consumption and crude oil exports in the below graph even though they are unrelated.
A strong trend (non-stationarity) and a high R-squared are typical signs of spurious regression. Spurious regressions have to be eliminated while building the model, since the variables are unrelated and have no causal relationship.
Multilinear regression uses a correlation matrix to check the dependency between the independent variables: if the correlation coefficient between two variables is high, one variable is retained and the other is discarded. However, when time is a confounding factor in the dataset, multilinear regression fails, because the correlation coefficient used to eliminate variables is not time-bound; it only measures the contemporaneous correlation between the two variables.
Consider the above time-series graph: variable X has a direct influence on variable Y, but with a lag of 5 between X and Y, in which case we cannot rely on the correlation matrix. For example, think of an increase in coronavirus-positive cases in a city and the increase in hospitalizations that follows. For better forecasting, we would like to know whether there is a causal relationship; a small sketch below illustrates why a plain correlation matrix misses this kind of lagged dependency.
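The data, column names ('X', 'Y') and the lag of 5 in this sketch are synthetic, chosen only to mirror the situation described above.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200), name='X')          # driver series (white noise, low autocorrelation)
y = x.shift(5) + rng.normal(scale=0.3, size=200)       # Y reacts to X with a lag of 5

df_example = pd.DataFrame({'X': x, 'Y': y}).dropna()
print(df_example.corr())                               # contemporaneous correlation is near zero
print(df_example['X'].shift(5).corr(df_example['Y']))  # strong correlation once the true lag is applied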
Granger Causality comes to the Rescue

Granger causality is essentially an econometric hypothesis test for verifying whether one variable is useful in forecasting another in multivariate time series data at a particular lag.
A prerequisite for performing the Granger Causality test is that the data need to be stationary i.e it should have a constant mean, constant variance, and no seasonal component. Transform the non-stationary data to stationary data by differencing it, either first-order or second-order differencing. Do not proceed with the Granger causality test if the data is not stationary after second-order differencing.
Let us consider three variables Xt, Yt, and Wt present in time series data.
Case 1: Forecast Xt+1 based on past values Xt .
Case 2: Forecast Xt+1 based on past values Xt and Yt.
Case 3: Forecast Xt+1 based on past values Xt, Yt, and Wt, where variable Yt has a direct dependency on variable Wt.
Here Case 1 is a univariate time series, also known as an autoregressive model, in which there is a single variable and forecasting is done from the same variable lagged by, say, order p.
Xt = α + γ1·Xt−1 + γ2·Xt−2 + ⋯ + γp·Xt−p   (RESTRICTED MODEL, RM)

where p parameters (degrees of freedom) are to be estimated.
In Case 2 the past values of Y contain information for forecasting Xt+1. Yt is said to “Granger cause” Xt+1 provided Yt occurs before Xt+1 and it contains data for forecasting Xt+1.
Equation using a predictor Yt (UNRESTRICTED MODEL, UM)
Xt = α + γ1·Xt−1 + γ2·Xt−2 + ⋯ + γp·Xt−p + α1·Yt−1 + ⋯ + αp·Yt−p

where 2p parameters (degrees of freedom) are to be estimated.
If Yt causes Xt, then Y must precede X which implies:
Lagged values of Y should be significantly related to X.
Lagged values of X should not be significantly related to Y.
Case 3 can not be used to find Granger causality since variable Yt is influenced by variable Wt.
Hypothesis test

Null Hypothesis (H0): Yt does not "Granger cause" Xt+1, i.e., α1 = α2 = ⋯ = αp = 0
Alternate Hypothesis(HA): Yt does “Granger cause” Xt+1, i.e., at least one of the lags of Y is significant.
Calculate the F-statistic

F(p, n−2p−1) = (Estimate of Explained Variance) / (Estimate of Unexplained Variance)
             = ((SSE_RM − SSE_UM) / p) / (SSE_UM / (n − 2p − 1))
where SSE is the sum of squared errors of the restricted (RM) and unrestricted (UM) models.
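To make the formula concrete, here is a small sketch of how the restricted and unrestricted models could be fitted and compared by hand with statsmodels OLS. The helper function, column names, and default lag are illustrative assumptions, not part of the original article.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def granger_f_test(df, x_col='x', y_col='y', p=2):
    d = df.copy()
    # Build lagged regressors for both series
    for lag in range(1, p + 1):
        d[f'{x_col}_lag{lag}'] = d[x_col].shift(lag)
        d[f'{y_col}_lag{lag}'] = d[y_col].shift(lag)
    d = d.dropna()
    x_lags = [f'{x_col}_lag{l}' for l in range(1, p + 1)]
    y_lags = [f'{y_col}_lag{l}' for l in range(1, p + 1)]

    # Restricted model: X explained only by its own past values
    rm = sm.OLS(d[x_col], sm.add_constant(d[x_lags])).fit()
    # Unrestricted model: X explained by its own past values plus past values of Y
    um = sm.OLS(d[x_col], sm.add_constant(d[x_lags + y_lags])).fit()

    n = len(d)
    f_stat = ((rm.ssr - um.ssr) / p) / (um.ssr / (n - 2 * p - 1))
    p_value = 1 - stats.f.cdf(f_stat, p, n - 2 * p - 1)
    return f_stat, p_value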
Try different lags (p). The optimal lag can be determined using AIC.
Limitation
Granger causality does not provide any insight into the relationship between the variables; hence it is not true causality in the 'cause and effect' sense.
Granger causality fails to forecast when there is an interdependency between two or more variables (as stated in Case 3).
Granger causality test can’t be performed on non-stationary data.
Resolving the Chicken and Egg problem

Let us apply Granger causality to check whether the egg or the chicken came first.
Importing libraries

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

Loading the Dataset

The data is from the U.S. Department of Agriculture. It consists of two time series from 1930 to 1983: U.S. egg production and the estimated U.S. chicken population.
df = pd.read_csv('chickegg.csv')

Exploring the Dataset

df.head()
df.dtypes
df.shape

(53, 3)
df.describe()

Check if the data is stationary; if not, make it stationary before proceeding.
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.Year, y=df.chicken, title='Population of chickens across the US')
plot_df(df, x=df.Year, y=df.egg, title='Egg Production')

By visual inspection, both the chicken and egg series appear non-stationary. Let us confirm this by running the Augmented Dickey-Fuller (ADF) test.
ADF test

The ADF test is a popular statistical test for checking whether a time series is stationary, based on the unit root test. The number of unit roots present in the series indicates the number of differencing operations required to make it stationary.
Consider the hypothesis test where:
Null Hypothesis (H0): Series has a unit root and is non-stationary.
Alternative Hypothesis (HA): Series has no unit root and is stationary.
from statsmodels.tsa.stattools import adfuller

result = adfuller(df['chicken'])
print(f'Test Statistics: {result[0]}')
print(f'p-value: {result[1]}')
print(f'critical_values: {result[4]}')
if result[1] > 0.05:
    print("Series is not stationary")
else:
    print("Series is stationary")
result = adfuller(df['egg'])
print(f'Test Statistics: {result[0]}')
print(f'p-value: {result[1]}')
print(f'critical_values: {result[4]}')
if result[1] > 0.05:
    print("Series is not stationary")
else:
    print("Series is stationary")

The p-values of both the egg and chicken variables are greater than the significance level (0.05), so the null hypothesis holds and both series are non-stationary.
Data Transformation

The Granger causality test is carried out only on stationary data, hence we need to transform the data by differencing it. Let us perform first-order differencing on the chicken and egg data.
df_transformed = df.diff().dropna()
df = df.iloc[1:]
print(df.shape)
df_transformed.shape
df_transformed.head()

plot_df(df_transformed, x=df.Year, y=df_transformed.chicken, title='Population of chickens across the US')
plot_df(df_transformed, x=df.Year, y=df_transformed.egg, title='Egg Production')

Repeat the ADF test on the differenced data to check for stationarity.
result = adfuller(df_transformed['chicken'])
print(f'Test Statistics: {result[0]}')
print(f'p-value: {result[1]}')
print(f'critical_values: {result[4]}')
if result[1] > 0.05:
    print("Series is not stationary")
else:
    print("Series is stationary")

result = adfuller(df_transformed['egg'])
print(f'Test Statistics: {result[0]}')
print(f'p-value: {result[1]}')
print(f'critical_values: {result[4]}')
if result[1] > 0.05:
    print("Series is not stationary")
else:
    print("Series is stationary")

The transformed chicken and egg data are stationary, hence there is no need to go for second-order differencing.
Test the Granger Causality

There are several ways to find the optimal lag, but for simplicity let us use the 4th lag for now.
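As an aside, one way the lag could be chosen instead of fixing it at 4 is to fit a VAR on the two stationary series and let information criteria pick the order; the maxlags value below is an arbitrary upper bound, not something prescribed by the original article.

from statsmodels.tsa.api import VAR

var_model = VAR(df_transformed[['chicken', 'egg']])
lag_selection = var_model.select_order(maxlags=8)  # try orders 0..8
print(lag_selection.summary())                     # AIC, BIC, FPE, HQIC for each lag
optimal_lag = lag_selection.aic                    # lag order chosen by AIC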
Do eggs granger cause chickens?
Null Hypothesis (H0) : eggs do not granger cause chicken.
Alternative Hypothesis (HA) : eggs granger cause chicken.
from statsmodels.tsa.stattools import grangercausalitytests
grangercausalitytests(df_transformed[['chicken', 'egg']], maxlag=4)

The p-value is very low, so the null hypothesis is rejected; hence eggs Granger-cause chickens.
That implies eggs came first.
Now repeat the Granger causality test in the opposite direction.
Do chickens granger cause eggs at lag 4?
Null Hypothesis (H0) : chicken does not granger cause eggs.
Alternative Hypothesis (HA) : chicken granger causes eggs.
grangercausalitytests(df_transformed[['egg', 'chicken']], maxlag=4)

The p-value is considerably high, thus chickens do not Granger-cause eggs.
The above analysis concludes that the egg came first and not the chicken.
Once the analysis is done the next step is to begin forecasting using time series forecasting models.
EndNote

Thank you for reading!
About the Author

Hello, I am Pallavi Padav from Mangalore, holding a PG degree in Data Science from INSOFE. I am passionate about participating in Data Science hackathons, blogathons and workshops.
Would love to catch you on Linkedin. Mail me here for any queries.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.