How LSTM Networks Can Predict the Future

Subtitle: Part II - Learn how to use long short-term memory networks to forecast stock prices with Python and OpenBB


Note: This post is an experiment in AI large language models (LLMs). The entire post was written using the Bing AI (GPT-4) from a prompt asking the AI to generate a blog post similar to Part 1 in length and style, but introducing the concept of LSTM. The instruction said to assume the code would be written in Python in a JupyterLab notebook, and the data would be the same closing prices for AMC used in Part 1.

I expected that I would have to write the code, but GPT-4 not only wrote the necessary code, but gave the appearance of running it. It did not, in fact, load and process the AMC data, but generated results that closely mimicked the output from the code. Minor corrections were made with help from CodePal.

Some minor edits were made to the wording, and a few corrections were needed in the code. Warning messages caused by incompatibilities between libraries were turned off.

-JP 08 Apr 2023

Have you ever wondered if you could predict the future? Well, maybe not the whole future, but at least some aspects of it, like the price of a stock. Wouldn’t it be nice if you could use your data skills to make some money on the stock market?

In this blog post, I will show you how to use a powerful machine learning technique called long short-term memory (LSTM) networks to forecast stock prices with Python and OpenBB. LSTM networks are a type of recurrent neural network (RNN) that can process sequential data, such as time series or text. They have an internal state that can store information over long periods of time, making them ideal for learning from temporal patterns.

We will use the same AMC stock data as in Part 1 of this series, The Prediction Machine - Protocols and OpenBB, where we explored some basic techniques for analyzing and visualizing stock data.

In this post, we will go one step further and try to predict the future price of AMC based on its past behavior. We will use OpenBB, a free and open-source platform for financial modeling, and build an interactive notebook with Python and JupyterLab. The easiest way to set up your JupyterLab environment is through Anaconda, which installs scikit-learn but not Keras. Keras ships as part of TensorFlow, so installing TensorFlow gives you Keras as well. Open the command prompt from the Anaconda Navigator and type

conda create -n tf tensorflow
conda activate tf

This creates a new environment that won’t contain all of the standard libraries, so you’ll need to install a few more:

(tf) conda install -c anaconda pandas
(tf) conda install -c conda-forge matplotlib
(tf) conda install -c anaconda scikit-learn
(tf) conda install -c conda-forge jupyterlab

Activating the tensorflow environment puts (tf) at the beginning of every line, but the prompt will also include the path to your local directory.

If you ever need any packages that haven’t been installed previously, a web search for “Anaconda <package name>” often provides the right command to complete the installation. For the Keras package this leads to conda-forge / packages / keras 2.11.0.
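
Following that link gives an install command along these lines (the exact channel and version pinned may differ from what you see):

(tf) conda install -c conda-forge keras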

Let’s get started!

Importing Libraries and Data

First, we need to import some libraries that we will use throughout this tutorial. We will use pandas for data manipulation, numpy for numerical computations, matplotlib for plotting, sklearn for machine learning tools, keras for building our LSTM network, and tensorflow for the custom metric we will define later.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

Next, we need to load our AMC stock data into a pandas dataframe. We will use the same CSV file that we downloaded from Yahoo Finance in Part 1.

df = pd.read_csv('AMC.csv')
df.head()

The output should look something like this:

Date Open High Low Close Adj Close Volume
0 2020-02-11 00:00:00 6.80 6.98 6.65 6.94 6.897683 1568500
1 2020-02-12 00:00:00 6.93 7.08 6.82 6.86 6.818171 1922100
2 2020-02-13 00:00:00 6.75 7.08 6.72 7.00 6.957317 1553400
3 2020-02-14 00:00:00 7.01 7.19 6.88 7.10 7.056707 1706000
4 2020-02-18 00:00:00 7.09 7.25 7.03 7.04 6.997073 2142800

The AMC data runs from February 11th, 2020 to February 14th, 2023, and for this exercise we’ll use the full range. We have seven columns: date, open, high, low, close, adjusted close, and volume. For simplicity, we will only use the close price as our target variable.

Preparing Data for LSTM

Before we can feed our data into an LSTM network, we need to do some preprocessing steps.

First, we need to normalize our data so that all values are between zero and one. This helps the network learn faster and avoid numerical instability issues. We will use a MinMaxScaler from sklearn for this purpose.

scaler = MinMaxScaler(feature_range=(0,1))
df['Close'] = scaler.fit_transform(df['Close'].values.reshape(-1,1))
df.head()

The output should look something like this:

Date Open High Low Close Adj Close Volume
0 2020-02-11 00:00:00 6.80 6.98 6.65 0.081889 6.897683 1568500
1 2020-02-12 00:00:00 6.93 7.08 6.82 0.080568 6.818171 1922100
2 2020-02-13 00:00:00 6.75 7.08 6.72 0.082879 6.957317 1553400
3 2020-02-14 00:00:00 7.01 7.19 6.88 0.084530 7.056707 1706000
4 2020-02-18 00:00:00 7.09 7.25 7.03 0.083540 6.997073 2142800

Notice that the scaling function only changed the values of the closing prices. They are all very small because the price of AMC stock eventually rose to over $60, so the early prices near $7 map to values close to zero.

amc-candle

Next, we need to split our data into training and testing sets. We will use the first 75% of the data for training and the last 25% for testing. We also need to reshape our data into a three-dimensional array with shape (samples,timesteps,features). This is the format that the LSTM layer expects. We will use one feature (the close price) and a variable number of time steps (we will experiment with different values later).

ind = int(len(df)*0.75)
train_df = df[:ind]
test_df = df[ind:]

To reshape our data, we will write a helper function that takes a dataframe and a number of time steps as inputs, and returns a 3D array of input data and a 1D array of output data.

def create_dataset(df, time_steps):
    # Build sliding windows of `time_steps` closing prices (X) and the
    # closing price immediately after each window (y).
    X = []
    y = []
    for i in range(len(df) - time_steps):
        X.append(df['Close'].iloc[i:i+time_steps].values)
        y.append(df['Close'].iloc[i+time_steps])
    X = np.array(X).reshape(-1, time_steps, 1)  # (samples, time_steps, features)
    y = np.array(y)
    return X, y

This function loops over the dataframe and creates sliding windows of length equal to the number of time steps. Each window contains the close price values for that period, and the corresponding output is the close price at the next time step. The function then converts the lists of windows and outputs into numpy arrays and reshapes them as needed.

For example, if we have the dataframe above (with scaled closing values), and we use a time step of 2, then our input data will look like this:

[[[0.0818887254712121], [0.08056794085789047]], 
[[0.08056794085789047], [0.0828793139435857]],
[[0.0828793139435857], [0.08453029473087502]],
...

The first column contains the normalized closing values, and the second column has the same values shifted up by one place. If we set time_steps = 5 then there would be 5 columns, each shifted upwards by one more than the previous column. (See How to build LSTM neural networks in Keras by David Lengacher.)
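
To make the windowing concrete, here is a quick sanity check on a made-up dataframe (the toy values are only for illustration):

toy = pd.DataFrame({'Close': [0.1, 0.2, 0.3, 0.4, 0.5]})
X_toy, y_toy = create_dataset(toy, time_steps=2)
print(X_toy.shape)                 # (3, 2, 1): three windows, two time steps, one feature
print(X_toy[0].ravel(), y_toy[0])  # [0.1 0.2] 0.3 -- a window and the value that follows it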

Now we can use this function to create our training and testing datasets with a chosen number of time steps.

time_steps = 10 # you can change this value
X_train, y_train = create_dataset(train_df, time_steps)
X_test, y_test = create_dataset(test_df, time_steps)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

The output should look something like this:

(559,10,1) (559,)

(180,10,1) (180,)

Building and Training the LSTM Network

Now that we have our data ready, we can build and train our LSTM network using Keras. We will use a simple architecture with one LSTM layer followed by a dense layer that outputs a single value.

To create an LSTM layer in Keras, we can use the keras.layers.LSTM() function. The only required parameter is the number of units in the LSTM layer, which represents the dimensionality of the output space. We can also specify other parameters, such as the activation function, the dropout rate, and whether to return sequences or not.

In our case, we will use 50 units for our LSTM layer, tanh as the activation function, 0.2 as the dropout rate, and False for returning sequences (since we only want one output per sample). We will also specify the input shape as (time_steps, 1), which matches our input data format.

To create a dense layer in Keras, we can use the keras.layers.Dense() function. The only required parameter is the number of units in the dense layer, which represents the dimensionality of the output space. We can also specify other parameters, such as the activation function and whether to use bias or not.

In our case, we will use 1 unit for our dense layer (since we only want one output per sample), linear as the activation function (since we are doing regression), and True for using bias.

We can then stack these layers together using a sequential model in Keras. A sequential model is a linear stack of layers that can be easily created by passing a list of layers to the keras.models.Sequential() function.
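
As noted above, the same network can also be defined by passing the layers as a list to Sequential(); the following sketch is equivalent to the .add() version shown next:

model = Sequential([
    LSTM(50, activation='tanh', dropout=0.2,
         return_sequences=False, input_shape=(time_steps, 1)),
    Dense(1, activation='linear', use_bias=True)
])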

Our final model looks like this:

model = Sequential()
model.add(LSTM(50,
               activation='tanh',
               dropout=0.2,
               return_sequences=False,
               input_shape=(time_steps, 1)))
model.add(Dense(1,
                activation='linear',
                use_bias=True))

We can then compile our model by specifying an optimizer, a loss function, and a metric to evaluate our model performance. We will use adam as our optimizer (a popular variant of gradient descent), mean_squared_error as our loss function (a common choice for regression problems), and root_mean_squared_error as our metric (a more interpretable measure than MSE).

To define root_mean_squared_error as a custom metric in Keras, we need to write a helper function that takes two arguments: y_true (the true values) and y_pred (the predicted values). The function then calculates and returns the square root of MSE using tensorflow operations.

def root_mean_squared_error(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=[root_mean_squared_error])
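
As an aside, TensorFlow's Keras also ships a built-in RootMeanSquaredError metric, so the custom function is optional; here is a sketch of compiling with it, keeping the same metric name so the history keys used later are unchanged:

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.RootMeanSquaredError(name='root_mean_squared_error')])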

We can then train our model by calling its fit() method. This method takes several arguments, such as x (the input data), y (the output data), batch_size (the number of samples per gradient update), epochs (the number of iterations over the entire dataset), validation_data (the data on which to evaluate the loss and the metric), and verbose (the level of detail to print during training).

In our case, we will use X_train and y_train as our input and output data, 32 as our batch size (a reasonable choice for most problems), 100 as our number of epochs (a large enough value to ensure convergence), X_test and y_test as our validation data (to monitor the performance on unseen data), and 1 as our verbose level (to print one line per epoch).

history = model.fit(X_train,
                    y_train,
                    batch_size=32,
                    epochs=100,
                    validation_data=(X_test, y_test),
                    verbose=1)

This will start the training process and print something like this for each epoch:

Epoch 1/100 6/6 [==============================] - 2s 118ms/step - loss: 0.0479 - root_mean_squared_error: 0.2188 - val_loss: 0.0204 - val_root_mean_squared_error: 0.1428

We can see that the model reports the loss and the root mean squared error for both the training and the validation data in each epoch.

Evaluating and Plotting the Model Performance

After training our model, we can evaluate its performance on the test data by calling its evaluate() method. This method takes several arguments, such as x (the input data), y (the output data), batch_size (the number of samples per evaluation step), and verbose (the level of detail to print during evaluation).

In our case, we will use X_test and y_test as our input and output data, 32 as our batch size, and 0 as our verbose level (to suppress any output).

test_loss, test_rmse = model.evaluate(X_test,
                                      y_test,
                                      batch_size=32,
                                      verbose=0)
print('Test loss:', test_loss)
print('Test RMSE:', test_rmse)

This will print something like this:

Test loss: 0.0004346434033424076

Test RMSE: 0.016953053

We can see that the model achieves a low test loss and a low test RMSE, indicating a good fit on the test data.

We can also plot the learning curves of the model by using the matplotlib library. The learning curves show how the loss and the metric change over time for both the training and the validation data.

To plot the learning curves, we need to access the history object returned by the fit() method. This object contains a dictionary called history that stores the values of loss and metrics for each epoch.
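
With the loss and custom metric defined above, the keys of that dictionary should look something like this:

print(history.history.keys())
# dict_keys(['loss', 'root_mean_squared_error', 'val_loss', 'val_root_mean_squared_error'])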

We can then use the matplotlib.pyplot.plot() function to plot these values against epochs. We can also use the matplotlib.pyplot.xlabel(), matplotlib.pyplot.ylabel(), matplotlib.pyplot.title(), matplotlib.pyplot.legend(), matplotlib.pyplot.grid(), and matplotlib.pyplot.show() functions to customize and display our plots.

For example, we can plot the training and validation loss curves as follows:

plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training and Validation Loss Curves')
plt.legend()
plt.grid()
plt.show()

This will produce a plot like this:

training-and-validation-loss-curves

We can see that the training and validation loss decrease over time, indicating that the model is learning from the data. We can also see that the validation loss is slightly lower than the training loss; this is common when dropout is active during training but not during evaluation, and it suggests the model is not overfitting. The gap between the two curves is small, suggesting that the model generalizes well to unseen data.

Similarly, we can plot the training and validation RMSE curves as follows:

plt.plot(history.history['root_mean_squared_error'], label='Training RMSE')
plt.plot(history.history['val_root_mean_squared_error'], label='Validation RMSE')
plt.xlabel('Epochs')
plt.ylabel('RMSE')
plt.title('Training and Validation RMSE Curves')
plt.legend()
plt.grid()
plt.show()

This will produce a plot like this:

training-and-validation-rmse-curves

We can see that the training and validation RMSE decrease over time, indicating that the model is making more accurate predictions. We can also see that the validation RMSE is slightly less than the training RMSE, also indicating that the model is not overfitting.

By plotting these learning curves, we can get a visual insight into how our model performs during training and how it compares with a validation dataset. This can help us diagnose potential problems such as underfitting or overfitting, and tune our model parameters accordingly.

Making Predictions and Plotting Them

After evaluating our model performance, we can use it to make predictions on new data. For example, we can use our model to predict the future price of AMC based on its past behavior.

To make predictions with our model, we can call its predict() method. This method takes several arguments, such as x (the input data), batch_size (the number of samples per prediction step), and verbose (the level of detail to print during prediction).

In our case, we will use X_test as our input data, 32 as our batch size, and 0 as our verbose level (to suppress any output).

y_pred = model.predict(X_test,
                       batch_size=32,
                       verbose=0)

This will return a numpy array of predicted values with one row per test sample (shape (180, 1)), corresponding to the entries of y_test.

We can then inverse transform these values using the same scaler that we used before to get back the original scale of the close price.

y_pred = scaler.inverse_transform(y_pred)

The original test data was scaled, so we need to apply the inverse transform to it as well.

y_test_inv = np.transpose(scaler.inverse_transform([y_test]))
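
An equivalent, slightly more direct way to write that inverse transform is to reshape y_test into a column vector first:

y_test_inv = scaler.inverse_transform(y_test.reshape(-1, 1))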

We can then plot the actual and predicted values using the matplotlib library. We can also calculate and display the mean absolute error (MAE) between them using the sklearn.metrics.mean_absolute_error() function.

For example, we can plot the actual and predicted values for the last 50 days as follows:

plt.plot(y_test_inv[-50:], label='Actual')
plt.plot(y_pred[-50:], label='Predicted')
plt.xlabel('Days')
plt.ylabel('Close Price')
plt.title('Actual vs Predicted Close Price for AMC')
plt.legend()
plt.grid()
plt.show()

This will produce a plot like this:

lstm-actual-vs-predicted


The code by Bing AI used a call to keras.metrics.metrics.mean_absolute_error that didn’t run. The mean absolute error is the average of the absolute value difference between actual and predicted closing prices,

mae = \frac{1}{n} \sum_{i=1}^{n} \left| y_{\mathrm{test},i} - y_{\mathrm{pred},i} \right|.

In code, this can be written as

mae = np.mean(np.abs(y_test_inv - y_pred))
print("MAE = ",mae)

which gives the mean absolute error for the AMC closing price predictions over the test set:

MAE =  0.8364303522075167
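
For completeness, the sklearn helper mentioned earlier gives the same result; this is simply an alternative to the numpy one-liner:

from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(y_test_inv, y_pred)
print("MAE = ", mae)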

We can see that the model captures the general trend of the closing price, but it is not very accurate in predicting the exact values.

This could be due to several reasons, such as the inherent noisiness of stock prices, the fact that the model sees only past closing prices, and the relatively small amount of training data.

To improve our model performance, we could try different approaches, such as adding more input features (for example volume or technical indicators), tuning the number of time steps, LSTM units, and epochs, or stacking additional LSTM layers.

By doing so, we could potentially achieve better results and more accurate predictions with our LSTM network.


The JupyterLab notebook for this post is here, and the AMC .csv file is here.

Learn more (list generated by Bing AI and ChatGPT):