Gold ETF Price Prediction using the Bayesian Ridge Linear Regression

Featured Photo by Pixabay.

  • Yesterday @Barchart shared a chart of Gold’s performance against the S&P 500 after yield curve inversions. Is the uptrend for safe havens set to continue?
  • According to Florian Kössler, you would expect Gold to be at $1,000.
  • Thorsten Polleit also believes that it is time to buy Gold.
  • The objective of this post is to predict the Gold price using Machine Learning (ML) algorithms in Python.
  • Following this step-by-step guide, we will create an ML linear regression model that takes past Gold ETF (GLD) prices as input and returns a Gold price prediction for the next day.
  • Recall that GLD is the largest ETF that invests directly in physical gold.

Let’s check that the working directory is GOLD

import os
os.getcwd()  # returns the current working directory

and import the following libraries

from sklearn.linear_model import LinearRegression

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-darkgrid')  # on Matplotlib 3.6+ this style is named 'seaborn-v0_8-darkgrid'

import yfinance as yf

Let’s read the data
Df = yf.download('GLD', '2022-01-01', '2023-03-25', auto_adjust=True)

# Keep only the closing price
Df = Df[['Close']]

# Drop rows with missing values
Df = Df.dropna()

Let’s plot the closing price of GLD
Df.Close.plot(figsize=(10, 7), color='r')
plt.ylabel("Gold ETF Prices")
plt.title("Gold ETF Price Series")
plt.show()


Gold ETF Price Series

Let’s define the explanatory variables

# 3-day and 9-day moving averages of the closing price
Df['S_3'] = Df['Close'].rolling(window=3).mean()
Df['S_9'] = Df['Close'].rolling(window=9).mean()
# Next day's close, aligned with today's row
Df['next_day_price'] = Df['Close'].shift(-1)

Df = Df.dropna()
X = Df[['S_3', 'S_9']]

and the target variable

y = Df['next_day_price']
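To see what the rolling mean and the one-day shift do to the rows, here is a minimal sketch on a toy price series. The numbers are made up for illustration and are not GLD data; only the 3-day window is shown to keep the table short.

```python
import pandas as pd

# Toy closing prices (made-up numbers, not GLD data)
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

# Mean of the current close and the two previous closes
toy["S_3"] = toy["Close"].rolling(window=3).mean()
# Tomorrow's close, aligned with today's row
toy["next_day_price"] = toy["Close"].shift(-1)

# Drops the first two rows (no 3-day mean yet) and the last row (no next day)
toy = toy.dropna()
print(toy)
```

Only the rows that have both a full moving-average window and a known next-day close survive `dropna()`, which is why the modelling dataset is slightly shorter than the downloaded price history.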

Let’s split the data into the train and test datasets
t = .8
t = int(t*len(Df))

X_train = X[:t]
y_train = y[:t]

X_test = X[t:]
y_test = y[t:]
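Note that the split above is chronological: the first 80% of the rows are used for training and the last 20% for testing, with no shuffling. A small sketch on a hypothetical date-indexed frame (the dates and the column `x` are placeholders) shows that every training date precedes every test date:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 10 business days (placeholder data)
rows = pd.DataFrame(
    {"x": np.arange(10)},
    index=pd.date_range("2023-01-02", periods=10, freq="B"),
)

cut = int(0.8 * len(rows))            # position of the 80% cut-off
train, test = rows[:cut], rows[cut:]  # earlier rows train, later rows test

print(len(train), len(test))
print(train.index.max() < test.index.min())
```

Keeping the time order means the model is never evaluated on dates that precede its training data.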

Let’s create the following linear regression model

from sklearn import linear_model
clf = linear_model.BayesianRidge()
linear = clf.fit(X_train, y_train)
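As an illustration of what BayesianRidge offers beyond an ordinary least-squares fit, the sketch below fits it to synthetic data and asks for a standard deviation alongside each prediction. The data, the true coefficients (0.7 and 0.3), and the names `X_demo`, `y_demo`, and `model` are assumptions made for this example only; they are not part of the GLD model.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic data: y = 0.7*x1 + 0.3*x2 + small noise (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 0.7 * X_demo[:, 0] + 0.3 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

model = BayesianRidge().fit(X_demo, y_demo)

# return_std=True also returns the standard deviation of the predictive distribution
mean, std = model.predict(X_demo[:3], return_std=True)

print(model.coef_)  # estimated coefficients, close to the true 0.7 and 0.3
print(mean, std)
```

The same `return_std=True` argument could be passed to `linear.predict` to get a rough uncertainty band around each Gold price forecast.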

Predicting the Gold ETF prices
predicted_price = linear.predict(X_test)
predicted_price = pd.DataFrame(
    predicted_price, index=y_test.index, columns=['price'])
predicted_price.plot(figsize=(10, 7))
y_test.plot()  # actual next-day prices on the same axes
plt.legend(['predicted_price', 'actual_price'])
plt.ylabel("Gold ETF Price")
plt.show()


Gold ETF ML-predicted vs actual price

The R2-score is
r2_score = linear.score(X_test, y_test)*100
round(r2_score, 2)
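For a regressor, `score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. A minimal sketch of that formula on made-up numbers follows; the helper name `r2` and the sample values are illustrative only.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of predicting the mean
    return 1.0 - ss_res / ss_tot

# Made-up prices, not GLD data
actual = [180.0, 182.0, 181.0, 185.0]
forecast = [181.0, 181.0, 182.0, 184.0]

print(round(r2(actual, forecast) * 100, 2))  # 71.43
```

A score of 100 would mean a perfect fit, and a score of 0 means the model does no better than always predicting the mean of the test prices.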


Let’s plot the Cumulative Returns

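gold = pd.DataFrame()

gold['price'] = Df[t:]['Close']
gold['predicted_price_next_day'] = predicted_price['price']
gold['actual_price_next_day'] = y_test
# Return earned from today's close to the next day's close
gold['gold_returns'] = gold['price'].pct_change().shift(-1)

# Go long (1) when the predicted next-day price is rising, otherwise stay out (0)
gold['signal'] = np.where(gold.predicted_price_next_day.shift(1) < gold.predicted_price_next_day, 1, 0)

gold['strategy_returns'] = gold.signal * gold['gold_returns']
((gold['strategy_returns'] + 1).cumprod()).plot(figsize=(10, 7), color='g')
plt.ylabel('Cumulative Returns')
plt.show()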

Gold ETF Cumulative Returns
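The signal and cumulative-return logic can be traced by hand on a toy table. In the sketch below the prices and predictions are made up, and the frame `demo` is a hypothetical stand-in for `gold`:

```python
import numpy as np
import pandas as pd

# Made-up prices and predictions, not GLD data
demo = pd.DataFrame({
    "price": [100.0, 102.0, 101.0, 103.0, 104.0],
    "predicted_price_next_day": [101.0, 101.5, 102.0, 103.5, 103.0],
})

# Return from today's close to tomorrow's close (last row has no next day)
demo["gold_returns"] = demo["price"].pct_change().shift(-1)

# 1 when today's prediction is above yesterday's prediction, else 0
demo["signal"] = np.where(
    demo["predicted_price_next_day"].shift(1) < demo["predicted_price_next_day"], 1, 0
)

demo["strategy_returns"] = demo["signal"] * demo["gold_returns"]
demo["cumulative"] = (1 + demo["strategy_returns"]).cumprod()
print(demo)
```

On the first day there is no earlier prediction to compare against, so the signal is 0 and the strategy stays out of the market. The final row has no next-day return and therefore no cumulative value.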

Let’s calculate the Sharpe ratio
sharpe = gold['strategy_returns'].mean()/gold['strategy_returns'].std()*(252**0.5)
'Sharpe Ratio %.2f' % (sharpe)

'Sharpe Ratio 2.33'
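The formula above annualizes the daily Sharpe ratio by multiplying the mean-to-standard-deviation ratio of daily returns by the square root of 252, the usual count of trading days in a year, and it implicitly assumes a zero risk-free rate. A minimal sketch as a reusable function, where the name `annualized_sharpe` and the sample returns are illustrative assumptions:

```python
import numpy as np

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean / sample std of daily returns, scaled by sqrt(periods per year).

    Assumes a zero risk-free rate, as in the formula used in this post.
    """
    r = np.asarray(daily_returns, dtype=float)
    r = r[~np.isnan(r)]  # pandas mean() and std() also skip NaN values
    return r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)

# Made-up daily returns, not the strategy's actual returns
print(annualized_sharpe([0.01, 0.03]))
```

Using `ddof=1` matches the default sample standard deviation of pandas, so the function should agree with the one-line pandas expression for the same return series.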

Let’s get the forecast

import datetime as dt
current_date = dt.datetime.now()

data = yf.download('GLD', '2022-01-01', current_date, auto_adjust=True)
data['S_3'] = data['Close'].rolling(window=3).mean()
data['S_9'] = data['Close'].rolling(window=9).mean()
data = data.dropna()

data['predicted_gold_price'] = linear.predict(data[['S_3', 'S_9']])
data['signal'] = np.where(data.predicted_gold_price.shift(1) < data.predicted_gold_price, "Buy", "No Position")

# Show the latest signal and next-day forecast
data.tail(1)[['signal', 'predicted_gold_price']]


Gold BUY signal


  • The SPDR Gold Trust Shares quote was $185.220 on 2023-03-24, whereas our next-day forecast is $185.136.
  • With the investment starting on 2022-12-15, the return is expected to be around +8%.
  • The R2-score of our prediction model is about 80%.
  • The Sharpe ratio of 2.33 is considered very good. The higher a fund’s Sharpe ratio, the better its returns have been relative to the amount of investment risk taken.
  • The results fully support the Barchart opinion: a 100% BUY Overall Average Signal calculated from all 13 indicators.

