Uber’s Orbit Full Bayesian Time Series Forecasting & Inference

  • This article is a learn-by-doing introduction to Orbit (Object-Oriented Bayesian Time-Series), an open-source Python framework created by Uber for full Bayesian time series forecasting and inference.
  • Time series models help Uber predict demand so they know where to send their drivers, forecast hardware and computation requirements so their servers don’t go down, and allocate billions of dollars in annual marketing budget.
  • Currently, Orbit supports concrete implementations for the following models:
    • Exponential Smoothing (ETS)
    • Local Global Trend (LGT)
    • Damped Local Trend (DLT)
    • Kernel Time-based Regression (KTR)
  • It also supports the following sampling and optimization methods for model estimation and inference:
    • Markov-Chain Monte Carlo (MCMC) as a full sampling method
    • Maximum a Posteriori (MAP) as a point estimate method
    • Variational Inference (VI) as a hybrid-sampling method on an approximate distribution
  • A notable feature of Orbit is its use of probabilistic modeling to capture the uncertainty inherent in time-series data. This lets users obtain probabilistic forecasts with credible intervals rather than point predictions alone.
  • Installing from PyPI
!pip install orbit-ml
  • Setting the working directory YOURPATH
import os
os.chdir('YOURPATH')    # Set working directory
os.getcwd()

Table of Contents

  1. Insurance Claims
  2. Store Unit Sales
  3. Summary
  4. Explore More

Insurance Claims

  • Let’s look at the iclaims dataset, which contains the weekly initial claims for US unemployment benefits alongside a few related Google Trends queries.
  • Basic imports
import pandas as pd
import numpy as np
import orbit
import matplotlib.pyplot as plt

from orbit.utils.dataset import load_iclaims
from orbit.diagnostics.plot import plot_predicted_data, plot_predicted_components
from orbit.utils.plot import get_orbit_style
plt.style.use(get_orbit_style())
from orbit.models import ETS

orbit.__version__
'1.1.4.2'
  • Loading the log-log transformed time-series data and train-test split
raw_df = load_iclaims(transform=True)
raw_df.dtypes
week              datetime64[ns]
claims                   float64
trend.unemploy           float64
trend.filling            float64
trend.job                float64
sp500                    float64
vix                      float64
dtype: object
df = raw_df.copy()

test_size=52

train_df=df[:-test_size]
test_df=df[-test_size:]
  • Training the ETS forecasting model
ets = ETS(
    response_col='claims',
    date_col='week',
    seasonality=52,
    seed=2020,
    estimator='stan-mcmc',
)
ets.fit(train_df)
predicted_df = ets.predict(df=df, decompose=True)
_ = plot_predicted_data(training_actual_df=train_df,
                        predicted_df=predicted_df,
                        date_col='week',
                        actual_col='claims',
                        test_actual_df=test_df)
Train/test response vs ETS prediction
_ = plot_predicted_components(predicted_df=predicted_df, date_col='week')
ETS predicted trend and seasonality.
posterior_samples = ets.get_posterior_samples()
posterior_samples.keys()
dict_keys(['l', 'lev_sm', 'obs_sigma', 's', 'sea_sm', 'loglk'])
import arviz as az

posterior_samples = ets.get_posterior_samples(permute=False)

# example from https://arviz-devs.github.io/arviz/index.html
az.style.use("arviz-darkgrid")
az.plot_pair(
    posterior_samples,
    var_names=["sea_sm", "lev_sm", "obs_sigma"],
    kind="kde",
    marginals=True,
    textsize=15,
)
plt.show()
ArviZ posterior samples
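The lev_sm and sea_sm entries in the posterior samples above appear to be the level and seasonality smoothing parameters. To give a feel for what a level smoothing parameter does, here is a minimal sketch of textbook simple exponential smoothing in plain Python. It is not Orbit's implementation (Orbit's ETS also models seasonality and estimates its parameters with Stan); the function name and the toy series are made up for illustration.

```python
def simple_exp_smoothing(y, alpha):
    """Textbook simple exponential smoothing (illustration only, not Orbit code).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from level_0 = y_0.
    Returns the list of smoothed levels, one per observation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    levels = [y[0]]
    for obs in y[1:]:
        # Blend the new observation with the previous level
        levels.append(alpha * obs + (1 - alpha) * levels[-1])
    return levels


toy_series = [10.0, 12.0, 11.0, 15.0]  # made-up data
smooth = simple_exp_smoothing(toy_series, alpha=0.5)
print(smooth)  # [10.0, 11.0, 11.0, 13.0]
```

A value of alpha close to 1 makes the level follow the latest observation closely, while a value close to 0 keeps it near its starting point.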
  • Training the Local Global Trend (LGT) model
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import orbit
from orbit.models import LGT
from orbit.diagnostics.plot import plot_predicted_data
from orbit.diagnostics.plot import plot_predicted_components
from orbit.utils.dataset import load_iclaims
# load data
df = load_iclaims()
# define date and response column
date_col = 'week'
response_col = 'claims'
df.dtypes
test_size = 52
train_df = df[:-test_size]
test_df = df[-test_size:]
lgt = LGT(
    response_col=response_col,
    date_col=date_col,
    estimator='stan-map',
    seasonality=52,
    seed=8888,
)
%%time
lgt.fit(df=train_df)
CPU times: total: 125 ms
Wall time: 41.4 s
predicted_df = lgt.predict(df=test_df)
_ = plot_predicted_data(training_actual_df=train_df, predicted_df=predicted_df,
                        date_col=date_col, actual_col=response_col,
                        test_actual_df=test_df, title='Prediction with LGTMAP Model')
Train/test response vs LGT prediction
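The estimator='stan-map' setting used above asks for a point estimate (the posterior mode) instead of a full set of posterior draws. The toy sketch below illustrates what MAP means on a one-parameter problem, a normal mean with a normal prior, by maximising the log-posterior over a grid and comparing the result with the known closed form. Everything here (function names, data, prior) is hypothetical and unrelated to Orbit's Stan-based optimiser.

```python
def log_posterior(mu, data, sigma, prior_mu, prior_sigma):
    """Unnormalised log-posterior of a normal mean with known sigma and a normal prior."""
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    log_prior = -0.5 * ((mu - prior_mu) / prior_sigma) ** 2
    return log_lik + log_prior


def map_by_grid(data, sigma, prior_mu, prior_sigma, lo=-10.0, hi=10.0, steps=20001):
    """Return the grid point with the highest log-posterior (a crude MAP estimate)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda mu: log_posterior(mu, data, sigma, prior_mu, prior_sigma))


def map_closed_form(data, sigma, prior_mu, prior_sigma):
    """Conjugate normal-normal result: precision-weighted average of prior mean and data mean."""
    prior_prec = 1.0 / prior_sigma ** 2
    data_prec = len(data) / sigma ** 2
    data_mean = sum(data) / len(data)
    return (prior_prec * prior_mu + data_prec * data_mean) / (prior_prec + data_prec)


data = [2.1, 1.9, 2.4, 2.0]  # made-up observations
grid_map = map_by_grid(data, sigma=1.0, prior_mu=0.0, prior_sigma=2.0)
exact_map = map_closed_form(data, sigma=1.0, prior_mu=0.0, prior_sigma=2.0)
print(grid_map, exact_map)
```

The two numbers agree to within the grid spacing. A MAP fit returns only this single value, whereas MCMC returns many draws from the whole posterior.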
  • Training the Damped Local Trend (DLT) model
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import orbit
from orbit.models import DLT
from orbit.diagnostics.plot import plot_predicted_data,plot_predicted_components
from orbit.utils.dataset import load_iclaims

import warnings
warnings.filterwarnings('ignore')
print(orbit.__version__)
1.1.4.2
# load log-transformed data
df = load_iclaims()
train_df = df[df['week'] < '2017-01-01']
test_df = df[df['week'] >= '2017-01-01']

response_col = 'claims'
date_col = 'week'
regressor_col = ['trend.unemploy', 'trend.filling', 'trend.job']
dlt = DLT(
    response_col=response_col,
    regressor_col=regressor_col,
    date_col=date_col,
    seasonality=52,
    prediction_percentiles=[5, 95],
)

dlt.fit(train_df)
  • Plotting the DLT prediction vs train data
predicted_df = dlt.predict(df=train_df, decompose=True)

_ = plot_predicted_data(train_df, predicted_df,
                        date_col=dlt.date_col, actual_col=dlt.response_col)
DLT prediction vs train data
  • Plotting the DLT prediction vs test data
DLT prediction vs test data
  • Plotting DLT prediction, trend, seasonality, and regression
DLT prediction, trend, seasonality, and regression
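The "damped" in Damped Local Trend refers to a trend whose contribution shrinks as the forecast horizon grows. A minimal sketch of the textbook damped-trend forecast follows. Orbit's DLT adds a global trend, seasonality, and regression on top and may be parameterised differently, so treat the function and its arguments as illustrative assumptions only.

```python
def damped_trend_forecast(level, slope, phi, horizon):
    """Textbook damped-trend forecasts for steps 1..horizon (not Orbit code).

    forecast_h = level + (phi + phi**2 + ... + phi**h) * slope
    With phi = 1 this is a straight line; with 0 < phi < 1 the forecasts
    flatten out towards level + slope * phi / (1 - phi).
    """
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # accumulated damping weight up to step h
        forecasts.append(level + damp_sum * slope)
    return forecasts


linear = damped_trend_forecast(level=100.0, slope=2.0, phi=1.0, horizon=3)
damped = damped_trend_forecast(level=100.0, slope=2.0, phi=0.5, horizon=3)
print(linear)  # [102.0, 104.0, 106.0]
print(damped)  # [101.0, 101.5, 101.75]
```

With phi = 0.5 the forecasts approach 102 and never exceed it, whereas the undamped line keeps growing by 2 per step.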

Store Unit Sales

  • Our next example deals with the real sales data made available by Favorita, a large Ecuadorian grocery chain.
  • Preparing the input data
import orbit
from orbit.models import DLT
import pandas as pd
import numpy as np
import os

def wmape(y_true, y_pred):
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

path = 'train.csv'
data = pd.read_csv(path, index_col='id', parse_dates=['date'])
data2 = data.loc[((data['store_nbr'] == 1)), ['date', 'unit_sales', 'onpromotion']]
dec25 = list()
for year in range(2013,2017):
    dec18 = data2.loc[(data2['date'] == f'{year}-12-18')]
    dec25 += [{'date': pd.Timestamp(f'{year}-12-25'), 'unit_sales': dec18['unit_sales'].values[0], 'onpromotion': dec18['onpromotion'].values[0]}]
data2 = pd.concat([data2, pd.DataFrame(dec25)], ignore_index=True).sort_values('date')
train = data2.loc[data2['date'] < '2017-01-01']
valid = data2.loc[(data2['date'] >= '2017-01-01') & (data2['date'] < '2017-04-01')]

df_daily = train.set_index('date').resample('D')["unit_sales"].sum().to_frame()

df_daily.tail()
            unit_sales
date	
2016-12-27	12157.823
2016-12-28	12144.918
2016-12-29	10244.317
2016-12-30	13584.621
2016-12-31	10741.060
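The wmape helper defined above (weighted mean absolute percentage error) is not called in the snippets shown here, but it is a natural metric for comparing forecasts against the held-out valid frame. A small worked example on made-up numbers shows how it behaves; the arrays below are purely illustrative and are not Favorita data.

```python
import numpy as np


def wmape(y_true, y_pred):
    # Same definition as above: total absolute error divided by total absolute actuals
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()


# Made-up actuals and forecasts, purely for illustration
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

score = wmape(y_true, y_pred)
print(f"WMAPE = {score:.4f}")  # (10 + 10 + 30) / 600 = 0.0833
```

Because the errors are divided by the total of the actual values, days with large sales weigh more than days with small sales, and zero-sales days do not cause a division by zero as long as the total is non-zero.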
  • ETS model prediction
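import matplotlib.pyplot as plt
from orbit.models import ETS

ets = ETS(date_col='date',
          response_col='unit_sales',
          seasonality=7,
          prediction_percentiles=[5, 95],
          seed=1)
# Orbit expects the date as a regular column, so move it out of the index
ets_df = df_daily.reset_index()
# The model must be fitted before it can predict
ets.fit(df=ets_df)
p = ets.predict(df=ets_df)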
plt.figure(figsize=(15,6))
plt.plot(p['date'],p['prediction'])
plt.plot(p['date'],df_daily['unit_sales'])
Actual sales data (blue curve) vs ETS prediction (yellow curve)
  • Plotting the ETS model prediction with percentiles vs actual data
fig, ax = plt.subplots(1,1, figsize=(1280/96, 720/96))
ax.plot(p['date'], df_daily['unit_sales'], label='actual')
ax.plot(p['date'], p['prediction'], label='prediction')
ax.fill_between(p['date'], p['prediction_5'], p['prediction_95'], alpha=0.2, color='orange', label='prediction percentiles')
ax.set_title('ETS Model')
ax.set_ylabel('Sales')
ax.set_xlabel('Date')
ax.legend()
plt.show()
The ETS model prediction with percentiles vs actual sales data
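Conceptually, columns such as prediction_5 and prediction_95 summarise a set of simulated forecast paths by their percentiles at each date. The sketch below shows that summarisation step on synthetic draws with NumPy. The draws are random numbers, not Orbit output, and Orbit's internal bookkeeping may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictive draws: 1000 simulated values for each of 5 forecast steps
draws = rng.normal(loc=[10.0, 11.0, 12.0, 13.0, 14.0], scale=1.0, size=(1000, 5))

# Summarise each forecast step by its 5th, 50th and 95th percentile across draws
lower, median, upper = np.percentile(draws, [5, 50, 95], axis=0)

for step, (lo, mid, hi) in enumerate(zip(lower, median, upper), start=1):
    print(f"step {step}: {lo:.2f} <= {mid:.2f} <= {hi:.2f}")
```

The band between the lower and upper percentile is what fill_between shades in the plot above: roughly 90% of the simulated values fall inside it at each step.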
  • DLT model prediction
df = df_daily.reset_index()
df.tail()
        date	    unit_sales
1455	2016-12-27	12157.823
1456	2016-12-28	12144.918
1457	2016-12-29	10244.317
1458	2016-12-30	13584.621
1459	2016-12-31	10741.060

dlt = DLT(
    response_col='unit_sales',
    date_col='date',
    estimator='stan-map',
    seasonality=52,
    seed=8888,
    global_trend_option='logistic',
    # for prediction uncertainty
    n_bootstrap_draws=1000,
)

dlt.fit(df)
p1 = dlt.predict(df)

fig, ax = plt.subplots(1,1, figsize=(1280/96, 720/96))
ax.plot(p1['date'], df['unit_sales'], label='actual')
ax.plot(p1['date'], p1['prediction'], label='prediction')
ax.fill_between(p1['date'], p1['prediction_5'], p1['prediction_95'], alpha=0.2, color='orange', label='prediction percentiles')
ax.set_title('DLT Model')
ax.set_ylabel('Sales')
ax.set_xlabel('Date')
ax.legend()
plt.show()

The DLT model prediction with percentiles vs actual sales data

Summary

  • Time series forecasting is an active R&D topic in academia as well as industry.
  • In this article, we have tried out Orbit, a Bayesian time series modeling interface that is simple to use, adaptable, interoperable, and fast to compute.
  • We have used the Forecaster objects to run fitting, forecasting (prediction), and posterior sample extraction. The Forecaster class is a wrapper class for several Bayesian estimation flows.
  • Throughout the study, we have used two public-domain datasets: the weekly initial claims for US unemployment benefits and the real sales data made available by Favorita, a large Ecuadorian grocery chain.
  • Our future work will focus on Orbit’s kernel-based time-varying regression (KTR) model, which defines a smooth, time-varying representation of regression coefficients using latent variables.

Explore More


