Uber’s Orbit Full Bayesian Time Series Forecasting & Inference

This article is a learn-by-doing introduction to Orbit (Object-Oriented Bayesian Time-Series), an open-source Python framework created by Uber for full Bayesian time series forecasting and inference.
Time series models help Uber predict demand so they know where to send their drivers, forecast hardware and computation requirements so their servers don’t go down, and allocate billions of dollars in annual marketing budget.
Currently, Orbit supports concrete implementations for the following models:
- Exponential Smoothing (ETS)
- Local Global Trend (LGT)
- Damped Local Trend (DLT)
- Kernel-based Time-varying Regression (KTR)
It also supports the following sampling/optimization methods for model estimation and inference:
- Markov chain Monte Carlo (MCMC) as a full sampling method
- Maximum a Posteriori (MAP) as a point estimate method
- Variational Inference (VI) as a hybrid-sampling method on approximate distribution
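To make the difference between a point estimate (MAP) and full sampling (MCMC) concrete, here is a minimal NumPy sketch on a toy problem: estimating the mean of normally distributed data with a known noise level and a normal prior. The toy model, the prior, and the step size are assumptions for illustration only. The sampler is a plain random-walk Metropolis algorithm, not Stan's default NUTS sampler that Orbit relies on for `stan-mcmc`.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=3.0, scale=1.0, size=50)   # toy data; noise sd assumed known
prior_mu, prior_sd, sigma = 0.0, 10.0, 1.0

def log_post(mu):
    # log prior + log likelihood, up to an additive constant
    return (-0.5 * ((mu - prior_mu) / prior_sd) ** 2
            - 0.5 * np.sum(((y - mu) / sigma) ** 2))

# MAP: for this conjugate normal model the posterior mode has a closed form
post_prec = 1 / prior_sd**2 + len(y) / sigma**2
map_mu = (prior_mu / prior_sd**2 + y.sum() / sigma**2) / post_prec

# MCMC: random-walk Metropolis draws from the full posterior
def metropolis(n_draws=5000, step=0.3, start=0.0):
    draws = np.empty(n_draws)
    cur, cur_lp = start, log_post(start)
    for i in range(n_draws):
        prop = cur + step * rng.normal()          # propose a nearby value
        prop_lp = log_post(prop)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp           # accept the move
        draws[i] = cur
    return draws

draws = metropolis()[1000:]                       # discard burn-in
print(f"MAP estimate: {map_mu:.3f}")
print(f"MCMC mean: {draws.mean():.3f}, 90% interval: {np.percentile(draws, [5, 95]).round(3)}")
```

MAP returns a single number, while the MCMC draws also describe how uncertain that number is, which is what credible forecast bands are built from.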
A notable feature of Orbit is its use of probabilistic modeling to capture the uncertainty inherent in time-series data. This lets users obtain probabilistic forecasts with uncertainty (credible) intervals instead of single point forecasts.
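Forecast intervals of this kind are usually read off a set of simulated forecast draws. The sketch below assumes you already have a matrix of predictive draws (here simulated with NumPy), takes the median as the point forecast and the 5th and 95th percentiles as the band, and uses column names that mirror the `prediction`, `prediction_5` and `prediction_95` columns used later in this article. The helper name `summarize_draws` is hypothetical.

```python
import numpy as np
import pandas as pd

def summarize_draws(draws, dates, percentiles=(5, 95)):
    """Summarize forecast draws of shape (n_draws, n_dates) into a
    median point forecast plus one column per requested percentile."""
    draws = np.asarray(draws)
    out = pd.DataFrame({'date': dates})
    out['prediction'] = np.percentile(draws, 50, axis=0)
    for q in percentiles:
        out[f'prediction_{q}'] = np.percentile(draws, q, axis=0)
    return out

rng = np.random.default_rng(0)
dates = pd.date_range('2017-01-01', periods=4, freq='D')
# hypothetical predictive draws: 1000 draws x 4 dates
draws = 100 + rng.normal(0, 10, size=(1000, 4))
summary = summarize_draws(draws, dates)
print(summary.round(1))
```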
Installing from PyPI

!pip install orbit-ml

Setting the working directory YOURPATH

import os
os.chdir('YOURPATH')    # Set working directory
os.getcwd()             # Confirm the working directory

Table of Contents

Insurance Claims
Store Unit Sales
Summary
Explore More

Insurance Claims

Let’s look at the iclaims dataset, which contains the weekly initial claims for US unemployment benefits together with a few related Google Trends queries (trend.unemploy, trend.filling, trend.job) and two market indicators (sp500 and vix).
Basic imports

import pandas as pd
import numpy as np
import orbit
import matplotlib.pyplot as plt

from orbit.utils.dataset import load_iclaims
from orbit.diagnostics.plot import plot_predicted_data, plot_predicted_components
from orbit.utils.plot import get_orbit_style
plt.style.use(get_orbit_style())
from orbit.models import ETS

orbit.__version__
'1.1.4.2'

Loading the log-transformed time-series data and making the train-test split

raw_df = load_iclaims(transform=True)
raw_df.dtypes
week              datetime64[ns]
claims                   float64
trend.unemploy           float64
trend.filling            float64
trend.job                float64
sp500                    float64
vix                      float64
dtype: object
df = raw_df.copy()

test_size=52

train_df=df[:-test_size]
test_df=df[-test_size:]
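The positional split above keeps the last 52 rows (one year of weekly data) for testing, which is only a valid time-based split if the rows are sorted by date. A minimal pandas sketch on a synthetic weekly series (the data are made up; in the article, `load_iclaims(transform=True)` already returns log-transformed columns):

```python
import numpy as np
import pandas as pd

# hypothetical weekly series standing in for the iclaims data
weeks = pd.date_range('2010-01-03', periods=200, freq='W-SUN')
rng = np.random.default_rng(1)
raw = pd.DataFrame({'week': weeks,
                    'claims': rng.uniform(2e5, 6e5, size=len(weeks))})

# log-transform the response and make sure rows are in time order
df = (raw.assign(claims=np.log(raw['claims']))
         .sort_values('week')
         .reset_index(drop=True))

test_size = 52                      # hold out the last year of weeks
train_df = df[:-test_size]
test_df = df[-test_size:]
print(train_df['week'].max(), test_df['week'].min())
```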

Training the ETS forecasting model

ets = ETS(
    response_col='claims',
    date_col='week',
    seasonality=52,
    seed=2020,
    estimator='stan-mcmc',
)
ets.fit(train_df)
predicted_df = ets.predict(df=df, decompose=True)
_ = plot_predicted_data(training_actual_df=train_df,
                        predicted_df=predicted_df,
                        date_col='week',
                        actual_col='claims',
                        test_actual_df=test_df)

_ = plot_predicted_components(predicted_df=predicted_df, date_col='week')
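The `lev_sm` and `sea_sm` parameters that appear in the ETS posterior samples are, by their names, smoothing weights for the level and the seasonal component. The deterministic NumPy sketch below shows a textbook additive level-plus-seasonality recursion with two such weights. It is an illustration under simplifying assumptions (naive initialization, no trend, no noise model), not Orbit's Stan code, whose exact parameterization may differ; the function name `ets_filter` is hypothetical.

```python
import numpy as np

def ets_filter(y, m, lev_sm, sea_sm):
    """One-step-ahead fitted values of an additive level + seasonality
    smoother with season length m."""
    y = np.asarray(y, dtype=float)
    level = y[:m].mean()                 # initial level: mean of first season
    season = list(y[:m] - level)         # initial seasonal offsets
    fitted = np.full(len(y), np.nan)     # no fit for the initialization window
    for t in range(m, len(y)):
        s_prev = season[t - m]
        fitted[t] = level + s_prev       # forecast made before seeing y[t]
        new_level = lev_sm * (y[t] - s_prev) + (1 - lev_sm) * level
        season.append(sea_sm * (y[t] - new_level) + (1 - sea_sm) * s_prev)
        level = new_level
    return fitted

# toy series: constant level 10 plus a period-4 pattern
pattern = np.array([1.0, -1.0, 2.0, -2.0])
y = 10 + np.tile(pattern, 6)
fitted = ets_filter(y, m=4, lev_sm=0.5, sea_sm=0.3)
print(np.round(fitted[4:8], 3))
```

On this perfectly periodic toy series the fitted values reproduce the data exactly. On real data, a larger `lev_sm` or `sea_sm` makes the corresponding component react faster to recent observations.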

Extracting the posterior samples drawn by MCMC (which explores the parameter space) and analyzing them with ArviZ

posterior_samples = ets.get_posterior_samples()
posterior_samples.keys()
dict_keys(['l', 'lev_sm', 'obs_sigma', 's', 'sea_sm', 'loglk'])
import arviz as az

posterior_samples = ets.get_posterior_samples(permute=False)

# example from https://arviz-devs.github.io/arviz/index.html
az.style.use("arviz-darkgrid")
az.plot_pair(
    posterior_samples,
    var_names=["sea_sm", "lev_sm", "obs_sigma"],
    kind="kde",
    marginals=True,
    textsize=15,
)
plt.show()
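`az.plot_pair` gives a visual check. A numeric summary can be sketched with plain NumPy, assuming each scalar parameter is an array shaped `(n_chains, n_draws)`, as is expected with `permute=False`. The helper `summarize` is hypothetical, the draws are synthetic, and the R-hat here is the classic Gelman-Rubin formula, not the rank-normalized split R-hat that ArviZ's `az.rhat` reports. Vector parameters such as `l` and `s` would need to be summarized per time index.

```python
import numpy as np

def summarize(samples, prob=0.90):
    """Posterior mean, central interval and classic (non-split) R-hat for
    each parameter in a dict of arrays shaped (n_chains, n_draws)."""
    lo_q, hi_q = 100 * (1 - prob) / 2, 100 * (1 + prob) / 2
    rows = {}
    for name, x in samples.items():
        x = np.asarray(x, dtype=float)
        n = x.shape[1]
        w = x.var(axis=1, ddof=1).mean()            # mean within-chain variance
        b = n * x.mean(axis=1).var(ddof=1)          # between-chain variance
        var_hat = (n - 1) / n * w + b / n           # pooled variance estimate
        rows[name] = {'mean': x.mean(),
                      'lo': np.percentile(x, lo_q),
                      'hi': np.percentile(x, hi_q),
                      'r_hat': np.sqrt(var_hat / w)}
    return rows

# hypothetical draws: 4 chains x 500 draws for two scalar parameters
rng = np.random.default_rng(7)
fake = {'lev_sm': rng.beta(20, 30, size=(4, 500)),
        'obs_sigma': rng.gamma(50, 0.002, size=(4, 500))}
for name, s in summarize(fake).items():
    print(name, {k: round(float(v), 3) for k, v in s.items()})
```

R-hat values close to 1 suggest the chains agree; values well above 1 mean the chains have not mixed.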

Training the Local Global Trend (LGT) model

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import orbit
from orbit.models import LGT
from orbit.diagnostics.plot import plot_predicted_data
from orbit.diagnostics.plot import plot_predicted_components
from orbit.utils.dataset import load_iclaims
# load data
df = load_iclaims()
# define date and response column
date_col = 'week'
response_col = 'claims'
df.dtypes
test_size = 52
train_df = df[:-test_size]
test_df = df[-test_size:]
lgt = LGT(
    response_col=response_col,
    date_col=date_col,
    estimator='stan-map',
    seasonality=52,
    seed=8888,
)
%%time
lgt.fit(df=train_df)
CPU times: total: 125 ms
Wall time: 41.4 s
predicted_df = lgt.predict(df=test_df)
_ = plot_predicted_data(training_actual_df=train_df, predicted_df=predicted_df,
                        date_col=date_col, actual_col=response_col,
                        test_actual_df=test_df, title='Prediction with LGTMAP Model')
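The plot gives a visual comparison, but a single number is easier to track across models. A minimal sketch, assuming `predicted_df` carries the date column `week` and a `prediction` column as used in the plotting call above. The frames below are hypothetical stand-ins, and because the iclaims data is log-transformed, the errors are on the log scale.

```python
import numpy as np
import pandas as pd

def smape(actual, pred):
    """Symmetric mean absolute percentage error (as a fraction)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.mean(2 * np.abs(pred - actual) / (np.abs(actual) + np.abs(pred)))

# hypothetical frames shaped like test_df and lgt.predict(df=test_df)
weeks = pd.date_range('2019-01-06', periods=3, freq='W-SUN')
test_df = pd.DataFrame({'week': weeks, 'claims': [12.0, 12.5, 13.0]})
predicted_df = pd.DataFrame({'week': weeks, 'prediction': [12.2, 12.5, 12.8]})

# align forecasts and actuals on the date column before scoring
scored = test_df.merge(predicted_df, on='week')
print(f"SMAPE: {smape(scored['claims'], scored['prediction']):.4f}")
print(f"MAE:   {np.mean(np.abs(scored['claims'] - scored['prediction'])):.4f}")
```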

Training the Damped Local Trend (DLT) model

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import orbit
from orbit.models import DLT
from orbit.diagnostics.plot import plot_predicted_data, plot_predicted_components
from orbit.utils.dataset import load_iclaims

import warnings
warnings.filterwarnings('ignore')
print(orbit.__version__)
1.1.4.2
# load log-transformed data
df = load_iclaims()
train_df = df[df['week'] < '2017-01-01']
test_df = df[df['week'] >= '2017-01-01']

response_col = 'claims'
date_col = 'week'
regressor_col = ['trend.unemploy', 'trend.filling', 'trend.job']
dlt = DLT(
    response_col=response_col,
    regressor_col=regressor_col,
    date_col=date_col,
    seasonality=52,
    prediction_percentiles=[5, 95],
)

dlt.fit(train_df)
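The "damped" in DLT refers to a trend whose contribution shrinks as the forecast horizon grows. The classical damped-trend forecast illustrates the idea: with a damping factor phi < 1, the h-step forecast is level + (phi + phi^2 + ... + phi^h) * slope, which levels off instead of growing linearly. This NumPy sketch shows only that textbook formula with made-up numbers; Orbit's DLT also has global trend, seasonality and regression terms, and its exact equations may differ.

```python
import numpy as np

def damped_trend_forecast(level, slope, phi, horizon):
    """h-step forecasts: level + (phi + phi**2 + ... + phi**h) * slope."""
    h = np.arange(1, horizon + 1)
    damp_sum = np.cumsum(phi ** h)      # partial sums phi + ... + phi**h
    return level + damp_sum * slope

undamped = damped_trend_forecast(100.0, 2.0, phi=1.0, horizon=5)
damped = damped_trend_forecast(100.0, 2.0, phi=0.8, horizon=5)
print(undamped)
print(damped.round(3))
```

With phi = 1 the trend grows linearly forever; with phi = 0.8 the forecasts approach level + slope * phi / (1 - phi), here 108.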

Plotting the DLT prediction vs train data

predicted_df = dlt.predict(df=train_df, decompose=True)

_ = plot_predicted_data(train_df, predicted_df,
                        date_col=dlt.date_col, actual_col=dlt.response_col)

Plotting the DLT prediction vs test data

Plotting DLT prediction, trend, seasonality, and regression

DLT prediction, trend, seasonality, and regression

Store Unit Sales

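Our next example uses real sales data released by Corporación Favorita, a large Ecuadorian grocery chain, for a Kaggle forecasting competition. We keep store 1 only and aggregate its item-level unit sales into a single daily series.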
Preparing the input data

import orbit
from orbit.models import DLT
import pandas as pd
import numpy as np
import os

def wmape(y_true, y_pred):
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

path = 'train.csv'   # Favorita training file
data = pd.read_csv(path, index_col='id', parse_dates=['date'])

# Keep store 1 and aggregate its item-level rows into daily totals
daily = (data.loc[data['store_nbr'] == 1]
             .groupby('date', as_index=False)['unit_sales'].sum())

# Dec 25 is absent from the data (the stores are closed), and resample('D').sum()
# would turn it into 0, so reuse the Dec 18 total (same weekday, one week earlier)
dec25 = list()
for year in range(2013, 2017):
    dec18 = daily.loc[daily['date'] == f'{year}-12-18', 'unit_sales']
    dec25 += [{'date': pd.Timestamp(f'{year}-12-25'), 'unit_sales': dec18.values[0]}]
daily = pd.concat([daily, pd.DataFrame(dec25)], ignore_index=True).sort_values('date')

train = daily.loc[daily['date'] < '2017-01-01']
valid = daily.loc[(daily['date'] >= '2017-01-01') & (daily['date'] < '2017-04-01')]

df_daily = train.set_index('date').resample('D')['unit_sales'].sum().to_frame()

df_daily.tail()
            unit_sales
date
2016-12-27   12157.823
2016-12-28   12144.918
2016-12-29   10244.317
2016-12-30   13584.621
2016-12-31   10741.060
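Why fill Dec 25 at all: `resample('D').sum()` creates a row for every calendar day, and a day with no rows sums to 0, which a model would read as a genuine collapse in sales. A toy pandas example with made-up numbers:

```python
import pandas as pd

# toy daily totals for one store, with Dec 25 absent
days = list(pd.date_range('2016-12-18', '2016-12-24')) + [pd.Timestamp('2016-12-26')]
sales = pd.DataFrame({'date': days,
                      'unit_sales': [120.0, 100.0, 105.0, 110.0, 130.0, 160.0, 150.0, 90.0]})

# resampling to a regular daily grid silently fills the gap with 0
gappy = sales.set_index('date').resample('D')['unit_sales'].sum()
print(gappy.loc[pd.Timestamp('2016-12-25')])

# filling Dec 25 with the Dec 18 value (same weekday) before resampling avoids the artificial zero
dec18 = sales.loc[sales['date'] == pd.Timestamp('2016-12-18'), 'unit_sales'].iloc[0]
filler = pd.DataFrame({'date': [pd.Timestamp('2016-12-25')], 'unit_sales': [dec18]})
filled = pd.concat([sales, filler], ignore_index=True).sort_values('date')
daily = filled.set_index('date').resample('D')['unit_sales'].sum()
print(daily.loc[pd.Timestamp('2016-12-25')])
```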

ETS model prediction

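import matplotlib.pyplot as plt
import orbit
from orbit.models import ETS

# Orbit expects the date as a regular column, so move it out of the index
df_ets = df_daily.reset_index()

ets = ETS(date_col='date',
          response_col='unit_sales',
          seasonality=7,                  # weekly pattern in daily data
          prediction_percentiles=[5, 95],
          seed=1)
ets.fit(df_ets)                           # the model must be fitted before predicting
p = ets.predict(df=df_ets)
plt.figure(figsize=(15,6))
plt.plot(p['date'], df_ets['unit_sales'], label='actual')
plt.plot(p['date'], p['prediction'], label='ETS prediction')
plt.legend()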

Actual sales data (blue curve) vs ETS prediction (yellow curve)

Plotting the ETS model prediction with percentiles vs actual data

fig, ax = plt.subplots(1,1, figsize=(1280/96, 720/96))
ax.plot(p['date'], df_daily['unit_sales'], label='actual')
ax.plot(p['date'], p['prediction'], label='prediction')
ax.fill_between(p['date'], p['prediction_5'], p['prediction_95'], alpha=0.2, color='orange', label='prediction percentiles')
ax.set_title('ETS Model')
ax.set_ylabel('Sales')
ax.set_xlabel('Date')
ax.legend()
plt.show()

The ETS model prediction with percentiles vs actual sales data
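The `wmape` helper defined during data preparation is never actually applied in the walkthrough. A minimal sketch of scoring a forecast with it, plus measuring the share of actuals that fall inside the 5th to 95th percentile band, on a hypothetical frame with the same column names as `p` (`interval_coverage` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def wmape(y_true, y_pred):
    # same definition as the helper in the data-preparation code
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

def interval_coverage(y_true, lower, upper):
    """Share of actual values that fall inside [lower, upper]."""
    y_true = np.asarray(y_true)
    return np.mean((y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper)))

# hypothetical frame with the column names produced by ets.predict above
p = pd.DataFrame({'prediction':    [110.0, 190.0, 330.0, 95.0],
                  'prediction_5':  [ 90.0, 170.0, 290.0, 80.0],
                  'prediction_95': [130.0, 215.0, 350.0, 99.0]})
actual = pd.Series([100.0, 200.0, 300.0, 105.0])

print(f"WMAPE: {wmape(actual.values, p['prediction'].values):.4f}")
print(f"5-95 band coverage: {interval_coverage(actual, p['prediction_5'], p['prediction_95']):.2f}")
```

Passing `.values` to `wmape` avoids pandas aligning two Series on their indexes, which can silently distort the result when the indexes differ. For a well-calibrated 5th to 95th percentile band, coverage should be roughly 0.90.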

DLT model prediction

df = df_daily.reset_index()
df.tail()
           date  unit_sales
1455 2016-12-27   12157.823
1456 2016-12-28   12144.918
1457 2016-12-29   10244.317
1458 2016-12-30   13584.621
1459 2016-12-31   10741.060

dlt = DLT(
    response_col='unit_sales',
    date_col='date',
    estimator='stan-map',
    seasonality=7,                  # weekly pattern in daily data
    seed=8888,
    global_trend_option='logistic',
    # for prediction uncertainty
    n_bootstrap_draws=1000,
)

dlt.fit(df)
p1 = dlt.predict(df)

fig, ax = plt.subplots(1,1, figsize=(1280/96, 720/96))
ax.plot(p1['date'], df['unit_sales'], label='actual')
ax.plot(p1['date'], p1['prediction'], label='prediction')
ax.fill_between(p1['date'], p1['prediction_5'], p1['prediction_95'], alpha=0.2, color='orange', label='prediction percentiles')
ax.set_title('DLT Model')
ax.set_ylabel('Sales')
ax.set_xlabel('Date')
ax.legend()
plt.show()

The DLT model prediction with percentiles vs actual sales data
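With `estimator='stan-map'` the fit yields a point estimate rather than posterior draws, and the `n_bootstrap_draws` setting is what supplies the prediction uncertainty for the band in the plot. The exact procedure Orbit uses is not covered in this article. As a generic illustration of the idea only, the sketch below builds a band around a point forecast by adding resampled in-sample residuals (a residual bootstrap). The function name, forecasts and residuals are all hypothetical.

```python
import numpy as np

def residual_bootstrap_band(point_forecast, residuals, n_draws=1000,
                            percentiles=(5, 95), seed=0):
    """Approximate a prediction band around a point forecast by adding
    residuals resampled with replacement (generic technique)."""
    rng = np.random.default_rng(seed)
    point_forecast = np.asarray(point_forecast, float)
    noise = rng.choice(np.asarray(residuals, float),
                       size=(n_draws, point_forecast.size), replace=True)
    sims = point_forecast + noise                 # simulated outcomes
    return {q: np.percentile(sims, q, axis=0) for q in percentiles}

point = np.array([12000.0, 12500.0, 11800.0])         # hypothetical point forecasts
resid = np.random.default_rng(1).normal(0, 500, 400)  # hypothetical residuals
band = residual_bootstrap_band(point, resid)
print(band[5].round(0), band[95].round(0))
```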

Summary

Time series forecasting is an active R&D topic in academia as well as industry.
In this article, we have tried out Orbit, a Bayesian time series modeling framework designed to be simple to use, adaptable, interoperable, and fast.
We used Orbit's Forecaster objects to fit models, generate forecasts (predictions), and extract posterior samples. The Forecaster class is a wrapper around several Bayesian estimation flows.
Throughout the study, we used two publicly available datasets: the weekly initial claims for US unemployment benefits and the real sales data made available by Favorita, a large Ecuadorian grocery chain.
Our future work will focus on Orbit's kernel-based time-varying regression (KTR) model, which represents regression coefficients as smooth, time-varying functions built from latent variables.

Explore More
