Tag: statistics

  • Python Data Science for Real Estate & REIT Amsterdam: (Auto) EDA, NLP, Maps & ML

    The Amsterdam real estate market has experienced a significant resurgence, with property prices rising by double digits annually since 2013. Data science is being used to analyze the city’s housing and rental markets, revealing insights into the impact of Airbnb and giving communities the information they need. Comprehensive data analysis and machine learning techniques are…

  • Titanic Benchmark Hypothesis Testing in Disaster Risk Management: (Auto)EDA, ML, HPO & SHAP

    This project aims to apply the Titanic benchmark to hypothesis testing in disaster risk management. Using the Kaggle Titanic dataset, a Machine Learning (ML) analysis was performed to test the statistical significance of the relationship between a passenger’s death and their passenger class, age, sex, and port of embarkation. The project involved implementing a comprehensive ML pipeline…

  • Uber’s Orbit Full Bayesian Time Series Forecasting & Inference

    This article introduces Orbit, an open-source Python framework by Uber for full Bayesian time series forecasting and inference. It supports models such as Exponential Smoothing, Local Global Trend, and Kernel-based Time-varying Regression, along with inference methods such as Markov chain Monte Carlo and Variational Inference. Orbit captures the uncertainty in time-series data, producing probabilistic forecasts with credible intervals. The…

  • 100 Basic Python Codes

    Source: PYPL Popularity of Programming Language, Feb 2024. Table of Contents: Setting Up Your Environment; Download Datasets; Initial Pandas Data QC; Displaying Pandas Data Types; Showing Descriptive Statistics; Exploring the Dataset; Email Slicer; User Input & Type Conversion; Working with Lists; Practicing Loops; Calculator; Temperature Conversion; ADC Temperature Sensor; Sorting Numpy Arrays; Story Generator; Display…

  • Retail Sales, Store Item Demand Time-Series Analysis/Forecasting: AutoEDA, FB Prophet, SARIMAX & Model Tuning

    This study compares and evaluates various forecasting models for predicting sales and demand in retail businesses. The focus is on Time Series Analysis (TSA) methods such as FB Prophet and SARIMAX. The final FB Prophet model yields MAE=4.252 and MAPE=0.168, while the best-performing SARIMAX variant achieves MAE=6.285 and MAPE=0.213. The study emphasizes the importance…

  • A Market-Neutral Strategy

    The work aims to solve the Markowitz portfolio optimization problem for a one-year investment horizon using a cointegration-based pairs trading strategy. Market-neutral trading strategies seek returns that are independent of market swings, targeting a beta of zero against the relevant market index. Statistical arbitrage (SA), pairs trading, and APO signals are analyzed. The study…

  • Sales Forecasting: tslearn, Random Walk, Holt-Winters, SARIMAX, GARCH, Prophet, and LSTM

    This data science project evaluates various sales forecasting algorithms in Python on a Kaggle time-series dataset. The algorithms include tslearn, Random Walk, Holt-Winters, SARIMA, GARCH, Prophet, LSTM, and Di Pietro’s Model. The goal is to predict next month’s sales for a list of shops and products that changes slightly every month. The best…

  • A Balanced Mix-and-Match Time Series Forecasting: ThymeBoost, Prophet, and AutoARIMA

    The post evaluates the performance of popular Time Series Forecasting (TSF) methods, namely AutoARIMA, Facebook Prophet, and ThymeBoost on four real-world time series datasets: Air Passengers, U.S. Wholesale Price Index (WPI), BTC-USD price, and Peyton Manning. Each TSF model uses historical data to identify trends and make future predictions. Studies indicate that ThymeBoost, which combines…

  • NVIDIA Rolling Volatility: GARCH & XGBoost

    This post examines the prediction of NVIDIA stock volatility using two models: Generalized Autoregressive Conditional Heteroscedasticity (GARCH) and Extreme Gradient Boosting (XGBoost). The models are compared in terms of MSE and MAPE. The post finds that the machine-learning-based XGBoost model outperforms GARCH in forecasting NVDA volatility, showing the effectiveness of…

  • NLP of Restaurant Guest Reviews on Tripadvisor

    This comprehensive study examines restaurant reviews on Tripadvisor across 31 major European cities. Based on a dataset scraped from Tripadvisor, it performs a sentiment analysis of the reviews and explores average ratings per city, vegetarian-friendly cities, and how local cuisine compares to foreign food. The analysis is carried out in Python, demonstrating…

  • Joint Analysis of Bitcoin, Gold and Crude Oil Prices

    The post presents a joint time-series analysis of Bitcoin, gold, and crude oil prices from 2021 to 2023. It covers data processing and exploratory data analysis, followed by a range of statistical tests, ARIMA model fitting, and, finally, Markowitz portfolio optimization. It then offers a detailed analysis, including data…

  • Top 6 Reliability/Risk Engineering Learnings

    The post reviews Eric Marsden’s Python e-learning courseware on risk engineering, loss prevention, and safety management. It discusses topics such as failures of light bulbs and electronic components, maintenance of a large computing facility, and oil field pumps. It also delves into stock market risk analysis, such as Value at Risk…

  • Portfolio Optimization of 20 Dividend Growth Stocks

    The post discusses implementing a stochastic optimization algorithm to build a balanced portfolio of 20 dividend growth stocks that maximizes return within a defined risk tolerance. Using daily stock and benchmark data, the algorithm optimizes the portfolio to outperform the benchmark index and achieve the desired risk-reward outcome. The results facilitate spreading investment capital across diverse…

  • SARIMAX Crude Oil Prices Forecast – 2. Brent

    This study validates the EIA energy forecast for the 2023 Brent crude oil spot price using SARIMAX time-series cross-validation. It covers prerequisites, data loading, ETS decomposition, the ADF test, SARIMAX modeling, predictions, model evaluation, and a summary. The predictions align with the EIA forecast, with discrepancies falling within the predicted confidence intervals.

  • SARIMAX Crude Oil Prices Forecast – 1. WTI

    The post presents a detailed forecast of Brent and WTI oil prices for 2023 using Python, SARIMAX, and Time Series Analysis. The data indicate volatility in the oil market from the start of 2023, with prices expected to decrease from 2022 levels. Experts also warn of a potential US recession in 2023, which could further impact the oil…

  • SARIMAX Forecasting of Online Food Delivery Sales

    This article provides a beginner-friendly guide to understanding and evaluating ARIMA-based time-series forecasting models such as SARIMA and SARIMAX. It focuses on a QC-optimized SARIMA(X) model for forecasting the e-commerce sales of a food delivery company. The post covers essential concepts, data processing, model comparisons, and insights. It also includes a comparison between SARIMA and…

  • A Roadmap from Data Science to BI via ML

    The blog post presents a comprehensive roadmap to Data Science (DS), providing an overview of career prospects, the field’s intersections with Mathematics, Statistics, and Computer Science, and its business relevance. The text details the earning potential of data scientists and the steps towards becoming one, including Data Analysis, Machine Learning, and Business Intelligence. It highlights…

  • ANOVA-OLS Prediction of Surgical Volumes

    Operating rooms (ORs) are among the most valuable hospital assets, generating a large share of hospital revenue. Statistical models have been developed to predict daily surgical volumes weeks in advance. We evaluate our statistical models on the VUMC dataset, using the ANOVA null-hypothesis test for the total number…

  • Stock Forecasting with FBProphet

    Prophet from Meta (Facebook) is a procedure for forecasting time series data such as stocks. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
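The retail-sales excerpt reports MAE and MAPE scores, and the sales-forecasting excerpt lists Random Walk as a baseline. The small pure-Python sketch below illustrates both. It is not code from any of the posts. The series is hypothetical, the function names are illustrative, and MAPE is assumed to be reported as a fraction (so 0.168 means 16.8%), which matches the scale of the quoted figures.

```python
def naive_forecast(series):
    """One-step-ahead random-walk (naive) forecast: predict y[t] with y[t-1].

    Returns (actuals, forecasts) aligned from the second observation onward.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return list(series[1:]), list(series[:-1])


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the series."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.168 means 16.8%).

    Undefined when any actual value is zero, so that case raises an error.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical monthly sales, not data from any of the posts above
sales = [20, 25, 30, 40, 32]
actual, forecast = naive_forecast(sales)
print(mae(actual, forecast))             # 7.0
print(round(mape(actual, forecast), 3))  # 0.217
```

A random-walk forecast simply repeats the last observed value. That makes its MAE and MAPE a natural baseline: a model such as Prophet or SARIMAX that cannot beat it on these metrics adds little forecasting value.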