Category: Data-Driven Tech

  • Python Data Science for Real Estate & REIT Amsterdam: (Auto) EDA, NLP, Maps & ML

    The Amsterdam real estate market has experienced a significant resurgence, with property prices increasing by double digits annually since 2013. Data science is being used to analyze the city’s housing and rental markets, revealing insights on the impact of Airbnb and empowering communities with the necessary information. Comprehensive data analysis and machine learning techniques are…

  • Titanic Benchmark Hypothesis Testing in Disaster Risk Management: (Auto)EDA, ML, HPO & SHAP

    This project applies the Titanic benchmark to hypothesis testing in disaster risk management. Using the Titanic dataset on Kaggle, a Machine Learning (ML) analysis was performed to determine the statistical significance of the relationship between a person’s death and their passenger class, age, sex, and port of embarkation. The project involved comprehensive ML pipeline implementation…

  • Walmart Weekly Sales Time Series Forecasting using SARIMAX & ML Models

    The blog post delves into Time Series Forecasting (TSF), using SARIMAX and Supervised Machine Learning algorithms to predict Walmart’s weekly store sales. Factors affecting sales are investigated for strategies to increase revenues. The study additionally covers data preparation, feature correlation analysis, SARIMAX diagnostics, and the training of supervised ML models like Linear Regression, Random Forest,…

  • MLflow SHAP & Transformers

    The post covers simplified MLflow projects for reproducible and reusable data science code. It details local environment setup, ElasticNet model optimization, and SHAP explanations for breast cancer, diabetes, and iris datasets. Additionally, it showcases MLflow Sentence Transformers for a chatbot and translation. This demonstrates MLflow’s interface for managing transformer models from libraries like Hugging…

  • Malware Detection & Interpretation – PCA, T-SNE & ML

    This post discusses the application of PCA, T-SNE, and supervised ML algorithms for malware detection using a benchmark dataset. Techniques such as Logistic Regression, SVC, KNN, and XGBoost are implemented, achieving high performance metrics. Results show potential for improving malware detection using ML while reducing false positives and enhancing cyber defense.

  • Retail Sales, Store Item Demand Time-Series Analysis/Forecasting: AutoEDA, FB Prophet, SARIMAX & Model Tuning

    This study compares and evaluates various forecasting models to predict sales and demand for retail businesses. The focus is on Time Series Analysis (TSA) methods such as FB Prophet and SARIMAX. The final FB Prophet model yields MAE=4.252 and MAPE=0.168, while the best-performing SARIMAX variant achieves MAE=6.285 and MAPE=0.213. The study emphasizes the importance…
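    The two error metrics quoted above can be sketched as follows. This is a minimal illustration, not the post's own code: the function names and the sample numbers are hypothetical, and MAPE is returned as a fraction (not a percentage), which is assumed to match the 0.168 and 0.213 figures in the excerpt.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction.

    Assumes no actual value is zero, since each error is divided by it.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical sales figures, for illustration only.
    actual = [10.0, 20.0, 40.0]
    predicted = [12.0, 18.0, 40.0]
    print(f"MAE={mae(actual, predicted):.3f}, MAPE={mape(actual, predicted):.3f}")
```

    Because MAE is in the units of the target while MAPE is relative, the two can rank models differently when sales volumes vary widely across items.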

  • H2O AutoML Malware Detection

    This study explores AI-powered malware detection using the H2O AutoML algorithm for effective and rapid classification of PE files. The optimized Stacked Ensemble model achieved high precision, recall, and F1 score. The research validates the H2O AutoML workflow’s accurate malware identification and supports related R&D products and solutions in the field of information security.

  • Anatomy of the Robust 1D Kalman Filter

    The Kalman Filter (KF) is a powerful tool for tracking, navigation, and data prediction tasks. It is based on the assumption of linearity and Gaussian noise, enabling it to iteratively update predicted models. The article outlines a simplified implementation of KF using Python commands, with examples demonstrating its effectiveness in handling noisy measurements. It also…
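    The iterative predict/update cycle described above can be sketched for a scalar state. This is a textbook 1D filter under a constant-state model, not the article's "robust" variant; the function name, parameters, and simulated readings are assumptions made for illustration.

```python
import random


def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter assuming a (nearly) constant underlying state.

    process_var: variance added to the estimate at each predict step.
    meas_var:    variance of the measurement noise (assumed > 0).
    x0, p0:      initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    random.seed(0)
    true_value = 5.0  # hypothetical quantity being measured
    noisy = [true_value + random.gauss(0.0, 1.0) for _ in range(200)]
    smoothed = kalman_1d(noisy, process_var=1e-5, meas_var=1.0)
    print(f"last raw reading: {noisy[-1]:.3f}, last estimate: {smoothed[-1]:.3f}")
```

    A larger `process_var` makes the filter trust new measurements more and track changes faster, at the cost of a noisier estimate.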

  • A Market-Neutral Strategy

    The work aims to solve the problem of Markowitz portfolio optimization for a one-year investment horizon through a cointegrated pairs trading strategy. Market-neutral trading strategies seek returns that are independent of market swings, targeting a zero beta against the relevant market index. Statistical arbitrage (SA), pairs trading, and APO signals are analyzed. The study…
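    The zero-beta target mentioned above can be illustrated with a small calculation. This sketch covers plain beta hedging only, not the cointegration-based pairs trading the post analyzes; the function names and return series are hypothetical.

```python
def beta(asset_returns, market_returns):
    """Beta = covariance(asset, market) / variance(market)."""
    n = len(market_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


def hedged_returns(asset_returns, market_returns):
    """Returns of the asset after shorting beta units of the market index."""
    b = beta(asset_returns, market_returns)
    return [a - b * m for a, m in zip(asset_returns, market_returns)]


if __name__ == "__main__":
    # Hypothetical periodic returns, for illustration only.
    market = [0.01, -0.02, 0.03, 0.0, 0.015]
    asset = [2.0 * m + 0.001 for m in market]
    print(f"asset beta:  {beta(asset, market):.3f}")
    print(f"hedged beta: {beta(hedged_returns(asset, market), market):.3f}")
```

    In this constructed example the hedged position keeps only the small constant return that does not depend on the market, which is the sense in which a zero-beta position is market-neutral.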

  • A Comprehensive Analysis of Best Trading Technical Indicators w/ TA-Lib – Tesla ’23

    This study presents a comprehensive stock technical analysis guide for Tesla (TSLA) using the TA-Lib Python library. It explores the use of over 200 technical indicators, analyses historical data, and offers insight for both swing traders and long-term holders. The content includes detailed explanations and plots for various momentum, volume, volatility, and trend indicators, providing…

  • Real-Time Stock Sentiment Analysis w/ NLP Web Scraping

    Stock sentiment analysis is gaining popularity as a technique to understand public opinions on specific assets. This study uses NLP web scraping in Python to extract stock sentiments from financial news headlines on FinViz. The sentiment analysis can help determine investor opinions and potential impacts on stock prices, though it is not a standalone predictor.

  • Sales Forecasting: tslearn, Random Walk, Holt-Winters, SARIMAX, GARCH, Prophet, and LSTM

    The data science project involves evaluating various sales forecasting algorithms in Python using a Kaggle time-series dataset. The forecasting algorithms include tslearn, Random Walk, Holt-Winters, SARIMA, GARCH, Prophet, LSTM and Di Pietro’s Model. The goal is to predict next month’s sales for a list of shops and products, which slightly changes every month. The best…

  • Prediction of NASA Turbofan Jet Engine RUL: OLS, SciKit-Learn & LSTM

    We predict the Remaining Useful Life (RUL) of NASA turbofan jet engines, comparing statsmodels OLS and scikit-learn regression models against a Keras LSTM in Python. The input dataset is the Kaggle version of NASA’s public dataset for asset degradation modeling. It includes simulated run-to-failure data from turbofan jet engines.

  • The 5-Step GCP IoT Device-to-Report via AI Roadmap

    The Internet of Things (IoT) aids in the improvement of processes and enables new scenarios through network-connected devices. Recognized as a driver of the Fourth Industrial Revolution, IoT applications include predictive maintenance, industry safety, automation, remote monitoring, asset tracking, and fraud detection. Advancements in cloud IoT architectures over recent years have enabled efficient data ingestion,…

  • Health Insurance Cross Sell Prediction with ML Model Tuning & Validation

    The post discusses the use of AI and Machine Learning (ML) for insurance cross-selling. It covers data preparation, model training with different algorithms, parameter optimization, and model evaluation. The study showcases the ability of ML models (HGBM, XGBoost, Random Forest) to predict cross-sell customers in the insurance sector, providing potential for improved…

  • Weather Forecasting & Flood De-Risking using Machine Learning, Markov Chain & Geospatial Plotly EDA

    Photo by Pok Rie. Table of Contents: U.S.A. Weather Forecast; Australian Rainfall Prediction; Kerala Flood Prediction. Squares are categorical associations (uncertainty coefficient and correlation ratio) from 0 to 1. The uncertainty coefficient is asymmetrical, i.e. each row label’s value indicates how much information it provides about the label at the top. Circles are the symmetrical numerical…

  • Hugging Face NLP, Streamlit, PyGWalker, TF & Gradio App

    Table of Contents: Streamlit/Dash/Jupyter PyGWalker EDA Demo; PyGWalker and Dash — Creating a Data Visualization Dashboard In Less Than 20 Lines of Code; PyGWalker Test; PyGWalker Tutorial: A Tableau-Like Python Library for Interactive Data Exploration and Visualization; PyGWalker: A Python Library for Visualizing Pandas Dataframes; You’ll Never Walk Alone: Use Pygwalker to Visualize Data in…

  • Plotly Dash TA Stock Market App

    The post explains how to deploy a Plotly Dash stock market app in Python with a dashboard of prices for a user-selected stock. The dashboard includes technical indicators such as volume, MACD, and stochastic. The steps include selecting a stock ticker symbol (NVDA), retrieving stock data from the yfinance API, adding Moving Averages, saving the stock chart in HTML form,…
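    One of the indicators named above, MACD, can be sketched in plain Python. This is an illustration of the standard definition only, not the post's Dash or yfinance code: the function names are hypothetical, and the 12/26/9 spans and the seeding of each EMA with the first price are common conventions assumed here.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Seeded with the first value, which is one common convention.
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of prices."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    # MACD line: fast EMA minus slow EMA.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: EMA of the MACD line itself.
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


if __name__ == "__main__":
    # Hypothetical steadily rising closing prices, for illustration only.
    closes = [100.0 + i for i in range(60)]
    line, sig, hist = macd(closes)
    print(f"MACD={line[-1]:.3f}, signal={sig[-1]:.3f}, histogram={hist[-1]:.3f}")
```

    In a dashboard, the three returned series would typically be drawn in a panel below the price chart, with the histogram as bars.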

  • Low-Code AutoEDA of Dutch eHealth Data in Python

    The article details the use of low-code AutoEDA in Python to examine the Dutch Healthcare Authority’s eHealth data. Using Python libraries such as D-Tale and SweetViz, the study aims to understand the data’s key features and prepare it for AI techniques. The motivations include the Dutch government’s support for digital healthcare applications, especially amidst the recent…

  • Dividend-NG-BTC Diversify Big Tech

    Can Dividends, Natural Gas and Crypto Diversify Big Techs? Ultimately, we need to answer the following fundamental question: can Dividend Kings, NGUSD and BTC-USD diversify growth tech assets? Dividends are very popular among investors, especially those who want a steady stream of income from their investments. Some companies choose to share their profits…