Category: Unsupervised Machine Learning

  • Time Series Data Imputation, Interpolation & Anomaly Detection

    The post compares popular time series data imputation, interpolation, and anomaly detection methods. It explores the challenges posed by missing data and their impact on data processing, analysis, and model accuracy. The study runs data-centric experiments to identify the best-performing methods and highlights the importance of imputation for time series forecasting. It provides practical strategies and techniques for…

  • Malware Detection & Interpretation – PCA, T-SNE & ML

    This post applies PCA, t-SNE, and supervised ML algorithms to malware detection on a benchmark dataset. Logistic Regression, SVC, KNN, and XGBoost are implemented and achieve high performance metrics. The results suggest that ML can improve malware detection while reducing false positives and strengthening cyber defense.

  • Sales Forecasting: tslearn, Random Walk, Holt-Winters, SARIMAX, GARCH, Prophet, and LSTM

    This data science project evaluates a range of sales forecasting methods in Python on a Kaggle time series dataset. The methods include tslearn, Random Walk, Holt-Winters, SARIMA, GARCH, Prophet, LSTM, and Di Pietro's model. The goal is to predict next month's sales for a list of shops and products that changes slightly every month. The best…

  • Weather Forecasting & Flood De-Risking using Machine Learning, Markov Chain & Geospatial Plotly EDA

    Photo by Pok Rie. Contents: U.S.A. Weather Forecast, Australian Rainfall Prediction, Kerala Flood Prediction. Squares are categorical associations (uncertainty coefficient and correlation ratio) on a scale from 0 to 1. The uncertainty coefficient is asymmetric: row-label values indicate how much information they provide about each label along the top. Circles are the symmetrical numerical…

  • Dividend-NG-BTC Diversify Big Tech

    Can Dividends, Natural Gas and Crypto Diversify Big Tech? Ultimately, we need to answer one fundamental question: can Dividend Kings, NGUSD, and BTC-USD diversify growth tech assets? Dividends are very popular among investors, especially those who want a steady stream of income from their investments. Some companies choose to share their profits…

  • Anomaly Detection using the Isolation Forest Algorithm

    The post applies Isolation Forest, an unsupervised anomaly detection algorithm, to identify abnormal patterns in financial and taxi ride data. The challenge is to accurately distinguish normal from abnormal data points for fraud detection, fault diagnosis, and outlier identification. Using real-world datasets of financial transactions and NYC taxi rides, the algorithm successfully…

  • Returns-Volatility Domain K-Means Clustering and LSTM Anomaly Detection of S&P 500 Stocks

    This study implements and evaluates the K-means algorithm for ranking and clustering S&P 500 stocks by average annualized return and volatility. The second goal is to detect anomalies in the best-performing S&P 500 stocks using the Isolation Forest algorithm. Additionally, anomalies in the S&P 500 historical stock price time series will be…

  • Real-Time Anomaly Detection of NAB Ambient Temperature Readings using the TensorFlow/Keras Autoencoder

    This post is a detailed guide to anomaly detection in time series data using autoencoders. The tutorial uses Python and a real-world ambient temperature dataset from the Numenta Anomaly Benchmark (NAB). The Python workflow imports the required libraries, performs anomaly detection, and visualizes the anomalies. A trained autoencoder model identifies anomalies, with Precision, Recall, and F1…

  • Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%

    In 2023, the global card industry is projected to suffer $36.13 billion in fraud losses. This makes improving credit card fraud detection a priority for banks and financial organizations. AI-based techniques are making fraud detection easier and more accurate, with models that can recognize unusual transactions and fraud. The post discusses a…

  • Unsupervised ML, K-Means Clustering & Customer Segmentation

    Open-source datasets: this file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational dataset that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered non-store online retailer. The company mainly sells unique all-occasion…

  • Effective 2D Image Compression with K-means Clustering

    The post explores K-means clustering, a popular unsupervised machine learning algorithm, for image compression. By clustering the pixels of a 2D image into groups of similar colors, the algorithm reduces storage space while keeping the image resolution unchanged and with limited loss of visual quality. The post also demonstrates the approach through a case study, where optimal results were…
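The K-means image compression entry above is commonly implemented as color quantization. Each pixel is replaced by the nearest of k cluster-center colors, so the image can be stored as a small palette plus one palette index per pixel. The following is a minimal NumPy sketch under that assumption, not the post's own code. The names `kmeans_quantize` and `_assign` and their parameters are hypothetical.

```python
import numpy as np


def _assign(pixels, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each pixel."""
    dists = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_quantize(image, k, n_iter=20, seed=0):
    """Compress an H x W x 3 uint8 image to a k-color palette plus per-pixel indices."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k randomly chosen pixels.
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(pixels, centroids)
        # Update step: each centroid moves to the mean color of its pixels;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = _assign(pixels, centroids)
    palette = np.clip(np.rint(centroids), 0, 255).astype(np.uint8)
    return palette, labels.reshape(h, w)


# Usage on a random 32 x 32 RGB image.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
palette, labels = kmeans_quantize(img, k=16)
reconstructed = palette[labels]   # same H x W x 3 shape, at most 16 colors
```

With k = 16, each pixel index needs 4 bits instead of the 24 bits of a raw RGB pixel. That is roughly a 6x reduction before counting the small palette, while the image dimensions stay the same.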
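The Isolation Forest entries above rest on a simple idea: anomalies are few and different, so random axis-parallel splits isolate them in fewer steps than normal points. The following is a minimal from-scratch NumPy sketch of the usual anomaly score s(x) = 2^(-E[h(x)] / c(ψ)). Here h is the path length to isolate x, ψ is the subsample size, and c(ψ) is the average path length used as a normalizer. It is illustrative only and not the posts' code; all function names are hypothetical. Unlike the original algorithm, it only splits on features that are not constant within a node. In practice a library implementation such as scikit-learn's `IsolationForest` would normally be used.

```python
import numpy as np


def _c(n):
    """Average path length of an unsuccessful BST search over n points (normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def _build_tree(X, depth, max_depth, rng):
    n = len(X)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    lo, hi = X.min(axis=0), X.max(axis=0)
    splittable = np.flatnonzero(hi > lo)
    if splittable.size == 0:              # all remaining points are identical
        return ("leaf", n)
    q = rng.choice(splittable)            # random feature
    p = rng.uniform(lo[q], hi[q])         # random split value within its range
    left = X[:, q] < p
    return ("node", q, p,
            _build_tree(X[left], depth + 1, max_depth, rng),
            _build_tree(X[~left], depth + 1, max_depth, rng))


def _path_length(x, node, depth=0):
    if node[0] == "leaf":
        # Leaves that still hold several points get the expected remaining depth c(size).
        return depth + _c(node[1])
    _, q, p, left, right = node
    return _path_length(x, left if x[q] < p else right, depth + 1)


def isolation_scores(X, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per row of X; values close to 1 suggest anomalies."""
    X = np.asarray(X, dtype=float)
    psi = min(sample_size, len(X))
    if psi < 2:
        raise ValueError("need at least two samples")
    rng = np.random.default_rng(seed)
    max_depth = int(np.ceil(np.log2(psi)))   # height limit of each tree
    trees = [
        _build_tree(X[rng.choice(len(X), size=psi, replace=False)], 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    mean_depth = np.array([np.mean([_path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_depth / _c(psi))


# Usage: 200 Gaussian points plus one far-away point.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0]]])
scores = isolation_scores(X)
print(scores.argmax())   # index of the most anomalous row
```

Thresholding the scores, or keeping a fixed fraction of the highest-scoring rows, turns the score into an anomaly flag.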
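For the imputation and interpolation entry at the top of this list, two common baselines are forward fill and time-aware linear interpolation. The following NumPy sketch is illustrative only and is not taken from the post; the function names are hypothetical. The interpolation uses the timestamps, so unevenly spaced observations are handled correctly. Timestamps are assumed to be sorted. At the edges, `np.interp` holds the first or last observed value.

```python
import numpy as np


def forward_fill(y):
    """Replace each NaN with the last observed value (leading NaNs stay NaN)."""
    out = np.asarray(y, dtype=float).copy()
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out


def interpolate_gaps(t, y):
    """Fill NaNs in y by linear interpolation over the (sorted) timestamps t.

    Gaps before the first or after the last observation take the nearest
    observed value, which is np.interp's default edge behavior.
    """
    t = np.asarray(t, dtype=float)
    out = np.asarray(y, dtype=float).copy()
    missing = np.isnan(out)
    if missing.all():
        raise ValueError("need at least one observed value")
    out[missing] = np.interp(t[missing], t[~missing], out[~missing])
    return out


# Hourly readings with uneven spacing and two gaps.
t = np.array([0, 1, 2, 5, 6])
y = np.array([10.0, np.nan, 12.0, np.nan, 16.0])
print(forward_fill(y))          # -> [10. 10. 12. 12. 16.]
print(interpolate_gaps(t, y))   # -> [10. 11. 12. 15. 16.]
```

A simple way to compare such methods is to hide some known values, impute them, and measure the error against the hidden truth.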