Tag: cluster

  • Malware Detection & Interpretation – PCA, T-SNE & ML

    This post discusses the application of PCA, T-SNE, and supervised ML algorithms for malware detection using a benchmark dataset. Techniques such as Logistic Regression, SVC, KNN, and XGBoost are implemented, achieving high performance metrics. Results show potential for improving malware detection using ML while reducing false positives and enhancing cyber defense.

  • Weather Forecasting & Flood De-Risking using Machine Learning, Markov Chain & Geospatial Plotly EDA

    Photo by Pok Rie. Table of contents: U.S.A. Weather Forecast, Australian Rainfall Prediction, Kerala Flood Prediction. Squares are categorical associations (uncertainty coefficient and correlation ratio) from 0 to 1. The uncertainty coefficient is asymmetrical (i.e., row label values indicate how much information they provide to each label at the top). Circles are the symmetrical numerical…

  • Anomaly Detection using the Isolation Forest Algorithm

    The post describes the application of Isolation Forest, an unsupervised anomaly detection algorithm, to identify abnormal patterns in financial and taxi ride data. The challenge is to accurately distinguish normal and abnormal data points for fraud detection, fault diagnosis, and outlier identification. Using real-world datasets of financial transactions and NYC taxi rides, the algorithm successfully…

  • Returns-Volatility Domain K-Means Clustering and LSTM Anomaly Detection of S&P 500 Stocks

    This study aims to implement and evaluate the K-means algorithm for ranking/clustering S&P 500 stocks based on average annualized return and volatility. The second goal is to detect anomalies in the best performing S&P 500 stocks using the Isolation Forest algorithm. Additionally, anomalies in the S&P 500 historical stock price time series data will be…

  • Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%

    In 2023, the global card industry is projected to suffer $36.13 billion in fraud losses. This has necessitated a priority focus on enhancing credit card fraud detection by banks and financial organizations. AI-based techniques are making fraud detection easier and more accurate, with models able to recognize unusual transactions and fraud. The post discusses a…

  • Unsupervised ML, K-Means Clustering & Customer Segmentation

    Table of contents: Motivation, Methods, Open-Source Datasets. This file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online retail is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion…

  • Effective 2D Image Compression with K-means Clustering

    The post explores the application of K-means clustering, a popular unsupervised machine learning algorithm, to image compression. By grouping the pixels of a 2D image into a small number of clusters, the algorithm reduces storage space while keeping the image resolution and most of the visual quality. It also demonstrates the application of this approach through a case study, where optimal results were…

  • Dabl Auto EDA-ML

    Dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…

  • K-means Cluster Cohort E-Commerce

    K-means Clusters – Cohort Analysis applied to E-Commerce. Understanding who your customers are and what they want is a fundamental part of any successful business. It can become increasingly challenging to create a one-size-fits-all customer profile. This is where the concept of cluster-based cohort analysis comes in.