Tag: cluster

Malware Detection & Interpretation – PCA, T-SNE & ML

This post discusses the application of PCA, t-SNE, and supervised ML algorithms for malware detection using a benchmark dataset. Techniques such as Logistic Regression, SVC, KNN, and XGBoost are implemented, achieving high performance metrics. Results show potential for improving malware detection using ML while reducing false positives and enhancing cyber defense.

22nd Feb 2024
Weather Forecasting & Flood De-Risking using Machine Learning, Markov Chain & Geospatial Plotly EDA

Photo by Pok Rie. Contents: U.S.A. Weather Forecast, Australian Rainfall Prediction, Kerala Flood Prediction. Squares are categorical associations (uncertainty coefficient and correlation ratio) from 0 to 1. The uncertainty coefficient is asymmetrical (i.e. the row label values indicate how much information they provide to each label at the top). Circles are the symmetrical numerical…

26th Nov 2023
Anomaly Detection using the Isolation Forest Algorithm

The post describes the application of Isolation Forest, an unsupervised anomaly detection algorithm, to identify abnormal patterns in financial and taxi ride data. The challenge is to accurately distinguish normal and abnormal data points for fraud detection, fault diagnosis, and outlier identification. Using real-world datasets of financial transactions and NYC taxi rides, the algorithm successfully…

29th Oct 2023
Returns-Volatility Domain K-Means Clustering and LSTM Anomaly Detection of S&P 500 Stocks

This study aims to implement and evaluate the K-means algorithm for ranking/clustering S&P 500 stocks based on average annualized return and volatility. The second goal is to detect anomalies in the best-performing S&P 500 stocks using the Isolation Forest algorithm. Additionally, anomalies in the S&P 500 historical stock price time series data will be…

26th Oct 2023
Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%

In 2023, the global card industry is projected to suffer $36.13 billion in fraud losses. This has necessitated a priority focus on enhancing credit card fraud detection by banks and financial organizations. AI-based techniques are making fraud detection easier and more accurate, with models able to recognize unusual transactions and fraud. The post discusses a…

27th May 2023
Unsupervised ML, K-Means Clustering & Customer Segmentation

Contents: Motivation, Methods, Open-Source Datasets. One dataset file contains basic information about the customers (ID, age, gender, income, and spending score). Online Retail is a transnational data set containing all transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based, registered non-store online retailer. The company mainly sells unique all-occasion…

22nd May 2023
Effective 2D Image Compression with K-means Clustering

The post explores the application of the K-means clustering algorithm, a popular unsupervised Machine Learning algorithm, for image compression. By segmenting 2D images into different clusters, the algorithm effectively reduces storage space without compromising on image quality or resolution. It also demonstrates the application of this approach through a case study, where optimal results were…

29th Apr 2023
Dabl Auto EDA-ML

Dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…

19th Apr 2023
K-means Cluster Cohort E-Commerce

K-means Clusters – Cohort Analysis applied to E-Commerce. Understanding who your customers are and what they want is a fundamental part of any successful business. It can become increasingly challenging to create a one-size-fits-all customer profile. This is where the concept of cluster-based cohort analysis comes in.

1st Jun 2022