Tag: Supervised machine learning

  • Titanic Benchmark Hypothesis Testing in Disaster Risk Management: (Auto)EDA, ML, HPO & SHAP

    This project aims to apply the Titanic benchmark to hypothesis testing in disaster risk management. Using the Titanic dataset on Kaggle, a Machine Learning (ML) analysis was performed to test the statistical significance of the relationship between a person’s death and their passenger class, age, sex, and port of embarkation. The project involved a comprehensive ML pipeline implementation…
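    A significance question like this is commonly answered with a chi-square test of independence on a contingency table, for example passenger class against survival. Below is a minimal pure-Python sketch. It assumes the counts are already tabulated. `chi2_sf` and `chi2_independence` are hypothetical helper names. The sketch omits the Yates continuity correction that `scipy.stats.chi2_contingency` applies to 2×2 tables by default. Continuous variables such as age would need binning first, or a different test.

```python
import math


def chi2_sf(x, dof):
    """P(X > x) for a chi-square distribution with a positive integer dof."""
    if x <= 0:
        return 1.0
    # Closed forms: dof = 2 gives exp(-x/2); dof = 1 gives erfc(sqrt(x/2)).
    if dof % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < dof:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi2_independence(table):
    """Pearson chi-square test of independence on an r x c table of counts.

    Returns (statistic, p_value, dof); no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    if dof < 1:
        raise ValueError("need at least a 2 x 2 table")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, dof), dof
```

    In this setup, `table` holds one row per passenger class and one column per outcome (survived, died). A small p-value is evidence against independence.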

  • Malware Detection & Interpretation – PCA, T-SNE & ML

    This post discusses the application of PCA, t-SNE, and supervised ML algorithms to malware detection using a benchmark dataset. Techniques such as Logistic Regression, SVC, KNN, and XGBoost are implemented and achieve high performance metrics. The results show the potential of ML to improve malware detection while reducing false positives and strengthening cyber defense.
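    PCA, the first technique named above, reduces to a singular value decomposition of the centered feature matrix. Below is a minimal NumPy sketch. It assumes a dense numeric matrix `X` with one row per sample, and `pca` is a hypothetical helper rather than the post's code. scikit-learn's `PCA` class computes the same projection up to the sign of each component. t-SNE, by contrast, is a nonlinear embedding usually taken from `sklearn.manifold.TSNE`.

```python
import numpy as np


def pca(X, n_components):
    """Project rows of X onto the top principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)          # PCA assumes mean-centered features
    # Economy SVD: X_centered = U @ diag(S) @ Vt
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]           # principal axes, one per row
    scores = X_centered @ components.T       # coordinates in the reduced space
    variance = S ** 2 / (X.shape[0] - 1)     # variance along each axis
    return scores, variance[:n_components] / variance.sum()
```

    One common use is to plot the two columns of `scores` colored by class label, which gives a quick 2-D check of how separable malware and benign samples are.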

  • Leveraging Predictive Uncertainties of Time Series Forecasting Models

    Table of Contents: Introduction; Random Simulation Tests; TSLA Stock 43 Days; TSLA Stock 300 Days; Housing in the United States; Industrial Production; Federal Funds Rate Data; S&P 500 Absolute Returns; Number of Airline Passengers – 1. Holt-Winters; Number of Airline Passengers – 2. Prophet; Average Temperature in India; Monthly Sales Data Analysis; QC…

  • Health Insurance Cross Sell Prediction with ML Model Tuning & Validation

    The content discusses the use of AI and Machine Learning (ML) for insurance cross-selling. It covers topics such as data preparation, model training with different algorithms, parameter optimization, and model evaluation. The study showcases the ability of ML models (HGBM, XGBoost, Random Forest) to predict cross-sell customers in the insurance sector, providing potential for improved…
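    Model tuning and validation of this kind usually rest on k-fold cross-validation. Below is a NumPy sketch of stratified folds for a classification target `y`. Stratification keeps each fold's class mix close to the overall one. It is an assumption here, chosen because cross-sell prediction is a classification task. `stratified_kfold_indices` is a hypothetical helper, and scikit-learn's `StratifiedKFold` is the library equivalent.

```python
import numpy as np


def stratified_kfold_indices(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_splits)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        # Split this class's shuffled indices into n_splits near-equal chunks,
        # one chunk per fold
        for k, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[k].extend(chunk.tolist())
    all_idx = np.arange(len(y))
    for k in range(n_splits):
        test_idx = np.sort(np.array(folds[k], dtype=int))
        train_idx = np.setdiff1d(all_idx, test_idx)  # everything not in this fold
        yield train_idx, test_idx
```

    For hyperparameter search, each candidate setting is trained on `train_idx`, scored on `test_idx`, and ranked by its mean score across folds.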

  • Wind Energy ML Prediction & Turbine Power Control

    This text presents a detailed project on modeling the power curve of a wind turbine, which is crucial in wind energy management and forecasting. By using machine learning techniques such as Random Forest and Gradient Boosting regressors, and validating with real-world SCADA data from a Turkish wind farm, the project shows it’s possible to create…

  • Robust Fake News Detection: NLP Algorithms for Deep Learning and Supervised ML in Python

    The project aims to set up a robust system for fake news detection using Python. The system adopts a hybrid framework, leveraging Natural Language Processing (NLP) techniques to classify text-based news as fake or real. Through exploratory data analysis, multi-model training, testing, validation, and a comparison of performance metrics, it assesses different Deep Learning, Supervised Machine Learning, and…
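    Text classifiers like these typically start by turning each article into a numeric vector. Below is a minimal pure-Python TF-IDF sketch. It assumes simple lowercase whitespace tokenization, which differs from scikit-learn's regex tokenizer. It uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2-normalized rows, the defaults of `TfidfVectorizer`. `tfidf` is a hypothetical helper, not necessarily the preprocessing the post used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalized TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)                      # raw term frequency
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        rows.append([v / norm for v in vec])
    return vocab, rows
```

    Terms that appear in many articles get a smaller weight than rare, distinctive ones. The resulting rows can be fed to any of the supervised classifiers being compared.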

  • Supervised ML Room Occupancy IoT

    The article presents a study on applying machine learning (ML) to IoT sensor data for workspace occupancy detection. Comparing 14 popular scikit-learn classifiers trained on the gathered IoT sensor data, the study predicts room occupancy with high certainty. The results suggest that temperature and light are the significant factors affecting occupancy detection. The study…

  • ML Prediction of High/Low Video Game Hits with Data Resampling and Model Tuning

    The post outlines an ML-based approach to forecasting video game sales, using several techniques to enhance training, accuracy, and prediction. Kaggle’s VGChartz dataset, containing sales data and other game-specific information, was used to build and refine the model. Several ML techniques, including RandomForestClassifier and Logistic Regression, identified the top predictors, with the critic’s score deemed…
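    Data resampling, one of the techniques in the title, can be as simple as random oversampling: randomly chosen rows of the smaller classes are duplicated until the classes are balanced. Below is a NumPy sketch that assumes a label vector `y`. The post may have used a different scheme; imbalanced-learn's `RandomOverSampler` and SMOTE are common library choices. `random_oversample` is a hypothetical helper. Resample only the training split, so that duplicated rows never leak into evaluation.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate rows of the smaller classes until every class matches the largest one."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start from all original rows
    for label, count in zip(labels, counts):
        if count < target:
            idx = np.flatnonzero(y == label)
            # Sample extra rows of this class with replacement
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return X[order], y[order]
```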

  • Customer Reviews NLP Spacy Analysis and ML/AI Demand Forecasting of the Steam PC Video Game Service

    Steam, a leading digital distribution platform for PC gaming, has seen over 6000 new games released in 2022, averaging over 34 games each day. This post conducts a comprehensive NLP sentiment analysis of customer reviews and ML/AI demand forecasting using public-domain datasets. It covers EDA, spaCy NLP analysis, the ML/AI pipeline, model validation, word clouds, and…

  • Comparison of 20 ML + NLP Algorithms for SMS Spam-Ham Binary Classification

    This post analyzes a public-domain SMS text message dataset to compare various machine learning algorithms’ abilities to classify spam and ham messages. After implementing a Python workflow that includes data preparation, exploratory analysis, natural language processing, supervised machine learning binary classification, and a model performance analysis, the author finds that MLP, Logistic Regression CV, Linear…

  • Early Heart Attack Prediction using ECG Autoencoder and 19 ML/AI Models with Test Performance QC Comparisons

    ECG Autoencoder

    Let’s set the working directory YOURPATH:

        import os
        os.chdir('YOURPATH')  # replace YOURPATH with the folder that holds ecg.csv
        os.getcwd()           # confirm the current working directory

    and import the following libraries:

        import tensorflow as tf
        import matplotlib.pyplot as plt
        import numpy as np
        import pandas as pd
        from tensorflow.keras import layers, losses
        from sklearn.model_selection import train_test_split
        from tensorflow.keras.models import Model

    Let’s read the input dataset:

        df = pd.read_csv('ecg.csv', header=None)  # header=None: the first row is data, not column names

    Let’s…

  • Working with FRED API in Python: U.S. Recession Forecast & Beyond

    FRED (Federal Reserve Economic Data) provides over 267,000 economic time series from 80 sources through its API, offering a wealth of data to promote economic education and research. It encompasses U.S. economic and financial data, including interest rates, monetary indicators, exchange rates, and regional economic data. Additionally, we analyzed correlations, trained currency exchange prediction models,…
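    For readers who want to call the API directly, here is a standard-library sketch. It assumes the public `fred/series/observations` endpoint with its `series_id`, `api_key`, `file_type`, `observation_start`, and `observation_end` query parameters, plus a free API key from the FRED website. `T10Y2Y` (the 10-year minus 2-year Treasury spread, a widely watched recession indicator) is only an example series, and the helper names are hypothetical. The `fredapi` package on PyPI wraps the same service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id, api_key, start=None, end=None):
    """Build a FRED series/observations request URL that returns JSON."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start  # YYYY-MM-DD
    if end:
        params["observation_end"] = end
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload):
    """Turn the JSON body into (date, value) pairs; FRED marks missing values as '.'."""
    data = json.loads(payload)
    return [
        (obs["date"], float(obs["value"]))
        for obs in data["observations"]
        if obs["value"] != "."
    ]


def fetch_series(series_id, api_key, **kwargs):
    """Download one series (needs network access and a valid API key)."""
    with urlopen(observations_url(series_id, api_key, **kwargs)) as resp:
        return parse_observations(resp.read().decode("utf-8"))
```

    Keeping URL building and parsing separate from the network call makes both easy to check offline. An example call is `fetch_series("T10Y2Y", "YOUR_KEY", start="2000-01-01")`.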

  • Dabl Auto EDA-ML

    dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…

  • A Closer Look at the Azure Cloud Portfolio – 1. Essentials

    The article presents an extensive overview of Microsoft Azure services in comparison with Amazon Web Services (AWS) and Google Cloud Platform (GCP). It reports that Azure’s 2021 cloud revenue outperformed AWS and GCP combined and that nearly 80% of Fortune 500 companies are Azure clients. The piece elaborates on Azure’s cloud concepts, Azure Synapse SQL Pool,…

  • Overview of AWS Tech Portfolio 2023

    This summary focuses on the extensive capabilities of Amazon Web Services (AWS) by 2023, highlighting its 27% year-on-year growth and a net sales increase to $127.1 billion. AWS emerges as the top cloud service provider, offering over 200 services including compute, storage, databases, networking, AI, and machine learning. It is constantly expanding operations, having opened…

  • Gold Price Linear Regression

    This content focuses on predicting gold prices using machine learning algorithms in Python. With an R² score of 0.80 and a Sharpe ratio of 2.33, it suggests a potential 8% return on an investment starting in December 2022. The forecasted next-day price for SPDR Gold Trust Shares is $185.136, aligning with Barchart’s “100% BUY” signal.
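    Both headline metrics have short definitions. R² = 1 - SS_res / SS_tot measures how much of the price variance the regression explains. The Sharpe ratio divides the mean excess return by its standard deviation. The NumPy sketch below rests on several assumptions: a one-feature least-squares line, daily returns, a zero risk-free rate, and annualization by the square root of 252 trading days. Those choices and the helper names are assumptions, and the code does not reproduce the post's figures.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares for y ≈ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)  # highest power first
    return slope, intercept


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return divided by its standard deviation."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
```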

  • About Face Recognition ML Algorithms

    Facial recognition (FR) involves mathematically mapping an individual’s facial features and storing the data as a faceprint. This case study outlines the process of exploratory data analysis (EDA) and performance QC analysis for ML/AI workflows using public-domain datasets and a real-time webcam GUI. The study includes the use of SVM for FR, dataset splitting, ML model…

  • Comparative ML/AI Performance Analysis of 13 Handwritten Digit Recognition (HDR) Scikit-Learn Algorithms with PCA+HPO

    The article consists of the following three parts: 3. Unsupervised ML using Principal Component Analysis (PCA) for dimensionality reduction within Parts 1 and 2. Our main goal is to build a text and graphics report comparing the main scikit-learn classification metrics: accuracy_score, classification_report (precision, recall, and…
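    For reference, the per-class numbers in `classification_report` follow from counts of true positives (TP), false positives (FP), and false negatives (FN). Precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. `accuracy_score` is simply the fraction of correct predictions. Below is a pure-Python sketch for illustration; the helper names are hypothetical. Undefined ratios are set to 0, the value scikit-learn also falls back to by default (with a warning).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true, y_pred):
    """Map each label to (precision, recall, f1, support)."""
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)     # TP + FN per class
    predicted = Counter(y_pred)   # TP + FP per class
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, support[label])
    return report
```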

  • ML/AI Image Classifier for Skin Cancer Detection

    Skin cancer is one of the most prevalent types of cancer in the present decade. As the skin is the body’s largest organ, it is understandable that skin cancer is considered the most common type of cancer among humans. It is generally classified into two major categories: nonmelanoma and melanoma skin cancer…

  • Supervised Machine Learning Use Case: Prediction of House Prices

    This post applies supervised machine learning to real estate. The goal is to predict sale prices ($) for N selected properties in a state (N ≫ 1000). We are given a CSV dataset as an N×M table, where M is the number of property features describing every aspect of the house and its surroundings (typically, M < 100). …
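    In this setup, the data are an N×M feature matrix `X` and a length-N price vector `y`. Below is a baseline sketch in NumPy, not the post's model. It assumes every feature is already numeric with no missing values. It holds out a test split, fits ordinary least squares with an intercept, and reports RMSE in the same units as the price. All helper names are hypothetical.

```python
import numpy as np


def train_test_split_idx(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) index arrays."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def fit_linear(X, y):
    """Least-squares weights for y ≈ [1, X] @ w (w[0] is the intercept)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    """Apply the fitted intercept and feature weights."""
    return w[0] + X @ w[1:]


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (here, dollars)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

    A baseline like this gives a reference test RMSE that more flexible models should beat.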