Tag: Supervised machine learning

Titanic Benchmark Hypothesis Testing in Disaster Risk Management: (Auto)EDA, ML, HPO & SHAP

This project aims to apply the Titanic benchmark to hypothesis testing in disaster risk management. Using the Kaggle Titanic dataset, a Machine Learning (ML) analysis was performed to determine the statistical significance of the relationship between a passenger’s death and their passenger class, age, sex, and port of embarkation. The project involved a comprehensive ML pipeline implementation…

29th Mar 2024
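As a rough illustration of the kind of hypothesis test the Titanic entry above describes, the sketch below computes a chi-square statistic for independence between passenger class and survival. The choice of a chi-square test, the function name `chi_square_independence`, and the counts in the usage lines are assumptions made for illustration; none of them are taken from the post or from the Kaggle data.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per
    passenger class and one column per outcome (survived, died).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if class and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Made-up counts (NOT Titanic data): rows are classes 1-3,
    # columns are (survived, died).
    toy_counts = [[30, 10], [20, 20], [10, 30]]
    stat, dof = chi_square_independence(toy_counts)
    print(stat, dof)
```

With 2 degrees of freedom, a statistic above 5.991 would reject independence at the 5% level, so a table like the toy one would indicate a significant association between class and survival.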
Malware Detection & Interpretation – PCA, T-SNE & ML

This post discusses the application of PCA, T-SNE, and supervised ML algorithms for malware detection using a benchmark dataset. Techniques such as Logistic Regression, SVC, KNN, and XGBoost are implemented, achieving high performance metrics. Results show potential for improving malware detection using ML while reducing false positives and enhancing cyber defense.

22nd Feb 2024
Leveraging Predictive Uncertainties of Time Series Forecasting Models

Featured Image via Canva. Table of Contents: Introduction; Random Simulation Tests; TSLA Stock 43 Days; TSLA Stock 300 Days; Housing in the United States; Industrial Production; Federal Funds Rate Data; S&P 500 Absolute Returns; Number of Airline Passengers - 1. Holt-Winters; Number of Airline Passengers - 2. Prophet; Average Temperature in India; Monthly Sales Data Analysis QC…

5th Jan 2024
Health Insurance Cross Sell Prediction with ML Model Tuning & Validation

The content discusses the use of AI and Machine Learning (ML) for insurance cross-selling. It covers topics such as data preparation, model training with different algorithms, parameter optimization, and model evaluation. The study showcases the ability of ML models (HGBM, XGBoost, Random Forest) to predict cross-sell customers in the insurance sector, providing potential for improved…

2nd Dec 2023
Wind Energy ML Prediction & Turbine Power Control

This text presents a detailed project on modeling the power curve of a wind turbine, which is crucial in wind energy management and forecasting. By using machine learning techniques such as Random Forest and Gradient Boosting Regressors, and validating with real-world SCADA data from a Turkish wind farm, the project shows it’s possible to create…

19th Sep 2023
Robust Fake News Detection: NLP Algorithms for Deep Learning and Supervised ML in Python

The project aims at setting up a robust system for fake news detection using Python. The system adopts a hybrid framework, leveraging Natural Language Processing (NLP) techniques to classify text-based fake vs real news. Involving exploratory data analysis, multi-model training, testing, validation, and performance metrics comparison, it assesses different Deep Learning, Supervised Machine Learning, and…

15th Aug 2023
Supervised ML Room Occupancy IoT

The article presents a study on applying machine learning (ML) to IoT sensor data for workspace occupancy detection. The study compares 14 popular scikit-learn classifiers, each trained on the gathered IoT sensor data to predict room occupancy with high certainty. The results suggest temperature and light are the significant factors affecting occupancy detection. The study…

10th Aug 2023
ML Prediction of High/Low Video Game Hits with Data Resampling and Model Tuning

The post outlines an ML-based approach to forecasting video game sales, using several techniques to enhance training, accuracy, and prediction. Kaggle’s VGChartz dataset, containing sales data and other game-specific information, was used to build and refine the model. Several ML techniques including RandomForestClassifier and Logistic Regression yielded top predictors, with the critic’s score deemed…

21st Jun 2023
Customer Reviews NLP Spacy Analysis and ML/AI Demand Forecasting of the Steam PC Video Game Service

Steam, a leading digital distribution platform for PC gaming, has seen over 6000 new games released in 2022, averaging over 34 games each day. This post aims to conduct comprehensive customer reviews NLP sentiment analysis and ML/AI demand forecasting using public-domain datasets. It covers EDA, NLP Spacy analysis, ML/AI pipeline, model validation, word clouds, and…

17th Jun 2023
Comparison of 20 ML + NLP Algorithms for SMS Spam-Ham Binary Classification

This post analyzes a public-domain SMS text message dataset to compare various machine learning algorithms’ abilities to classify spam and ham messages. After implementing a Python workflow that includes data preparation, exploratory analysis, natural language processing, supervised machine learning binary classification, and a model performance analysis, the author finds that MLP, Logistic Regression CV, Linear…

8th Jun 2023
Early Heart Attack Prediction using ECG Autoencoder and 19 ML/AI Models with Test Performance QC Comparisons

ECG Autoencoder

Let’s set the working directory YOURPATH

    import os
    os.chdir('YOURPATH')
    os.getcwd()

and import the following libraries

    import tensorflow as tf
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from tensorflow.keras import layers, losses
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.models import Model

Let’s read the input dataset

    df = pd.read_csv('ecg.csv', header=None)

Let’s…

8th May 2023
Working with FRED API in Python: U.S. Recession Forecast & Beyond

The FRED (Federal Reserve Economic Data) API provides over 267,000 economic time series from 80 sources, offering a wealth of data to promote economic education and research. It encompasses U.S. economic and financial data, including interest rates, monetary indicators, exchange rates, and regional economic data. Additionally, we analyzed correlations, trained currency exchange prediction models,…

20th Apr 2023
Dabl Auto EDA-ML

Dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…

19th Apr 2023
A Closer Look at the Azure Cloud Portfolio – 1. Essentials

The article presents an extensive overview of Microsoft Azure services in comparison with Amazon Web Services (AWS) and Google Cloud Platform (GCP). It reports that Azure’s cloud revenue for 2021 outperformed AWS and GCP combined, and that nearly 80% of Fortune 500 companies are among its clients. The piece elaborates on Azure’s cloud concepts, Azure Synapse SQL Pool,…

16th Apr 2023
Overview of AWS Tech Portfolio 2023

This summary focuses on the extensive capabilities of Amazon Web Services (AWS) by 2023, highlighting its 27% year-on-year growth and a net sales increase to $127.1 billion. AWS emerges as the top cloud service provider, offering over 200 services including compute, storage, databases, networking, AI, and machine learning. It is constantly expanding operations, having opened…

26th Mar 2023
Gold Price Linear Regression

This content focuses on predicting gold prices using machine learning algorithms in Python. With an 80% R² score and a Sharpe ratio of 2.33, it suggests a potential 8% return on an investment starting in December 2022. The forecasted next-day price for SPDR Gold Trust Shares is $185.136, aligning with Barchart’s “100% BUY” signal.

24th Mar 2023
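The two figures quoted in the gold price entry above, an R² score and a Sharpe ratio, can be computed as in the minimal sketch below. The function names, the zero risk-free rate, and the 252-trading-day annualization are assumptions chosen for illustration; the post may compute these quantities differently.

```python
import math
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def sharpe_ratio(period_returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    Assumes `risk_free` is expressed per period and that the returns
    are not all identical (otherwise the standard deviation is zero).
    """
    excess = [r - risk_free for r in period_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Made-up numbers, not taken from the post.
    print(r2_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
    print(sharpe_ratio([0.01, -0.005, 0.007, 0.002]))
```

An R² of 1 means the predictions match the observed prices exactly, while an R² of 0 means the model does no better than always predicting the mean price.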
About Face Recognition ML Algorithms

Facial Recognition (FR) involves mapping an individual’s facial features mathematically and storing the data as a faceprint. This case study outlines the process of Exploratory Data Analysis (EDA) and performance QC analysis for ML/AI workflows using public-domain datasets and a real-time webcam GUI. The study includes the use of SVM for FR, dataset splitting, ML model…

8th Mar 2023
Comparative ML/AI Performance Analysis of 13 Handwritten Digit Recognition (HDR) Scikit-Learn Algorithms with PCA+HPO

Featured Photo by Torsten Dettlaff on Pexels. The article consists of the following three parts: 3. Unsupervised ML using Principal Component Analysis (PCA) for dimensionality reduction within Parts 1 and 2. Our main goal is to build a text and graphics report comparing the main scikit-learn classification metrics: accuracy_score, classification_report (precision, recall, and…

4th Feb 2023
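To make the metrics named in the handwritten digit recognition entry above concrete, the sketch below shows what accuracy, precision, recall, and F1 measure for a single class. It is a plain-Python illustration with hypothetical function names and made-up labels, not the scikit-learn implementation and not the post's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall, and F1 for the given label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up digit labels for illustration only.
    y_true = [3, 8, 3, 3, 8, 3]
    y_pred = [3, 8, 8, 3, 3, 3]
    print(accuracy(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred, label=3))
```

Precision answers "of the samples predicted as this digit, how many were right", while recall answers "of the samples that truly are this digit, how many were found".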
ML/AI Image Classifier for Skin Cancer Detection

Skin cancer is one of the most prevalent types of cancer in the present decade. Because the skin is the body’s largest organ, it is understandable that skin cancer is considered the most common type of cancer among humans. It is generally classified into two major categories: nonmelanoma (benign) and melanoma (malignant) skin cancer…

30th Apr 2022
Supervised Machine Learning Use Case: Prediction of House Prices

This is an application of supervised machine learning to real estate. The goal is to predict sale prices ($) for N selected properties in a state (N >> 1000). We are given a CSV dataset as an N×M table, where M is the number of property features describing every aspect of the house and its surroundings (typically, M < 100). …

10th Feb 2022