Category: Machine Learning
-
Exploratory Data Analysis (EDA) and NLP Visualization of Restaurant Guest Reviews on Tripadvisor – Eating Out in Europe
Photo by Pablo Merchán Montes on Unsplash. Tripadvisor, Inc. is an American internet-based travel company headquartered in Needham, MA. Its branded sites and forums operate as online travel guides offering free user-generated reviews of travel-related content, tools for price comparisons, and online travel booking services. As of 2022, Tripadvisor’s total number of user reviews and ratings reached…
-
Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%
Photo by CardMapr.nl on Unsplash
Data Preparation & Exploratory Analysis
Let’s set the working directory
import os
os.chdir('YOURPATH')
os.getcwd()
and import the necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style("whitegrid")
Let’s load the dataset from the csv file using Pandas
data =…
-
Unsupervised ML Clustering, Customer Segmentation, Cohort, Market Basket, Bank Churn, CRM, ABC & RFM Analysis – A Comprehensive Guide in Python
Motivation. Methods. Open-Source Datasets: This file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered non-store online retailer. The company mainly sells unique all-occasion…
-
Early Heart Attack Prediction using ECG Autoencoder and 19 ML/AI Models with Test Performance QC Comparisons
ECG Autoencoder
Let’s set the working directory YOURPATH
import os
os.chdir('YOURPATH')
os.getcwd()
and import the following libraries
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tensorflow.keras import layers, losses
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
Let’s read the input dataset
df = pd.read_csv('ecg.csv', header=None)
Let’s…
-
Risk-Aware Strategies for DCA Investors
Let’s look at the Dollar-Cost Averaging (DCA) investment approach, which involves investing the same amount of money in a target security at regular intervals over a certain period of time, regardless of price. It can make it easier to deal with uncertain markets by making purchases automatic. It also supports an investor’s effort to invest…
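As a minimal sketch of the DCA idea described above (not code from the post itself), the snippet below invests a fixed amount at each interval and compares the resulting average cost per share with the plain average price. The price series and the per-period amount are hypothetical values chosen only for illustration.

```python
# Hypothetical prices of the target security on each purchase date (assumption)
prices = [100.0, 80.0, 125.0, 100.0]
# Hypothetical fixed amount invested at every interval (assumption)
amount_per_period = 100.0

# The same amount buys more shares when the price is low and fewer when it is high
shares_bought = [amount_per_period / price for price in prices]

total_shares = sum(shares_bought)
total_invested = amount_per_period * len(prices)

# Average cost actually paid per share under DCA
average_cost = total_invested / total_shares
# Simple arithmetic mean of the observed prices, for comparison
average_price = sum(prices) / len(prices)

print(f"Total invested: {total_invested:.2f}")
print(f"Total shares:   {total_shares:.4f}")
print(f"Average cost per share: {average_cost:.2f}")
print(f"Average market price:   {average_price:.2f}")
```

Because a fixed amount buys more shares at lower prices, the average cost per share equals the harmonic mean of the purchase prices, which never exceeds their arithmetic mean.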
-
An Interactive GPT Index and DeepLake Interface – 1. Amazon Financial Statements
Let’s set the working directory YOURPATH
import os
os.chdir('YOURPATH')
os.getcwd()
and install the key libraries
!pip install llama-index
!pip install deeplake
Let’s import the libraries
from llama_index import (
    SimpleDirectoryReader,
    GPTDeepLakeIndex,
    GPTSimpleKeywordTableIndex,
    Document,
    LLMPredictor,
    ServiceContext,
    download_loader,
)
from langchain.chat_models import ChatOpenAI
from typing import List, Optional, Tuple
import requests
import tqdm
import os
from pathlib import Path
Let’s define the PDF file reader
PDFReader = download_loader("PDFReader")
loader = PDFReader()…
-
Effective 2D Image Compression with K-means Clustering
Performance Test
Let’s set the working directory YOUR PATH and import the key Python libraries
import os
os.chdir('YOUR PATH')
os.getcwd()
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.io import loadmat
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy import linalg
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)
%matplotlib inline
import seaborn…
-
Dealing with Imbalanced Data in HealthTech ML/AI – 1. Stroke Prediction
Specifically, we will compare (1) the SMOTE-balanced Torch NN (viz. the Cross-Entropy Adam Optimizer) against (2) Sinnott’s Python algorithm from scikit-learn, to be validated by various scikit-learn metrics such as AUC, precision, recall, F-measure, and accuracy. Our Jupyter notebook and the entire Python project will be stored in the working directory…
-
Working with FRED API in Python: U.S. Recession Forecast & Beyond
Featured Photo by Lukas on Pexels. FRED stands for Federal Reserve Economic Data and is a database of economic time series aggregated from many sources. This is a great place to find financial data. You can visit the FRED website to search for a data series or use the Python fredapi to download data…
-
Advanced Integrated Data Visualization (AIDV) in Python – 2. Dabl Auto EDA & ML
First, let’s install dabl
!pip install dabl
and set the working directory DIR
import os
os.chdir('DIR')
os.getcwd()
The Digits Classification Dataset
Let’s run dabl.SimpleClassifier() as follows
import dabl
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
sc = dabl.SimpleClassifier().fit(X_train, y_train)
Running DummyClassifier()
accuracy: 0.106 recall_macro: 0.100…
-
A Closer Look at the Azure Cloud Portfolio – 1. Essentials
Azure Cloud Concepts. Source: 2023 TomTom. Azure packaged software, IaaS, PaaS, and SaaS. Azure Synapse SQL Pool: learn more about Polybase here. Azure DevOps Boards: Capability Maturity Model Integration (CMMI) is a process-level improvement training and appraisal program. Add new items and divide work into time slots called sprints. Learn more about…
-
Using AI/ANN AUC>90% for Early Diagnosis of Cardiovascular Disease (CVD)
Featured Photo by Karolina Grabowska on Pexels.
Data Preparation
Let’s set the working directory HEART23
import os
os.chdir('HEART23')
os.getcwd()
and import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score, precision_score, recall_score
import scikitplot…
-
Overview of AWS Tech Portfolio 2023
This article provides an overview of 50+ Amazon Web Services (AWS) in 2023. AWS is the leading vendor of cloud services and infrastructure, dominating the cloud computing market: Amazon net sales increased by 15% to $127.1 billion in Q3 2022 as compared to $110.8 billion in Q3 2021. AWS segment sales increased by 27% year-over-year to reach…
-
Gold ETF Price Prediction using the Bayesian Ridge Linear Regression
Featured Photo by Pixabay.
Let’s set the working directory GOLD
import os
os.chdir('GOLD')
os.getcwd()
and import the following libraries
from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-darkgrid')
import yfinance as yf
Let’s read the data
Df = yf.download('GLD', '2022-01-01', '2023-03-25', auto_adjust=True)
Df = Df[['Close']]
Df = Df.dropna()
Let’s…
-
90% ACC Diabetes-2 ML Binary Classifier
Featured Photo by Nataliya Vaitkevich on Pexels. Acknowledgements: Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261–265). IEEE Computer Society Press. Diabetes EDA & Prediction | Acc %90.25 & ROC %96.38. The…
-
Applying a Risk-Aware Portfolio Rebalancing Strategy to ETF, Energy, Pharma, and Aerospace/Defense Stocks in 2023
In this post, we will apply Guillen’s asset rebalancing algorithm (cf. the Python code) to the following risk-aware portfolio: stocks = ['SPY', 'XOM', 'ABBV', 'AZN', 'LMT']. The initial portfolio value to be allocated is portfolio_value = 10**6, and the weight allocation per asset is weights = [0.15, 0.30, 0.40, 0.075, 0.075]. Conventionally, our…
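To make the quoted setup concrete, here is a minimal sketch of how the initial portfolio value splits across the five tickers under the stated weights. It covers only the initial dollar allocation and is not Guillen’s rebalancing algorithm; the `allocation` dictionary and the sum-to-one check are illustrative additions, not part of the post.

```python
import math

# Values quoted in the excerpt
stocks = ['SPY', 'XOM', 'ABBV', 'AZN', 'LMT']
portfolio_value = 10**6
weights = [0.15, 0.30, 0.40, 0.075, 0.075]

# Illustrative sanity checks (assumption: weights are meant to be fully invested)
if len(stocks) != len(weights):
    raise ValueError("each ticker needs exactly one weight")
if not math.isclose(sum(weights), 1.0):
    raise ValueError("weights must sum to 1")

# Dollar amount assigned to each ticker at the start
allocation = {ticker: portfolio_value * w for ticker, w in zip(stocks, weights)}

for ticker, dollars in allocation.items():
    print(f"{ticker}: ${dollars:,.0f}")
```

Checking that the weights sum to one before allocating catches a mistyped weight early, before any rebalancing logic runs on top of it.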
-
Performance Analysis of Face Recognition Out-of-Box ML/AI Workflows
Featured Photo by cottonbro studio, Pexels. Facial Recognition (FR) is a category of biometric software that maps an individual’s facial features mathematically and stores the data as a faceprint. The goal of this case study is the Exploratory Data Analysis (EDA) and performance QC analysis of out-of-box ML/AI workflows tested on public-domain datasets and a real-time webcam GUI.…
-
AI-Driven Object Detection & Segmentation with Meta Detectron2 Deep Learning
Method
Computer Resources: Python 3 Google Compute Engine backend (GPU)
Install Detectron2
!python -m pip install pyyaml==5.1
import sys, os, distutils.core
# Note: This is a faster way to install detectron2 in Colab, but it does not include all functionalities.
# See https://detectron2.readthedocs.io/tutorials/install.html for full installation instructions
!git clone 'https://github.com/facebookresearch/detectron2'
dist = distutils.core.run_setup("./detectron2/setup.py")
!python -m pip install {' '.join([f"'{x}'" for x in dist.install_requires])}
sys.path.insert(0, os.path.abspath('./detectron2'))
# Properly install detectron2. (Please do not install twice in both ways)
# !python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
Import Libraries
import torch, detectron2
!nvcc --version
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
torch: 1.13 ;…
-
SARIMAX X-Validation of EIA Crude Oil Prices Forecast in 2023 – 2. Brent
Building on our previous study, today’s focus is on SARIMAX time-series X-validation of the Brent crude oil spot price (USD/b); viz., the goal is to verify the following EIA energy forecast for 2023: according to EIA, the Brent spot price will average $83.63/b in 2023. Prerequisites: In this study we will be…