Category: Data-Driven Tech
-
Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%
Photo by CardMapr.nl on Unsplash. Table of Contents. Data Preparation & Exploratory Analysis. Let’s set the working directory: import os; os.chdir('YOURPATH'); os.getcwd() and import the necessary packages: import pandas as pd; import numpy as np; import matplotlib.pyplot as plt; import seaborn as sns; %matplotlib inline; sns.set_style("whitegrid"). Let’s load the dataset from the CSV file using Pandas: data =…
-
Unsupervised ML Clustering, Customer Segmentation, Cohort, Market Basket, Bank Churn, CRM, ABC & RFM Analysis – A Comprehensive Guide in Python
Table of Contents. Motivation. Methods. Open-Source Datasets. This file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion…
-
Early Heart Attack Prediction using ECG Autoencoder and 19 ML/AI Models with Test Performance QC Comparisons
Table of Contents. ECG Autoencoder. Let’s set the working directory YOURPATH: import os; os.chdir('YOURPATH'); os.getcwd() and import the following libraries: import tensorflow as tf; import matplotlib.pyplot as plt; import numpy as np; import pandas as pd; from tensorflow.keras import layers, losses; from sklearn.model_selection import train_test_split; from tensorflow.keras.models import Model. Let’s read the input dataset: df = pd.read_csv('ecg.csv', header=None). Let’s…
-
Risk-Aware Strategies for DCA Investors
Let’s look at the Dollar-Cost Averaging (DCA) investment approach, which involves investing the same amount of money in a target security at regular intervals over a certain period of time, regardless of price. It can make it easier to deal with uncertain markets by making purchases automatic. It also supports an investor’s effort to invest…
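The fixed-amount rule described in this excerpt can be sketched in a few lines of Python. This is only an illustrative sketch: the function name dollar_cost_average, the price list, and the 100-per-period amount are hypothetical and do not come from the original post. It shows that buying a fixed amount at every price yields an average cost per share at or below the plain average of the prices, because more shares are bought when the price is low.

```python
def dollar_cost_average(prices, amount_per_period):
    """Invest the same amount at every price.

    Returns (total_shares, average_cost_per_share).
    Hypothetical helper for illustration; not taken from the original post.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    if amount_per_period <= 0:
        raise ValueError("amount_per_period must be positive")

    total_shares = 0.0
    total_invested = 0.0
    for price in prices:
        if price <= 0:
            raise ValueError("prices must be positive")
        # A fixed amount buys more shares when the price is low.
        total_shares += amount_per_period / price
        total_invested += amount_per_period

    return total_shares, total_invested / total_shares


# Made-up prices for four purchase dates (assumption, illustration only).
prices = [100.0, 80.0, 125.0, 100.0]
shares, avg_cost = dollar_cost_average(prices, 100.0)
mean_price = sum(prices) / len(prices)

print(f"Shares bought: {shares:.2f}")
print(f"Average cost per share: {avg_cost:.2f}")
print(f"Simple average price: {mean_price:.2f}")
```

The average cost here is the harmonic mean of the purchase prices, which is why it never exceeds their arithmetic mean.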
-
A Closer Look at the Azure Cloud Portfolio – 3. Azure DevOps Boards
1. Getting Started with AB. You need an MS account to start AB. Choose the Start free option. Choose the Public option. Click Advanced: Git version control (there could be code versions or file management) and the Basic work item process. The first/second version control option is distributed/centralized. Project management process: Agile, Basic, CMMI, Scrum. Capability Maturity…
-
An Interactive GPT Index and DeepLake Interface – 1. Amazon Financial Statements
Let’s set the working directory YOURPATH: import os; os.chdir('YOURPATH'); os.getcwd() and install the key libraries: !pip install llama-index; !pip install deeplake. Let’s import the libraries: from llama_index import (SimpleDirectoryReader, GPTDeepLakeIndex, GPTSimpleKeywordTableIndex, Document, LLMPredictor, ServiceContext, download_loader,); from langchain.chat_models import ChatOpenAI; from typing import List, Optional, Tuple; import requests; import tqdm; import os; from pathlib import Path. Let’s define the PDF file reader: PDFReader = download_loader("PDFReader"); loader = PDFReader()…
-
Effective 2D Image Compression with K-means Clustering
Performance Test. Let’s set the working directory YOUR PATH and import the key Python libraries: import os; os.chdir('YOUR PATH'); os.getcwd(); import pandas as pd; import numpy as np; import matplotlib as mpl; import matplotlib.pyplot as plt; from scipy.io import loadmat; from sklearn.cluster import KMeans; from sklearn.preprocessing import StandardScaler; from scipy import linalg; pd.set_option('display.notebook_repr_html', False); pd.set_option('display.max_columns', None); pd.set_option('display.max_rows', 150); pd.set_option('display.max_seq_items', None); %matplotlib inline; import seaborn…
-
Dealing with Imbalanced Data in HealthTech ML/AI – 1. Stroke Prediction
Specifically, we will compare (1) the SMOTE-balanced Torch NN (viz. cross-entropy loss with the Adam optimizer) against (2) Sinnott’s Python algorithm from scikit-learn, validating both with various scikit-learn metrics such as AUC, precision, recall, F-measure and accuracy. Table of Contents. Our Jupyter notebook and the entire Python project will be stored in the working directory…
-
A Closer Look at the Azure Cloud Portfolio – 2. From VMs to Web Servers
In this post, you’ll read about creating virtual machines (VMs) and deploying your web servers from Azure. Read more here. Courtesy of Mario Ferraro. Prerequisites: An active Azure subscription. If you don’t have an Azure subscription yet, please create an Azure Free Trial before following this guide. Step 1: Create a Resource Group. Step…
-
Working with FRED API in Python: U.S. Recession Forecast & Beyond
Featured Photo by Lukas on Pexels. FRED stands for Federal Reserve Economic Data, and is a database of time series economic data that has been aggregated from a bunch of sources. This is a great place to find financial data. You can visit the FRED web site to search for a data series or use the Python fredapi to download data…
-
Advanced Integrated Data Visualization (AIDV) in Python – 2. Dabl Auto EDA & ML
Table of Contents. First, let’s install dabl: !pip install dabl and set the working directory DIR: import os; os.chdir('DIR'); os.getcwd(). The Digits Classification Dataset. Let’s run dabl.SimpleClassifier() as follows: import dabl; from sklearn.model_selection import train_test_split; from sklearn.datasets import load_digits; X, y = load_digits(return_X_y=True); X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1); sc = dabl.SimpleClassifier().fit(X_train, y_train). Running DummyClassifier() accuracy: 0.106 recall_macro: 0.100…
-
A Closer Look at the Azure Cloud Portfolio – 1. Essentials
Table of Contents. Azure Cloud Concepts. Source: 2023 TomTom. Azure packaged software, IaaS, PaaS, and SaaS: Azure Synapse SQL Pool. Learn more about Polybase here. Azure DevOps Boards. Capability Maturity Model Integration (CMMI) is a process level improvement training and appraisal program. Add new items and divide work into time slots called sprints. Learn more about…
-
Joint Analysis of Bitcoin, Gold and Crude Oil Prices with Optimized Risk/Return in 2023
Referring to the recent fintech R&D study in Python, let’s discuss joint time-series analysis of Bitcoin (BTC), Gold (GC=F) and Crude Oil (CL=F) prices in 2021-23, followed by Markowitz portfolio optimization of these 3 assets in 2023. Goals. Scope. Input Data. Let’s set the working directory: import os; os.chdir('PORTFOLIORISK'); os.getcwd() and import the following…
-
Video Game Sales Data Visualization, Wrangling and Market Analysis in Python
Featured Photo by Element5 Digital on Pexels. Specific Questions. Import Modules. Let’s set the working directory: import os; os.chdir('VIDEOGAMES'); os.getcwd() and import the necessary modules/libraries: import pandas as pd; import numpy as np; import matplotlib.pyplot as plt; import seaborn as sns; %matplotlib inline; sns.set_style('darkgrid'). Input Dataset. Let’s read the dataset: df = pd.read_csv('vgsales.csv'); df.head(). Dataset shape: df.shape (16598, 11). Dataset type: df.dtypes Rank int64…
-
Advanced Integrated Data Visualization (AIDV) in Python – 1. Stock Technical Indicators
Featured Photo by Monstera on Pexels. In this project, we will implement the following Technical Indicators in Python. Conventionally, we will look at the following three main groups of technical indicators. Input Stock Data. Let’s set the working directory VIZ: import os; os.chdir('VIZ'); os.getcwd() and import the key libraries: import datetime as dt; import pandas as pd; import…
-
Using AI/ANN AUC>90% for Early Diagnosis of Cardiovascular Disease (CVD)
Featured Photo by Karolina Grabowska on Pexels. Data Preparation. Let’s set the working directory HEART23: import os; os.chdir('HEART23'); os.getcwd() and import the libraries: import pandas as pd; import numpy as np; import matplotlib.pyplot as plt; import seaborn as sns; sns.set(); from scipy.stats import skew; from sklearn.preprocessing import StandardScaler; from sklearn.model_selection import train_test_split; from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score, precision_score, recall_score; import scikitplot…
-
Overview of AWS Tech Portfolio 2023
This article provides an overview of 50+ Amazon Web Services (AWS) in 2023. AWS is the leading vendor of cloud services and infrastructure, dominating the cloud computing market: Amazon net sales increased by 15% to $127.1 billion in Q3 2022 as compared to $110.8 billion in Q3 2021. AWS segment sales increased by 27% year-over-year to reach…
-
Gold ETF Price Prediction using the Bayesian Ridge Linear Regression
Featured Photo by Pixabay. Let’s set the working directory GOLD: import os; os.chdir('GOLD'); os.getcwd() and import the following libraries: from sklearn.linear_model import LinearRegression; import pandas as pd; import numpy as np; import matplotlib.pyplot as plt; %matplotlib inline; plt.style.use('seaborn-darkgrid'); import yfinance as yf. Let’s read the data: Df = yf.download('GLD', '2022-01-01', '2023-03-25', auto_adjust=True); Df = Df[['Close']]; Df = Df.dropna(). Let’s…