Category: Data-Driven Tech

  • Comparison of 20 ML + NLP Algorithms for SMS Spam-Ham Binary Classification

    This post analyzes a public-domain SMS text message dataset to compare various machine learning algorithms’ abilities to classify spam and ham messages. After implementing a Python workflow that includes data preparation, exploratory analysis, natural language processing, supervised machine learning binary classification, and a model performance analysis, the author finds that MLP, Logistic Regression CV, Linear…

  • A Comparison of Automated EDA Tools in Python: Pandas-Profiling vs SweetViz

    Exploratory Data Analysis (EDA) is an important part of data science projects, designed to identify patterns, anomalies, and relationships. It can employ univariate, bivariate, and multivariate data analytics, and can be accelerated using automated EDA tools. The article discusses Python libraries such as Pandas-Profiling and SweetViz for automating EDA and demonstrates their application to improve…

  • NLP of Restaurant Guest Reviews on Tripadvisor

    This is a comprehensive study examining restaurant reviews on TripAdvisor across 31 major European cities. The research, based on a dataset scraped from TripAdvisor, aims to perform a sentiment analysis of reviews, exploring average ratings per city, vegetarian-friendly cities, and how local cuisine compares to foreign food. The analysis is carried out using Python, demonstrating…

  • Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%

    In 2023, the global card industry is projected to suffer $36.13 billion in fraud losses. This has necessitated a priority focus on enhancing credit card fraud detection by banks and financial organizations. AI-based techniques are making fraud detection easier and more accurate, with models able to recognize unusual transactions and fraud. The post discusses a…

  • Unsupervised ML, K-Means Clustering & Customer Segmentation

    Open-source datasets: one file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational dataset that contains all transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered non-store online retailer. The company mainly sells unique all-occasion…

  • Top Fast-Growing Apps in 2023

    The Okta Businesses at Work report and blogs by Leon Zucchini discuss the fastest-growing and new app categories. Key trends include the growth of collaboration, communication, and travel apps, and the adoption of multi-cloud. Ten notable growing apps are Kandji, Grammarly, Bob, Notion, Prisma Access, Navan, GitLab, Ironclad, Terraform Cloud, and Figma. Emerging apps include…

  • Early Heart Attack Prediction using ECG Autoencoder and 19 ML/AI Models with Test Performance QC Comparisons

    ECG Autoencoder. Let’s set the working directory YOURPATH: import os; os.chdir('YOURPATH'); os.getcwd(). Then import the following libraries: import tensorflow as tf; import matplotlib.pyplot as plt; import numpy as np; import pandas as pd; from tensorflow.keras import layers, losses; from sklearn.model_selection import train_test_split; from tensorflow.keras.models import Model. Let’s read the input dataset: df = pd.read_csv('ecg.csv', header=None). Let’s…

  • Risk-Aware Strategies for DCA Investors

    Dollar-Cost Averaging (DCA) is an investment approach that involves investing a fixed amount regularly, regardless of market price. It offers benefits such as risk reduction and market downturn resilience. It’s useful for beginners and can be combined with other strategies for a disciplined investment approach. References include Investopedia and Yahoo Finance.
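
    The fixed-amount mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the post: the function name `dollar_cost_average`, the price series, and the 200.0 budget are all made-up assumptions. It shows that a fixed amount buys more units when the price is low, so the average cost per unit (the harmonic mean of the prices) never exceeds the simple average price. That is an arithmetic property, not a guarantee of better returns than any other strategy.

```python
def dollar_cost_average(prices, amount_per_period):
    """Invest a fixed amount at each price.

    Returns (units_bought, total_invested, average_cost_per_unit).
    """
    if amount_per_period <= 0:
        raise ValueError("amount_per_period must be positive")
    units = 0.0
    for price in prices:
        if price <= 0:
            raise ValueError("prices must be positive")
        # A fixed budget buys more units when the price is lower.
        units += amount_per_period / price
    total_invested = amount_per_period * len(prices)
    average_cost = total_invested / units if units else 0.0
    return units, total_invested, average_cost


# Hypothetical price path: a dip followed by a recovery.
prices = [100.0, 80.0, 50.0, 80.0, 100.0]
units, invested, avg_cost = dollar_cost_average(prices, 200.0)
mean_price = sum(prices) / len(prices)

print(f"units bought:       {units:.2f}")
print(f"total invested:     {invested:.2f}")
print(f"average cost/unit:  {avg_cost:.2f}")
print(f"simple mean price:  {mean_price:.2f}")
```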

  • A Closer Look at the Azure Cloud Portfolio – 3. Azure DevOps Boards

    Azure Boards (AB), part of the Azure DevOps suite, is a tool for managing software development work that allows planning, tracking, customization, and discussion. The post outlines how to start with AB, set up a project, create and manage work items, customize a project’s boards, and organize work into sprints. AB’s key advantages include its flexibility,…

  • GPT & DeepLake NLP: Amazon Financial Statements

    The post outlines the implementation of an AI-powered chatbot using NLP to process and analyze financial data from Amazon’s financial statements. The tool employs LlamaIndex and DeepLake to answer queries, summarize financial information, and analyze trends. This approach enhances the efficiency of data analysis, making it a valuable resource for finance and banking professionals.

  • Effective 2D Image Compression with K-means Clustering

    The post explores the application of K-means clustering, a popular unsupervised machine learning algorithm, to image compression. By grouping the pixel colors of a 2D image into a small number of clusters, the algorithm reduces storage space while preserving the image resolution and most of the visual quality. It also demonstrates the application of this approach through a case study, where optimal results were…
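
    As a rough illustration of the idea, the sketch below runs a small K-means color quantization in pure Python. It is not the post's implementation: the function `kmeans_quantize`, the eight-pixel toy "image", and the choice of k=2 are assumptions made for the example, and a real workflow would more likely use a library such as scikit-learn on a full image array. Each pixel is replaced by the index of its nearest cluster color, so the image is stored as a small palette plus one index per pixel.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(pixel, centroids):
    """Index of the centroid closest to the given pixel."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(pixel, centroids[c]))


def kmeans_quantize(pixels, k, iterations=10, seed=0):
    """Cluster RGB pixels into k colors; return (palette, labels)."""
    unique_colors = sorted(set(pixels))
    if k > len(unique_colors):
        raise ValueError("k exceeds the number of distinct colors")
    rng = random.Random(seed)
    centroids = rng.sample(unique_colors, k)  # initialize from real pixels
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        labels = [nearest(px, centroids) for px in pixels]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [px for px, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(ch) / len(members)
                                     for ch in zip(*members))
    labels = [nearest(px, centroids) for px in pixels]  # final assignment
    palette = [tuple(round(v) for v in c) for c in centroids]
    return palette, labels


# Toy "image": four reddish pixels followed by four bluish pixels.
pixels = [(250, 10, 10), (245, 5, 12), (255, 0, 0), (240, 20, 15),
          (10, 10, 250), (5, 12, 245), (0, 0, 255), (20, 15, 240)]
palette, labels = kmeans_quantize(pixels, k=2)
compressed = [palette[lab] for lab in labels]

raw_bytes = 3 * len(pixels)                       # 3 bytes per RGB pixel
packed_bytes = 3 * len(palette) + len(pixels)     # palette + 1-byte indices
print(palette, labels, raw_bytes, packed_bytes)
```

    The saving comes from storing one small index per pixel instead of three color bytes. The result is lossy: every pixel takes its cluster's color, so a larger k trades storage for fidelity.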

  • The $0 MarTech Stack for Small Business

    The post is a comprehensive guide to marketing technology, or martech. It covers the development of customer data platforms for managing marketing operations, as well as ten categories of free martech SaaS tools useful for startups. The categories include marketing automation, social media, SEO, lead generation, graphic design, PR, email marketing, project management, conversion rate…

  • Dealing with Imbalanced Data in HealthTech ML/AI – 1. Stroke Prediction

    This post discusses the prediction of stroke using machine learning (ML) models, focusing on the use of early warning systems and data balancing techniques to manage the highly imbalanced stroke data. It includes a detailed exploration of the torch artificial neural network training and performance evaluation, as well as the implementation and evaluation of various…

  • A Closer Look at the Azure Cloud Portfolio – 2. From VMs to Web Servers

    This guide explains how to create virtual machines (VMs) on Azure for deploying web servers. It covers the process of creating a VM and connecting it to a secured subnet within a virtual network (VNet), using Azure’s Bastion service for secure RDP/SSH connections, and installing a Nextcloud server on the VM. Additional steps include making…

  • Working with FRED API in Python: U.S. Recession Forecast & Beyond

    FRED (Federal Reserve Economic Data) provides, through its API, over 267,000 economic time series from 80 sources, offering a wealth of data to promote economic education and research. It encompasses U.S. economic and financial data, including interest rates, monetary indicators, exchange rates, and regional economic data. Additionally, we analyzed correlations, trained currency exchange prediction models,…

  • Dabl Auto EDA-ML

    Dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…

  • A Closer Look at the Azure Cloud Portfolio – 1. Essentials

    The article presents an extensive overview of Microsoft Azure services in comparison with Amazon Web Services (AWS) and Google Cloud Platform (GCP). It reports that Azure’s 2021 cloud revenue exceeded that of AWS and GCP combined, and that its clients include nearly 80% of Fortune 500 companies. The piece elaborates on Azure’s cloud concepts, Azure Synapse SQL Pool,…

  • Joint Analysis of Bitcoin, Gold and Crude Oil Prices

    The post presents a joint time-series analysis of Bitcoin, gold, and crude oil prices from 2021 to 2023. It covers data processing and exploratory data analysis, followed by a range of statistical tests, ARIMA model fitting, and finally Markowitz portfolio optimization. It then presents a detailed analysis, including data…

  • Video Game Sales Data Exploration

    The post explores the gaming industry’s size and state, highlighting a potential market value of $314bn by 2027. It emphasizes the industry’s three main subsectors: console, PC, and smartphone gaming. Moreover, the post conducts extensive data analysis on video game sales data, using Python to examine aspects such as genre profitability, platform sales prices, and…

  • Data Visualization in Python – 1. Stock Technical Indicators

    Featured photo by Monstera on Pexels. In this project, we implement stock technical indicators in Python, conventionally organized into three main groups. Input stock data: let’s set the working directory VIZ: import os; os.chdir('VIZ'); os.getcwd(). Then import the key libraries: import datetime as dt; import pandas as pd; import…