Category: Visualization
-
Plotly Dash TA Stock Market App

The post explains how to deploy a Plotly Dash stock market app in Python with a dashboard of user-selected stock prices. The dashboard includes technical indicators such as volume, MACD, and the stochastic oscillator. The steps include selecting a stock ticker symbol (NVDA), retrieving stock data from the yfinance API, adding moving averages, saving the stock chart in HTML form,…
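As a rough illustration of the moving-average step mentioned above, here is a minimal sketch in plain Python. It is not the post's code: the function name `simple_moving_average` and the closing prices are made up for the example, and no data is fetched from yfinance.

```python
from collections import deque


def simple_moving_average(prices, window):
    """Return a list as long as `prices`; entries are None until the window fills."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()   # the most recent `window` prices
    total = 0.0     # running sum of the values in `buf`
    out = []
    for price in prices:
        buf.append(price)
        total += price
        if len(buf) > window:
            total -= buf.popleft()  # drop the oldest price from the sum
        out.append(total / window if len(buf) == window else None)
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closing prices
sma3 = simple_moving_average(closes, 3)
print(sma3)
```

In a real dashboard the same series would typically be computed on the downloaded price column and drawn as an extra trace on the chart.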
-
NLP & Stock Impact of ChatGPT-Related Tweets

This Python project extends a recent study on half a million tweets about OpenAI’s language model, ChatGPT. It uncovers public sentiment about this rapidly growing app and examines its impact on the future of AI-powered LLMs, including stock influences. The project uses data analysis techniques such as text processing, sentiment analysis, identification of key influencers,…
-
ML Prediction of High/Low Video Game Hits with Data Resampling and Model Tuning

The post outlines an ML-based approach to forecasting video game sales, using several techniques to enhance training, accuracy, and prediction. Kaggle’s VGChartz dataset, containing sales data and other game-specific information, was used to build and refine the model. Several ML techniques including RandomForestClassifier and Logistic Regression yielded top predictors, with the critic’s score deemed…
-
An Overview of Video Games in 2023: Trends, Technology, and Market Research

The gaming industry is rapidly growing, projected to reach a revenue of $365.6 billion in 2023. Major trends include Web3 gaming, AI integration, and a push for consolidation. Fashion brands collaborate for virtual sales, and advances in gaming technology, such as AR/VR and cloud-based gaming, promise an even more immersive experience for gamers.
-
Customer Reviews NLP Spacy Analysis and ML/AI Demand Forecasting of the Steam PC Video Game Service

Steam, a leading digital distribution platform for PC gaming, saw over 6000 new games released in 2022, averaging over 34 games each day. This post conducts a comprehensive NLP sentiment analysis of customer reviews and ML/AI demand forecasting using public-domain datasets. It covers EDA, spaCy NLP analysis, an ML/AI pipeline, model validation, word clouds, and…
-
Comparison of 20 ML + NLP Algorithms for SMS Spam-Ham Binary Classification

This post analyzes a public-domain SMS text message dataset to compare various machine learning algorithms’ abilities to classify spam and ham messages. After implementing a Python workflow that includes data preparation, exploratory analysis, natural language processing, supervised machine learning binary classification, and a model performance analysis, the author finds that MLP, Logistic Regression CV, Linear…
-
A Comparison of Automated EDA Tools in Python: Pandas-Profiling vs SweetViz

Exploratory Data Analysis (EDA) is an important part of data science projects, designed to identify patterns, anomalies, and relationships. It can employ univariate, bivariate, and multivariate data analytics, and can be accelerated using automated EDA tools. The article discusses Python libraries such as Pandas-Profiling and SweetViz for automating EDA and demonstrates their application to improve…
-
Unsupervised ML, K-Means Clustering & Customer Segmentation

Open-Source Datasets: One file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion…
-
Early Heart Attack Prediction using ECG Autoencoder and 19 ML/AI Models with Test Performance QC Comparisons

ECG Autoencoder: Let’s set the working directory YOURPATH
    import os
    os.chdir('YOURPATH')
    os.getcwd()
and import the following libraries
    import tensorflow as tf
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from tensorflow.keras import layers, losses
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.models import Model
Let’s read the input dataset
    df = pd.read_csv('ecg.csv', header=None)
Let’s…
-
Dabl Auto EDA-ML

Dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…
-
Joint Analysis of Bitcoin, Gold and Crude Oil Prices

The post presents a joint time-series analysis of Bitcoin, Gold and Crude Oil prices from 2021 to 2023. It covers data processing and exploratory data analysis, followed by a range of statistical tests, ARIMA model fitting, and finally Markowitz portfolio optimization. It then presents a detailed analysis, including data…
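To make the Markowitz step more concrete, here is a deliberately simplified sketch for two assets only, using the closed-form minimum-variance weights. It is an illustration, not the post's method: the post works with three assets, and the volatilities and correlation below are invented. The weights are unconstrained, so short positions are possible.

```python
def min_variance_weights(sigma_a, sigma_b, rho):
    """Weights (w_a, w_b), with w_a + w_b = 1, of the two-asset minimum-variance portfolio."""
    var_a, var_b = sigma_a ** 2, sigma_b ** 2
    cov = rho * sigma_a * sigma_b
    denom = var_a + var_b - 2.0 * cov
    if denom == 0:
        # Happens only for perfectly correlated assets with equal volatility.
        raise ValueError("minimum-variance weights are not unique")
    w_a = (var_b - cov) / denom
    return w_a, 1.0 - w_a


def portfolio_volatility(w_a, sigma_a, sigma_b, rho):
    """Standard deviation of a two-asset portfolio with weight w_a in asset A."""
    w_b = 1.0 - w_a
    variance = (
        (w_a * sigma_a) ** 2
        + (w_b * sigma_b) ** 2
        + 2.0 * w_a * w_b * rho * sigma_a * sigma_b
    )
    return variance ** 0.5


# Hypothetical inputs: 20% and 10% annual volatility, zero correlation.
w_a, w_b = min_variance_weights(0.2, 0.1, 0.0)
print(w_a, w_b, portfolio_volatility(w_a, 0.2, 0.1, 0.0))
```

With more than two assets the same idea is usually solved numerically over the full covariance matrix, often with a no-short-selling constraint.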
-
Overview of AWS Tech Portfolio 2023

This summary focuses on the extensive capabilities of Amazon Web Services (AWS) by 2023, highlighting its 27% year-on-year growth and a net sales increase to $127.1 billion. AWS emerges as the top cloud service provider, offering over 200 services including compute, storage, databases, networking, AI, and machine learning. It is constantly expanding operations, having opened…
-
Towards Max(ROI/Risk) Trading

This post compares the 1-year ROI/Risk of selected stocks vs. ETFs using stock analyzer functions. It includes comparing prices, visualizing annual risk and return, and examining the correlation matrix of stock returns. It provides insights for selecting CPB stock for trading based on its low correlation with ^GSPC, high return (~20%), and low risk (~23%).
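One common way to obtain annual return and risk figures like those above is to annualize daily returns. The sketch below assumes that convention (252 trading days, arithmetic annualization); the post's own analyzer functions may differ, and the prices are made up.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed convention, not taken from the post


def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def annualized_return_and_risk(prices):
    """Return (annualized mean return, annualized volatility) from daily closes."""
    returns = daily_returns(prices)
    mean_daily = statistics.fmean(returns)
    vol_daily = statistics.stdev(returns)  # sample standard deviation
    return mean_daily * TRADING_DAYS, vol_daily * math.sqrt(TRADING_DAYS)


prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]  # hypothetical closes
ann_ret, ann_risk = annualized_return_and_risk(prices)
ratio = ann_ret / ann_risk  # the return-to-risk figure being maximized
print(ann_ret, ann_risk, ratio)
```

Ranking candidates by `ratio` and then checking their correlation with a benchmark such as ^GSPC mirrors the selection logic described in the summary.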
-
Trending YouTube Video Data Science, NLP Predictions & Sentiment Analysis

Global YT WordCloud: Let’s begin with the Kaggle YT TextHero dataset containing 3599 rows and 4 columns. Let’s set the working directory YOURPATH
    import os
    os.chdir('YOURPATH')
    os.getcwd()
and import all necessary modules
    from wordcloud import WordCloud, STOPWORDS
    import matplotlib.pyplot as plt
    import pandas as pd
Let’s read the input dataset
    df = pd.read_csv(r"youtube0.csv", encoding="latin-1")…
-
Turkey/Syria Earthquake Live Knowledge Update & Charity Guide

Turkey has recently experienced a high frequency of earthquakes, the largest being a magnitude of 7.8 near Nurdağı, Gaziantep, on February 6th, 2023. This was one of the most powerful earthquakes in the country’s history, resulting in over 22,000 deaths and significant damage across Southern Turkey and Northern Syria. The aftermath left many survivors homeless,…
-
SARIMAX Forecasting of Online Food Delivery Sales

This article provides a beginner-friendly guide to understanding and evaluating ARIMA-based time-series forecasting models such as SARIMA and SARIMAX. It focuses on a QC-optimized SARIMA(X) model to forecast the e-commerce sales of a food delivery company. The post covers essential concepts, data processing, model comparisons, and insights. It also includes a comparison between SARIMA and…
-
Semantic Analysis and NLP Visualizations of Wine Reviews

The study aims to develop a predictive model that, like a master sommelier, identifies wines from the syntax and language prevalent in wine reviews. Drawn from a Kaggle dataset of 130k reviews, the model identifies common vocabulary and usage patterns among wine experts, enabling automatic prediction of wine characteristics based purely on review text. The…
-
Interactive Global COVID-19 Data Visualization with Plotly

COVID-19, caused by the SARS-CoV-2 virus, has affected 227.2 million people and caused 4,672,629 deaths. The disease, first reported in Wuhan, has spread globally. Data visualization tools like Plotly and analysis of Kaggle datasets provide insights into the pandemic’s impact, with the US leading in confirmed cases and deaths. China has managed to control the spread.
-
Hands-On USGS Webscraping of Earthquakes- Worldwide (24 Hours)

A live global earthquake tracker has been developed using the USGS earthquake data feed. This tool, which runs 24/7, distinguishes underground nuclear explosions from natural or man-made seismic events such as earthquakes and mining explosions. This tracker is crucial given that a third of the world’s population is exposed to earthquakes.
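As a small illustration of how events from such a feed can be separated by kind, the sketch below groups records by their event type. The field names (`features`, `properties`, `mag`, `place`, `type`) follow the USGS GeoJSON summary format; the sample records are invented, no network request is made, and `group_by_event_type` is a hypothetical helper rather than the tracker's own code.

```python
import json
from collections import defaultdict

# Invented sample shaped like a USGS GeoJSON summary feed.
SAMPLE_FEED = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"mag": 4.6, "place": "sample location A", "type": "earthquake"}},
    {"type": "Feature", "properties": {"mag": 1.8, "place": "sample location B", "type": "quarry blast"}},
    {"type": "Feature", "properties": {"mag": 2.9, "place": "sample location C", "type": "earthquake"}}
  ]
}
"""


def group_by_event_type(feed):
    """Map each event type to a list of (magnitude, place) tuples."""
    groups = defaultdict(list)
    for feature in feed.get("features", []):
        props = feature.get("properties", {})
        event_type = props.get("type", "unknown")
        groups[event_type].append((props.get("mag"), props.get("place")))
    return dict(groups)


feed = json.loads(SAMPLE_FEED)
groups = group_by_event_type(feed)
for event_type, events in sorted(groups.items()):
    print(event_type, events)
```

For live data, the same function could be applied to the parsed JSON of a downloaded feed instead of `SAMPLE_FEED`.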
