Tag: data visualization

  • Titanic Benchmark Hypothesis Testing in Disaster Risk Management: (Auto)EDA, ML, HPO & SHAP

    This project applies the Titanic benchmark to hypothesis testing in disaster risk management. Using the Titanic dataset from Kaggle, a Machine Learning (ML) analysis was performed to determine the statistical significance of the relationship between a person’s death and their passenger class, age, sex, and port of embarkation. The project involved comprehensive ML pipeline implementation…
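
    One common way to test whether death is statistically related to passenger class is a chi-square test of independence. The sketch below is a minimal illustration, not the post's pipeline: the counts are hypothetical toy values, and `chi_square_independence` is a helper named here for illustration only.

```python
import math

# Hypothetical toy counts (NOT taken from the post or the Kaggle data):
# rows are passenger classes 1-3, columns are [died, survived].
observed = [
    [20, 30],
    [30, 20],
    [50, 10],
]


def chi_square_independence(table):
    """Return (chi2 statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


chi2, dof = chi_square_independence(observed)

# With exactly 2 degrees of freedom, the chi-square survival function
# has the closed form exp(-x / 2), so no statistics library is needed.
assert dof == 2
p_value = math.exp(-chi2 / 2)
print(f"chi2={chi2:.3f}, dof={dof}, p={p_value:.2e}")
```

    A small p-value would argue against independence between class and outcome. For tables with other shapes, a library routine such as SciPy's `chi2_contingency` is the more general choice.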

  • A Market-Neutral Strategy

    The work addresses Markowitz portfolio optimization over a one-year investment horizon using a cointegration-based pairs trading strategy. Market-neutral trading strategies seek to generate returns independent of market swings by targeting a zero beta against the relevant market index. Statistical arbitrage (SA), pairs trading, and APO signals are analyzed. The study…
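
    To make the zero-beta idea concrete, the sketch below estimates each stock's beta against the market and sizes the short leg so that the long/short spread has zero sample beta. It illustrates beta hedging only, not the cointegration test used in pairs trading, and all return series are hypothetical toy values.

```python
def beta(asset_returns, market_returns):
    """Sample beta: cov(asset, market) / var(market)."""
    n = len(market_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum(
        (a - mean_a) * (m - mean_m)
        for a, m in zip(asset_returns, market_returns)
    ) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


# Hypothetical toy daily returns (NOT from the post).
market = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
stock_a = [0.014, -0.022, 0.016, 0.009, -0.013, 0.025]
stock_b = [0.007, -0.017, 0.013, 0.002, -0.008, 0.015]

beta_a = beta(stock_a, market)
beta_b = beta(stock_b, market)

# Long 1 unit of A, short `hedge_ratio` units of B.
# Beta is linear in returns, so beta(A - h * B) = beta_a - h * beta_b = 0.
hedge_ratio = beta_a / beta_b
spread = [a - hedge_ratio * b for a, b in zip(stock_a, stock_b)]
spread_beta = beta(spread, market)

print(f"beta_a={beta_a:.3f}, beta_b={beta_b:.3f}, hedge_ratio={hedge_ratio:.3f}")
print(f"spread beta={spread_beta:.2e}")
```

    The hedge is exact only in-sample; betas estimated on one window generally drift on the next, so a live strategy would need periodic re-estimation.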

  • The 5-Step GCP IoT Device-to-Report via AI Roadmap

    The Internet of Things (IoT) aids in the improvement of processes and enables new scenarios through network-connected devices. Recognized as a driver of the Fourth Industrial Revolution, IoT applications include predictive maintenance, industry safety, automation, remote monitoring, asset tracking, and fraud detection. Advancements in cloud IoT architectures over recent years have enabled efficient data ingestion,…

  • Plotly Dash TA Stock Market App

    The post explains how to deploy a Plotly Dash stock market app in Python with a dashboard of user-defined stock prices. The dashboard includes technical indicators such as volume, MACD, and the stochastic oscillator. The steps include selecting a stock ticker symbol (NVDA), retrieving stock data from the yfinance API, adding moving averages, saving the stock chart in HTML form,…
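
    The moving-average and MACD indicators mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not the post's code: prices are hypothetical rather than fetched from yfinance, the EMA is seeded with the first value, and the 12/26/9 spans are the conventional MACD defaults.

```python
def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line, histogram)."""
    macd_line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Hypothetical, steadily rising closing prices (NOT real NVDA data).
closes = [float(p) for p in range(1, 61)]
macd_line, signal_line, histogram = macd(closes)
print(sma([1, 2, 3, 4, 5], 3))
print(f"last MACD value: {macd_line[-1]:.3f}")
```

    In an uptrend the fast EMA sits above the slow EMA, so the MACD line is positive. In a Dash app these series would typically be added as extra traces on the price figure.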

  • Low-Code AutoEDA of Dutch eHealth Data in Python

    The article details the use of low-code AutoEDA in Python to examine the Dutch Healthcare Authority’s eHealth data. Using Python libraries such as D-Tale and SweetViz, the study aims to understand the key features of the healthcare data and prepare it for AI techniques. The motivations include the Dutch government’s support for digital healthcare applications, especially amidst the recent…

  • Returns-Volatility Domain K-Means Clustering and LSTM Anomaly Detection of S&P 500 Stocks

    This study aims to implement and evaluate the K-means algorithm for ranking/clustering S&P 500 stocks based on average annualized return and volatility. The second goal is to detect anomalies in the best-performing S&P 500 stocks using the Isolation Forest algorithm. Additionally, anomalies in the S&P 500 historical stock price time series data will be…
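
    As a rough illustration of clustering in the return-volatility plane, the sketch below annualizes daily returns and runs a minimal K-means (Lloyd's algorithm). It rests on assumptions not confirmed by the post: a 252-day trading year, a deterministic first-k initialization, and hypothetical (return, volatility) points. A real study would more likely use scikit-learn's `KMeans`.

```python
import math

TRADING_DAYS = 252  # common convention; an assumption here


def annualized_features(daily_returns):
    """Return (annualized mean return, annualized volatility)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean * TRADING_DAYS, math.sqrt(var * TRADING_DAYS)


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; centroids start at the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical (annualized return, annualized volatility) points,
# one per stock; NOT real S&P 500 figures.
points = [(0.05, 0.10), (0.30, 0.40), (0.06, 0.12), (0.28, 0.38)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

    Because return and volatility can sit on different scales, standardizing both features before clustering is usually worth considering.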

  • Wind Energy ML Prediction & Turbine Power Control

    This post presents a detailed project on modeling the power curve of a wind turbine, which is crucial in wind energy management and forecasting. Using machine learning techniques such as Random Forest and Gradient Boosting Regressors, validated with real-world SCADA data from a Turkish wind farm, the project shows it’s possible to create…

  • Morocco Earthquake EDA

    Basic installations and imports: set the working directory YOURPATH, then install and import the following libraries. Download earthquake input data: for this project, we’ll use a dataset that contains all seismic events over the last seven days with a magnitude of 1.0 or greater. Output:…

  • NLP & Stock Impact of ChatGPT-Related Tweets

    This Python project extends a recent study on half a million tweets about OpenAI’s language model, ChatGPT. It uncovers public sentiment about this rapidly growing app and examines its impact on the future of AI-powered LLMs, including stock influences. The project uses data analysis techniques such as text processing, sentiment analysis, identification of key influencers,…

  • An Overview of Video Games in 2023: Trends, Technology, and Market Research

    The gaming industry is growing rapidly, with revenue projected to reach $365.6 billion in 2023. Major trends include Web3 gaming, AI integration, and a push for consolidation. Fashion brands collaborate for virtual sales, and advances in gaming technology, such as AR/VR and cloud-based gaming, promise an even more immersive experience for gamers.

  • A Comparison of Automated EDA Tools in Python: Pandas-Profiling vs SweetViz

    Exploratory Data Analysis (EDA) is an important part of data science projects, designed to identify patterns, anomalies, and relationships. It can employ univariate, bivariate, and multivariate data analytics, and can be accelerated using automated EDA tools. The article discusses Python libraries such as Pandas-Profiling and SweetViz for automating EDA and demonstrates their application to improve…

  • NLP of Restaurant Guest Reviews on Tripadvisor

    This is a comprehensive study examining restaurant reviews on TripAdvisor across 31 major European cities. The research, based on a dataset scraped from TripAdvisor, aims to perform a sentiment analysis of reviews, exploring average ratings per city, vegetarian-friendly cities, and how local cuisine compares to foreign food. The analysis is carried out using Python, demonstrating…

  • Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%

    In 2023, the global card industry is projected to suffer $36.13 billion in fraud losses. This has necessitated a priority focus on enhancing credit card fraud detection by banks and financial organizations. AI-based techniques are making fraud detection easier and more accurate, with models able to recognize unusual transactions and fraud. The post discusses a…

  • Datapane Stock Screener App from Scratch

    This post provides a quick guide for value investors to using the Datapane stock screener API in Python. It includes instructions for installation, importing standard libraries, setting the stock ticker, downloading the stock’s Adj Close price, and creating visualizations. The post also describes how to build a powerful report using Datapane’s layout components.

  • Unsupervised ML, K-Means Clustering & Customer Segmentation

    Sections: Motivation, Methods, Open-Source Datasets. This file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion…

  • Dabl Auto EDA-ML

    Dabl, short for Data Analysis Baseline Library, is a high-level data exploration library in Python that automates repetitive data wrangling tasks in the early stages of supervised machine learning model development. Developed by Andreas Mueller and the scikit-learn community, it facilitates data preprocessing, advanced integrated visualization, exploratory data analysis (EDA), and ML model development, demonstrated…

  • Joint Analysis of Bitcoin, Gold and Crude Oil Prices

    The post presents a joint time-series analysis of Bitcoin, gold, and crude oil prices from 2021 to 2023. It covers data processing and exploratory data analysis, followed by a range of statistical tests, ARIMA model fitting, and finally Markowitz portfolio optimization. It then presents a detailed analysis, including data…

  • Video Game Sales Data Exploration

    The post explores the gaming industry’s size and state, highlighting a potential market value of $314bn by 2027. It emphasizes the industry’s three main subsectors: console, PC, and smartphone gaming. Moreover, the post conducts extensive data analysis on video game sales data, using Python to examine aspects such as genre profitability, platform sales prices, and…

  • Using AI/ANN AUC>90% for Early Diagnosis of Cardiovascular Disease (CVD)

    The project applies AI to cardiovascular medicine, focusing on early diagnosis of heart disease using Artificial Neural Networks (ANN). Aiming to improve early detection of heart issues, the project processed a dataset of 303 patients using Python libraries and conducted extensive exploratory data analysis. A Sequential ANN model was subsequently built, revealing excellent performance…

  • Overview of AWS Tech Portfolio 2023

    This summary focuses on the extensive capabilities of Amazon Web Services (AWS) as of 2023, highlighting its 27% year-on-year growth and a net sales increase to $127.1 billion. AWS emerges as the top cloud service provider, offering over 200 services including compute, storage, databases, networking, AI, and machine learning. It is constantly expanding operations, having opened…