Tag: data analytics

  • Titanic Benchmark Hypothesis Testing in Disaster Risk Management: (Auto)EDA, ML, HPO & SHAP

    This project applies the Titanic benchmark to hypothesis testing in disaster risk management. Using the Titanic dataset on Kaggle, a Machine Learning (ML) analysis was performed to test whether a person’s death was statistically significantly related to their passenger class, age, sex, and port of embarkation. The project involved comprehensive ML pipeline implementation…

  • 100 Basic Python Codes

    Source: PYPL Popularity of Programming Language, Feb 2024. Table of Contents: Setting Up Your Environment; Download Datasets; Initial Pandas Data QC; Displaying Pandas Data Types; Showing Descriptive Statistics; Exploring the Dataset; Email Slicer; User Input & Type Conversion; Working with Lists; Practicing Loops; Calculator; Temperature Conversion; ADC Temperature Sensor; Sorting Numpy Arrays; Story Generator; Display…

  • Basic Python Programming

    This guide introduces basic concepts and features of the Python programming language. It covers a range of topics, including installation, variables, strings, lists, tuples, sets, dictionaries, loops, conditionals, functions, and modules. The comprehensive content provides valuable information for beginners seeking to learn Python for data science or general programming.

  • Leveraging Predictive Uncertainties of Time Series Forecasting Models

    Featured Image via Canva. Table of Contents: Introduction; Random Simulation Tests; TSLA Stock 43 Days; TSLA Stock 300 Days; Housing in the United States; Industrial Production; Federal Funds Rate Data; S&P 500 Absolute Returns; Number of Airline Passengers - 1. Holt-Winters; Number of Airline Passengers - 2. Prophet; Average Temperature in India; Monthly Sales Data Analysis QC…

  • The 5-Step GCP IoT Device-to-Report via AI Roadmap

    The Internet of Things (IoT) aids in the improvement of processes and enables new scenarios through network-connected devices. Recognized as a driver of the Fourth Industrial Revolution, IoT applications include predictive maintenance, industry safety, automation, remote monitoring, asset tracking, and fraud detection. Advancements in cloud IoT architectures over recent years have enabled efficient data ingestion,…

  • Low-Code AutoEDA of Dutch eHealth Data in Python

    The article details the use of low-code AutoEDA in Python to examine the Dutch Healthcare Authority’s eHealth data. Using Python libraries such as D-Tale and SweetViz, the study aims to understand the data’s key features and prepare it for AI techniques. The motivations include the Dutch government’s support for digital healthcare applications, especially amidst the recent…

  • NVIDIA Rolling Volatility: GARCH & XGBoost

    This post examines the prediction of NVIDIA (NVDA) stock volatility using two models: Generalized Autoregressive Conditional Heteroscedasticity (GARCH) and Extreme Gradient Boosting (XGBoost). The two models are compared in terms of MSE and MAPE. The post finds that the machine learning-based XGBoost model outperforms the GARCH model in NVDA volatility forecasting, showing the effectiveness of…

  • Practical SQL Queries, Cheat Sheets, and Interview Q&A for Data Scientists

    Professionals aspiring to a career in data science must master SQL. This comprehensive SQL server tutorial includes practical exercises, cheat sheets, interview Q&A tailored to data scientists, and installation requirements. From RDBMS basics to advanced concepts for data science interviews, this resource emphasizes the significance of SQL in database operations.

  • ML Prediction of High/Low Video Game Hits with Data Resampling and Model Tuning

    The post outlines an ML-based approach to forecasting video game sales, using several techniques to enhance training, accuracy, and prediction. Kaggle’s VGChartz dataset, containing sales data and other game-specific information, was used to build and refine the model. Several ML techniques including RandomForestClassifier and Logistic Regression yielded top predictors, with the critic’s score deemed…

  • An Overview of Video Games in 2023: Trends, Technology, and Market Research

    The gaming industry is rapidly growing, projected to reach a revenue of $365.6 billion in 2023. Major trends include Web3 gaming, AI integration, and a push for consolidation. Fashion brands collaborate for virtual sales, and advances in gaming technology, such as AR/VR and cloud-based gaming, promise an even more immersive experience for gamers.

  • A Comparison of Automated EDA Tools in Python: Pandas-Profiling vs SweetViz

    Exploratory Data Analysis (EDA) is an important part of data science projects, designed to identify patterns, anomalies, and relationships. It can employ univariate, bivariate, and multivariate data analytics, and can be accelerated using automated EDA tools. The article discusses Python libraries such as Pandas-Profiling and SweetViz for automating EDA and demonstrates their application to improve…

  • NLP of Restaurant Guest Reviews on Tripadvisor

    This comprehensive study examines restaurant reviews on Tripadvisor across 31 major European cities. The research, based on a dataset scraped from Tripadvisor, performs a sentiment analysis of reviews, exploring average ratings per city, vegetarian-friendly cities, and how local cuisine compares to foreign food. The analysis is carried out using Python, demonstrating…

  • Improved Multiple-Model ML/DL Credit Card Fraud Detection: F1=88% & ROC=91%

    In 2023, the global card industry is projected to suffer $36.13 billion in fraud losses. This has necessitated a priority focus on enhancing credit card fraud detection by banks and financial organizations. AI-based techniques are making fraud detection easier and more accurate, with models able to recognize unusual transactions and fraud. The post discusses a…

  • Datapane Stock Screener App from Scratch

    This post is a quick guide for value investors to using the Datapane stock screener API in Python. It includes instructions for installation, importing standard libraries, setting the stock ticker, downloading the stock’s adjusted close (Adj Close) price, and creating visualizations. The post also describes how to build a report using Datapane’s layout components.

  • Unsupervised ML, K-Means Clustering & Customer Segmentation

    Table of Contents: Motivation; Methods; Open-Source Datasets. One file contains the basic information (ID, age, gender, income, and spending score) about the customers. Online Retail is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion…

  • Video Game Sales Data Exploration

    The post explores the gaming industry’s size and state, highlighting a potential market value of $314bn by 2027. It emphasizes the industry’s three main subsectors: console, PC, and smartphone gaming. Moreover, the post conducts extensive data analysis on video game sales data, using Python to examine aspects such as genre profitability, platform sales prices, and…

  • Overview of AWS Tech Portfolio 2023

    This summary focuses on the extensive capabilities of Amazon Web Services (AWS) as of 2023, highlighting its 27% year-on-year growth and a net sales increase to $127.1 billion. AWS emerges as the top cloud service provider, offering over 200 services including compute, storage, databases, networking, AI, and machine learning. It is constantly expanding operations, having opened…

  • LSTM Price Predictions of 4 Tech Stocks

    This post explains how to use Exploratory Data Analysis (EDA) and a Long Short-Term Memory (LSTM) Sequential model to compare the risk/return of four major tech stocks: Apple, Google, Microsoft, and Amazon, considering the tech scenario in 2023. The analysis involves examining stock price patterns, their correlations, risk-return assessment, and predicting stock prices using…

  • Towards Max(ROI/Risk) Trading

    This post compares the 1-year ROI/Risk of selected stocks vs ETF using stock analyzer functions. It covers comparing prices, visualizing annual risk and return, and examining the correlation matrix of stock returns. It provides insights for selecting CPB stock for trading based on low correlation with ^GSPC, high return (~20%), and low risk (~23%).

  • Trending YouTube Video Data Science, NLP Predictions & Sentiment Analysis

    Table of Contents: Global YT WordCloud. Let’s begin with the Kaggle YT TextHero dataset containing 3599 rows and 4 columns. Let’s set the working directory YOURPATH: import os; os.chdir('YOURPATH'); os.getcwd(), and import all necessary modules: from wordcloud import WordCloud, STOPWORDS; import matplotlib.pyplot as plt; import pandas as pd. Let’s read the input dataset: df = pd.read_csv(r"youtube0.csv", encoding="latin-1")…
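
The last entry above reads a latin-1 encoded CSV and summarizes its text as a word cloud. As a rough illustration of the counting step behind such a cloud, here is a minimal sketch that uses only the standard library (`csv` and `collections.Counter`) in place of pandas and wordcloud, so it is not the post's own code. The column name `title`, the short stop-word list, and the sample rows are all assumptions made for the example; only the file name `youtube0.csv` and the latin-1 encoding come from the excerpt.

```python
import csv
import io
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; the wordcloud package
# ships a much longer STOPWORDS set).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def word_frequencies(csv_file, text_column, top_n=5):
    """Count the most common non-stop-words in one text column of a CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Lowercase the text and keep alphabetic tokens only.
        words = re.findall(r"[a-z']+", row[text_column].lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample rows standing in for the real file; with the actual data,
# open it as: open("youtube0.csv", encoding="latin-1", newline="")
sample = io.StringIO(
    "title,views\n"
    "The best music video of the year,100\n"
    "Official music trailer,250\n"
    "Music and dance video,75\n"
)

top_words = word_frequencies(sample, text_column="title", top_n=3)
print(top_words)
```

A word cloud scales each word by exactly these counts, so inspecting the top few entries is a quick check before plotting.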