Hugging Face NLP, Streamlit, PyGWalker, TF & Gradio App

Table of Contents

  1. Streamlit/Dash/Jupyter PyGWalker EDA Demo
  2. Keras Multi-Label Image Classification
  3. Super Soaker Failures Analysis
  4. Diabetes Prediction App
  5. FE+ML SciKit-Learn Diabetes Predictor
  6. Crop Recommendation
  7. Student Prediction Score
  8. Image-to-Sketch Conversion
  9. Text Substitution
  10. Story Generation with GPT-2
  11. BigGAN Text-to-Image Demo
  12. French Story Generator using Opus MT and GPT-2
  13. Image Classification with ImageNet
  14. Iris Flower Predictor
  15. Hugging Face NLP Sentiment Analysis
  16. Conclusions
  17. Explore More

Streamlit/Dash/Jupyter PyGWalker EDA Demo

pip install pygwalker
import pygwalker as pyg
import pandas as pd
import streamlit.components.v1 as components
import streamlit as st
 
# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)
 
# Add Title
st.title("Use Pygwalker In Streamlit")
 
# Import your data
df = pd.read_csv("bike_sharing_dc.csv")

# Generate the HTML using Pygwalker
pyg_html = pyg.to_html(df)
 
# Embed the HTML into the Streamlit app
components.html(pyg_html, height=1000, scrolling=True)
  • streamlit run app_pygwalker.py
Streamlit PyGWalker app dashboard: casual vs. registered counts by season
Streamlit PyGWalker app dashboard: casual and registered counts by working day (yes/no) and season
Streamlit PyGWalker app dashboard: example stacked histograms
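For larger datasets, regenerating the PyGWalker HTML on every Streamlit rerun can be slow. A minimal caching sketch using Streamlit's built-in cache (the helper name get_pyg_html is ours, not part of the PyGWalker API):

import pandas as pd
import pygwalker as pyg
import streamlit as st
import streamlit.components.v1 as components

@st.cache_resource
def get_pyg_html(csv_path: str) -> str:
    # Load the data and render the PyGWalker UI to HTML once;
    # Streamlit reuses the cached result on subsequent reruns
    df = pd.read_csv(csv_path)
    return pyg.to_html(df)

components.html(get_pyg_html("bike_sharing_dc.csv"), height=1000, scrolling=True)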
  • Airbnb EDA: Implementing PyGWalker Dash UI in Jupyter IDE with gradio/NYC-Airbnb-Open-Data
#https://www.kaggle.com/code/lxy21495892/airbnb-eda-pygwalker-demo

import pygwalker as pyg
import dash
import dash_html_components as html
import dash_dangerously_set_inner_html
import gradio
from datasets import load_dataset
dataset = load_dataset("gradio/NYC-Airbnb-Open-Data", split="train")
df = dataset.to_pandas()
walker = pyg.walk(df, debug=False)
app = dash.Dash(__name__)
html_code = walker.to_html()
app.layout = html.Div([
    dash_dangerously_set_inner_html.DangerouslySetInnerHTML(html_code),
])
if __name__ == '__main__':
    app.run_server(debug=True)
Airbnb EDA: Implementing PyGWalker Dash UI in Jupyter IDE with gradio/NYC-Airbnb-Open-Data
PyGWalker data table
Airbnb EDA: number of reviews vs neighborhood
PyGWalker EDA in real estate: stories, bedrooms, prefarea vs price stdev and area sum
import pygwalker as pyg
import pandas as pd
df = pd.read_excel('01_DBC.xlsx')
pyg.walk(df)
PyGWalker EDA Dutch eHealth Data Table
PyGWalker EDA Dutch eHealth Data vertical histogram
PyGWalker EDA Dutch eHealth Data multiple charts
PyGWalker EDA Dutch eHealth Data several horizontal bar charts
PyGWalker EDA Dutch eHealth Data several single horizontal bar chart
  • References:

PyGWalker and Dash — Creating a Data Visualization Dashboard In Less Than 20 Lines of Code

PyGWalker Test

PyGWalker Tutorial: A Tableau-Like Python Library for Interactive Data Exploration and Visualization

PyGWalker: A Python Library for Visualizing Pandas Dataframes

You’ll Never Walk Alone: Use Pygwalker to Visualize Data in Jupyter Notebook

PyGWalker 0.1.6. Update: Export Visualizations to Code

A Hands-On Demonstration of the PyGWalker Data Visualization Library – Part 1

Tableau + Python = PyGWalker

Keras Multi-Label Image Classification

  • Following the deep learning CNN CV tutorial, we will demonstrate how to easily build an image classification model using Keras and deploy it using Gradio.
  • Setting the working directory YOURPATH and importing the key libraries
import os
os.chdir('YOURPATH')    # Set working directory
os.getcwd()
import gradio as gr
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import PIL
import pathlib
from tensorflow import keras
from tensorflow.keras import layers
  • Importing the input images of flowers and setting CNN parameters
import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
import tensorflow as tf
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
3670
print(os.listdir(data_dir))
['daisy', 'dandelion', 'LICENSE.txt', 'roses', 'sunflowers', 'tulips']
import PIL
roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[1]))
  • Examples of images
import PIL
roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[1]))
Training image of roses
daisy = list(data_dir.glob('daisy/*'))
PIL.Image.open(str(daisy[2]))
Training image of daisy
  • Training images
batch_size = 32
img_height = 180
img_width = 180
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.3,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
Found 3670 files belonging to 5 classes.
Using 2569 files for training.
  • Validation images
val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
class_names = train_ds.class_names
print(class_names)
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
  • Plotting training images with 5 labels
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 12))
for images, labels in train_ds.take(1):
  for i in range(12):
    ax = plt.subplot(3, 4, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")
Training images
  • Training CNN model
import tensorflow as tf
import keras
from keras import layers

num_classes = len(class_names)
model = keras.Sequential([
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes,activation='softmax')
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),  # the final layer already applies softmax
              metrics=['accuracy'])
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling (Rescaling)       (None, 180, 180, 3)       0         
                                                                 
 conv2d (Conv2D)             (None, 180, 180, 16)      448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 90, 90, 16)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 90, 90, 32)        4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 45, 45, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 45, 45, 64)        18496     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 22, 22, 64)       0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 30976)             0         
                                                                 
 dense (Dense)               (None, 128)               3965056   
                                                                 
 dense_1 (Dense)             (None, 5)                 645       
                                                                 
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________
epochs=20
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
Epoch 1/20
81/81 [==============================] - 22s 272ms/step - loss: 0.8036 - accuracy: 0.6999 - val_loss: 1.0209 - val_accuracy: 0.6063
Epoch 2/20
81/81 [==============================] - 23s 281ms/step - loss: 0.5807 - accuracy: 0.7902 - val_loss: 1.0150 - val_accuracy: 0.6417
Epoch 3/20
81/81 [==============================] - 24s 292ms/step - loss: 0.3573 - accuracy: 0.8758 - val_loss: 1.1478 - val_accuracy: 0.6226
Epoch 4/20
81/81 [==============================] - 24s 293ms/step - loss: 0.2325 - accuracy: 0.9233 - val_loss: 1.5997 - val_accuracy: 0.6104
Epoch 5/20
81/81 [==============================] - 24s 298ms/step - loss: 0.1499 - accuracy: 0.9560 - val_loss: 1.4177 - val_accuracy: 0.6226
Epoch 6/20
81/81 [==============================] - 24s 300ms/step - loss: 0.0902 - accuracy: 0.9755 - val_loss: 1.7358 - val_accuracy: 0.6376
Epoch 7/20
81/81 [==============================] - 24s 295ms/step - loss: 0.0531 - accuracy: 0.9860 - val_loss: 2.0204 - val_accuracy: 0.6226
Epoch 8/20
81/81 [==============================] - 24s 296ms/step - loss: 0.0295 - accuracy: 0.9946 - val_loss: 2.1318 - val_accuracy: 0.6131
Epoch 9/20
81/81 [==============================] - 24s 294ms/step - loss: 0.0135 - accuracy: 0.9981 - val_loss: 2.1777 - val_accuracy: 0.6131
Epoch 10/20
81/81 [==============================] - 25s 302ms/step - loss: 0.0072 - accuracy: 0.9992 - val_loss: 2.2860 - val_accuracy: 0.6090
Epoch 11/20
81/81 [==============================] - 25s 302ms/step - loss: 0.0198 - accuracy: 0.9938 - val_loss: 2.2578 - val_accuracy: 0.6253
Epoch 12/20
81/81 [==============================] - 24s 301ms/step - loss: 0.0224 - accuracy: 0.9942 - val_loss: 2.3855 - val_accuracy: 0.6199
Epoch 13/20
81/81 [==============================] - 25s 306ms/step - loss: 0.0500 - accuracy: 0.9926 - val_loss: 2.0227 - val_accuracy: 0.6144
Epoch 14/20
81/81 [==============================] - 25s 305ms/step - loss: 0.0491 - accuracy: 0.9875 - val_loss: 2.0545 - val_accuracy: 0.6131
Epoch 15/20
81/81 [==============================] - 25s 305ms/step - loss: 0.0667 - accuracy: 0.9805 - val_loss: 2.1308 - val_accuracy: 0.6035
Epoch 16/20
81/81 [==============================] - 25s 307ms/step - loss: 0.0801 - accuracy: 0.9728 - val_loss: 2.1201 - val_accuracy: 0.6090
Epoch 17/20
81/81 [==============================] - 25s 309ms/step - loss: 0.0582 - accuracy: 0.9813 - val_loss: 2.1666 - val_accuracy: 0.5940
Epoch 18/20
81/81 [==============================] - 25s 305ms/step - loss: 0.0146 - accuracy: 0.9965 - val_loss: 2.5094 - val_accuracy: 0.6049
Epoch 19/20
81/81 [==============================] - 25s 307ms/step - loss: 0.0026 - accuracy: 0.9996 - val_loss: 2.5327 - val_accuracy: 0.6035
Epoch 20/20
81/81 [==============================] - 25s 306ms/step - loss: 7.3366e-04 - accuracy: 1.0000 - val_loss: 2.6016 - val_accuracy: 0.6049
print(history.history.keys())
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
CNN model accuracy vs epoch
CNN model loss vs epoch
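The loss curves in the second plot can be produced with the analogous snippet:

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()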
  • Deploying the trained CNN model in Gradio
import gradio as gr

image = gr.inputs.Image(shape=(180,180))
label = gr.outputs.Label(num_top_classes=5)

def predict_input_image(img):
  # Reshape the input into a batch of one 180x180 RGB image
  img_4d = img.reshape(-1,180,180,3)
  # The model outputs softmax probabilities over the 5 flower classes
  prediction = model.predict(img_4d)[0]
  return {class_names[i]: float(prediction[i]) for i in range(5)}

gr.Interface(fn=predict_input_image, inputs=image, outputs=label, interpretation='default').launch(debug=True)

Running on local URL
To create a public link, set `share=True` in `launch()`.
Deploying the trained CNN model in Gradio - roses

Remark: Use via API
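In recent Gradio versions the running app can also be called programmatically with the gradio_client package. A minimal sketch; the local URL, the test image path, and the api_name are assumptions, so check the app's "Use via API" page for the exact endpoint and argument format:

from gradio_client import Client

client = Client("http://127.0.0.1:7860/")   # assumed local URL printed by launch()
result = client.predict(
    "rose_test.jpg",                        # hypothetical path to a local test image
    api_name="/predict"                     # assumed default endpoint name
)
print(result)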

Deploying the trained CNN model in Gradio - roses

Super Soaker Failures Analysis

  • As an example of a UI, let’s consider the Super Soaker failure analysis dashboard implemented as a Gradio workflow
import gradio as gr
import pandas as pd
import datasets
import seaborn as sns
import matplotlib.pyplot as plt

df = datasets.load_dataset("merve/supersoaker-failures")
df = df["train"].to_pandas()
df.dropna(axis=0, inplace=True)

def plot(df):
  plt.scatter(df.measurement_13, df.measurement_15, c = df.loading,alpha=0.5)
  plt.savefig("scatter.png")
  df['failure'].value_counts().plot(kind='bar')
  plt.savefig("bar.png")
  sns.heatmap(df.select_dtypes(include="number").corr())
  plt.savefig("corr.png")
  plots = ["corr.png","scatter.png", "bar.png"]
  return plots

inputs = [gr.Dataframe(label="Supersoaker Production Data")]
outputs = [gr.Gallery(label="Profiling Dashboard", columns=(1,3))]

gr.Interface(plot, inputs=inputs, outputs=outputs, examples=[df.head(100)], title="Supersoaker Failures Analysis Dashboard").launch()
Super Soaker failures analysis dashboard
Super soaker elements correlation heatmap
Super soaker elements failure analysis scatter plot
Super soaker failure counter

Diabetes Prediction App

import numpy as np
import pandas as pd
import gradio as gr
import warnings
warnings.filterwarnings('ignore')

from sklearn import datasets

diabetes = datasets.load_diabetes(as_frame=True)

print(type(diabetes['data']))
<class 'pandas.core.frame.DataFrame'>
diabetes['data'].head()
Input diabetes data table
print(diabetes['DESCR'])
.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, total serum cholesterol
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, total cholesterol / HDL
      - s5      ltg, possibly log of serum triglycerides level
      - s6      glu, blood sugar level

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)

  • Importing additional libraries and trained models
#https://colab.research.google.com/github/bambadij/Cesear-Cypher/blob/main/Diabete_prediction.ipynb#scrollTo=ZT_TFKQMeEjH
#https://github.com/bambadij/gradio_diabete/tree/main/toolkit
#https://github.com/bambadij/gradio_diabete/blob/main/app.py

import gradio as gr
import pandas as pd
import numpy as np
import joblib

#Load scaler ,encoder and model

scaler = joblib.load(r'scaler.joblib')
model = joblib.load(r'model_final.joblib')
encoder = joblib.load(r'encoder.joblib')

# Create a function that applies the ML Pipeline

def predict(Gender,Urea,Cr,HbA1c,Chol,TG,HDL,LDL,VLDL,BMI,age_range):
    
    # Convert the age range to a numeric code
    age_range_mapping = {
        '20-30': 1,
        '30-40': 2,
        '40-50': 3,
        '50-60': 4,
        '60-70': 5,
        '70-80': 6,
        '80-90': 7,
        '90-100': 8,
    }
    age_range_numeric = age_range_mapping.get(age_range, 0)
    #Create a dataframe
    input_df = pd.DataFrame({'Gender':[Gender],
        'Urea':[Urea],
        'Cr':[Cr],
        'HbA1c':[HbA1c],
        'Chol':[Chol],
        'TG':[TG],
        'HDL':[HDL],
        'LDL':[LDL],
        'VLDL':[VLDL],
        'BMI':[BMI],
        'age_range':[age_range_numeric]
        })
    
    input_df['Urea'] = input_df['Urea'].astype(float)  # Convert "Urea" to float
    # Use the pre-fitted encoder; re-fitting on a single input would produce wrong codes
    input_df['Gender'] = encoder.transform(input_df['Gender'])
    # age_range has already been mapped to a numeric code above
    final_df = input_df
    # print("Gender (encoded):", input_df['Gender'])
    # print("Age Range (encoded):", input_df['age_range'])
    #Make prediction using model
    prediction = model.predict(final_df)
    
    #prediction label 
    prediction_label = {
        0: "No Diabetic",
        1: "Predicted Diabetic",
        2: "Diabetic"
    }
    # print(prediction_label)
    print("Prediction (numeric value):", prediction[0])
    print("Prediction (label):", prediction_label[int(prediction[0])])
    return prediction_label[int(prediction[0])]

input_interface=[]

with gr.Blocks() as app:
    Title =gr.Label('Predicting Diabetes App')
    with gr.Row():
        Title
    
    with gr.Row():
        gr.Markdown("This app predict of Patient to mean Diabete No Diabete or Predicted Diabete")
        
    with gr.Accordion("Open for information on input"):
        gr.Markdown(""" This app receives the following as inputs and processes them return the prediction on whether a patient given the input will diabeted ,no diabete or predicted. 
                        - Gender
                        - Urea: Urea
                        - Cr : Creatinine ratio(Cr)
                        - HbA1c:Sugar Level Blood
                        - Chol : Cholesterol (Chol)
                        - TG :Triglycerides(TG)
                        - HDL :  HDL Cholesterol
                        - LDL:LDL
                        - VLDL:VLDL
                        - BMI :Body Mass Index (BMI)
                        - CLASS : Class (the patient's diabetes disease class may be Diabetic, Non-Diabetic, or Predict-Diabetic)
                        - AGE :Age
                    """)  
    with gr.Row():
        with gr.Column():
            input_interface_column_1 = [
                gr.components.Radio(['M', 'F'], label='Select your gender'),
                gr.components.Dropdown([1, 2, 3, 4, 5, 6, 7, 8], label='Choose age range: 1=(20-30), 2=(30-40), 3=(40-50), ...'),
                gr.components.Slider(label='Urea level mg/dl', minimum=0, maximum=100),
                gr.components.Number(label='Creatinine level mg/dl', minimum=0, maximum=100),
                gr.components.Number(label='Blood sugar level (HbA1c)', minimum=0, maximum=100),
            ]
        with gr.Column():
            input_interface_column_2 = [
                gr.components.Number(label='Cholesterol level mg/dl', minimum=0, maximum=100),
                gr.components.Number(label='Triglycerides level mg/dl', minimum=0, maximum=100),
                gr.components.Slider(label='HDL cholesterol', minimum=0, maximum=100),
                gr.components.Number(label='LDL level', minimum=0, maximum=100),
                gr.components.Number(label='VLDL level', minimum=0, maximum=10),
                gr.components.Slider(label='BMI (body mass index)', minimum=0, maximum=100),
            ]
    with gr.Row():
        input_interface.extend(input_interface_column_1)
        input_interface.extend(input_interface_column_2)
        
    with gr.Row():
        predict_btn = gr.Button('Predict',variant="primary")
    
    #define the output interface
    output_interface = gr.Label(label="Diabetes Status")

    predict_btn.click(fn=predict, inputs=input_interface, outputs=output_interface)

app.launch()
Predicting diabetes App: top panel
Predicting diabetes App: bottom panel.

FE+ML SciKit-Learn Diabetes Predictor

# Import the required libraries and load the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

diabetes = pd.read_csv('https://assets.datacamp.com/production/course_1939/datasets/diabetes.csv')
X = diabetes[diabetes.columns[0:-1]]
y = diabetes['diabetes']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a Random Forest model
rf = RandomForestClassifier(random_state=42, n_estimators=100)
rf.fit(X_train, y_train)

# Predict on the test data
y_pred = rf.predict(X_test)

# Evaluate the model
accuracy_score(y_test, y_pred)

0.734375
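Accuracy alone can be misleading on an imbalanced outcome such as diabetes; a quick sketch of a fuller evaluation with scikit-learn's confusion matrix and classification report:

from sklearn.metrics import classification_report, confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))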

importances = rf.feature_importances_
importances

array([0.07449143, 0.27876091, 0.08888318, 0.07157507, 0.07091345,
       0.15805822, 0.11822478, 0.13909297])

# Find index of those with highest importance, sorted from largest to smallest:
indices = np.argsort(importances)[::-1]

for f in range(X.shape[1]): 
    print(f'{X.columns[indices[f]]}: {np.round(importances[indices[f]],2)}')

glucose: 0.28
bmi: 0.16
age: 0.14
dpf: 0.12
diastolic: 0.09
pregnancies: 0.07
triceps: 0.07
insulin: 0.07
  • Plotting RF FE
# Set the figure size and the number of decimal places to round the importances to
fig_size = (12, 8)
decimals = 2
plt.rcParams.update({'font.size': 22})
# Create the horizontal bar plot of feature importances
f, ax = plt.subplots(figsize=fig_size)
plt.barh(X.columns[indices], np.round(importances[indices], decimals))
ax.set_xlabel("Relative importance")
ax.set_title("Feature importances")
plt.show()
Diabetes Random Forest Feature Importance
# Import the permutation_importance function
from sklearn.inspection import permutation_importance

# Calculate feature importances using permutation importance
n_repeats = 30
random_state = 0
r = permutation_importance(rf, X_test, y_test,
                            n_repeats=n_repeats,
                            random_state=random_state)
for i in r.importances_mean.argsort()[::-1]:
    print(f"{X_test.columns[i]:<12}"
          f"{r.importances_mean[i]: .3f}"
          f" +/- {r.importances_std[i]:.3f}")

glucose      0.098 +/- 0.026
bmi          0.013 +/- 0.020
dpf          0.013 +/- 0.016
insulin      0.010 +/- 0.011
age          0.000 +/- 0.020
pregnancies -0.009 +/- 0.014
diastolic   -0.010 +/- 0.011
triceps     -0.012 +/- 0.009
# Get the sorted indices of the features according to mean importance
r_sorted_idx = r.importances_mean.argsort()

# Set the figure size
fig_size = (12, 12)
plt.rcParams.update({'font.size': 22})
# Create a boxplot
fig, ax = plt.subplots(figsize=fig_size)
ax.boxplot(
    r.importances[r_sorted_idx].T,
    vert=False,
    labels=X_test.columns[r_sorted_idx],
)

# Add axis labels
plt.xlabel('Feature importance')
plt.ylabel('Features')

plt.show()
Random Forest feature importance boxplots
  • Partial dependence plot for the “glucose” feature
# Import the PartialDependenceDisplay function
from sklearn.inspection import PartialDependenceDisplay

# Set the figure size
fig_size = (10, 6)
plt.rcParams.update({'font.size': 22})
# Create a plot of partial dependence for the 'glucose' feature
f, ax = plt.subplots(figsize=fig_size)
PartialDependenceDisplay.from_estimator(rf, X_train, ['glucose'], kind='both', subsample=50,
                                       ax=ax)

# Add a title to the plot
plt.title('Partial dependence plot for the "glucose" feature')

plt.show()
Partial dependence plot for the "glucose" feature
  • Partial dependence plot for the “bmi” feature
f, ax = plt.subplots(figsize=(10,6))
plt.rcParams.update({'font.size': 22})
PartialDependenceDisplay.from_estimator(rf, X_train, ['bmi'], kind='both', subsample=50,
                                       ax=ax)
plt.show()
Partial dependence plot for the "bmi" feature
  • Partial dependence plot for the “age” feature
f, ax = plt.subplots(figsize=(10,6))
plt.rcParams.update({'font.size': 22})
PartialDependenceDisplay.from_estimator(rf, X_train, ['age'], kind='both',  subsample=50, 
                                       ax=ax)
plt.show()
Partial dependence plot for the "age" feature
  • Partial dependence plot for the “bmi-glucose” features
f, ax = plt.subplots(figsize=(12,8))
plt.rcParams.update({'font.size': 22})
PartialDependenceDisplay.from_estimator(rf, X_train, [('glucose', 'bmi')], kind='average',  subsample=50, 
                                        ax=ax)
plt.show()
Partial dependence plot for the "bmi-glucose" features
  • Implementing the ML diabetes predictor App in Gradio
#https://www.kaggle.com/code/alexanderlundervold/2023-dat158-1-1-simple-examples-ipynb
import gradio as gr
def predict_diabetes(age, bmi, glucose):
    """
    Predicts whether a person is diabetic or non-diabetic based on their age, BMI, and glucose level.
    
    Args:
        age (float): The person's age in years.
        bmi (float): The person's body mass index (BMI).
        glucose (float): The person's glucose level in mg/dL.
        
    Returns:
        str: "Diabetic" if the person is predicted to be diabetic, "Non-diabetic" otherwise.
    """
    
    # Use the mean values for the unused features
    mean_insulin = diabetes.insulin.mean()
    mean_dpf = diabetes.dpf.mean()
    mean_pregnancies = diabetes.pregnancies.mean()
    mean_triceps = diabetes.triceps.mean()
    mean_diastolic = diabetes.diastolic.mean()

    # Reshape the input data for prediction
    input_data = np.array([mean_pregnancies, glucose, mean_diastolic, mean_triceps, 
                           mean_insulin, bmi, mean_dpf, age]).reshape(1, -1)
    
    # Get the prediction
    prediction = rf.predict(input_data)
    
    # Return the result
    return "Diabetic" if prediction[0] == 1 else "Non-diabetic"
# Set the minimum, maximum, and default values for the sliders
age_min, age_max, age_default = 0, 100, 30
bmi_min, bmi_max, bmi_default = 15, 50, 25
glucose_min, glucose_max, glucose_default = 0, 200, 100

# Create the interface
iface = gr.Interface(
    fn=predict_diabetes, 
    inputs=[
        gr.components.Slider(minimum=age_min, maximum=age_max, value=age_default, label="Age"),
        gr.components.Slider(minimum=bmi_min, maximum=bmi_max, value=bmi_default, label="BMI"),
        gr.components.Slider(minimum=glucose_min, maximum=glucose_max, value=glucose_default, label="Glucose Level")
    ], 
    outputs=gr.components.Textbox(label="Prediction"),
    title="Diabetes Predictor",
    description="""Enter your age, BMI, and glucose level to predict whether you are diabetic or non-diabetic.
    Data source: Pima Indians Diabetes Database; Model: Random Forest Classifier""",
)

# Launch the interface
iface.launch(share=True)
ML diabetes predictor

Crop Recommendation

  • Let’s look at the crop recommendation algorithm with Gradio deployment
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.read_csv("Crop_recommendation.csv")
plt.figure(figsize=(12,8))
plt.rcParams.update({'font.size': 18})
sns.heatmap(df.corr(numeric_only=True), annot=True)
Crop features correlation heatmap
  • Training, testing and validating several ML models
features = df[['N','P','K','temperature','humidity','ph','rainfall']]
target = df['label']
labels = df['label']
acc=[]
model = []
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest = train_test_split(features,target,random_state=42,test_size=0.2)

  • Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
from sklearn.metrics import classification_report
dt = DecisionTreeClassifier(criterion="entropy",random_state=42,max_depth=5)
dt.fit(xtrain,ytrain)
predicted_values = dt.predict(xtest)
x = metrics.accuracy_score(ytest, predicted_values)
acc.append(x)
model.append('Decision Tree')
print("DecisionTrees's Accuracy is: ", x*100)

print(classification_report(ytest,predicted_values))

DecisionTrees's Accuracy is:  86.5909090909091
              precision    recall  f1-score   support

       apple       1.00      1.00      1.00        23
      banana       1.00      1.00      1.00        21
   blackgram       0.61      1.00      0.75        20
    chickpea       1.00      0.96      0.98        26
     coconut       0.96      0.96      0.96        27
      coffee       1.00      1.00      1.00        17
      cotton       1.00      1.00      1.00        17
      grapes       1.00      1.00      1.00        14
        jute       0.63      0.96      0.76        23
 kidneybeans       0.00      0.00      0.00        20
      lentil       0.42      1.00      0.59        11
       maize       1.00      1.00      1.00        21
       mango       1.00      1.00      1.00        19
   mothbeans       0.00      0.00      0.00        24
    mungbean       1.00      1.00      1.00        19
   muskmelon       1.00      1.00      1.00        17
      orange       1.00      1.00      1.00        14
      papaya       1.00      0.87      0.93        23
  pigeonpeas       0.59      1.00      0.74        23
 pomegranate       1.00      0.91      0.95        23
        rice       0.92      0.63      0.75        19
  watermelon       1.00      1.00      1.00        19

    accuracy                           0.87       440
   macro avg       0.82      0.88      0.84       440
weighted avg       0.82      0.87      0.83       440

from sklearn.model_selection import cross_val_score
score = cross_val_score(dt, features, target,cv=5)
score
array([0.93636364, 0.91136364, 0.92045455, 0.87272727, 0.93636364])
import pickle as pk
pk.dump(dt,open('Decision Tree','wb'))
  • Gaussian NB
from sklearn.naive_bayes import GaussianNB 
gnb = GaussianNB()

gnb.fit(xtrain,ytrain)

predicted_values = gnb.predict(xtest)
x = metrics.accuracy_score(ytest, predicted_values)
acc.append(x)
model.append('Naive Bayes')
print("Naive Bayes's Accuracy is: ", x)

print(classification_report(ytest,predicted_values))

Naive Bayes's Accuracy is:  0.9954545454545455
              precision    recall  f1-score   support

       apple       1.00      1.00      1.00        23
      banana       1.00      1.00      1.00        21
   blackgram       1.00      1.00      1.00        20
    chickpea       1.00      1.00      1.00        26
     coconut       1.00      1.00      1.00        27
      coffee       1.00      1.00      1.00        17
      cotton       1.00      1.00      1.00        17
      grapes       1.00      1.00      1.00        14
        jute       0.92      1.00      0.96        23
 kidneybeans       1.00      1.00      1.00        20
      lentil       1.00      1.00      1.00        11
       maize       1.00      1.00      1.00        21
       mango       1.00      1.00      1.00        19
   mothbeans       1.00      1.00      1.00        24
    mungbean       1.00      1.00      1.00        19
   muskmelon       1.00      1.00      1.00        17
      orange       1.00      1.00      1.00        14
      papaya       1.00      1.00      1.00        23
  pigeonpeas       1.00      1.00      1.00        23
 pomegranate       1.00      1.00      1.00        23
        rice       1.00      0.89      0.94        19
  watermelon       1.00      1.00      1.00        19

    accuracy                           1.00       440
   macro avg       1.00      1.00      1.00       440
weighted avg       1.00      1.00      1.00       440
score = cross_val_score(gnb,features,target,cv=5)
score
array([0.99772727, 0.99545455, 0.99545455, 0.99545455, 0.99090909])
pk.dump(gnb,open('Guassian Naive Bayes','wb'))
  • SVM/SVC
from sklearn.svm import SVC
# data normalization with sklearn
from sklearn.preprocessing import MinMaxScaler
# fit scaler on training data
norm = MinMaxScaler().fit(xtrain)
x_train_norm = norm.transform(xtrain)
# transform testing dataabs
x_test_norm = norm.transform(xtest)
SVM = SVC(kernel='poly', degree=3, C=1)
SVM.fit(x_train_norm,ytrain)
predicted_values = SVM.predict(x_test_norm)
x = metrics.accuracy_score(ytest, predicted_values)
acc.append(x)
model.append('SVM')
print("SVM's Accuracy is: ", x)

print(classification_report(ytest,predicted_values))

SVM's Accuracy is:  0.9636363636363636
              precision    recall  f1-score   support

       apple       1.00      1.00      1.00        23
      banana       1.00      1.00      1.00        21
   blackgram       0.90      0.95      0.93        20
    chickpea       1.00      1.00      1.00        26
     coconut       1.00      1.00      1.00        27
      coffee       1.00      0.94      0.97        17
      cotton       0.94      1.00      0.97        17
      grapes       1.00      1.00      1.00        14
        jute       0.80      0.87      0.83        23
 kidneybeans       0.91      1.00      0.95        20
      lentil       0.79      1.00      0.88        11
       maize       1.00      0.95      0.98        21
       mango       1.00      1.00      1.00        19
   mothbeans       1.00      0.92      0.96        24
    mungbean       1.00      1.00      1.00        19
   muskmelon       1.00      1.00      1.00        17
      orange       1.00      1.00      1.00        14
      papaya       0.96      1.00      0.98        23
  pigeonpeas       1.00      0.83      0.90        23
 pomegranate       1.00      1.00      1.00        23
        rice       0.88      0.79      0.83        19
  watermelon       1.00      1.00      1.00        19

    accuracy                           0.96       440
   macro avg       0.96      0.97      0.96       440
weighted avg       0.97      0.96      0.96       440
score = cross_val_score(SVM,features,target,cv=5)
score

array([0.97954545, 0.975     , 0.98863636, 0.98863636, 0.98181818])

pk.dump(SVM,open('SVM','wb'))
  • Logistic Regression
from sklearn.linear_model import LogisticRegression

LogReg = LogisticRegression(random_state=2)

LogReg.fit(xtrain,ytrain)

predicted_values = LogReg.predict(xtest)

x = metrics.accuracy_score(ytest, predicted_values)
acc.append(x)
model.append('Logistic Regression')
print("Logistic Regression's Accuracy is: ", x)

print(classification_report(ytest,predicted_values))

Logistic Regression's Accuracy is:  0.9454545454545454
              precision    recall  f1-score   support

       apple       1.00      1.00      1.00        23
      banana       0.95      1.00      0.98        21
   blackgram       0.83      0.75      0.79        20
    chickpea       1.00      1.00      1.00        26
     coconut       1.00      1.00      1.00        27
      coffee       0.94      1.00      0.97        17
      cotton       0.80      0.94      0.86        17
      grapes       1.00      1.00      1.00        14
        jute       0.91      0.87      0.89        23
 kidneybeans       1.00      0.95      0.97        20
      lentil       0.83      0.91      0.87        11
       maize       0.94      0.76      0.84        21
       mango       0.95      1.00      0.97        19
   mothbeans       0.85      0.92      0.88        24
    mungbean       0.95      1.00      0.97        19
   muskmelon       1.00      1.00      1.00        17
      orange       1.00      1.00      1.00        14
      papaya       0.95      0.91      0.93        23
  pigeonpeas       0.95      0.91      0.93        23
 pomegranate       1.00      1.00      1.00        23
        rice       0.89      0.89      0.89        19
  watermelon       1.00      1.00      1.00        19

    accuracy                           0.95       440
   macro avg       0.94      0.95      0.94       440
weighted avg       0.95      0.95      0.94       440

pk.dump(LogReg,open('LogisticRegression','wb'))
  • Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier

RF = RandomForestClassifier(n_estimators=20, random_state=0)
RF.fit(xtrain,ytrain)

predicted_values = RF.predict(xtest)

x = metrics.accuracy_score(ytest, predicted_values)
acc.append(x)
model.append('RF')
print("RF's Accuracy is: ", x)

RF's Accuracy is:  0.9931818181818182
  • ML accuracy comparison bar plot
plt.figure(figsize=[10,5])
plt.rcParams.update({'font.size': 22})
plt.title('Accuracy Comparison')
plt.xlabel('Accuracy')
plt.ylabel('Algorithm')
sns.barplot(x = acc,y = model,palette='dark')
Crop prediction ML accuracy comparison bar plot
accuracy_models = dict(zip(model, acc))
for k, v in accuracy_models.items():
    print (k, '-->', v)
Decision Tree --> 0.865909090909091
Naive Bayes --> 0.9954545454545455
SVM --> 0.9636363636363636
Logistic Regression --> 0.9454545454545454
RF --> 0.9931818181818182

data = np.array([[104,18, 30, 23.603016, 60.3, 6.7, 140.91]])
prediction = RF.predict(data)
print(prediction)
['coffee']

import gradio as gr
def crop_recom(N,P,K,tem,hum,ph,rain):
    crop = RF.predict([[N,P,K,tem,hum,ph,rain]])
    return crop[0]
  • Run Gradio app
interface = gr.Interface(
  fn=crop_recom,  # prediction function
  inputs=['number', 'number', 'number', 'number', 'number', 'number', 'number'],
  outputs=['text']
).launch(share=False)
Crop recommendation app

Student Prediction Score

  • Predicting the Score of a student based upon the number of study hours.
#https://medium.com/@kalyaniavhale7/tutorial-on-gradio-library-ecb8055923a1
import gradio as gr
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
#load the dataset to pandas dataframe
URL = "http://bit.ly/w-data"
student_data = pd.read_csv(URL)
#Prepare data
X = student_data.copy()
y = student_data['Scores']
del X['Scores']
#create a machine learning model and train it
lineareg = LinearRegression()
lineareg.fit(X,y)
print('Accuracy score : ',lineareg.score(X,y),'\n')
#now that the model has been trained, let's test it
#function to predict the input hours
def predict_score(hours):
    hours = np.array(hours)
    pred_score = lineareg.predict(hours.reshape(-1,1))
    return np.round(pred_score[0], 2)
input = gr.inputs.Number(label='Number of Hours studied')
output = gr.outputs.Textbox(label='Predicted Score')
gr.Interface( fn=predict_score,
              inputs=input,
              outputs=output).launch();

Accuracy score :  0.9529481969048356 

Running on local URL:  

To create a public link, set `share=True` in `launch()`.
Student predicted score vs Number of hours studied
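Note that the reported score above is the R² computed on the training data itself. A minimal sketch of evaluating on a held-out test set instead (the split is an assumption, not part of the original tutorial):

from sklearn.model_selection import train_test_split

# Hold out a test set rather than scoring on the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lineareg = LinearRegression()
lineareg.fit(X_train, y_train)
print('Test R2 score:', lineareg.score(X_test, y_test))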

Image-to-Sketch Conversion

  • Converting an RGB image to a sketch using the OpenCV library
#https://medium.com/@kalyaniavhale7/tutorial-on-gradio-library-ecb8055923a1
import cv2
import gradio as gr
def convert_photo_to_Sketch(image):
  img = cv2.resize(image, (256, 256))
  #convert image to RGB from BGR
  RGB_img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
  #convert image to grey
  grey_img=cv2.cvtColor(RGB_img, cv2.COLOR_BGR2GRAY)
  #invert grey scale image
  invert_img=255-grey_img
  #Gaussian blur to soften the image
  blur_img=cv2.GaussianBlur(invert_img, (21,21),0)
  #invert the blur image
  inverted_blurred_img = 255 - blur_img
  #sketch the image
  sketch_img=cv2.divide(grey_img,inverted_blurred_img, scale=256.0)
  rgb_sketch=cv2.cvtColor(sketch_img, cv2.COLOR_BGR2RGB)
  #return the final sketched image
  return rgb_sketch
 
#built interface with gradio to test the function
imagein = gr.inputs.Image(label='Original Image')
imageout =  gr.outputs.Image(label='Sketched Image',type='pil')
gr.Interface(fn=convert_photo_to_Sketch, inputs=imagein, outputs=imageout,title='Convert RGB Image to Sketch').launch();
Converting RGB image to sketch using opencv library

Text Substitution

  • Let’s perform text substitution in Gradio, e.g. text.replace(‘World’, ‘Databricks’)
import gradio as gr
import time

def replace(text):
    return text.replace('World', 'Databricks')

gr.Interface(fn=replace, 
             inputs='textbox', 
             outputs='textbox').launch(share=True);

Text substitution in Gradio

Story Generation with GPT-2

import gradio as gr

description = "Story generation with GPT-2"

title = "Generate your own story"

examples = [["It was the best of time it was the worst of times. It was difficult for me to love someone. Until I met her. It happened on a rainy evening."]]





interface = gr.Interface.load("huggingface/pranavpsv/gpt2-genre-story-generator",
                              description=description,
                              examples=examples)

interface.launch(share=True);

Fetching model from: https://huggingface.co/pranavpsv/gpt2-genre-story-generator
Running on local URL:  
 Story generation with GPT-2

BigGAN Text-to-Image Demo

  • Let’s look at the Sarus tech portfolio, a TensorFlow 2 implementation of the BigGAN deep neural network, namely the BigGAN ImageNet text-to-image demo.
import gradio as gr
description = "BigGAN text-to-image demo."
title = "BigGAN ImageNet"
interface = gr.Interface.load("huggingface/osanseviero/BigGAN-deep-128", 
            description=description,
            title = title,
            examples=[["american robin"]]
)
interface.launch(share=True)

Fetching model from: https://huggingface.co/osanseviero/BigGAN-deep-128
Running on local URL:  

BigGAN text-to-image demo: American robin and tiger.

BigGAN text-to-image demo American robin
BigGAN text-to-image demo tiger

French Story Generator using Opus MT and GPT-2

  • LLM integration test: let’s try to generate our own D&D story using Opus MT and GPT-2
import gradio as gr
from gradio.mix import Series

description = "Generate your own D&D story!"
title = "French Story Generator using Opus MT and GPT-2"
translator_fr = gr.Interface.load("huggingface/Helsinki-NLP/opus-mt-fr-en")
story_gen = gr.Interface.load("huggingface/pranavpsv/gpt2-genre-story-generator")
translator_en = gr.Interface.load("huggingface/Helsinki-NLP/opus-mt-en-fr")
examples = [["L'aventurier est approché par un mystérieux étranger, pour une nouvelle quête."]]

Series(translator_fr, story_gen, translator_en, description = description,
        title = title,
        examples=examples, inputs = gr.inputs.Textbox(lines = 10)).launch(share=True)

Fetching model from: https://huggingface.co/Helsinki-NLP/opus-mt-fr-en
Fetching model from: https://huggingface.co/pranavpsv/gpt2-genre-story-generator
Fetching model from: https://huggingface.co/Helsinki-NLP/opus-mt-en-fr
Running on local URL:  
French Story Generator using Opus MT and GPT-2

Input: The adventurer is approached by a mysterious stranger for a new quest.

Output: L’aventurier est approché par un mystérieux étranger pour une nouvelle quête. De son chemin, l’aventurier rencontre la mystérieuse baronne, Tzachi, mais la rencontre dans le processus.

Image Classification with ImageNet

  • Let’s download the human-readable labels for ImageNet to perform image classification
#https://www.kaggle.com/code/abidlabs/gradio-inceptionnet-demo
import gradio as gr
import tensorflow as tf
import requests
import urllib

inception_net = tf.keras.applications.InceptionV3() # load the model

# Download human-readable labels for ImageNet.
response = requests.get("https://git.io/JJkYN")
labels = response.text.split("\n")
# Download example image
urllib.request.urlretrieve("https://git.io/JtScc", "lion.jpg")

def classify_image(inp):
    inp = inp[None, ...]  # Add batch dimension
    inp = tf.keras.applications.inception_v3.preprocess_input(inp)
    prediction = inception_net.predict(inp).flatten()
    return {labels[i]: float(prediction[i]) for i in range(1000)}

# Create the Gradio Interface
image = gr.inputs.Image(shape=(299, 299))
label = gr.outputs.Label(num_top_classes=3)

gr.Interface(fn=classify_image, inputs=image, outputs=label, examples=[["lion.jpg"]], interpretation="default").launch(share=True);

Running on local URL:

Image classification examples: lion, dog, and cat.

Image classification example: tabby 95%
Image classification example: German shepherd 90%
Image classification example: tabby 90%

Iris Flower Predictor

  • Let’s use the Iris dataset to predict whether an Iris flower is a Versicolor, Virginica, or Setosa
#https://www.kaggle.com/code/alexanderlundervold/simple-gradio-kaggle-example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import gradio as gr
# Load the data and split into data and labels
from sklearn.datasets import load_iris
iris_dataset = load_iris()
# We use only the 0th and 1st columns.
# These correspond to sepal length and width.
X = iris_dataset['data'][:, [0, 1]]
y = iris_dataset['target']

# Split into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Import and train a machine learning model
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42, n_estimators=100)
rf.fit(X_train, y_train)

# Predict on the test data
y_pred = rf.predict(X_test)

# Evaluate the model
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
0.7894736842105263
def predict_flower(sepal_length, sepal_width):
    """
    Predicts whether an Iris flower is a Versicolor, Virginica, or Setosa.
    
    Args:
        sepal_length (float): The flower's average sepal length in cm.
        sepal_width (float): The flower's average sepal width in cm.
        
    Returns:
        str: "Virginica", "Setosa", or "Versicolor".
    """

    # Reshape the input data for prediction
    input_data = np.array([sepal_length, sepal_width]).reshape(1, -1)
    
    # Get the prediction
    prediction = rf.predict(input_data)
    
    # Return the result
    if prediction[0] == 0:
        return "Setosa"
    elif prediction[0] == 1:
        return "Versicolor"
    elif prediction[0] == 2:
        return "Virginica"
    else:
        raise Exception
# Set the minimum, maximum, and default values for the sliders
# This is optional
sl_min = X_train[:,0].min().round(2) 
sl_max = X_train[:,0].max().round(2) 
sl_default = X_train[:,0].mean().round(2)

sw_min = X_train[:,1].min().round(2) 
sw_max = X_train[:,1].max().round(2) 
sw_default = X_train[:,1].mean().round(2)

# Create the interface
iface = gr.Interface(
    fn=predict_flower, 
    inputs=[
        gr.components.Slider(minimum=sl_min, maximum=sl_max, 
                             value=sl_default, label="Sepal length"),
        gr.components.Slider(minimum=sw_min, maximum=sw_max, 
                             value=sw_default, label="Sepal width"),
    ], 
    outputs=gr.components.Textbox(label="Prediction"),
    title="Flower Predictor",
    description="""Enter your measurement of Sepal Length and Sepal Width to 
    predict the class of flower.
    Data source: Iris Flower Dataset; Model: Random Forest Classifier""",
)

# Launch the interface
iface.launch(share=True,debug=True)
Running on local URL:
Iris flower predictor

Hugging Face NLP Sentiment Analysis

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
  • QC test examples of the classifier
classifier("I hate that movie and the cast was bad ")
[{'label': 'NEGATIVE', 'score': 0.9996799230575562}]

classifier("The meal was delicious ")
[{'label': 'POSITIVE', 'score': 0.9998772144317627}]

classifier("That is an ordinary burger ")
[{'label': 'NEGATIVE', 'score': 0.9684451818466187}]
  • Initializing BertForSequenceClassification
classifier2 = pipeline('sentiment-analysis', model='bert-base-uncased')
  • QC test examples of the classifier2
classifier2("I loved that movie and the cast was good ")
[{'label': 'LABEL_1', 'score': 0.5746493935585022}]

results = classifier(["We are very happy to show you the 🤗 Transformers library.",
           "We hope you don't hate it."])
for i in results:
    print(f"{i['label']}, {round(i['score'],2)}")
POSITIVE, 1.0
NEGATIVE, 0.53
  • Initializing and QC testing the text generator
generator = pipeline('text-generation')

print(generator("As far as I am concerned, I will",))
[{'generated_text': 'As far as I am concerned, I will not become a member: if you want, I will show you more images of my real life.\n\nI will not use your personal pictures. When I use your picture on something (for example a'}]
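The text-generation pipeline also accepts standard generation arguments; a small sketch with illustrative values:

outputs = generator("As far as I am concerned, I will",
                    max_length=40,           # cap the generated length
                    num_return_sequences=2,  # draw two samples
                    do_sample=True)
for out in outputs:
    print(out['generated_text'])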
unmasker = pipeline("fill-mask")
from pprint import pprint
pprint(unmasker(f"HuggingFace is creating a {unmasker.tokenizer.mask_token} that the community uses to solve NLP tasks."))
[{'score': 0.1792752891778946,
  'sequence': 'HuggingFace is creating a tool that the community uses to solve '
              'NLP tasks.',
  'token': 3944,
  'token_str': ' tool'},
 {'score': 0.113493911921978,
  'sequence': 'HuggingFace is creating a framework that the community uses to '
              'solve NLP tasks.',
  'token': 7208,
  'token_str': ' framework'},
 {'score': 0.05243545398116112,
  'sequence': 'HuggingFace is creating a library that the community uses to '
              'solve NLP tasks.',
  'token': 5560,
  'token_str': ' library'},
 {'score': 0.034935273230075836,
  'sequence': 'HuggingFace is creating a database that the community uses to '
              'solve NLP tasks.',
  'token': 8503,
  'token_str': ' database'},
 {'score': 0.02860250696539879,
  'sequence': 'HuggingFace is creating a prototype that the community uses to '
              'solve NLP tasks.',
  'token': 17715,
  'token_str': ' prototype'}]
ner_pipe = pipeline("ner")
sequence = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""
print(ner_pipe(sequence))
[{'entity': 'I-ORG', 'score': 0.99957865, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}, {'entity': 'I-ORG', 'score': 0.9909764, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}, {'entity': 'I-ORG', 'score': 0.9982224, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}, {'entity': 'I-ORG', 'score': 0.9994879, 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16}, {'entity': 'I-LOC', 'score': 0.9994344, 'index': 11, 'word': 'New', 'start': 40, 'end': 43}, {'entity': 'I-LOC', 'score': 0.9993197, 'index': 12, 'word': 'York', 'start': 44, 'end': 48}, {'entity': 'I-LOC', 'score': 0.9993794, 'index': 13, 'word': 'City', 'start': 49, 'end': 53}, {'entity': 'I-LOC', 'score': 0.98625815, 'index': 19, 'word': 'D', 'start': 79, 'end': 80}, {'entity': 'I-LOC', 'score': 0.951427, 'index': 20, 'word': '##UM', 'start': 80, 'end': 82}, {'entity': 'I-LOC', 'score': 0.9336589, 'index': 21, 'word': '##BO', 'start': 82, 'end': 84}, {'entity': 'I-LOC', 'score': 0.9761654, 'index': 28, 'word': 'Manhattan', 'start': 114, 'end': 123}, {'entity': 'I-LOC', 'score': 0.9914629, 'index': 29, 'word': 'Bridge', 'start': 124, 'end': 130}]
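The NER pipeline above returns one entry per sub-word token; passing aggregation_strategy groups them into whole entities (a minimal sketch):

ner_grouped = pipeline("ner", aggregation_strategy="simple")
# Each result now carries an 'entity_group' (e.g. ORG, LOC) and the merged word span
print(ner_grouped(sequence))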
question_answerer = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""
result = question_answerer(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
​
result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'the task of extracting an answer from a text given a question', score: 0.6177, start: 34, end: 95
Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
summarizer = pipeline("summarization")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE))

[{'summary_text': ' Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, nine of them between 1999 and 2002 . She is believed to still be married to four men, and at one time, she was married to eight men at once .'}]
translator = pipeline("translation_en_to_de")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
  • Let’s invoke the nlptown/bert-base-multilingual-uncased-sentiment model with a QC test
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
import pandas as pd
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

# Quick check: tokenize a sample review and run it through the model
tokens = tokenizer.encode('It was good but couldve been better. Great', return_tensors='pt')
result = model(tokens)

r = requests.get('https://www.yelp.com/biz/social-brew-cafe-pyrmont')
soup = BeautifulSoup(r.text, 'html.parser')
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':regex})
reviews = [result.text for result in results]

reviews
["Very cute coffee shop and restaurant. They have a lovely outdoor seating area and several tables inside.  It was fairly busy on a Tuesday morning but we were to grab the last open table. The server was so enjoyable, she chatted and joked with us and provided fast service with our ordering, drinks and meals. The food was very good. We ordered a wide variety and every meal was good to delicious. The sweet potato fries on the Chicken Burger plate were absolutely delicious, some of the best I've ever had. I definitely enjoyed this cafe, the outdoor seating, the service and the food!!",
 "Six of us met here for breakfast before our walk to Manly. We were enjoying visiting with each other so much that I apologize for not taking any photos. We all enjoyed our food, as well as our coffee and tea drinks.We were greeted immediately by a friendly server asking if we would like to sit inside or out. We said we would like inside, but weren't exactly sure how many were joining us yet- at least 4. We were told this was no problem, the more the merrier. A few minutes later when 4 more joined our party and we explained to the server we had 6, he just quickly switched our table. I really enjoyed my serenity tea, just what I needed after a long flight in from Sfo that morning. Everyone else were more interested in the lattes for expresso drinks. All said they were hot and delicious. 2 of us ordered the avo on toast. So yummy with the beetroot... I will start adding this to mine now at home, and have fond memories for my trip to Sydney. 2 friends ordered the salmon Benedict- saying it was delicious, and their go to every time they come here. 2 friends had a breakfast sandwich- I'm not sure of the name. It did look delicious. Adorable cafe, friendly staff, clean restroomsVery popular with the locals. I plan to come back the next time I'm in Sydney",
 'Great place with delicious food and friendly staff. It is small but has outdoor seating and a relaxed ambiance. Perfect place to enjoy a cup of coffee. I am visiting Sydney for the first time but this place seems like is a local favorite.',
 'Some of the best Milkshakes me and my daughter ever tasted. MMMMMM HMMMMMMMM.',
 'Great food amazing coffee and tea. Short walk from the harbor. Staff was very friendly',
 "It was ok. Had coffee with my friends. I'm new in the area, still need to discover new places.",
 "Ricotta hot cakes! These were so yummy. I ate them pretty fast and didn't share with anyone because they were that good ;). I ordered a green smoothie to balance it all out. Smoothie was a nice way to end my brekkie at this restaurant. Others with me ordered the salmon Benedict and the smoked salmon flatbread. They were all delicious and all plates were empty. Cheers!",
 "We came for brunch twice in our week-long visit to Sydney. Everything on the menu not only sounds delicious, but is really tasty. It really gave us a sour taste of how bad breaky is in America with what's so readily available in Sydney!  Both days we went were Saturdays and there was a bit of a wait to be seated, the cafe is extremely busy for both dine-in and take-away. Service is fairly quick and servers are all friendly. The location is in Surrey Hills a couple blocks away from the bustling touristy Darling Harbor.The green smoothie is very tasty and refreshing. We tried the smoked salmon salad, the soft shell crab tacos, ricotta hotcakes, and the breaky sandwich. All were delicious, well seasoned, and a solid amount of food for the price. A definite recommend for anyone's trip into Sydney!",
 'Great staff and food.  Must try is the pan fried Gnocchi!  The staff were really friendly and the coffee was good as well',
 'I came to Social brew cafe for brunch while exploring the city and on my way to the aquarium. I sat outside. The service was great and the food was good too!I ordered smoked salmon, truffle fries, black coffee and beer.']
df = pd.DataFrame(np.array(reviews), columns=['review'])
df['review'].iloc[0]
"Very cute coffee shop and restaurant. They have a lovely outdoor seating area and several tables inside.  It was fairly busy on a Tuesday morning but we were to grab the last open table. The server was so enjoyable, she chatted and joked with us and provided fast service with our ordering, drinks and meals. The food was very good. We ordered a wide variety and every meal was good to delicious. The sweet potato fries on the Chicken Burger plate were absolutely delicious, some of the best I've ever had. I definitely enjoyed this cafe, the outdoor seating, the service and the food!!"
def sentiment_score(review):
    tokens = tokenizer.encode(review, return_tensors='pt')
    result = model(tokens)
    return int(torch.argmax(result.logits))+1
sentiment_score(df['review'].iloc[1])
4
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))
df
                                                 review	sentiment
0	Very cute coffee shop and restaurant. They hav...	4
1	Six of us met here for breakfast before our wa...	4
2	Great place with delicious food and friendly s...	5
3	Some of the best Milkshakes me and my daughter...	5
4	Great food amazing coffee and tea. Short walk ...	5
5	It was ok. Had coffee with my friends. I'm new...	3
6	Ricotta hot cakes! These were so yummy. I ate ...	5
7	We came for brunch twice in our week-long visi...	4
8	Great staff and food. Must try is the pan fri...	5
9	I came to Social brew cafe for brunch while ex...	5
df['review'].iloc[3]
'Some of the best Milkshakes me and my daughter ever tasted. MMMMMM HMMMMMMMM.'
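  • Note that slicing each review to its first 512 characters (x[:512]) is only a rough guard against the model’s 512-token input limit, since characters and tokens are not equivalent. A minimal alternative sketch, assuming the same tokenizer and model as above, truncates at the token level instead:
def sentiment_score_truncated(review):
    # Hypothetical helper: truncate to the model's maximum input length at the token level
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    result = model(tokens)
    return int(torch.argmax(result.logits)) + 1

df['sentiment'] = df['review'].apply(sentiment_score_truncated)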
  • Creating the EN-to-DE translator app in Gradio with the QC test
import gradio as gr
from transformers import pipeline

translation_pipeline = pipeline('translation_en_to_de')
results = translation_pipeline('I love Juventus')
results[0]['translation_text']
'Ich liebe Juventus'
def translate_transformers(from_text):
    results = translation_pipeline(from_text)
    return results[0]['translation_text']
translate_transformers('My name is Alex')
'Mein Name ist Alex'
interface = gr.Interface(fn=translate_transformers, 
                         inputs=gr.inputs.Textbox(lines=2, placeholder='Text to translate'),
                        outputs='text')
interface.launch(share=True)
Running on local URL:
EN-to-DE translator in Gradio
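  • The gr.inputs.Textbox syntax above targets older Gradio releases; in Gradio 3.x and later the input components are referenced at the top level. A minimal sketch of the same interface for newer versions, reusing the translate_transformers function defined above:
# Equivalent interface for newer Gradio versions (gr.Textbox instead of gr.inputs.Textbox)
interface = gr.Interface(fn=translate_transformers,
                         inputs=gr.Textbox(lines=2, placeholder='Text to translate'),
                         outputs='text')
interface.launch(share=True)
  • Article summarization of a web article using the BeautifulSoup HTML parser and the default summarization pipeline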
from transformers import pipeline
from bs4 import BeautifulSoup
import requests
summarizer = pipeline("summarization")
URL = "https://towardsdatascience.com/a-bayesian-take-on-model-regularization-9356116b6457"
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all(['h1', 'p'])
text = [result.text for result in results]
ARTICLE = ' '.join(text)
ARTICLE
'Sign up Sign In Sign up Sign In Member-only story A Bayesian Take On Model Regularization Ryan Sander Follow Towards Data Science -- 1 Share I’m currently reading “How We Learn” by Stanislas Dehaene. First off, I cannot recommend this book enough to anyone interested in learning, teaching, or AI. One of the main themes of this book is explaining the neurological and psychological bases of why humans are so good at learning things quickly and with great sample-efficiency, i.e. given only a limited amount of experience¹. One of Dehaene’s main arguments of why humans can learn so effectively is because we are able to reduce the complexity of models we formulate of the world. In accordance with the principle of Occam’s Razor², we find the simplest model possible that explains the data we experience, rather than opting for more complicated models. But why do we do this, even from birth¹? One argument is that, contrary to the frequentist view in child psychology (the belief that babies learn solely through their experiences), we are already imparted with prior beliefs about the world when we are born¹. This notion of simplified model selection has a common name in the field of machine learning: model regularization. In this article, we’ll talk about regularization from a Bayesian perspective. What’s one way we can control the complexity of the models we learn from observations? We can do this by placing a prior on our distribution of models. Before we show this, let’s briefly go over regularization, in this case, analytic regularization for supervised learning. Background on Regularization In machine learning, regularization, or model complexity control, is an essential and common practice to ensure that a model attains high out-of-sample performance, even if the distribution of out-of-sample data (test/validation data) differs significantly from the distribution of in-sample data (training data). In essence, the model must balance having a small empirical loss (how “wrong” it is on the data it is given) with a small regularization loss (how complicated the model is). -- -- 1 Towards Data Science Image Scientist, MIT CSAIL Alum, Tutor, Dark Roast Coffee Fan, GitHub: https://github.com/rmsander/ Ryan Sander in Towards Data Science -- 3 Marco Peixeiro in Towards Data Science -- 18 Adrian H. Raudaschl in Towards Data Science -- 24 Ryan Sander in Towards Data Science -- 2 Marco Peixeiro in Towards Data Science -- 18 Cassie Kozyrkov -- 99 Okan Yenigün -- 1 Vishal Rajput in AIGuys -- 8 AL Anany -- 559 MS Somanna -- 1 Help Status About Careers Blog Privacy Terms Text to speech Teams'
# Chunk the article so each piece fits within the summarizer's input limit
max_chunk = 500

# Strip sentence-ending punctuation
ARTICLE = ARTICLE.replace('.', '')
ARTICLE = ARTICLE.replace('?', '')
ARTICLE = ARTICLE.replace('!', '')

# Split on spaces and pack the resulting tokens into chunks of at most max_chunk words
sentences = ARTICLE.split(' ')
current_chunk = 0
chunks = []
for sentence in sentences:
    if len(chunks) == current_chunk + 1:
        if len(chunks[current_chunk]) + len(sentence.split(' ')) <= max_chunk:
            chunks[current_chunk].extend(sentence.split(' '))
        else:
            current_chunk += 1
            chunks.append(sentence.split(' '))
    else:
        print(current_chunk)
        chunks.append(sentence.split(' '))

# Re-join each chunk of words back into a single string for the summarizer
for chunk_id in range(len(chunks)):
    chunks[chunk_id] = ' '.join(chunks[chunk_id])
0
chunks[0]
'Sign up Sign In Sign up Sign In Member-only story A Bayesian Take On Model Regularization Ryan Sander Follow Towards Data Science -- 1 Share I’m currently reading “How We Learn” by Stanislas Dehaene First off, I cannot recommend this book enough to anyone interested in learning, teaching, or AI One of the main themes of this book is explaining the neurological and psychological bases of why humans are so good at learning things quickly and with great sample-efficiency, ie given only a limited amount of experience¹ One of Dehaene’s main arguments of why humans can learn so effectively is because we are able to reduce the complexity of models we formulate of the world In accordance with the principle of Occam’s Razor², we find the simplest model possible that explains the data we experience, rather than opting for more complicated models But why do we do this, even from birth¹ One argument is that, contrary to the frequentist view in child psychology (the belief that babies learn solely through their experiences), we are already imparted with prior beliefs about the world when we are born¹ This notion of simplified model selection has a common name in the field of machine learning: model regularization In this article, we’ll talk about regularization from a Bayesian perspective What’s one way we can control the complexity of the models we learn from observations We can do this by placing a prior on our distribution of models Before we show this, let’s briefly go over regularization, in this case, analytic regularization for supervised learning Background on Regularization In machine learning, regularization, or model complexity control, is an essential and common practice to ensure that a model attains high out-of-sample performance, even if the distribution of out-of-sample data (test/validation data) differs significantly from the distribution of in-sample data (training data) In essence, the model must balance having a small empirical loss (how “wrong” it is on the data it is given) with a small regularization loss (how complicated the model is) -- -- 1 Towards Data Science Image Scientist, MIT CSAIL Alum, Tutor, Dark Roast Coffee Fan, GitHub: https://githubcom/rmsander/ Ryan Sander in Towards Data Science -- 3 Marco Peixeiro in Towards Data Science -- 18 Adrian H Raudaschl in Towards Data Science -- 24 Ryan Sander in Towards Data Science -- 2 Marco Peixeiro in Towards Data Science -- 18 Cassie Kozyrkov -- 99 Okan Yenigün -- 1 Vishal Rajput in AIGuys -- 8 AL Anany -- 559 MS Somanna -- 1 Help Status About Careers Blog Privacy Terms Text to speech Teams'
res = summarizer(chunks, max_length=120, min_length=30, do_sample=False)
res[0]

{'summary_text': ' We’ll talk about regularization from a Bayesian perspective . We can do this by placing a prior on our distribution of models . This notion of simplified model selection has a common name in the field of machine learning: model regularization .'}
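  • Since the summarizer returns one summary per chunk, the per-chunk outputs can be stitched back together. A minimal sketch:
# Join the per-chunk summaries into a single summary string
full_summary = ' '.join([summ['summary_text'] for summ in res])
print(full_summary)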
  • NLP sentence classification using the Persian demo model HooshvareLab/bert-fa-zwnj-base
#Persian
from transformers import AutoConfig, AutoTokenizer, AutoModel

# v3.0
model_name_or_path = "HooshvareLab/bert-fa-zwnj-base"
config = AutoConfig.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# model = TFAutoModel.from_pretrained(model_name_or_path)  # use TFAutoModel for TensorFlow
model = AutoModel.from_pretrained(model_name_or_path)

# Persian: "We at Hooshvare believe that with the proper transfer of knowledge and awareness, everyone can use intelligent tools. Our slogan is AI for everyone."
text = "ما در هوش‌واره معتقدیم با انتقال صحیح دانش و آگاهی، همه افراد میتوانند از ابزارهای هوشمند استفاده کنند. شعار ما هوش مصنوعی برای همه است."
tokenizer.tokenize(text)


['ما',
 'در',
 'هوش',
 '[ZWNJ]',
 'واره',
 'معتقدیم',
 'با',
 'انتقال',
 'صحیح',
 'دانش',
 'و',
 'آ',
 '##گاهی',
 '،',
 'همه',
 'افراد',
 'میتوانند',
 'از',
 'ابزارهای',
 'هوشمند',
 'استفاده',
 'کنند',
 '.',
 'شعار',
 'ما',
 'هوش',
 'مصنوعی',
 'برای',
 'همه',
 'است',
 '.']
from __future__ import print_function
import ipywidgets as widgets
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-base-uncased-sentiment-snappfood")
config = AutoConfig.from_pretrained("HooshvareLab/bert-fa-base-uncased-sentiment-snappfood")
model = AutoModelForSequenceClassification.from_pretrained("HooshvareLab/bert-fa-base-uncased-sentiment-snappfood")
model.save_pretrained("HooshvareLab-bert-fa-base-uncased-sentiment-snappfood")
tokenizer.save_pretrained("HooshvareLab-bert-fa-base-uncased-sentiment-snappfood")

('HooshvareLab-bert-fa-base-uncased-sentiment-snappfood\\tokenizer_config.json',
 'HooshvareLab-bert-fa-base-uncased-sentiment-snappfood\\special_tokens_map.json',
 'HooshvareLab-bert-fa-base-uncased-sentiment-snappfood\\vocab.txt',
 'HooshvareLab-bert-fa-base-uncased-sentiment-snappfood\\added_tokens.json',
 'HooshvareLab-bert-fa-base-uncased-sentiment-snappfood\\tokenizer.json')

nlp_sentence_classif = pipeline('sentiment-analysis', model="HooshvareLab-bert-fa-base-uncased-sentiment-snappfood/")
# Persian: "This dish is extremely delicious"
nlp_sentence_classif('این خوراک به شدت خوشمزه است')
[{'label': 'HAPPY', 'score': 0.9949201941490173}]
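  • In keeping with the Gradio deployment theme of this post, the saved Persian sentiment pipeline can also be wrapped in a small web app. A minimal sketch, assuming gradio is installed and reusing the nlp_sentence_classif pipeline loaded above:
import gradio as gr

def persian_sentiment(text):
    # Run the Snappfood sentiment pipeline and return the predicted label with its score
    result = nlp_sentence_classif(text)[0]
    return f"{result['label']} ({result['score']:.3f})"

gr.Interface(fn=persian_sentiment,
             inputs=gr.Textbox(lines=2, placeholder='Enter Persian text'),
             outputs='text').launch()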

Conclusions

  • We have presented the Hugging Face NLP, Streamlit/Dash & Jupyter PyGWalker EDA, and TensorFlow Keras & Gradio app deployment showcase as a sequence of pipelines, interactive dashboards, and simple web apps.
  • We have looked at the Tableau-style PyGWalker app in Streamlit using the bike_sharing_dc.csv dataset. Other visualization examples include the Airbnb EDA (implementing the PyGWalker Dash UI in the Jupyter IDE with gradio/NYC-Airbnb-Open-Data), the Kaggle house-price dataset showcasing PyGWalker EDA in real estate, and the PyGWalker EDA of Dutch eHealth data.
  • Image analysis: we have demonstrated the Keras Multi-Label Image Classification, Image Classification with ImageNet, Image-to-Sketch Conversion (converting an RGB image to a sketch with the OpenCV library), BigGAN Text-to-Image, and Iris Flower Predictor pipelines and apps.
  • ML Diabetes Prediction App: we have implemented the Diabetes Prediction App using the Gradio tutorial for ML (cf. Gradio documentation) and the SciKit-learn diabetes dataset. This includes the detailed RF Feature Engineering analysis involving partial dependence plots of selected model features.
  • The Super Soaker Failures Analysis was demonstrated as a Gradio dashboard.
  • The crop recommendation algorithm was implemented as multi-label classification (Decision Tree Classifier, Gaussian NB, SVM/SVC, Logistic Regression, and Random Forest Classifier) combined with Gradio deployment.
  • We have employed the Hugging Face Transformers to perform NLP Sentiment Analysis as a pipeline.
  • NLP sentence classification using the Persian demo model HooshvareLab/bert-fa-zwnj-base.
  • LLM integration test: we have tried to generate our own D&D story using Opus MT and GPT-2.
  • Article summarization used the BeautifulSoup HTML parser and the default model sshleifer/distilbart-cnn-12-6, revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
  • EN-to-DE translation was initialized with t5-base, revision 686f1db (https://huggingface.co/t5-base) and verified with a QC test.
  • We have performed text substitution in Gradio, e.g. text.replace('World', 'Databricks').
  • We have implemented a simple app that predicts a student’s score based on the number of study hours.
  • We have examined story generation with GPT-2, BertForSequenceClassification, fill-mask, NER, and question answering.
  • While ML deployment is the third stage of the data science lifecycle (manage, develop, deploy, and monitor), every aspect of a model’s creation is performed with deployment in mind, so that the final trained models can be accessed by end users.
  • The Road Ahead: Implementing A 4-Step ML Deployment Chain (see below).
A 4-step ML deployment chain

Explore More

