The discipline of Data Science (DS) sits at the interface between technology, the quantitative sciences (such as mathematics, statistics and computer science) and engineering, across various business applications and sectors. This page reviews new methods, research findings, opinions, hypothesis articles and poster presentations on all relevant aspects of DS.

As DS bridges data analytics, statistics, business intelligence (BI), artificial intelligence (AI)-powered technology and data engineering, the page focuses on applying advanced predictive analytics techniques and scientific principles to extract valuable information from data for business decision-making, strategic planning and other uses. DS is increasingly critical to businesses: the insights it generates help organizations increase operational efficiency, identify new business opportunities and improve marketing and sales programs, among other benefits. Ultimately, these insights can lead to competitive advantages over business rivals.

An effective DS team may include the following specialists: data engineer, data analyst, machine learning (ML) engineer, data visualization analyst, data translator and data architect.

The following most recent business applications drive a wide variety of DS use cases in organizations globally:

  • HealthTech
  • E-Commerce
  • Customer experience
  • Risk management
  • FinTech
  • Stock trading
  • Digital marketing
  • Industrial IoT applications
  • Logistics & supply chain management
  • Image/Speech Recognition
  • Cybersecurity
  • LegalTech

Non-Linear Regression Analysis

Nonlinear regression is a form of regression analysis in which data is fit to a model and then expressed as a mathematical function. Simple linear regression relates two variables (X and Y) with a straight line (y = mx + b), while nonlinear regression relates the two variables in a nonlinear (curved) relationship.

Let’s learn about non-linear regression by considering a few examples in Python. The scikit-learn library contains a simplified example of 1D regression using linear, polynomial and RBF kernels. As a real-world example, we fit a non-linear model to the data points corresponding to China’s GDP from 1960 to 2014.
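The kernel comparison mentioned above can be sketched with scikit-learn's support vector regressor. This is a minimal sketch on synthetic data: the noisy sine curve, the random seed and the hyperparameter values are assumptions chosen for illustration, not settings taken from this page.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1D data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((40, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

# One support vector regressor per kernel; hyperparameters are illustrative.
models = {
    "linear": SVR(kernel="linear", C=100),
    "poly": SVR(kernel="poly", C=100, degree=3, gamma="auto", coef0=1, epsilon=0.1),
    "rbf": SVR(kernel="rbf", C=100, gamma=0.1, epsilon=0.1),
}

# R^2 on the training data, only to compare how flexibly each kernel bends.
scores = {name: model.fit(X, y).score(X, y) for name, model in models.items()}
for name, score in scores.items():
    print(f"{name:>6}: R2 = {score:.2f}")
```

On curved data like this, the linear kernel can only draw a straight line, so the non-linear kernels are expected to follow the data more closely.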

China’s GDP Example

Let’s use the China GDP dataset from Kaggle to test the non-linear regression algorithm.

Import and install libraries

import numpy as np
import pandas as pd

!pip install wget

Read the csv file

df = pd.read_csv("YourPath/china_gdp.csv")

China GDP data - table

Let’s plot the data

import matplotlib.pyplot as plt
%matplotlib inline
x_data, y_data = (df["Year"].values, df["Value"].values)
plt.plot(x_data, y_data, 'ro')

China GDP data - Kaggle dataset

Let’s introduce the non-linear sigmoid function

X = np.arange(-5.0, 5.0, 0.1)
Y = 1.0 / (1.0 + np.exp(-X))

plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()

non-linear sigmoid function

def sigmoid(x, Beta_1, Beta_2):
    # Logistic curve: Beta_1 sets the steepness, Beta_2 the midpoint on the x-axis
    y = 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))
    return y

beta_1 = 0.10
beta_2 = 1990.0

logistic function

Y_pred = sigmoid(x_data, beta_1 , beta_2)

Let’s plot initial prediction against datapoints

# Scale the sigmoid (range 0..1) up to the magnitude of the GDP values
plt.plot(x_data, Y_pred * 1.5e13)
plt.plot(x_data, y_data, 'ro')

sigmoid function and data
GDP China

Let’s normalize our data

xdata = x_data / max(x_data)
ydata = y_data / max(y_data)

Let’s perform non-linear curve fitting

from scipy.optimize import curve_fit
popt, pcov = curve_fit(sigmoid, xdata, ydata)

And print the final parameters

print(" beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))

beta_1 = 690.451712, beta_2 = 0.997207
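These parameters refer to the normalized x-axis, so they are hard to read directly. As a hedged illustration, they can be converted back to calendar years by undoing the division by max(x_data). The sketch below assumes that the last year in the dataset is 2014 (as stated earlier) and uses the hypothetical names `x_max`, `beta_1_year` and `beta_2_year`.

```python
import numpy as np

def sigmoid(x, beta_1, beta_2):
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

x_max = 2014.0  # assumption: max(x_data), the last year in the dataset
beta_1_norm, beta_2_norm = 690.451712, 0.997207  # values printed above

# sigmoid(x / x_max, b1, b2) == sigmoid(x, b1 / x_max, b2 * x_max)
beta_1_year = beta_1_norm / x_max   # steepness per calendar year
beta_2_year = beta_2_norm * x_max   # midpoint expressed as a year

# Check that both parameterizations describe the same curve.
years = np.arange(1960, 2015)
same = np.allclose(
    sigmoid(years / x_max, beta_1_norm, beta_2_norm),
    sigmoid(years, beta_1_year, beta_2_year),
)
print(f"steepness per year: {beta_1_year:.4f}, midpoint year: {beta_2_year:.1f}")
print("curves match:", same)
```

Under these assumptions the midpoint, where the fitted curve crosses 0.5 on the normalized GDP scale, falls at around 2008.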

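# Evaluate the fitted curve on a year grid, normalized like the training data
x = np.linspace(1960, 2015, 55)
x = x / max(x_data)
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data')
plt.plot(x, y, linewidth=3.0, label='fit')
plt.legend(loc='best')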

Fitting the non-linear sigmoid curve to the data with SciPy’s curve_fit
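Two optional features of `curve_fit` are worth knowing when a fit is unstable: an initial guess `p0`, and the returned covariance matrix `pcov`, whose diagonal gives approximate parameter variances. The following is a minimal sketch on synthetic data; the "true" parameters, noise level and initial guess are assumptions made up for the illustration and are not related to the GDP dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, beta_1, beta_2):
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

# Synthetic, already-normalized data (assumed values for illustration only).
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 56)
true_beta = (30.0, 0.7)
y = sigmoid(x, *true_beta) + 0.01 * rng.standard_normal(x.size)

# p0 is the starting point of the optimizer; without it curve_fit starts
# every parameter at 1.
popt, pcov = curve_fit(sigmoid, x, y, p0=[10.0, 0.5])

# One-standard-deviation uncertainties from the covariance diagonal.
perr = np.sqrt(np.diag(pcov))
print("fitted parameters:", popt)
print("standard errors:  ", perr)
```

A large standard error relative to the fitted value is a hint that the data do not constrain that parameter well.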

Let’s split the data into train and test sets

msk = np.random.rand(len(df)) < 0.8
train_x = xdata[msk]
test_x = xdata[~msk]
train_y = ydata[msk]
test_y = ydata[~msk]
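The random mask above produces a different split on every run. If a reproducible split is preferred, one option is scikit-learn's `train_test_split` with a fixed `random_state`. This is a sketch under assumptions: the arrays below are stand-ins shaped like the normalized data, and the seed value is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays shaped like the normalized data (assumption for illustration).
xdata = np.linspace(1960, 2014, 55) / 2014.0
ydata = 1.0 / (1.0 + np.exp(-300.0 * (xdata - 0.99)))

# Roughly 80% train / 20% test; random_state fixes the shuffle.
train_x, test_x, train_y, test_y = train_test_split(
    xdata, ydata, test_size=0.2, random_state=0
)
print(len(train_x), "train samples,", len(test_x), "test samples")
```

Note that a random split mostly tests interpolation between known years. To test forecasting, the most recent years would have to be held out instead.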

Let’s build the model using the train set

popt, pcov = curve_fit(sigmoid, train_x, train_y)

Predict using test set

y_hat = sigmoid(test_x, *popt)

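Perform evaluation

from sklearn.metrics import r2_score

# Mean absolute error and mean squared error on the held-out test set
print("Mean absolute error: %.2f" % np.mean(np.absolute(y_hat - test_y)))
print("Mean squared error (MSE): %.2f" % np.mean((y_hat - test_y) ** 2))
# r2_score expects the true values first, then the predictions
print("R2-score: %.2f" % r2_score(test_y, y_hat))

Sample output (exact values depend on the random split):

Mean absolute error: 0.03
Mean squared error (MSE): 0.00
R2-score: 0.96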
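To make the three metrics concrete, here is a small sketch that computes them by hand and compares them with scikit-learn. The four data points are made-up numbers for illustration. It also shows that the R2-score is not symmetric, so the order of the arguments to `r2_score` matters.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true values and predictions (illustration only).
y_true = np.array([0.10, 0.20, 0.40, 0.80])
y_pred = np.array([0.12, 0.18, 0.45, 0.75])

mae = np.mean(np.abs(y_pred - y_true))            # mean absolute error
mse = np.mean((y_pred - y_true) ** 2)             # mean squared error
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination

# Swapping the arguments changes the denominator and therefore the score.
r2_swapped = r2_score(y_pred, y_true)

print(f"MAE={mae:.4f}  MSE={mse:.5f}  R2={r2:.4f}  R2 (swapped)={r2_swapped:.4f}")
```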

Posts of Interest

XebiaLabs to Update Periodic Table of DevOps Tools

Version 4 of the industry’s most popular DevOps market landscape tool, the Periodic Table of DevOps. Selected Vendors: Snowflake, Moogsoft, Instana, DataDog, GitLab, among others.

Content Marketing (CM) in a Nutshell

CM is a strategic marketing approach focused on creating and distributing valuable, relevant, and consistent content to attract and retain a clearly defined audience — and, ultimately, to drive profitable customer action.

Benefits of CM:

* Grow brand awareness
* Drive organic visitors
* Generate sales leads
* Build trust
* Earn customer loyalty
* Create demand

The 3 key elements of effective CM are:

  • Move your audience
  • Earn your audience’s attention
  • Have a spark

The SEO Pyramid: social, links, keywords, content

A Start-Up Marketing Plan

content marketing 7A framework
agile mindset

Calmar Ratio

A risk-adjusted performance metric for mutual funds, hedge funds and commodity trading.

Investor FAQ

The Calmar Ratio (CR), or drawdown ratio, is a risk-adjusted key performance metric for mutual funds, hedge funds and commodity trading. It measures the return per unit of risk and lets the investor decide whether a given return is worth the given level of risk. Calmar is short for California Managed Accounts Reports, and the ratio is very similar to the MAR ratio. The CR was first published in 1991. Its calculation is most similar to that of the Sterling ratio: it takes the average annual compounded rate of return and divides it by the maximum drawdown over the same time period, usually 3 years. The higher the CR, the better: anything over 0.5 or close to 1.0 is good (Amber), and CR = 3-5 is really good (Green).
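The calculation described above can be sketched for a series of fund values. This is a minimal sketch, not a reference implementation: the function name `calmar_ratio`, the sample values and the 3-year horizon are assumptions for illustration, and the return is taken as the compound annual growth rate between the first and last value.

```python
import numpy as np

def calmar_ratio(values, years):
    """Annual compounded return divided by the maximum drawdown (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Compound annual growth rate between the first and the last value.
    cagr = (values[-1] / values[0]) ** (1.0 / years) - 1.0
    # Drawdown at each point: relative drop from the highest value seen so far.
    running_peak = np.maximum.accumulate(values)
    max_drawdown = np.max((running_peak - values) / running_peak)
    if max_drawdown == 0:
        return np.inf  # the series never fell below an earlier peak
    return cagr / max_drawdown

# Made-up fund values over 3 years (illustration only).
values = [100.0, 110.0, 99.0, 121.0, 133.1]
cr = calmar_ratio(values, years=3)
print(f"Calmar ratio: {cr:.2f}")
```

In this made-up series the fund grows about 10% per year and its worst peak-to-trough drop is 10% (from 110 to 99), so the ratio comes out close to 1.0.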

Let’s consider a fund that started 3 years ago. It reached a peak value of $2 mln but went as low as $1.5 mln. Over this period, the average annual return was 10%. In this example, the CR should help in evaluating whether the fund is worth investing in. The outcome is given below:

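Example Calmar ratio calculation:

  • Highest value: $2 mln
  • Lowest value: $1.5 mln
  • Average annual return: 10%
  • Maximum drawdown: ($2 mln - $1.5 mln) / $2 mln = 25%
  • Calmar ratio: 10% / 25% = 0.4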

It appears that the risk-adjusted ratio is 0.4. If the investor has a criterion of a minimum CR of 0.5 [21], then the fund is not worth investing in (Red). Further, we may compare CR to another fund which has CR>0.5 or CR~1.0, and therefore has a higher risk-adjusted return and should be selected over this fund.
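The worked example and the colour coding can be reproduced in a few lines. This is a sketch under stated assumptions: the drawdown is measured from the $2 mln peak to the $1.5 mln low (which assumes the low came after the peak), the function names are hypothetical, and the cut-offs are one reading of the thresholds above (below 0.5 is Red, 3 and above is Green, everything in between is Amber).

```python
def calmar_from_peak_trough(avg_annual_return, peak, trough):
    """Calmar ratio from an average annual return and one peak-to-trough drop."""
    max_drawdown = (peak - trough) / peak
    return avg_annual_return / max_drawdown

def rate_calmar(cr):
    """Map a Calmar ratio to a traffic-light rating (assumed cut-offs)."""
    if cr < 0.5:
        return "Red"
    if cr >= 3.0:
        return "Green"
    return "Amber"

# Fund from the example: 10% average annual return, $2 mln peak, $1.5 mln low.
cr = calmar_from_peak_trough(0.10, peak=2.0, trough=1.5)
rating = rate_calmar(cr)
print(f"Calmar ratio: {cr:.2f} ({rating})")
```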

The CR improves on both the Sharpe and Sterling ratios in that it provides an up-to-date appraisal of commodity trading advisor (CTA) performance.
