This is an application of supervised machine learning to real estate.
The goal is to predict sale prices ($) for N selected properties in a state (N>>1000).
We are given a CSV dataset as an N×M table, where M is the number of features describing each property and its surroundings (typically, M<100).
The dataset is partitioned into training, test and deployment subsets in a 70:20:10 ratio.
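As a minimal sketch of the 70:20:10 partition, the rows can be shuffled and cut at the two boundaries. The function name split_dataset, the fixed seed and the integer stand-in rows are assumptions made for illustration only; the source does not say how the split is drawn.

```python
import random

def split_dataset(rows, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle the rows, then cut them into training, test and deployment subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    deploy = shuffled[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, deploy

# Stand-in for N = 1000 property records
rows = list(range(1000))
train, test, deploy = split_dataset(rows)
print(len(train), len(test), len(deploy))
```

Shuffling before cutting avoids a split that follows the order of the file, for example one sorted by location or sale date.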
The data preparation phase (Step 0) also includes cleaning and editing the data, e.g. removing outliers and records with missing values.
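One possible cleaning pass is sketched below. It assumes that each record is a dict, that records with any missing value are dropped rather than imputed, and that price outliers are flagged with the common 1.5×IQR rule. The column names and prices are made up, and none of these choices is prescribed by the workflow itself.

```python
import statistics

def clean_rows(rows, price_key="price", k=1.5):
    """Drop records with missing values, then drop price outliers (IQR rule)."""
    complete = [
        r for r in rows
        if all(v is not None and v != "" for v in r.values())
    ]
    prices = [r[price_key] for r in complete]
    q1, _, q3 = statistics.quantiles(prices, n=4)  # quartiles of the prices
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in complete if low <= r[price_key] <= high]

# Hypothetical records: prices in thousands of dollars, area in square meters
rows = [{"area": 120, "price": p} for p in (200, 210, 220, 230, 240, 250, 260)]
rows.append({"area": None, "price": 225})   # missing feature value
rows.append({"area": 150, "price": 5000})   # implausibly high price
cleaned = clean_rows(rows)
print(len(rows), "->", len(cleaned))
```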
Step 1 (regression) fits a continuous surface, such as a plane F(data;A), to the training data by iteratively updating the fitting parameters A. The parameter set that gives the minimum average error between predicted and actual (input) house prices is stored in the array A.
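To make Step 1 concrete, here is a minimal sketch that fits a plane F(x;A) = A[0] + A[1]*x1 + A[2]*x2 by gradient descent. The choice of mean squared error as the "average error", the learning rate, the step count and the synthetic two-feature data are all assumptions; the features are also assumed to be scaled to [0, 1]. In practice a numerical library would normally do this job.

```python
def predict(features, A):
    """Plane F(x; A) = A[0] + A[1]*x1 + A[2]*x2 + ..."""
    return A[0] + sum(a * x for a, x in zip(A[1:], features))

def mean_squared_error(X, y, A):
    """Average squared difference between predicted and actual prices."""
    return sum((predict(x, A) - t) ** 2 for x, t in zip(X, y)) / len(y)

def fit_plane(X, y, lr=0.1, steps=5000):
    """Update A step by step along the negative gradient of the mean squared error."""
    n, m = len(y), len(X[0])
    A = [0.0] * (m + 1)
    for _ in range(steps):
        residuals = [predict(x, A) - t for x, t in zip(X, y)]
        grad = [2 * sum(residuals) / n]  # derivative with respect to the intercept
        for j in range(m):
            grad.append(2 * sum(r * x[j] for r, x in zip(residuals, X)) / n)
        A = [a - lr * g for a, g in zip(A, grad)]
    return A

# Synthetic training data lying exactly on the plane 1 + 2*x1 + 3*x2
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
X = [(x1, x2) for x1 in grid for x2 in grid]
y = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in X]
A = fit_plane(X, y)
print([round(a, 4) for a in A])
```

Because the synthetic prices lie exactly on a plane, the fitted A should approach the generating coefficients; real data would leave a nonzero residual error.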
Step 2 tests the solution A (the current best output of Step 1) by computing the mean difference, or error, between actual and predicted house prices within the test dataset. A large error requires further updates of the parameters A by returning to Step 1. Otherwise, we proceed to Step 3.
Step 3 computes the same mean error between actual and predicted house prices, this time within the deployment dataset. A large error again requires further updates of the parameters A by returning to Step 1 and repeating Step 2.
Otherwise, we deliver the final output: the best predicted house prices F(P3;A) and the corresponding errors |idat(P3)-F(P3;A)| between the actual and predicted house prices on the deployment subset P3.
Conventionally, the predicted prices and their errors are plotted as candlesticks or error bars.
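The accept-or-refit logic of Steps 2 and 3 can be sketched as follows. The mean absolute error, the tolerance value, the function names and the prices are hypothetical; the text does not specify which error measure is used or what counts as a "large" error.

```python
def mean_abs_error(actual, predicted):
    """Mean absolute difference between actual and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def next_action(test_pair, deploy_pair, tolerance):
    """Return 'refit' (go back to Step 1) or 'deliver' (final output)."""
    if mean_abs_error(*test_pair) > tolerance:    # Step 2 on the test subset
        return "refit"
    if mean_abs_error(*deploy_pair) > tolerance:  # Step 3 on the deployment subset
        return "refit"
    return "deliver"

# Made-up (actual, predicted) prices in thousands of dollars
test_pair = ([300.0, 250.0, 410.0], [310.0, 240.0, 400.0])
deploy_pair = ([500.0, 320.0], [495.0, 330.0])

action = next_action(test_pair, deploy_pair, tolerance=15.0)
# Per-property errors on the deployment subset, i.e. the bar half-lengths to plot
errors = [abs(a - p) for a, p in zip(*deploy_pair)]
print(action, errors)
```

The per-property errors computed at the end are the quantities that would be drawn as error bars around each predicted price.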