CREATE PREDICTOR Statement
The CREATE PREDICTOR
statement is used to train new model. The basic syntax for training the model is:
CREATE PREDICTOR predictor_name
FROM integration_name
(SELECT column_name, column_name2 FROM table_name) as ds_name
PREDICT column_name as column_alias;
CREATE PREDICTOR predictor_name
- wherepredictor_name
is the name of the model.FROM integration_name (select column_name FROM table_name)
- whereintegration_name
is the name of the datasource, where(select column_name FROM table_name)
is the SELECT statement for selecting the data. If you want to change the default name of the datasource you can use the aliasas ds_name
.PREDICT column_name
- wherecolumn_name
is the column name of the target variable. If you want to change the name of the target variable you can use theas column_alias
.
Example Data
The bellow database table contains prices of properties from a metropolitan area in the US. This table will be used in all of the docs examples.
number_of_rooms | number_of_bathrooms | sqft | location | days_on_market | initial_price | neighborhood | rental_price |
---|---|---|---|---|---|---|---|
0 | 1 | 484 | great | 10 | 2271 | south_side | 2271 |
1 | 1 | 674 | good | 1 | 2167 | downtown | 2167 |
0 | 1 | 529 | great | 3 | 2431 | south_side | 2431 |
3 | 2 | 1219 | great | 3 | 5510 | south_side | 5510 |
1 | 1 | 398 | great | 11 | 2272 | south_side | 2272 |
Create Predictor example
This example shows how you can train the Machine Learning Model called house_price_model
to predict the rentals price from the above data.
CREATE PREDICTOR house_price_model
FROM db_integration (SELECT * FROM house_rentals_data) as rentals
PREDICT rental_price as price;
ORDER BY keyword
The ORDER BY
keyword is used to order the data by descending(DESC) or ascending(ASC) order. The default order will always be ASC
CREATE PREDICTOR predictor_name
FROM integration_name
(SELECT column_name, column_name2 FROM table_name) as ds_name
PREDICT column_name as column_alias
ORDER BY column_name column_name2 ASC OR DESC;
ORDER BY ASC example
The following example trains the new house_price_model
model which predicts the rental_price
and orders the data in ascending order by the number of days on the market.
CREATE PREDICTOR house_price_model
FROM db_integration (SELECT * FROM house_rentals_data) as rentals
PREDICT rental_price as price
ORDER BY days_on_market ASC;
ORDER BY DESC example
The following example trains the new house_price_model
model which predicts the rental_price
and orders the data in descending order by the number of days on the market.
CREATE PREDICTOR house_price_model
FROM db_integration (SELECT * FROM house_rentals_data) as rentals
PREDICT rental_price as price
ORDER BY days_on_market DESC;
GROUP BY statement
The GROUP BY
statement is used to group the rows that contain the same values into one row.
CREATE PREDICTOR predictor_name
FROM integration_name
(SELECT column_name, column_name2 FROM table_name) as ds_name
PREDICT column_name as column_alias
GROUP BY column_name;
GROUP BY example
The following example trains the new house_price_model
model which predicts the rental_price
and groups the data per location(good, great).
CREATE PREDICTOR house_price_model
FROM db_integration
(SELECT * FROM house_rentals_data) as rentals
PREDICT rental_price as price
GROUP BY location;
USING keyword
The USING
keyword accepts arguments as a JSON format where additional arguments can be provided to the CREATE PREDICTOR
statement as:
stop_train_in_x_seconds
- Stop model training after X seconds.use_gpu
- Switch between training on CPU or GPU(True|False).sample_margin_of_error
- The ammount of random sampling error in results (0 - 1)ignore_columns
- Columns to be removed from the model training.is_timeseries
- Training from time series data (True|False).
CREATE PREDICTOR predictor_name
FROM integration_name
(SELECT column_name, column_name2 FROM table_name) as ds_name
PREDICT column_name as column_alias
USING {"ignore_columns": "column_name3"}
USING example
The following example trains the new house_price_model
model which predicts the rental_price
and removes the number of bathrooms.
CREATE PREDICTOR house_price_model
FROM db_integration
(SELECT * FROM house_rentals_data) as rentals
PREDICT rental_price as price
USING {"ignore_columns": "number_of_bathrooms"}
Time Series keywords
To train a timeseries model, MindsDB provides additional keywords.
* WINDOW
- keyword specifies the number of rows to "look back" into when making a prediction after the rows are ordered by the order_by column and split into groups. Could be used to specify something like "Always use the previous 10 rows".
* HORIZON
- keyword specifies the number of future predictions.
CREATE PREDICTOR predictor_name
FROM db_integration
(SELECT column_name, column_name2 FROM table_name) as ds_name
PREDICT column_name as column_alias
GROUP BY column_name
WINDOW 10
HORIZON 7;
USING {"is_timeseries": "Yes"}