Robin's AI Lab

AutoML框架

相对于其他 AutoML工具来说， Auto_ml 是从0->1的开创性的工作，其附加功能很多，比如feature response，data clean，ml analysis，feature learning ，deep learning，Categorical Ensembling（类别组合）等。但是对于auto_ml的核心——model selection，hyper parameter optimation等，它仅仅做的很基础——使用SGCV进行网格搜索，而在不选择model selection时，model是默认给定的（默认GradientBoostingRegressor for regressor &GradientBoostingClassifier for classifier）。总之，auto_ml做的很基础很全面，根据阅读源代码熟悉 autoML的Pipeline，适合auto_ml新手进行探索。

特性

使整个机器学习过程自动化，使其非常易于用于分析和获取生产中的实时预测。该项目自动执行：

Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you’re trying to predict).
Feature Engineering (particularly around dates, and soon, NLP).
Robust Scaling (turning all values into their scaled versions between the range of 0 and 1, in a way that is robust to outliers, and works with sparse matrices).
Feature Selection (picking only the features that actually prove useful).
Data formatting (turning a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems).
Model Selection (which model works best for your problem).
Hyperparameter Optimization (what hyperparameters work best for that model).
Ensembling Subpredictors (automatically training up models to predict smaller problems within the meta problem).
Ensembling Weak Estimators (automatically training up weak models on the larger problem itself, to inform the meta-estimator’s decision).
Big Data (feed it lots of data).
Unicorns (you could conceivably train it to predict what is a unicorn and what is not).
Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those those you care about).

GetStart

Installation

auto_ml模块可直接使用pip命令进行安装，目前版本为auto_ml 2.9.10

python

pip install auto_ml

第三方软件包

TensorFlow & Keras, XGBoost, LightGBM

auto_ml集成了所有这些很棒的库，包括 DeepLearningClassifier 、 DeepLearningRegressor - XGBClassifier、 XGBRegressor - LGBMClassifer、 LGBMRegressor - CatBoostClassifier和 CatBoostRegressor等等，这些项目对单个预测的预测时间都在1毫秒范围内，并且能够序列化到磁盘，并在训练后加载到新的环境中。auto_ml在没有安装它们的情况下可以很好地运行。只需为model_names传递其中之一即可实现调用。

python

ml_predictor.train(data, model_names=['DeepLearningClassifier'])

使用示例

1、导入auto_ml中的包Predictor和get_boston_dataset
2、通过get_boston_dataset获取训练集和测试集
3、告诉机器数据输出列和不是纯数字的列
4、建立模型并训练模型
5、模型训练效果评估
6、保存模型
7、使用模型进行预测

python

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output'
    , 'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

test_score = ml_predictor.score(df_test, df_test.MEDV)
file_name = ml_predictor.save()
trained_model = load_ml_model(file_name)evironments)
predictions = trained_model.predict(df_test)
print(predictions)

运行结束

python

+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
|    | Feature Name   |   Importance |    Delta |   FR_Decrementing |   FR_Incrementing |   FRD_abs |   FRI_abs |   FRD_MAD |   FRI_MAD |
|----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------|
| 12 | CHAS           |            7 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  1 | ZN             |           12 |  11.5619 |           -0.0062 |            0.0380 |    0.0239 |    0.0474 |    0.0000 |    0.0000 |
|  7 | RAD            |           17 |   4.2895 |           -0.1370 |            0.1163 |    0.1588 |    0.1673 |    0.1175 |    0.1035 |
|  2 | INDUS          |           42 |   3.4430 |            0.1632 |           -0.1283 |    0.2332 |    0.2536 |    0.0278 |    0.0785 |
|  8 | TAX            |           50 |  82.9834 |            0.7771 |           -0.3183 |    0.8333 |    0.4623 |    0.3680 |    0.1181 |
|  9 | PTRATIO        |           59 |   1.1130 |            0.6408 |           -0.4830 |    0.6931 |    0.5645 |    0.4029 |    0.2925 |
|  3 | NOX            |           79 |   0.0588 |            0.4932 |           -0.1203 |    0.8052 |    0.6728 |    0.1935 |    0.2717 |
|  0 | CRIM           |          142 |   4.4320 |           -0.6586 |            1.3412 |    0.9985 |    1.7242 |    0.9479 |    1.6564 |
|  5 | AGE            |          143 |  13.9801 |            0.4331 |           -0.4677 |    0.6308 |    0.6133 |    0.5353 |    0.4838 |
|  6 | DIS            |          176 |   1.0643 |            1.7858 |           -0.3931 |    1.8339 |    0.8874 |    1.0996 |    0.3497 |
| 10 | B              |          179 |  45.7266 |           -0.7798 |            0.1640 |    1.1877 |    0.4785 |    1.0904 |    0.1836 |
|  4 | RM             |          197 |   0.3543 |           -1.1633 |            1.6018 |    1.3489 |    1.7951 |    0.9774 |    1.2344 |
| 11 | LSTAT          |          213 |   3.5508 |            2.5125 |           -1.4640 |    2.6368 |    1.5911 |    1.7992 |    1.1952 |
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+

python

***********************************************
Advanced scoring metrics for the trained regression model on this particular dataset:

Here is the overall RMSE for these predictions:
2.7608924254130724

Here is the average of the predictions:
21.350718828184544

Here is the average actual value on this validation set:
21.488235294117654

Here is the median prediction:
20.70223186516366

Here is the median actual value:
20.15

Here is the mean absolute error:
2.055851103106312

Here is the median absolute error (robust to outliers):
1.625601367533859

Here is the explained variance:
0.8963149604640271

Here is the R-squared value:
0.8960570877948729
Count of positive differences (prediction > actual):
48
Count of negative differences:
54
Average positive difference:
2.0382305519965307
Average negative difference:
-2.071513815203894

***********************************************

AutoML框架 ​

特性 ​

GetStart ​

Installation ​

第三方软件包 ​

使用示例 ​

AutoML框架

特性

GetStart

Installation

第三方软件包

使用示例