Package cva

The Python package cva makes the automatic machine learning capabilities of ClearVu Analytics (CVA) available within Python. This includes fitting, comparing, using, loading, saving and exporting models.

Automatic machine learning

Automatic machine learning comprises training nonlinear models such as neural networks and random forests and optimizing the hyperparameters of the training algorithms. Afterwards, the different modeling approaches are compared and the best model can be selected. The hyperparameter optimization uses cross-validation with the aim of minimizing the mean error on the validation folds.
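This section does not say which search strategy CVA uses. As a minimal, stdlib-only sketch of the general idea, the code below runs a grid search over the neighbor count k of a simple k-nearest-neighbors regressor and keeps the value with the lowest mean validation error over five cross-validation folds. The model, the synthetic data and the candidate grid are illustrative assumptions, not CVA internals.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def cv_error(data, k, n_folds=5):
    """Mean squared validation error of kNN with parameter k, averaged over the folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, validation in enumerate(folds):
        # Train on all folds except the i-th, validate on the i-th
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


# Synthetic data: a noisy parabola on [-2, 2]
random.seed(0)
data = [(x, x * x + random.gauss(0, 0.1))
        for x in (random.uniform(-2, 2) for _ in range(100))]

# Grid search: keep the k with the lowest mean validation error
candidates = [1, 3, 5, 10, 25, 50]
scores = {k: cv_error(data, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```

A very small k follows the noise and a very large k smooths away the curvature; cross-validation selects a compromise using only the training data.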

Getting started

First, instantiate a manager to connect to the CVA core:

from cva import Base
manager = Base.Manager()

Preparing models

The manager object can be used to create or import models. To create a model, use

model = manager.create_model('neural_network')

The list of available model types can be obtained with Manager.available_model_types(). The model object contains information on how the model will be trained: a set of model-independent task parameters and a set of model-specific parameters, both of which can be changed. The default values can be shown using

model.get_fit_task().get_task_parameters()
model.get_fit_task().get_model_parameters()

Each parameter is of type CvaParameter. For example, to disable the calculation of variable importance, use:

model.get_fit_task().get_task_parameters()['DoCalcVariableImportance'].set_value(False)

Cross-validation and hyperparameter optimization are enabled by default.

Fitting models

To fit a model, we need a data frame with the data, the names of the input variables (column names) and the name of the output variable (column name). This package includes some sample data (cva.Examples); the Rosenbrock example is used in the following.

from cva import Examples
example = Examples.example_rosenbrock()
input_variable_names = example['input_variable_names']
output_variable_name = example['output_variable_names'][0]
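The Rosenbrock function, f(x1, x2) = (1 - x1)^2 + 100 (x2 - x1^2)^2, is a standard nonlinear test function with its minimum of 0 at (1, 1). How Examples.example_rosenbrock() samples its data is not stated here, so the following sketch generates a comparable table under assumed sampling ranges and hypothetical column names x1, x2 and y.

```python
import random


def rosenbrock(x1, x2, a=1.0, b=100.0):
    """Two-dimensional Rosenbrock function; its global minimum 0 lies at (a, a**2)."""
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2


random.seed(42)

# Sample 200 random points in [-2, 2] x [-1, 3] (assumed ranges) and evaluate f
rows = []
for _ in range(200):
    x1, x2 = random.uniform(-2, 2), random.uniform(-1, 3)
    rows.append({'x1': x1, 'x2': x2, 'y': rosenbrock(x1, x2)})

print(len(rows), sorted(rows[0]))
```

With pandas installed, pandas.DataFrame(rows) turns these rows into a data frame, with ['x1', 'x2'] as the input columns and 'y' as the output column.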

Now we can fit the model with

model.fit(example['df'], input_variable_names, output_variable_name)

The fit function returns True on success and False if the model cannot be fitted. Detailed information can be obtained using the function

info = model.get_fit_info()

which returns a CvaModelInfo object. It contains information on quality characteristics, input variable importance, input variable sensitivity and validation prediction values. For more information refer to CvaModelInfo.
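Which quality characteristics CvaModelInfo reports is not listed here. As an illustration only, the sketch below computes three common regression measures (root mean squared error, mean absolute error and the coefficient of determination R²) from actual values and validation predictions; the function name and the numbers are invented.

```python
import math


def quality(actual, predicted):
    """Common regression quality measures computed from validation predictions."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        'rmse': math.sqrt(ss_res / n),
        'mae': sum(abs(r) for r in residuals) / n,
        'r2': 1 - ss_res / ss_tot,
    }


print(quality([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```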

Comparing models

A major advantage of CVA is that it can compare the models fitted for one task, rank them and automatically pick the best one. Before comparing, fit several different models for the same task:

from cva import Examples
example = Examples.example_rosenbrock()
input_variable_names = example['input_variable_names']
output_variable_name = example['output_variable_names'][0]
# Create one model of every available type
model_types = manager.available_model_types()
models = [manager.create_model(model_type) for model_type in model_types]
# The example has only two input variables, so mtry must not exceed 2
models[model_types.index('random_forest')].get_fit_task().get_model_parameters()['mtry'].set_value(2)
# Fit every model on the same data; fit() returns True on success
for model in models:
    model.fit(example['df'], input_variable_names, output_variable_name)
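In common random forest implementations, mtry is the number of input variables drawn at random as split candidates at each node, so it cannot exceed the number of inputs. Assuming CVA's parameter has the same meaning, the sketch below shows that constraint; split_candidates is a hypothetical helper, not part of cva.

```python
import random


def split_candidates(input_variable_names, mtry, rng):
    """Draw the mtry input variables considered at one split of a tree."""
    if not 1 <= mtry <= len(input_variable_names):
        raise ValueError(
            f"mtry={mtry} must be between 1 and {len(input_variable_names)}")
    return rng.sample(input_variable_names, mtry)


rng = random.Random(0)
print(split_candidates(['x1', 'x2'], 2, rng))  # both variables, in random order
try:
    split_candidates(['x1', 'x2'], 3, rng)
except ValueError as err:
    print(err)
```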

Note that the mtry parameter of the random forest has to be adapted because the example has only two input variables. Now we have ten different models for the same task. To compare the models, use

comp = manager.compare_models(models)

Manager.compare_models() returns a list of comparisons, grouped by input variable sets and output variable names. In this case there is just one entry. To get the best model, use

winner = comp[0].get_winner()

To get the ranking use

wins = comp[0].get_wins()

This returns a sorted list of tuples, each consisting of a model and the number of times it won against other models. To get a complete overview, use

matrix = comp[0].get_matrix()

A matrix of pairwise comparisons is returned. An entry of -1 means that the row model lost against the column model, 0 means no significant difference, and 1 means that the row model won against the column model (see also CvaModelCompare.get_matrix()). You can visualize the matrix, for example, with seaborn:

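import matplotlib.pyplot as plt
import seaborn as sn
sn.set(font_scale=1.4)
# Annotated heatmap of the pairwise comparison matrix
sn.heatmap(comp[0].get_matrix(), annot=True, annot_kws={"size": 16})
plt.show()  # needed when running as a script rather than in a notebook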

This produces a heatmap of the pairwise comparison matrix.
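How get_wins() and get_winner() are computed internally is not documented in this section. Assuming a win is a 1 entry in the comparison matrix (row model won against column model), a ranking with the same shape, a sorted list of (model, wins) tuples, could be derived as follows; the model names and matrix values are invented.

```python
def wins_from_matrix(names, matrix):
    """Rank models by the number of 1 entries (significant wins) in their row.

    matrix[i][j] is 1 if model i won against model j, -1 if it lost,
    and 0 if there is no significant difference.
    """
    wins = [(name, sum(1 for v in row if v == 1)) for name, row in zip(names, matrix)]
    return sorted(wins, key=lambda t: t[1], reverse=True)


names = ['model_a', 'model_b', 'model_c']  # hypothetical model names
matrix = [
    [0, 1, 1],
    [-1, 0, 1],
    [-1, -1, 0],
]
ranking = wins_from_matrix(names, matrix)
winner = ranking[0][0]
print(ranking, winner)
```

A real comparison can produce ties in the win counts; this sketch simply keeps tied models in their input order.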

Managing models

You can export a model using the function Model.export(), which writes an XML file that can be used with other CVA tools. Alternatively, you can save the model with pickle. To import an XML file, use Manager.import_model(). To reload a pickled model, either use the function Manager.load_model() or unpickle it yourself and then connect the model to the manager via Model.connect().
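The need for Model.connect() after unpickling suggests that the link to the CVA core is not part of the pickled state. The following sketch illustrates that pattern with pickle and two hypothetical stand-in classes, ToyModel and Connection; it is not how cva implements it, only an assumption-based illustration of dropping an unpicklable attribute in __getstate__ and restoring it afterwards.

```python
import pickle


class Connection:
    """Hypothetical stand-in for a live connection to a core process."""

    def __getstate__(self):
        raise TypeError("live connections cannot be pickled")


class ToyModel:
    """Hypothetical model holding fitted settings plus a live connection."""

    def __init__(self, connection):
        self.parameters = {'mtry': 2}
        self.connection = connection

    def __getstate__(self):
        state = self.__dict__.copy()
        state['connection'] = None  # drop the unpicklable connection
        return state

    def connect(self, connection):
        self.connection = connection


model = ToyModel(Connection())
restored = pickle.loads(pickle.dumps(model))
print(restored.connection)  # None: the model must be reconnected
restored.connect(Connection())
```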

New data can be predicted with Model.predict(). It returns a tuple of the predicted value and a confidence value. The confidence value lies between zero and one, with one being the highest confidence.
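As a hedged usage sketch, assuming predict() on several rows returns one predicted value and one confidence value per row, the confidence values can be used to separate trusted from doubtful predictions. The helper name, the values and the threshold of 0.5 are arbitrary assumptions.

```python
def split_by_confidence(predictions, confidences, threshold=0.5):
    """Separate predictions whose confidence is at least threshold from the rest."""
    if any(not 0.0 <= c <= 1.0 for c in confidences):
        raise ValueError("confidence values must lie in [0, 1]")
    trusted, doubtful = [], []
    for value, conf in zip(predictions, confidences):
        (trusted if conf >= threshold else doubtful).append(value)
    return trusted, doubtful


# Invented values standing in for the tuple returned by a predict() call
predicted, confidence = [12.3, 4.5, 7.8], [0.92, 0.31, 0.66]
trusted, doubtful = split_by_confidence(predicted, confidence)
print(trusted, doubtful)
```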

Sub-modules

cva.Base

Base module

cva.Examples

Example module

cva.Structures

Information structures