Package cva
The Python package cva exposes the automatic machine learning capabilities of ClearVu Analytics (CVA) in Python. This includes fitting, comparing, using, loading, saving and exporting models.
Automatic machine learning
Automatic machine learning covers the training of nonlinear models such as neural networks, random forests and others, and the optimization of the hyperparameters of the training algorithm. Afterward, the different modeling approaches are compared and the best model can be selected. Hyperparameter optimization uses cross-validation and aims to minimize the mean error on the validation part.
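The idea of choosing a hyperparameter by cross-validation can be illustrated without CVA. The following is a minimal sketch, not CVA's internal procedure: it uses a toy k-nearest-neighbour regressor on synthetic data and picks the neighbour count with the lowest mean validation error. The names knn_predict and cv_error, the data and the candidate values are all made up for this illustration.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def cv_error(data, k, n_folds=5):
    """Mean absolute error on the validation part, averaged over all folds."""
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point (offset by fold) is held out for validation.
        validation = data[fold::n_folds]
        training = [p for i, p in enumerate(data) if i % n_folds != fold]
        errors = [abs(knn_predict(training, x, k) - y) for x, y in validation]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


# Synthetic data: y = x^2 plus a little noise (illustrative only).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
data = [(x, x * x + rng.gauss(0.0, 0.05)) for x in xs]

# Try several values of the hyperparameter k and keep the best one.
candidates = [1, 3, 5, 10, 25, 50]
errors = {k: cv_error(data, k) for k in candidates}
best_k = min(errors, key=errors.get)
print("best k:", best_k, "mean validation error:", errors[best_k])
```

The selection criterion is the same as the one described above: the hyperparameter value with the smallest mean error on the held-out validation parts wins.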
Getting started
First you need to instantiate a manager to connect to the CVA core.
from cva import Base
manager = Base.Manager()
Preparing models
The manager object can be used to create or import models. To create a model use
model = manager.create_model('neural_network')
The list of available model types can be obtained with Manager.available_model_types(). The model object contains information on how the model will be trained. For this purpose, a set of model-independent task parameters and a set of model-specific parameters can be set.
The default values can be shown using
model.get_fit_task().get_task_parameters()
model.get_fit_task().get_model_parameters()
Each parameter is of type CvaParameter. To change, for example, the flag for calculating the variable importance, use:
model.get_fit_task().get_task_parameters()['DoCalcVariableImportance'].set_value(False)
Cross-validation and hyperparameter optimization are enabled by default.
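When several task parameters need to change, the chained call shown above can be wrapped in a small helper. The sketch below is hypothetical: apply_task_overrides is not part of cva, and the Stub classes only stand in for the real objects so that the example runs without the package. It uses only the calls shown above (get_fit_task(), get_task_parameters(), set_value()) and assumes that the returned parameter collection behaves like a dict.

```python
class StubParameter:
    """Stand-in for CvaParameter; only set_value() is taken from the text."""

    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value


class StubFitTask:
    """Stand-in for the fit task returned by get_fit_task()."""

    def __init__(self):
        self._task_parameters = {"DoCalcVariableImportance": StubParameter(True)}

    def get_task_parameters(self):
        return self._task_parameters


class StubModel:
    """Stand-in for a model created by the manager."""

    def __init__(self):
        self._fit_task = StubFitTask()

    def get_fit_task(self):
        return self._fit_task


def apply_task_overrides(model, overrides):
    """Set several task parameters at once (hypothetical helper)."""
    params = model.get_fit_task().get_task_parameters()
    # Assumption: the parameter collection supports the 'in' operator.
    unknown = [name for name in overrides if name not in params]
    if unknown:
        raise KeyError(f"unknown task parameters: {unknown}")
    for name, value in overrides.items():
        params[name].set_value(value)


stub_model = StubModel()
apply_task_overrides(stub_model, {"DoCalcVariableImportance": False})
```

Checking the names before changing anything means a misspelled parameter name fails early and leaves the other settings untouched.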
Fitting models
To fit a model we need a data frame with the data, the names of the input variables (column names) and the name of the output variable (column name). This package includes some sample data (cva.Examples). We use the Rosenbrock example to continue.
from cva import Examples
example = Examples.example_rosenbrock()
input_variable_names = example['input_variable_names']
output_variable_name = example['output_variable_names'][0]
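For orientation, the two-dimensional Rosenbrock function is a standard nonlinear test function. How the packaged example data was generated is not stated here, so the following is only a sketch of comparable data: the sampling ranges, the sample size and the column names x1, x2 and y are assumptions made for this illustration.

```python
import random


def rosenbrock(x1, x2, a=1.0, b=100.0):
    """Standard two-dimensional Rosenbrock function; minimum 0 at (a, a**2)."""
    return (a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2


# Sample random inputs and evaluate the function (ranges are illustrative).
rng = random.Random(42)
rows = []
for _ in range(200):
    x1 = rng.uniform(-2.0, 2.0)
    x2 = rng.uniform(-1.0, 3.0)
    rows.append({"x1": x1, "x2": x2, "y": rosenbrock(x1, x2)})

print(rows[0])
```

A list of row dictionaries like this can be turned into a data frame with two input columns and one output column.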
Now we can fit the model with
model.fit(example['df'], input_variable_names, output_variable_name)
The fit function returns True on success and False if the model cannot be fitted. Detailed information can be obtained using the function
info = model.get_fit_info()
which returns a CvaModelInfo object. It contains information on quality characteristics, input variable importance, input variable sensitivity and validation prediction values. For more information refer to CvaModelInfo.
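Because fit reports failure through its return value rather than an exception, it can be worth checking that value when fitting several models. The sketch below is hypothetical: fit_all is not part of cva, and StandInModel only imitates the documented return convention of fit so that the example runs without the package.

```python
class StandInModel:
    """Stand-in for a cva model; only the fit() return convention is from the text."""

    def __init__(self, name, can_fit):
        self.name = name
        self._can_fit = can_fit

    def fit(self, df, input_variable_names, output_variable_name):
        # A real model would train here; the stand-in just reports success or failure.
        return self._can_fit


def fit_all(models, df, input_variable_names, output_variable_name):
    """Fit every model and separate the successes from the failures."""
    fitted, failed = [], []
    for model in models:
        if model.fit(df, input_variable_names, output_variable_name):
            fitted.append(model)
        else:
            failed.append(model)
    return fitted, failed


stand_ins = [
    StandInModel("a", True),
    StandInModel("b", False),
    StandInModel("c", True),
]
fitted, failed = fit_all(stand_ins, None, ["x1", "x2"], "y")
print([m.name for m in fitted], [m.name for m in failed])
```

Only the models in the first list would then be inspected further or passed on.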
Comparing models
A major advantage of CVA is that it can compare the different models fitted for one task, rank them, and automatically pick the best one for that task. Before comparing, fit several different models for the same task.
from cva import Examples
example = Examples.example_rosenbrock()
input_variable_names = example['input_variable_names']
output_variable_name = example['output_variable_names'][0]
models = [manager.create_model(model_type) for model_type in manager.available_model_types()]
models[manager.available_model_types().index('random_forest')].get_fit_task().get_model_parameters()['mtry'].set_value(2)
for model in models:
    model.fit(example['df'], input_variable_names, output_variable_name)
Note that we need to adapt the mtry parameter of the random forest when there are only two input variables, as in this example. Now we have ten different models for the same task. To compare the models use
comp = manager.compare_models(models)
Manager.compare_models() returns a list of comparisons. The elements are grouped by input variable sets and output variable names. In this case we have just one entry. To get the best model use
winner = comp[0].get_winner()
To get the ranking use
wins = comp[0].get_wins()
A sorted list of tuples is returned, each containing a model and the number of times it won against other models. To get a complete overview use
matrix = comp[0].get_matrix()
A matrix of pairwise comparisons is returned. A -1 means the row model lost against the column model, 0 means no significant difference, and 1 means the row model won against the column model (see also CvaModelCompare.get_matrix()).
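The relation between such a matrix and a win ranking can be sketched in plain Python. This is an illustration under assumptions, not the CVA implementation: the matrix values and model names are invented, the matrix is written as nested lists, -1 is taken to mark a loss and 1 a win, and a model's win count is assumed to be the number of 1 entries in its row.

```python
# Hypothetical 3x3 comparison matrix: entry [i][j] compares row model i with column model j.
names = ["neural_network", "random_forest", "linear"]
matrix = [
    [0, 1, 1],    # neural_network won against both others
    [-1, 0, 0],   # random_forest lost once, no significant difference to linear
    [-1, 0, 0],   # linear lost once, no significant difference to random_forest
]


def count_wins(matrix):
    """Number of significant wins per row model."""
    return [sum(1 for value in row if value == 1) for row in matrix]


def is_antisymmetric(matrix):
    """A win for the row model must appear as a loss in the mirrored entry."""
    size = len(matrix)
    return all(matrix[i][j] == -matrix[j][i] for i in range(size) for j in range(size))


wins = count_wins(matrix)
ranking = sorted(zip(names, wins), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

The antisymmetry check is a cheap sanity test for any pairwise comparison encoded this way.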
You can visualize the comparison matrix, for example using seaborn.
import seaborn as sn
sn.set(font_scale=1.4)
sn.heatmap(comp[0].get_matrix(), annot=True, annot_kws={"size": 16})
This results in the following chart:
Managing models
You can export the model using the function Model.export(), which writes an XML file that can be used with other CVA tools. Alternatively, you can use pickle to save the model. To import an XML file use Manager.import_model(). To reload from a pickle you can either use the function Manager.load_model() or unpickle the model yourself and then connect it to the manager via Model.connect().
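The pickle route follows the usual standard-library pattern. The sketch below runs without cva: a plain dictionary stands in for the model object, and the file name model.pkl is arbitrary. With a real model, the unpickled object would additionally have to be connected to the manager via Model.connect() as described above; the exact arguments of that call are not shown here, so it is omitted.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model object (a real cva model would be used instead).
stand_in_model = {
    "model_type": "neural_network",
    "task_parameters": {"DoCalcVariableImportance": False},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "model.pkl"

    # Save: pickle files must be opened in binary mode.
    with path.open("wb") as handle:
        pickle.dump(stand_in_model, handle)

    # Load: the result is a new object with the same content.
    with path.open("rb") as handle:
        restored = pickle.load(handle)

print(restored == stand_in_model)
```

As with any pickle file, only load files from a trusted source, because unpickling can execute arbitrary code.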
New data can be predicted with the model function Model.predict(). The predict function returns a tuple of the predicted value and a confidence value. The confidence value lies between zero and one, with one being the highest confidence.
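One way to use the confidence value is to separate predictions that can be trusted from those that need a closer look. The following sketch is hypothetical: split_by_confidence is not part of cva, the numbers are invented, the threshold of 0.8 is arbitrary, and it assumes that predictions and confidences for several rows are available as two sequences of equal length.

```python
def split_by_confidence(values, confidences, threshold=0.8):
    """Split predictions into trusted and uncertain ones by confidence."""
    if len(values) != len(confidences):
        raise ValueError("values and confidences must have the same length")
    if any(not 0.0 <= c <= 1.0 for c in confidences):
        raise ValueError("confidence values must lie between zero and one")
    trusted, uncertain = [], []
    for index, (value, confidence) in enumerate(zip(values, confidences)):
        target = trusted if confidence >= threshold else uncertain
        target.append((index, value, confidence))
    return trusted, uncertain


# Invented prediction results for four rows.
values = [12.4, 3.1, 250.0, 0.7]
confidences = [0.95, 0.82, 0.40, 0.99]

trusted, uncertain = split_by_confidence(values, confidences)
print("trusted rows:", [index for index, _, _ in trusted])
print("uncertain rows:", [index for index, _, _ in uncertain])
```

Keeping the row index with each prediction makes it easy to trace uncertain predictions back to the input data.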
Submodules
cva.Base: Base module
cva.Examples: Example module
cva.Structures: Information structures