19. March 2015

The advantage of decision trees

Decision trees are among the most established methods in statistical modeling. They were first developed for classification and later extended to regression.

There are two ways to create a decision tree: manually by an expert or from data. Since an expert can only draw on their own knowledge, complex dependencies among variables may remain undetected. Moreover, such a manual procedure is very time consuming.

Data-based approaches have the advantage of being quickly applicable and of exploiting the information contained in the data set. There are many algorithms for building and refining decision trees; among the best known are ID3 and C4.5.
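As a rough illustration of the data-based approach, here is a minimal sketch using scikit-learn's DecisionTreeClassifier (a CART-style learner rather than ID3 or C4.5); the process data and the feature names X1 and X8 are hypothetical and only chosen to match the rule example further below:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical process data: each row holds two production parameters [X1, X8],
# the label says whether the produced part was "OK" or "not OK".
X = np.array([[3, 120], [3, 80], [7, 120], [7, 80],
              [4, 150], [6, 95], [2, 110], [8, 130]])
y = np.array(["not OK", "OK", "OK", "OK",
              "not OK", "OK", "not OK", "OK"])

# A small maximum depth keeps the resulting tree readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[5, 100]]))  # classify a new parameter combination
```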

Due to their good readability, decision trees are often used to reveal sources of errors or quality issues in industrial production processes.

Such a decision tree can set boundaries for production parameters that help avoid errors and quality issues. Moreover, the detected boundaries can be discussed with process specialists, which increases acceptance of the results.

Example of a rule derived from such a tree: if X1 < 5 and X8 >= 100, then “not OK”.
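To show how such rules can be read off a fitted tree, the sketch below continues the hypothetical example above and prints the learned splits with scikit-learn's export_text; the resulting thresholds are exactly the kind of boundaries that can be discussed with process specialists:

```python
from sklearn.tree import export_text

# Print the splits of the tree fitted above as readable if/then rules.
print(export_text(tree, feature_names=["X1", "X8"]))
# Each root-to-leaf path corresponds to one rule, e.g. for this toy data a path
# like "X1 <= 5 and X8 > 95 -> not OK", close to the example rule quoted above.
```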