In classification problems, the neurons of the output layer correspond to the different categories in the dataset, and the neuron with the largest output value determines the winning category. For example, in image classification, the set of image categories is fixed, and the training set should cover all categories uniformly. Unbalanced class distribution is a widely investigated issue, as it can degrade efficiency by making decision boundaries harder to learn or by producing misleading performance metrics.
In some situations, the test set may contain instances that do not belong to any of the categories present in the training set. This situation is called an open-set learning problem [C4]. The key challenge is to detect these unseen cases: the neural network should recognize that the input differs significantly from all trained categories.
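A minimal sketch of this rejection idea, assuming a softmax output layer and an illustrative confidence threshold (the function name and threshold value are hypothetical, not taken from the paper):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_open_set(logits, threshold=0.9):
    """Return the winner class index, or -1 ("unknown") when the
    largest softmax probability falls below the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    winner = int(np.argmax(probs))
    return winner if probs[winner] >= threshold else -1

# A confident in-distribution output is accepted as class 0...
print(predict_open_set([8.0, 0.5, 0.1]))   # -> 0
# ...while a flat, uncertain output is rejected as unknown.
print(predict_open_set([1.0, 0.9, 1.1]))   # -> -1
```

The sketch only illustrates the detection problem itself; the paper compares several concrete methods for it below.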
Several approaches to this kind of problem domain can be found in the literature. This paper summarizes our experiences in comparing the efficiency of different outlier-class detection methods.
For the tests, we applied different datasets, including synthetic tabular data and the CIFAR-10 benchmark image dataset. The test system was implemented in Python with the Keras-TensorFlow framework, using the Colab development environment.
Based on the test results, we can summarize our experiences in the following points:
- The threshold-based category acceptance methods provided weak results.
- The MLP neural network for similarity regression had only slightly better efficiency.
- The target space transformation methodology is the most promising way to handle open-set learning cases.
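One common reading of a target space transformation is to map each known class to a fixed point in a low-dimensional target space and reject inputs whose network output lies far from every class target. The target vectors, distance metric, and rejection radius below are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

# Hypothetical target points: each known class is assigned a fixed
# vector in a 2-D target space (assumed for illustration only).
CLASS_TARGETS = np.array([
    [1.0, 0.0],   # class 0
    [0.0, 1.0],   # class 1
    [-1.0, 0.0],  # class 2
])

def classify_in_target_space(output, max_dist=0.5):
    """Assign the nearest class target; reject as unknown (-1) when the
    network output lies far from all trained target points."""
    d = np.linalg.norm(CLASS_TARGETS - np.asarray(output, dtype=float), axis=1)
    nearest = int(np.argmin(d))
    return nearest if d[nearest] <= max_dist else -1

# An output near a trained target is accepted as that class...
print(classify_in_target_space([0.9, 0.1]))   # -> 0
# ...while an output between targets is rejected as unknown.
print(classify_in_target_space([0.5, 0.5]))   # -> -1
```

The appeal of this formulation is that "unknown" becomes a geometric region of the target space rather than a thresholded probability, which matches the stronger results reported for this method above.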
The main goal of future investigation is the integration of these methods into a suitable ensemble framework.