A technique for training data selection based on clustering
Abstract
As recognition systems develop, a training data set must not only represent the objects of interest well but also be efficient and consistent with the selected machine learning model. This article presents a clustering-based technique for selecting training data that removes highly similar samples. The technique was implemented and tested by selecting input data for a K-nearest neighbors model, and it proved effective on several data sets: randomly generated data drawn from a standard distribution, the MNIST database of handwritten digits, and the YawDD face data set.
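To make the idea concrete, the following is a minimal sketch of clustering-based sample reduction feeding a K-nearest neighbors classifier. The abstract does not name the clustering algorithm, so this sketch assumes a simple greedy leader clustering applied per class; the function names `select_representatives` and `knn_predict`, the `radius` parameter, and the toy Gaussian data are all hypothetical and are not taken from the article.

```python
import math
import random


def select_representatives(samples, labels, radius):
    """Greedy leader clustering per class (an assumed stand-in method).

    A sample is kept as a new cluster leader only if it lies farther than
    `radius` from every leader already kept for the same class; otherwise
    it is treated as a near-duplicate and dropped.
    """
    leaders = {}  # class label -> list of kept samples
    for x, y in zip(samples, labels):
        kept = leaders.setdefault(y, [])
        if all(math.dist(x, k) > radius for k in kept):
            kept.append(x)
    return [(x, y) for y, kept in leaders.items() for x in kept]


def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)


# Toy data (assumption): two well-separated 2-D Gaussian blobs.
rng = random.Random(0)
samples = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
samples += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(200)]
labels = [0] * 200 + [1] * 200

# Keep one representative per neighborhood of the given radius.
reduced = select_representatives(samples, labels, radius=0.5)
print(len(samples), "samples reduced to", len(reduced))
print(knn_predict(reduced, (0.0, 0.0)), knn_predict(reduced, (5.0, 5.0)))
```

A larger `radius` discards more samples and speeds up the nearest-neighbor search, at the risk of thinning out the class boundaries; the value used here is arbitrary and would need tuning for any real data set.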