A technique for training data selection based on clustering

Authors: Le Thi Kim Nga; Đinh Thi My Canh
Journal: Quy Nhon University Journal of Science
Published: 2019/10/30
Volume/Issue: Vol. 13, Issue 5
Pages: 41-48

Abstract

Along with the development of recognition systems, training data sets must not only represent the objects of interest well but also be efficient and suited to the chosen machine learning model. This article presents a technique for selecting training data based on clustering, which reduces the number of highly similar samples. The technique was implemented and tested for selecting input data for a K-nearest neighbors (KNN) model, and it proved effective on several data sets: randomly generated normally distributed data, the MNIST database of handwritten digits, and the YawDD face data set.
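The abstract does not specify which clustering algorithm is used or how the retained samples are chosen. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It runs plain k-means separately within each class and keeps the one sample nearest each centroid, so groups of near-duplicate samples collapse to a single representative before KNN classification. The names `select_representatives`, `k_per_class`, and `knn_predict` are hypothetical, and the code uses only the Python standard library.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's k-means; initial centroids are k randomly sampled points.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign

def select_representatives(samples, labels, k_per_class, seed=0):
    # ASSUMPTION (hypothetical selection rule): cluster each class on its own
    # and keep the member closest to each centroid, discarding near-duplicates.
    selected = []
    for cls in sorted(set(labels)):
        pts = [s for s, y in zip(samples, labels) if y == cls]
        k = min(k_per_class, len(pts))
        centroids, assign = kmeans(pts, k, seed=seed)
        for c in range(k):
            members = [p for p, a in zip(pts, assign) if a == c]
            if members:
                rep = min(members, key=lambda p: dist2(p, centroids[c]))
                selected.append((rep, cls))
    return selected

def knn_predict(train, x, n_neighbors=1):
    # Majority vote among the n_neighbors closest (sample, label) pairs.
    nearest = sorted(train, key=lambda t: dist2(t[0], x))[:n_neighbors]
    votes = {}
    for _, y in nearest:
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)

# Usage: two Gaussian classes of 100 points each, reduced to at most 5 per class.
rng = random.Random(42)
data, y = [], []
for cls, (mx, my) in enumerate([(0.0, 0.0), (5.0, 5.0)]):
    for _ in range(100):
        data.append((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)))
        y.append(cls)

reduced = select_representatives(data, y, k_per_class=5)
print(len(reduced), "training samples kept out of", len(data))
print(knn_predict(reduced, (0.2, -0.1)), knn_predict(reduced, (4.8, 5.3)))
```

Clustering within each class, rather than over the pooled data, keeps every representative's label unambiguous. The number of clusters per class then controls how aggressively similar samples are merged. This is one plausible design among several, and the original paper may use a different one.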
