A technique for training data selection based on clustering

Le Thi Kim Nga; Đinh Thi My Canh

Authors: Le Thi Kim Nga; Đinh Thi My Canh

Journal: Quy Nhon University Journal of Science

Published: 2019/10/30

Volume/Issue: Vol. 13, Issue 5

Pages: 41-48

Abstract

As recognition systems develop, a training data set must not only represent the objects of interest well but also be efficient and consistent with the selected machine learning model. This article presents a technique for selecting training data based on a clustering approach that reduces the number of very similar samples. The technique was implemented and tested for selecting input data for the K-nearest neighbors model, and it proved effective on several data sets: randomly generated data following a normal distribution, the MNIST database of handwritten digits, and the YawDD face data set.
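The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simple leader-style clustering in which a sample is dropped when an already kept sample of the same class lies within a distance `radius`. The function names, the `radius` parameter, and the synthetic two-class data are all hypothetical choices made for this example.

```python
import math
import random


def select_representatives(samples, labels, radius):
    """Leader-style clustering (an assumed stand-in for the paper's method):
    keep a sample only if no already-kept sample of the same class
    lies within `radius` of it."""
    kept_x, kept_y = [], []
    for x, y in zip(samples, labels):
        too_close = any(
            ky == y and math.dist(x, kx) <= radius
            for kx, ky in zip(kept_x, kept_y)
        )
        if not too_close:
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


def knn_predict(train_x, train_y, query, k=3):
    """Plain k-nearest-neighbors majority vote over the training set."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(query, pair[0])
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


# Synthetic data: two normally distributed 2-D classes (illustrative only).
rng = random.Random(0)
samples = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
samples += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(200)]
labels = [0] * 200 + [1] * 200

# Reduce the training set, then classify with the reduced set.
reduced_x, reduced_y = select_representatives(samples, labels, radius=0.5)
print(len(samples), "samples reduced to", len(reduced_x))
print(knn_predict(reduced_x, reduced_y, (0.0, 0.0)))
print(knn_predict(reduced_x, reduced_y, (5.0, 5.0)))
```

In this sketch a larger `radius` discards more near-duplicate samples, which makes each K-nearest neighbors query cheaper but risks thinning out the class boundary. How the article balances this trade-off is not stated in the abstract.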
