A Comparison of Filter and Wrapper Feature Selection Techniques in Text Mining for Text Classification

Vatinee Nuipian (วาทินี นุ้ยเพียร) 1,2* and Phayung Meesad (พยุง มีสัจ) 3

1 Institute of Computer and Information Technology, King Mongkut's University of Technology North Bangkok
2 Department of Computer Education, Faculty of Technical Education, King Mongkut's University of Technology North Bangkok
3 Faculty of Information Technology, King Mongkut's University of Technology North Bangkok

Abstract

The main problem in text categorization is the high dimensionality of the feature space. Many researchers therefore focus on feature selection techniques for representing documents, which in turn increase the overall efficiency of a classification model. There are two general feature selection approaches: the Filter approach and the Wrapper approach. For the Filter approach, this study used Information Gain, Gain Ratio, and Chi-square; Chi-square gave the highest performance, with an F-measure of 92.2%. For the Wrapper approach, it used a Support Vector Machine combined with a Genetic Algorithm (SVMGA) and with a Greedy search (SVMGD); Greedy (SVMGD) was the best algorithm, with an F-measure of 94%. Both feature selection approaches employed a Support Vector Machine with a radial basis function (RBF) kernel as the classifier. Comparing the two approaches by F-measure, the best Wrapper result was 1.8 percentage points higher than the best Filter result. In conclusion, these results indicate that the Wrapper approach can improve effectiveness when applied to text classification.

Keywords: Text Mining, Filter Approach, Wrapper Approach, Genetic Algorithm, Greedy, Support Vector Machine
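
To make the distinction between the two approaches concrete, the following is a minimal sketch in plain Python, not the implementation used in this study. The Filter side scores each term independently with the standard chi-square statistic on a 2x2 term/class table; the Wrapper side is shown as a generic greedy forward search driven by a caller-supplied scoring function. All function names, the toy documents, and the toy scoring function are assumptions introduced for illustration; in the study, the wrapper's score would come from an SVM with an RBF kernel evaluated by F-measure.

```python
from typing import Callable, List, Sequence, Set, Tuple


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: docs in the class that contain the term
    n10: docs outside the class that contain the term
    n01: docs in the class that lack the term
    n00: docs outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_terms_by_chi_square(
    docs: Sequence[Tuple[Set[str], str]], target: str
) -> List[Tuple[str, float]]:
    """Filter approach: score every term on its own, then sort by score."""
    vocab: Set[str] = set().union(*(tokens for tokens, _ in docs))
    scored = []
    for term in vocab:
        n11 = sum(1 for t, lab in docs if term in t and lab == target)
        n10 = sum(1 for t, lab in docs if term in t and lab != target)
        n01 = sum(1 for t, lab in docs if term not in t and lab == target)
        n00 = sum(1 for t, lab in docs if term not in t and lab != target)
        scored.append((term, chi_square(n11, n10, n01, n00)))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


def greedy_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> List[str]:
    """Wrapper approach: grow the subset while the model score improves.

    `score` is a hypothetical stand-in for training and evaluating a
    classifier on the candidate subset.
    """
    selected: List[str] = []
    remaining = list(features)
    best = float("-inf")
    while remaining and len(selected) < max_features:
        top_score, top_feature = max(
            ((score(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if top_score <= best:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy corpus (illustrative only): token sets with class labels.
toy_docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "team"}, "politics"),
    ({"vote", "law"}, "politics"),
]
ranked = rank_terms_by_chi_square(toy_docs, "sport")


def toy_score(subset: List[str]) -> float:
    # Toy scorer: rewards features "a" and "b", penalizes subset size.
    return len(set(subset) & {"a", "b"}) - 0.1 * len(subset)


chosen = greedy_forward_selection(["a", "b", "c"], toy_score, max_features=3)
```

The filter ranks terms once, without consulting a classifier, so it is cheap; the wrapper re-evaluates the model for every candidate subset, which is more expensive but lets the selection reflect the classifier's actual behavior.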