Grid data mining by means of learning classifier systems and distributed model induction

By Santos, M.F.; Mathew, W.; Santos, H.D.

2011

Abstract

This paper introduces a distributed data mining approach, suited to grid computing environments, that is based on a supervised learning classifier system. It explores different methods of merging data mining models generated at different distributed sites. Centralized Data Mining (CDM) is the conventional way of mining distributed data: data stored at distributed locations must be collected into a central repository before the data mining algorithm is executed. The CDM method is reliable, but it is expensive, with high computational, communication and implementation costs. Alternatively, the Distributed Data Mining (DDM) approach is economical but has limitations when combining local models. In DDM, the data mining algorithm is executed at each site to induce a local model, and these local models are then collected and combined to form a global data mining model. In this work, six different tactics are used to construct the global model in DDM: Generalized Classifier Method (GCM), Specific Classifier Method (SCM), Weighed Classifier Method (WCM), Majority Voting Method (MVM), Model Sampling Method (MSM) and Centralized Training Method (CTM). Preliminary experiments were conducted on two synthetic data sets (eleven multiplexer and monks3) and one real-world data set (intensive care medicine). The initial results show that the performance of the DDM methods is competitive with that of the CDM methods.
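To make the DDM idea of combining local models concrete, here is a minimal sketch of a majority-voting combination, in the spirit of the Majority Voting Method (MVM) named above. It is not the authors' implementation. As an assumption, each local model is reduced to a plain prediction function. The three "site" models are hypothetical stand-ins for learned classifier-system rule sets, and the target concept is the eleven multiplexer, in which 3 address bits select one of 8 data bits.

```python
from collections import Counter
from itertools import product
from typing import Callable, Hashable, Sequence

# Assumption: a local model is just a function from an instance to a class label;
# the learning classifier system that would induce it is not modelled here.
Model = Callable[[Sequence[int]], Hashable]


def majority_vote(models: Sequence[Model], instance: Sequence[int]) -> Hashable:
    """Global prediction = the label predicted by most local models.

    Ties go to the label that received its first vote earliest, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not models:
        raise ValueError("at least one local model is required")
    votes = Counter(model(instance) for model in models)
    label, _ = votes.most_common(1)[0]
    return label


def multiplexer11(bits: Sequence[int]) -> int:
    """Eleven multiplexer: bits 0-2 form an address selecting one of data bits 3-10."""
    address = bits[0] * 4 + bits[1] * 2 + bits[2]
    return bits[3 + address]


def accuracy(predict: Model, instances: Sequence[Sequence[int]]) -> float:
    """Fraction of instances on which predict agrees with the true concept."""
    correct = sum(predict(x) == multiplexer11(x) for x in instances)
    return correct / len(instances)


# Hypothetical local models, one per site:
site_a = multiplexer11        # a site that learned the full concept
site_b = lambda bits: 0       # a degenerate site that always predicts class 0
site_c = lambda bits: bits[3] # a site that only learned the rule for address 000

local_models = [site_a, site_b, site_c]
global_model = lambda bits: majority_vote(local_models, bits)

all_inputs = list(product((0, 1), repeat=11))  # all 2048 possible instances
print(accuracy(site_c, all_inputs))        # 0.5625
print(accuracy(global_model, all_inputs))  # 0.78125
```

In this toy setting the voted global model is more accurate than two of its three local models, although it does not reach the best one. That illustrates why the choice of combination tactic matters in DDM.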
