Data mining in the study of the chronic obstructive pulmonary disease

By Santos, M.Y.; Cruz, J.; Teles De Araújo, A.



Data mining algorithms are used to analyse large amounts of data and to extract useful models or patterns from them. Those models or patterns can support the decision-making process in organizations. In the health domain, besides supporting decisions, these algorithms are useful in the analysis and characterization of diseases. This paper presents the use of different data mining algorithms to support health care specialists in the analysis and characterization of symptoms and risk factors related to Chronic Obstructive Pulmonary Disease. The disease is characterized by an airflow limitation that is not fully reversible, and it affects up to one quarter of adults aged 40 or older. For this study, data from 1,880 individuals were analysed with decision trees and artificial neural networks in order to identify predictive models for the disease. Clustering was used to identify groups of individuals with chronic obstructive pulmonary disease who present similar risk factors and symptoms. Furthermore, association rules were used to identify correlations among the risk factors and the symptoms. The results obtained so far are promising: several models confirm the difficulties normally associated with the diagnosis of this disease and point to characteristics that must be taken into account in understanding it.
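As an illustration of the association-rule step mentioned in the abstract, the following minimal Python sketch computes the two standard rule measures, support and confidence, for a rule of the form "risk factor implies symptom". It is not the authors' implementation. The function name `rule_metrics`, the attribute names, and the five toy records are hypothetical and do not come from the study data.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    records: list of sets, one set of observed attributes per individual.
    support: fraction of all records containing antecedent and consequent.
    confidence: fraction of records with the antecedent that also
    contain the consequent.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy records (invented for illustration, not study data).
records = [
    {"smoker", "cough", "dyspnoea"},
    {"smoker", "cough"},
    {"smoker"},
    {"cough"},
    {"dust_exposure", "dyspnoea"},
]

support, confidence = rule_metrics(records, {"smoker"}, {"cough"})
print(f"smoker -> cough: support={support:.2f}, confidence={confidence:.2f}")
```

In practice, an association-rule miner enumerates many candidate rules and keeps only those above chosen support and confidence thresholds. The sketch evaluates a single rule to show what the two measures mean.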

