A missing data treatment for data mining applications in medical information systems
Liao, S.C.; Lee, I.N.
Kaohsiung Journal of Medical Sciences 17(4): 198-206
2001
ISSN/ISBN: 1607-551X PMID: 11482131 Document Number: 528538
Handling the missing data produced by an automatically stored medical information system requires user-friendly, easily operated and accessible tools that satisfy general users from different disciplines (i.e. statistics and machine learning), a need that follows from medical information system development. This study attempts to develop a new logic separation inference method for a database formatted like most real-world medical records, which contain many missing data and miscellaneous variables. The method is expected to perform better than currently accessible methods. The newly developed logic separation inference method shows a classification power of 0.997 (against 1 for the elimination method), which is better than the simple replacing method (0.974 when missing values are replaced by the mode). Both inference methods (mode and mean) have superior classification power to the simple replacing method. The missing data treatment processes introduced in this study can be completed on an MS Excel spreadsheet without any complicated calculation; therefore, they can satisfy general users. The new missing data treatment method was applied only to data with up to 60% of values missing (missing at random). However, when there is a large amount of data, the method is expected to be applicable also to a database with more than 60% of values missing.
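The two baseline treatments named in the abstract, elimination and simple replacement by mode or mean, can be sketched as follows. This is a minimal illustration only: the abstract does not describe how the logic separation inference method works, so it is not reproduced here, and the records, column names and function names below are hypothetical.

```python
from statistics import mean, mode

# Hypothetical toy records; None marks a missing value.
records = [
    {"sex": "F", "age": 40},
    {"sex": "F", "age": None},
    {"sex": None, "age": 60},
    {"sex": "M", "age": 50},
]


def eliminate(rows):
    """Elimination: drop every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def replace_simple(rows, column, how="mode"):
    """Simple replacement: fill missing values in one column with the
    mode (categorical data) or the mean (numeric data) of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mode(observed) if how == "mode" else mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


complete_cases = eliminate(records)
sex_filled = replace_simple(records, "sex", how="mode")
age_filled = replace_simple(records, "age", how="mean")
```

Elimination keeps only complete records and so shrinks the data set, while simple replacement keeps every record but fills each gap with the same value regardless of the other variables in that record.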