A Randomized Exhaustive Propositionalization Approach for Molecule Classification
INFORMS Journal on Computing, Volume 23, Issue 3, Summer 2011, pp. 331-345
University of Alberta School of Business Research Paper No. 2013-1099
27 Pages. Posted: 2 Jul 2013. Last revised: 22 Jan 2014.
Date Written: May 26, 2010
Abstract
Drug discovery is the process of designing compounds that have desirable properties, such as activity and non-toxicity. Molecule classification techniques are used throughout this process to predict the properties of compounds and thereby expedite their testing. Ideally, the classification rules found should be accurate and reveal novel chemical properties, but current molecule representation techniques yield inadequate accuracy and limited knowledge discovery. This work extends the propositionalization approach recently proposed for multi-relational data mining in two ways: it generates expressive attributes exhaustively, and it uses randomization to sample a limited set of complex (“deep”) attributes. Our experimental tests show that the procedure generates meaningful and interpretable attributes from molecular structural data, and that these attributes are effective for classification.
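To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' procedure. It assumes a toy molecule representation (atom labels plus a bond list) and treats an "attribute" as a sequence of atom labels along a simple path: short paths are enumerated exhaustively, while longer ("deep") paths are sampled by random walks. All names (`shallow_paths`, `sampled_deep_paths`, `propositionalize`) and parameters are hypothetical.

```python
import random

# Assumed toy representation: (atom_id -> element label, list of undirected bonds).

def adjacency(mol):
    atoms, bonds = mol
    adj = {a: [] for a in atoms}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def canon(labels):
    # A path and its reverse describe the same undirected pattern.
    return min(tuple(labels), tuple(reversed(labels)))

def shallow_paths(mol, max_len):
    """Exhaustively enumerate label patterns of simple paths with <= max_len atoms."""
    atoms, _ = mol
    adj = adjacency(mol)
    found = set()

    def extend(path):
        found.add(canon([atoms[a] for a in path]))
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in atoms:
        extend([start])
    return found

def sampled_deep_paths(mol, length, n_samples, rng):
    """Sample label patterns of simple paths with exactly `length` atoms."""
    atoms, _ = mol
    adj = adjacency(mol)
    found = set()
    for _ in range(n_samples):
        path = [rng.choice(sorted(atoms))]
        while len(path) < length:
            choices = [n for n in adj[path[-1]] if n not in path]
            if not choices:
                break  # dead end: discard this walk
            path.append(rng.choice(choices))
        if len(path) == length:
            found.add(canon([atoms[a] for a in path]))
    return found

def contains_path(mol, labels):
    """True if the molecule has a simple path whose atom labels match `labels`."""
    atoms, _ = mol
    adj = adjacency(mol)

    def match(path, i):
        if i == len(labels):
            return True
        return any(atoms[n] == labels[i] and n not in path and match(path + [n], i + 1)
                   for n in adj[path[-1]])

    return any(atoms[s] == labels[0] and match([s], 1) for s in atoms)

def propositionalize(mols, max_len=2, deep_len=4, n_samples=20, seed=0):
    """Build a 0/1 attribute table: one row per molecule, one column per pattern."""
    rng = random.Random(seed)
    pool = set()
    for m in mols:
        pool |= shallow_paths(m, max_len)
        pool |= sampled_deep_paths(m, deep_len, n_samples, rng)
    attributes = sorted(pool)
    table = [[int(contains_path(m, a)) for a in attributes] for m in mols]
    return attributes, table
```

The resulting table can be passed to any propositional classifier. Each deep pattern is tested against every molecule after sampling, so a column reflects whether the pattern is present rather than whether it happened to be sampled from that molecule.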