*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************
Classification, the development of rules for the allocation of
observations to one or more groups, is a fundamental problem in machine
learning and has been applied to many problems in medicine and business.
We consider aspects of a classification model developed by Gallagher, Lee,
and Patterson that is based on a result by Anderson. The model seeks to
maximize the probability of correct G-group classification, subject to
limits on misclassification probabilities. The mixed-integer programming
formulation of the model is an empirical method for estimating the
parameters of an optimal classification rule, which are identified as
coefficients of linear functions by Anderson.
The model is shown to be a consistent method for estimating the parameters
of the optimal solution to the problem of maximizing the probability of
correct classification subject to limits on inter-group misclassification
probabilities. A polynomial time algorithm is described for two-group
instances. The method is NP-complete for a general number of
groups, and an approximation is formulated as a mixed-integer program
(MIP). The MIP is difficult to solve due to the formulation of
constraints wherein certain variables are equal to the maximum of a set of
linear functions. These constraints are conducive to an ill-conditioned
coefficient matrix. Methods for generating edges of the conflict graph
and conflict hypergraphs are discussed. The conflict graph is employed
for finding cuts in a branch-and-bound framework. This technique and
others lead to improvement in solution time over industry-standard
software on instances generated by real-world data. The classification
accuracy of the model in relation to standard classification methods on
real-world and simulated data is also noted.