A survey of discretization techniques: taxonomy and empirical analysis in supervised learning

Original by S. García, J. Luengo, J. A. Sáez, V. López, F. Herrera, IEEE Computer Society, 2013, 16 pages

This summary note was posted on 12 February 2017 under Credit risk / Finance.

Main characteristics of discretizers

  • Static vs dynamic: static discretizers are independent of the learner and act prior to the learning task; most discretizers are static. Dynamic ones act while the model is being built, as in ID3's embedded discretizer and ITFP.
  • Univariate vs multivariate: multivariate discretizers consider all attributes to define the initial (or final) set of cut points. Univariate ones work with one attribute at a time.
  • Supervised vs unsupervised: supervised discretizers use the class labels through heuristic measures to determine the best cut points (entropy, interdependence, etc.); most are supervised. Unsupervised discretizers are equal width and equal frequency (a minimal sketch of both appears after this list).
  • Splitting vs merging: splitting methods establish a cut point among all possible boundaries and divide the domain into two intervals. Merging methods start with a predefined partition and remove a candidate cut point to merge the two adjacent intervals.
  • Global vs local: to make a decision, a global discretizer requires all available data in the attribute, whereas a local one uses only partial information.
  • Direct vs incremental: direct discretizers divide the range into k intervals simultaneously. By contrast, incremental ones begin with a simple discretization and pass through an improvement process, requiring an additional criterion to know when to stop (they are also called hierarchical discretizers).
  • Evaluation measure: the metric the discretizer uses to compare candidate discretization schemes (information measures, statistical measures, rough sets, wrappers, binning).
  • Parametric vs nonparametric: parametric discretizers (ChiMerge, CADD) require a maximum number of intervals fixed by the user. Nonparametric ones (MDLP, CAIM) compute the minimum number of intervals themselves, considering a tradeoff with the loss of information.
  • Top down vs bottom up: top down starts with an empty discretization and adds one new cut point at each step (MDLP; one such splitting step is sketched after this list). Bottom up starts with all possible cut points and merges two adjacent intervals at each step (ChiMerge).
  • Stopping condition: must be specified for nonparametric approaches (minimum description length measure, confidence thresholds, inconsistency ratios).
  • Disjoint vs non disjoint: in a disjoint discretization the intervals cannot overlap; non disjoint discretizers allow overlapping intervals.
  • Ordinal vs nominal: ordinal discretization transforms quantitative data into ordered qualitative data (not common).
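
To make some of these categories concrete, here is a minimal Python sketch (the function names and the toy usage are mine, not from the paper): equal width and equal frequency as unsupervised, direct discretizers, and a single entropy-minimizing split, the basic move of a supervised, top-down method such as MDLP.

```python
import numpy as np

def equal_width(x, k):
    # Unsupervised, direct: k intervals of equal length over the range of x.
    edges = np.linspace(x.min(), x.max(), k + 1)
    return np.digitize(x, edges[1:-1])

def equal_frequency(x, k):
    # Unsupervised, direct: cut points at quantiles, ~len(x)/k points per bin.
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, cuts)

def class_entropy(y):
    # Shannon entropy of the class label distribution in y.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_cut_point(x, y):
    # One supervised, top-down splitting step (the basic move of MDLP-style
    # discretizers): pick the boundary that minimizes the weighted entropy
    # of the two resulting intervals.
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_cut, best_score = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # boundaries lie between distinct consecutive values
        score = (i * class_entropy(ys[:i])
                 + (len(ys) - i) * class_entropy(ys[i:])) / len(ys)
        if score < best_score:
            best_cut, best_score = (xs[i] + xs[i - 1]) / 2, score
    return best_cut

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x > 0.3).astype(int)         # toy labels induced by a true threshold
print(equal_width(x, 4)[:10])     # interval ids in {0, 1, 2, 3}
print(equal_frequency(x, 4)[:10])
print(best_cut_point(x, y))       # should land near 0.3
```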

Comparison criteria

  • Number of intervals
  • Inconsistency (instances that share the same discretized values but belong to different classes)
  • Accuracy: measured with Cohen's kappa, which compensates for random hits. Its original purpose was to measure the degree of agreement or disagreement between two people observing the same phenomenon. It is less expressive than ROC curves for binary classification, but effective for multiclass problems (a worked computation appears after this list).
  • Predictive classification rate
  • Time
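
As a reference for the kappa criterion above, a short Python check (the toy labels are made up for illustration): kappa rescales the observed agreement p_o by the agreement p_e expected from the marginals alone, κ = (p_o − p_e) / (1 − p_e).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])   # made-up labels
y_pred = np.array([0, 0, 1, 2, 2, 2, 1, 0, 1, 2])

cm = confusion_matrix(y_true, y_pred)
n = cm.sum()
p_o = np.trace(cm) / n                        # observed agreement (accuracy)
p_e = (cm.sum(0) * cm.sum(1)).sum() / n**2    # agreement expected by chance
kappa = (p_o - p_e) / (1 - p_e)               # 0 = chance level, 1 = perfect

assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred))
print(round(kappa, 3))  # ~0.697: decent agreement beyond chance
```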

Results

No single best-performing method can be recommended; the choice depends on the problem being tackled.

  • FUSINTER, ChiMerge, CAIM and Modified Chi2 offer excellent performance overall (ChiMerge's merge test is sketched below)
  • PKID and FFD are suitable methods for lazy learning, and CACC, Distance and MODL are good choices for rule induction learning
  • FUSINTER, Distance, Chi2, MDLP and UCPD obtain a satisfactory tradeoff between the number of intervals and accuracy
  • CAIM is one of the simplest discretizers and is quite effective
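
Since ChiMerge and its descendants (Chi2, Modified Chi2) rank among the best performers, here is a hedged sketch of ChiMerge's bottom-up merge test (the function name and toy counts are mine): adjacent intervals whose class distributions look independent of the interval, i.e. yield a low chi-square statistic, are merged first.

```python
import numpy as np

def chi2_adjacent(counts_a, counts_b):
    # ChiMerge's merge test: chi-square statistic over the 2 x (num classes)
    # contingency table of two adjacent intervals. A low value means the
    # class distributions are similar, so the intervals are merge candidates.
    A = np.vstack([counts_a, counts_b]).astype(float)
    R = A.sum(axis=1, keepdims=True)   # interval totals
    C = A.sum(axis=0, keepdims=True)   # class totals
    E = R * C / A.sum()                # expected counts under independence
    E[E == 0] = 0.1                    # common fix for empty cells
    return ((A - E) ** 2 / E).sum()

# ChiMerge starts with one interval per distinct value (bottom up) and
# repeatedly merges the adjacent pair with the lowest statistic, until every
# remaining pair exceeds a chi-square threshold set by a significance level.
print(chi2_adjacent([8, 2], [7, 3]))   # similar distributions -> ~0.27
print(chi2_adjacent([8, 2], [1, 9]))   # different distributions -> ~9.9
```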