- More discussion here
- The complete discretization techniques
Main characteristics of discretizers
- Statics vs dynamic: static is independent of the learner and acts prior to the learning task. Most are static. Dynamic are ID3 discretizers, ITFP
- Univariate vs multivariate: multivariate consider all attributes to define the initial set of cut points (or final ones). Univariate inly work with one attribute at the time
- Supervised vs unsupervised: Supervised consider heuristic measures to determine the best cut points( entropy, interdependence etc..). Most are supervised. Unsupervised discretizers are equal width and equal frequency.
- Splitting vs merging: Splitting methods establish a cut point among all possible boundaries and divide the domain into two intervals. Merging starts with a predefined partition and remove a candidate cut point to mix both intervals
- Global vs local: To make a decision a discretizer can either require all available data in the attribute or use only partial information
- Direct vs incremental: direct discretizers divide the range into k intervals simultaneously. By contrast incremental pass through an improvement process (also called hierarchical discretizers)
- Evaluation measure: metrics used by the discretizer to compare to candidate schemes: (information measure, statistical measure, rough sets, wrappers, binning)
- Parametric vs nonparametric: parametric (ChiMerge, CADD) requires a maximum number of intervals fixed by the user. Non parametric (MDLP, CAIM) computes the minimum number of intervals considering a tradeoff with loss of information
- Top down vs bottom up: Top down starts with an empty discretization and adds a new cut point every time (MDLP). Bottom up starts with all possible cutpoints and merge two intervals (ChiMerge)
- Stopping condition: Must be specified for non parametric approaches (minimum description length measure, confidence thresholds, inconsistency ratios)
- Disjoint vs non disjoint: In a disjoint discretization intervals can not overlap
- Ordinal vs nominal: ordinal discretization transforms quantitative data into ordered qualitative data (non common)
Comparison criteria
- Number of intervals
- Inconsistency
- Accuracy: Use Cohen’ s kappa which compensates for random hits. Original purpose was to measure the degree of dis/agreement between two people observing the came phenomenon. Less expressive as than ROC curves when applied to binary classification but effective for multiclass problems
- Predictive classification rate
- Time
Results
Can not recommend one best performing method, it depends upon the problem tackled
- FUSINTER, ChiMerge, CAIM and Modified CHi2 offer excellent performances over all
- PKID, FFD are suitable methods for lazy d learning and CACC, Distance and MODL are good choices in the rule induction learning
- FUSINTER, Distance, Chi2, MDLP and UCPD obtain satisfactory tradeoff between the number of intervals and accuracy
- CAIM is one of the simplest discretizer and is pretty effective