Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results.
Strategies for data reduction include the following:
- Data cube aggregation, where aggregation operations are applied to the data in the construction of a data cube.
- Dimension reduction, where irrelevant, weakly relevant, or redundant attributes or dimensions may be detected and removed.
- Data compression, where encoding mechanisms are used to reduce the data set size.
- Numerosity reduction, where the data are replaced or estimated by alternative, smaller data representations such as parametric models (which need to store only the model parameters instead of the actual data), or nonparametric methods such as clustering, sampling, and the use of histograms.
- Discretization and concept hierarchy generation, where raw data values for attributes are replaced by ranges or higher conceptual levels. Concept hierarchies allow the mining of data at multiple levels of abstraction and are a powerful tool for data mining. We therefore defer the discussion of automatic concept hierarchy generation to Section.
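
Three of the strategies above (aggregation, sampling, and histograms) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the `sales` records and the function names `aggregate_by_year`, `simple_random_sample`, and `equal_width_histogram` are hypothetical and not taken from any particular tool.

```python
import random
from collections import defaultdict

# Hypothetical quarterly sales records: (year, quarter, amount).
sales = [
    (2021, "Q1", 224), (2021, "Q2", 408), (2021, "Q3", 350), (2021, "Q4", 586),
    (2022, "Q1", 310), (2022, "Q2", 415), (2022, "Q3", 390), (2022, "Q4", 610),
]

def aggregate_by_year(records):
    """Aggregation: roll quarterly amounts up to one total per year."""
    totals = defaultdict(int)
    for year, _quarter, amount in records:
        totals[year] += amount
    return dict(totals)

def simple_random_sample(records, n, seed=0):
    """Numerosity reduction: simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(records, n)

def equal_width_histogram(values, num_bins):
    """Numerosity reduction: replace raw values with equal-width bin counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

yearly = aggregate_by_year(sales)
sample = simple_random_sample(sales, 3)
amounts = [amount for _year, _quarter, amount in sales]
edges, counts = equal_width_histogram(amounts, 3)

print(yearly)   # {2021: 1568, 2022: 1725}
print(counts)   # [3, 3, 2]
```

Eight quarterly records shrink to two yearly totals, three sampled rows, or three bin counts. The bin edges returned by `equal_width_histogram` could also serve as the ranges that replace raw values in discretization.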
Answered 5 years ago by G John