A new method for transforming spectra into multivalued fingerprints is presented and applied to multivariate regression. The method, which aims to enlarge the differences between high-dimensional vectors that show a high degree of similarity, consists of the following stages: (1) removal of spectral outliers; (2) normalization of the spectral matrix into a new data set within the range [0, 1]; and (3) selection of threshold values, so that each variable is assigned a significance value according to both its normalized value and the thresholds. A case study is described: the processing of mid-infrared spectra for partial least squares regression to predict total acidity and reducing sugar content in wines. The original spectral matrix consisted of 156 objects (samples) and 1142 columns (predictors), covering a wavenumber range of 3000-800 cm⁻¹ with a spectral resolution slightly greater than 2 cm⁻¹. The proposed method yielded better predictions than those obtained both with classical treatments and with unprocessed spectral data.
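The three stages listed above can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so everything specific here is an assumption made for illustration: the outlier criterion (a robust z-score on the distance to the median spectrum), the column-wise min-max normalization, the threshold values (0.25, 0.5, 0.75), and the function names `remove_outliers`, `normalize`, and `to_fingerprint` are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from statistics import median


def remove_outliers(spectra, z_max=3.5):
    """Stage 1 (assumed rule): drop spectra far from the median spectrum."""
    n_vars = len(spectra[0])
    center = [median(row[j] for row in spectra) for j in range(n_vars)]
    # Euclidean distance of each spectrum to the median spectrum.
    dist = [sum((x - c) ** 2 for x, c in zip(row, center)) ** 0.5
            for row in spectra]
    d_med = median(dist)
    # Median absolute deviation, guarded against a zero spread.
    mad = median(abs(d - d_med) for d in dist) or 1e-12
    return [row for row, d in zip(spectra, dist)
            if 0.6745 * (d - d_med) / mad <= z_max]


def normalize(spectra):
    """Stage 2 (assumed rule): min-max scale each variable into [0, 1]."""
    columns = list(zip(*spectra))
    lows = [min(col) for col in columns]
    # A constant column gets span 1.0 so that it maps to 0 without dividing by 0.
    spans = [(max(col) - low) or 1.0 for col, low in zip(columns, lows)]
    return [[(x - lo) / sp for x, lo, sp in zip(row, lows, spans)]
            for row in spectra]


def to_fingerprint(normalized, thresholds=(0.25, 0.5, 0.75)):
    """Stage 3 (assumed rule): significance = number of thresholds reached."""
    return [[bisect_right(thresholds, x) for x in row] for row in normalized]


# Toy data: four similar "spectra" with three variables, plus one gross outlier.
spectra = [
    [0.10, 0.49, 0.90],
    [0.12, 0.55, 0.80],
    [0.11, 0.45, 0.85],
    [0.13, 0.52, 0.95],
    [5.00, 9.00, 7.00],
]
kept = remove_outliers(spectra)
fingerprints = to_fingerprint(normalize(kept))
for row in fingerprints:
    print(row)
```

In this sketch each spectrum becomes a vector of small integers, which exaggerates the differences between otherwise similar spectra. The resulting matrix could then replace the raw spectra as the predictor matrix of a partial least squares regression; how the thresholds are actually selected in the paper is not stated in the abstract.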
Spectra-based multivalued fingerprints as predictive vectors for partial least squares regression processes
Int. J. Comp. Math. 2008, 85, 691-702.