The mean contents in the calibration set and the validation set were approximately equal, with similar ranges of variation (Table 3). The PLS regression statistics for cross-validation and test-set validation are shown in Table 4. The model for protein in the ground powder had the highest coefficient
of correlation (r² = 0.97), followed by starch (r² = 0.93). The protein model also had the highest RPD, 4.09 in cross-validation and 4.05 in external validation, which indicated extremely good prediction. The starch model for the milled powder, with a coefficient of correlation of 0.93 and RPDs of 2.64 and 2.95 in cross-validation and external validation, respectively, demonstrated good predictive capacity. RPDs above 2.00 and below 2.50 showed that the predictive
capability for total polyphenol in the milled powder and for protein and starch in whole seeds was adequate for rough estimation of their content. The oil NIR models could not be used for practical germplasm analysis. The optimal models for ground powder, with lower rank values, were better than those for seeds (Table 4). Fig. 3 and Fig. 4 show the optimized regression lines of the PLS models in the cross-validation of the constituents.

As determined by automatic selection based on the values of the BIC (Bayesian Information Criterion) across different clustering solutions, the optimal number of clusters was three. The clustering features covered constructors (sample number and production area) and seed composition characteristics. The three groups consisted of 91 samples in Group 1 (46.7%), 62 samples in Group 2 (31.8%) and 42 samples in Group 3 (21.5%; Table 5). Group 1 was characterized by low contents of starch (40.96 ± 1.49%) and total polyphenol (3.52 ± 0.79 mg g⁻¹) and a high content of oil (1.30 ± 0.32%). Group 2 had a high content of protein (28.12 ± 1.39%). Group 3 was low in protein (26.56 ± 1.12%) and oil (0.93 ± 0.24%) but high in starch (44.04 ± 1.05%) and total polyphenol (5.06 ± 0.98 mg g⁻¹). These
results showed the typical features of the groups obtained by two-step cluster analysis. Canonical discriminant analysis demonstrated that the concentrations of protein (Wilks' Lambda = 0.825, F = 20.302, P < 0.001), starch (Wilks' Lambda = 0.615, F = 60.129, P < 0.001), oil (Wilks' Lambda = 0.785, F = 26.232, P < 0.001) and total polyphenol (Wilks' Lambda = 0.671, F = 46.999, P < 0.001) all contributed significantly to discriminating the three groups. The correct classification rate in validation was high (79.5%), which indicated agreement with the results of the calibration set. The misclassifications were as follows: thirteen varieties in Group 1 were predicted as Group 2 and one as Group 3; eighteen in Group 2 were assigned to Group 1 and five to Group 3; one in Group 3 was placed in Group 2 and two in Group 1. Group 2 was clustered into two subgroups (Table 5).
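
The reported classification rate can be checked against the counts given in the text. The following is a minimal sketch in Python, assuming that the misclassification counts refer to all 195 clustered samples (91, 62 and 42 per group); the names `group_sizes`, `misclassified` and `confusion_matrix` are illustrative and do not come from the original analysis.

```python
# Group sizes from the two-step cluster analysis, as reported in the text.
group_sizes = {1: 91, 2: 62, 3: 42}

# Reported misclassifications: (actual group, predicted group) -> count.
misclassified = {
    (1, 2): 13, (1, 3): 1,
    (2, 1): 18, (2, 3): 5,
    (3, 2): 1, (3, 1): 2,
}


def confusion_matrix(sizes, errors):
    """Build {actual: {predicted: count}}.

    Off-diagonal cells come from the reported errors; each diagonal cell
    is the group size minus that group's misclassified samples.
    """
    matrix = {}
    for actual, size in sizes.items():
        row = {pred: errors.get((actual, pred), 0)
               for pred in sizes if pred != actual}
        row[actual] = size - sum(row.values())
        matrix[actual] = row
    return matrix


matrix = confusion_matrix(group_sizes, misclassified)
total = sum(group_sizes.values())
correct = sum(matrix[g][g] for g in group_sizes)
overall_rate = 100 * correct / total
per_group_rate = {g: 100 * matrix[g][g] / group_sizes[g] for g in group_sizes}

print(f"overall: {correct}/{total} = {overall_rate:.1f}%")
for g, rate in per_group_rate.items():
    print(f"Group {g}: {rate:.1f}%")
```

Under this assumption, 155 of the 195 samples are classified correctly, which reproduces the reported 79.5%. The per-group rates printed by the sketch are derived here from the same counts and are not reported in the original text.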