Resumo: | In our previous work [1] we developed computational models to predict strains with specific phenotypes (e.g. low ethanol resistance, growth at 30 °C and growth in media containing galactose, raffinose or urea) from microsatellite allelic patterns. The objective of the present work was to gain a deeper understanding of the phenotypic diversity of a heterogeneous Saccharomyces cerevisiae strain collection, using a large battery of tests with biotechnological relevance, and to apply computational data mining algorithms to predict a strain's potential for use in winemaking from a few selected phenotypic data. A S. cerevisiae collection was assembled, comprising 172 strains of different geographical origins and technological uses (winemaking, brewing, bakery, distillery, etc.). Phenotypic screening covered 30 physiological traits that are important from an oenological point of view, such as ethanol tolerance, growth in synthetic must media at various temperatures, and resistance to fungicides. Data were analyzed using Principal Component Analysis, and three phenotypes (growth in the presence of potassium bisulfite, growth at 40 °C, and resistance to ethanol) were identified as responsible for the highest strain variability. Statistical analysis revealed relevant associations between several phenotypes and the strains' technological use. Based on the phenotypic data, a naïve Bayesian classifier, as implemented in the software Orange [2], correctly assigned most vineyard strains (73%) and commercial strains (77%) to their respective groups (AUC = 0.70). For the group of commercial strains, data mining approaches identified the 18 phenotypic tests with the highest weight. Overall, the growth patterns of this group in must containing iprodione (0.05 mg/mL) or cycloheximide (0.1 µg/mL) had the highest predictive score for assigning a strain to the commercial group.
These results demonstrate the potential of computational approaches to explore phenotypic variability and to predict the probability that a S. cerevisiae strain can be used as a commercial strain.
|