Summary: | Much of the information that organisations process today comes in the form of unstructured text. Natural language processing tools make it possible to extract valuable information from very large datasets that would otherwise remain inaccessible to business units. This dissertation focuses on the task of establishing links between employees' comments and a broad set of professional occupations. Using machine learning tools, it aims to leverage the features of the information provided to link an employee's career perspectives to the jobs for which they may be eligible. Several natural language processing techniques were applied for information extraction, including stemming, lemmatization and part-of-speech tagging. Given that the datasets used are somewhat unbalanced and composed of relatively short texts, this study focused heavily on usability and knowledge extraction. The first experiment addressed a multiclass problem by applying a clustering algorithm, attempting to find the likely category for a given professional occupation in order to define a set of distinct labour market areas. The second experiment used a set of classifiers, including SVM and Naive Bayes, to assign each job description to a predetermined cluster. Finally, a similarity algorithm was implemented to link the aspirations of a given professional to the set of most suitable professional occupations. The resulting models are a starting point for applying more complex natural language processing algorithms, or for enriching human resources taxonomies, to allow a more exhaustive analysis of the job descriptions of professionals in the labour market.
|