Exercise 1 - CV classification

Exercise: For an HR manager of a large company, finding the right person to hire among a flood of Curriculum Vitae can be a complicated and time-consuming job. Automated text classification can support HR's work by dramatically reducing the time spent in pre-selecting CVs.

In the dataset for this exercise there are 1050 CVs submitted by candidates for different positions within your company. Some of these CVs were manually labeled by HR during past selections. For 6 categories of personal skills ("Communication," "Teamwork," "Problem solving," "IT skills," "Languages") a score from 1 to 7 was assigned. The score assesses the candidate's level of competence on that particular skill, where 1 means "low level" and 7 means "high level."

The goal of the analysis is to create an NLP system that can automatically recognize CVs that show a good level of competence on IT skills (consider a score greater than or equal to 5). Perform the following steps:

1) Import the dataset and convert the content into a "Document".
2) Perform all the necessary text pre-processing steps on the document column.
3) Create the target variable as a new dummy variable that takes the value "1" when the IT skills score is greater than or equal to 5 and "0" otherwise.
4) Create word embeddings of the text through the Word2Vec algorithm (try different embedding dimensionalities, depending on the results obtained).
5) Filter out all rows where the target variable is missing.
6) Split the dataset into train and test.
7) Train one or more supervised binary classification algorithms to predict the level of IT skills using only the word embeddings created in step 4.
8) Evaluate the algorithms on the test set, choose the best performing one, and rank the CVs excluded in step 5 so as to select the 50 CVs with the highest probability of having a good level of IT skills.
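The data-handling parts of the exercise (the target variable, the filter on missing labels, the train/test split, and the final top-50 ranking) can be sketched in plain Python. This is only an illustration under stated assumptions, not the workflow's actual implementation: every function name, the row layout (`target`, `vector` keys), and the 30% test fraction are hypothetical. Averaging word vectors into one vector per CV is one common way to turn Word2Vec output into classifier input and is assumed here; the exercise does not prescribe it. The `predict_proba` argument stands in for whatever trained classifier is chosen in steps 7 and 8.

```python
import random
from statistics import fmean

THRESHOLD = 5  # from the exercise: an IT skills score >= 5 counts as "good"


def make_target(it_score):
    """Step 3: 1 if score >= 5, 0 if lower, None if the CV was never labeled."""
    if it_score is None:
        return None
    return 1 if it_score >= THRESHOLD else 0


def document_vector(tokens, word_vectors, dim):
    """Assumed pooling: average the vectors of the tokens found in the model.

    `word_vectors` is a plain dict standing in for a trained Word2Vec model.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [fmean(component) for component in zip(*known)]


def split_labeled_unlabeled(rows):
    """Step 5: keep labeled rows for training, set unlabeled rows aside."""
    labeled = [r for r in rows if r["target"] is not None]
    unlabeled = [r for r in rows if r["target"] is None]
    return labeled, unlabeled


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Step 6: shuffle a copy of the rows and cut off a test partition."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def top_k_by_probability(rows, predict_proba, k=50):
    """Step 8: rank unlabeled CVs by predicted probability of class 1."""
    ranked = sorted(rows, key=lambda r: predict_proba(r["vector"]), reverse=True)
    return ranked[:k]


# Toy data: two-dimensional "embeddings" and made-up scores, for illustration only.
toy_vectors = {"python": [1.0, 0.0], "sql": [3.0, 2.0], "sales": [0.0, 4.0]}
toy_rows = [
    {"tokens": ["python", "sql"], "it_score": 6},
    {"tokens": ["sales"], "it_score": 2},
    {"tokens": ["python"], "it_score": None},  # unlabeled CV, ranked in step 8
]
for row in toy_rows:
    row["target"] = make_target(row["it_score"])
    row["vector"] = document_vector(row["tokens"], toy_vectors, dim=2)

labeled, unlabeled = split_labeled_unlabeled(toy_rows)
train, test = train_test_split(labeled, test_fraction=0.5)
# Placeholder scorer: a real run would use the chosen classifier's probability.
shortlist = top_k_by_probability(unlabeled, predict_proba=lambda v: v[0], k=50)
```

Setting the unlabeled rows aside in step 5, rather than discarding them, is what makes step 8 possible: those are exactly the CVs the chosen model scores and ranks.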
