Detailing automated essay evaluation and scoring systems
August 5, 2022
Essay evaluation and scoring depend on various features such as word choice, grammatical accuracy, sentence structure and clarity of writing. The problem with a manual essay screening process lies in the subjectivity of evaluations. It is difficult to understand the methodology behind essay scoring – do recruiters use different metrics when evaluating candidates? Which aspect of writing do they consider most important? One recruiter may focus on a candidate’s vocabulary, while another may concentrate on how well a message is conveyed. Recruiters may also favour a particular writing style, potentially adding bias to the selection process and resulting in inaccurate hiring decisions.
This raises the question – using datasets of candidate essays and recruiter ratings, can we create an objective list of metrics that recruiters use? Moreover, can we use these metrics to develop an automated essay evaluation system? Our team at impress.ai looked to tackle this problem by analyzing various metrics available based on the different aspects of writing they evaluate.
Based on our team’s findings, here are the metrics that had the strongest correlation with recruiter evaluations:
Lexical diversity
Lexical diversity measures how many different words appear in an essay. It is a good indicator of a candidate’s vocabulary and reflects their variety in word choice. The most popular measure of lexical diversity is the type-token ratio, calculated by dividing the number of unique words in an essay (types) by the total word count (tokens). The result is a number between 0 and 1 – a score closer to 1 reflects a well-developed vocabulary.
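The type-token ratio described above can be sketched in a few lines of Python. The word-splitting regex is a simplifying assumption; a production system would use a proper tokenizer.

```python
import re

def type_token_ratio(essay: str) -> float:
    """Unique words (types) divided by total words (tokens)."""
    # Lowercase so "The" and "the" count as one type; split on letters/apostrophes.
    tokens = re.findall(r"[a-z']+", essay.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# "the" repeats, so 8 types over 9 tokens:
print(type_token_ratio("The quick brown fox jumps over the lazy dog"))
```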
Lexical density
Lexical words convey information and have a shared meaning in identifying an object or action. The lexical density of an essay is, therefore, a measure of how much information it contains. It is calculated by dividing the number of lexical words by the total word count; essays with a higher lexical density are said to convey more information.
Flesch Reading Ease
A metric widely used by marketers, researchers and writers, Flesch reading ease is calculated with a formula developed by Rudolf Flesch, a consultant with the Associated Press. The readability score is computed from the average number of words per sentence and the average number of syllables per word. The higher the score, the easier the text is to understand.
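The standard Flesch formula is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). A sketch follows; the syllable counter is a rough vowel-group heuristic, assumed here for simplicity, so results will differ slightly from dictionary-based counters.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: each run of consecutive vowels is one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores mean easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# One sentence, three one-syllable words:
print(flesch_reading_ease("The cat sat."))
```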
Grammar error percentage
Grammatical errors in an essay can easily be found with the help of natural language processing. Grammar error percentage is determined by the ratio of grammatical mistakes to the total number of words in an essay. A higher grammar error percentage reflects poorer writing ability.
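The ratio itself is simple once an error count is available. In the sketch below the error count is passed in directly; obtaining it from an NLP grammar checker is assumed to happen upstream and is not shown.

```python
def grammar_error_percentage(num_errors: int, total_words: int) -> float:
    """Grammatical mistakes as a percentage of total word count.

    num_errors is assumed to come from an external grammar checker;
    this function only computes the ratio.
    """
    if total_words == 0:
        return 0.0
    return 100.0 * num_errors / total_words

# 3 errors in a 150-word essay -> 2% error rate:
print(grammar_error_percentage(3, 150))
```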
These metrics can serve as the foundation for automated essay evaluation and scoring systems that save recruiters time and effort. Feature-engineered models also make it possible to tailor the evaluation metrics to a recruiter’s requirements or the nature of the essay-based question. Automated essay evaluation and scoring systems have the potential to significantly improve recruitment efficiency while delivering accurate and useful insights.