Auto-grading technology


Auto-Grading (automatic grading)

Auto-grading refers to the process of using computer programmes and algorithms to automatically assess and score learning activities and assessment tasks. The technology is primarily used in the fields of education and assessment.

The main features of auto-grading include the following.

1. efficiency: auto-grading can process large-scale learning content and assessment tasks, allowing educational institutions and online education platforms to assess thousands of students and participants efficiently.

2. real-time feedback: feedback can be provided to students and participants in real time, which helps learners understand their assignments and increases opportunities for self-assessment.

3. standardised assessment: automated scoring applies the same criteria to every submission, which helps reduce subjective elements and human bias and supports fair assessment.

4. analysis of large data sets: the large amounts of data produced by automated scoring can be accumulated and analysed, providing useful information for improving the learning process and optimising assessment methods.

Possible ways to realise auto-grading include the following.

1. programmed testing: automated testing is widely used to assess the correctness of a programme. Test cases are designed in advance and the programme is checked automatically against them.

2. automated evaluation formulae: where numerical answers are expected, such as in mathematical problems or programming tasks, automated evaluation formulae are used to calculate the answers and assess their correctness.

3. natural language processing: essays, reports and other texts are analysed with natural language processing to assess reading comprehension, grammatical accuracy and the logical development of the text.

4. machine learning: machine learning models are trained to assess complex tasks automatically, for example tasks involving image recognition, speech recognition or automatic translation.

5. automatic grading of programming tasks: to grade programming tasks automatically, the syntax and semantics of the code are checked and the output it produces is compared with the expected output. The efficiency and correctness of the programme may also be assessed.
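As an illustration of the "automated evaluation formulae" approach above, the following is a minimal sketch of checking a numerical answer against a computed expected value. The function name grade_numeric_answer and the tolerance values are assumptions chosen for this example, not part of any particular grading system.

```python
import math

def grade_numeric_answer(submitted, expected, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if the submitted value matches the expected value within tolerance."""
    try:
        value = float(submitted)
    except (TypeError, ValueError):
        # Non-numeric submissions are treated as incorrect rather than as errors.
        return False
    return math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs_tol)

# Expected answer computed from a formula: area of a circle with radius 2.
expected_area = math.pi * 2 ** 2

print(grade_numeric_answer("12.566371", expected_area))  # close enough to pi * 4
print(grade_numeric_answer("12.5", expected_area))       # outside the tolerance
print(grade_numeric_answer("twelve", expected_area))     # not a number
```

Comparing with a tolerance rather than with == avoids marking an answer wrong because of rounding or floating-point representation. How loose the tolerance should be depends on the task.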

Automatic scoring is used not only in education but also in real-world applications such as quality control and test automation. In every case, the scoring method must be selected and designed to fit the specific task and its assessment criteria.

Algorithms used for auto-grading (automatic grading)

Different algorithms and methods are used for auto-grading (automatic grading). The algorithm chosen depends on the type of assessment task and the kind of data being assessed. Commonly used algorithms are described below.

1. rule-based assessment: rule-based approaches create a set of rules according to assessment criteria and assess learning activities or assessment tasks. Examples include rules for detecting grammatical errors or checking for the presence of certain keywords.

2. natural language processing (NLP): NLP technology analyses and evaluates the grammar, context and content of texts. It includes automatic summarisation, grammar checking and sentiment analysis. In essay evaluation, for example, NLP algorithms are used to assess grammatical accuracy, lexical diversity and logical development.

3. machine learning: machine learning algorithms use large data sets to learn patterns and rules. They are widely used for automated scoring of programming tasks and mathematical problems and include supervised, unsupervised and reinforcement learning approaches.

4. computer vision: computer vision algorithms are used to evaluate image and video data, including, for example, shape recognition, face recognition and object detection. This helps to assess the accuracy and design of programmes.

5. speech recognition: speech recognition algorithms are used to evaluate speech data, e.g. to assess the content of speech tasks and conversations.

6. statistical methods: statistical methods analyse and evaluate numerical data. For example, statistical measures and models may be used to evaluate the solution to a mathematical problem.

7. specialised domain knowledge: some evaluations draw on specialised knowledge of a particular domain. For example, evaluation tasks in the medical domain may require medical knowledge.
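The rule-based approach in item 1 can be sketched as a keyword check on a short answer. This is only an illustration: the function grade_by_keywords, the one-point-per-keyword rule and the sample answer are all hypothetical.

```python
import re

def grade_by_keywords(answer, required_keywords, points_per_keyword=1):
    """Score a short answer by checking which required keywords appear in it."""
    # Lower-case the answer and split it into whole words so that
    # matching is case-insensitive and ignores punctuation.
    words = set(re.findall(r"[a-z0-9']+", answer.lower()))
    found = [kw for kw in required_keywords if kw.lower() in words]
    missing = [kw for kw in required_keywords if kw.lower() not in words]
    return {
        "score": len(found) * points_per_keyword,
        "max_score": len(required_keywords) * points_per_keyword,
        "missing": missing,  # can be returned to the learner as feedback
    }

result = grade_by_keywords(
    "Photosynthesis converts light energy into glucose using chlorophyll.",
    ["photosynthesis", "chlorophyll", "oxygen"],
)
print(result)
```

A rule like this is transparent and easy to explain to learners, but it cannot tell whether the keywords are used correctly, which is one reason rule-based checks are often combined with the other methods listed above.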

Application examples of auto-grading (automatic grading)

The following are examples of the application of auto-grading.

1. online education platforms: many online education platforms (e.g. Coursera, edX, Udacity) use auto-grading to assess course homework and tests. This allows even massive open online courses (MOOCs) to be assessed quickly and efficiently.

2. programming education: programming learning platforms (e.g. LeetCode, HackerRank, CodeSignal) typically have systems that automatically assess submitted code against test cases. Learners receive immediate feedback, which makes it easier for them to understand problems with their code.

3. language learning: language learning apps such as Duolingo and Rosetta Stone use systems that automatically assess user responses (spelling, grammar, pronunciation, etc.). This allows users to receive real-time feedback and learn more effectively.

4. automated marking of tests and examinations: schools and universities have introduced automated marking systems for tests and examinations, such as multiple choice, fill-in-the-blank and short answer questions. This reduces the burden on teachers and provides grades quickly.

5. training programmes: auto-grading is used in corporate training programmes to assess employee knowledge and skills. This allows them to effectively manage employees’ learning progress and provide the necessary support.

6. e-learning systems: many e-learning systems (e.g. Moodle, Blackboard, Canvas) incorporate auto-grading of quizzes and assignments, allowing teachers to spend more time preparing and teaching lessons.

Examples of auto-grading (automatic grading) implementations

This section describes an example implementation of auto-grading (automatic grading). The following is a simple example of automatic grading of programming assignments using Python. In this example, the score is calculated by comparing the output of the correct programme with the output of the programme submitted by the student.

# Reference (correct) programme
def correct_solution(input_data):
    # Placeholder reference implementation: doubles the input.
    # Replace this with the actual model answer for the assignment.
    return input_data * 2

# Student-submitted programme
def student_solution(input_data):
    # Placeholder for the student's code; this version happens to be correct.
    return input_data + input_data

# Function to evaluate the programme
def grade_program(correct, student, input_data):
    # Output of the reference solution
    correct_output = correct(input_data)
    # Output of the student's solution
    student_output = student(input_data)

    # Check that the outputs match.
    if correct_output == student_output:
        return 100  # 100 points for an exact match.
    else:
        return 0  # Zero points if there is a discrepancy.

# Test data
test_input = 10
expected_output = 20  # value the reference solution should return for test_input

# Sanity-check the reference solution before using it to grade.
assert correct_solution(test_input) == expected_output

# Evaluate the programme
score = grade_program(correct_solution, student_solution, test_input)

print("Score:", score)

In this example, the correct_solution and student_solution functions produce the reference output and the student's output for a given input, while the grade_program function compares the two outputs, returning 100 points for an exact match and 0 for a mismatch. Finally, test data is set and the programme is evaluated.

This example is very simple and actual auto-grading systems may require more sophisticated algorithms and functions. Factors such as security measures and programme efficiency also need to be considered.
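One small step towards a more realistic grader is to run several test cases and award partial credit. The sketch below assumes a hypothetical grade_with_test_cases function and a deliberately buggy sample_submission. It runs the submitted function in the same process, which is not safe for untrusted code; a real system would execute submissions in an isolated environment with time and memory limits.

```python
def grade_with_test_cases(student_func, test_cases):
    """Run student_func on each (input, expected) pair and return a 0-100 score."""
    if not test_cases:
        raise ValueError("at least one test case is required")
    passed = 0
    for input_data, expected in test_cases:
        try:
            if student_func(input_data) == expected:
                passed += 1
        except Exception:
            # A crash on one input counts as a failed case, not a failed run.
            pass
    return round(100 * passed / len(test_cases))

def sample_submission(x):
    # Hypothetical student code: doubles the input, but mishandles zero.
    return x + x if x != 0 else 1

# (input, expected output) pairs, including edge cases such as 0 and negatives.
test_cases = [(10, 20), (0, 0), (-3, -6), (1, 2)]

print("Score:", grade_with_test_cases(sample_submission, test_cases))
```

Because each test case is scored separately, the same loop could also record which inputs failed and return that list to the learner as feedback.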

Auto-grading can also be implemented using similar principles in areas other than programming assignments. For example, when assessing natural language processing assignments, the correct text can be compared with the student’s submitted text and a score assigned according to the degree of agreement.
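Scoring by degree of agreement can be sketched with Python's standard difflib module. The function similarity_score and the word-level comparison are assumptions made for this illustration; surface similarity of this kind does not capture meaning, so production systems generally use more robust measures.

```python
from difflib import SequenceMatcher

def similarity_score(reference, submission, max_points=100):
    """Score a submission by its word-level similarity to a reference text."""
    # Compare sequences of lower-cased words rather than raw characters,
    # so differences in capitalisation and spacing do not affect the score.
    ref_tokens = reference.lower().split()
    sub_tokens = submission.lower().split()
    ratio = SequenceMatcher(None, ref_tokens, sub_tokens).ratio()
    return round(ratio * max_points)

reference = "The cat sat on the mat"
print(similarity_score(reference, "The cat sat on the mat"))  # identical text
print(similarity_score(reference, "The cat sat on a mat"))    # one word differs
print(similarity_score(reference, "Dogs bark loudly"))        # no words in common
```

The ratio is 1.0 for identical token sequences and 0.0 when no tokens match, so the score falls between 0 and max_points.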

Challenges of auto-grading (automatic grading)

Several challenges and limitations exist with auto-grading. The main challenges are described below.

1. handling subjectivity: some assessment tasks contain subjective elements, for example, the assessment of essays or artistic works. It is difficult to accurately assess such tasks using automatic scoring, and algorithms and models need to be developed to automatically assess subjectivity.

2. setting evaluation criteria: in order to perform automatic grading, it is necessary to set evaluation criteria. Accurate automatic grading may be difficult if the assessment criteria are unclear or if the assignment has complex criteria.

3. complexity of programming assignments: programming assignments are very complex and not only the correctness of the code but also factors such as efficiency and design need to be assessed, requiring the development of sophisticated automated scoring systems.

4. designing appropriate test cases: in the case of programming assignments, it is important to design appropriate test cases, and if it is difficult to automatically generate test cases appropriate to the task, the assessment may be inaccurate.

5. privacy and security: there are concerns about the security and privacy of student submissions and response data, so this data must be handled carefully.

6. lack of feedback: if automated marking systems are inaccurate, they may not provide adequate feedback to learners. Because incorrect assessments can have a negative impact on learners, improving the quality of feedback is a challenge.

7. complexity of assessment methods: automatically assessing assessment tasks with multiple elements and criteria is complex, especially when different types of data need to be combined, increasing the complexity of the algorithm.

Measures to address the challenges of auto-grading (automatic grading)

The measures to address these auto-grading challenges are as follows.

1. automating subjective evaluation: use specialised natural language processing (NLP) models to automate subjective evaluation criteria, integrating NLP algorithms such as sentiment analysis, grammar checking and evaluation of logical development. The models are trained on training data and then score submissions according to the evaluation criteria.

2. clarifying assessment criteria: it is important to clarify and communicate the assessment criteria to the learner; detailed explanations of the purpose of the task, expected outputs and assessment criteria will help both the learner and the educator understand the assessment process.

3. assessment of programming assignments: to address programming assignments, professional automated scoring tools should be developed to assess code accuracy, efficiency, conformance to style guides, etc., with suggestions for designing appropriate test cases and code refactoring also incorporated into automated scoring.

4. privacy and security considerations: take measures to protect the security and privacy of student submissions and response data, including data anonymisation, encryption and enhanced access controls to prevent data leaks and unauthorised access.

5. improving feedback: where automated marking systems are inaccurate, provide learners with detailed explanations and suggestions for improvement. This improves the quality of feedback, helps learners understand their errors and facilitates learning.

6. addressing the complexity of assessment methods: use a multi-layered automated scoring approach to address assessment tasks with multiple components and criteria, integrating multiple assessment criteria, calculating scores for each and providing an overall assessment.

7. utilising domain expertise: for tasks related to a specific domain, utilise domain expertise to build an automated scoring model. This allows for assessment criteria specific to a particular domain to be addressed.

8. implement a feedback loop: collect learner feedback to update the automated scoring system in order to drive improvements. Establish a continuous improvement process.
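The multi-layered approach in item 6 can be illustrated with a weighted combination of per-criterion scores. The criteria names, weights and the overall_score function below are hypothetical and would need to be set according to the actual assessment criteria.

```python
def overall_score(criterion_scores, weights):
    """Combine per-criterion scores (0-100) into one weighted overall score."""
    if set(criterion_scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(criterion_scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight

# Each criterion is scored by its own checker, then the results are combined.
scores = {"correctness": 90, "efficiency": 70, "style": 80}
weights = {"correctness": 3, "efficiency": 1, "style": 1}

print("Overall:", overall_score(scores, weights))
```

Keeping the per-criterion scores alongside the overall result lets the system report where points were lost, which also supports the feedback improvements described in item 5.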

Addressing auto-grading challenges requires a combination of technical approaches and pedagogical strategies, and improving the transparency of assessment criteria and the quality of feedback will contribute to learner satisfaction and learning effectiveness.

Reference Information and Reference Books

For general machine learning algorithms including search algorithms, see “Algorithms and Data Structures” or “General Machine Learning and Data Analysis”.

“Algorithms” and other reference books are also available.

① Theory and Overview (General Framework & Background)

Handbook of Automated Scoring: Theory into Practice
By Duanli Yan, André A. Rupp, Peter W. Foltz (2020, CRC Press)
→ A comprehensive 560-page reference covering psychometrics, NLP, and real-world implementation.
The Routledge International Handbook of Automated Essay Evaluation
Edited by Mark D. Shermis & Joshua Wilson (2024, Routledge)
→ Covers international trends, multimodal scoring, ethical considerations, and policy developments.

② Essay and Short-Answer Grading (NLP-Based Scoring)

Automated Essay Scoring
By Beata Beigman Klebanov & Nitin Madnani (2022, Synthesis Lectures)
→ Hands-on guide for building small-scale models along with theoretical background.
Handbook of Automated Essay Evaluation
By Mark D. Shermis & Jill Burstein (2013, Routledge)
→ Classic handbook comparing commercial systems like e-rater™ and IntelliMetric™.
Auto-Grader: Auto-Grading Free Text Answers
By Robin Richner (2022, Springer)
→ Focuses on semantic similarity scoring using BERT for short-answer questions.

③ Code Assignment Auto-Grading (Programming Tasks)

Chapter 20 of the Handbook of Automated Scoring
→ Covers test case-based scoring, AST analysis, and style evaluation for code assignments.
“Design and Evaluation of an AI-Assisted Grading Tool for Source Code (TA Buddy)”
ACM Paper (2025)
→ Hybrid approach using LLMs and static code analysis with proven effectiveness in real classrooms.
Gradescope Programming Assignments Guide
→ Detailed implementation guide using Docker and customizable auto-grader scripts.

④ Japanese Resources (Domestic Context & Use Cases)

“Automated Scoring in English Education – Current Status and Challenges”
By Takayasu Ishii, Yusuke Kondo, Makoto Miyoshi (2023)
→ Case studies of automated scoring tools in Japanese high schools and universities.
IPSJ (Information Processing Society of Japan) Special Issue (May 2023)
Mini-feature: “AI Grading Systems”
→ Covers CBT, OCR for handwritten answers, and the impact of AI scoring on university entrance exams.