AI Shows Racial Bias When Grading Essays — and Can’t Tell Good Writing From Bad




Every day, artificial intelligence reaches deeper into the nation’s classrooms, helping teachers personalize learning, grade students and develop lesson plans. But the jury is still out on how well it handles some of these jobs, including scoring student writing. A new study from The Learning Agency found that even though ChatGPT can approximate human scores on essays, it struggles to distinguish good writing from bad. And that has serious implications for students.

To better understand these implications, we evaluated ChatGPT’s essay-scoring ability using the Automated Student Assessment Prize (ASAP) 2.0 benchmark. It includes about 24,000 argumentative essays written by middle and high school students. What makes ASAP 2.0 particularly useful for this kind of research is that every essay was scored by humans, and it includes demographic data, such as the race, English-learner status, gender and economic status of each student author. That means researchers can see how AI performs not only against human graders, but across different groups of students.
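For readers curious what that kind of comparison looks like in practice, here is a minimal sketch in Python of how group-level score gaps might be computed. The file name and column names are placeholders rather than the actual ASAP 2.0 schema, and the code illustrates the general approach, not The Learning Agency’s own analysis:

    import pandas as pd

    # Hypothetical layout: one row per essay, with a human score, an AI
    # score and the author's demographic group. Column names are
    # assumptions, not the real ASAP 2.0 schema.
    df = pd.read_csv("asap_2_scores.csv")  # columns: human_score, ai_score, race

    # Mean score per demographic group, for human graders and for the AI.
    group_means = df.groupby("race")[["human_score", "ai_score"]].mean()

    # Each group's gap from the overall mean, per scorer. Similar gap
    # patterns in both columns suggest the AI is mirroring human scoring.
    gaps = group_means - df[["human_score", "ai_score"]].mean()
    print(gaps.round(2))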

So what did we find? ChatGPT did assign different average scores to different demographic groups, but most of these differences were so small they likely wouldn’t matter much. There was one exception, however: Black students received lower scores than Asian students, and that gap was large enough to warrant attention.

But here’s the thing: the same disparity appeared in the scores assigned by humans. In other words, ChatGPT did not introduce new bias; rather, it reproduced bias that already existed in the human scoring data. While this might suggest the model accurately reflects current standards, it also highlights a serious risk. When training data reflects existing demographic disparities, those inequities can get baked into the model itself. The result is predictable: the same students who have always been overlooked remain overlooked.

And that matters a great deal. If AI models reinforce existing scoring disparities, students could see lower grades not because of poor writing, but because of how performance has historically been judged. Over time, that could affect academic confidence, access to advanced coursework or even college admissions, amplifying educational inequities rather than closing them.

What’s more, our study found that ChatGPT struggles to tell great writing from bad. Unlike human graders, who gave out more A’s and F’s, ChatGPT handed out a lot of C’s. That means strong writers may not get the recognition they deserve, while weaker writing may go unflagged. For students from marginalized backgrounds, who often have to work harder to be noticed, that’s a potentially serious loss.
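A similar sketch shows how that distribution difference might be checked: compare how often each grade appears under human scoring versus AI scoring. Again, the file and column names here are assumptions for illustration only:

    import pandas as pd

    # Hypothetical columns, as before: human letter grades vs. the AI's.
    df = pd.read_csv("asap_2_scores.csv")  # columns: human_grade, ai_grade

    # Share of essays receiving each grade, per scorer. A scorer that
    # can't tell good writing from bad shows up as probability mass piled
    # on the middle grades, with thin tails at A and F.
    for col in ["human_grade", "ai_grade"]:
        shares = df[col].value_counts(normalize=True).sort_index().round(2)
        print(col, shares.to_dict())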

To be clear, human grading isn’t perfect. Teachers can harbor unconscious biases or apply inconsistent standards when scoring essays. But if AI reproduces those biases and fails to recognize exceptional work, it doesn’t solve the problem. It reinforces the very inequities so many advocates and educators are trying to fix.

That’s why schools and educators must think carefully about when and how to use AI for grading. Rather than replacing grading outright, AI tools could provide feedback on grammar or paragraph structure while leaving the final evaluation to the teacher. Meanwhile, ed tech developers have a responsibility to evaluate their tools critically. It isn’t enough to measure accuracy; developers must ask: Accurate for whom, and under what circumstances? Who benefits, and who gets left behind?
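As a rough illustration of that feedback-only setup, the sketch below uses OpenAI’s Python client to request comments on grammar and structure while explicitly withholding any grade. The model choice and prompt wording are assumptions on our part; this is one possible configuration, not a vetted classroom tool:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def feedback_only(essay: str) -> str:
        """Ask the model for formative feedback while withholding any grade."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # model name is an assumption, not a recommendation
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a writing tutor. Comment on grammar and "
                        "paragraph structure only. Do NOT assign a score, "
                        "grade or ranking; the teacher makes the final "
                        "evaluation."
                    ),
                },
                {"role": "user", "content": essay},
            ],
        )
        return response.choices[0].message.content

    print(feedback_only("Essay text goes here..."))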

Benchmark datasets like ASAP 2.0, which include demographic details alongside human scores, are essential for anyone trying to evaluate fairness in an AI system. But more are needed. Developers need access to more high-quality datasets, researchers need funding to create them, and the industry needs clear guidelines that prioritize equity from the start, not as an afterthought.

AI is beginning to reshape how students are taught and assessed. But if that future is going to be fair, developers must build AI tools that account for bias, and educators must use them with clear guardrails in place. These tools should help every student shine, not flatten their potential to fit the average. The promise of AI in education isn’t just about efficiency. It’s about equity. And that’s something no one can afford to get wrong.




