Chatbots in class?

History Instructor Betty Luther-Hillman has thoughts about artificial intelligence and essay writing.

By Betty Luther-Hillman
October 31, 2023

This image was generated by DALL-E, an artificial intelligence system, based on the following prompt: Paint a female history teacher with short black hair wearing dark-framed glasses standing in front of the Academy Building at Phillips Exeter Academy in the style of Claude Monet.

Throughout my time at Exeter, history classes and essay writing have been inseparable.

My own public school history class assessments mostly consisted of multiple-choice exams: memorizing facts and circling the correct answer, occasionally compiling those facts into a string of sentences that passed as an essay. But at Exeter, we strive to teach history as a means of developing critical thinking skills, emphasizing the use of evidence, engaging narration, and close reading of sources. And while there are many ways to hone these skills, writing has been a crucial one; it’s hard to imagine teaching history at Exeter without having a stack of essays to grade at some point each term. But last year, ChatGPT threatened to upend everything.

An acronym for Chat Generative Pre-trained Transformer, ChatGPT is perhaps the most infamous of the new artificial intelligence chatbots that can converse using “humanlike conversational dialogue.” According to one tech website, “the language model can respond to questions and compose various written content, including articles, social media posts, essays, code and emails.” Basically, ask ChatGPT a question, and it will give you an answer, sometimes a lengthy one. Shortly after it was released last November, I decided to try it out by asking it to write the essay my students were currently working on. I typed in the essay question, and in a matter of seconds, the interface produced what seemed to be, at first, a well-organized and cogent three-page essay.

I’ll be honest: I had a moment of alarm. Was it possible that every student in my class could simply spend two minutes using ChatGPT and come away with a submittable essay? But as I read the piece more carefully, I became less impressed. The analysis was general, vague, and lacked specific examples. When I prompted the interface to provide more specifics, problems arose. Scenarios and page numbers were inaccurate, and the writing did not clearly link examples to analysis. We now know these problems to be widespread; one lawyer submitted a ChatGPT-written brief that cited imaginary court cases, and librarians have received bibliographies of sources that do not exist.

Feeling confident that the essay assignment was secure, I decided to go bold: I walked into class and projected ChatGPT for the students to see. “I know you’ve heard about this, but I just want to show you why it’s no good,” I explained as I entered the essay prompt. As the essay unspooled on the screen, though, my students’ reaction was not the one I wanted.

“This is amazing!” several students said. “I’m never going to write an essay again,” one student remarked. (Perhaps my boldness rubbed off on them.) Even as I tried to point out the vagueness, generalizations and analytical flaws, the students remained captivated. The best I could do was to remind them of our new history department policy that using AI was simply not allowed, but I left the class wondering if they had gotten the message.

Despite my chagrin in that moment, I still have hope for the longevity of human-written essays, and I plan to continue assigning them. After all, I have high standards for my students: to distinguish evidence from generalizations; to support their analysis with strong, specific evidence; and to inspect the texts they read for these details, too. I haven’t yet seen evidence that AI can do this type of analysis well; AI-generated text seems to specialize in writing the generalizations that make me comment in the margins, “What evidence supports this claim?”

Most importantly, I want my students to write about history with accuracy and nuance. “Accurate” is the first category on the rubric I give my students, and I weight it twice as much as any other category when I grade essays. And despite the promises that technologies will improve over time, the “bugs” of inaccuracy I have described seem to be “features” of AI-generated text that works through prediction. But even if AI-generated text is accurate, I want my students to be able to explain how we know it’s accurate, and that requires attention to sources, inspecting them for trustworthiness and limitations. Arguably, in an era where “information” (accurate or not) is just at our fingertips, evidence-based essay writing using credible sources is more important than ever. Until we can trust the accuracy of AI, its uses for history essays will remain limited.


As a history teacher, I’m also skeptical about the use of AI for deeper historical analysis, for a simple reason: Humans are not algorithms. Despite what many nonhistorians believe, human behavior is not easily predictable. Indeed, that’s what makes the study of history so interesting; humans over time have made surprising, impressive, disappointing, maddening and fascinating decisions, some of which parallel the actions of their predecessors and some of which are unique. I want my students to study these people of the past closely, dig into the details of their decisions, and understand how the world around them shaped their actions; it’s not clear to me that a predictive algorithm can inspect the specifics of the past with this level of rigor.

This does not mean that I plan to forever ban AI from the classroom; like any technology tool, it has its uses. I’ve found that it provides good basic descriptions of historical events, for example, not unlike an encyclopedia (or Wikipedia, for that matter). But just like our math teachers who still insist that certain assessments are “no calculators allowed,” I want my students to recognize the errors and flaws that can occur if one is not using the technology carefully. I’d love to see my students annotate a ChatGPT-written essay, for example, using the margins to add evidence or critique its claims. While it’s important to learn how to use the technology, it’s just as important to develop the analytical skills to recognize when the technology is incorrect, the same way students might punch the wrong numbers into a calculator but know enough math to realize they made an error.

My most important goal as a teacher is to get my students to think, and I’ll keep striving to create assessments and provide feedback to sharpen their thinking, so that when they use AI technology, they can use it well and avoid its pitfalls. Even if that means I’ll still have that stack of essays to grade.  

Betty Luther-Hillman is the Lewis Perry Professor in the Humanities and an instructor in history. She has taught at Exeter since 2011 after earning a Ph.D. in history from Yale University.

This essay was first published in the Fall 2023 issue of The Exeter Bulletin.
