Whole genome sequencing

An Exeter senior maps the fruit fly genome. Along the way he finds new mysteries to explore.

Nicole Pellaton
July 29, 2020

At the end of winter term, students in BIO 586: Molecular Genetics prep fruit fly DNA for the first whole genome sequence ever created at Exeter.

It’s 10 a.m. on Sunday, a few days before the end of winter term, and 25 students are gathering in classroom 307 of Phelps Science Center to do something that no one at Exeter has done before. As cool light pours in through the windows, the students unbundle from coats and sweatshirts. Twelve sit at the Harkness table in the inside corner of the room. The others perch on high stools or lean against lab tables. Dotted throughout are indicators of the room’s focus: beakers, plastic models of chromosomes, 12 microscopes, a near-life-size human anatomical model. The students, enrolled in two sections of BIO 586: Molecular Genetics, are here on Sunday because it’s the one day that allows for extended unscheduled time. They have the relaxed poise of teenagers unhooked from weekday or Saturday demands.

Today’s task is the “wet lab” prep work that will isolate, clean and amplify (or copy) small snippets of DNA from four lines of Drosophila melanogaster, the common fruit fly. These snippets will be packed with dry ice and shipped to a lab for sequencing. The hope is that the resulting data will be accurate enough to allow an Exeter senior, Sanath Govindarajan, to analyze it and assemble a whole genome for one or more of the lines as a spring-term project. A lot is riding on today.

All eyes are on Science Instructor Anne Rankin ’92 as she describes the protocol. There will be four procedures, she explains. You will work in pairs at your own pace, but we will wait until everyone is done to start the next procedure. Avoiding contamination is essential, so wear gloves and use new pipette tips at each step. “We will spend hours in the lab, and then you will end up with less than a drop in a vial,” Rankin says with a broad, encouraging smile.

Students eagerly file into the adjoining lab, find spots at one of the six large black lab tables, don gloves, and lean down over tubes of DNA. Thumbs pump pipettes as liquids are precisely measured and released into vials. Rankin works the room to inspect progress, lend a gloved hand or answer questions. The students gravitate to the incubator and thermocycler (a device that raises and lowers the temperature of DNA samples in preprogrammed steps) as they work through the 10-step procedure. The first pair finishes at 10:25. Twelve minutes later, everyone is back in room 307.

“Your DNA is getting snipped up!” Rankin says before she launches into instructions for the second procedure, which uses a specialized magnetic rack to separate the DNA from the liquid used to suspend and clean the fragments. “Remember to pipette slowly!” she says as the students stream back to the lab. At 11:45, two boys high-five as they complete the 20-step procedure.

Selma Unver in science lab at Exeter

By 2 p.m., everyone has finished the third procedure — amplification of the fragments, the most sensitive of the protocols, which involved a friendly scrum at the centrifuge as nine students waited for their 30-second turn, and lots of excitement at the thermocycler (“It’s the big moment!” one boy announced) — and they are nearing the end of the final 33-step procedure: purifying the DNA libraries.

The energy in the room is palpable.

At 2:09, Ellie Griffin ’21 and Audrey Choi ’20 are the first to place their small vial, carefully, into the carrier that will ultimately hold 12 precious drops of liquid. “Did you mark your vial correctly?” Rankin asks repeatedly as pairs approach, aware that mislabeling could wreak havoc with the data. Within minutes, the carrier is full. The wet lab is over.

Why genome sequencing?

In 2012, Exeter and Stanford University initiated a collaboration dubbed “StanEx,” which brought real-world genetics research into the Exeter curriculum. StanEx was originated by Dr. Seung Kim ’81, Stanford professor of developmental biology and of medicine, along with Rankin and Science Instructor Townley Chisholm. The first StanEx course, which continues today, teaches Exonians how to genetically modify and breed new stocks of fruit flies. To create the stocks, students insert into fruit flies an artificial transposable element (a small chunk of DNA that can change its position within the genome) that carries a “switch” to highlight proteins in fluorescent green, allowing researchers to visually track specific genes in a fly’s development. At Kim’s Stanford lab, scientists use these stocks to research pancreatic disease and diabetes. The stocks, all derived from the StanEx-1 base line, are also made available to researchers around the world.

In 2019, a group of almost 90 Exonians, assisted by three science instructors — Rankin, Chisholm and Erik Janicki — along with Kim and Dr. Lutz Kockel from the Stanford lab, published an academic paper in which they addressed a perceived anomaly in the Exeter-created stocks. The process of inserting the transposable element was meant to support totally random locations, but analysis indicated a “hot spot.”

“The fact that we were getting all these things in the same spot was, first of all, a disappointment because you don’t need something in the same spot over and over again. But then it became just plain interesting,” Rankin says. “We were observing that there was this little piece of DNA that was calling our piece to it, and basically they were switching places. It was a really, really cool outcome. I mean, that’s real science.” The question was: Why is this happening?

The Stanford lab tested a few hypotheses, to no avail. “So then comes the next logical question: ‘What else is in the DNA structure that we didn’t know?’” Rankin explains. “We tried to buy something clean, that didn’t have funny background stuff that was going to act like DNA magnets. And that’s where the whole genome sequencing project comes in, because with sequencing, you get all of the DNA. You can look at it and see, ‘Well, what is actually in there?’”

Anne Rankin in science lab at Exeter

Teacher as learner

You could say Rankin’s route to Exeter dates to the 1950s when H. Hamilton Bissell ’29 offered a newsboy scholarship to her father, Dr. Kenneth Rankin ’59, then a newspaper delivery boy in Cleveland. She and all her siblings in turn attended Exeter. In 1999, at a crossroads, Rankin applied for a teaching job at Exeter and to her top-choice doctoral program. She was accepted by both and decided to teach at Exeter to “get it out of my system.” She’s still working on that.

“You never get bored in a Harkness class because you never quite know what the students are going to ask,” she says. “And you never know how things are going to unfold in front of you.”

To observe Rankin in the classroom is to see many roles: teacher, researcher, scientist and learner. “I think kids respond super well to me when I invest everything I have into learning something new and share with them the places I see for more growth in myself,” she says. “I try to model how to learn rather than present myself as someone who has learned.”

She jokes easily with the students (“You are good at many things,” she told one student during the wet lab, “but slow pipetting is not one of them!”) while making it clear that the work in the classroom is important and of much greater value than the topic at hand. “By becoming learners shoulder-to-shoulder with our students, we authentically model the characteristics we hope to instill — humility, empathy, inquisitiveness, openness, adaptability and continual growth,” Rankin wrote a few years ago in a note to herself that she titled, “Paradigm shift on the concept of teacher.” As one of the founding faculty members of the StanEx project, she has helped it expand over the years into more classes at Exeter, summer internships at the Stanford lab for Exeter students, collaborations with other high schools, and the whole genome sequencing project. All of this has helped feed Rankin’s hunger for “a culture of continual exploration and experimentation.”


Central to Rankin’s approach is balancing failure with success and, ultimately, encouraging students to feel comfortable taking risks. “The dance for me as a teacher is always how much failure can they tolerate. What’s the trade-off? They can’t fail all the time. The authenticity is not worth it then. But if you tip the scale the other way and there’s no authenticity but they always succeed, that’s not really worth it either. That’s like a cookie-cutter kit lab.”

For the students in BIO 586: Molecular Genetics, she carefully planned a trajectory. “We worked our way from things that I was pretty sure would work, all related to the Stanford project, all the way up to the thing that I was pretty sure would fail. I feel that it’s my job to bring the students on that development of building their tolerance for failure.”

To Rankin’s surprise, the data that came back from Genewiz, the lab that processed those 12 precious drops of liquid, was excellent. “It looks as though it worked — for every group!” she wrote on March 20 in an email to Science Department Chair Alison Hobbie. “I honestly cannot believe it and had not even thought to hope for this level of success.”

Sanath Govindarajan on the Exeter campus.

Sequencing from home

Sanath Govindarajan ’20 was ready to dive into the data from Genewiz as soon as it arrived, but he faced a problem. Exeter’s campus had closed down because of the coronavirus pandemic. On March 18, Govindarajan received an email stating that distance learning would extend to the end of the term. This was far different from the plan he had laid out in his senior project proposal, which included upgrading a server on campus with software to allow for serious number crunching.

He soon discovered that the data set from Genewiz was massive, “millions and millions and millions of lines” representing almost 150,000 megabases of DNA data. (A megabase equals 1 million base pairs, or individual “rungs” in the DNA double helix; in contrast, a fruit fly genome comprises 175 megabases.) The data consisted of sequences of the letters A, C, G and T, the bases of DNA. Some sequences overlapped, some were duplicates. “It’s impossible to analyze this manually,” Govindarajan explains. “You need computer programs to filter it and make sense of it.” When Govindarajan tried to download the data to his PC, he was obliged to let it run all night. “Just opening up the file in a text editor caused it to crash,” he says.
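To put those two figures side by side, here is a back-of-the-envelope calculation in Python. It is only a sketch: it treats "almost 150,000 megabases" as exactly 150,000, and it assumes, purely for illustration, that the data was spread evenly across the 12 vials shipped to Genewiz. The function name is made up for this example.

```python
# Figures quoted in the article; "almost 150,000" is rounded to 150,000 here.
TOTAL_DATA_MEGABASES = 150_000
FLY_GENOME_MEGABASES = 175
SAMPLES = 12  # assumption: data split evenly across the 12 vials


def fold_coverage(data_megabases, genome_megabases):
    """Return how many genomes' worth of letters the data set contains."""
    return data_megabases / genome_megabases


total = fold_coverage(TOTAL_DATA_MEGABASES, FLY_GENOME_MEGABASES)
per_sample = total / SAMPLES

print(f"Whole data set: about {total:.0f} fly genomes' worth of letters")
print(f"Per vial, if split evenly: about {per_sample:.0f}")
```

Under those assumptions, the data set holds several hundred times more letters than a single fruit fly genome, which is why overlapping and duplicate sequences are expected and why the filtering has to be done by software.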


Govindarajan structured his senior project as three major activities: quality control of the data; developing a reusable “pipeline” of software that can analyze and align the data to assemble a whole genome; and comparing the StanEx-1 genome to the published Drosophila melanogaster genome to see if there are any significant differences.

He had invested long hours researching open-source programs that could perform the complex computing tasks (alignment and transposable element mapping), but they had been designed to run on a server. The path forward was to rewrite them. He started by converting them to a lower-level programming language to gain more control and changing them from single- to multi-threaded, “so that they can do multiple things at once.” Then he attacked memory. “I found a way to be able to process the data without storing it intermediately,” he says. “This is what really cut the memory usage and allowed me to run it on a PC.” Govindarajan estimates that he wrote a few thousand lines of code, rewriting some of the programs more than once. “But more than the lines of code, it’s about making that code work with existing tools and putting them together. If you consider all of that, it’s been pretty complex.”
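The general idea of processing data "without storing it intermediately" can be illustrated with a short Python sketch. This is not Govindarajan's code, which the article describes only in outline. It assumes the reads arrive in the common FASTQ text layout (four lines per read), which the article does not specify, and the function names are hypothetical. The point is that a generator hands over one read at a time, so memory use stays flat no matter how large the file is.

```python
import io


def read_sequences(handle):
    """Yield one DNA read at a time from FASTQ-style text (4 lines per record)."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality-score line, ignored in this sketch
        yield sequence


def count_bases(reads):
    """Add up the letters across all reads without keeping the reads in memory."""
    total = 0
    for read in reads:
        total += len(read)
    return total


# A tiny in-memory stand-in for a very large file on disk.
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCATT\n+\nIIIIII\n")
total_bases = count_bases(read_sequences(sample))
print(total_bases)
```

Swapping `io.StringIO` for a file opened with `open(...)` would leave the rest unchanged, since only one record is ever held at a time.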

Govindarajan’s approach to this hurdle is typical of how he handles problems, Rankin observes: “He sees the outcome for himself. And the outcome is not so much a better working product, but how much he will learn in the process of doing it.”

Nice to have company

Since the campus shutdown, Govindarajan has been working next to his father, both seated at desks in a sun-filled home office just a few miles from Exeter’s campus. Behind him is an upright piano. (Govindarajan’s passions include playing the piano and the violin; he also loves plane spotting and learning languages — he speaks three: Tamil, his native tongue, French and English.) He can turn to his father any time for a chat, a question or to share a success.

Early in the term, Govindarajan set up a rhythm that included a weekly Zoom call with Rankin, his senior project adviser, and Kockel, research scientist at the Stanford lab. “A lot of times, we would combine our forces and talk about not only how to use computers to analyze the data, but what the bigger picture of it is in biology and how the two interrelate,” Govindarajan explains. “At the beginning, I would show my results, and they would have a hard time understanding them because they’re biologists, not computer scientists. I would show a giant table full of information, and it would be incomprehensible. I learned how to present my data in a way that other people can understand. On the flip side, I also learned how to ask for more clarification when they were talking about something that I didn’t really understand.”

Fly or human?

The data came back from Genewiz with high marks, but it still needed to be tested to verify that it was, in fact, fruit fly DNA and not from some other source (human cells or yeast, for example). To do this, Govindarajan compared the millions of snippets, or “reads,” of DNA to the reference Drosophila melanogaster genome (published as one of the first whole genome sequences two years before Govindarajan was born). Snippets that didn’t find a match against the fruit fly were compared to other genomes to determine their source.

The next step was an alignment that “takes all the reads, which are just randomly located everywhere, and it tries to say, ‘Where is this read found within the genome?’” Govindarajan explains. “We were really only expecting something like 60% of the reads to align, but we got over 97%,” an indication that the data was extremely high quality.
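As a toy illustration of what "aligning" a read means, the Python sketch below looks for each read as an exact match inside a short made-up reference string and reports the fraction that found a home. It is a deliberate simplification: real alignment programs tolerate mismatches and use indexes to search genomes of this size quickly, and the sequences and function names here are invented for the example.

```python
def align(read, reference):
    """Return the 0-based position of the read in the reference, or None."""
    position = reference.find(read)
    return position if position != -1 else None


def alignment_rate(reads, reference):
    """Place every read and return (positions, fraction of reads placed)."""
    placed = [align(read, reference) for read in reads]
    hits = sum(position is not None for position in placed)
    return placed, hits / len(reads)


# Invented data: a 20-letter "genome" and four short reads.
reference = "ACGTTGCAAGGCTTACGATC"
reads = ["GTTGC", "GGCTT", "TTTTT", "ACGAT"]

positions, rate = alignment_rate(reads, reference)
print(positions)
print(f"{rate:.0%} of reads aligned")
```

In this made-up case, three of the four reads match somewhere in the reference and one does not. A read that fails to align is the kind of snippet that would then be checked against other genomes to work out where it came from.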

After that, the focus turned to matching the validated reads against specific locations in the genome. This exercise generated the whole genome sequence. Based on that, Govindarajan could undertake the process of mapping transposable elements and research differences between StanEx-1 and the reference genome.

Surprising findings

On June 1 at 4:30 p.m., more than 20 people sign on to Zoom to watch Govindarajan present his project. Kim and Kockel from Stanford are there, along with Exeter faculty and staff, a handful of Exeter students, and teachers from other schools involved in the StanEx project.

Ten minutes in, Govindarajan, wearing glasses and an open-necked light-blue shirt, shows visible excitement as he ramps up to his findings. He compares the quest for the hot spot to sailing into a tiny island in the ocean. We discovered the island by chance, he explains, while making the insertions in the StanEx-1 stocks. With the StanEx-1 whole genome sequence, it’s as if “we’ve taken a satellite image of the entire ocean and we were able to find this island at exactly the spot where we expected it. That lets us know that our entire process is working.”

Sanath Govindarajan on a Zoom call with Exeter faculty and students.

With the proof of process established, Govindarajan announces that he has discovered two unpredicted characteristics in the DNA, both of which he hopes to research further over the summer. The first is that the StanEx-1 strain has significant differences (more than 200 new transposable elements) when compared to the published Drosophila melanogaster genome, a finding that surprised even the scientists at Stanford. The second is that there is evidence of possible other attractor sites, in addition to the known hot spot, within the StanEx-1 line.

‘A pure question’

During the Q&A that immediately follows, Kim congratulates Govindarajan on developing the workflow and pipeline to be able to analyze a whole genome sequence on something as complex as the fruit fly, adding that the project was “audacious” and could well “have gone awry.” Kim says: “The key question would be, ‘Who’s already done this or been able to use whole genome sequencing as a tool to assess something like the metastability of transposable elements in the species?’ … It’s a pure question in a way. … It’s more, ‘What is the natural history of this thing?’ I think if the answer is there’s nothing published on that, then that’s an opportunity.”

“I have not found a single paper mentioning a single line of explanation,” Kockel pipes up, referring to the existence of 200 natural transposable elements in StanEx-1 that are not in the reference genome. “You can take two views to this: [One is that] this is just mind-bogglingly complex. The other side is it’s really exciting because I think this is actually a novel thing that is not described at all. I hope I don’t let my excitement gallop away with me here, but this is really exciting data.”

As questions wind down, Govindarajan relaxes. He may be thinking already about the summer, when he will have the chance to explore further for the sheer sake of learning. Where do those 200 transposable elements exist? Do they have an impact as a controlling force to explain the hot spots observed in StanEx-1? Will a new fly line (currently called StanEx-4) resolve the issue with the hot spots observed in StanEx-1? For Govindarajan, these and other questions are starting points for the next few months’ work. He will undoubtedly enjoy extending his project, which he identifies as one of the most “thrilling” things he has ever done: “I had to learn how to create and refine a question, so it really felt like discovery. … This is new, cutting-edge research that we’re doing. The results that I find have not been seen by any other person.”