Issue 17 Understanding Science

The Story of CRISPR: Discovery and the origin of the name

CRISPR-Cas9 technology probably needs no special introduction. After all, this system for precise genome editing has revolutionized genetic engineering. Still, today’s article won’t focus on the applications of CRISPR-Cas9 in biotechnology, fascinating as they are (but no worries, we will come back to them some other time).

Instead, it’s time to take a look at the background of this almost perfect molecular tool, the current version of which is reduced to only one enzyme and one carefully picked RNA molecule.

A brief historical overview of events that have led to the CRISPR-Cas9 system will help us better understand what this system is all about. More importantly, though, it will help us get a feeling for the ways of modern science – that mutual cooperation and sharing knowledge are what all great achievements are based on.

Some really like it salty

The first person we’re going to meet is Francisco Mojica. We’re going back to 1989, when Mojica started his doctoral studies at the University of Alicante in Spain. He was researching an archaeon, Haloferax mediterranei, which is especially interesting because it thrives in an environment with large amounts of salt that would kill most other microorganisms.

His doctoral supervisor had previously discovered that the amount of salt present in the growth medium of those archaea can influence the way their DNA is cut by restriction enzymes. Those enzymes are of bacterial origin, but they are frequently used in laboratories to study genomes in vitro: they cut large genomes into smaller pieces that can be studied further.

What Mojica then discovered was an even greater puzzle: in those DNA fragments he found a large number of almost perfectly palindromic sequences about 30 nucleotides long. A small reminder: nucleotides are the building blocks of every DNA molecule, and there are 4 different types, labelled by the letters A, T, G and C. The main chromosome of this archaeon alone contains nearly 3 million of them, so 30 is a really small number.

To better understand palindromic sequences, we have to remember that DNA has two strands: a leading strand and the one that’s complementary to it. A sequence on the leading strand is palindromic when it reads the same as its complementary sequence read in the opposite direction; in other words, it is a mirror image of its complement. This is also shown in the picture below. In Mojica’s fragments, those palindromic sequences are separated by entirely different sequences named spacers, each approximately 36 nucleotides long.

An example of a palindromic sequence: GACA is the sequence on the leading strand, and it has a complementary sequence. The exact mirror image of that complementary sequence (marked 1’-2’-3’-4’) is a palindrome.
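For readers who like to see the idea in code, here is a minimal Python sketch of that palindrome check. It assumes the pictured example pairs GACA with the mirror image of its complement, TGTC; the function names are ours and purely illustrative.

```python
# Map each nucleotide to its base-pairing partner.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome reads the same as its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(reverse_complement("GACA"))  # TGTC
print(is_palindromic("GACA"))      # False: GACA alone is only half of the palindrome
print(is_palindromic("GACATGTC"))  # True: GACA followed by TGTC
```

Note that a DNA palindrome is not a palindrome in the everyday sense: GACATGTC does not read the same backwards letter by letter, but it does match the opposite strand read in the opposite direction.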

But the story gets more complicated. Mojica soon discovered that his salt-loving archaea are not special: similar structures can be found in a large number of other archaea and bacteria. It’s especially interesting that those species are not closely related, which means this structure has been conserved across the long history of evolution. In biology, that usually hints that the structure plays an important role in the organism.

Francisco Mojica named those genome motifs SRSR (Short Regularly Spaced Repeats), and in 2002 the name was changed to Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). Around that time, scientists also discovered some genes that regularly come packaged with the CRISPR region: they’re called CRISPR-associated or simply cas genes.

Okay, now we understand what CRISPR stands for, but we still don’t have an answer – what’s the purpose of this unusual part of the bacterial genome?

Patience has its reward

We’re back in Alicante, and the year is 2003. In this era of scientific endeavour, bioinformatics and public databases that contain bits of sequenced genomes of various organisms are important allies.

Although the palindromic sequences in the CRISPR region are certainly interesting enough, Mojica decided to dive deeper into the mystery of the spacer sequences. He searched public databases for an already known DNA sequence that would match one of his spacers. At first, his searches came up empty because those databases weren’t as rich with content as they are today. But luckily, Mojica didn’t give up on his quest, the databases kept growing, and eventually one search succeeded.

The first spacer that was successfully matched belonged to the CRISPR region of one strain of the bacterium E. coli. It matched a sequence in the genome of a bacteriophage known as P1. A bacteriophage (or simply phage) is a virus that attacks bacterial cells. But there’s more: this very E. coli strain is resistant to P1 infection.

Other spacers were matched with similar kinds of sequences, and a pattern became obvious: spacers resemble sequences from bacteriophages or plasmids, and those phages and plasmids are related to the microbe that carries the spacer. It may seem unusual to see plasmids next to bacteriophages, but some plasmids can actually harm bacterial cells after entering them.
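To get a feeling for what such a database search does, here is a toy Python sketch. It is not the tool Mojica used: real searches rely on alignment programs that tolerate mismatches, while this version only looks for exact matches on either strand. All sequences and names below are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spacer_matches(spacer: str, database: dict[str, str]) -> list[str]:
    """Names of database sequences that contain the spacer on either strand."""
    targets = (spacer, reverse_complement(spacer))
    return [
        name
        for name, seq in database.items()
        if any(target in seq for target in targets)
    ]


# Hypothetical mini "database"; real spacers are about 36 nucleotides long.
database = {
    "phage_A": "TTGACCGTAGGCTAACGTTAGC",
    "plasmid_B": "CCGGAATTCCGGTTAACC",
    "phage_C": "ATATATGCGCGCATATAT",
}

print(find_spacer_matches("GTAGGCTAAC", database))  # ['phage_A']
```

With only a handful of sequences in the database, most spacers find nothing, which is roughly the situation Mojica faced in his early searches.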

Mojica concluded that the CRISPR region protects bacteria and archaea from various infections, like some kind of bacterial acquired immunity. Still, he was missing experimental proof for his hypothesis.

It’s all yoghurt’s fault

In this part of the story, we meet Philippe Horvath, a biologist who dedicated his PhD to lactic acid bacteria, an important group of microorganisms. He started working in Dangé-Saint-Romain, in the west of France, as head of a local food company’s molecular biology laboratory.

The company was manufacturing dairy products, and bacteria are crucial for the production of cheese and yoghurt. That’s why every dairy manufacturer is very interested in keeping its bacteria healthy and ready to do the work. Hence, Horvath’s task was to develop molecular methods for identifying various strains of Streptococcus thermophilus and the phages that attack it.

The CRISPR genome region came in really handy for simple genotyping of bacterial strains of S. thermophilus. Horvath had found out about CRISPR at a conference held in 2002, and he could successfully distinguish between strains by looking at tiny differences in their CRISPR regions. Over a couple of years of researching it, he noticed a correlation between his bacterial strains and the phages they’re resistant to. Horvath reached a conclusion similar to Mojica’s around the same time, but he also decided to go one step further.

In 2005, Horvath and his colleagues decided to do an experiment to show that CRISPR indeed provides bacteria with acquired immunity. For the experiment, they used two different types of bacteriophages and a strain of S. thermophilus that is usually attacked by them. After infection, their goal was to isolate the cells that managed to survive, since those cells must be resistant to phage attack.

The researchers then compared the CRISPR regions of the bacteria before and after their exposure to bacteriophages. They noticed something remarkable: the resistant bacteria had acquired a part of the bacteriophage’s genome, and it had become a new spacer in their CRISPR region. Furthermore, a larger number of phage-derived spacers correlated with greater resistance to infection. This was the proof they were looking for.
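The before-and-after comparison can be pictured with a small Python sketch. The spacer lists and the phage genome are invented toy data, and the check only looks at one strand with exact matching, so treat it as an illustration of the logic rather than the researchers’ actual analysis.

```python
def new_spacers(before: list[str], after: list[str]) -> list[str]:
    """Spacers present after phage exposure that were absent before."""
    seen = set(before)
    return [spacer for spacer in after if spacer not in seen]


# Hypothetical toy data (real spacers are about 36 nucleotides long).
phage_genome = "TTGACCGTAGGCTAACGTTAGCAAGGTTCC"
parent_spacers = ["ACGTTGCA", "GGATCCTA"]
survivor_spacers = ["GTAGGCTA", "ACGTTGCA", "GGATCCTA"]

acquired = new_spacers(parent_spacers, survivor_spacers)
print(acquired)                                  # ['GTAGGCTA']
print(all(s in phage_genome for s in acquired))  # True
```

In this toy case, the only spacer that the survivor has and its parent lacks is a copy of a stretch of the phage genome.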

Later, they actually repeated this experiment in reverse order: they wanted to know if it was possible for bacteriophages to escape bacterial immunity and infect bacterial cells. It turned out that this was also possible, and that a phage genome mutation in only one nucleotide is enough! That’s why they concluded that this immunity depends heavily on an extremely precise match between the spacer and the genome of the attacker.
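A toy Python sketch shows why a single changed nucleotide matters. It assumes the simplest possible rule, namely that immunity requires an exact copy of the spacer in the phage genome; the sequences and function names are hypothetical.

```python
def is_targeted(spacer: str, phage_genome: str) -> bool:
    """Toy rule: the phage is recognized only if it contains the exact spacer."""
    return spacer in phage_genome


def point_mutate(seq: str, position: int, new_base: str) -> str:
    """Return a copy of seq with one nucleotide replaced."""
    return seq[:position] + new_base + seq[position + 1:]


spacer = "GTAGGCTA"
phage = "TTGACCGTAGGCTAACGTTAGC"

# Change a single nucleotide inside the region that the spacer matches.
start = phage.index(spacer)
mutant = point_mutate(phage, start + 3, "C")

print(is_targeted(spacer, phage))   # True
print(is_targeted(spacer, mutant))  # False
```

The mutant differs from the original phage at exactly one position, yet under this rule it is no longer recognized.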

Scissors of a different kind

Another mystery that had to be solved was that of the previously mentioned cas genes. The researchers were particularly interested in cas7 and cas9. By convention in genetics, gene names are written in italics and all lowercase, and the genes themselves can be read from genome sequences. Genes carry information for the production of proteins, and proteins are what does all the action within the cell. The names of the corresponding proteins are usually written with a capital letter.

It turned out that Cas7 is important for adding new spacers to the CRISPR region, but it doesn’t do much when it comes to immunity in action. This is where Cas9 takes the spotlight. This protein is a nuclease, an enzyme that cuts nucleic acids such as DNA and RNA (the restriction enzymes from the beginning of our story belong to the very same group of proteins). It’s quite appropriate to call Cas9 molecular scissors. The researchers showed that Cas9 plays an important role in bacterial immunity.

Of course, the story of CRISPR-Cas9 technology doesn’t stop here; these discoveries are only where its development begins. Many more steps were needed before CRISPR could be applied for a completely different purpose. But don’t worry, more about that in the next part.

Thank you for reading the first in our series of posts on CRISPR-Cas9. How did you like it? Are you looking forward to the next part? Let us know!


Lander ES. The Heroes of CRISPR. Cell. 2016 Jan 14;164(1-2):18-28. doi: 10.1016/j.cell.2015.12.041. PMID: 26771483.

Image source: Ganapathiraju, M.K et al. A reference catalog of DNA palindromes in the human genome and their variations in 1000 Genomes. Hum Genome Var 7, 40 (2020).
