Issue 25 Understanding Science

The perfect chaos

One of the main characteristics of all living beings, especially the more sophisticated ones like humans, is the amount of law and order that seems to exist inside every cell. The entire human body is a machine – well-organized atoms form molecules, which then form cellular compartments, which in turn work together to form a cell. Groups of cells further work together with the liquid space between them to form tissues, which on a higher level become organs and organ systems that finally create a beautiful symphony called the human. And this goes for every multicellular organism.

Years and years of research have gone into investigating and cracking the code of life, and today we can finally say that we are starting to understand it. Even more so, we have reached the point where we no longer only observe how molecules work, but have even started generating them. A lot had to be learnt about the biophysical nature of molecules and about the rules that govern their behavior before we could actually try reproducing that on a machine, let alone in a living organism. As crazy as it sounds, thanks to all the efforts of the scientific community over the years, we are at the point where we know how to edit genes in embryos and how to slow down or even completely eliminate some autoimmune diseases. Thanks to micro- and nano-chips that imitate the human organism, we can test the bioavailability and toxicity of a new drug without using lab animals, we can create prosthetic parts of the body (including the retina, limbs, heart valves etc.) and we can cure leukemia with another person’s stem cells. However, I can’t help but wonder: had we not ignored one third of the human proteome all this time, would we have reached all these breakthroughs sooner?

Figure 1. Representation of the structure of the tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain (figure taken from here)

The ones that got away

Now wait a minute, ignoring 30% of the human proteome all this time – that’s a pretty bold statement! Bold or not, it’s the reality we got ourselves into. You see, all these nice descriptions of proteins, all the rules that apply to them, our knowledge of why they function the way they function (which enabled us to target them with drugs in the first place), all of that simply doesn’t apply to the 30% of the human proteome that we now describe as intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) (Figure 1). As you can already guess from their name, there is little to no order in the structures of those protein regions, which means that all that stuff we learnt about the tertiary structure of proteins and how they so beautifully create those long or stacked alpha helices and beta sheets simply doesn’t apply in their case, and we should instead picture them as spaghetti. As you can imagine, there is not that much law and order in spaghetti, but since there are so many of them, they must be functional, right? But that completely disagrees with our number one paradigm in protein biology: structure dictates function. Maybe that’s the very reason why we refused to acknowledge their existence all this time – we just didn’t know what to make of them. So how can these mischievous structures be such a big part of our proteome, and what exactly do they do? Now this spaghetti visualization already implies a few things about their biophysics and behavior in the cell, which we will now discuss a bit further.

The order in chaos

The first thing that’s worth noting about IDRs is that they are much more abundant in higher order organisms – IDRs make up a much smaller portion of prokaryotic proteomes than of eukaryotic ones. A possible explanation for this leads us to another important feature of IDRs – they are very prone to mutations, which makes it very hard to align a human IDR to the same IDR in different species when performing a multiple sequence alignment (MSA). Since mutations, especially point mutations where only one amino acid changes, are actually very frequent, it makes sense that multicellular organisms are more robust and less sensitive to them – the more sophisticated organisms had to find a way to filter the truly malignant mutations from the benign ones. Therefore, eukaryotic organisms don’t mind having some regions in proteins that are not very conserved, as long as those regions don’t disrupt the structure or function of that particular protein. This structural diversity is one of the main features of IDRs and it also contributes to their functionality, which we will touch upon in a second. However, it has been shown that IDRs do contain regions that have been conserved through evolution, and the fact that small parts of IDRs are in fact highly protected and almost never mutated is even more important, as it tells us that those regions are most probably responsible for their function.
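To make the idea of "conserved islands in a variable sequence" more concrete, here is a minimal Python sketch that scores each column of an alignment by its Shannon entropy (0 bits means every species has the same amino acid at that position). The toy alignment is hypothetical and made up for illustration, the function names are my own, and gap characters are not treated specially; real analyses would start from an actual MSA and typically use more refined conservation scores.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; 0.0 means fully conserved."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Per-column entropy for a list of equal-length aligned sequences."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment: variable flanks around a short conserved motif.
toy_alignment = [
    "SPEDLWKLLAG",
    "TPQDLWKLLSG",
    "APEELWKLLQN",
    "GSDDLWKLLPE",
]

profile = conservation_profile(toy_alignment)
# 0-based positions where every sequence carries the same residue.
conserved_positions = [i for i, h in enumerate(profile) if h == 0.0]
print(conserved_positions)
```

In this toy case only the five central columns come out as fully conserved, which mirrors the picture in the text: a short, protected stretch surrounded by sequence that is free to drift.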

Now wait a minute; how is it possible that both the fact that they have conserved regions and the fact that they are in general poorly conserved are important for their functionality? The thing is, the parts of an IDR that are responsible for binding to other molecules (other proteins, other regions of the same protein, DNA, RNA) are usually conserved, but everything else is evolutionarily flexible, allowing IDRs to adapt to different environments, to bind different partners with the same binding regions, to change their solubility and affinity etc. There has been some evidence that the unconserved regions of IDRs actually play a huge role in choosing a binding partner and tuning the bind-dissolve dynamics. Imagine a protein that has a folded, globular region in the middle of the chain and disordered regions at its N- and C-termini (which is often the case with various transcription factors, TFs). The very important thing about TFs is that they need to be activated in just the right amount at exactly the right time. Once they are activated, they shouldn’t be active for too long, otherwise they could set off over-transcription that may potentially lead to carcinogenesis. It is in fact the IDRs of those TFs that have the perfect affinity for the DNA-binding region of that same protein – they are mostly in the autoinhibitory state, bound to the DNA-binding region, until the TF comes near the DNA. Because DNA binds this globular DNA-binding region with higher affinity than the protein’s own IDR does, the autoinhibition is released, the TF binds to the DNA and activates transcription. However, it turns out that the IDRs of this TF have high avidity, meaning that if you have two IDRs in proximity, they are going to push the DNA away and bind to the DNA-binding region of the TF, initiating the autoinhibitory process all over again.
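The competition between the TF's own IDR and DNA for the same DNA-binding region can be sketched as a simple three-state equilibrium. This is only an illustrative toy model, not something taken from a specific study: the function name, the parameters and the numbers below are all assumptions, and the avidity effect of several IDRs acting together is deliberately left out.

```python
def tf_state_fractions(dna_conc, kd_dna, k_auto):
    """Equilibrium populations of a TF under a three-state toy model.

    States and their statistical weights (relative to the open, free state):
      open, free     -> 1
      autoinhibited  -> k_auto (closed/open equilibrium constant, an assumed parameter)
      DNA-bound      -> dna_conc / kd_dna (free site concentration over Kd, same units)
    """
    if dna_conc < 0 or kd_dna <= 0 or k_auto < 0:
        raise ValueError("concentrations and constants must be non-negative (kd_dna > 0)")
    w_open = 1.0
    w_closed = k_auto
    w_bound = dna_conc / kd_dna
    z = w_open + w_closed + w_bound  # partition sum over the three states
    return {
        "open": w_open / z,
        "autoinhibited": w_closed / z,
        "dna_bound": w_bound / z,
    }


# Hypothetical numbers: the IDR keeps 90% of the TF closed when no DNA is around.
far_from_dna = tf_state_fractions(dna_conc=0.0, kd_dna=10.0, k_auto=9.0)
near_dna = tf_state_fractions(dna_conc=1000.0, kd_dna=10.0, k_auto=9.0)
print(far_from_dna)
print(near_dna)
```

With these made-up values the TF sits mostly in the autoinhibited state until enough DNA is around, at which point the tighter DNA interaction wins. Making `k_auto` larger (a stickier IDR) shifts the balance back towards autoinhibition, which is the kind of tuning the text attributes to the disordered regions.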

Patterns that give them away

Even though IDRs can vary greatly in their structure, they still follow certain laws of biophysics based on which we can tell them apart from the globular regions. First of all, if you think about a globular protein and what exactly makes it globular (what forces it to fold into a tertiary structure), you hopefully come to the conclusion that it’s the different inter- and intra-molecular bonds and forces that enable this folding. Hydrogen bonds, van der Waals forces, hydrophobic and electrostatic interactions between close and far away residues in the structure all contribute to the final folded structure of a protein. The secret is in finding the right order of amino acids, which then influence one another and eventually force the peptide chain to fold. In that way, for example, the hydrophobic residues are usually hidden deep inside the folded protein, while charged, hydrophilic residues are exposed on the surface. It makes sense then that IDRs would follow a completely opposite logic – since they exist in the form of spaghetti (or extended chains), all of their residues are exposed to the solvent. Since they don’t fold, we can assume that the majority of amino acids in IDRs are hydrophilic. In fact, this has been experimentally confirmed – IDRs depend on hydrophilic residues and on hydrogen bonds that are formed along the backbone of the protein. Now because they don’t fold, they have much more structural freedom, which is one of the reasons why you can’t capture them in a single static structure with X-ray crystallography or cryo-EM, for example. They don’t exist in one folded state with minimal energy, but rather quickly shift between a few different possible conformations, making it very hard to catch each conformation independently. Therefore, instead of having a “structure” like folded regions, IDRs have “conformational ensembles”. Inside the ensemble, there might be certain conformations that they visit more often than others, which can be nicely expressed with a probability distribution, but all conformations inside the ensemble are plausible.
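The sequence-level signature described above (few hydrophobic residues, many charged and polar ones) is easy to put into numbers. The sketch below uses the Kyte-Doolittle hydropathy scale, where positive values are hydrophobic and negative values hydrophilic, together with a crude count of charged residues. The two example sequences are hypothetical and invented for illustration, the function names are my own, and this is not how dedicated disorder predictors work; it only shows the direction of the effect.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over the sequence (standard residues only)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def fraction_charged(seq):
    """Fraction of residues that are K, R, D or E (histidine ignored for simplicity)."""
    return sum(aa in "KRDE" for aa in seq) / len(seq)


def net_charge_per_residue(seq):
    """(number of K and R) minus (number of D and E), divided by sequence length."""
    positive = sum(aa in "KR" for aa in seq)
    negative = sum(aa in "DE" for aa in seq)
    return (positive - negative) / len(seq)


# Hypothetical sequences, invented for illustration only.
disordered_like = "SEEKKGSPEDKQRSGEEPKS"
folded_core_like = "MVLAIVGFLCAVIWLTAFVL"

for name, seq in [("disordered-like", disordered_like), ("folded-core-like", folded_core_like)]:
    print(name, round(mean_hydropathy(seq), 2), round(fraction_charged(seq), 2),
          round(net_charge_per_residue(seq), 2))
```

The disordered-like sequence comes out with a negative mean hydropathy and a high fraction of charged residues, while the hydrophobic stretch scores the other way around, which is the contrast the paragraph above describes.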

AlphaFold2 is wrong?

Interestingly, if you take a look at the structure prediction that the current state-of-the-art structure predictor AlphaFold2 (AF2) gives for some IDRs, for example alpha-synuclein, you notice that AF2 puts a little helix in the structure, and it’s pretty confident in that (a pLDDT score > 70 means confident, > 90 means highly confident). It turns out that AF2 does this for a number of different IDRs. If we compare the NMR data of those IDRs in their free, unbound state to the data of their bound state (after binding to a partner), we can see that in the case of alpha-synuclein, a helix is formed when binding to a partner. The thing is, some IDRs are so-called “conditional folders”, which means that upon binding to their partner, or upon a posttranslational modification, one part of the IDR will fold into a simple secondary structure like an alpha helix or a beta strand. The remarkable thing is that AF2 actually learnt to predict this conditional folding, without ever seeing these types of IDRs in its training. Now this is a topic for another time, so for those of you interested in a more detailed analysis of this AF2 performance, I strongly recommend this paper.
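To show how the pLDDT thresholds mentioned above could be used in practice, here is a small Python sketch that labels per-residue scores and picks out stretches that AF2 is confident about, which in an otherwise disordered protein would be candidates for conditional folding. The score list is hypothetical, the "low confidence" label for everything at or below 70 and the minimum segment length are my own assumptions, and the function names are illustrative rather than part of any AlphaFold tooling.

```python
def confidence_label(plddt):
    """Label a pLDDT score using the thresholds quoted in the text (>70, >90)."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT is expected to lie between 0 and 100")
    if plddt > 90:
        return "highly confident"
    if plddt > 70:
        return "confident"
    return "low confidence"  # assumed label for everything else


def confident_segments(scores, threshold=70.0, min_length=3):
    """Return (start, end) pairs, 0-based and inclusive, of runs with score > threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores) - 1))
    return segments


# Hypothetical per-residue pLDDT values for a short, mostly disordered stretch.
toy_plddt = [35, 40, 42, 75, 82, 91, 88, 74, 45, 38, 72, 41]

print([confidence_label(s) for s in toy_plddt])
print(confident_segments(toy_plddt))
```

In the toy data only residues 3 to 7 form a confident run; the single confident residue at position 10 is ignored because it is shorter than `min_length`. Segments found this way are only a hint and would still need experimental data, such as the NMR comparison described above, to tell whether they reflect a bound-state structure.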

Figure 2. A) AF2 prediction of the alpha-synuclein structure. B) Disorder predictions for alpha-synuclein from different disorder predictors (the SPOT-Disorder predictor, shown in red, is the most accurate one). C) AF2 pLDDT confidence score; pLDDT > 70 means AF2 is confident in the predicted structure. We can see that AF2 gives a highly confident prediction of an alpha-helical secondary structure even though alpha-synuclein is fully disordered. D) Secondary structure propensity (SSP) based on NMR data of unbound alpha-synuclein (yellow) and bound alpha-synuclein (purple). We can see that the AF2 prediction (blue) corresponds to the experimental evidence of an alpha helix in the bound form of alpha-synuclein (photo taken from here).

Hidden, forgotten or forbidden treasure?

Even with this brief introduction to the field of IDRs, I hope you can foresee the vast possibilities they offer. Since we started paying proper attention to them some 20 years ago, we have realized that they might be the key to many unanswered questions about the pathogenesis of various diseases (alpha-synuclein, for example, is at the center of Parkinson’s disease), and that they also present themselves as potential new drug targets. For years we neglected them and cut them out when analyzing globular proteins, thinking they were just unimportant junk. But now we have slowly started to realize what a treasure we neglected – out of ignorance, for the sake of simplicity, out of fear of the unknown – who knows why. The important thing, though, is that IDRs now have our full attention, showing us just how beautiful chaos can be.

By Đesika Kolarić

Đesika is a pharmacist with an exceptional love for science. Apart from clinical pharmacy, her biggest love is computational biology, which she's currently pursuing through predoctoral training at the Medical University of Graz. She loves long walks accompanied by her dog and a good beer.