Why do symmetric and regular shapes abound in biology? Using arguments based on algorithmic information theory, we explain why a preference for simple shapes often occurs in biological systems. We validate the theory by predicting the shape frequencies for naturally occurring protein assemblies, non-coding RNAs, self-assembled tile shapes, and a mathematical model of the cell cycle.
published on May 26, 2023
Looking at nature we see symmetric arrangements in flower petals, simple spirals in snail shells, repeating branching patterns in fern leaves and lung structures, and regular arrangements of proteins in many biomolecules. But why are these symmetric and regular structures so common in biology? There may be some functional advantages to these types of shapes, but their ubiquity also hints at a more general cause.
Adopting an information perspective, in a recent paper we reported that because simple, regular, and repetitive shapes require less specific information, genomic programs for these shapes should be easier to ‘find’ via random mutation during evolution. In other words, random mutations are more likely to chance upon a short genomic program than a long one.
As an example, consider the problem of making a protein capsid (cage or shell) to house a virus. The capsid could, in principle, be constructed in a wide variety of different shapes. However, not all shapes impose equal information requirements on the genome: simple, repeating, or symmetric shapes can be encoded efficiently in the genome by reusing the same building blocks again and again. Complex, irregular shapes require specifying many different blocks and bonding interfaces. The highly symmetric icosahedral shape (having 20 triangular faces) that viral capsids often adopt solves the problem of housing a virus in an information-efficient way. We argued that this type of efficiency means that simple geometries like the icosahedral capsid are more likely to appear in evolution.
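The information argument above can be made concrete with a toy calculation. The sketch below is not the model used in the paper. It assumes that a shape is a string of building-block labels, that each block costs a fixed number of bits (`bits_per_block`, a hypothetical parameter), and that the repeat count costs nothing. Under those assumptions it compares how many bits are needed to specify a repeating shape and an irregular one, and how likely a uniformly random bit string of that length is to match the specification.

```python
from fractions import Fraction


def shortest_repeat_unit(shape: str) -> str:
    """Return the shortest prefix that, when repeated, reproduces `shape` exactly."""
    n = len(shape)
    for k in range(1, n + 1):
        if n % k == 0 and shape[:k] * (n // k) == shape:
            return shape[:k]
    return shape  # not reached: k == n always matches


def description_bits(shape: str, bits_per_block: int = 2) -> int:
    """Toy cost: bits needed to spell out one repeat unit.

    Assumption: the number of repeats is treated as free, and each block
    label costs `bits_per_block` bits (2 bits distinguishes 4 block types).
    """
    return bits_per_block * len(shortest_repeat_unit(shape))


def chance_of_random_hit(shape: str) -> Fraction:
    """Probability that a uniformly random bit string of the required
    length equals the toy description of `shape`."""
    return Fraction(1, 2 ** description_bits(shape))


# Two hypothetical 20-block shapes built from block types A, B, C, D.
symmetric = "AB" * 10
irregular = "ABCADBCDDACBBDACABDC"

for name, shape in [("symmetric", symmetric), ("irregular", irregular)]:
    print(name, description_bits(shape), "bits, chance", chance_of_random_hit(shape))
```

In this toy setting the repeating shape needs 4 bits and the irregular one needs 40, so a blind guess hits the first with probability 1/16 and the second with probability 2^-40. The absolute numbers are artefacts of the assumptions; the point is only that the gap grows exponentially with the difference in description length.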
This intuitive argument provided a good starting point for thinking about the evolution of simplicity, but to form a general theory and extrapolate to other biological cases we needed a formal framework and some equations.
A precise notion of complexity is described in a field of theoretical computer science known as algorithmic information theory (AIT). In AIT the quantity known as Kolmogorov complexity, K(x), quantifies complexity in terms of the compressibility of a pattern. For example, the pattern x=010101…0101 is highly compressible, and is therefore considered simple. By contrast, the random string y=010001101011001011000111 is not compressible and is therefore deemed complex.
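Kolmogorov complexity itself cannot be computed exactly, but the link to compressibility can be illustrated with an ordinary compressor. The following sketch uses Python's standard `zlib` module as a rough stand-in; the compressed size is only a crude upper-bound proxy for K(x), and the string length and random seed are arbitrary choices made for illustration.

```python
import random
import zlib


def compressed_size(pattern: str) -> int:
    """Size in bytes of the zlib-compressed pattern (a crude proxy for K)."""
    return len(zlib.compress(pattern.encode("ascii"), 9))


n = 1000                 # arbitrary length, long enough to dwarf zlib's overhead
rng = random.Random(0)   # fixed seed so the example is reproducible

x = "01" * (n // 2)                               # periodic pattern 0101...01
y = "".join(rng.choice("01") for _ in range(n))   # pseudo-random 0/1 string

print("periodic:", compressed_size(x), "bytes")
print("random:  ", compressed_size(y), "bytes")
```

The periodic string shrinks to a small fraction of the size of the pseudo-random one, which cannot be squeezed much below one bit per symbol. Short strings such as the 24-character example above are dominated by the compressor's fixed overhead, which is why a longer string is used here.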
A result from AIT says that, in a very abstract and general way, random programs are biased towards producing simple outputs. Put differently, this says that the vast majority of programs will produce simple output patterns. Directly applying this AIT result to biology is not possible for several reasons. Despite this, there remains a compelling analogy between (a) a computer running a program to generate some output, and (b) a DNA program producing some biological shape via development. Hence, we surmised that something like a bias towards simple outputs (shapes, arrangements, etc.) in biology may well remain.
In a separate article we presented a practically applicable complexity-probability equation, based on the aforementioned AIT result. Our equation applies to a wide class of input-output maps. It states that if inputs (like gene sequences or model parameters) are randomly sampled, high-probability patterns must be simple, while complex patterns must have low probability. We called this phenomenon simplicity bias. The converse does not hold, however: some simple patterns can have low probability.
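For readers who want to see the shape of the relation, the simplicity bias bound is usually written as follows in the associated literature. The equation is not reproduced in this article, so the notation here is an assumption about the form being referred to: P(x) is the probability that a randomly sampled input produces output x, K̃(x) is a computable approximation to the Kolmogorov complexity of x (for instance, one based on compression), and a > 0 and b are constants that depend on the map but not on x.

```latex
P(x) \le 2^{-a\,\tilde{K}(x) - b}
```

Because this is an upper bound rather than an equality, it caps the probability of complex outputs while still allowing some simple outputs to have low probability.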
Armed with some intuitive arguments and the simplicity bias equation, we could now test this general theory of simplicity bias in biology. We investigated naturally occurring protein complexes (self-assembled multi-protein structures) obtained from a protein database, natural non-coding RNA structures, a detailed mathematical model of the yeast cell cycle, and a lattice tile self-assembly system. In each case, high frequency shapes were simple, and complex shapes were far less likely. Moreover, the natural data frequencies were well described by our equation.
It is quite striking that such abstract mathematical equations based on information theory can make predictions about the ‘messy’ world of biology; after all, many different factors contribute to the frequencies of shapes in nature, including functional aspects and effects of natural selection. AIT suggests that different shapes and patterns have, in a sense, ‘intrinsic’ probabilities, and it seems that these probabilities are discernible in the natural world.
Of course, we did not claim that all of biology is simple. In fact, paradoxically, in our paper we also proposed that simplicity bias can support the emergence of complexity at a higher, organism level. For example, simple shapes are often more modular and robust to genetic mutations, and simple traits require less genomic encoding, allowing for multiple functions to be encoded within a single genome, thus allowing for complex organisms.
Our current work continues to explore other aspects of evolution, biology, and other natural sciences through the lens of information theory arguments.
Johnston, I. G., Dingle, K., Greenbury, S. F., Camargo, C. Q., Doye, J. P. K., Ahnert, S. E., & Louis, A. A. (2022). Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution. Proceedings of the National Academy of Sciences, 119(11). https://doi.org/10.1073/pnas.2113883119