During high school, I was blown away learning about the near-universal mechanisms of DNA. The same four bases encode almost all of life, and you can take a gene from a jellyfish (the one for green fluorescent protein, specifically) and use it to make bacteria glow green under blue or ultraviolet light. It still amazes me that this actually works!
Over the course of my scientific career, I have been heavily influenced by the idea that changes in gene regulation (when, where, and how much of a protein is produced) may be more important to evolution than changes in the structure of the proteins themselves (see, for example, King & Wilson 1975 in Science). This isn't to say that changes in proteins are unimportant: they can be decisive in specific diseases. Sickle cell disease, for example, is caused by a single base-pair mutation in a hemoglobin gene. But I've always been more interested in evolutionary differences between species and in natural variation, for which regulatory variation seems generally very important.
Much of my research seeks to understand how natural selection and other evolutionary processes have shaped gene regulatory diversity to produce the diverse organisms and features we observe in nature. Recent studies have shown that gene expression tends to evolve under strong evolutionary constraint, although a small number of potentially important genes have expression profiles that appear to have evolved under positive selection. However, we know considerably less about the evolution of other regulatory mechanisms (e.g. DNA methylation, histone modifications, RNA processing) and the evolution of gene regulatory responses (such as responses to diseases or toxic substances). Answering these questions is essential to understanding how genetic variation leads to phenotypic variation, a key component in understanding the mechanistic basis for differences in human health.
To address these questions, we combine genomic data (primarily short-read sequencing) from live animals with in vitro cellular manipulations. This approach yields genetic and gene regulatory variation that we can then model. The computational and statistical challenges of this work include accounting for batch effects (technical variation between sequencing runs), uncontrolled environmental differences between groups, cell-type differences, multi-species alignment, and evolutionary modelling that accounts for phylogeny and/or genetic relatedness. As sample sizes increase, efficient bioinformatics also becomes an increasingly critical part of this work.