In a newly published paper, "Biochemical Complexity Drives Log-Normal Variation in Genetic Expression," I explain a biological mystery: why do log-normal distributions keep showing up in gene expression data?
Anybody who's spent much time looking at gene expression data has probably noticed this: lots of distributions tend to have nice bell-curve shapes when plotted on a log scale. Consider, for example, a few samples of a gene being repressed by various levels of LmrA:
In short, these distributions are approximately log-normal, though they might also be described by one of a number of similar heavy-tailed distributions like the Gamma or Weibull distributions. Indeed, the typical explanation for gene expression variation has been that it follows a Gamma distribution, arising from the underlying randomness of chemical reactions, which causes stochastic bursts of gene expression.
What kept bugging me about that explanation, though, is that it just doesn't fit what we know about how gene expression actually works.
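To make the difference between the log-normal and Gamma descriptions concrete, here is a minimal Python sketch using synthetic samples, not data from the paper. The parameters (a log-normal with mu 0 and sigma 1, a Gamma with shape 2 and scale 1) are arbitrary illustrative assumptions, and `skewness` is a hypothetical helper written for this example. It only compares the shapes of the two families on a log scale; it does not model any biological mechanism.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50_000

# Synthetic stand-ins for expression levels; parameters are assumptions, not fitted values.
lognormal_samples = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
gamma_samples = [rng.gammavariate(2.0, 1.0) for _ in range(n)]

# "Plotting on a log scale" amounts to looking at the distribution of log(x).
skew_log_lognormal = skewness([math.log(x) for x in lognormal_samples])
skew_log_gamma = skewness([math.log(x) for x in gamma_samples])

print(f"skewness of log(log-normal samples): {skew_log_lognormal:+.2f}")
print(f"skewness of log(Gamma samples):      {skew_log_gamma:+.2f}")
```

Under these assumptions, the log of the log-normal samples should come out close to symmetric (skewness near zero, the bell curve on a log scale), while the log of the Gamma samples should be clearly skewed toward low values. With real data the two can be much harder to tell apart, especially for Gamma distributions with a large shape parameter.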
Read Beal's full blog post here.