statistics from the home of two statisticians

Statistics @ Home


Bayesian Decision Theory is a wonderfully useful tool that provides a formalism for decision making under uncertainty. It is used in a diverse range of applications, including but definitely not limited to finance (guiding investment strategies) and engineering (designing control systems). In what follows I hope to distill a few of the key ideas in Bayesian decision theory. In particular, I will give examples that rely on simulation rather than analytical closed-form solutions to global optimization problems. My hope is that such a simulation-based approach will provide a gentler introduction while allowing readers to solve more difficult problems right from the start.
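To make the simulation idea concrete, here is a minimal R sketch (the Beta posterior and asymmetric loss are invented for illustration, not taken from the post): draw from the posterior, score each candidate action by its average loss over those draws, and pick the action with the smallest estimated expected loss.

```r
# Minimal sketch of simulation-based Bayesian decision making (toy example).
set.seed(1)
theta_draws <- rbeta(10000, 8, 4)        # assumed posterior samples of an unknown rate
actions <- seq(0, 1, by = 0.01)          # candidate decisions
loss <- function(action, theta) {
  ifelse(action > theta,
         2 * (action - theta),           # assumed: overshooting costs twice as much
         1 * (theta - action))
}
expected_loss <- sapply(actions, function(a) mean(loss(a, theta_draws)))
best_action <- actions[which.min(expected_loss)]
best_action                              # the simulation-optimal decision
```

No closed-form optimization is needed: the posterior draws do all the work, and swapping in a different loss or posterior only changes a line or two.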

For users of PhILR (Paper, R Package), and for users of the ILR transform who want to make use of the awesome plotting functions in R, I wanted to share a function for plotting a sequential binary partition on a tree using the ggtree package. I recently wrote this for a manuscript but figured it might be of more general use to others as well. In its simplest form, a sequential binary partition can be represented as a binary tree.
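As a toy illustration (not the manuscript's function), a sequential binary partition over four parts can be written as a sign matrix and drawn as the corresponding binary tree with ape and ggtree:

```r
library(ape)
library(ggtree)

# Rows are balances, columns are parts; +1/-1 mark the two groups, 0 means "not involved".
sbp <- matrix(c( 1,  1, -1, -1,    # balance 1: (t1, t2) vs (t3, t4)
                 1, -1,  0,  0,    # balance 2: t1 vs t2
                 0,  0,  1, -1),   # balance 3: t3 vs t4
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("b", 1:3), paste0("t", 1:4)))

# The same partition written as a binary tree, with internal nodes labelled by balance.
tree <- read.tree(text = "((t1,t2)b2,(t3,t4)b3)b1;")
ggtree(tree) + geom_tiplab() + geom_nodelab()
```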

Lately I have been working on figures for a manuscript. In this process I created a few visualizations that I thought might help others build intuition for the Multinomial distribution. I will focus on describing how counting processes introduce uncertainty into estimates of relative abundances and I will end with a discussion of how understanding the Multinomial has impacted my view of analyses of sequence count data (e.g., data from microbiome surveys, RNA-seq, and more).
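As a rough sketch of the counting-noise idea (toy proportions, not the manuscript's figures), one can resample multinomial counts at several depths and watch how the spread of the estimated relative abundances shrinks:

```r
set.seed(1)
truth <- c(0.5, 0.3, 0.15, 0.05)      # assumed true relative abundances of 4 taxa
depths <- c(100, 1000, 10000)         # total counts per sample ("sequencing depth")
for (n in depths) {
  counts <- rmultinom(2000, size = n, prob = truth)  # 2000 replicate counting experiments
  est <- counts / n                                  # estimated proportions
  cat("depth", n, "- sd of taxon 4 estimate:", round(sd(est[4, ]), 4), "\n")
}
```

Even with the true composition held fixed, the counting process alone spreads the estimates out, and low-abundance taxa are hit hardest at low depth.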

First things first, Gauss is our dog. Since I am able to work from home, my dog Gauss and I spend a lot of time together. As a result, I like to think I know why he does what he does. But of course I will never really know - though, it’s nice to think that I do. Both of us being creatures of habit, we have fallen into a nice routine during the day - one where he sleeps the day away and comes to get me around 4pm for some outdoor training/playing. I have noticed that whenever I do anything interesting or out of the norm, he is right there, waiting to see if he can benefit from the activity. Most remarkably, it feels like whenever we are in the kitchen, he sits down right in the middle of everything waiting for scraps and food that drops on the floor. I know Gauss loves me, but I wonder if I am more valuable to him in certain rooms? Does he “love” me more in the kitchen?...

In this post I describe an algorithm for clustering regression data that is based somewhat on K-Means. I cooked it up yesterday while looking over Cross Validated questions. A very smart professor at Duke recently informed me that this is basically a mixture-of-regressions model (or a mixture of experts). So, don't I feel silly about the title for this post. Still, I left it in to grab the reader's attention! (Is it working?)
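For the curious, here is a minimal R sketch of the general idea (my own toy version, not necessarily the exact algorithm from the post): alternate between assigning each point to the regression line that fits it best and refitting a line within each cluster, much like the assign/update loop in K-Means.

```r
set.seed(1)
n <- 200
x <- runif(n)
grp <- sample(1:2, n, replace = TRUE)
y <- ifelse(grp == 1, 1 + 2 * x, 4 - 3 * x) + rnorm(n, sd = 0.2)  # two hidden lines

K <- 2
memb <- sample(1:K, n, replace = TRUE)            # random initial cluster labels
for (iter in 1:20) {
  fits <- lapply(1:K, function(k) lm(y ~ x, subset = memb == k))  # refit each line
  resid2 <- sapply(fits, function(f) (y - predict(f, data.frame(x = x)))^2)
  new_memb <- apply(resid2, 1, which.min)         # each point joins its best-fitting line
  if (all(new_memb == memb)) break                # stop when assignments stabilize
  memb <- new_memb
}
lapply(fits, coef)                                # recovered intercepts and slopes
```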

Following up on a recent post on limitations of the ALR and Softmax transforms, I wanted to briefly show how we can derive an Isometric Log-Ratio transform from the Additive Log-Ratio (ALR) transform.
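One way to see the connection numerically (my own sketch, not the post's derivation): both transforms are linear in log coordinates, so the ILR is just a fixed linear map applied to the ALR coordinates.

```r
D <- 4
alr <- function(x) log(x[-D] / x[D])
clr <- function(x) log(x) - mean(log(x))
H <- contr.helmert(D)                       # D x (D-1) Helmert contrasts
V <- sweep(H, 2, sqrt(colSums(H^2)), "/")   # orthonormal basis of the clr hyperplane
ilr <- function(x) as.vector(t(V) %*% clr(x))

# In log space: alr(x) = F %*% log(x) and ilr(x) = t(V) %*% G %*% log(x),
# so ilr(x) = M %*% alr(x) for the fixed matrix M below.
F_ <- cbind(diag(D - 1), -1)                # ALR matrix
G  <- diag(D) - 1 / D                       # CLR (centering) matrix
M  <- t(V) %*% G %*% MASS::ginv(F_)         # the fixed linear map

x <- c(0.1, 0.2, 0.3, 0.4)
ilr(x)
as.vector(M %*% alr(x))                     # matches ilr(x)
```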

I wanted to write a quick post responding to a question that we received about our last post (Error Analysis Made Ridiculously Simple). A reader asked us to give some more detailed examples of how to estimate uncertainty/error in more complicated experimental designs. My response in short - "When in doubt, try to collect replicate samples in an appropriate way and try to think of ways to benchmark your measurements against known standards." Beyond this somewhat cryptic answer, I will try to give a few examples that should be a little clearer, and at the end I will also say a few words on accuracy vs. precision, a distinction that I have found can inspire some ideas.
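As a tiny toy example of that accuracy vs. precision distinction (numbers invented for illustration), replicate measurements of a known standard let you separate systematic error (bias) from random error (scatter):

```r
set.seed(1)
true_value <- 10                                   # a known standard
replicates <- rnorm(30, mean = 10.4, sd = 0.15)    # 30 replicate measurements (toy data)
bias   <- mean(replicates) - true_value            # accuracy: how far off on average
spread <- sd(replicates)                           # precision: how much replicates scatter
c(bias = bias, precision_sd = spread)
```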

All measurements have uncertainty. This is not a subjective opinion but an objective fact that should never be ignored. In light of this, I have always been curious about how infrequently uncertainty is actually taken into account in science. In this post I will advocate the use of simple simulation studies for error/uncertainty propagation.
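The basic recipe is short enough to show in a few lines of R (the measurements and uncertainties below are toy values): represent each measured input as a distribution, push simulated draws through the calculation, and summarize the spread of the result.

```r
set.seed(1)
n_sim <- 100000
mass   <- rnorm(n_sim, mean = 5.00, sd = 0.10)   # measured mass (g), with uncertainty
volume <- rnorm(n_sim, mean = 2.00, sd = 0.05)   # measured volume (mL), with uncertainty
density <- mass / volume                         # derived quantity of interest
mean(density); sd(density)                       # propagated estimate and its uncertainty
quantile(density, c(0.025, 0.975))               # a 95% interval for the derived quantity
```

No partial derivatives or propagation formulas are needed; the simulation handles nonlinearity and correlated inputs just as easily.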

Droplet-based microfluidics are emerging as a useful technology in various fields of biomedicine. Both droplet digital PCR and droplet-based culture methods require that droplets are created with either a single DNA molecule or a single cell per droplet. Obviously it is difficult to individually place DNA molecules or cells into droplets; instead, people turn to stochastic models to estimate the distribution of cells per droplet, tuning the experimental parameters to achieve an acceptable distribution. In this post I derive a Poisson approximation to this process and demonstrate how to calculate quantities of interest under uncertainty in lab measurements.
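As a rough sketch of what such a calculation can look like (the concentrations, volumes, and uncertainties below are made up, not from the post), one can simulate the mean occupancy lambda = concentration × droplet volume from the measurement uncertainties and read off the implied occupancy probabilities:

```r
set.seed(1)
n_sim <- 100000
conc   <- rnorm(n_sim, mean = 1e6,  sd = 5e4)    # cells per mL, with measurement error (assumed)
vol_mL <- rnorm(n_sim, mean = 1e-6, sd = 5e-8)   # droplet volume in mL, with error (assumed)
lambda <- conc * vol_mL                          # mean cells per droplet

p_single <- dpois(1, lambda)                     # P(exactly one cell in a droplet)
p_multi  <- 1 - dpois(0, lambda) - dpois(1, lambda)  # P(two or more cells)

quantile(p_single, c(0.025, 0.5, 0.975))         # uncertainty in single-occupancy rate
quantile(p_multi,  c(0.025, 0.5, 0.975))         # uncertainty in multiple-occupancy rate
```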