I’ve got to figure out what people mean when they say “fixed effect” and “random effect”, because I’ve been confused about it for a year and I’ve been hearing it all the time lately.
Bayesian Data Analysis is my starting guide, which includes a footnote on page 391:
The terms ‘fixed’ and ‘random’ come from the non-Bayesian statistical tradition are are somewhat confusing in a Bayesian context where all unknown parameters are treated as ‘random’. The non-Bayesian view considers fixed effects to be fixed unknown quantities, but the standard procedures proposed to estimate these parameters, based on specified repeated-sampling properties, happen to be equivalent to the Bayesian posterior inferences under a noninformative (uniform) prior distribution.
That doesn’t totally resolve my confusion, though, because my doctor-economist colleagues are often asking for the posterior mean of the random effects, or similarly non-non-Bayesian sounding quantities.
I was about to formulate my working definition, and see how long I can stick to it, but then I was volunteered to teach a seminar on this very topic! So instead of doing the work now, I turn to you, wise internet, to tell me how I can understand this thing.
I read some good practical advice about when enough is enough in Markov Chain Monte Carlo sampling this morning. In their “Inference from simulations and monitoring convergence” chapter of Handbook of Markov Chain Monte Carlo, Andrew Gelman and Kenneth Shirley say many useful things in a quickly digested format. Continue reading
I re-read a short paper of Andrew Gelman’s yesterday about multilevel modeling, and thought “That would make a nice example for PyMC”. The paper is “Multilevel (hierarchical) modeling: what it can and cannot do, and R code for it is on his website.
To make things even easier for a casual blogger like myself, the example from the paper is extended in the “ARM book”, and Whit Armstrong has already implemented several variants from this book in PyMC. Continue reading
Filed under MCMC, statistics
I don’t feel like having that post about how big things are brewing in US health care reform on the top of my blog anymore, so here is a quick replacement: a ranking paper that caught my eye recently on arxiv, where computer scientists is applied to politics: On Ranking Senators By Their Votes, by my fellow CMU alum, Mugizi Rwebangira (@rweba on twitter).
The Chronicle of Higher Ed has a short piece on public-service applications of computer science that are coming out of a class called Computing for Good (C4G) that TCS star Santosh Vempala co-taught at Georgia Tech last spring.
This is an idea that is emerging in several ACO-related disciplines. Manuela Veloso has been running a similar program at CMU called V-Unit, Karen Smilowitz and Michael Johnson held a session at INFORMS 2007 on community-based operations research, and in 2006 student statisticians started a network of volunteer consultancies called Statistics in the Community.
It’s great to see a tradition of “pro bono” work developing in theoretical fields. It’s not just a way for lawyers to assuage their consciences anymore.
Science News recently ran an article
on the health statistics work and data visualization work of Florence Nightingale. It’s fun for me to learn about this history, since I am such a recent immigrant to the land of health metrics. Nice quotes from Nightingale’s statistical mentor in the piece, too:
You complain that your report would be dry. The dryer the better. Statistics should be the dryest of all reading.
The graphics in the Science News article are from an educational project of the Statistics Lab at the University of Cambridge called Understanding Uncertainty. It seems like Nightingale’s coxcomb it is a well debated form of infoviz over at the Edward Tufte Discussion Board.