In the same spirit as my last post, here is a notebook I developed to extract information on hospital capacity from the Form 2552-10 cost reports that hospitals file regularly with Medicare. If you know how to get this information from somewhere else, know a better way to tangle with this data, or know other things about it that I should know, please let me know!
I have recently started helping out with some computer modeling efforts to support outbreak response, and I’m learning that there are a lot of projects already underway. To learn from others, and perhaps also to help people I don’t know who are working on the same problem, I’m posting this write-up of an estimate I just produced: the number of people in nursing homes by age group and facility.
As you may know, many of the deaths so far in the COVID-19 outbreak have been among people who were in nursing homes (28 out of 37, as of March 3, 2020). How many people are in nursing homes? And, since age seems to be an important determinant of disease severity, how old are they? I used open sources to make estimates for all nursing facilities in Washington State. If this is useful to you, great! And if you know how to do this better, please let me know!
Here is a csv file you can use if you are doing disease modeling. I’ve put a Jupyter Notebook with all the code to derive these estimates on the web, and below I have collected some details on the potential data sources that this and other future estimates might use.
Medicare Minimum Data Set (MDS) 3.0
Medicare collects quality improvement data regularly from all skilled nursing facilities, and publishes summaries from this “Minimum Data Set (MDS) 3.0” exercise. You can find some information about it online: https://www.cms.gov/Research-Statistics-Data-and-Systems/Computer-Data-and-Systems/Minimum-Data-Set-3-0-Public-Reports/Minimum-Data-Set-3-0-Frequency-Report
MDS provides data in roughly 10-year age groups (with wider bands below age 65), but the data on the web gives only state-level values:
| State | 0-30 | 31-64 | 65-74 | 75-84 | 85-95 | > 95 | State Total |
Can we get more detail? This URL (https://www.resdac.org/cms-data/files/mds-3.0) looks like it should lead to the full data file, but I was not able to locate it.
Decennial Census Summary File One (SF1)
There is also data with fine geographic precision available from the decennial census. The “Summary File One (SF1)” includes tables on the number of people living in skilled nursing facilities, down to the census block level. Table PCO5 gives these counts at the county and MSA level, stratified by sex and 5-year age groups (top-coded at 85+). However, it is now 10 years out of date, so it may not be that useful. https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=DEC_10_SF1_PCO5&prodType=table
|SF1 PCO5 fractions in King County Nursing Homes (2010)|
It does include the sex ratio, which may be useful, although MDS presumably has that, too. Similar counts are also available at the census block level, stratified by sex and broad age groups (<18, 18-64, 65+, if I’m reading it correctly).
Washington Aging and Long-Term Support Administration (ALTSA)
This state agency maintains a list of nursing homes in Washington State, and has bed counts for each nursing home. It does not provide an age breakdown, though. It puts WA State capacity at 19,332 beds.
Dividing the MDS resident total by the ALTSA bed count shows that WA State nursing homes are using about 84% of their available beds.
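The MDS state total is not repeated in the text. As a quick sanity check, though, the occupancy arithmetic can be run in reverse. This sketch uses only the ALTSA capacity of 19,332 beds and the rounded 84% figure to back out the resident count the two sources imply. The helper name `occupancy` is hypothetical, not something from the notebook.

```python
"""Back out the resident count implied by the ALTSA bed count and ~84% occupancy."""

ALTSA_BEDS = 19_332         # WA State nursing home capacity reported by ALTSA
REPORTED_OCCUPANCY = 0.84   # share of beds in use, rounded to two digits


def occupancy(residents: float, beds: float) -> float:
    """Fraction of available beds that are occupied (hypothetical helper)."""
    return residents / beds


# Point estimate of residents implied by the rounded occupancy.
implied_residents = REPORTED_OCCUPANCY * ALTSA_BEDS

# Any true occupancy in [83.5%, 84.5%) rounds to 84%, so the resident
# count consistent with the reported figure lies in this range.
low = 0.835 * ALTSA_BEDS
high = 0.845 * ALTSA_BEDS

if __name__ == "__main__":
    print(f"implied residents: about {implied_residents:,.0f}")
    print(f"consistent range: {low:,.0f} to {high:,.0f}")
    print(f"check: {occupancy(round(implied_residents), ALTSA_BEDS):.2f}")
```

A rounded percentage only pins the resident total down to within roughly 200 people. When exact counts are needed, it is better to carry the MDS total itself through the calculation.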
It seems a bit grandiose to call this a statistical model, although I think it is technically accurate to say that I have used a “first-order log-linear model” to estimate the number of individuals in nursing homes by facility and by age group. The formula is the following, where N(age, facility) is the number of individuals in a given age group and a given facility:
N(age, facility) = total number * age share * facility share
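Here is a minimal Python sketch of that formula. It is not the code from the notebook. It assumes that the age shares come from state-level age counts (such as the MDS table) and that the facility shares come from bed counts (such as the ALTSA list). The function name and the toy numbers are made up for illustration.

“First-order” means that the log of each cell is an intercept plus an age effect plus a facility effect, with no age-by-facility interaction. As a result, every facility gets the same age mix.

```python
"""First-order (main effects only) log-linear estimate of residents by age and facility."""

from math import isclose


def first_order_estimate(total, age_counts, facility_beds):
    """Return {(age_group, facility): N} with N = total * age share * facility share.

    age_counts: mapping age group -> count (only the proportions matter)
    facility_beds: mapping facility -> bed count (only the proportions matter)
    """
    age_total = sum(age_counts.values())
    bed_total = sum(facility_beds.values())
    return {
        (age, facility): total * (n_age / age_total) * (n_beds / bed_total)
        for age, n_age in age_counts.items()
        for facility, n_beds in facility_beds.items()
    }


if __name__ == "__main__":
    # Toy inputs, NOT real data: three age groups and two hypothetical facilities.
    ages = {"65-74": 20, "75-84": 30, "85+": 50}
    beds = {"Facility A": 60, "Facility B": 40}
    estimate = first_order_estimate(100, ages, beds)
    for (age, facility), n in sorted(estimate.items()):
        print(f"{age:>6}  {facility}: {n:.1f}")

    # The age margin is reproduced: summing over facilities recovers each age count.
    for age, n_age in ages.items():
        row_sum = sum(n for (a, _), n in estimate.items() if a == age)
        assert isclose(row_sum, n_age)
```

With these toy inputs, the 85+ group at Facility A gets 100 × 0.5 × 0.6 = 30 residents. Because the model has only main effects, both the age totals and the facility totals come back out exactly. The price is that it cannot capture facilities that skew older or younger than the state as a whole.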
For convenience, here is a link to the csv file that I produced with this approach, and here is a link to the Jupyter Notebook with all the code to derive these estimates from open sources. If you know how to make estimates like this that are more accurate, please let me know!
Some new research that I’m excited about came out last week: Variation in life expectancy and mortality by cause among neighborhoods in King County, WA, USA, 1990–2014: a census tract-level analysis for the Global Burden of Disease Study 2015. http://www.thelancet.com/journals/lanpub/article/PIIS2468-2667(17)30165-2/fulltext
In some ways, it is very specific to Seattle and the surrounding county: https://vizhub.healthdata.org/subnational/usa/wa/king-county
But it is also a demonstration of the “fractal” nature of population health. The variation in life expectancy from country to country around the world is big! But it is about as big as the variation from county to county within the United States. And what this work shows is that even in the county where I live, life expectancy varies between census tracts almost as much as it does from county to county or from country to country. Inequality is happening at all scales.
Here is a short paper that cites my projections of adult congenital heart disease (ACHD) cases [link] to make the case for recruiting doctors into the field: https://bjcardio.co.uk/2016/04/growing-need-for-trainees-in-adult-congenital-heart-disease-in-the-uk/
Cool, it’s just what I was hoping the results would be used for.
Diagnosed and Undiagnosed Diabetes Prevalence by County in the US, 1999–2012
Laura Dwyer-Lindgren, Johan P Mackenbach, Frank J van Lenthe, Abraham D Flaxman, Ali H Mokdad
Read some of the most recently published articles from the International Journal of Health Geographics.
Identifying food deserts and swamps based on relative healthy food access: a spatio-temporal Bayesian approach
It was great speaking with you. This is the paper I was talking about.
Looking forward to knowing more about each other’s work.
An Integrative Metaregression Framework for Descriptive Epidemiology – http://www.amazon.com/Integrative-Metaregression-Descriptive-Epidemiology-Publications/dp/0295991844
A fun pointer I picked up at the eScience Institute recently is where to find constants in Python: scipy.constants http://docs.scipy.org/doc/scipy/reference/constants.html
or, if you fancy units, astropy.constants http://docs.astropy.org/en/stable/constants/index.html
I wonder if this will be useful: Modeling Good Research Practices—Overview: A Report of the ISPOR-SMDM Modeling Good Research Practices Task Force-1 http://www.ispor.org/workpaper/Modeling_Methods/Modeling_Good_Research_Practices_Overview-1.pdf
It has quite a lot of best practices!