Simple visualizations of near-current opioid data.
Near current? Yes, I am pulling data from the CDC's WONDER (Wide-ranging Online Data for Epidemiologic Research), which contains Multiple Cause of Death files to 2016. Although the Division of Vital Statistics has reported the 2017 figures (see https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm), these have not made it into WONDER yet.
The analysis is pretty straightforward in R's tidyverse.
- The file prep only requires (1)
filterto remove the data's embedded totals and (2)
selectto get the variables of interest
- A single facet-wrapped plot. Note: I'm not using qplot; I want to get the full flexibility of ggplot.
But note! Gah, the tidyverse is still not quite there! Unlike pandas, which
can skip headers and footers,
readr can only skip headers. And I found a
dsicussion on some tidyverse support board that said this would be forever so,
as one can reasonably hack it using some pipeline that involves read_lines and
filters; also it would be kinda messy to fix properly. Hmm. I think it is fair
to say that R != python.
It is immediately obvious that NM has many counties with small populations. These are the blank graphs in the faceted display: their populations are so low that the data is suppressed. Of the 5 that have data, Rio Arriba stands out. It has a mortality rate of 86.3 per 100,000 standard population in 2016. That is crazily high: the national average is 19.8. Española is largest city in Rio Arriba and it is famous for multi-generational opioid abuse.
It is going to be interesting to see what the 2017 figues are like. According to the email I got back from the CDC yesterday, those will be released in WONDER later this week.
I'm mainly interested in this because of my upcoming call with the Community Data Platform people on Nantucket. Nantucket is a small county and so its results are suppressed. Barnstable (Cape Cod), Bristol (Taunton), and Plymouth counties are bad: 50 and above
If I want to avoid blanks in my plots, I probably need to look at states. I
viz_state.R to get
Wow. Poor West Virginia. Though several other states show horribly steep curves (OH, PA, NH, KY).
Take a look at WV's counties. Or look at top-20 counties in US. Also, it might be interesting to see whether these are particularly strong on fentanyl and other synthetic opioids. (Actually given that fentanyl deaths are a comparaively recent phenomenon, I might hold the latter till I have the 2017 figures.)
Worst counties in US
I tweaked the US plot:
- Use crude rate instead of age-adjusted rate. Crude rate is easier to explain, and the differences between the two rates is small.
- Filter down to the 12 counties with the highest average death rate (measured over last 3 years)
- Color by state
- Size points by population
Latter shows that although NM and WV have the highest rates, the actual numbers of people impacted are massive in OH and MD.
I shall put this on hold till I get the 2017 data.
Where is the data
Data does not belong in the git repository. It can be obtained at the link above. I use the same ICD-10 codes to identify opioid deaths as are used in the CDC's own reports (see Hedegaard et al "Drug Overdose Deaths in the United States, 1999–2017"), specifically:
X40 (Accidental poisoning by and exposure to nonopioid analgesics, antipyretics and antirheumatics)
X41 (Accidental poisoning by and exposure to antiepileptic, sedative-hypnotic, antiparkinsonism and psychotropic drugs, not elsewhere classified)
X42 (Accidental poisoning by and exposure to narcotics and psychodysleptics [hallucinogens], not elsewhere classified)
X43 (Accidental poisoning by and exposure to other drugs acting on the autonomic nervous system)
X44 (Accidental poisoning by and exposure to other and unspecified drugs, medicaments and biological substances)
X60 (Intentional self-poisoning by and exposure to nonopioid analgesics, antipyretics and antirheumatics)
X61 (Intentional self-poisoning by and exposure to antiepileptic, sedative-hypnotic, antiparkinsonism and psychotropic drugs, not elsewhere classified)
X62 (Intentional self-poisoning by and exposure to narcotics and psychodysleptics [hallucinogens], not elsewhere classified)
X63 (Intentional self-poisoning by and exposure to other drugs acting on the autonomic nervous system)
X64 (Intentional self-poisoning by and exposure to other and unspecified drugs, medicaments and biological substances)
X85 (Assault by drugs, medicaments and biological substances)
Y10 (Poisoning by and exposure to nonopioid analgesics, antipyretics and antirheumatics, undetermined intent)
Y11 (Poisoning by and exposure to antiepileptic, sedative-hypnotic, antiparkinsonism and psychotropic drugs, not elsewhere classified, undetermined intent)
Y12 (Poisoning by and exposure to narcotics and psychodysleptics [hallucinogens], not elsewhere classified, undetermined intent)
Y13 (Poisoning by and exposure to other drugs acting on the autonomic nervous system, undetermined intent)
Y14 (Poisoning by and exposure to other and unspecified drugs, medicaments and biological substances, undetermined intent)
I have been reading Knaflic "Storytelling with Data", and that has led to some tweaks to my plots. I can't go full SWD because at the moment I don't really know what story I am telling. Perhaps it is something to do with issues of rates vs absolute counts. This matters because I suspect that crisis response does not scale in the same way that population scales. If county A has 10 times the population of county B, I'm guessing (and it is a guess) that the number of deaths at which local services in A get overloaded is not 10 times the threshold in B but substantially less.
Anyway, some minor cleaning up on the plots
Make the titles bigger and clearer
Point sizes are a function of deaths (which is what the story is about) not population.
2017 data from CDC
The CDC have put the 2017 data into WONDER. I pull the new data. In doing so I discover that there is actually a radio button that automatically selects the ICD-10 codes for opioid deaths. So no more need to remember X40-44 etc.
I rerun the "wrotsy counties" analysis. Rio Arriba slips to #2. Baltimore gets worse. A county in WV has an astronomical rate of 160 per 100,000.
The folks at FRIAM had expressed an interest in this, so I compose a post on my blog.
I would like to create some choropleths but I am not up to speed on R's mapping abilities. I spend the morning tinkering. I think I am ready to try in earnest, but my project directory is getting messy. I think it is time for me to actually use RStudio's project functionality. (That's my next task.)
OK, so I have some reasonable choropleths. Matching the map data to the CDC was a problem as the only common field is county name and its format is different in the two sources. Converting one to the other proved difficult in the tidyverse pipeline (I was bumping up against dplyr's "quosures") and so I end up using sapply().
Got myself a neat map of MA. Barnstable is surprisingly bad. I thought it was wealthy and problem free? By the way, this is the first time I noticed a big gap between the age-adjusted rate (50 per 100k) and the crude rate (40 per 100k). Is that a reflection of old/retiree communities?
Next: add some town names. Or put it on OpenMap.
Next steps, revised
Yes, it would be interesting to get Open Map data, particularly as it can localize the map nicely without adding too much visual noise. However I have a couple of other things I want to look at too:
Wikipedia's page on chloropleths briefly mentioned a color scheme that is amenable to bivariate visualization. What? how is that possible? I need to investigate and trial. Perhaps look at an obesity dataset too? Ed (Angel) has already commented on a possible link (admittedly based on an n of 1).
sf or sp for geocomputation in R? Former is newer and (I think) is better integrated with the tidyverse; however some legacy packages require sp. I think I shall use sf and worry about legacy issues as they arise
leaflet for interactive maps?
And what about python? And what about putting some of these maps in dashboards?
First step: stick to R. Use sf (and tmap?) to create a lower-48 map of counties. Switch between obesiy (in the tmap demo) and opioids.
Second step: try datashader and plotly in python.