The CDC is lying to you

I’ve mentioned the problem with highly variable firearm homicide numbers in the FBI UCR vs. the CDC’s WISQARS. For 2017, the UCR claims 10,982 firearm homicides, while the CDC says 14,542, 32% higher. Part of that is reporting.

The UCR is based on reported numbers from law enforcement agencies. But not all report.

The CDC, though…

The CDC numbers are based on emergency department reports, using ICD-10 codes. But, like the FBI, they don’t…use data from every hospital. In fact, they pick out just 60 hospitals and use their reports as a proxy for the country. It’s rather like Rasmussen pseudo-randomly surveying 1,000 people in hopes of picking a representative sample of all Americans, and extrapolating from there.

And that’s a huge problem.

If more of the sample hospitals are in places like Chicago, Saint Louis, or Baltimore, it skews the results, because those locations have a disproportionate number of firearms homicides compared to Alamogordo, New Mexico. If you assume everywhere has a firearms homicide rate like Baltimore, you’re going to extrapolate an unrealistically high number. Maybe even 32% higher than what the FBI says.
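
To put toy numbers on that, here is a minimal sketch of how the mix of sampled locations drives the extrapolated total. Every figure in it is made up for illustration (ten equal-sized areas, two with a high rate and eight with a low one), and `extrapolate` is a hypothetical helper, not anything the CDC is known to run.

```python
# Hypothetical illustration only: every number here is made up,
# not CDC, FBI, or hospital data.

def extrapolate(sample, total_population):
    """Scale the sample's pooled case rate up to the whole population."""
    sample_cases = sum(cases for cases, _ in sample)
    sample_pop = sum(pop for _, pop in sample)
    return sample_cases * total_population / sample_pop

# (cases, population) per area
high = (50, 100_000)   # a high-rate city
low = (2, 100_000)     # a low-rate town
areas = [high] * 2 + [low] * 8

true_total = sum(cases for cases, _ in areas)
total_pop = sum(pop for _, pop in areas)

# Sample with the same 1:4 high/low mix as the whole country
balanced = extrapolate([high] + [low] * 4, total_pop)
# Sample that over-represents the high-rate cities
skewed = extrapolate([high] * 2 + [low], total_pop)

print(true_total, balanced, skewed)  # 116 116.0 340.0
```

With the same underlying data, the balanced sample recovers the true total, while the sample tilted toward high-rate cities nearly triples it.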

Problem, right?

I’ve barely started.

First, a 60-hospital sample is ridiculous when there are 6,210 hospitals in the US.

CDC samples fewer than 1% of hospitals.
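
For anyone who wants to check my arithmetic, here is a quick sketch using only the figures quoted in this post. The variable names are mine, chosen for illustration.

```python
# Figures as quoted in this post; variable names are illustrative.
ucr_2017 = 10_982          # FBI UCR firearm homicides, 2017
cdc_2017 = 14_542          # CDC WISQARS firearm homicides, 2017
hospitals_total = 6_210
hospitals_sampled = 60

# How much higher the CDC figure is, relative to the UCR figure
gap_pct = (cdc_2017 - ucr_2017) / ucr_2017 * 100
# Share of all hospitals that end up in the sample
sample_pct = hospitals_sampled / hospitals_total * 100

print(f"CDC vs. UCR gap: {gap_pct:.1f}%")            # 32.4%
print(f"Share of hospitals sampled: {sample_pct:.2f}%")  # 0.97%
```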

Second, there is no good reason to do a 60-hospital sample. Or a 600-hospital sample.

To comply with the federal HIPAA law, since October 1, 2015, every HIPAA-covered entity — every hospital — in the nation reports every single gunshot wound, by ICD-10 code, to the government. All 6,210. For Every. Single. Patient. ICD-10 is just the latest iteration. They have been collecting this data for years.

The CDC doesn’t need to sample, then guess at the total number. The total number for every hospital in the country is already at their disposal. At most, they might have to make extremely minor adjustments for occasional coding errors. But since Medicare/Medicaid and insurance payments are based on the reported codes, the existing system already checks for coding errors. Damned few should slip past insurance companies dead set on paying out the least they can.

In fact, they have more data than just “gunshot injury.” ICD-10 breaks it down by intent (accidental/self, accidental/other, suicide, homicide) and weapon (machinegun, rifle, shotgun, handgun, other). There is a separate code for each possible combination. More codes if multiple weapons. More codes for where on the body the injury is. The admission data (which they get) includes age, race (with more choices than the 6 given in WISQARS), gender. They have the hospital location for geographic distribution of injuries.

And it isn’t just fatal injuries; all those code options are there for nonfatal injuries, too.

With the available data, the CDC can sort for “white males, 18-24, shot in lower back, in Kalamazoo, fatal and non-fatal” and give you the exact numbers.
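
As a rough sketch of what that kind of query looks like once the records are in hand: the `InjuryRecord` layout, its field names, and the sample rows below are all hypothetical, invented for illustration, and are not the actual reporting format.

```python
from dataclasses import dataclass

# Hypothetical record layout; the field names and the rows below are
# invented for illustration, not an actual reporting format or real data.
@dataclass(frozen=True)
class InjuryRecord:
    age: int
    sex: str
    race: str
    city: str
    body_site: str
    fatal: bool

def count_fatal_nonfatal(records, matches):
    """Return (fatal, nonfatal) counts for records satisfying `matches`."""
    hits = [r for r in records if matches(r)]
    fatal = sum(r.fatal for r in hits)
    return fatal, len(hits) - fatal

records = [
    InjuryRecord(19, "M", "white", "Kalamazoo", "lower back", True),
    InjuryRecord(22, "M", "white", "Kalamazoo", "lower back", False),
    InjuryRecord(23, "M", "white", "Kalamazoo", "lower back", False),
    InjuryRecord(31, "M", "white", "Kalamazoo", "lower back", False),  # outside age range
    InjuryRecord(20, "F", "white", "Kalamazoo", "lower back", False),  # not male
    InjuryRecord(21, "M", "white", "Detroit", "lower back", True),     # other city
]

def query(r):
    # "white males, 18-24, shot in lower back, in Kalamazoo"
    return (r.race == "white" and r.sex == "M" and 18 <= r.age <= 24
            and r.body_site == "lower back" and r.city == "Kalamazoo")

result = count_fatal_nonfatal(records, query)
print(result)  # (1, 2)
```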

I’ve played with the WHO ICD database, and the available data is amazing.

So why isn’t the CDC simply using the raw data, instead of sampling and extrapolating? Is it too difficult to get the data for research purposes?

No. ICD is designed for researchers to use, by intent. Medical people hate it because, to cover any possible injury or illness in which a researcher might someday be interested, there are upwards of 150,000 different codes to choose from. Want to know how many people are bitten by large dogs vs. small? It’s there.

There’s only one reason for the CDC to forgo using the entire database as intended, and cherry-pick a handful of “representative” hospitals.

Because the raw data doesn’t support the laws the victim-disarming gun controllers want.

The raw data would tell us who is getting shot. With what. What the victims’ demographic and geographic distribution is. Combined with the UCR, it would tell gun controllers which criminals to target, and how. All the things the CDC pretends it can’t do.

The CDC has to lie about injury reports to rationalize targeting honest gun owners who don’t commit the crimes.

6 thoughts on “The CDC is lying to you”

    1. But WHICH hospitals, and WHERE they are located, can HUGELY skew the dataset to support or destroy a predetermined agenda, which is PRECISELY Bear’s point here. Since the data ARE so thorough, and readily available, WHY NOT use the entire available sample set? And how do we know CDC are not cherry-picking to “direct” the conclusions to suit their predetermined outcome?

      As Sam Clemens so famously declared: there are lies, damn lies, and statistics.
      I studied statistics in college, and learned that by setting up the “study” I can easily select the desired outcome beforehand, then select the sample populations and sample size, limit the variables, and “prove” that whatever my predetermined outcome was is supported by “data,” “research,” and “facts.” Yet the TRUTH is not served.

      CDC being a FedGov agency (with NO Constitutional basis for existing) can be expected to behave exactly as we see here. They, being gummit, are NOT neutral. Thus they should be disbanded, and all the suckers at the public teat need to be dismissed, and go find some sort of PRODUCTIVE work to feed themselves and their (hopefully small) families.

      1. I’m not saying the CDC’s methodology isn’t flawed, or can’t be made to be flawed. I’m simply observing that Carl used as a simile a statistical methodology that sounds intuitively invalid to Joe Sixpack, but is in fact quite valid. In other words, the comparison is flawed.

        1. OK, time to step in. I am NOT claiming a 60 of 6,210 sample is flawed. I merely noted that sampling hospitals is essentially the same thing as sampling people in a phone survey.

          My point, which I clearly failed to express properly, is that since they have the complete dataset of all hospitals, throwing out 99% of it and extrapolating back is freaking stupid, if honest data is their goal.

    2. Dear Bog…. SciAm…

      1,004 soil samples of a suspected contaminated area: fine.
      1,004 product samples off an assembly line: fine.
      1,004 samples of people’s political opinions across a continent-wide nation of 3,142 counties (and equivalents), with widely varying culture and demographics: BS.

      The best election poll I ever saw was from a professor trying to teach his students why careful sampling is important. He randomly picked 1 person from every county in his — 4 state, I think — area example. His result was within 2% of the election results. None of the “official” polls came close with predictions across the map.

      Added: What a 1,000 person survey IS good for is tracking TRENDS on a particular topic. And best if it’s the same people: a longitudinal study.

