In this section, I present preliminary results from an exploratory data analysis of the Medicare HOS 2012-2014 PUF dataset introduced earlier, using both statistical and machine learning techniques.
Questions to be addressed at this stage of analysis include:
- What information is available in the dataset?
- What patterns (distributions, trends, correlations, anomalies, etc.) are observed?
- What hypotheses can we make based on these observations?
- What statistical tests and models should we use to estimate parameters and/or to identify underlying relations between variables, including the unobserved factors?
Let us begin with the first question.
What information is available?
Based on the User’s Guide from the HOS website (www.HOSonline.org), we can start to learn about the Cohort 15 Analytic PUF dataset. I marked some of the key info in blue.
The 2012 HOS 2.0 was administered for the 2012 Cohort 15 Baseline survey. The 2014 HOS 2.5 was administered for the 2014 Cohort 15 Follow Up survey. The updated HOS instrument contains revisions and additions to health assessment questions, as well as revisions to demographic questions to comply with standards established by the Affordable Care Act §4302. Copies of the 2012 and the 2014 HOS survey instruments may be obtained from the “Survey Instrument” page of the HOS website (www.HOSonline.org). Information is provided below about the VR-12 items and the scoring used in the HOS. However, the Cohort 15 Analytic PUF does not contain scores.
Veterans RAND 12-Item Health Survey
The key component of the HOS for assessment of physical and mental health functioning is the VR-12 Health Survey. The VR-12 consists of 14 items, 12 of which are used in the calculation of the eight health domains and the two summary measures: the physical component summary (PCS) and mental component summary (MCS) scores. The VR-12 measures the same eight health domains as the previously used 36-item health survey: 1) Physical Functioning, 2) Role-Physical, 3) Role-Emotional, 4) Bodily Pain, 5) Social Functioning, 6) Mental Health, 7) Vitality, and 8) General Health.
Case-mix adjustment may be used by researchers in order to adjust the survey response data for beneficiary characteristics that are known to be related to systematic biases in the way people respond to survey questions. The HOS instrument includes several items that are available for case-mix adjustment purposes. These items include, but are not limited to the following: demographic and socioeconomic characteristics (e.g., age, gender, race, education, and marital status); chronic medical conditions; and HOS study design variables (e.g., who completed the survey, the mode of survey administration, and the CMS region)… Note that the 2012-2014 Cohort 15 Analytic PUF does not contain any case-mix adjusted scores.
The HOS instruments also included questions on Activities of Daily Living (ADLs), depression, smoking, physical health symptoms, BMI, as well as other questions such as:
Healthy Days Measures
In 2003, three Healthy Days questions from the CDC’s BRFSS were added to the HOS. The questions encompass the number of days in the past thirty days that physical health was not good, mental health was not good, and activities were limited due to poor physical or mental health.
NCQA HEDIS Measures
There are four measures that are a part of the Effectiveness of Care domain of HEDIS and were included in both the HOS 2.0 and 2.5 instruments:
- In 2003, four questions were added to support the Management of Urinary Incontinence in Older Adults (MUI) measure.
- In 2005, two questions were added to support the Physical Activity in Older Adults (PAO) measure.
- In 2006, four questions were added to support the Fall Risk Management (FRM) measure and one question was added to support the Osteoporosis Testing in Older Women (OTO) measure.
Derivation of the File
The 2012 Cohort 15 Baseline Medicare HOS included a random sample of 604,992 beneficiaries, including both the aged and disabled, from 511 MAOs. Of the 604,992 individuals sampled, 53.1% (321,395) completed at least one question item of the survey. During the two years between the 2012 Cohort 15 Baseline survey and the 2014 Cohort 15 Follow Up survey, a number of MAOs discontinued offering managed care to Medicare beneficiaries, or consolidated with other health plans. The 2012-2014 Cohort 15 Analytic PUF sample is comprised of respondents with a valid survey disposition code at baseline (please refer to Field #81 for a description of the valid baseline survey disposition codes) who remained enrolled in their same MAO at the time of follow up. This resulted in 421 reporting units (MAOs) and 296,320 respondents in the 2012-2014 Cohort 15 Analytic PUF sample.
Of the 296,320 beneficiaries in the 2012-2014 Cohort 15 Analytic PUF sample, 19,568 died after baseline and before the two-year follow up survey administration. Another 102,350 beneficiaries voluntarily disenrolled from their MAOs between baseline and the start of the two-year follow up survey administration. This resulted in 174,402 beneficiaries remaining eligible at the time of follow up.
Of the 174,402 individuals sampled at the time of follow up, 72.0% (125,548) completed the follow up survey. For the purposes of this data file, a completed survey at follow up was defined as a survey with at least one question item completed. Of the 48,854 beneficiaries who did not complete a follow up survey, 2,882 were determined to have died after the follow up sample was selected but before the end of the survey administration. Additionally, 1,280 beneficiaries were determined to be ineligible for the follow up survey sample. The remaining 44,692 beneficiaries who did not complete the follow up survey were classified as non-respondents.
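The sample flow quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the published counts; the variable names are my own labels, not field names from the PUF.

```python
# Counts copied from the User's Guide text quoted above.
baseline_sampled = 604_992
baseline_completed = 321_395
analytic_sample = 296_320
died_before_followup = 19_568
disenrolled = 102_350
followup_completed = 125_548
died_during_followup = 2_882
ineligible_at_followup = 1_280

# Beneficiaries still eligible when the follow-up survey started.
eligible_at_followup = analytic_sample - died_before_followup - disenrolled

# Eligible beneficiaries who did not complete the follow-up survey.
not_completed = eligible_at_followup - followup_completed

# What is left after removing deaths and ineligible cases.
non_respondents = not_completed - died_during_followup - ineligible_at_followup

# Completion rates in percent, rounded to one decimal place.
baseline_rate = round(100 * baseline_completed / baseline_sampled, 1)
followup_rate = round(100 * followup_completed / eligible_at_followup, 1)

print(eligible_at_followup, not_completed, non_respondents)
print(baseline_rate, followup_rate)
```

Each derived number agrees with the figure stated in the guide, so the three paragraphs are internally consistent.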
What patterns are observed?
Histograms of self-reported health in the Baseline and Follow Up surveys show that these two variables are approximately normally distributed.
Unlike self-reported health, the distributions of the “Healthy Days Measures” (physical, mental, and combined days of ill-health in the last 30 days) in the Baseline and Follow Up surveys are highly skewed, with most responses concentrated near zero.
The “discrepancy” that may appear in the above figures can be explained by the possibility that respondents who report “Good”, “Very Good” or “Excellent” health generally experience little illness in any given 30-day period. As a result, “Healthy Days Measures” may not be an outcome measure that distinguishes well among these health states.
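One way to put a number on "roughly symmetric" versus "piled up near zero" is the moment-based skewness coefficient. The sketch below is a minimal illustration on made-up values, not on the PUF itself; `sample_skewness`, `self_rated` and `unhealthy_days` are hypothetical names chosen for this example.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy data, invented for illustration only.
# Self-rated health coded 1 (Excellent) to 5 (Poor), symmetric around 3.
self_rated = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
# Unhealthy days in the past 30 days: many zeros and a long right tail.
unhealthy_days = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 10, 30]

print("self-rated skewness:", sample_skewness(self_rated))
print("unhealthy-days skewness:", sample_skewness(unhealthy_days))
print("unhealthy-days mean vs median:", mean(unhealthy_days), median(unhealthy_days))
```

A skewness near zero goes with a symmetric histogram, while a large positive value (and a mean well above the median) goes with a distribution concentrated at zero with a long tail of high counts.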
For exploration purposes, a conditional density plot for “Healthy Days Measures” can be drawn to show subgroup Healthy Days distributions by self-rated health state:
From a visual assessment of the density plot for the physical component of the Healthy Days measure at each health level, using data from the 2014 Follow Up survey, the average number of days of ill-health does appear to increase as self-reported health goes from “Excellent” to “Poor”. A similar relationship can be observed between the mental component of the Healthy Days measure and self-rated health in the 2014 Follow Up survey.
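The visual impression can be backed up by computing the mean number of unhealthy days within each self-rated health level. The sketch below uses invented records; the pairs in `records` are toy values, not results from the survey.

```python
from collections import defaultdict
from statistics import mean

# Health levels ordered from best to worst.
levels = ["Excellent", "Very Good", "Good", "Fair", "Poor"]

# (self-rated health, physically unhealthy days) pairs; toy values only.
records = [
    ("Excellent", 0), ("Excellent", 1),
    ("Very Good", 0), ("Very Good", 2),
    ("Good", 2), ("Good", 4),
    ("Fair", 8), ("Fair", 12),
    ("Poor", 20), ("Poor", 30),
]

# Collect the day counts for each health level.
days_by_level = defaultdict(list)
for level, days in records:
    days_by_level[level].append(days)

# Mean unhealthy days per level, in best-to-worst order.
mean_days = [mean(days_by_level[level]) for level in levels]

for level, value in zip(levels, mean_days):
    print(level, value)

# True when the means never decrease from "Excellent" to "Poor".
is_monotone = all(a <= b for a, b in zip(mean_days, mean_days[1:]))
print("non-decreasing:", is_monotone)
```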
Another line of thought led me to explore the path of health change among the respondents. I want to see what has changed in two years and what variables are time-invariant. Among those who died before follow-up, how did they rate their overall health in the baseline survey? And are those who stayed alive and were eligible for the follow-up survey doing better, worse, or the same as they were in 2012?
My attempts to visualize the trend failed at first. As you can see below, these graphs are not telling any story other than raising eyebrows.
It is hard to see how many lines overlap and then infer the health trend. I then tried the Healthy Days measure. But…
Mapping the baseline Healthy Days measurement against its follow-up didn’t help much. We can’t quite see the trend.
To make the plot clear and informative, I decided to rearrange the table by the degree of change in the indicators, both Healthy Days and Self-rated Health. The result looks much nicer.
The resulting display shows the number of respondents (x-axis) and their health change, ordered by size, as measured by the physical component of Healthy Days between 2012 and 2014. A corresponding plot shows the changes in self-rated health, a discrete variable.
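The rearrangement described above amounts to computing each respondent's change between the two surveys and sorting by it. Here is a minimal sketch on invented data; the tuple layout and respondent ids are assumptions for illustration, not the structure of the PUF.

```python
# (respondent id, baseline unhealthy days, follow-up unhealthy days); toy values.
pairs = [
    ("a", 5, 5),
    ("b", 0, 10),
    ("c", 20, 2),
    ("d", 3, 0),
    ("e", 1, 30),
]


def change(row):
    """Follow-up minus baseline: negative means fewer unhealthy days."""
    _, baseline, followup = row
    return followup - baseline


# Order respondents from largest improvement to largest deterioration.
ordered = sorted(pairs, key=change)
ordered_ids = [row[0] for row in ordered]
changes = [change(row) for row in ordered]

print(ordered_ids)
print(changes)
```

Plotting the baseline and follow-up values in this order keeps the lines from crossing at random, which is what makes the trend readable.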
Another interesting phenomenon I discovered in this exploration stems from the variation of health outcomes between subgroups.
I found that in the baseline survey, the youngest participants, aged between 55 and 64, rated their health at 3.7 on average, which is worse than “Good” (on this scale lower is better, with 3 corresponding to “Good”). The oldest group, aged 75 and above, averaged 3.1, and those between 65 and 74 averaged 2.8, which is better than “Good”.
A similar pattern in another health outcome, the mortality rate, appears in the National Vital Statistics Reports released on February 16, 2016. The report shows that the mortality rate for ages 55-64 went up by 0.7 percentage point in 2013 compared to 2012, while the mortality rate for ages 65-74 remained the same, and that for ages 75 or above decreased by 0.8%.
The “reversed ageing” phenomenon is summarized below:
| Less than 65 | 3.71 | 3.77 |
| 75 or older | 3.07 | 3.15 |
To confirm that this phenomenon exists generally among subgroups, I created a summary table including not only gender, but also marital status and education. The surprising “reversed ageing” pattern remains between people younger than 65 and those aged 65 and above.
| # | Age group | Gender | Marital status | Education | Mean self-rated health |
|---|---|---|---|---|---|
| 1 | Less than 65 | Male | Married | Lower than HS | 4.01 |
| 2 | 65~74 | Male | Married | Lower than HS | 3.29 |
| 3 | 75 or older | Male | Married | Lower than HS | 3.39 |
| 4 | Less than 65 | Female | Married | Lower than HS | 4.01 |
| 5 | 65~74 | Female | Married | Lower than HS | 3.33 |
| 6 | 75 or older | Female | Married | Lower than HS | 3.40 |
| 7 | Less than 65 | Male | Non-married | Lower than HS | 3.77 |
| 8 | 65~74 | Male | Non-married | Lower than HS | 3.37 |
| 9 | 75 or older | Male | Non-married | Lower than HS | 3.37 |
| 10 | Less than 65 | Female | Non-married | Lower than HS | 3.89 |
| 11 | 65~74 | Female | Non-married | Lower than HS | 3.46 |
| 12 | 75 or older | Female | Non-married | Lower than HS | 3.48 |
| 13 | Less than 65 | Male | Married | Highschool | 3.82 |
| 15 | 75 or older | Male | Married | Highschool | 3.08 |
| 16 | Less than 65 | Female | Married | Highschool | 3.84 |
| 18 | 75 or older | Female | Married | Highschool | 3.03 |
| 19 | Less than 65 | Male | Non-married | Highschool | 3.55 |
| 21 | 75 or older | Male | Non-married | Highschool | 3.16 |
| 22 | Less than 65 | Female | Non-married | Highschool | 3.67 |
| 24 | 75 or older | Female | Non-married | Highschool | 3.14 |
| 25 | Less than 65 | Male | Married | Greater than HS | 3.74 |
| 26 | 65~74 | Male | Married | Greater than HS | 2.49 |
| 27 | 75 or older | Male | Married | Greater than HS | 2.80 |
| 28 | Less than 65 | Female | Married | Greater than HS | 3.74 |
| 29 | 65~74 | Female | Married | Greater than HS | 2.42 |
| 30 | 75 or older | Female | Married | Greater than HS | 2.80 |
| 31 | Less than 65 | Male | Non-married | Greater than HS | 3.63 |
| 32 | 65~74 | Male | Non-married | Greater than HS | 2.71 |
| 33 | 75 or older | Male | Non-married | Greater than HS | 2.88 |
| 34 | Less than 65 | Female | Non-married | Greater than HS | 3.75 |
| 35 | 65~74 | Female | Non-married | Greater than HS | 2.69 |
| 36 | 75 or older | Female | Non-married | Greater than HS | 2.93 |
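A summary table like the one above is a grouped mean: respondents are bucketed by a combination of characteristics and the self-rated health score is averaged within each bucket. The sketch below shows the idea on a handful of invented rows. The helper `mean_by`, the dictionary keys, and the values are all hypothetical and are not the PUF field names or results.

```python
from collections import defaultdict
from statistics import mean

# Toy respondents; self_rated_health is coded 1 (Excellent) to 5 (Poor).
rows = [
    {"age_group": "Less than 65", "gender": "Male", "marital": "Married",
     "education": "Highschool", "self_rated_health": 4},
    {"age_group": "Less than 65", "gender": "Female", "marital": "Married",
     "education": "Highschool", "self_rated_health": 3},
    {"age_group": "65~74", "gender": "Male", "marital": "Non-married",
     "education": "Greater than HS", "self_rated_health": 3},
    {"age_group": "65~74", "gender": "Female", "marital": "Married",
     "education": "Greater than HS", "self_rated_health": 2},
    {"age_group": "75 or older", "gender": "Male", "marital": "Married",
     "education": "Lower than HS", "self_rated_health": 3},
    {"age_group": "75 or older", "gender": "Female", "marital": "Non-married",
     "education": "Lower than HS", "self_rated_health": 4},
]


def mean_by(rows, keys, value):
    """Average `value` within each combination of the given `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: round(mean(vals), 2) for group, vals in groups.items()}


# One-way summary by age group, then a finer two-way split.
by_age = mean_by(rows, ["age_group"], "self_rated_health")
by_age_gender = mean_by(rows, ["age_group", "gender"], "self_rated_health")

for group, avg in by_age.items():
    print(group, avg)
```

Passing a longer list of keys (for example age group, gender, marital status and education) gives one row per subgroup, which is the shape of the table above.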
I will continue this exploration in the next post, where some intriguing research will be discussed.