# Exercise 10: multiple linear regression

 

# The Heart and Estrogen/Progestin Study (HERS) is a clinical trial of hormone therapy for the prevention of recurrent heart attacks and death among post-menopausal women with existing coronary heart disease. The HERS data are used in many of the examples in Chapters 3 and 4 of the course textbook by Vittinghoff et al. In this exercise we will study how different variables may influence the blood glucose level of the non-diabetic women in the cohort. In particular, we are interested in whether exercise may help to reduce the glucose level (cf. Section 4.1 in Vittinghoff et al.).

 

# You may read the HERS data into R and extract the women without diabetes with the commands:

hers=read.table("http://www.uio.no/studier/emner/matnat/math/STK4900/data/hers.txt",sep="\t",header=TRUE,na.strings=".")

hers.no=hers[hers$diabetes==0, ]
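
# A quick check of the extraction may be useful before going on. The lines below are only a sketch;
# they assume nothing beyond the column names used elsewhere in this exercise.
# Note that if diabetes had missing values, hers[hers$diabetes==0, ] would return rows of NAs
# for them; hers[which(hers$diabetes==0), ] is one way to avoid that.

table(hers$diabetes, useNA="ifany")   # number of women with/without diabetes (and NAs, if any)

dim(hers.no)                          # rows and columns in the non-diabetic subset

colSums(is.na(hers.no[, c("glucose","exercise","age","BMI")]))   # missing values per variable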

 

 

# We will start out by investigating (in questions a-c) how glucose levels differ between women who exercise at least three times a week (coded as exercise=1) and women who exercise less than three times a week (coded as exercise=0).

 

# a)

# Make a summary and boxplot of the glucose levels according to the level of exercise:

summary(hers.no$glucose[hers.no$exercise==0])

summary(hers.no$glucose[hers.no$exercise==1])

boxplot(hers.no$glucose~hers.no$exercise)

 

# Discuss what the summaries and boxplot tell you.
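
# As a possible supplement to the summaries above, the group sizes, means and standard deviations
# can be collected with tapply. This is a sketch, not part of the original exercise.

tapply(hers.no$glucose, hers.no$exercise, function(x) sum(!is.na(x)))   # non-missing observations per group

tapply(hers.no$glucose, hers.no$exercise, mean, na.rm=TRUE)             # group means

tapply(hers.no$glucose, hers.no$exercise, sd, na.rm=TRUE)               # group standard deviations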

 

 

# b)

# Test whether the mean glucose level differs between the two groups and compute a confidence interval for the difference:

t.test(glucose~exercise, var.equal=TRUE,data=hers.no)

 

# What may you conclude from the test and the confidence interval?
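
# To see what the two-sample t-test with equal variances computes, the test statistic and the
# 95% confidence interval can be reproduced by hand from the pooled variance. This is a sketch;
# the names g0, g1, n0, n1, sp2, se, tstat, pval and ci are introduced here for illustration only.

g0=na.omit(hers.no$glucose[hers.no$exercise==0])    # glucose, exercise less than three times a week
g1=na.omit(hers.no$glucose[hers.no$exercise==1])    # glucose, exercise at least three times a week
n0=length(g0); n1=length(g1)

sp2=((n0-1)*var(g0)+(n1-1)*var(g1))/(n0+n1-2)       # pooled variance estimate
se=sqrt(sp2*(1/n0+1/n1))                            # standard error of the difference in means

tstat=(mean(g0)-mean(g1))/se                        # t statistic (group 0 minus group 1, as in t.test)
pval=2*pt(-abs(tstat),df=n0+n1-2)                   # two-sided p-value
ci=(mean(g0)-mean(g1))+c(-1,1)*qt(0.975,df=n0+n1-2)*se   # 95% confidence interval

tstat; pval; ci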

 

 

# c)

# Perform a simple linear regression with glucose level as outcome and exercise as predictor:

fit.c=lm(glucose~exercise,data=hers.no)

summary(fit.c)

 

# Discuss how the results of the simple linear regression relate to those in question b.
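
# One way to make the comparison concrete (a sketch): with a 0/1 predictor, the estimated slope is the
# difference in group means (exercise=1 minus exercise=0), so its sign is opposite to the difference
# reported by t.test, which takes group 0 minus group 1.

coef(fit.c)                                                          # intercept = mean for exercise=0, slope = difference in means

diff(tapply(hers.no$glucose, hers.no$exercise, mean, na.rm=TRUE))    # mean(exercise=1) - mean(exercise=0)

confint(fit.c)                                                       # 95% confidence intervals for the coefficients

summary(fit.c)$coefficients["exercise","t value"]                    # compare with the t statistic in question b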

 

 

 

# The women who exercise at least three times a week and the women who exercise less than three times a week may differ in many ways. For example, the women who exercise more may be younger and have a lower BMI (body mass index). We will therefore perform a multiple linear regression analysis where we adjust for these two variables.
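
# Whether the two exercise groups actually differ in age and BMI can be checked directly.
# The lines below are a sketch of such a check and are not part of the original exercise.

tapply(hers.no$age, hers.no$exercise, mean, na.rm=TRUE)   # mean age by exercise group

tapply(hers.no$BMI, hers.no$exercise, mean, na.rm=TRUE)   # mean BMI by exercise group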

 

# d)

# Perform a multiple linear regression with glucose level as outcome and exercise, age, and BMI as predictors:

fit.d=lm(glucose~exercise+age+BMI,data=hers.no)

summary(fit.d)

 

# Discuss the result of this analysis in relation to the result in question c. Also discuss how age and BMI influence the glucose level.
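
# When comparing with question c, it may help to put the unadjusted and adjusted exercise effects side
# by side. lm drops rows with missing values in any variable of the model, so the two fits need not use
# the same women; the sketch below refits the simple model on the rows used by fit.d.
# The name fit.c2 is introduced here for illustration only.

nobs(fit.c); nobs(fit.d)                                  # number of observations used in each fit

fit.c2=lm(glucose~exercise,data=model.frame(fit.d))       # simple model on the same rows as fit.d

cbind(unadjusted=coef(fit.c2)["exercise"], adjusted=coef(fit.d)["exercise"])

confint(fit.d)                                            # 95% confidence intervals for the adjusted effects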