SCIE 301: Research Design and Statistical Analysis

Well-presented assignments will receive a 1% bonus.


Please clearly print your name and UCID on your assignment.

This is an open-book assignment; both calculators and statistical software such as R may be used.

If you are using software to assist, please include the code, output and your own conclusions

based on your interpretations of the output.

For numerical answers, please round the results to three decimal places when needed.

The "data-bmi.csv" file contains data on individual characteristics including gender, age, height and

weight on a group of individuals. The variables and the corresponding scales or measurements in the

dataset are as follows:

ID: the identification number of an individual

gender: either of the two sexes (male and female) of an individual

age: age of an individual in years

height: height of an individual in centimeters

weight: weight of an individual in pounds

Please answer the following questions based on this dataset.

1.(2 points) Please import the CSV file into R and create a data frame. Please state the dimensions

of the data in terms of the number of rows (individuals) and number of columns (variables).
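A minimal sketch of the import step, assuming base R only. Because the real file is not reproduced here, the example writes a small made-up CSV with the same column names to a temporary file; `toy_path` and `bmi_data` are hypothetical names chosen for illustration.

```r
# Toy stand-in for data-bmi.csv: the values below are made up for illustration.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,gender,age,height,weight",
  "1,male,34,178,172",
  "2,female,29,165,",
  "3,female,41,,150"
), toy_path)

# With the real file, the call would be read.csv("data-bmi.csv") instead.
bmi_data <- read.csv(toy_path)

dim(bmi_data)    # number of rows, then number of columns
nrow(bmi_data)   # individuals
ncol(bmi_data)   # variables
str(bmi_data)    # quick check of the column types
```

Blank numeric fields are read as NA by `read.csv()`, which matters later when counting missing values.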

2.(4 points) Please calculate the Body Mass Index (BMI) in units of kg/m^2 for all subjects, and

report the five-number summary along with the variance and standard deviation.
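Since height is recorded in centimeters and weight in pounds, both need converting before BMI = weight (kg) / height (m)^2 can be applied. The sketch below shows one way to do this on made-up values; the data frame `toy` and the helper names are assumptions, and 1 lb = 0.45359237 kg is the standard definition of the pound.

```r
# Made-up toy values; height in centimeters, weight in pounds as in the assignment.
toy <- data.frame(
  height = c(178, 165, 170, 160, 185),
  weight = c(172, 130, 150, 200, 190)
)

lb_to_kg  <- 0.45359237          # kilograms per pound
weight_kg <- toy$weight * lb_to_kg
height_m  <- toy$height / 100

toy$BMI <- weight_kg / height_m^2   # kg/m^2

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
round(fivenum(toy$BMI, na.rm = TRUE), 3)

# Sample variance and standard deviation (n - 1 denominator).
round(var(toy$BMI, na.rm = TRUE), 3)
round(sd(toy$BMI, na.rm = TRUE), 3)
```

Note that `fivenum()` reports Tukey's hinges, while `summary()` and `quantile()` use a different default quartile definition, so the middle values can differ slightly between the two; it is worth stating which one was used.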


3.(5 points) Please report the frequency and percentage of missing data for all variables, including

the newly created variable BMI. Then obtain a subset that contains individuals with complete information

on all variables and state the size of this subset.
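One possible way to tabulate missingness and extract the complete cases is sketched below on made-up data. The names `toy`, `missing_summary` and `complete_subset` are hypothetical; the sketch assumes missing values are already coded as NA.

```r
# Made-up toy data with a few planted missing values.
toy <- data.frame(
  ID     = 1:5,
  gender = c("male", "female", NA, "female", "male"),
  age    = c(34, 29, 41, NA, 52),
  height = c(178, 165, NA, 160, 185),   # centimeters
  weight = c(172, NA, 150, 200, 190)    # pounds
)
toy$BMI <- (toy$weight * 0.45359237) / (toy$height / 100)^2

# Frequency and percentage of missing values per variable.
n_missing   <- colSums(is.na(toy))
pct_missing <- round(100 * n_missing / nrow(toy), 3)
missing_summary <- data.frame(frequency = n_missing, percentage = pct_missing)
missing_summary

# Keep only rows with no missing value in any column.
complete_subset <- toy[complete.cases(toy), ]
dim(complete_subset)
```

BMI is missing whenever height or weight is missing, so its count is at least as large as either of theirs. If the CSV codes missing text fields as empty strings, they would need to be turned into NA first (for example through the `na.strings` argument of `read.csv()`).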

4.The following questions are based on the subset of complete cases obtained in the previous question.

(a)(6 points) Please classify the individuals into groups indicating their weight status using

the following criteria:


underweight (BMI is less than 18.50)

normal weight (BMI is between 18.50 and 25.00, including 18.50 and excluding 25.00)

overweight (BMI is between 25.00 and 30.00, including 25.00 and excluding 30.00)

obesity, excluding extreme obesity (BMI is between 30.00 and 40.00, including 30.00

and excluding 40.00)

extreme obesity (BMI is greater than or equal to 40.00).

Please also summarize the distribution of weight status using numbers or tables.
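The categories above are left-closed, right-open intervals, which is the opposite of the default behaviour of `cut()`. A minimal sketch on made-up BMI values, with the boundary cases included deliberately (the vector `bmi` and the other names are assumptions):

```r
# Made-up BMI values, including values exactly on the category boundaries.
bmi <- c(17.2, 18.5, 24.999, 25, 29.9, 30, 39.5, 40, 43.1)

status <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, 30, 40, Inf),
  labels = c("underweight", "normal weight", "overweight",
             "obesity", "extreme obesity"),
  right  = FALSE   # intervals are [lower, upper): include lower, exclude upper
)

# Frequency and percentage table of weight status.
freq <- table(status)
pct  <- round(100 * prop.table(freq), 3)
data.frame(status     = names(freq),
           frequency  = as.vector(freq),
           percentage = as.vector(pct))
```

With `right = FALSE`, a BMI of exactly 25.00 falls into "overweight" rather than "normal weight", matching the stated criteria.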

(b)(8 points) Please display summary information of the weight status among all subjects

including frequencies and relative frequencies (in percentage) using two types of graphs

(bar plot and pie chart).

Please be considerate of your audience and make your graphs as informative and concise

as possible by using a legend, labeling axes, displaying numbers and so on. Please sort the

layout based on the frequency or relative frequency in either ascending or descending order

to enhance readability.
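A sketch of a sorted bar plot and pie chart using base R graphics is given below. The counts in `freq` are made up for illustration, and the layout choices (gray palette, label placement) are only one of many reasonable options.

```r
# Made-up counts per weight status, for illustration only.
freq <- c("underweight" = 4, "normal weight" = 21, "overweight" = 15,
          "obesity" = 8, "extreme obesity" = 2)

# Sort in descending order of frequency and derive the percentages.
freq_sorted <- sort(freq, decreasing = TRUE)
pct_sorted  <- round(100 * freq_sorted / sum(freq_sorted), 1)
bar_labels  <- paste0(freq_sorted, " (", pct_sorted, "%)")

# Bar plot: barplot() returns the bar midpoints, used here to place the labels.
mids <- barplot(freq_sorted,
                ylim = c(0, max(freq_sorted) * 1.2),
                xlab = "Weight status", ylab = "Frequency",
                main = "Weight status (toy data)",
                cex.names = 0.8)
text(mids, freq_sorted, labels = bar_labels, pos = 3, cex = 0.8)

# Pie chart in the same order, with a legend carrying the frequencies.
slice_cols <- gray.colors(length(freq_sorted))
pie(freq_sorted, labels = paste0(pct_sorted, "%"), col = slice_cols,
    main = "Weight status (toy data)")
legend("topright",
       legend = paste0(names(freq_sorted), ": ", freq_sorted),
       fill = slice_cols, cex = 0.8)
```

Starting from a factor of weight statuses instead of ready-made counts, `table()` followed by `sort()` would produce the same kind of sorted input.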

(c)(4 points) Please create a histogram of age in this subset and thoroughly elaborate on your

findings about the distribution of age based on the histogram (for example,

you should comment on the shape, center (three measures of central tendency), skewness,

existence of outliers, spread and so on).
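The sketch below draws a histogram and computes the three usual measures of center on made-up ages. Base R has no built-in function for the statistical mode, so it is derived from a frequency table here; `age`, `age_counts` and `age_mode` are hypothetical names.

```r
# Made-up ages in years, for illustration only.
age <- c(21, 23, 23, 25, 27, 28, 30, 31, 23, 35, 38, 44, 52, 67)

hist(age,
     breaks = seq(20, 70, by = 5),
     xlab = "Age (years)", ylab = "Frequency",
     main = "Histogram of age (toy data)")

# Three measures of central tendency.
age_mean   <- mean(age)
age_median <- median(age)
age_counts <- table(age)
age_mode   <- as.numeric(names(age_counts)[age_counts == max(age_counts)])

round(age_mean, 3)
age_median
age_mode

# Spread: range, interquartile range and standard deviation.
range(age)
IQR(age)
round(sd(age), 3)
```

In this toy sample the mean exceeds the median, which is consistent with a longer right tail; the same comparison can support a comment on skewness for the real data.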

(d)(12 points) Please create side-by-side boxplots for the heights of the individuals in different

gender groups and comment on whether you are able to judge whether the two gender

groups differ in terms of the distribution of height. Please be sure to provide the basis

for your comments. Use the 1.5 × IQR rule (remember to show your steps) to identify the

individuals whose heights are outliers in each gender group.
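The 1.5 × IQR rule flags a value as an outlier when it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. A minimal sketch on made-up heights follows; `toy` and the helper `flag_outliers()` are hypothetical, and the quartiles use the default definition of `quantile()`.

```r
# Made-up heights in centimeters, with one planted extreme value per group.
toy <- data.frame(
  ID     = 1:12,
  gender = rep(c("female", "male"), each = 6),
  height = c(158, 160, 162, 164, 166, 190,
             150, 174, 176, 178, 180, 182)
)

# Side-by-side boxplots of height by gender.
boxplot(height ~ gender, data = toy,
        xlab = "Gender", ylab = "Height (cm)",
        main = "Height by gender (toy data)")

# Apply the 1.5 x IQR rule within one gender group, step by step.
flag_outliers <- function(group) {
  q1    <- quantile(group$height, 0.25, names = FALSE)
  q3    <- quantile(group$height, 0.75, names = FALSE)
  iqr   <- q3 - q1
  lower <- q1 - 1.5 * iqr   # lower fence
  upper <- q3 + 1.5 * iqr   # upper fence
  group[group$height < lower | group$height > upper, ]
}

outliers <- do.call(rbind, lapply(split(toy, toy$gender), flag_outliers))
outliers
```

`boxplot()` computes its box from Tukey's hinges rather than from `quantile()`, so the points it draws as outliers can occasionally differ from the fences computed by hand; reporting the quartiles, IQR and fences explicitly keeps the steps checkable.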

