SOC 510: Using R

Conditional Statistics and Subsetting Data

The following examples use the data file soc510hw2.csv, which contains the variables union, female, married, and wage. Union members are coded 1 and non-members 0; males are coded 0 and females 1; married respondents are coded 1 and unmarried respondents 0.
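The commands in this handout refer to variables by bare name (wage, union, and so on), which only works once the data frame's columns are visible to R. As a minimal sketch, the block below uses a small made-up data frame in place of soc510hw2.csv and evaluates the expressions through with(); the toy values and the assumption that the real file sits in the working directory are illustrative only.

```r
# Toy stand-in for soc510hw2.csv; the values are made up for illustration.
# With the real file (assumed to be in the working directory) you would run:
#   mydata <- read.csv("soc510hw2.csv")
mydata <- data.frame(
  union   = c(0, 0, 1, 1, 0, 1),
  female  = c(0, 1, 0, 1, 1, 0),
  married = c(1, 0, 1, 1, 0, 0),
  wage    = c(10, 12, 20, 18, 14, 22)
)

# with() evaluates an expression using the columns of mydata,
# so bare names such as wage and union resolve without attach().
mean_nonunion <- with(mydata, mean(wage[union == 0]))
mean_union    <- with(mydata, mean(wage[union == 1]))
print(c(nonunion = mean_nonunion, union = mean_union))
```

Calling attach(mydata) is another way to make the bare names available, though with() keeps the lookup limited to a single expression.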

Conditional Statistics

Mean and standard deviation of wage by union membership.

> mean(wage[union==0])
> mean(wage[union==1])
> sd(wage[union==0])
> sd(wage[union==1])
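The four commands above can also be collapsed into two calls with tapply(), which applies a function to each group in one step. This is a sketch on made-up values, not the course data.

```r
# Toy data; the values are made up for illustration.
wage  <- c(10, 12, 20, 18, 14, 22)
union <- c(0, 0, 1, 1, 0, 1)

# tapply() splits wage by the values of union and applies
# the given function to each group separately.
group_means <- tapply(wage, union, mean)
group_sds   <- tapply(wage, union, sd)

print(group_means)
print(group_sds)
```

The result is a named vector with one entry per value of union, so group_means[["1"]] gives the union mean.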

Mean of wage for male union members.
```
> mean(wage[union==1 & female==0])
```
Mean of wage for workers who are male or union members (i.e., anyone who is male, a union member, or both: male union members, male non-union members, and female union members).
```
> mean(wage[union==1 | female==0])
```
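To see how the "and" (&) and "or" (|) conditions differ, it can help to count how many cases each one selects. The sketch below uses made-up values.

```r
# Toy data; the values are made up for illustration.
wage   <- c(10, 12, 20, 18, 14, 22)
union  <- c(0, 0, 1, 1, 0, 1)
female <- c(0, 1, 0, 1, 1, 0)

both   <- union == 1 & female == 0   # male union members only
either <- union == 1 | female == 0   # union members, males, or both

# sum() on a logical vector counts the TRUE values.
n_both   <- sum(both)
n_either <- sum(either)
print(c(and = n_both, or = n_either))

mean_both   <- mean(wage[both])
mean_either <- mean(wage[either])
print(c(and = mean_both, or = mean_either))
```

The "or" condition always selects at least as many cases as the "and" condition, because every case that satisfies both conditions also satisfies either one.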

Regression analysis using only male union members.

> lm(wage~edu+age, subset=(female==0 & union==1))
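The subset argument of lm() should give the same fit as first building the subset and then running the regression on it. The sketch below checks this on a made-up data frame; passing data=mydata explicitly is an addition here, so the example does not depend on attached variables.

```r
# Toy data; the values are made up for illustration.
mydata <- data.frame(
  wage   = c(15, 18, 22, 25, 30, 12, 16, 28),
  edu    = c(10, 12, 12, 14, 16, 12, 16, 12),
  age    = c(25, 30, 41, 35, 50, 28, 33, 45),
  female = c(0, 0, 0, 0, 0, 1, 1, 0),
  union  = c(1, 1, 1, 1, 1, 1, 0, 0)
)

# Option 1: restrict the estimation sample inside lm().
fit_subset_arg <- lm(wage ~ edu + age, data = mydata,
                     subset = (female == 0 & union == 1))

# Option 2: build the subset first, then fit on it.
male_union    <- subset(mydata, female == 0 & union == 1)
fit_subset_df <- lm(wage ~ edu + age, data = male_union)

# Both fits use the same five cases and give the same coefficients.
print(nobs(fit_subset_arg))
print(all.equal(coef(fit_subset_arg), coef(fit_subset_df)))
```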

Subsetting Data

Creating a subset, "datafemale", from "mydata" (already loaded) using the "subset" function.
```
> datafemale <- subset(mydata, female==1)
```

Another method: the "which" function

> datafemale <- mydata[which(mydata$female==1), ]
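One reason to use which() or subset() rather than a bare logical condition inside the brackets is how missing values are handled. The sketch below, on made-up data with one missing value of female, compares the three approaches.

```r
# Toy data with one missing value; the values are made up for illustration.
mydata <- data.frame(
  female = c(1, 0, NA, 1),
  wage   = c(12, 10, 15, 18)
)

# A bare logical index keeps the NA comparison as a row filled with NA.
by_logical <- mydata[mydata$female == 1, ]

# which() returns only the positions that are TRUE, so the NA case is dropped.
by_which <- mydata[which(mydata$female == 1), ]

# subset() also drops cases where the condition is NA.
by_subset <- subset(mydata, female == 1)

print(nrow(by_logical))
print(nrow(by_which))
print(nrow(by_subset))
```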

Creating a subset, "dataless", that selects both LTHS (less than high school) and HSG (high school graduate) workers.
```
> dataless <- subset(mydata, educ==1 | educ==2)
```
Note that educ=1 is LTHS and educ=2 is HSG.
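When several categories are selected at once, the %in% operator is a compact alternative to chaining conditions with "|". This is a sketch on made-up values; codes 3 and 4 are hypothetical additional categories used only for illustration.

```r
# Toy data; the values are made up, and codes 3 and 4 are hypothetical.
mydata <- data.frame(
  educ = c(1, 2, 3, 4, 2, 1),
  wage = c(9, 12, 16, 24, 13, 8)
)

# Chained "or" conditions, as in the example above.
dataless_or <- subset(mydata, educ == 1 | educ == 2)

# Same selection with %in%, which tests membership in a set of values.
dataless_in <- subset(mydata, educ %in% c(1, 2))

print(nrow(dataless_or))
print(all.equal(dataless_or, dataless_in))
```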
Creating a subset, "fmprvt", that selects only female workers in the private sector.
```
> fmprvt <- subset(mydata, female==1 & pubst==1)
```
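Note that "pubst" is the sector variable (public vs. private); this example assumes that pubst==1 identifies private-sector workers. If pubst is instead an indicator coded 1 for the public sector, use pubst==0.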
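Before relying on a subset, it is worth checking how the variables are coded and confirming that the subset has the expected number of cases. A cross-tabulation with table() does both. The sketch below uses made-up values, and the coding of pubst in it is hypothetical.

```r
# Toy data; the values and the coding of pubst are hypothetical.
mydata <- data.frame(
  female = c(1, 1, 0, 1, 0, 1),
  pubst  = c(1, 0, 1, 1, 0, 1)
)

# Cross-tabulate the two variables to see which codes occur and how often.
counts <- table(female = mydata$female, pubst = mydata$pubst)
print(counts)

# The subset should contain exactly as many rows as the matching cell.
fmprvt <- subset(mydata, female == 1 & pubst == 1)
print(nrow(fmprvt) == counts["1", "1"])
```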


SOC 510 Supplementary: Using R

Univ of Kansas; Sociology; ChangHwan Kim
