Subsetting Data Frames in R: Top 20 Steps

R is well suited to subsetting data frames, and base R ships with several small built-in datasets that are convenient for practicing data analysis.

R code and libraries cover the common data analysis operations on such datasets: selecting, filtering, grouping, and ordering data.

Base R also provides functions and operators for subsetting data, such as subset() and the [ ] indexing operator, which make this kind of manipulation easy.

mtcars Data Frame Subsetting In R

In this case study, we use the mtcars dataset, which is built into R and ships with the base R installation.

1. Create a simple copy of the mtcars dataset

Assigning the data frame to a new variable creates a copy, so you can subset and modify it without losing or changing the original data.

car <- mtcars
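To see why working on a copy is safe, here is a minimal sketch. It uses only base R, and the replacement value 0 is an arbitrary placeholder chosen for illustration.

```r
# Work on a copy so the built-in mtcars data stays untouched
car <- mtcars

# Change a value in the copy only (0 is an arbitrary illustrative value)
car$mpg[1] <- 0

car$mpg[1]     # value in the modified copy
mtcars$mpg[1]  # the original data frame still holds its own value
```

R copies the data frame on modification, so the edit to car never reaches mtcars.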

2. Create a matrix from the first 15 rows and first 4 columns of mtcars

a <- car[1:15, 1:4]
a_mat <- as.matrix(a)
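A matrix behaves differently from a data frame when you subset it, so it helps to check what you are holding. The sketch below is one way to do that; the names m and four_cyl are illustrative and not part of the original steps.

```r
a <- mtcars[1:15, 1:4]   # still a data frame
m <- as.matrix(a)        # numeric matrix with the same row and column names

is.data.frame(a)  # TRUE
is.matrix(m)      # TRUE
dim(m)            # 15 rows, 4 columns

# A matrix has no `$` accessor for columns; index by column name instead
four_cyl <- m[m[, "cyl"] == 4, , drop = FALSE]
rownames(four_cyl)
```

Using drop = FALSE keeps the result a matrix even if only one row matches.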

3. Subset mtcars for rows where “cyl” is 4

a <- subset(car, cyl == 4)

4. Subset the rows of mtcars where “cyl” is less than 5

a1 <- car[car$cyl < 5, ]
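subset() and [ ] indexing are two ways to express the same row filter. As a rough comparison, assuming a fresh copy of mtcars (the names by_subset and by_bracket are illustrative):

```r
car <- mtcars

by_subset  <- subset(car, cyl == 4)   # column names are looked up inside car
by_bracket <- car[car$cyl == 4, ]     # explicit logical row index

nrow(by_subset)
identical(rownames(by_subset), rownames(by_bracket))  # same cars either way
```

subset() is shorter to type interactively, while [ ] indexing is explicit about where each column comes from.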

5. Retrieve only the rows of mtcars where “cyl” is either 4 or 6

a2 <- car[car$cyl == 4 | car$cyl == 6, ]
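When the list of accepted values grows, chaining == with | gets long. One possible alternative is %in%, sketched here on a fresh copy of mtcars (with_or and with_in are illustrative names):

```r
car <- mtcars

with_or <- car[car$cyl == 4 | car$cyl == 6, ]   # explicit OR of two comparisons
with_in <- car[car$cyl %in% c(4, 6), ]          # membership test against a set

identical(with_or, with_in)   # both select the same rows
table(with_in$cyl)            # counts per cylinder value in the result
```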

6. Subset “mtcars” for rows with “cyl” less than 6 and “gear” exactly equal to 4

subset(car, cyl < 6 & gear == 4)

7. Subset “mtcars” for rows with at least 21 miles per gallon, and select only the columns “mpg” through “hp”.

subset(car, mpg >= 21, select = mpg:hp)
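The select argument of subset() treats c(mpg, hp) and mpg:hp differently, which is easy to overlook. A small sketch of the difference (two_cols and range_cols are illustrative names):

```r
car <- mtcars

two_cols   <- subset(car, mpg >= 21, select = c(mpg, hp))  # only the two named columns
range_cols <- subset(car, mpg >= 21, select = mpg:hp)      # every column from mpg through hp

names(two_cols)     # "mpg" "hp"
names(range_cols)   # "mpg" "cyl" "disp" "hp"
```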

Airquality Data Frame Subsetting In R

Here we work with the airquality dataset, which is also built into base R.

air <- airquality

8. Find the value of Ozone in the 47th row

ozone_47th <- air[47,'Ozone']
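There are several equivalent ways to pull a single cell out of a data frame. The sketch below shows three of them plus the drop = FALSE variant; the names v1, v2, v3 and as_df are illustrative.

```r
air <- airquality

v1 <- air[47, "Ozone"]     # row and column index
v2 <- air$Ozone[47]        # take the column first, then the element
v3 <- air[[47, "Ozone"]]   # extract exactly one element

identical(v1, v2)
identical(v1, v3)

# drop = FALSE keeps the result as a one-cell data frame instead of a bare value
as_df <- air[47, "Ozone", drop = FALSE]
is.data.frame(as_df)
```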

9. Extract the subset of rows where Ozone values are above 31 and Temp values are above 90, then find the mean of Solar.R in this subset.

air_subset1 <- subset(air, Ozone > 31 & Temp > 90)
mean_airsubset1_solarR <- mean(air_subset1$Solar.R)
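Because Ozone contains missing values, the way you filter matters. The following sketch compares subset() with plain [ ] indexing on the same condition; cond, by_subset, by_bracket and by_which are illustrative names.

```r
air <- airquality

# NA wherever the comparison cannot be decided
cond <- air$Ozone > 31 & air$Temp > 90

by_subset  <- subset(air, Ozone > 31 & Temp > 90)  # drops rows where cond is NA
by_bracket <- air[cond, ]                          # returns an all-NA row for each NA in cond
by_which   <- air[which(cond), ]                   # which() keeps only the TRUE positions

c(subset = nrow(by_subset), bracket = nrow(by_bracket), which = nrow(by_which))

# na.rm = TRUE guards the mean against any remaining missing Solar.R values
mean(by_subset$Solar.R, na.rm = TRUE)
```

If you prefer [ ] indexing, wrapping the condition in which() gives the same rows as subset().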

10. Find the mean of “Temp” when “Month” is equal to 6

mean_temp_month6 <- mean(air[air$Month == 6,'Temp'])

11. Find the maximum Ozone value in the month of May (i.e. Month = 5)

max_ozon_may <- max(air[air$Month == 5, 'Ozone'], na.rm = TRUE)
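If you need the same statistic for every month rather than a single one, tapply() is one option in base R. A minimal sketch (the result names are illustrative):

```r
air <- airquality

# One value per month: mean temperature and maximum ozone
mean_temp_by_month <- tapply(air$Temp, air$Month, mean)
max_ozone_by_month <- tapply(air$Ozone, air$Month, max, na.rm = TRUE)

mean_temp_by_month["6"]   # same value as filtering Month == 6 first
max_ozone_by_month["5"]   # same value as filtering Month == 5 first
```

The results are named by month, so individual months are looked up with the month number as a string.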

12. Subset “airquality” for “Ozone” greater than “100”. Select the columns “Ozone”, “Temp”, “Month” and “Day” only.

air_subset2 <- subset(air,air$Ozone > 100 ,select = c(Ozone,Temp,Month,Day))

13. Subset the “airquality” data frame for rows without “Ozone” values of “NA”.

air_subset3 <- subset(air, !is.na(Ozone))
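Dropping rows with a missing Ozone value is not the same as dropping rows with any missing value. A short sketch of both, assuming a fresh copy of airquality (ozone_known and all_known are illustrative names):

```r
air <- airquality

ozone_known <- air[!is.na(air$Ozone), ]     # only Ozone has to be present
all_known   <- air[complete.cases(air), ]   # every column has to be present

nrow(air)
nrow(ozone_known)
nrow(all_known)
```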

14. Back in the mtcars copy: are there more automatic (0) or manual (1) transmission cars in the dataset? (See ?mtcars for the dataset description.)

nrow(car[car$am == 0,]) > nrow(car[car$am == 1,])
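Instead of comparing two row counts, you can also tabulate the column once. A possible sketch with table() (am_counts is an illustrative name):

```r
car <- mtcars

# Count cars per transmission type: 0 = automatic, 1 = manual
am_counts <- table(car$am)
am_counts

# Label of the more frequent transmission type
names(am_counts)[which.max(am_counts)]
```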

15. Find out the wind value when the Ozone becomes maximum in the dataset

wind_for_max.Ozone <- air[which(air$Ozone == max(air$Ozone, na.rm = TRUE)), 'Wind']
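which.max() offers a compact way to locate the row of the maximum, since it skips missing values and returns the position of the first maximum. A minimal sketch (row_of_max is an illustrative name):

```r
air <- airquality

# Position of the largest Ozone value; NA entries are ignored
row_of_max <- which.max(air$Ozone)

# Show the wind reading together with the date of that observation
air[row_of_max, c("Ozone", "Wind", "Month", "Day")]
```

Note that which.max() returns only the first position if the maximum occurs more than once.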

16. Fetch the observations for the first 9 days of June in the airquality dataset

air[air$Month == 6 & air$Day %in% 1:9, ]
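Comparing a column against several values with == works element by element and recycles the shorter vector, which is rarely what you want; %in% tests membership instead. The sketch below illustrates the difference on a toy vector (days is a made-up example) and then reads this step as "the first nine days of June", which is an assumption about the intent.

```r
days <- c(3, 1, 2)

days == c(1, 2, 3)     # element-by-element comparison: FALSE FALSE FALSE
days %in% c(1, 2, 3)   # membership test: TRUE TRUE TRUE

air <- airquality

# Assumed reading of the task: June is Month 6, days 1 through 9
first_nine_june <- air[air$Month == 6 & air$Day %in% 1:9, ]
nrow(first_nine_june)
```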

17. Find the Ozone and temperature values for the 1st observation of every month.

Ozon_Temp <- air[air$Day==1,c('Ozone','Temp')]

18. Create a new data frame that contains the original “airquality” data plus one appended row holding the average of each column where it can be calculated, and NA for columns where it cannot.

# mean() returns NA for columns that contain missing values (Ozone and Solar.R)
new_air <- rbind(air, sapply(air, mean))
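Which columns count as "not applicable" is open to interpretation. The variant below is one possible reading, not the only one: it assumes missing values should be ignored when averaging and that averaging the calendar columns Month and Day is not meaningful. The name col_avg is illustrative.

```r
air <- airquality

# Assumption: ignore missing values when averaging the measurements
col_avg <- colMeans(air, na.rm = TRUE)

# Assumption: an average of Month or Day is not meaningful, so mark it NA
col_avg[c("Month", "Day")] <- NA

# Append the averages as one extra row
new_air <- rbind(air, col_avg)
tail(new_air, 2)
```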

Iris Data Frame Subsetting In R

19. Use the iris dataset. Select the columns Sepal.Length and Petal.Length for the “setosa” species and store them in “my_vec”

my_vec <- iris[iris$Species == "setosa", c('Sepal.Length', 'Petal.Length')]

20. Create a vector called ‘sepal.diff’ with the difference between ‘Sepal.Length’ and ‘Sepal.Width’ for the “setosa” rows, and bind it to “my_vec”

is_setosa <- iris$Species == "setosa"
sepal.diff <- iris[is_setosa, 'Sepal.Length'] - iris[is_setosa, 'Sepal.Width']
m_vector <- cbind(my_vec, sepal.diff)
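Steps 19 and 20 can also be written by subsetting once and reusing the result. This is only a sketch of that approach; the intermediate name setosa is illustrative.

```r
# Keep only the setosa rows, then work inside that smaller data frame
setosa <- subset(iris, Species == "setosa")

my_vec     <- setosa[, c("Sepal.Length", "Petal.Length")]
sepal.diff <- with(setosa, Sepal.Length - Sepal.Width)

# Combine the two selected columns with the computed difference
m_vector <- cbind(my_vec, sepal.diff)
head(m_vector, 3)
dim(m_vector)
```

Filtering first keeps both vectors the same length, so nothing relies on R recycling values.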


R offers many packages for both simple and complex programming tasks and data analysis.

Each R package provides its own functions, many of which can be used for subsetting data frames. For more R functions and tutorials of this kind, you can also check the R-bloggers website.
