dta_duplicates() identifies and returns duplicate rows in a dataframe or tibble, based on the specified columns.

Usage

dta_duplicates(dat, .columns = names(dat))

Arguments

dat

A dataframe or tibble.

.columns

A set of column names to check for duplicates. Defaults to all columns.

Value

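A dataframe or tibble containing only the duplicate rows based on the specified column(s). As the examples show, every occurrence of a duplicated combination is returned, including the first.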
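
For readers who want to see the selection rule spelled out, the following is a minimal base R sketch of the same idea. It is not the package's implementation: `all_duplicates()` and its `cols` argument are hypothetical names introduced here, and the sketch takes column names as character strings rather than the unquoted names used in the examples.

```r
# Hypothetical helper (not part of the package): keep every row whose values
# in `cols` occur more than once, including the first occurrence.
all_duplicates <- function(dat, cols = names(dat)) {
  key <- dat[, cols, drop = FALSE]
  # duplicated() alone skips the first occurrence, so scan from both ends
  is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  dat[is_dup, , drop = FALSE]
}

df <- data.frame(
  id = c(14, 20, 12, 32, 14, 23, 15, 12, 30, 14),
  name = c(
    "Mary", "Mark", "Faith", "David", "Mary", "Daniel", "Christine",
    "Johnson", "Elizabeth", "Mary"
  ),
  age = c(21, 18, 25, 17, 21, 24, 21, 19, 20, 21)
)

by_all <- all_duplicates(df)              # rows repeated across all columns
by_id  <- all_duplicates(df, cols = "id") # rows sharing an `id`
```

Scanning with `duplicated()` in both directions is what keeps the first occurrence of each group; a single forward scan would drop it.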

Examples

# Create a data frame for demonstration

df <- data.frame(
  id = c(14, 20, 12, 32, 14, 23, 15, 12, 30, 14),
  name = c(
   "Mary", "Mark", "Faith", "David", "Mary", "Daniel", "Christine",
   "Johnson", "Elizabeth", "Mary"
  ),
  age = c(21, 18, 25, 17, 21, 24, 21, 19, 20, 21)
)
dta_gtable(df)
id name age
14 Mary 21
20 Mark 18
12 Faith 25
32 David 17
14 Mary 21
23 Daniel 24
15 Christine 21
12 Johnson 19
30 Elizabeth 20
14 Mary 21
# return duplicated rows by all the columns in the data frame
result <- dta_duplicates(df)
dta_gtable(result)
id name age
14 Mary 21
14 Mary 21
14 Mary 21
# return duplicated rows by the column `id`
result2 <- dta_duplicates(df, .columns = id)
dta_gtable(result2)
id name age
14 Mary 21
12 Faith 25
14 Mary 21
12 Johnson 19
14 Mary 21
# A second demo data frame
df2 <- data.frame(
  gender = c(
    "Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female"
  ),
  education = c(
    "Masters", "Bachelor", "Bachelor", "Masters", "Doctorate", "Masters",
    "Bachelors", "Masters"
  ),
  age = c(25, 30, 30, 25, 40, 25, 40, 25)
)
dta_gtable(df2)
gender education age
Male Masters 25
Female Bachelor 30
Female Bachelor 30
Male Masters 25
Male Doctorate 40
Female Masters 25
Male Bachelors 40
Female Masters 25
# return duplicated rows based on the columns `gender` and `education`
result2 <- dta_duplicates(df2, .columns = c(gender, education))
dta_gtable(result2)
gender education age
Male Masters 25
Female Bachelor 30
Female Bachelor 30
Male Masters 25
Female Masters 25
Female Masters 25