dta_mrq() splits a data frame column that holds multiple responses per
row into multiple binary columns. Each binary column indicates whether a
given response option is present in the original value. The function also
supports custom labels for the new columns, with options for numeric
conversion and clean column names.
Usage
dta_mrq(
dat,
.column,
delimeter,
prefix = NULL,
as_numeric = FALSE,
labels = c(TRUE, FALSE),
is_clean_names = TRUE
)
Arguments
- dat
A data frame containing the column to be split.
- .column
The name of the column to split.
- delimeter
A string representing the delimiter used to separate the different responses in the original column.
- prefix
A string that is added as a prefix to the newly created column names. Default is
NULL
, which means no prefix will be added.- as_numeric
Logical. If
TRUE
, the new columns will be converted to numeric (1
forTRUE
,0
forFALSE
). Default isFALSE
.- labels
A vector of length
2
specifying the labels for theTRUE
andFALSE
values in the binary columns. Default isc(TRUE, FALSE)
.- is_clean_names
Logical. If
TRUE
,janitor::clean_names()
is used to standardize the column names (e.g., convert to lowercase and replace spaces with underscores). Default isTRUE
.
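The effect of as_numeric and labels on a single binary column can be illustrated with base R. This is only a sketch of the described behaviour, not the package's implementation; the flag vector and the label values are invented for illustration.

```r
# Hypothetical logical column, as produced for one response option.
flag <- c(TRUE, FALSE, TRUE)

# Numeric conversion as described for as_numeric: TRUE becomes 1, FALSE becomes 0.
flag_numeric <- as.numeric(flag)

# Custom labels: the first element stands for TRUE, the second for FALSE.
# (Assumed mapping, based on the description of the labels argument.)
labels <- c("Yes", "No")
flag_labelled <- ifelse(flag, labels[1], labels[2])

print(flag_numeric)
print(flag_labelled)
```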
Value
A data frame with new binary columns added, one for each unique response found in the original column. The column names are cleaned when is_clean_names is TRUE.
Details
This function is typically used when dealing with survey data where multiple options may be selected for a given question, and you want to split those options into individual binary columns indicating the presence or absence of each option.
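The idea of splitting a multiple-response column can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not how dta_mrq() is implemented; the survey data frame, its responses column, and the response values are all hypothetical.

```r
# Hypothetical survey data; the column name and values are invented for illustration.
survey <- data.frame(
  id = 1:3,
  responses = c("Phone, Laptop", "Laptop", "Phone, Tablet"),
  stringsAsFactors = FALSE
)

# Split each response string on the delimiter.
parts <- strsplit(survey$responses, ", ", fixed = TRUE)

# Collect the unique response options across all rows.
choices <- sort(unique(unlist(parts)))

# Add one logical column per option: TRUE when the row's responses include it.
result <- survey
for (opt in choices) {
  result[[opt]] <- vapply(parts, function(p) opt %in% p, logical(1))
}

print(result)
```

Each new column answers a yes/no question ("did this respondent select this option?"), which is the shape that most tabulation and modelling functions expect.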
Examples
data("data_gadgets")
dat <- data_gadgets
dta_gtable(dat)
# Split `gadgets_owned` column into separate columns.
# The created columns will be logical (i.e. TRUE / FALSE).
df <- dta_mrq(
  dat = dat,
  .column = gadgets_owned,
  delimeter = ", ",
  is_clean_names = TRUE
)
dta_gtable(df)
# Convert the created columns from logical (TRUE / FALSE)
# columns to numeric.
df2 <- dta_mrq(
  dat = dat,
  .column = gadgets_owned,
  delimeter = ", ",
  as_numeric = TRUE,
  is_clean_names = TRUE
)
dta_gtable(df2)
# You can specify the labels to be used. In the example
# below, the columns will be character with Yes / No.
df3 <- dta_mrq(
  dat = dat,
  .column = gadgets_owned,
  delimeter = ", ",
  labels = c("Yes", "No"),
  is_clean_names = TRUE
)
dta_gtable(df3)
# Any other pair of labels can be used, for example
# Positive / Negative when recording disease status.
df4 <- dta_mrq(
  dat = dat,
  .column = gadgets_owned,
  delimeter = ", ",
  labels = c("Positive", "Negative"),
  is_clean_names = TRUE
)
dta_gtable(df4)
# Use numeric values and specify a `prefix` for the
# column names.
df5 <- dta_mrq(
  dat = dat,
  .column = gadgets_owned,
  delimeter = ", ",
  prefix = "gad_",
  labels = c(1, 2),
  is_clean_names = TRUE
)
dta_gtable(df5)