dta_replace() corrects misspelled entries in a data frame or tibble
based on a provided dictionary. The dictionary specifies the correct values
for misspelled entries in a specified column.
Arguments
- dat
A data frame or tibble containing the data to be corrected.
- dict
A data frame or tibble serving as the dictionary, with columns specifying the correct and incorrect spellings.
- .name
The column in both dat and dict to match entries by (e.g., a unique identifier).
- .wrong
The column in dict containing the misspelled values to be corrected.
- .correct
The column in dict containing the correct values for the misspelled entries.
Details
The function first validates that dat and dict are data frames or tibbles.
It then fills missing values in dict for the columns specified in .name and
.correct, using a downward fill strategy. Finally, it replaces misspelled
values in dat using a dictionary lookup facilitated by matchmaker::match_df().
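The three steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses a plain loop where dta_replace() relies on matchmaker::match_df(), the toy data and the fill_down() helper are hypothetical, and it assumes the .name column of the dictionary holds the name of the data column to correct (the convention used by the `by` column in matchmaker dictionaries).

```r
# Hypothetical toy data; the dictionary columns mirror the names used in
# the Examples section (variable, old, new).
dat <- data.frame(
  id = 1:4,
  city = c("Berlln", "Berlinn", "Pariss", "Paris"),
  stringsAsFactors = FALSE
)
dict <- data.frame(
  variable = c("city", NA, NA),
  old = c("Berlln", "Berlinn", "Pariss"),
  new = c("Berlin", NA, "Paris"),
  stringsAsFactors = FALSE
)

# Step 1: validate that both inputs are data frames (tibbles pass too).
stopifnot(is.data.frame(dat), is.data.frame(dict))

# Step 2: downward fill. Each NA takes the last non-missing value above it.
# fill_down() is a hypothetical helper written for this sketch.
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}
dict$variable <- fill_down(dict$variable)
dict$new <- fill_down(dict$new)

# Step 3: dictionary lookup. For each dictionary row, replace matching
# misspelled values in the named column of the data.
result <- dat
for (i in seq_len(nrow(dict))) {
  col <- dict$variable[i]
  hit <- !is.na(result[[col]]) & result[[col]] == dict$old[i]
  result[[col]][hit] <- dict$new[i]
}

result$city
```

After the downward fill, the second dictionary row inherits both the column name "city" and the correction "Berlin" from the row above it, so several misspellings can share one correct value without repeating it in the dictionary.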
Examples
# Example data with misspelled characters / strings
data("data_misspelled")
dta_gtable(head(data_misspelled))
data("dict_misspelled")
dta_gtable(dict_misspelled)
# Correct the misspelled entries in `dat` using the
# `dict` dictionary
result <- dta_replace(
  dat = data_misspelled,
  dict = dict_misspelled,
  .name = variable,
  .wrong = old,
  .correct = new
)
dta_gtable(head(result))