dta_replace() corrects misspelled entries in a data frame or tibble
based on a provided dictionary. The dictionary specifies the correct values
for misspelled entries in a specified column.
Arguments
- dat
A data frame or tibble containing the data to be corrected.
- dict
A data frame or tibble serving as the dictionary, with columns specifying the correct and incorrect spellings.
- .name
The column in both dat and dict to match entries by (e.g., a unique identifier).
- .wrong
The column in dict containing the misspelled values to be corrected.
- .correct
The column in dict containing the correct values for the misspelled entries.
Details
The function first validates that dat and dict are data frames
or tibbles. It then fills missing values in the .name and .correct
columns of dict downward, so that each blank entry takes the last
non-missing value above it. Finally, it replaces misspelled values in
dat using a dictionary lookup facilitated by matchmaker::match_df().
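The two steps described above (downward fill, then lookup) can be illustrated with a minimal base R sketch. This is not the package's implementation, which relies on matchmaker::match_df(). The toy data, the helper fill_down(), and the column names variable, old, and new are assumptions chosen to mirror the example arguments.

```r
# Hypothetical toy data: one column with misspelled entries.
dat <- data.frame(
  id  = 1:4,
  sex = c("male", "femal", "female", "mle"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary: NA marks a value repeated from the row above.
dict <- data.frame(
  variable = c("sex", NA, NA),
  old      = c("femal", "fmale", "mle"),
  new      = c("female", NA, "male"),
  stringsAsFactors = FALSE
)

# Downward fill: each NA takes the last non-missing value above it.
# Leading NAs (nothing above them) stay NA.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))
  idx[idx == 0] <- NA
  x[!is.na(x)][idx]
}

dict$variable <- fill_down(dict$variable)
dict$new      <- fill_down(dict$new)

# Lookup: for each column named in the dictionary, swap wrong for correct.
result <- dat
for (v in intersect(unique(dict$variable), names(result))) {
  rules <- dict[dict$variable == v, ]
  hit   <- match(result[[v]], rules$old)
  found <- !is.na(hit)
  result[[v]][found] <- rules$new[hit[found]]
}

result$sex
```

Values with no match in the dictionary are left unchanged in this sketch; whether dta_replace() behaves the same way in every case depends on how it calls matchmaker::match_df().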
Examples
# Example data with misspelled characters / strings
data("data_misspelled")
dta_gtable(head(data_misspelled))
data("dict_misspelled")
dta_gtable(dict_misspelled)
# Correct the misspelled entries in `dat` using the
# `dict` dictionary
result <- dta_replace(
  dat = data_misspelled,
  dict = dict_misspelled,
  .name = variable,
  .wrong = old,
  .correct = new
)
dta_gtable(head(result))