Title: | Compare Data Frames |
---|---|
Description: | A toolset for interactively exploring the differences between two data frames. |
Authors: | Ryan Dickerson [aut, cre, cph] |
Maintainer: | Ryan Dickerson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0.9000 |
Built: | 2024-10-27 04:34:35 UTC |
Source: | https://github.com/eutwt/versus |
compare() creates a representation of the differences between two tables, along with a shallow copy of the tables. This output is used as the comparison argument when exploring the differences further with other versus functions, e.g. slice_*() and weave_*().
compare(table_a, table_b, by, allow_both_NA = TRUE, coerce = TRUE)
table_a |
A data frame |
table_b |
A data frame |
by |
< |
allow_both_NA |
Logical. If |
coerce |
Logical. If |
compare()
A list of data frames having the following elements:
tables: A data frame with one row per input table showing the number of rows and columns in each.
by: A data frame with one row per by column showing the class of the column in each of the input tables.
intersection: A data frame with one row per column common to table_a and table_b and columns "n_diffs" showing the number of values which are different between the two tables, "class_a"/"class_b" the class of the column in each table, and "value_diffs" a (nested) data frame showing the row indices with differing values.
unmatched_cols: A data frame with one row per column which is in one input table but not the other and columns "table": which table the column appears in, "column": the name of the column, and "class": the class of the column.
unmatched_rows: A data frame which, for each row present in one input table but not the other, contains the column "table" showing which table the row appears in and the by columns for that row.
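As a rough illustration of how the returned list might be inspected, the sketch below assumes the versus package is installed and uses its bundled example_df_a and example_df_b. The list elements are accessed by the names tables, by, intersection, unmatched_cols and unmatched_rows; treat the snippet as an exploratory starting point rather than a prescribed workflow.

```r
library(versus)

comp <- compare(example_df_a, example_df_b, by = car)

# Names of the list elements returned by compare()
names(comp)

# Number of rows and columns in each input table
comp$tables

# Class of each `by` column in the two tables
comp$by

# Per-column summary for columns present in both tables
comp$intersection

# Columns and rows found in only one of the tables
comp$unmatched_cols
comp$unmatched_rows
```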
If the input is a data.table, you may want compare() to make a deep copy instead of a shallow copy so that future changes to the table don't affect the comparison. To achieve this, you can set options(versus.copy_data_table = TRUE).
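A minimal sketch of that setting in use, assuming both versus and data.table are installed. The names dt_a, dt_b and old_opts are introduced here only for illustration, and the by-reference update is just one example of a "future change" to the table.

```r
library(versus)
library(data.table)

# Ask compare() to deep-copy data.table inputs (option named in the text above)
old_opts <- options(versus.copy_data_table = TRUE)

dt_a <- as.data.table(example_df_a)
dt_b <- as.data.table(example_df_b)
comp <- compare(dt_a, dt_b, by = car)

# Modify dt_a by reference after the comparison was created
dt_a[, mpg := mpg * 2]

# With the deep copy in place, this is expected to reflect dt_a
# as it was when compare() ran, not the modified table
comp |> slice_diffs("a", mpg)

# Restore the previous option value
options(old_opts)
```

Setting the option before calling compare() matters here, since the copy is made when the comparison is created.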
compare(example_df_a, example_df_b, by = car)
datasets::mtcars - version a
A version of mtcars with some values altered and some rows/columns removed. Not for informational purposes; used only to demonstrate the comparison of two slightly different data frames. Since some values were altered at random, the values do not necessarily reflect the true original values. The variables are as follows:
example_df_a
A data frame with 9 rows and 9 variables:
car: The rowname in the corresponding datasets::mtcars row
mpg: Miles/(US) gallon
cyl: Number of cylinders
disp: Displacement (cu.in.)
hp: Gross horsepower
drat: Rear axle ratio
wt: Weight (1000 lbs)
vs: Engine (0 = V-shaped, 1 = straight)
am: Transmission (0 = automatic, 1 = manual)
Sourced from the CRAN datasets package, with modified values. Originally from Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
datasets::mtcars - version b
A version of mtcars with some values altered and some rows/columns removed. Not for informational purposes; used only to demonstrate the comparison of two slightly different data frames. Since some values were altered at random, the values do not necessarily reflect the true original values. The variables are as follows:
example_df_b
A data frame with 9 rows and 9 variables:
car: The rowname in the corresponding datasets::mtcars row
wt: Weight (1000 lbs)
mpg: Miles/(US) gallon
hp: Gross horsepower
cyl: Number of cylinders
disp: Displacement (cu.in.)
carb: Number of carburetors
drat: Rear axle ratio
vs: Engine (0 = V-shaped, 1 = straight)
Sourced from the CRAN datasets package, with modified values. Originally from Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Get rows with differing values
slice_diffs(comparison, table, column = everything())
comparison |
The output of compare() |
table |
One of |
column |
< |
The input table is filtered to the rows for which comparison shows differing values for one of the columns selected by column.
comp <- compare(example_df_a, example_df_b, by = car)
comp |> slice_diffs("a", mpg)
comp |> slice_diffs("b", mpg)
comp |> slice_diffs("a", c(mpg, disp))
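To make the filtering described above more concrete, the sketch below rebuilds the same selection by hand with dplyr. It assumes that the output of value_diffs() carries the by column car; the names diff_rows and by_hand are illustrative only, and the semi-join is an approximation of the behavior, not the package's implementation.

```r
library(versus)
library(dplyr)

comp <- compare(example_df_a, example_df_b, by = car)

# Rows of table a that differ from table b in mpg
diff_rows <- comp |> slice_diffs("a", mpg)

# Hand-rolled approximation: keep the rows of example_df_a whose `car`
# appears among the mpg differences reported by value_diffs()
by_hand <- example_df_a |>
  semi_join(value_diffs(comp, mpg), by = "car")

# Compare the two sets of keys, ignoring row order
setequal(diff_rows$car, by_hand$car)
```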
Get rows in only one table
slice_unmatched(comparison, table)
slice_unmatched_both(comparison)
comparison |
The output of compare() |
table |
One of |
slice_unmatched() |
The table identified by |
slice_unmatched_both() |
The output of |
comp <- compare(example_df_a, example_df_b, by = car)
comp |> slice_unmatched("a")
comp |> slice_unmatched("b")

# slice_unmatched(comp, "a") output is the same as
example_df_a |> dplyr::anti_join(example_df_b, by = comp$by$column)

comp |> slice_unmatched_both()
Get the differing values from a comparison
value_diffs(comparison, column)
value_diffs_stacked(comparison, column = everything())
comparison |
The output of compare() |
column |
< |
value_diffs() |
A data frame with one row for each element
of |
value_diffs_stacked(), value_diffs_all() |
A data frame containing
the |
comp <- compare(example_df_a, example_df_b, by = car)
value_diffs(comp, disp)
value_diffs_stacked(comp, c(disp, mpg))
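One way to sanity-check the values returned here is to line them up with the "n_diffs" count from compare(). The sketch below assumes the intersection element has a column named column identifying each compared column; disp_diffs and n_reported are illustrative names, and the equality is an expectation rather than a documented guarantee.

```r
library(versus)

comp <- compare(example_df_a, example_df_b, by = car)

# One row per differing value of disp
disp_diffs <- value_diffs(comp, disp)

# Number of disp differences reported in the comparison summary
n_reported <- comp$intersection$n_diffs[comp$intersection$column == "disp"]

# The two counts are expected to agree
nrow(disp_diffs) == n_reported
```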
Get differences in context
weave_diffs_long(comparison, column = everything())
weave_diffs_wide(comparison, column = everything())
comparison |
The output of compare() |
column |
< |
weave_diffs_wide() |
The input |
weave_diffs_long() |
Input tables are filtered to rows where
differing values exist for one of the columns selected by |
comp <- compare(example_df_a, example_df_b, by = car)
comp |> weave_diffs_wide(disp)
comp |> weave_diffs_wide(c(mpg, disp))
comp |> weave_diffs_long(disp)
comp |> weave_diffs_long(c(mpg, disp))