Title: | Compare Data Frames |
---|---|
Description: | A toolbox for comparing two data frames. This package is defunct. I recommend you use the "versus" package instead. |
Authors: | Ryan Dickerson [aut, cre] |
Maintainer: | Ryan Dickerson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-11-08 02:40:49 UTC |
Source: | https://github.com/eutwt/tablecompare |
Show the contents of a data frame
contents(.data)
.data |
A data frame or data table |
A data.table with one row per column in .data and columns "column": the name of the column in .data, and "class": the names of the classes the column inherits from (as returned by class()), collapsed into a single string.
contents(ToothGrowth)
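To make the shape of this output concrete, the following base-R sketch builds a similar table without the package. `describe_columns` is a hypothetical helper, not part of tablecompare, and the ", " separator is an assumption: contents() itself returns a data.table and may join class names differently.

```r
# Hypothetical base-R approximation of the table described above.
# Assumption: class names are joined with ", " (the package may differ).
describe_columns <- function(df) {
  data.frame(
    column = names(df),
    # class() can return several names (e.g. for ordered factors),
    # so collapse them into one string per column
    class = unname(vapply(df, function(x) paste(class(x), collapse = ", "), character(1))),
    stringsAsFactors = FALSE
  )
}

tg_contents <- describe_columns(ToothGrowth)
print(tg_contents)
```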
count_dupes() returns values of by variables for which .data has multiple rows, along with the number of rows for each combination of values.
assert_unique() throws an error if there are multiple rows for any combination of by variable values.
count_dupes(.data, by, setkey = FALSE)
assert_unique(.data, by, data_chr, by_chr)
.data |
A data frame or data table |
by |
tidy-select. Columns in |
setkey |
Logical. Should the output be keyed by |
data_chr |
optional. character. You can use this argument to manually specify the name of |
by_chr |
optional. character. You can use this argument to manually specify the name of |
count_dupes()
A data.table with the (filtered) by columns and an additional column "n_rows" which shows the number of rows in .data having the combination of by values shown in the output row.
assert_unique()
No return value. Called to throw an error depending on the input.
df <- read.table(text = "
x y z
1 6 1
2 6 2
3 7 3
3 7 4
4 3 5
4 3 6
", header = TRUE)

count_dupes(df, c(x, y))

## Not run: 
assert_unique(df, c(x, y))

## End(Not run)
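For readers without the package installed, the counting described above can be approximated in base R. This is only a sketch of the idea, not the package's implementation: it returns a data.frame rather than a data.table, and the row order of the result is not guaranteed to match count_dupes().

```r
# Same input as the example above
df <- read.table(text = "
x y z
1 6 1
2 6 2
3 7 3
3 7 4
4 3 5
4 3 6
", header = TRUE)

# Count rows per (x, y) combination by summing a column of ones
key_counts <- aggregate(n_rows ~ x + y, data = transform(df, n_rows = 1L), FUN = sum)

# Keep only the combinations that occur more than once
dupes <- key_counts[key_counts$n_rows > 1, ]
print(dupes)

# A uniqueness check in the spirit of assert_unique():
# anyDuplicated() returns 0 when every (x, y) combination is unique
has_dupes <- anyDuplicated(df[c("x", "y")]) > 0
```

Here the combinations (3, 7) and (4, 3) each appear twice, so `has_dupes` is TRUE and an assertion built on it would fail.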
count_values() returns values of by variables for which .data has multiple unique rows, along with the number of unique rows for each combination of values, only considering columns in col.
assert_single_value() throws an error if there are multiple unique rows for any combination of by variable values, only considering columns in col.
count_values(.data, col, by, setkey = FALSE)
assert_single_value(.data, col, by)
.data |
A data frame or data table |
col |
tidy-select. Columns in |
by |
tidy-select. Columns in |
setkey |
Logical. Should the output be keyed by |
count_values()
A data.table with the (filtered) by columns and an additional column "n_vals" which shows the number of unique rows in .data having the combination of by values shown in the output row.
assert_single_value()
No return value. Called to throw an error depending on the input.
df <- read.table(text = "
x y z
a 1 3
a 1 3
a 2 4
a 2 4
a 2 2
b 1 1
b 1 2
", header = TRUE)

count_values(df, z, by = c(x, y))

## Not run: 
assert_single_value(df, z, by = c(x, y))

## End(Not run)
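The difference from count_dupes() is that repeated rows are collapsed before counting. A base-R sketch of that idea follows; it is an approximation under the reading of the description above, not the package's code, and it returns a data.frame rather than a data.table.

```r
# Same input as the example above
df <- read.table(text = "
x y z
a 1 3
a 1 3
a 2 4
a 2 4
a 2 2
b 1 1
b 1 2
", header = TRUE)

# Step 1: drop repeated (x, y, z) rows so each distinct value is counted once
distinct_rows <- unique(df[c("x", "y", "z")])

# Step 2: count the remaining rows per (x, y) combination
val_counts <- aggregate(n_vals ~ x + y, data = transform(distinct_rows, n_vals = 1L), FUN = sum)

# Step 3: keep combinations with more than one distinct z value
multi <- val_counts[val_counts$n_vals > 1, ]
print(multi)
```

The group (a, 1) has two rows but only one distinct z value, so it is not reported, while (a, 2) and (b, 1) each have two distinct values.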
Compare two data frames. Using a key-column common to both tables, see which rows are common and highlight differing values by column.
tblcompare(
  .data_a,
  .data_b,
  by,
  allow_bothNA = TRUE,
  ncol_by_out = 3,
  coerce = TRUE
)

value_diffs(comparison, col)

## S3 method for class 'tbcmp_compare'
value_diffs(comparison, col)

all_value_diffs(comparison)

## S3 method for class 'tbcmp_compare'
all_value_diffs(comparison)
.data_a |
A data frame or data table |
.data_b |
A data frame or data table |
by |
tidy-select. Selection of columns to use when matching rows between |
allow_bothNA |
Logical. If TRUE, a missing value in both data frames is considered equal. |
ncol_by_out |
Number of by-columns to include in |
coerce |
Logical. If FALSE, only columns with the same class are compared. |
comparison |
An object of class "tbcmp_compare" (the output of a |
col |
tidy-select. A single column |
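The effect of allow_bothNA can be illustrated with a small base-R sketch. `values_differ` is a hypothetical helper, not a package function, and it assumes that with allow_bothNA = FALSE a pair of missing values is reported as a difference, which is how the argument description reads but is not confirmed here.

```r
# Hypothetical helper: element-wise "is this value different?" for two
# aligned vectors, with NA handling modelled on the allow_bothNA argument.
values_differ <- function(a, b, allow_bothNA = TRUE) {
  both_na <- is.na(a) & is.na(b)
  # Any NA on either side counts as a difference at first;
  # otherwise compare the values directly
  differ <- is.na(a) | is.na(b) | a != b
  if (allow_bothNA) {
    # Treat NA versus NA as equal
    differ[both_na] <- FALSE
  }
  differ
}

a <- c(1, NA, 3, NA)
b <- c(1, NA, 4, 2)

lenient <- values_differ(a, b)                       # NA vs NA treated as equal
strict  <- values_differ(a, b, allow_bothNA = FALSE) # NA vs NA flagged
```

In both modes, a value that is missing on only one side is treated as a difference.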
tblcompare()
A "tbcmp_compare"-class object, which is a list of data.tables having the following elements:
A data.table with one row per input table showing the number of rows and columns in each.
A data.table with one row per by column showing the class of the column in each of the input tables.
A data.table with one row per column common to .data_a and .data_b and columns "n_diffs": the number of values which are different between the two tables, "class_a"/"class_b": the class of the column in each table, and "value_diffs": a (nested) data.table showing the rows in each input table where values are unequal, the values in each table, and one column for each of the first ncol_by_out by columns for the identified rows in the input tables.
A data.table with one row per column which is in one input table but not the other and columns "table": which table the column appears in, "column": the name of the column, and "class": the class of the column.
A data.table which, for each row present in one input table but not the other, contains the columns "table": which table the row appears in, "i": the row number of the input row, and one column for each of the first ncol_by_out by columns for each row.
value_diffs()
A data.table with one row for each element of col found to be unequal between the input tables (.data_a and .data_b from the original tblcompare() call). The output table has columns "i_a"/"i_b": the row number of the element in the input tables, "val_a"/"val_b": the value of col in the input tables, and one column for each of the first ncol_by_out by columns for the identified rows in the input tables.
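A short usage sketch may help tie the pieces together. The data frames `df_a` and `df_b` are made up for illustration, and the calls follow the signatures shown in the usage section; the code is guarded so that it only calls the package when it is installed, and no output is shown because it has not been verified here.

```r
# Illustrative inputs (hypothetical data): one shared key column "id"
df_a <- data.frame(
  id    = c(1L, 2L, 3L, 4L),
  score = c(10, 20, 30, 40),
  grp   = c("a", "b", "c", "d")
)
df_b <- data.frame(
  id    = c(1L, 2L, 3L, 5L),
  score = c(10, 25, 30, 50),
  grp   = c("a", "b", "x", "e")
)

if (requireNamespace("tablecompare", quietly = TRUE)) {
  # Match rows on id and compare the remaining common columns
  cmp <- tablecompare::tblcompare(df_a, df_b, by = id)

  # Rows where "score" differs between the two tables
  print(tablecompare::value_diffs(cmp, score))
}
```

With these inputs, ids 1 to 3 appear in both tables, id 4 only in `df_a`, and id 5 only in `df_b`, so the comparison has both matched rows with differing values and unmatched rows to report.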
all_value_diffs()
A data.table of the value_diffs() output for all columns having at least one value difference, combined row-wise into a single table. To facilitate this combination into a single table, the "val_a" and "val_b" columns are coerced to character.