ds.merge {dsBaseClient} | R Documentation |
Merges (links) two data frames together based on common values in defined vectors in each data frame.
ds.merge(
  x.name = NULL,
  y.name = NULL,
  by.x.names = NULL,
  by.y.names = NULL,
  all.x = FALSE,
  all.y = FALSE,
  sort = TRUE,
  suffixes = c(".x", ".y"),
  no.dups = TRUE,
  incomparables = NULL,
  newobj = NULL,
  datasources = NULL
)
x.name |
a character string specifying the name of the first data frame to be merged. The string must be shorter than the threshold set by nfilter.stringShort, one of the disclosure prevention checks in DataSHIELD. |
y.name |
a character string specifying the name of the second data frame to be merged. The string must be shorter than the threshold set by nfilter.stringShort, one of the disclosure prevention checks in DataSHIELD. |
by.x.names |
a character string or a vector of names specifying the column(s) in data frame x.name to merge on. |
by.y.names |
a character string or a vector of names specifying the column(s) in data frame y.name to merge on. |
all.x |
logical. If TRUE then extra rows will be added to the output, one for each row in x.name that has no matching row in y.name. |
all.y |
logical. If TRUE then extra rows will be added to the output, one for each row in y.name that has no matching row in x.name. |
sort |
logical. If TRUE the merged result is sorted on elements in the columns used for merging (by.x.names and by.y.names). |
suffixes |
a character vector of length 2 specifying the suffixes to be used for making unique common column names in the two input data frames when they both appear in the merged data frame. |
no.dups |
logical. If TRUE, suffixes are appended in more cases to avoid duplicated column names in the merged data frame. Default TRUE (FALSE before R version 3.5.0). |
incomparables |
values that cannot be matched. This is intended to be used for merging on one column, so these are incomparable values of that column.
For more information see |
newobj |
a character string that provides the name for the output
variable that is stored on the data servers. Default |
datasources |
a list of |
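Assuming all.x, all.y and suffixes behave as they do in the native R function merge (which this function is described as similar to), their effect can be previewed on local data. The following is a minimal sketch in plain R, not DataSHIELD code; the data frames left and right and their contents are made up for illustration.

```r
# Hypothetical local data frames standing in for server-side objects.
left  <- data.frame(id = c(1, 2, 3), value = c(10, 20, 30))
right <- data.frame(id = c(2, 3, 4), value = c(200, 300, 400))

# all.x = FALSE, all.y = FALSE (the defaults): keep only ids found in both.
inner <- merge(left, right, by.x = "id", by.y = "id")

# all.x = TRUE, all.y = TRUE: unmatched rows from either side are kept
# and padded with NA. The shared non-key column "value" is made unique
# with the suffixes.
outer <- merge(left, right, by.x = "id", by.y = "id",
               all.x = TRUE, all.y = TRUE,
               suffixes = c(".x", ".y"))

print(inner)
print(outer)
```

In this sketch the inner result has two rows (ids 2 and 3), while the outer result has four rows with columns id, value.x and value.y.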
This function is similar to the native R function merge, with one difference in how the merge variables are chosen. The native merge function is very flexible: for example, it can merge using all vectors that appear in both data frames. In DataSHIELD, however, ds.merge requires that all the vectors which dictate the merging are explicitly identified for both data frames using the by.x.names and by.y.names arguments.
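To illustrate why explicit merge keys matter, the sketch below uses the native R merge function on local data (not DataSHIELD). The data frames a and b are hypothetical. With no keys given, native merge joins on every column name the two data frames share; naming the keys, as ds.merge requires, makes the choice visible.

```r
# Hypothetical local data frames that share two column names: id and site.
a <- data.frame(id = c(1, 2), site = c("A", "B"), x = c(5, 6))
b <- data.frame(id = c(1, 2), site = c("A", "C"), y = c(7, 8))

# Native merge() with no keys: joins on all shared columns (id AND site),
# so the row with id 2 is dropped because its site values differ.
implicit <- merge(a, b)

# Explicit keys: join on id only; the non-key shared column site
# is kept from both sides as site.x and site.y.
explicit <- merge(a, b, by.x = "id", by.y = "id")

print(implicit)
print(explicit)
```

Here the implicit merge returns one row, while the explicit merge on id returns two.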
Server function called: mergeDS
ds.merge returns the merged data frame, which is written to the server-side. In addition, two validity messages are returned to the client-side, indicating whether the new object has been created in each data source and, if so, whether it is in a valid form.
DataSHIELD Development Team
## Not run:
## Version 6, for version 5 see the Wiki
# connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM2", driver = "OpalDriver")
builder$append(server = "study3",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM3", driver = "OpalDriver")
logindata <- builder$build()
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Create two data frames with a common column
ds.dataFrame(x = c("D$LAB_TSC","D$LAB_TRIG","D$LAB_HDL","D$LAB_GLUC_ADJUSTED"),
             completeCases = TRUE,
             newobj = "df.x",
             datasources = connections)
ds.dataFrame(x = c("D$LAB_TSC","D$GENDER","D$PM_BMI_CATEGORICAL","D$PM_BMI_CONTINUOUS"),
             completeCases = TRUE,
             newobj = "df.y",
             datasources = connections)
# Merge data frames using the common variable "LAB_TSC"
ds.merge(x.name = "df.x",
         y.name = "df.y",
         by.x.names = "df.x$LAB_TSC",
         by.y.names = "df.y$LAB_TSC",
         all.x = TRUE,
         all.y = TRUE,
         sort = TRUE,
         suffixes = c(".x", ".y"),
         no.dups = TRUE,
         newobj = "df.merge",
         datasources = connections)
# clear the DataSHIELD R sessions and log out
datashield.logout(connections)
## End(Not run)