ds.replaceNA {dsBaseClient} | R Documentation |
This function identifies missing values and replaces them with a value or values specified by the analyst.
ds.replaceNA(x = NULL, forNA = NULL, newobj = NULL, datasources = NULL)
x | a character string specifying the name of the vector.
forNA | a list or a vector that contains the replacement value(s) for each study. The length of the list or vector must be equal to the number of servers (studies).
newobj | a character string that provides the name for the output object that is stored on the data servers. Default
datasources | a list of
This function is used when the analyst prefers or requires complete vectors.
It is possible to specify one value for each missing value by first obtaining
the number of missing values with the function ds.numNA.
In most cases, however,
it is more sensible to replace all missing values with one specific value, e.g.
the mean or median of the vector. Once the missing
values have been replaced, a new vector is created.
Note: If the vector is within a table structure such as a data frame, the new vector is appended to the table structure, so that the table holds both the vector with missing values and the vector without them.
Server function called: replaceNaDS
ds.replaceNA
returns to the server-side a new vector or table structure
with the missing values replaced by the specified values.
The class of the vector is the same as the initial vector.
DataSHIELD Development Team
## Not run:
## Version 6, for version 5 see the Wiki
# Connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM2", driver = "OpalDriver")
builder$append(server = "study3",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "CNSIM.CNSIM3", driver = "OpalDriver")
logindata <- builder$build()
# Log onto the remote Opal training servers
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Example 1: Replace missing values in variable 'LAB_HDL' by the mean value
# in each study
# Get the mean value of 'LAB_HDL' for each study
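# (stored as 'hdl.mean' so that base R's mean() is not masked)
hdl.mean <- ds.mean(x = "D$LAB_HDL",
                    type = "split",
                    datasources = connections)
# Replace the missing values using the mean for each study
# (elements 1 to 3 of the first list component hold the three study means)
ds.replaceNA(x = "D$LAB_HDL",
             forNA = list(hdl.mean[[1]][1], hdl.mean[[1]][2], hdl.mean[[1]][3]),
             newobj = "HDL.noNA",
             datasources = connections)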
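# Optional check (a sketch, not part of the original example): count the
# missing values left in the new server-side vector 'HDL.noNA'.
# Assumes ds.numNA accepts the name of the object created above; a count
# of 0 in every study would indicate that the replacement worked.
ds.numNA(x = "HDL.noNA",
         datasources = connections)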
# Example 2: Replace missing values in categorical variable 'PM_BMI_CATEGORICAL'
# with 999s
# First check how many NAs there are in 'PM_BMI_CATEGORICAL' in each study
ds.table(rvar = "D$PM_BMI_CATEGORICAL",
         useNA = "always")
# Replace the missing values with 999s
ds.replaceNA(x = "D$PM_BMI_CATEGORICAL",
             forNA = c(999, 999, 999),
             newobj = "bmi999")
# Check if the NAs have been replaced correctly
ds.table(rvar = "bmi999",
         useNA = "always")
# Clear the DataSHIELD R sessions and log out
DSI::datashield.logout(connections)
## End(Not run)