| ds.histogram {dsBaseClient} | R Documentation |
The ds.histogram function plots a non-disclosive histogram on the client side.
ds.histogram(
x = NULL,
type = "split",
num.breaks = 10,
method = "smallCellsRule",
k = 3,
noise = 0.25,
vertical.axis = "Frequency",
datasources = NULL
)
x |
a character string specifying the name of a numerical vector. |
type |
a character string that represents the type of graph to display.
The type argument can be set to 'combine' or 'split'. Default 'split'. |
num.breaks |
a numeric specifying the number of breaks of the histogram. Default value
is 10. |
method |
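a character string that defines which histogram will be created.
The method argument can be set to 'smallCellsRule', 'deterministic' or 'probabilistic'. Default 'smallCellsRule'. |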
k |
the number of nearest neighbours whose centroid is calculated.
Default 3. |
noise |
the percentage of the initial variance that is used as the variance of the embedded
noise if the argument method is set to 'probabilistic'. Default 0.25. |
vertical.axis |
a character string that defines what is shown on the vertical axis of the
plot. The vertical.axis argument can be set to 'Frequency' or 'Density'. Default 'Frequency'. |
datasources |
a list of |
The ds.histogram function allows the user to plot
distinct histograms (one for each study) or a combined histogram that merges
the study-specific plots.
The type argument selects one of two displays:
'combine': a single histogram that merges the study-specific histograms is displayed.
'split': the histogram of each study is plotted separately.
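The difference between the two displays can be sketched in base R. This is only an illustration under stated assumptions: the three local vectors stand in for data held on separate servers, the shared breaks and the summing of per-study counts are assumed here rather than taken from the package, and none of the names below belong to dsBaseClient.

```r
# Illustrative sketch only: local vectors stand in for three studies.
set.seed(5)
studies <- list(
  study1 = rnorm(150, mean = 24, sd = 4),
  study2 = rnorm(200, mean = 26, sd = 5),
  study3 = rnorm(100, mean = 25, sd = 3)
)

# Shared breaks (10 bins over the pooled range) so that bins line up.
rng <- range(unlist(studies))
breaks <- seq(rng[1], rng[2], length.out = 11)

# 'split': one histogram object per study.
per_study <- lapply(studies, hist, breaks = breaks, plot = FALSE)

# 'combine' (assumed approach): add the per-study counts bin by bin.
combined <- per_study[[1]]
combined$counts <- Reduce(`+`, lapply(per_study, `[[`, "counts"))
combined$density <- combined$counts / (sum(combined$counts) * diff(breaks))
combined$xname <- "all studies"

plot(combined, main = "Combined histogram (sketch)")
```

Only bin counts are added together in this sketch; the individual values are never pooled.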
The method argument selects one of three histograms:
'smallCellsRule': the histogram of the actual variable is
created, but bins with low counts are removed.
'deterministic': the histogram of the scaled centroids of the
k nearest neighbours of each value of the original variable,
where the value of k is set by the user.
'probabilistic': the histogram of the original distribution perturbed
by the addition of random noise.
The added noise follows a normal distribution with zero mean and
variance equal to a percentage of the initial variance of the input variable.
The user specifies this percentage in the noise argument.
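As a rough illustration of the 'smallCellsRule' idea, the base R sketch below blanks out any bin whose count is positive but below a threshold. It is not the server-side code: the variable `min_count` and its value are hypothetical, and the real threshold is configured on each server.

```r
# Illustrative sketch only: not the DataSHIELD server-side implementation.
set.seed(1)
x <- c(rnorm(200, mean = 25, sd = 4), 60)  # 60 is an isolated outlier

min_count <- 3  # hypothetical disclosure threshold

h <- hist(x, breaks = 10, plot = FALSE)

# Bins with a non-zero count below the threshold are set to zero.
suppressed <- h$counts > 0 & h$counts < min_count
h$counts[suppressed] <- 0
h$density[suppressed] <- 0

plot(h, main = "Small cells suppressed (sketch)")
```

The bin holding the lone outlier disappears from the plot, which is the behaviour the rule is meant to achieve: sparsely populated bins could otherwise point to individual records.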
For the k argument the user can choose any value equal
to or greater than the pre-specified threshold
used as a disclosure control for this method, and lower than the number of observations
minus the value of this threshold. By default k is set to 3
(we suggest a value of k equal to, or greater than, 3). Note that the function fails if the user
keeps the default value but the study has set a higher threshold.
The value of k is used only if the
method argument is set to 'deterministic';
it is ignored if the
method argument is set to 'probabilistic' or 'smallCellsRule'.
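The 'deterministic' method and the bounds on k can be sketched in base R as follows. Several details are assumptions made for illustration: the `threshold` value is hypothetical, each value is counted as one of its own k neighbours, and the scaling step simply restores the original standard deviation. The server-side code may differ on all three points.

```r
# Illustrative sketch only: not the DataSHIELD server-side implementation.
set.seed(2)
x <- rnorm(100, mean = 25, sd = 4)
k <- 3
threshold <- 3  # hypothetical server-side disclosure threshold

# Bounds described in the text: threshold <= k < n - threshold.
stopifnot(k >= threshold, k < length(x) - threshold)

# Centroid of each value together with its nearest neighbours (k points).
centroids <- vapply(seq_along(x), function(i) {
  nearest <- order(abs(x - x[i]))[seq_len(k)]
  mean(x[nearest])
}, numeric(1))

# Assumed scaling: stretch the centroids about their mean so that
# their standard deviation matches that of the original variable.
scaled <- mean(centroids) +
  (centroids - mean(centroids)) * sd(x) / sd(centroids)

hist(scaled, breaks = 10, main = "Scaled centroids (sketch)")
```

Averaging over neighbours pulls values towards each other, which is why some rescaling is needed before the histogram resembles the original spread.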
The noise argument sets the percentage of the initial variance
that is used as the variance of the embedded
noise when the method argument is set to 'probabilistic'.
Any value of noise is ignored if the
method argument is set to 'deterministic' or 'smallCellsRule'.
The user can choose any value for noise equal to or greater
than the pre-specified threshold 'nfilter.noise'.
By default noise is set to 0.25.
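The noise model described above can be sketched in base R. The sketch assumes that noise is expressed as a fraction, so that 0.25 means 25% of the variance of the input; it only illustrates the formula and is not the server-side code.

```r
# Illustrative sketch only: not the DataSHIELD server-side implementation.
set.seed(3)
x <- rnorm(500, mean = 25, sd = 4)
noise <- 0.25  # assumed to be a fraction of the variance of x

# Zero-mean normal noise with variance noise * var(x).
x_noisy <- x + rnorm(length(x), mean = 0, sd = sqrt(noise * var(x)))

hist(x_noisy, breaks = 10, main = "Noise-perturbed values (sketch)")
```

Because the noise is independent of x, the variance of the perturbed values is about (1 + noise) times the original variance, so larger values of noise give more protection but a flatter, wider histogram.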
The vertical.axis argument selects what the vertical axis shows:
'Frequency': the histogram of the frequencies
is returned.
'Density': the histogram of the densities
is returned.
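The relationship between the two scales can be checked with a base R histogram object, which carries the same `counts`, `density`, `breaks` and `mids` components that the examples below read from the returned object. The data here are simulated locally for illustration.

```r
# Illustrative sketch using base R only.
set.seed(4)
x <- rnorm(300, mean = 25, sd = 4)
h <- hist(x, breaks = 10, plot = FALSE)

# Frequency: number of observations in each bin.
total_count <- sum(h$counts)

# Density: count / (n * bin width), so the bar areas sum to 1.
bar_area <- sum(h$density * diff(h$breaks))

op <- par(mfrow = c(1, 2))
plot(h, freq = TRUE, main = "Frequency")
plot(h, freq = FALSE, main = "Density")
par(op)
```

The density scale is the one to use when overlaying a curve, for example with `lines(h$mids, h$density)`.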
Server function called: histogramDS2
One or more histogram objects and plots, depending on the type argument.
DataSHIELD Development Team
## Not run:
## Version 6, for version 5 see the Wiki
# Connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM2", driver = "OpalDriver")
builder$append(server = "study3",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM3", driver = "OpalDriver")
logindata <- builder$build()
# Log onto the remote Opal training servers
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Compute the histogram
# Example 1: generate a histogram for each study separately
ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
type = "split",
datasources = connections) #all studies are used
# Example 2: generate a combined histogram with the default small cells counts
# suppression rule
ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
method = 'smallCellsRule',
type = 'combine',
datasources = connections[1]) #only the first study is used (study1)
# Example 3: if a variable is of type factor the function returns an error
ds.histogram(x = 'D$PM_BMI_CATEGORICAL',
datasources = connections)
# Example 4: generate a combined histogram with the deterministic method for k=50
ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
k = 50,
method = 'deterministic',
type = 'combine',
datasources = connections[2]) #only the second study is used (study2)
# Example 5: create a histogram and the probability density on the plot
hist <- ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
method = 'probabilistic', type='combine',
num.breaks = 30,
vertical.axis = 'Density',
datasources = connections)
lines(hist$mids, hist$density)
# Clear the DataSHIELD R sessions and log out
datashield.logout(connections)
## End(Not run)