ds.glm {dsModellingClient}    R Documentation

Runs a combined GLM analysis of non-pooled data


Description

A function that fits generalized linear models.


Usage

ds.glm(formula = NULL, data = NULL, family = NULL, offset = NULL,
  weights = NULL, checks = FALSE, maxit = 15, CI = 0.95,
  viewIter = FALSE, datasources = NULL)



Arguments

formula a character, a formula which describes the model to be fitted.

data a character, the name of an optional data frame containing the variables in the formula. The process stops if a non-existent data frame is specified.

family a description of the error distribution function to use in the model.

offset a character, NULL or a numeric vector that can be used to specify an a priori known component to be included in the linear predictor during fitting.

weights a character, the name of an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.

checks a boolean; if TRUE, checks that take 1-3 minutes are carried out to verify that the variables in the model are defined (exist) on the server site and that they have the characteristics required to fit a GLM. The default value is FALSE because the checks lengthen the runtime and are mainly meant to help trace the cause of errors.

maxit the maximum number of iterations of IWLS (iteratively reweighted least squares) used. The default is 15.

CI a numeric, the confidence level used for the interval estimates. The default, 0.95, gives 95% confidence intervals.

viewIter a boolean, tells whether the results of the intermediate iterations should be printed on screen or not. Default is FALSE (i.e. only final results are shown).

datasources a list of opal object(s) obtained after login to opal servers; these objects also hold the data assigned to R, as a data frame, from opal datasources.

starting values for the parameters in the linear predictor

Details

The function enables a parallelized analysis of individual-level data sitting on distinct servers by sending instructions to each computer requesting non-disclosing summary statistics. The summaries are then combined to estimate the parameters of the model; these parameters are the same as those obtained if the data were 'physically' pooled.
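The combination of per-site summaries can be illustrated without any servers. The following is a minimal base-R sketch, not the ds.glm implementation: two simulated data frames stand in for two sites, the hypothetical helper site_summaries() returns only the aggregate matrices X'WX and X'Wz of one IWLS step for a logistic model, and the client sums them and solves for the coefficients. The simulated variables, the convergence tolerance and the Wald-type interval at the CI level are all assumptions made for illustration.

```r
# Illustrative sketch only: in-memory data frames stand in for remote servers.
set.seed(1)
make_site <- function(n) {
  bmi <- rnorm(n, mean = 27, sd = 4)
  hdl <- rnorm(n, mean = 1.5, sd = 0.4)
  eta <- -6 + 0.2 * bmi - 0.5 * hdl
  data.frame(y = rbinom(n, 1, plogis(eta)), bmi = bmi, hdl = hdl)
}
sites <- list(site1 = make_site(400), site2 = make_site(600))

# "Server side" (hypothetical helper): for the current coefficients, return
# only aggregate matrices, never individual-level rows.
site_summaries <- function(d, beta) {
  X   <- cbind(intercept = 1, bmi = d$bmi, hdl = d$hdl)
  eta <- drop(X %*% beta)
  mu  <- plogis(eta)
  w   <- mu * (1 - mu)           # IWLS weights for the logit link
  z   <- eta + (d$y - mu) / w    # working response
  list(xtwx = crossprod(X, w * X), xtwz = crossprod(X, w * z))
}

# "Client side": sum the summaries across sites and solve one weighted
# least-squares step per iteration.
beta <- rep(0, 3)
for (iter in seq_len(15)) {      # 15 mirrors the maxit default
  parts <- lapply(sites, site_summaries, beta = beta)
  xtwx  <- Reduce(`+`, lapply(parts, `[[`, "xtwx"))
  xtwz  <- Reduce(`+`, lapply(parts, `[[`, "xtwz"))
  beta_new  <- drop(solve(xtwx, xtwz))
  converged <- max(abs(beta_new - beta)) < 1e-8
  beta <- beta_new
  if (converged) break
}

# Wald-type interval at an assumed confidence level of 0.95.
se       <- sqrt(diag(solve(xtwx)))
ci_level <- 0.95
mult     <- qnorm(1 - (1 - ci_level) / 2)
ci <- cbind(estimate = beta, lower = beta - mult * se, upper = beta + mult * se)

# Reference fit on the physically pooled data.
pooled <- glm(y ~ bmi + hdl, family = binomial, data = do.call(rbind, sites))
print(cbind(combined = beta, pooled = unname(coef(pooled))))
print(ci)
```

Under these assumptions the two printed columns should agree to numerical precision, which is the sense in which the combined estimates match those obtained from pooled data.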


Value

coefficients a named vector of coefficients.

residuals the 'working' residuals, that is the residuals in the final iteration of the IWLS fit.

fitted.values the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.

rank the numeric rank of the fitted linear model.

family the family object used.

linear.predictors the linear fit on link scale.
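The relationship between these components can be checked on an ordinary local fit. The sketch below uses stats::glm, whose returned object has components with the same names, as a stand-in; it assumes simulated Poisson data and does not call ds.glm, so it only illustrates how linear.predictors, fitted.values, residuals and rank relate to one another.

```r
# Local stand-in fit with simulated data (assumption: not a DataSHIELD call).
set.seed(42)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(0.3 + 0.5 * d$x))
fit <- glm(y ~ x, family = poisson, data = d)

# Fitted means are the inverse link applied to the linear predictors.
eta <- fit$linear.predictors
mu  <- fit$family$linkinv(eta)
print(all.equal(unname(mu), unname(fit$fitted.values)))

# Working residuals from the final IWLS iteration: (y - mu) / (d mu / d eta).
w_res <- (d$y - mu) / fit$family$mu.eta(eta)
print(all.equal(unname(w_res), unname(fit$residuals)))

# Rank of the fitted model: intercept plus one slope.
print(fit$rank)
```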



See Also

ds.lexis for survival analysis using piecewise exponential regression

ds.gee for generalized estimating equation models



Examples

 # load the file that contains the login details
 data(glmLoginData)

 # login and assign all the variables to R
 opals <- datashield.login(logins=glmLoginData, assign=TRUE)

 # Example 1: run a GLM without interaction (e.g. diabetes prediction using BMI and HDL levels and GENDER)
 mod <- ds.glm(formula='D$DIS_DIAB~D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='binomial')
 # Example 2: run the above GLM model without an intercept
 # (produces separate baseline estimates for Male and Female)
 mod <- ds.glm(formula='D$DIS_DIAB~0+D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='binomial')
 # Example 3: run the above GLM with interaction between GENDER and PM_BMI_CONTINUOUS
 mod <- ds.glm(formula='D$DIS_DIAB~D$GENDER*D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='binomial')
 # Example 4: Fit a standard Gaussian linear model with an interaction
 mod <- ds.glm(formula='D$PM_BMI_CONTINUOUS~D$DIS_DIAB*D$GENDER+D$LAB_HDL', family='gaussian')
 # Example 5: now run a GLM where the error follows a poisson distribution
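 # Note: a Poisson model requires a numeric vector as outcome, so the categorical BMI,
 # which is of type 'factor', must first be converted into a numeric vector.
 # The conversion step is not shown here: 'BMI.123' is assumed to already exist on the
 # servers (for instance created with a conversion function such as dsBaseClient's ds.asNumeric).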
 mod <- ds.glm(formula='BMI.123~D$PM_BMI_CONTINUOUS+D$LAB_HDL+D$GENDER', family='poisson')

 # clear the DataSHIELD R sessions and logout
 datashield.logout(opals)

[Package dsModellingClient version 4.1.0]