Learning Outcomes
1) To understand and be able to use different kinds of data sources, such as clinical data, censuses, sample surveys, administrative sources and big data, and to use the statistical software R.
2) To be able to design and manage data production processes.
3) To be able to use ordinary regression and generalized regression models for observational studies and to examine lack-of-fit and pure-error problems.
4) To be able to select an appropriate transformation, either to normalize data or to equalize the spread of data.
5) To use specific tests to identify outlying variances or outliers in the context of ANOVA models.
6) To formulate multilevel models for clustered data, longitudinal data and data derived from randomized block design experiments, as well as generalized linear models to analyze observational studies.
7) To be able to apply the above statistical techniques in the context of official statistics and to present data effectively to different kinds of audiences.
Course Content (Syllabus)
Characteristic functions of multivariate random variables - The multivariate normal distribution and related topics - Applications in statistical analysis (Cochran’s theorem, ANOVA, regression, χ²) - Statistical inference: The Neyman-Pearson lemma - Likelihood ratio test and related procedures - Decision theory.
Laboratory in R: The open-source software R for statistical computing and graphics, used within the integrated development environment (IDE) RStudio. Lack of fit and testing for pure error in regression models (case study with laboratory data that have repeated measurements). The need for transforming data and testing for outlying variances and outliers in ANOVA (case study with laboratory data on the use of the Box-Cox power transformation, Tukey’s spread-level plot, the Dixon test, the Grubbs test, the Cochran test and the box-and-whisker plot). Multilevel models (case study with a two-level random intercept model). The analysis of a randomized block design experiment with a multilevel model. Random coefficient models for longitudinal clinical data (case study with experimental data). Analysis of covariance (case study with clinical data). The generalized linear model (case study with data from an observational study). Case studies from official statistics.
Keywords
Normal distribution, ANOVA, Neyman-Pearson lemma, Likelihood ratio test, Linear regression, Generalized linear models, Multilevel models, Data transformation, Official statistics, R
Additional bibliography for study
References
1. Lehmann E.L. (1986), Testing Statistical Hypotheses. John Wiley & Sons.
2. Billingsley P. (1995), Probability and Measure. John Wiley & Sons.
3. Feller W. (1971), An Introduction to Probability Theory and Its Applications. John Wiley & Sons.
4. Dacunha-Castelle D. and Duflo M. (1986), Probability and Statistics. Volume I and II. Springer-Verlag.
5. Kolyva-Machera F. (1998), Mathematical Statistics. Ziti, Thessaloniki.
6. Crawley M.J. (2007), The R Book. John Wiley & Sons.
7. Komsta L. (2006), Processing data for outliers. The Newsletter of the R Project.
8. Searle S.R. (1997), Linear Models. Wiley Classics Library.
9. Fox J. and Weisberg S. (2010), An R Companion to Applied Regression. Sage.
10. Faraway J.J. (2015), Linear Models with R. CRC Press, Taylor & Francis Group.
11. Faraway J.J. (2016), Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC.
12. McCulloch C.E. and Searle S.R. (2001), Generalized, Linear, and Mixed Models. Wiley-Interscience.
13. Pinheiro J.C. and Bates D.M. (2000), Mixed-Effects Models in S and S-PLUS. Springer, New York.
14. West B.T., Welch K.B. and Galecki A.T. (2007), Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC.