|Course level||Advanced Bachelor/Master, open to PhD staff and professionals|
|Session 2||20 July to 3 August 2019|
|Recommended course combination||Session 1: Big Data in Society; Session 3: Big Data Management and Analysis in Linux, Operations Research: A Mathematical Way to Optimize Your World|
|Co-ordinating lecturers||Andrea Bassi|
|Other lecturers||Dr. Meike Morren|
|Form(s) of tuition||Interactive seminar|
|Form(s) of assessment||Programming assignments, final examination|
This course focuses upon understanding statistical models and analysing the results whilst learning to work with R. As well as introducing the software to newcomers, it presents basic and more advanced statistics.
We start with descriptive statistics and visual representation of data, which is the first step for most statistical analyses. We then introduce the linear regression model, a widely used model with two main purposes: modeling relationships among the data and predicting future observations. After that we will extend the linear model to the generalized linear framework, in order to analyse non-normally distributed variables. In the second week we focus on a common problem in statistics: classification. We explore the two main areas of classification (supervised learning and unsupervised learning) with theory and examples.
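As a rough illustration of the two purposes of the linear regression model named above (modeling a relationship and predicting a future observation), the following minimal sketch fits a least-squares line to a small data set and then predicts a new value. The course itself works in R; Python is used here only for illustration. The data, the variable names `hours` and `score`, and the helper functions are all hypothetical and not taken from the course material.

```python
# Illustrative sketch only: made-up data and hypothetical helper names.
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the response for a new value of the explanatory variable."""
    return intercept + slope * x_new


# Hypothetical data: hours of practice and a resulting test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

# Purpose 1: describe the relationship (slope = change in score per extra hour).
intercept, slope = fit_simple_linear_regression(hours, score)

# Purpose 2: predict a future observation (6 hours of practice).
prediction = predict(intercept, slope, 6)

print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction={prediction:.2f}")
```

The slope summarizes the estimated relationship, while `predict` reuses the same fitted line for an unseen value; generalized linear models extend this idea to responses that are not normally distributed.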
Every day consists of short lectures with examples, followed by exercises in which you apply what you have learned right away. Each week you complete one graded assignment. The exercises and assignments focus on coding in R and on applying and interpreting generalized linear regression models. By the end of the two weeks you will be acquainted with various popular R packages, and you will be able to write your own functions and use attractive plots to present your data.
Upon successful completion of the course, students will be able to:
• evaluate the quality of quantitative data sources
• choose the appropriate method for analysis, depending on the data source
• conduct various statistical tests
• analyze data using the generalized linear framework
• handle multivariate data and classify them into categories
• apply their improved programming skills
Andrea Bassi holds an MSc in Engineering Mathematics (Polytechnic University of Milan), with a focus on Applied Statistics. After working in Italy as a statistical consultant, he started his PhD training in Biostatistics at the VU University Medical Center, on the BIOMARKER project. The goal of this project is to design a Bayesian adaptive clinical trial to decide on the optimal targeted treatment strategy for patients with diffuse large B-cell lymphoma (DLBCL). Andrea also collaborates with the VU University as a teaching assistant in biostatistics for Bachelor's and Master's programmes. His main research interests are Bayesian statistics, statistical programming and decision theory.
"Students should apply for Data analysis in R to discover the enormous potential of the open-source programming language R and for acquiring a series of skills and tools to analyze statistical problems of diverse nature."