Big Data Management and Analysis in Linux

Analyze Large Datasets
The growing availability of extremely large datasets requires scientists and analysts to use powerful supercomputers or computer clusters to store, manage, and analyze these data. These clusters typically run on Linux, so working with them requires some programming skills and knowledge of suitable software packages. Our course will introduce you to programming in a Linux environment, teach you how to efficiently manage very large datasets (e.g. using the sed, awk, and grep commands), and show you how to create simple shell scripts to analyze your data (e.g. using a Linux version of the freely available statistical software R). You will also learn how to visualize your data and results in customized plots and figures. These skills are extremely valuable for scientists from all disciplines, as well as for business practitioners (e.g. consultants or financial analysts) who are planning to work with big data.
Course level: Advanced Bachelor/Master, open to PhD staff and professionals
Session 3
3 August to 17 August 2019
Recommended course combination
Session 1 and 2: Data Analysis in R
Session 2: Programming in Python
Co-ordinating lecturer: Dr. Aysu Okbay
Other lecturers: Richard K. Linnér
Form(s) of tuition: Interactive seminars, practicals
Form(s) of assessment: Programming assignments
ECTS: 3 credits
Contact hours: 45
Total tuition fee: €1150

This course is aimed at scientists and data analysts from all disciplines, as well as business practitioners (e.g. consultants or financial analysts) who are planning to work with big data. If you have doubts about your eligibility for the course, please let us know. Our courses are multidisciplinary and are therefore open to students with a wide variety of backgrounds.

The course will be fairly technical and includes many computer tutorials. There are no entry requirements other than a willingness to learn programming in Linux, but a decent background in statistics, mathematics, and programming is an advantage.

Each day consists of a three-hour lecture in the morning, followed by two hours of supervised work in computer tutorials in the afternoon. Both the lectures and the tutorials are held in a computer room. The lectures are interactive, with short examples that allow students to apply the concepts introduced. In the tutorials, students get more hands-on training in a supervised environment, with exercises covering the day's topics, and have the opportunity to work on the assignments. The computer room stays open to students for self-study after the tutorials.

Students are not required to bring their own laptops, but they are allowed to do so if they wish to work on their own computers.

By the end of this course, the student should understand and feel comfortable with:

  • Basic Linux programming
  • The Unix philosophy and environment; files, processes, pipes, filters and basic utilities
  • Login and logout procedures
  • File transfer between systems
  • Text file manipulation with sed, awk, cut, paste, cat, etc.
  • Basic text editing using the vim editor
  • Automation through functions, control structures and shell scripts
  • Version control with Git
  • Working with R through the Unix command line
  • Plotting in R

Visit to the SURFsara computer facilities at Amsterdam Science Park.
To be announced