Big Data Management and Analysis in UNIX

An introduction to the UNIX command line environment
With the growing availability of extremely large datasets, scientists and analysts increasingly need powerful supercomputers or computer clusters to store, manage and analyse their data. These servers typically run UNIX, so getting the job done requires some programming skills and an understanding of the relevant software packages.
Course level: Advanced Bachelor/Master
Block 3
5 to 19 August
Recommended course combination
Block 1: The Economics of Vibrant Cities, Creativity and Innovation
Block 2: Data Analysis in R, Organizational Behaviour Management, Big Ideas in Computer Science
Co-ordinating lecturer: Dr. Aysu Okbay
Other lecturers: Richard K. Linnér
Form(s) of tuition: Interactive seminars, practicals
Form(s) of assessment: Programming assignments
ECTS: 3 credits
Contact hours: 45
Total tuition fee: €1000

This course is intended for scientists and data analysts from all disciplines, as well as business practitioners (e.g. consultants or financial analysts), who are planning to work with big data. If you have doubts about your eligibility for the course, please let us know. Our courses are multi-disciplinary and are therefore open to students with a wide variety of backgrounds.

There are no formal prerequisites, but familiarity with basic statistical concepts and programming is a plus.


Our course introduces you to the UNIX command line environment and teaches you how to manage large datasets using text processing utilities such as sed and awk. It covers the basics of shell scripting (if/else statements, loops, etc.), which you can use to automate your data analysis (for example, with a UNIX version of the freely available statistics program R), and familiarizes you with Git as a version control tool. You also learn how to present your data and results in customized plots and figures.
  • You understand the Unix philosophy and environment: files, processes, pipes, filters and basic utilities.
  • You are familiar with login and logout procedures, including remote login using SSH, and setting, protecting and changing passwords.
  • You can transfer files between systems with SFTP, SCP and rsync.
  • You can manipulate text files with sed, awk, cut, paste, cat, etc.
  • You can edit text using the vi editor.
  • You are familiar with automation through shell scripts.
  • You are familiar with version control using Git.
  • You can work with R from the UNIX command line.
  • You can plot in R.
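As a rough illustration of the pipes-and-filters style these objectives refer to, the sketch below chains sed, cut, awk and sort to summarise a small table. It is not taken from the course material: the file name scores.tsv, its columns and its values are invented for the example.

```shell
#!/bin/sh
# Hypothetical sample data: a small tab-separated file with a header row.
workdir=$(mktemp -d)
printf 'id\tgroup\tscore\n1\tA\t4.0\n2\tB\t3.5\n3\tA\t5.0\n' > "$workdir/scores.tsv"

# sed drops the header line, cut keeps columns 2 and 3 (group, score),
# awk averages the score per group, and sort makes the order deterministic.
summary=$(sed '1d' "$workdir/scores.tsv" \
  | cut -f 2,3 \
  | awk -F '\t' '{ sum[$1] += $2; n[$1]++ }
                 END { for (g in sum) printf "%s\t%.2f\n", g, sum[g] / n[g] }' \
  | sort)

# Print one line per group: group name, then mean score.
printf '%s\n' "$summary"

# Clean up the temporary directory.
rm -r "$workdir"
```

Each tool does one small job and passes plain text to the next, so the same pattern can be applied to much larger files without loading them into memory all at once.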
Visit to the SURFsara computer facilities at Amsterdam Science Park.
To be announced