MEDFL5155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data

Course content

The course covers methods integral to data analysis in modern molecular medical research. It is planned as part 1 of a series of two courses on this topic. It is relevant to all PhD students and researchers who need to analyze large-scale molecular data themselves, as well as to those who need to interpret results and understand publications in the molecular life sciences.

High-throughput techniques are becoming increasingly prevalent in life science research and in the clinic. However, making effective use of the resulting large datasets requires understanding and applying more advanced statistical methods. We will introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:

a) high-throughput screening (multiple testing and group tests),

b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),

c) supervised learning (classification and prediction, cross-validation and bootstrapping).
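As an illustration of topic a), the multiple testing problem can be sketched in a few lines. The example below is a minimal sketch of the Benjamini-Hochberg adjustment, written in plain Python purely for illustration; the course itself uses R, where the built-in `p.adjust(p, method = "BH")` performs the same adjustment. The function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the course material.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, e.g. from eight per-gene tests (made up for illustration).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adjusted = benjamini_hochberg(p_values)

# Genes called significant with and without correction at the 0.05 level.
naive_hits = [p for p in p_values if p <= 0.05]
fdr_hits = [p for p, q in zip(p_values, adjusted) if q <= 0.05]
print(len(naive_hits), len(fdr_hits))
```

In this made-up example, five of the eight tests fall below 0.05 before adjustment, but only two remain once the false discovery rate is controlled at 5 %. With thousands of genes tested at once, the difference becomes far larger.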

We will also introduce reference sources and biological databases that can aid interpretation and will show how they can be accessed and integrated into a data analysis.

Methods will be demonstrated by replicating analyses from publications, and real-life gene expression data will be used in the computer labs.

To encourage continued learning after the course, we will also provide an overview of available web-based courses and exercises.

Learning outcome

Knowledge:
Learn important statistical and bioinformatics concepts for analyzing molecular data. Have knowledge of the specific statistical challenges associated with the analysis of high-throughput biological data. Know important biological databases and relevant statistics/bioinformatics software tools.

Skills:
Be able to identify the data analysis problem and match it with the appropriate type of statistical method and corresponding software. Perform basic analyses of high-throughput biological data using R and Bioconductor. Be able to understand and critically evaluate the data analysis procedures in publications in molecular biology/molecular medicine.

Admission

The course is restricted to students at the Medical Student Research Programme at the Faculty of Medicine and the Faculty of Dentistry, UiO.

Course registration:

  • Students apply in StudentWeb.
  • Enrollment in this course is automatically registered in StudentWeb. Applicants will be notified immediately if their application to the course is granted.

The courses MEDFL5155 and MF9155 have common admission.

Prerequisites

Formal prerequisite knowledge

Students should have passed the exam in an introductory course in statistics (e.g. MF9130). They should also have some experience with the statistical programming language R and basic familiarity with the Unix shell, for example from having completed a Software Carpentry workshop.

To gain sufficient experience with R, students could, for example, complete an introductory online course or follow a Software Carpentry course at UiO.

Recommended previous knowledge

Students should have a basic understanding of molecular biology, at least roughly corresponding to 5-10 university study points in molecular biology, biochemistry, or similar.

Overlapping courses

5 credits overlap with MF9155.

Teaching

The teaching will be organized as an intensive course over five full days. There will be lectures coupled with hands-on practicals and example data analyses in the computer labs. Students will need to allow for sufficient time in advance for course preparations, which include some required reading, as well as after the course for the take-home exam. The practicals will take place in the same lecture hall as the lectures. Students will need to bring their own laptops with R/Bioconductor and RStudio installed to be able to follow the computer exercises.

You have to participate in at least 80 % of the teaching to be allowed to take the exam. Attendance will be registered.

Examination

Take-home exam in the form of a comprehensive data analysis task based on a recent publication, to be submitted four weeks after completion of the course.

Submit assignments in Inspera

You submit your assignment in the digital examination system Inspera. Read about how to submit your assignment.

Use of sources and citation

You should familiarize yourself with the rules that apply to the use of sources and citations. If you violate the rules, you may be suspected of cheating/attempted cheating.

Language of examination

The examination text is given in English, and you submit your response in English.

Grading scale

Grades are awarded on a pass/fail scale. Read more about the grading system.

Withdrawal from an examination

It is possible to take the exam up to 3 times. If you withdraw from the exam after the deadline or during the exam, this will be counted as an examination attempt.

Evaluation

The course is subject to continuous evaluation. At regular intervals we also ask students to participate in a more comprehensive evaluation.

Facts about this course

Credits: 5
Level: Master
Teaching: Every autumn

6-day course.

Teaching autumn 2024: Dates to be announced in May. Application period: 1.6.2024 - 1.10.2024.

Course registration: See information on how to apply in the section "Admission" in the course description.

Examination: Every autumn
Teaching language: English