MPIA and IMPRS Mini Lecture Course


Introduction to Machine Learning and Pattern Recognition


Thursdays, 14:00 to 15:00
Four lectures: 21 Feb, 28 Feb, 6 March, 13 March 2008
Hörsaal, Max-Planck-Institut für Astronomie
Dr. Coryn Bailer-Jones

Learning from data is essential to every area of science. It has many applications, including number plate recognition, classification in astrophysical surveys, climate modelling and prediction and diagnosis of diseases. Over the years, numerous algorithms and techniques have been invented (or reinvented) to address these kinds of problems, and they go under a wide variety of names such as "machine learning", "pattern recognition", "statistical learning", "statistical data modelling" and so forth. The objective of this course is to provide an introduction to the statistical and mathematical methods used for analysing data, identifying structure, performing classification and making predictions.

We shall look at the fundamental principles of modelling and see how these are implemented in various techniques. While basic mathematical concepts will be covered, formal or abstract definitions and derivations will be avoided. The examples in the course will make use of the (freely available) statistical software package R. (It is a programming language as well as an interactive command-line environment with built-in graphics.) Participants are encouraged to install this package and to use it for trying the machine learning methods covered in the course. I can provide some support for this: just contact me.

Techniques which will be covered (some in more depth than others)

as well as general concepts such as

This is an introductory course, so prior knowledge of, or experience in using, machine learning methods is not required. The basic prerequisites for the course are first-year mathematics, in particular calculus, linear algebra and statistics. The lectures will be in English. The course is suitable for mid-term or advanced undergraduates, graduates and postdocs, or anybody interested in learning about machine learning methods and how to use them. The lectures are of course open to anyone.


Lecture schedule and downloads

Course syllabus

PDF and ODP files of the viewgraphs, as well as copies of the R scripts used, will be linked to below after each lecture. These do not constitute a full set of lecture notes.

Date          Topic        Viewgraphs    R scripts
21 February   Lecture 1:   [ODP] [PDF]   R scripts
28 February   Lecture 2:   [ODP] [PDF]   R scripts
6 March       Lecture 3:   [ODP] [PDF]   R scripts
13 March      Lecture 4:   [ODP] [PDF]   R scripts

Recommended texts

Hastie et al. is a pretty good general text, covering principles and specific methods. The three authors are some of the most innovative people in this field. Venables and Ripley provide a good introduction to R in general and cover the use of R for statistics and data modelling in particular. (S is the language on which R is based: R is the free implementation and S-PLUS is the commercial one.) Both of these books are in the MPIA library. Bishop covers several general topics on prediction with high-dimensional data in addition to neural networks. There are, of course, many other books on the market covering machine learning and pattern recognition, but I should warn you that many are not that good. For more specific references, see this list plus here for some links.
Coryn Bailer-Jones, calj at mpia.de
Last updated 13 March 2008