After completing the course, students will have a concrete understanding of the methods used for mining massive data sets.
Moreover, they will be able to apply some of these techniques in practice while developing the course assignments.
Course Content (Syllabus)
- MapReduce, Spark and distributed data management
- Advanced issues in association rules, clustering and outlier detection
- Mining data streams
- Link analysis
- Recommendation systems
- Advanced hashing techniques (locality-sensitive hashing)
Keywords: data mining, streams, links, clustering, similarity
Additional Bibliography for Study
1. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011.
2. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson Addison Wesley, 2006.