Learning Outcomes
Cognitive:
1. Students will acquire essential knowledge of big data management and analytics.
2. They will gain specialized knowledge of scaling clustering, association rule mining, and graph mining techniques.
3. They will learn how to manage data streams.
Skills:
1. Students will be able to apply data mining techniques to large datasets using tools such as Apache Spark.
2. They will work in teams.
3. They will gain confidence by presenting their work in class.
4. They will become familiar with modern big data analytics techniques that have many applications in industry.
Course Content (Syllabus)
- Introduction to Big Data Management and Analytics
- Hadoop: basic and advanced topics
- The Hadoop ecosystem: HDFS, HBase, Pig, Hive
- NoSQL databases
- Theoretical issues in MapReduce
- The Scala programming language
- The Spark platform: basic and advanced issues
- Streaming, SQL, Machine Learning, GraphX: the basic libraries
- Data exploration using SparkR
- Algorithm design in Spark
- Graph databases
- Other systems: Giraph, GraphLab, Hama, BlinkDB
Keywords
big data, data management, data mining from big data, big data analytics
Additional bibliography for study
H. Karau, A. Konwinski, P. Wendell, M. Zaharia: Learning Spark, O'Reilly, 2015.
N. Lynch: Distributed algorithms, Morgan Kaufmann, 1996.
I. Robinson, J. Webber, E. Eifrem: Graph databases, O'Reilly, 2013.
S. Ryza, U. Laserson, S. Owen, J. Wills: Advanced analytics with Spark, O'Reilly, 2015.
R. Schutt, C. O'Neil: Doing Data Science, O'Reilly, 2014.
C.A. Varela, G. Agha: Programming distributed computing systems: a foundational approach, The MIT Press, 2013.