Course Content (Syllabus)
Introduction to Big Data Management and Analytics
- Hadoop: basic and advanced topics
- The Hadoop ecosystem: HDFS, HBase, Pig, Hive
- NoSQL databases
- Theoretical issues in MapReduce
- The Scala programming language
- The Spark platform: basic and advanced issues
- Streaming, SQL, Machine Learning, GraphX: the basic libraries
- Data exploration using SparkR
- Algorithm design in Spark
- Graph databases
- Other systems: Giraph, GraphLab, Hama, BlinkDB
Keywords
big data, data management, data mining from big data, big data analytics
Additional bibliography for study
M. Zaharia, B. Chambers: Spark: The Definitive Guide, O'Reilly, 2018.
H. Karau, A. Konwinski, P. Wendell, M. Zaharia: Learning Spark, O'Reilly, 2015.
N. Lynch: Distributed Algorithms, Morgan Kaufmann, 1996.
I. Robinson, J. Webber, E. Eifrem: Graph Databases, O'Reilly, 2013.
S. Ryza, U. Laserson, S. Owen, J. Wills: Advanced Analytics with Spark, O'Reilly, 2015.
R. Schutt, C. O'Neil: Doing Data Science, O'Reilly, 2014.
C.A. Varela, G. Agha: Programming Distributed Computing Systems: A Foundational Approach, The MIT Press, 2013.