Mining from Massive Datasets

Course Information
Title: Εξόρυξη Γνώσης από Μεγάλα Σύνολα Δεδομένων / Mining from Massive Datasets
Cycle / Level: 2nd / Postgraduate
Teaching Period: Spring
Coordinator: Apostolos Papadopoulos
Course ID: 600016262

Programme of Study: Postgraduate Programme (PMS) in Data and Web Science (2018 to present), MF

Registered students: 3
Orientation: KORMOS (Core)
Attendance Type: Elective Courses belonging to the selected specialization
Semester: 2
Year: 1
ECTS: 7.5

Programme of Study: Postgraduate Programme (PMS) in Data and Web Science (2018 to present), PF

Registered students: 4
Orientation: KORMOS (Core)
Attendance Type: Elective Courses belonging to the selected specialization
Semester: 2
Year: 1
ECTS: 7.5

Class Information
Academic Year: 2021 – 2022
Class Period: Spring
Faculty Instructors
Weekly Hours: 3
Class ID
Course Type 2021
Specialization / Direction
Course Type 2016-2020
  • Scientific Area
  • Skills Development
Course Type 2011-2015
Specific Foundation / Core
Mode of Delivery
  • Face to face
Language of Instruction
  • Greek (Instruction, Examination)
  • English (Instruction, Examination)
Learning Outcomes
Cognitive:
  1. Students will gain substantial knowledge of big data management and analytics.
  2. They will acquire specialized knowledge of scaling clustering, association rule mining, and graph mining techniques.
  3. They will learn how to manage data streams.
Skills:
  1. Students will be able to apply data mining techniques to large datasets using tools such as Apache Spark.
  2. They will work in teams.
  3. They will gain confidence by presenting their work in class.
  4. They will become familiar with modern big data analytics techniques that have many applications in industry.
General Competences
  • Apply knowledge in practice
  • Retrieve, analyse and synthesise data and information, with the use of necessary technologies
  • Work autonomously
  • Work in teams
  • Generate new research ideas
Course Content (Syllabus)
  • Introduction to Big Data Management and Analytics
  • Hadoop: basic and advanced topics
  • The Hadoop ecosystem: HDFS, HBase, Pig, Hive
  • NoSQL databases
  • Theoretical issues in MapReduce
  • The Scala programming language
  • The Spark platform: basic and advanced issues
  • Streaming, SQL, Machine Learning, GraphX: the basic libraries
  • Data exploration using SparkR
  • Algorithm design in Spark
  • Graph databases
  • Other systems: Giraph, GraphLab, Hama, BlinkDB
Keywords: big data, data management, data mining from big data, big data analytics
Educational Material Types
  • Slide presentations
  • Book
Use of Information and Communication Technologies
Use of ICT
  • Use of ICT in Course Teaching
  • Use of ICT in Laboratory Teaching
  • Use of ICT in Communication with Students
  • Use of ICT in Student Assessment
Course Organization
Reading Assignment: 100
Written assignments: 32
Student Assessment
Student Assessment methods
  • Written Exam with Extended Answer Questions (Summative)
  • Written Assignment (Formative, Summative)
  • Performance / Staging (Formative, Summative)
  • Written Exam with Problem Solving (Summative)
  • Report (Formative, Summative)
Course Bibliography (Eudoxus)
Jure Leskovec, Anand Rajaraman, Jeff Ullman: Mining of Massive Datasets
Additional bibliography for study
H. Karau, A. Konwinski, P. Wendell, M. Zaharia: Learning Spark, O'Reilly, 2015.
N. Lynch: Distributed algorithms, Morgan Kaufmann, 1996.
I. Robinson, J. Webber, E. Eifrem: Graph databases, O'Reilly, 2013.
S. Ryza, U. Laserson, S. Owen, J. Wills: Advanced analytics with Spark, O'Reilly, 2015.
R. Schutt, C. O'Neil: Doing Data Science, O'Reilly, 2014.
C.A. Varela, G. Agha: Programming distributed computing systems: a foundational approach, The MIT Press, 2013.
Last Update