CS28013 Design and Analysis of Algorithms for Big Data 大数据算法设计与分析

 

课程名称 (Course Name) Design and Analysis of Algorithms for Big Data

课程代码 (Course Code): CS28013

学分/学时 (Credits/Credit Hours)3

开课时间 (Course Term )Fall Semester   

开课学院(School Providing the Course:  SEIEE

任课教师(Teacher:  Xiaotie Deng

课程讨论时数(Course Discussion Hours:  

课程实验数(Lab Hours:   

课程内容简介(Course Introduction):

Big data has posed a great challenge to algorithmic studies for the delivery of efficient solutions to process massive data, especially with real time data analysis needs. This course aims to provide students with the conceptual framework for “good” algorithms, and the proper algorithmic tools for the design and analysis of algorithms for big data.

We discuss randomized algorithms, probabilistic data structures and techniques that are in wide use for approximate answers in big data analysis. 

教学大纲(Course Teaching Outline):

Probabilistic data structure such as Bloom filters, HyperLogLog, count-min sketch; stream algorithm dealing with dimension reduction, compressed sensing, matrix sparsification;

Internet applications such as recommendation system, computational advertising, large-scale machine learning. Secondary memory algorithms and map reduce algorithms.

Tentative Schedule (based on 3 credit 48 hours of lectures):

Introduction and Basics (2 week): probabilistic data structure, randomized techniques.

Stream algorithms (5 weeks): approximate counting, dimensional reduction, linear regression, compressed sensing.

Map reduce algorithms (3 weeks): EM and HMM Algorithms.

Secondary memory algorithms (2 weeks): disk operations, b-trees, sorting.  

Applications (2weeks): selected topics: recommendation, web information retrieval.

New Frontiers (2 week): selected topics: TBA.

课程进度计划(Course Schedule):

Probabilistic data structure such as Bloom filters, HyperLogLog, count-min sketch; stream algorithm dealing with dimension reduction, compressed sensing, matrix sparsification;

Internet applications such as recommendation system, computational advertising, large-scale machine learning. Secondary memory algorithms and map reduce algorithms.

Tentative Schedule (based on 3 credit 48 hours of lectures):

Introduction and Basics (2 week): probabilistic data structure, randomized techniques.

Stream algorithms (5 weeks): approximate counting, dimensional reduction, linear regression, compressed sensing.

Map reduce algorithms (3 weeks): EM and HMM Algorithms.

Secondary memory algorithms (2 weeks): disk operations, b-trees, sorting. 

Applications (2weeks): selected topics: recommendation, web information retrieval.

New Frontiers (2 week): selected topics: TBA.

课程考核要求(Course Assessment Requirements)

Home work: 20%

Midterm: 20%

Project: 10%

Final Exam: 50%

参考文献(Course References)

Jimmy Lin and Chris Dyer, Data Intensive Text Processing with Map Reduce, 2010.

Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman. Mining of Massive Datasets, 2014.

Charu C. Aggarwal and Philip S. Yu. A Survey of Synopsis Construction in Data Stream, 2006.

Probabilistic Data Structures for Web Analytics and Data Mining. 2012

Damian Cryski, Probability Data Structure for Go, 2014

预修课程(Prerequisite Course

Discrete Math, design and analysis of algorithms

[ 2016-09-21 ]