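Dr. Xiaozhong Liu, Chair Professor of our school and Associate Professor at the School of Informatics and Computing, Indiana University Bloomington, will visit Nankai University starting June 5 and offer a seminar. The seminar outline follows. All interested faculty and students are welcome (open to the whole school, regardless of major)!
If you would like to attend, please register in advance so that we can arrange a suitable classroom. Send your registration email to Bai Wenlin and include your name, year of study, major, and research interests.
Contact: Bai Wenlin (wenlin-bai@163.com)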
Big Data Analytics for Information Science
Xiaozhong Liu
Associate Professor
School of Informatics and Computing
Indiana University Bloomington
Data science is a rising discipline that uses data to characterize, interpret, or predict complex real-world problems. Its importance also lies in its potential to change how research is done in the natural and social sciences. Big data, characterized by high heterogeneity and volume, calls for a new understanding of data and new skills for data operations, and these features demand new methods for processing and analyzing large-scale data. The class focuses mainly on the analytics of two types of big data, web data and text data, and introduces a number of key and novel methodologies and applications in information science.
This course introduces the fundamentals of data science and big data analysis from two angles: theoretical aspects, such as their philosophical grounds and implications, and methodological aspects, such as numerical and textual data processing, basic statistical analysis and machine learning, data retrieval and recommendation, data representation and semantics, and big data storage, along with several case studies. In addition, the course will introduce and demonstrate open-source data-processing frameworks and tools, e.g., R, Hadoop, Lucene, and NoSQL databases (MongoDB and Neo4j), which students will use to design class projects with the real-world data sets provided.
The course has two goals: first, to develop students’ conceptual understanding of how data science is revolutionizing classical information management and scientific inquiry (in information science), and second, to help students gain hands-on experience and basic implementation skills so that they can apply data science to problems in other scientific fields.
In this course, students will learn:
· The most important and current grounding philosophies, theories, and models for data science
· How to view real-world problems through the lens of these theories and models and solve them from a data science perspective (case studies)
· Basic data processing and statistical analysis methods
· Basic machine learning, data retrieval, ranking, and recommendation algorithms
· Basics of R, Lucene, Hadoop and NoSQL (MongoDB and Neo4j)
Schedule
Lecture | Date | Topic | Assignment |
1 | 6/5 (8:00-12:00) | Introduction to Data Science | 1 |
2 | 6/7 (8:00-12:00) | Data sampling, processing, and basic statistical analysis methods | |
3 | 6/9 (8:00-12:00) | Machine learning and Network Analysis for Information Science | 2 |
4 | 6/12 (8:00-12:00) | Text mining for Information Science, NoSQL | |
5 | 6/14 (8:00-12:00) | Scientific Data Mining/Analysis | |
Course Materials & Readings
The lectures are self-contained, and there is no required textbook. Many readings come from the following two books, both freely available online:
1) Zhao, Y. (2012). R and Data Mining: Examples and Case Studies. Academic Press. Available at http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf
2) O'Reilly Radar Team (2011). Big Data Now: Current Perspectives from O'Reilly Radar. O'Reilly Media. Available at http://www.onmeedia.com/donwloads/Big_Data_Now_Current_Perspectives_from_OReilly_Radar.pdf