Big Data Analytics for Information Science
School of Informatics and Computing
Indiana University Bloomington
Data science is an emerging discipline that uses data to characterize, interpret, or predict complex real-world problems. Its importance also lies in its potential to change how research is done in the sciences and social sciences. Big data, characterized by high heterogeneity and volume, calls for a new understanding of data and data operations, along with new skills and new methods to process and analyze data at this scale. The class focuses mainly on the analytics of two types of big data, web data and text data, and introduces a number of key and novel methodologies and applications in information science.
This course introduces the fundamentals of data science and big data analysis by focusing on theoretical aspects, such as their philosophical grounds and implications, and methodological aspects, such as numerical and textual data processing, basic statistical analysis and machine learning, data retrieval and recommendation, data representation and semantics, and big data storage, along with several case studies. In addition, the course will introduce and demonstrate open source data-operation frameworks and tools, e.g., R, Hadoop, Lucene, and NoSQL databases (MongoDB and Neo4j), which students will use to design class projects with provided real-world data sets.
The course has two goals: first, to develop students’ conceptual understanding of how data science is revolutionizing classical information management and scientific inquiry in information science; and second, to help students gain hands-on experience and basic implementation skills so that they can apply data science to problems in other scientific fields.
In this course, students will learn:
· The most important and current grounding philosophies, theories, and models for data science
· How to view real-world problems through the lens of these theories and models and solve them from a data science perspective (case studies)
· Basic data processing and statistical analysis methods
· Basic machine learning, data retrieval, ranking, and recommendation algorithms
· Basics of R, Lucene, Hadoop and NoSQL (MongoDB and Neo4j)
Introduction to Data Science
Data Sampling, Processing, and Basic Statistical Analysis Methods
Machine Learning and Network Analysis for Information Science
Text Mining for Information Science, NoSQL
Scientific Data Mining/Analysis
Course Materials & Readings
The lectures are self-contained and there is no required textbook. Many readings come from the following two open online books:
1) Zhao, Y. (2012). R and Data Mining: Examples and Case Studies. Academic Press. Available at http://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf
2) O'Reilly Radar Team (2011). Big Data Now: Current Perspectives from O'Reilly Radar. O'Reilly Media. Available at http://www.onmeedia.com/donwloads/Big_Data_Now_Current_Perspectives_from_OReilly_Radar.pdf