Graduate

Data Science Courses 2017-18

A list of Data Science online courses offered in current/upcoming semesters is given below.

Note: Unless otherwise specified, all courses listed are worth 3 credit hours.

Department: Statistics

Class: S520 Introduction to Statistics
Instructor: Jianyu Wang
Synopsis: This course introduces the basic concepts of statistical inference through a careful study of several important procedures. Topics include 1- and 2-sample location problems, the one-way analysis of variance, and simple linear regression. Most assignments involve applying probability models and/or statistical methods to practical situations and/or actual datasets. S320 is the basic version of this course, intended for undergraduates. It is the gateway to more advanced courses offered by the Department of Statistics. S520 is an expanded version of S320 that covers additional material. S520 serves two constituencies: graduate students in quantitative disciplines who are looking for a solid introduction to statistics and who may want to take additional courses in statistics, and graduate students pursuing an M.S. in Applied Statistics who desire a gentler introduction to the fundamental principles of statistical inference than is provided in the more theoretical STAT S620.


Department: Informatics

Class: I590 SQL and NOSQL
Instructor: Ying Ding
Synopsis: A database is the central focus in data science to store and manage data. Relational databases have empowered major industries for decades and are still widely adopted. In our new era of Big Data, the database landscape is undergoing significant change. Many non-relational databases have become an important part of the enterprise data architecture of companies. Relational databases were developed long before the Internet and the Web to tackle the issues of centrally controlled data storage and management. NoSQL databases emerged with the rise of Internet and Web applications to connect companies with customers (i.e., online or mobile) and to develop agility to adapt to faster changes. The new challenges of being agile and being able to accommodate data variability/data integration drove enterprises to turn to NoSQL database technology. It is important for every data scientist to master the skills of current databases and know about the future of databases in a world of NoSQL. This course aims to provide a basic overview of the current database landscape, starting with relational databases and SQL, and moving to several different NoSQL databases, such as XML databases and MongoDB.


Class: I590 Network Science
Instructor: Yong-Yeol Ahn
Synopsis: Networks are everywhere. We can easily find network structure in many complex systems around us: our cells, brains, society, etc. The inherent generality of the network approach has allowed wide applications of network theory to flourish across diverse fields including biology, sociology, and epidemiology. The questions that we will address in the class are the following: Why do networks matter? What are the fundamental theories to understand the structure and dynamics of networks? How has network theory been applied to other fields? What are the frontiers of the research? We will explore key papers ranging from the fundamental theory to the various applications of network theory. This course will focus more on round-table discussion between students than presentation. Students will work on research projects in groups and finish a paper at the end of the class.


Class: I590 Applied Data Science
Instructor: Joanne Luciano
Synopsis: The purpose of this course is to provide Data Science graduate students with practical experience applying their data science skill sets to real-world datasets. The first offering of this course in 2017 used a deidentified clinical trials dataset provided by Eli Lilly (agreement already in place with IU), but subsequent offerings could include public data or data provided by other industry partners. Students will be led through the full data analysis process of data preparation, model planning, model building, analysis, and communication of results. Students will meet (virtually or physically) daily to devise a plan.


Class: I590 Data Science On-Ramp
Instructor: Ying Ding
Credit Hours: 1 - 3
Synopsis: A course of self-paced modules to build and strengthen core competencies necessary for the Data Science curriculum. Individual lessons vary from beginner to intermediate and will cover C++, MongoDB, R, Java, Python, Tableau, SQL, Hadoop/MapReduce, Spark, Scala, GitHub, Web Scraping, and Text Mining (NLP). For descriptions of each lesson and how these will be mapped to credit, please consult Professor Ying Ding.


Class: I590 Python
Instructor: Vel Melbasa
Synopsis: This course provides a gentle yet intense introduction to programming with Python for students who have little or no prior experience in programming. Python, an open-source language that allows rapid application development of both large and small software systems, is object-oriented by design and provides an excellent platform for learning the basics of language programming. The course will focus on planning and organizing programs, and developing high-quality working software that solves real problems.


Class: I591 Graduate Internship
Instructor: Ying Ding
Credit Hours: 0 - 6
Synopsis: Students gain professional work experience in an industry or research organization setting, using skills and knowledge acquired in Informatics course work. May be repeated for a maximum of 6 credit hours.

Class: I699 Independent Study
Instructor: Martin Siegel
Credit Hours: 1 - 3
Synopsis: Independent readings and research for MS students under the direction of a faculty member, culminating in a written report.

Department: Computer Science

Class: B505 Applied Algorithms
Instructor: Funda Ergun
Synopsis: The course studies the design, implementation, and analysis of algorithms and data structures as applied to real world problems. Topics include divide-and-conquer, optimization, and randomized algorithms applied to problems such as sorting, searching, and graph analysis. Students will learn about trees, hash tables, heaps, and graphs.

Class: B551 Elements of Artificial Intelligence
Instructor: David Crandall
Synopsis: Introduction to major issues and approaches in artificial intelligence. Principles of reactive, goal-based, and utility-based agents. Problem-solving and search. Knowledge representation and design of representational vocabularies. Inference and theorem proving, reasoning under uncertainty, and planning. Overview of machine learning.

Class: B649 Privacy & Security in the IOT
Instructor: Jean Camp
Synopsis: Security and privacy lapses in the Internet of Things can cause real and significant harm to people, their pets, and their homes. Computer security and privacy for an IoT ecosystem is fundamentally important and challenging. From a human-centered design perspective, complex issues arise when designing technologies for a diverse collection of stakeholders, including vulnerable populations such as children and those using in-home care technologies. From a technical perspective, security and privacy are challenging not only because of the properties of IoT devices themselves but also due to risks that emerge only when technologies are combined in unexpected ways. IoT devices will be pervasive, and may have very constrained computational, communications, and energy resources. Meeting these challenges requires a large, interdisciplinary effort. A holistic approach to IoT security and privacy integrates human-computer interaction, network security, cryptography, and pervasive computing. The translation layer requires an understanding of people’s privacy and security requirements and the ability to express these as cryptographically enforced data controls.


Department: Information and Library Science

Class: Z639 Social Media Mining
Instructor: Vincent Malic
Synopsis: This course provides a graduate-level introduction to social media mining (SMM) and its methods. It offers hands-on experience mining social data for social meaning extraction (focusing on sentiment analysis) using automated methods and machine learning technologies. We will read, discuss, and critique claims and findings from contemporary research related to SMM.

Department: Informatics

Class: I520 Security for Networked Systems
Instructor: Raquel Hill
Synopsis: This course is an extensive survey of system and network security. Course materials cover the threats to information confidentiality, integrity and availability, and the defense mechanisms that control such threats. It provides the foundation for more advanced security courses and hands-on experiences through course projects.

Class: I523 Big Data Applications and Analytics
Instructor: Gregor Von Laszewski
Synopsis: The Big Data Applications & Analytics course is an overview course in Data Science and covers the applications and technologies (data analytics and clouds) needed to process the application data. It is organized around this rallying cry: Use Clouds running Data Analytics Collaboratively processing Big Data to solve problems in XInformatics.

Class: I525 Organizational Informatics & Economics of Security
Instructor: Jean Camp
Synopsis: Security technologies make explicit organizational choices that allocate power. Security implementations allocate risk, determine authority, reify or alter relationships, and determine trust extended to organizational participants. The course begins with an introduction to relevant definitions (security, privacy, trust) and then moves to a series of timely case studies of security technologies.


Class: I535 Management, Access, and Use of Big and Complex Data
Instructor: Inna Kouper
Synopsis: Data is abundant, offering potential for new discovery along with economic and social gain. But data has its difficulties. It can be noisy and inadequately contextualized. The gap from data to knowledge can be too big, or, due to limits in technology or policy, the data may not be easily combined with other data. This course will examine the underlying principles and technologies needed to capture data, as well as clean, contextualize, store, access, and trust it for a repurposed use. Specifically, we will cover 1) distributed systems and database concepts underlying NoSQL and graph databases, 2) best practices in data pipelines, 3) foundational concepts in metadata and provenance plus examples, and 4) developing theory in data trust and its role in reuse.


Class: I590 Applied Data Mining
Instructor: Mehmet Dalkilic
Synopsis: TBA

Class: I590 Applied Data Mining
Instructor: Joanne Luciano
Synopsis: The aim of the Applied Data Science course is to provide the skills needed to apply data science principles to real-world applications at every stage in the data science workflow. The course is organized around each stage, covering the algorithms, best practices, and evaluation criteria. Both good and bad application examples will be discussed to help the student develop an intuition and deeper understanding of the choice of algorithm for the data, and the development of the best practices and methods for evaluating results of different approaches. Students will learn Tableau and use it to visually analyze and report data.


Class: I590 Data Science for Drug Discovery
Instructor: Joanne Luciano
Synopsis: With exploding healthcare costs, greater longevity and the widespread health challenges of diabetes, obesity, cancer and cardiovascular disease, today's medicine and healthcare will be a primary scientific and economic focus for the remainder of this century. Informatics and big data promise an understanding of health, disease and treatment on a scale never before imagined. This course will address the big data techniques that are being used in the drug discovery, healthcare and translational medicine domains. Some specific topics covered will include large-scale, integrated molecular datasets; cheminformatics and bioinformatics in a big data domain; storing and data mining of electronic medical records; visualization and mapping of diseases; bridging the clinical and molecular; smart devices for smart health; and data mining for healthcare economics.


Class: I590 Data Science On-Ramp
Instructor: Ying Ding
Credit Hours: 1 - 3
Synopsis: A course of self-paced modules to build and strengthen core competencies necessary for the Data Science curriculum. Individual lessons vary from beginner to intermediate and will cover C++, MongoDB, R, Java, Python, Tableau, SQL, Hadoop/MapReduce, Spark, Scala, GitHub, Web Scraping, and Text Mining (NLP). For descriptions of each lesson and how these will be mapped to credit, please consult Professor Ying Ding.


Class: I590 Data Semantics
Instructor: Ying Ding
Synopsis: The class explores the technologies of the Semantic Web by examining the application of technologies to WWW information delivery and the principles of formal logic and computation guiding their development.

Class: I590 Data Visualization
Instructor: Yong-Yeol Ahn
Synopsis: From dashboards in a car to cutting-edge scientific papers, we extensively use visual representation of data. As our world becomes increasingly connected and digitized and as more decisions are being driven by data, data visualization is becoming a critical skill for every knowledge worker. In this course we will learn fundamentals of data visualization and create visualizations that can provide insights into complex datasets.

Class: I590 Python
Instructor: Vel Melbasa
Synopsis: This course provides a gentle yet intense introduction to programming with Python for students who have little or no prior experience in programming. Python, an open-source language that allows rapid application development of both large and small software systems, is object-oriented by design and provides an excellent platform for learning the basics of language programming. The course will focus on planning and organizing programs, and developing high-quality working software that solves real problems.


Class: I591 Graduate Internship
Instructor: David Wild
Credit Hours: 0 - 6
Synopsis: Students gain professional work experience in an industry or research organization setting, using skills and knowledge acquired in Informatics course work. May be repeated for a maximum of 6 credit hours.

Class: I699 Independent Study
Instructor: David Wild
Credit Hours: 1 - 3
Synopsis: Independent readings and research for MS students under the direction of a faculty member, culminating in a written report.

Department: School of Public and Environmental Affairs

Class: V506 Statistical Analysis for Effective Decision-Making
Instructor: TBA
Synopsis: This course provides graduate-level instruction in the application of statistical analysis to issues in public and environmental affairs and related fields. It is designed to assist students in learning the methods by which statistical analysis is carried out, as well as the basic theory that enables and constrains the application of statistics to real world data. The course emphasizes practical aspects of applying such methods, appropriately interpreting the results of these statistical analysis tools, and gaining a meaningful understanding of how statistical analysis can be misused or erroneously executed (either intentionally or unintentionally). As such, the course will address descriptive statistics, statistical inference, the nature of random variables, sampling distributions, point and interval estimation of parameters (mean, standard deviation, etc.), hypothesis testing, analysis of variance, and bivariate and multivariate regression. Although these are traditional topics for an introductory statistics course, the emphasis in V506 will be on appropriately applying these techniques and extracting meaningful information from unstructured data. Use of computer tools for carrying out statistical analysis (primarily SAS) will also be a major emphasis.


Department: Engineering

Class: E599 Cloud Computing
Instructor: Geoffrey Fox
Synopsis: The course covers all aspects of the cloud architecture stack, from Software as a Service (large-scale biology and graphics applications), Platform as a Service (MapReduce (Hadoop), Iterative MapReduce (Twister) and NoSQL (HBase)), to Infrastructure as a Service (low-level virtualization technologies). At the end of this course, you will have learned key concepts in cloud computing and enough programming to be able to solve data analysis problems on your own.

Department: Statistics

Class: S520 Introduction to Statistics
Instructor: Jianyu Wang
Synopsis: This course introduces the basic concepts of statistical inference through a careful study of several important procedures. Topics include 1- and 2-sample location problems, the one-way analysis of variance, and simple linear regression. Most assignments involve applying probability models and/or statistical methods to practical situations and/or actual datasets. S320 is the basic version of this course, intended for undergraduates. It is the gateway to more advanced courses offered by the Department of Statistics. S520 is an expanded version of S320 that covers additional material. S520 serves two constituencies: graduate students in quantitative disciplines who are looking for a solid introduction to statistics and who may want to take additional courses in statistics, and graduate students pursuing an M.S. in Applied Statistics who desire a gentler introduction to the fundamental principles of statistical inference than is provided in the more theoretical STAT S620.


Department: Information and Library Science

Class: Z534 Search
Instructor: Xiaozhong Liu
Synopsis: The success of commercial search engines shows that Information Retrieval is key in helping users find the information they seek. This course provides an introduction to information retrieval theories and concepts underlying all search applications. We investigate techniques used in modern search engines and demonstrate their significance by experiment.

Class: Z637 Information Visualization
Instructor: Katy Börner, Michael Ginda
Synopsis: Introduces information visualization, highlighting processes which produce effective visualizations. Topics include perceptual basis of information visualization, data analysis to extract relationships, and interaction techniques.

Department: Informatics

Class: I524 Big Data Software and Projects
Instructor: Gregor Von Laszewski
Synopsis: This course studies the HPC-ABDS software used in either High Performance Computing or open source commercial Big Data cloud computing. The student builds analysis systems using this software on clouds and then uses it in a project either chosen by the student or selected from a list given by the instructor. Credit given for only one of INFO-I424 or I524.

Class: I526 Applied Machine Learning
Instructor: Sriraam Natarajan
Synopsis: The aim of the course is to provide skills in applying machine learning algorithms to real applications. We will focus less on learning algorithms, math and theory, and instead spend more time on the hands-on skills required for algorithms to work on a variety of data sets.

Class: I533 Systems & Protocol Security & Info Assurance
Instructor: Steve Myers
Synopsis: This course looks at systems and protocols, how to design threat models for them and how to use a large number of current security technologies and concepts to block specific vulnerabilities. Students will use numerous systems and programming security tools in the laboratories.

Class: I590 Intro to Business Analytics Modeling
Instructor: Doug Blocher & Rex Cutshall
Synopsis: In this course, we develop analytical models using simulation and optimization to analyze and recommend sound solutions to complex business problems. Models are discussed to solve sophisticated problems using various tools on spreadsheets, including Excel solver for linear, integer and genetic programming problems, probabilistic simulations, and risk analysis including statistical analysis of simulation models.

Class: I590 Network Science
Instructor: Yong-Yeol Ahn
Synopsis: Networks are everywhere. We can easily find network structure in many complex systems around us: our cells, brains, society, etc. The inherent generality of the network approach has allowed wide applications of network theory to flourish across diverse fields including biology, sociology, and epidemiology. The questions that we will address in the class are the following: Why do networks matter? What are the fundamental theories to understand the structure and dynamics of networks? How has network theory been applied to other fields? What are the frontiers of the research? We will explore key papers ranging from the fundamental theory to the various applications of network theory. This course will focus more on round-table discussion between students than presentation. Students will work on research projects in groups and finish a paper at the end of the class.


Class: I590 Perspectives in Data Science
Instructor: Kyle Stirling
Synopsis: This course will introduce multiple perspectives of the application of data science through recorded interviews with leaders in Silicon Valley companies, and map these to the practical skillsets of the data scientist.

Class: I590 Practice in Data Science
Instructor: Kyle Stirling
Synopsis: This course is for anyone who applies their expertise to the demands of data-driven decision making and analysis. This is not so much a course on theory as it is on the practice of delivering Data Science expertise. Even if you don’t call yourself a consultant, whenever you provide your expertise in Data Science, it is often in a situation where you do not have direct control over how it is used or implemented. This course enables you to learn the skills required to leverage your expertise and have the biggest impact in providing value and getting your expertise used, offering students the tools they need to apply their skills in Data Science during every stage of the consulting process. It will describe, with examples, how to conduct Data Science consulting effectively in projects.


Class: I590 Real World Data Science
Instructor: Joanne Luciano
Synopsis: The purpose of this course is to provide Data Science graduate students with practical experience applying their data science skill sets to real-world datasets. The first offering of this course in 2017 used a deidentified clinical trials dataset provided by Eli Lilly (agreement already in place with IU), but subsequent offerings could include public data or data provided by other industry partners. Students will be led through the full data analysis process of data preparation, model planning, model building, analysis, and communication of results. Students will meet (virtually or physically) daily to devise a plan.


Class: I590 SQL and NOSQL
Instructor: Ying Ding
Synopsis: A database is the central focus in data science to store and manage data. Relational databases have empowered major industries for decades and are still widely adopted. In our new era of Big Data, the database landscape is undergoing significant change. Many non-relational databases have become an important part of the enterprise data architecture of companies. Relational databases were developed long before the Internet and the Web to tackle the issues of centrally controlled data storage and management. NoSQL databases emerged with the rise of Internet and Web applications to connect companies with customers (i.e., online or mobile) and to develop agility to adapt to faster changes. The new challenges of being agile and being able to accommodate data variability/data integration drove enterprises to turn to NoSQL database technology. It is important for every data scientist to master the skills of current databases and know about the future of databases in a world of NoSQL. This course aims to provide a basic overview of the current database landscape, starting with relational databases and SQL, and moving to several different NoSQL databases, such as XML databases and MongoDB.


Department: School of Public and Environmental Affairs

Class: P507 Data Analysis and Modeling in Public Affairs
Instructor: TBD
Synopsis: V507 provides students of public and environmental affairs and related disciplines with a detailed, intermediate-level perspective on statistical concepts and techniques for analyzing and modeling complex systems. The course content includes estimating the parameters of such models based on existing data, testing hypotheses about these systems, and forecasting. The context of the course is the application of these techniques to problems and policies in public and environmental affairs. Multivariate regression analysis is one of the primary tools for statistical modeling for purposes of policy analysis, program evaluation, simulation of systems, and general forecasting. Thus, most of the course is devoted to single equation regression models and the extension of these models to a variety of situations. A prerequisite for the class is a graduate-level, introductory statistics course that includes coverage of the simple (two-variable) regression model and an introduction to multivariate regression.