The amount of data flowing into the university increases on a daily basis. Advanced technology such as the university’s new Krios cryo-electron microscope and facilities like the Yale Center for Genome Analysis have the capacity to produce vast quantities of information. Reflecting that dramatic expansion and the growing need for teaching and research in data science, the Department of Statistics has been renamed the Department of Statistics and Data Science (DS2).
“The basic ideas of statistics and data science are becoming almost a core competency for citizenship in this century,” said Alan Gerber, dean of social sciences for the Faculty of Arts and Sciences (FAS) and the Charles C. & Dorathea S. Dilley Professor of Political Science. “These are things everyone will find useful.”
Data science combines traditional statistics with machine learning, data mining, and high-performance computing. It includes the generation and gathering of information, its management and analysis, and its use in making decisions and setting policy in the sciences, social sciences, humanities, medical sciences, and the arts. The DS2 department and its associated undergraduate and graduate courses will reflect this broad scope, with departments like sociology and astronomy offering courses in the major.
Over the next several years, Yale’s FAS will make as many as nine new appointments as part of the DS2 initiative. Up to three new faculty members will have full DS2 positions and help form the core of the expanded department; up to six scholars will hold joint appointments in DS2 and another department within FAS, spanning the natural sciences and engineering, social sciences, and humanities. These will be researchers who are at the forefront of applying data science to their specific discipline.
“We are building bridges between many disciplines and departments,” says Harrison Zhou, who has been chair of the Department of Statistics since 2012 and is now chair of the new department. “Data science is about more than just big data. It is about collaboration, analysis, and policy.”
“Becoming DS2 is important because it means we’ll be more interdisciplinary,” says graduate student Natalie Doss. “Collaborating with people from very different perspectives means you can come up with better solutions to problems. Being a student from a data science department will help me on the job market: Universities and companies want people who can think about not just one aspect of data, but all the aspects,” she adds. Her own research looks at problems related to clustering, “which has applications in so many branches of science and social science.” Clustering involves grouping data in such a way that data in the same group (called a cluster) are more similar to each other than to those in other groups.
“The name change signals the department’s commitment to broadening its research strengths and aligning the department with statistics in real-world applications,” adds graduate student Derek Feng. Feng is working on social network analysis, which he calls “the poster child of interdisciplinary research, as it requires a combination of methodological, sociological, computational, and statistical innovations.” He studies “the relationship between positive and negative ties in a social network (friend/enemy), whether or not sociologists were right in their theories, and how we can perhaps do better in modeling such relations.”
Leaders in the effort to broaden Yale’s approach to data science were Zhou; Daniel Spielman, the Henry Ford II Professor of Computer Science and Statistics and Data Science; Sekhar Tatikonda, associate professor of electrical engineering and statistics; Tamar Gendler, dean of the Faculty of Arts and Sciences; T. Kyle Vanderlick, dean of the School of Engineering and Applied Science; and Dean Gerber.
The DS2 initiative comes at a time when Yale is deepening its commitment to data science on a number of fronts:
• The Yale Center for Research Computing, established in 2015, moved into expanded facilities on Science Hill in December, increasing its capacity to help faculty and staff address the complex challenges of storing and analyzing huge volumes of data.
• The Yale School of Medicine has several data science-related projects, including the Yale Center for Outcomes Research and Evaluation (CORE), the new Center for Biomedical Data Science, and the Yale Open Data Access (YODA) Project.
• Yale recently signed an agreement with Amazon Web Services that will open the door for Yale faculty to do cloud-based research using health information protected under HIPAA.
• The Digital Humanities Lab in the Yale University Library functions as a campus hub for digital humanities research and teaching.
“This is just the beginning,” Spielman says. “We don’t exactly know where data science is going to go, but we know we want to be out in front of it, leading the way.”