We are living through the datafication of our society, and its impacts are broad and deep. Everything from how much credit we can access, to how doctors diagnose and treat us, to what we pay for airline tickets, to our dating website matches, is influenced by our growing ability to draw conclusions from raw data. The rise of supercomputers, artificial intelligence, cloud computing and web-enabled devices has dramatically improved how we capture, analyze, share and make inferences from large, complex data sets. The value proposition for organizations is massive, which is why more and more are racing to embrace data-driven decision-making to stay competitive.
Over the last decade, several universities across Canada have invested significant resources into studying and applying the principles and processes of data science, and preparing students for what Harvard Business Review called “the sexiest job of the 21st century.”
One example is the University of British Columbia’s Data Science Institute (DSI), which is conducting leading-edge data science research with a particular focus on biomedicine. Of the 16 studies that have emerged from the DSI across its almost five-year history, 11 have focused on various aspects of understanding, diagnosing and treating human illnesses such as cancer, Alzheimer’s disease, chronic obstructive pulmonary disease and autism.
The biomedical focus was influenced by the expertise of its founding director, Raymond Ng, who has studied data mining for the last two decades, much of it focused on health informatics. Dr. Ng, who holds the Canada Research Chair in Data Science and Analytics at UBC, says the DSI funds each project for 18 months, and then investigators are typically able to attract additional funding from external sources such as the federal research granting councils and healthcare-focused agencies and foundations.
In one DSI project, two UBC professors – one in statistics, one in medicine – and a DSI postdoctoral fellow sought to better understand tuberculosis transmission in B.C. The province sees about 250 cases of TB per year. In 2018, they began collaborating with the BC Centre for Disease Control, gaining access to centralized epidemiological, demographic and clinical data for every individual diagnosed with TB in the province from 2005 to 2014 – approximately 2,300 cases. The goal of this study is to explore how to better predict outbreaks and undiagnosed infections.
The researchers have already developed a computer algorithm that can correctly flag positive TB results in lab reports more than 90 percent of the time. “It can be argued that that’s better than a human. … We’re just testing the water to see how good the natural language processing techniques are,” says Dr. Ng, referring to the ability of their computer program to understand human language.
Data science for the social good
Another side of the DSI’s work involves applying data science methods to solve societal problems. For the last three summers it has run Data Science for Social Good, a 14-week program in which 16 selected undergraduate and graduate students from various disciplines work together in small groups in partnership with public organizations, applying data analysis techniques to social and environmental issues. Past projects have focused on helping energy regulators be more responsive to Indigenous communities, informing municipal strategy on electric vehicles, and making biodiversity data more accessible.
At the University of Waterloo, conducting data science for social good is the priority of Lukasz Golab, director of the Data Science Lab in the department of management sciences, and a Waterloo professor since 2011 who is cross-appointed to the school of computer science. Dr. Golab, who holds the Canada Research Chair in Data Analytics for Sustainability, is exploring how we can employ intelligent infrastructure and data analytics to reduce our use of water and energy, and adopt more green technologies. Since he established the lab in 2015, he has expanded its definition of social good beyond the parameters of sustainability to include gender equity and public health.
“There have been many success stories for using data to solve business problems and monetize applications, so why can’t we create new success stories about using data for social good?” says Dr. Golab. “To me, it’s an interesting challenge to take problems affecting society and understand the role of data science in solving them.”
One subject that has long interested Dr. Golab is the persistent gender gap in engineering programs. Women currently make up just 19 percent of engineering undergraduates in Canada, and most universities in the country are aiming to boost those numbers, including U of Waterloo. Dr. Golab saw that he could use data science to help advance the university’s efforts to attract and retain female engineering students.
In a study completed in 2018 involving four student researchers, Dr. Golab used data science methods to analyze more than 30,000 applications to U of Waterloo’s undergraduate engineering programs – particularly the section where applicants explain why they want to study engineering. The researchers used syntactic and semantic analysis software to identify differences between the male and female applicants in their motivation, interests and background. “Hopefully, the university can use these results to make our outreach programs more effective,” says Dr. Golab.
Queen’s University, in partnership with McGill University, is investigating another side of data science through the Conflict Analytics Lab, a research-based consortium applying data science and machine learning to dispute resolution. The lab “brings together more than 30 lawyers, technology experts and the business community to provide both citizens and businesses with the tools they need to resolve small cases in a fair way,” says Samuel Dahan, a Queen’s law professor and director of the lab, in an article on the Queen’s website.
One project has acquired data from 3,000 employment law cases, with the aim of creating an application to help laid-off Canadian employees receive fair severance from their employers. “This is an exciting interdisciplinary collaboration, harnessing big data to help these individuals better understand their rights and determine their next steps,” says Dr. Dahan.
University-industry partnerships
Of course, the immense power of data science also holds much potential for the world of business. Companies of all stripes are partnering with universities to help them make sense of their vast amounts of data so that they can operate more efficiently, identify business trends, and better anticipate and respond to their customers’ needs. In Quebec, Université de Montréal, HEC Montréal and Polytechnique Montréal joined together in 2016 to establish the Institute for Data Valorization (known by its French acronym, IVADO). Supported by $94 million in funding from the Canada First Research Excellence Fund, the institute helps create connections between data science researchers and industry partners.
IVADO research projects – which usually involve the participation of undergraduate and graduate students – are large in number and variety. There are 43 projects focused on fundamental research in areas such as the links between AI and neuroscience, energy efficiency and personalized medicine. There are a further 250 collaborative research endeavours solving the problems and supporting the operations of organizations in four main sectors: energy, transportation and logistics, business and finance, and health.
Since its founding, IVADO has evolved into a prime mover in Montreal’s dynamic artificial intelligence research cluster. It links six world-class research centres and some 20 academic partners with more than 100 companies, institutions and government agencies, leading to partnerships that have facilitated the research of 1,400 scientists globally in data analytics, machine learning and operations research. The institute is run by a staff of 40 and manages a quarter billion dollars in funds made available by the federal and provincial governments, industry members and the three founding universities. In addition to being a research broker, IVADO also offers multiple scholarships, funds three research chairs in diversity and equity in data science, and delivers accessible community workshops and online courses in data science.
“Our society is undergoing a digital transformation, and we need to know how to extract value from data,” says Gilles Savard, CEO of IVADO. “We need a lot of people who are fluent in digital algorithms, because that’s what will be required by the market.”