September 28, 2018

Duke undergraduates create energy datasets and tools with wide-ranging impact

Nicholas Institute for Environmental Policy Solutions

Nearly a third of humanity lacks reliable electricity. This summer, Duke student teams deployed cutting-edge data analysis techniques to aid the search for solutions to this global challenge.

The students were part of Duke University's Data+ program, which organizes teams of undergraduates to explore new data-driven approaches to interdisciplinary challenges. Guided by Duke faculty, students learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science.

By the end of ten weeks, this summer's 24 Data+ teams had produced impressive data projects with real-world impact on a range of sectors. Two teams focused on energy:

Energy Infrastructure Map of the World

Varun Nair (E'21: Mechanical Engineering), Tamasha Pathirathna (T'20: Computer Science), Xiaolan You (T'20: Computer Science/Statistics), and Qiwei Han (PhD '18: Materials Science, MS '18: Computer Science) created a dataset of electricity infrastructure that can be used to automatically map the transmission and distribution components of the electric power grid using machine learning. This is the first publicly available dataset of its kind, and will be analyzed during the academic year as part of a Bass Connections team.

Maximizing Data Communication for Faster Energy Access

Brooke Erickson (T'20: Economics/Computer Science), Alejandro Ortega (T'19: Mathematics and Computer Science), and Jade Wu (UNC '20: Computer Science) developed open-source tools for automatic document categorization, PDF table extraction, and data identification. While their tools can be used for a range of applications, they were designed to be used with Power for All's Platform for Energy Access Knowledge, and students frequently collaborated with professionals from that non-governmental organization.

Student participants on both teams not only honed data analysis skills but also learned to be more effective collaborators. Tamasha Pathirathna (T'20) shares, "Over the course of the project, I learned the importance of sharing ideas and listening to my team members. I also learned how heavily machine learning models rely on a good dataset to get significant results."

The teams met in an innovation space called "The Generator," which was redesigned in 2017 as a space outfitted for interdisciplinary learning and team-based problem-solving, with WiFi-based screencasting, modular video screens, and furniture that is easily rearranged. 

Opportunities to produce tangible results and link research, policy, and real-world impact helped attract students to the program. "I hope that our research will help to close the information gap between policymakers and energy researchers by creating better research databases," says Brooke Erickson (T'20). Qiwei Han (PhD/MS '18) agrees: "I hope our dataset and research can provide more resources for researchers and scientists in the machine learning and energy fields. I also hope our results can attract more people to focus on or pay attention to machine learning and energy fields to better empower our society."

The students also recognized machine learning's potential to address complex issues like energy access. The International Energy Agency defines energy access as "a household having reliable and affordable access to both clean cooking facilities and to electricity, which is enough to supply a basic bundle of energy services initially, and then an increasing level of electricity over time to reach the regional average." So how does machine learning relate to household energy access? Varun Nair (E'21) explains:

One of the major takeaways from my experience this summer in Data+ was realizing how important of an issue energy access is and how difficult of a problem it can be to tackle. Fourteen percent of the world's population is still unelectrified, however, most of those populations live in areas that are remote, poor, or undeveloped. Understanding where those people are through the development of machine learning algorithms is a task that is easier said than done, as my team and I found out. The more we learned, the more unknowns we found. Yet, none of these questions are impossible to answer – it just takes diligent work. This is an important principle that I think our team learned and future research teams should keep in mind as they continue.

Energy access has global implications for economics, health, and more, but Nair also recognizes more personal impacts: "There are hundreds of millions of people who lack access to a critical resource [energy] that most of us take for granted. However, if our work results in just one more person with access to electricity, it will have been worthwhile."

Both teams' research efforts contribute to the goals of Duke's Energy Access Project, a new research and policy effort that aims to address the challenges of increasing access to modern energy solutions for underserved populations around the world. Key Duke collaborators in this effort include the Nicholas Institute for Environmental Policy Solutions, the Duke University Energy Initiative, the Sanford School of Public Policy, Bass Connections, and the Nicholas School of the Environment.

The students hope that their work will be useful for solving a variety of challenges beyond energy access. "One of the major takeaways for me was working on a real software project on data problems that are ubiquitous. The work we did in this project is important to improving energy access, but it's also relevant to other fields where data needs to be accessible," affirms Alejandro Ortega (T'19).

Dr. Kyle Bradbury is managing director of Duke's Energy Data Analytics Lab, a collaborative effort of the Duke University Energy Initiative (which houses it), the Information Initiative at Duke (iiD), and the Social Science Research Institute (SSRI). As project lead for "Energy Infrastructure Map of the World," Bradbury shares: "Our Data+ team worked hard this summer taking the critical first steps in a research project with the potential to impact energy transitions decisions and research."

He adds, "We've already started using the dataset the team created to investigate how we can use it to automate the process of identifying electricity transmission infrastructure in satellite imagery through machine learning. I'm thankful for and proud of the Data+ team for their important contribution and the positive spirit they brought to the Data+ community."

The linkages between Data+, Bass Connections, and the Energy Data Analytics Lab have had a formative impact on students' career aspirations in data science. Xiaolan You (T'20) shares: "I first became interested in applications of machine learning as part of the Bass Connections team for the Energy Data Analytics Lab, just before this summer. I [became] immersed into it further during Data+, and have recently decided to go to graduate school to learn more in this area."

Data+ is housed by the Information Initiative at Duke. This summer program is just one example of how Duke University is developing interdisciplinary thinkers capable of applying cutting-edge data science techniques to some of the world's most formidable challenges... including energy. What's more, students have the opportunity to innovate and make meaningful contributions to research, policy, and implementation even before they leave Duke's campus.

Submit a Data+ project

Data+ is currently accepting proposals from faculty and partners for 2019 Data+ projects. The deadline for completing this application is November 5th, 2018, at 5 p.m. Please email your completed application to Ariel Dawn. If you would like help in developing your proposal, please contact Paul Bendich.

The Energy Initiative's Energy Data Analytics Lab is seeking partners for future projects. We'd welcome a chance to talk with professionals about how a Duke University team could contribute to upcoming energy data projects at your company or agency. Participation in our student teams is highly competitive among Duke undergraduates with strong quantitative skills.

Contact Energy Data Analytics Lab managing director Dr. Kyle Bradbury for more information.