Data Science Updates is the University of Wisconsin-Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute, powered by American Family Insurance, and the Data Science Hub.
April 16, 2025
Earth Day, held each year on April 22, is a global event with local roots. In 1970, Wisconsin Senator Gaylord Nelson envisioned an event that would galvanize energy on college campuses and raise public awareness of air and water pollution. Fifty-five years later, Earth Day events around the globe are calling for innovative solutions to environmental challenges. Data science can play an important role in understanding these challenges and finding solutions. At UW–Madison, data science research, tools, and learning opportunities are helping realize the vision of Earth Day.
Planning for the 3rd annual Machine Learning Marathon (MLM25) is underway! This fall’s 12-week hackathon, September 11–December 11, will bring together teams of all skill levels to tackle ML/AI challenges drawn from research, industry, and open-source projects. We’re looking for organizers and volunteers to shape this year’s projects, advise teams, and provide general administrative support. Want to get involved? Fill out the MLM25 organizing form by May 1. More details, including the event schedule, are on the MLM25 webpage. Check back in August for registration information!
Data Science Hub, Institute Thank Student Employees
Student employees are an invaluable part of the Data Science Hub and Data Science Institute. Emma and Olivia work behind the scenes on this newsletter, workshops, events, websites, office tasks, and other efforts that connect the community with data science at UW–Madison. Emma, a junior majoring in biochemistry, has been working at DSI since her freshman year. Olivia, a first-year law student, has worked at the Hub since September. Thank you, Emma and Olivia! We appreciate your many contributions to data science at UW.
April 18, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop is for researchers interested in using open-source tools to analyze data. Participants will apply the skills they learned in the Linux Basics workshop to identify single nucleotide polymorphisms (SNPs) from next-generation sequencing (NGS) data using command-line open-source software on Linux, and will explore the results with graphical visualization tools.
April 19, 11:00 a.m. - 12:30 p.m.; Zoom. From “big data” to ChatGPT, text analysis and large language models are changing how we learn and research. Join UW-Madison Libraries to learn about Constellate, a platform for learning and performing text analysis, building datasets, and accessing analytics course materials suitable for self-paced learning or classroom use.
April 28 - April 30, 9:00 a.m. - 12:30 p.m.; Zoom. This online workshop covers the basics of natural language processing, API usage, data preparation, document/word embeddings, topic modeling, Word2Vec, transformer models using Hugging Face, and ethical considerations. Participants should have experience with Python. Students and researchers working in the digital humanities are especially encouraged to attend!
May 2, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop provides a hands-on introduction to software and analysis pipelines for RNA-Seq. Participants will learn the computational process that takes raw data through high-level analysis. The workshop focuses on advanced analysis using open-source RNA-Seq software in a Linux command-line environment.
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Tuesdays and Thursdays from 2:30-4:30 p.m. CT. To participate, join the Slack workspace at data-science-hubgroup.slack.com.
TODAY April 16, 12:30 p.m. - 1:30 p.m.; Orchard View Room, Discovery Building & Zoom. Dr. Weijie Su, associate professor in the Wharton Statistics and Data Science Department, advocates for developing rigorous statistical foundations for LLMs. Dr. Su will illustrate how statistical insights can benefit LLM development and applications, address pressing challenges in LLMs, and illuminate new research avenues for the broader statistical community to advance responsible generative AI research.
If you have not registered to attend in person, please refrain from taking pizza, as catering is ordered in advance.
TODAY April 16, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Zhimei Ren, assistant professor of statistics and data science at the University of Pennsylvania, will introduce a general framework that boosts the power of the e-BH procedure without sacrificing its false discovery rate (FDR) control under arbitrary dependence. Extensive experiments show that the proposed method significantly improves the power of e-BH while controlling the FDR.
April 17, 12:00 p.m. - 1:30 p.m.; Zoom. Join a panel of instructors as they review the opportunities and challenges they encountered this past year using generative AI in the classroom. This will be an open conversation about lessons learned, focusing on the future of using generative AI in teaching and learning.
April 17, 5:00 p.m. - 8:30 p.m.; campus. Y Combinator, a technology startup accelerator and venture capital firm, is excited to talk with UW–Madison students studying technical subjects like CS, math, and engineering who may not have thought much about starting a company but are curious to learn more. A reception will follow the talk. Registration is required.
April 18, 10:00 p.m. - 11:00 p.m.; Orchard View Room, Discovery Building. Dr. Bryan Plummer, assistant professor of computer science at Boston University, will discuss training channel-adaptive models, taking advantage of self-supervised techniques, and mitigating the effect of label noise. These topics matter given the high levels of noise in scientific datasets and the significant energy and storage burdens involved.
April 18, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences. Dr. Amey Bhangale, assistant professor of computer science at the University of California, Riverside, will present his recent work on advances in solving constraint satisfaction problems (CSPs). Dr. Bhangale will highlight applications of the developed techniques in areas such as complexity theory, property testing, and additive combinatorics.
April 21, 12:00 p.m. - 1:00 p.m.; 726 WARF. Dr. Nigel Paneth, professor in the Departments of Epidemiology & Biostatistics and Pediatrics & Human Development at Michigan State University, will discuss the impacts of discoveries in molecular genetics. While the application of molecular genetics to microbial agents has made important contributions, the claim that human genomic information would lead to large-scale improvements in health has not been proven. Improvements in public health have mostly been the result of epidemiological studies.
April 21, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Jingyi (Jessica) Li, professor of statistics at UCLA, will introduce a novel approach called Synthetic Null Parallelism (SyNPar), which controls the false discovery rate (FDR) in high-dimensional feature selection while preserving the original data. SyNPar is straightforward to implement, can be applied to various statistical models, and outperforms state-of-the-art methods in FDR control, power, and computational efficiency.
April 24, 11:00 a.m. - 11:50 a.m.; 901 Van Vleck Hall. Every day, your phone computes the Fourier Transform (FT) millions of times using the Fast Fourier Transform (FFT). Dr. Shamgar Gurevich, math professor at UW-Madison, takes a new approach and views the FFT as the composition of two operators via a third, lesser-known arithmetic space, offering a natural answer to how we understand functions on the set {0, 1, …, N−1}.
April 24, 12:00 p.m. - 2:00 p.m.; 6191 and 4246, Helen C. White Hall. Join the Information School Speaker Series with Dr. Shion Guha, assistant professor at the University of Toronto. Dr. Guha will present his work on AI decision-making in public higher education systems through the lens of human-centered data science. Lunch will be served in 4246 following the talk.
April 25, 9:30 a.m. - 4:15 p.m.; Union South. The Undergraduate Symposium is an annual event where undergraduate students from all areas of study showcase and celebrate their research, scholarly pursuits, service-learning, community-based research, art, and creativity. The symposium is also a forum for the campus community to learn about and engage with undergraduate work.
April 25, 2:25 p.m. - 3:25 p.m.; 901 Van Vleck Hall. Dr. Bernardo Cockburn, math professor at the University of Minnesota, will explain that the continuous and discontinuous Galerkin methods for ordinary differential equations discretize the time derivative in the same form. He will discuss the consequences of this result and his ongoing work on extending it.
May 1, 8:00 a.m. - 1:00 p.m.; Union South & Zoom. Register for and join UWEBC at the Wisconsin Digital Symposium, a high-impact event for tech and business leaders navigating today’s fast-moving digital landscape. From AI-powered automation to product-led transformation, you’ll gain strategies that make a difference, plus insights from keynote speakers Dr. Michael Proksch, global AI expert and Chief Scientist at AccelerEd, and Cortney Thompson Rowan, EVP of Strategy & Design at Delve.
May 2, 2:00 p.m. - 3:00 p.m.; 175 Science Hall. The Geospatial Data Science Seminar will host Dr. Gengchen Mai, Director of the Spatially Explicit Artificial Intelligence (SEAI) Lab, University of Texas at Austin. Dr. Mai will present several recent works from the SEAI Lab on spatial representation learning (SRL), including various location encoding models, an SRL deep learning framework (TorchSpatial), and a geo-foundation model (GAIR).
May 7, 9:00 a.m. - 11:00 a.m.; 1145 Discovery Building. Join the ML+X community to discuss "Evaluating PubMed abstracts using LLMs with Retrieval Augmented Generation (RAG)." ML+Coffee is also seeking volunteers to discuss or get help with an ongoing project, showcase an ML/AI tool, or discuss an ML/AI-related paper. No formal presentation is required; the event prioritizes open dialogue and casual discussion. Use the ML+Coffee volunteer form to sign up for May 7.
May 30, Discovery Building. This retreat is required for CIBM trainees and Biomedical Data Science PhD students in year two and later. Hear from keynote speakers David Page, Professor and Chair of Biostatistics & Bioinformatics at Duke University, and Sriraam Natarajan, Professor of Computer Science at the University of Texas at Dallas. Showcase your research and network with our community by presenting a poster. Register for the retreat and submit poster abstracts by May 9th.
June 2-6, 2025; Fluno Center & Zoom. HTC25 brings together researchers, campuses, science collaborations, facilitators, administrators, government representatives, and professionals interested in high throughput computing to:
- Engage with the throughput computing community, including the OSG Consortium, Center for High Throughput Computing, HTCondor staff, PATh and Pelican teams, and others contributing to HTC
- Be inspired by presentations and conversations with community leaders and contributors sharing common interests
- Learn about HTC and new developments to advance your science, your collaboration, or your campus
DATA VISUALIZATION OF THE WEEK
How is the coming Humanity-Plus-AI future likely to affect key aspects of human capacities and behavior by 2035, compared with a time when humans did not operate with advanced AI tools? A total of 301 respondents were asked to predict, for each of 12 capacities and behaviors, whether the overall change for humans by 2035 would be mostly positive, mostly negative, fairly evenly positive and negative, or little to none. Experts expressed concerns about decreasing attention spans, lower emotional intelligence, and increased reliance on AI, among other changes.
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub. This newsletter was originally created by the Data Science Hub and published as Hub Updates.
Use our submission form to send us your news, events, opportunities and data visualizations for future issues.