Data Science Updates is the University of Wisconsin-Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute, powered by American Family Insurance, and the Data Science Hub.
April 2, 2025
 

6th Annual Data Science Research Bazaar Explores the Promise and Limits of AI and ML

Thank you to the 150 people who participated in the 2025 Data Science Research Bazaar, held at the Discovery Building on March 19-20. Highlights included 58 posters and exhibitor tables at the poster session, 21 lightning talks, and inspiring opening speakers and closing panelists. With a focus on opportunities and boundaries in artificial intelligence and machine learning, this year’s event provided time and space for conversations about how these technologies are impacting research. The Research Bazaar is hosted by the Data Science Hub and a planning committee representing departments and programs across campus. Video recordings of selected sessions will be available soon at the Research Bazaar website.

Apply for an Open Source Summer Internship by April 7

The Open Source Program Office (OSPO) internship program provides opportunities for UW–Madison and Madison College students to participate in open-source software projects. Undergraduate and graduate students from all majors are eligible. Interns learn crucial skills related to managing open-source projects and growing software user communities by either working with a mentor or developing a proposal for a self-directed project. In addition to an initial training session, interns participate in weekly check-ins with OSPO. Learn more and apply at the OSPO website. Applications close Monday, April 7.

Text Analysis Workshop

Join the Text Analysis Carpentries workshop on April 28-May 1, 9 a.m.-12:30 p.m., for a practical introduction to natural language processing (NLP). This online workshop covers NLP basics, API usage, data preparation, document/word embeddings, topic modeling, Word2Vec, transformer models using Hugging Face, and ethical considerations. Participants should have experience with Python. Students and researchers working in the digital humanities are especially encouraged to attend! Learn more and register on the Text Analysis workshop webpage.

WORKSHOPS AND TRAININGS

Microbiome Analysis Using QIIME2

April 4, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop will cover amplicon-based microbiome analysis using QIIME2. The workshop will include lectures and hands-on training that take a raw dataset through to publication-quality statistics and visualizations. Participants should have a basic knowledge of the Linux command line (bash shell) and a basic understanding of statistical methods. To register, visit the QIIME2 workshop page.

R Programming: Organizing Your Projects with GitLab + RStudio

April 4, 10:00 a.m. - 12:30 p.m.; Zoom. Do you have lots of versions of files/scripts in a folder and want to better organize them? You need a formal version control software tool called Git! This workshop teaches learners to use RStudio and Git to keep track of file versions, switch back to old versions of a file, host version-controlled files on the campus GitLab instance, and synchronize files between different computers. No command line needed! A working knowledge of R and RStudio would be helpful for you to get the most out of this session. To register, visit the R Programming: Organizing Your Projects with GitLab + RStudio webpage.

Introduction to Databases with SQL

April 7, 5:30 p.m. - 7:00 p.m.; 2257 College Library. Structured Query Language (SQL) is a programming language used to interact with databases and create datasets. SQL is a useful tool for data analysts, software developers, and researchers to create and structure data for analysis and modeling. This workshop teaches basic SQL queries and syntax, and how to create tables, perform analysis, and export tables. To register, visit the SQL calendar listing.

Getting Started with High Throughput Computing

April 8, 10:30 a.m. - 12:00 p.m.; Orchard View Room, Discovery Building & Zoom. New to high throughput computing or need a refresher? This workshop will introduce the Center for High Throughput Computing's (CHTC) High Throughput Computing system, components of a job, and how to handle data. Participants will practice transferring data, submitting jobs, and debugging simple job errors. A basic understanding of the command line is recommended. To register, visit the Getting Started with High Throughput Computing calendar listing.

R1: Basics of Data Management with R

April 8, 5:30 p.m. - 7:30 p.m.; 2257 College Library. R is a free, open-source software environment and programming language for statistical computing and graphics. This workshop will introduce R and RStudio. Participants will learn about basic syntax, vectors, matrices, and data frames, and how to import, work with, chart, and plot data. To register, visit the R1 calendar listing.

Analysis of QIIME2 Biome Results

April 11, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop is a follow-up to the April 4 “Microbiome analysis using QIIME2” workshop. The R package “qiime2R” will be used to convert data files exported by QIIME2 for further analysis and graphical exploration. Results from the previous workshop will be used to demonstrate basic analysis of microbiota data in R to determine if and how communities differ by variables of interest. Prerequisites include a basic understanding of R, statistical methods, and QIIME2. To register, visit the Analysis of QIIME2 Biome Results workshop webpage.

Intro to NGS Data Analysis

April 18, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop is aimed at researchers interested in using open source tools for analyzing Next Generation DNA Sequencing (NGS) data. After learning essential techniques of the Linux operating system with the bash shell in the Linux Basics workshop, students will apply these newly acquired skills and identify single nucleotide polymorphisms from NGS data using Linux command-line driven open source software and explore the results using graphical visualization tools. To register, visit the Intro to NGS Data Analysis workshop webpage.

Using Constellate for Text Analysis and LLMs

April 19, 11:00 a.m. - 12:30 p.m.; Zoom. From “big data” to ChatGPT, text analysis and large language models are changing the way we learn and do research. Join UW-Madison Libraries to learn about Constellate, a platform for learning and performing text analysis, building datasets, and accessing analytics course materials suitable for self-paced learning or classroom use. To register, visit the Using Constellate for Text Analysis and LLMs webpage.
 
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Tuesdays and Thursdays from 2:30-4:30 p.m. CT. To join Coding Meetup, join data-science-hubgroup.slack.com
 
 

SEMINARS AND EVENTS

SILO Seminar: Reinforcement Learning and Bayesian Optimization for Nuclear Fusion

TODAY April 2, 12:30 p.m. - 1:30 p.m.; Orchard View Room, Discovery Building & Zoom. Dr. Jeff Schneider, computer science research professor at Carnegie Mellon University, will discuss nuclear fusion, which holds the promise of limitless clean energy and would solve many of the world’s challenges. AI approaches have become increasingly capable, making them an appealing option for nuclear fusion. However, reinforcement learning (RL) is less successful in stochastic and partially observed problems, and RL and Bayesian optimization struggle when given only a few experiments. Dr. Schneider will present several algorithmic innovations to address these issues.

If you have not signed up to attend in person, please refrain from taking pizza, as catering is arranged beforehand. For more information, view the full abstract on SILO's upcoming talks page.

Using Bayesian Nonparametric Ideas and Spatial Statistics for Earth Sciences Applications

TODAY April 2, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Veronica Berrocal, statistics professor at UC Irvine, will present two papers that incorporate and revisit ideas proposed in the Bayesian nonparametrics literature in a spatial context to address problems in Earth Sciences. First, Dr. Berrocal will discuss how soil moisture influences land processes and will propose a statistical model to learn about the scale of dependence and variability in the spatial process from observed data. Second, Dr. Berrocal will share useful information to design air pollution mitigation strategies. To read the full abstract, visit the Using Bayesian Nonparametric Ideas and Spatial Statistics for Earth Sciences Applications calendar listing.

Building Novel Abstractions for a Declarative Cloud

April 3, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences & Zoom. Tianyu Li, a final-year PhD student at MIT, will describe Resilient Composition, a new abstraction that ensures fault tolerance in applications composed from independent, distributed components. The key insight is to rely on atomic, fault-tolerant “steps” that span component operations and messages. Li will present DARQ, an efficient execution engine for such steps, and Distributed Speculative Execution, a transparent optimization that dramatically reduces the overhead of Resilient Composition. To learn more, visit the Building Novel Abstractions for a Declarative Cloud calendar listing.

iSchool Speaker Series - Kathleen Creel

April 3, 12:00 p.m. - 1:00 p.m.; 6191 and 4246, Helen C. White Hall. Join Dr. Kathleen Creel, assistant professor of philosophy and computer sciences at Northeastern University, for a talk on algorithmic monoculture and systemic exclusion. Dr. Creel will formalize a measure of outcome homogenization, describe experiments that demonstrate that it occurs, and present an ethical argument for why and in what circumstances outcome homogenization is unfair. Lunch, on a first-come, first-served basis, will follow the talk in 4246. To learn more, visit the iSchool Speaker Series calendar listing.

Do You Interpret Your t-SNE Embeddings Correctly?

April 4, 12:00 p.m. - 1:00 p.m.; Auditorium, Genetics-Biotechnology Center Building. Dr. Yiqiao Zhong, assistant statistics professor at UW-Madison, will present evidence that embedding maps of t-SNE, UMAP, and LargeVis can exhibit discontinuity points, leading to unintended topological distortions. Dr. Zhong will introduce the leave-one-out surrogate that captures the properties of embedding maps, two types of discontinuity patterns, and two methods to mitigate these issues that help detect out-of-distribution samples in deep learning and assist hyperparameter tuning in single-cell data analysis. To learn more, visit the Biostatistics and Medical Informatics Department Seminar calendar listing.

Steering Machine Learning Ecosystems of Interacting Agents

April 7, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences. Meena Jagadeesan, PhD student in computer science at UC Berkeley, will discuss her research on characterizing and steering ecosystem-level outcomes. Jagadeesan takes an economic and statistical perspective on ML ecosystems, tracing outcomes back to the incentives of interacting agents and to the ML pipeline for training models. To learn more, visit the Steering Machine Learning Ecosystems of Interacting Agents calendar listing.

Geography of Surface Bundles Over Surfaces

April 7, 3:00 p.m. - 3:50 p.m.; B131 Van Vleck Hall. Dr. Inanc Baykur, professor at the Department of Mathematics and Statistics, University of Massachusetts, presents a problem for surface bundles: determining for which fiber and base genera examples with positive signature exist. Dr. Baykur will describe his team's recent progress, which resolves the problem for all but 19 cases, and explain how some open cases relate to major questions in symplectic geometry. To learn more, visit the Group Actions and Dynamics Seminar calendar listing.

Climate Impacts on Infectious Disease: Past, Present and Future

April 8, 1:00 p.m. - 2:00 p.m.; Zoom. The climate is expected to play a driving role in the transmission of many infectious diseases, with implications for climate change. However, the impact of the climate on directly transmitted diseases, such as COVID-19 and influenza, is less well studied. Join Dr. Rachel Baker, John and Elizabeth Irving Family Assistant Professor of Climate Health at Brown University, for a discussion of how statistical inference, mechanistic modeling, and climate change projections can characterize the climate drivers of directly transmitted disease outbreaks and explore the future risk from climate change while accounting for variability and uncertainty. To learn more, visit the CPEP Seminar event webpage.

Leveraging Gale Digital Scholar Lab for Engaging Narratives

April 8, 1:00 p.m. - 2:00 p.m.; Zoom. Join the Libraries for a webinar featuring Dr. Sarah Ketchley, senior digital humanities specialist, Egyptologist, and faculty member at the University of Washington. Dr. Ketchley will explore how to leverage the tools in Gale Digital Scholar Lab to create StoryMaps, a digital tool that combines maps, multimedia, and narrative to create interactive and engaging storytelling experiences. To register, visit the Leveraging Gale Digital Scholar Lab for Engaging Narratives event listing.

On Virtually Unwrapping the Herculaneum Scrolls

April 8, 4:00 p.m. - 5:30 p.m.; 160 Elvehjem Building. Discover how cutting-edge technology is unlocking history’s lost texts. The School of Computer, Data & Information Sciences, and the Center for the History of Print and Digital Culture welcome Brent Seales, UW–Madison computer sciences alumnus and University of Kentucky professor, to share his pioneering work on virtual unwrapping—an advanced AI-driven technique that is reading the Herculaneum scrolls for the first time in 2,000 years. To register, visit the RED talk webpage.

CHE Environmental Colloquium – Mechanical Angels: Satellites, Cosmology, and Capitalism

April 9, 12:00 p.m. - 1:00 p.m.; 140 Science Hall. Join Dr. Frédéric Neyrat, English professor at UW-Madison, for a discussion about satellites. Satellites' orbits form an exo-geological layer, a deterritorialization of the planet Earth, where the limits and possible futures of technology are tested. Dr. Neyrat will shed light on these futures: either satellites will remain essential elements for the development of AI capitalism (surveillance, control, and prediction of behavior) or they might become Earth’s peripheral guardian angels. To learn more, visit the CHE Environmental Colloquium event webpage.

UW Carbone Cancer Center's Annual Research Retreat

April 10, 9:00 a.m. - 5:30 p.m.; Health Sciences Learning Center Atrium. Join trainees, administrators, researchers, clinicians, and faculty from the Carbone Cancer Center to be the driving force for research, prevention, and treatment initiatives critical to defeating cancer in Wisconsin and around the world. To register and view the agenda, visit the research retreat webpage.

Autoregressive Networks

April 14, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Eric Kolaczyk, director of the Program in Statistics at Boston University, proposes an autoregressive framework for modelling dynamic networks with dependent edges. It encompasses models that accommodate transitivity, density dependence, and other stylized features observed in real network data. Because of the many parameters, initial maximum likelihood estimation may converge slowly. An improved estimator is proposed based on an iterative projection to reduce parameter interference. To learn more, visit the Autoregressive Networks calendar listing.

Surveillance States: How AI Reshapes Borders, Refugee Lives & Human Rights

April 14, 4:00 p.m. - 5:00 p.m.; 206 Ingraham Hall & Zoom. Investigative journalist Lydia Emmanouilidou will explain how AI and surveillance technologies are transforming border management and impacting the lives of refugees. From her reporting experience in refugee camps in Greece and borders in Europe and the U.S., Emmanouilidou will reveal how cutting-edge technologies are used to "manage" vulnerable populations. To register, visit the Surveillance States event webpage.

2025 Geospatial Summit

April 16, 10:00 a.m. - 3:30 p.m.; Gordon Event Center & Zoom. The Geospatial Summit is an annual event for anyone interested in geospatial data and its many applications. Learn how maps and data can change Wisconsin and the world. The Summit will include speakers Dr. David Hart of Wisconsin Sea Grant, Janet Silbernagel of Silvernail Geodesign, and Christian Andresen from the UW-Madison Geography Department sharing how they incorporate geospatial data, tools, and technologies into their work. There will also be a career panel and career fair featuring representatives from local GIS companies and agencies. The Geospatial Summit is FREE and open to all. To register, visit the Geospatial Summit website.

Boosting e-BH via Conditional Calibration

April 16, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Zhimei Ren, assistant professor of statistics and data science at the University of Pennsylvania, will introduce a general framework that boosts the power of e-BH without sacrificing its false discovery rate (FDR) control under arbitrary dependence. Extensive numerical experiments show that the proposed method significantly improves the power of e-BH while continuing to control the FDR. To read the full abstract, visit the Statistics Seminar calendar listing.

Exploring AI in Teaching: Lessons Learned

April 17, 12:00 p.m. - 1:30 p.m.; Zoom. Join a panel of instructors as they review the opportunities and challenges they encountered this past year using generative AI in the classroom. This will be an open conversation about lessons learned with a focus on the future of using generative AI in teaching and learning. To register, visit the Exploring AI in Teaching calendar listing.

Y Combinator Will Visit UW–Madison on April 17

April 17, 5:30 p.m. - 6:30 p.m.; Discovery Building. Y Combinator, which helps startups succeed by supporting founders at their earliest stages, will visit UW–Madison. Current UW–Madison students are invited to their presentation and reception. This event is for students studying technical subjects like computer sciences, data science, math, and engineering who may not have thought much about doing a startup, but who are curious to learn more. Students can also sign up for one-on-one meetings with Y Combinator in the afternoon. Pre-registration is required. Learn more and sign up at the Y Combinator website.

Registration open for Throughput Computing Week 2025 (HTC25)

June 2-6, 2025; Fluno Center at the University of Wisconsin-Madison & Zoom. HTC25 brings together researchers, campus, science collaborations, facilitators, administrators, government representatives, and professionals interested in high throughput computing to:
  • Engage with the throughput computing community, including the OSG Consortium, Center for High Throughput Computing, HTCondor staff, PATh and Pelican teams, and others contributing to HTC
  • Be inspired by presentations and conversations with community leaders and contributors sharing common interests
  • Learn about HTC and new developments to advance your science, your collaboration, or your campus
To learn more and register, visit the HTC25 webpage.

Check out more data science seminars and events at the Data Science @ UW website.

 

COMMUNITY EVENTS

 

Posit at ShinyConf 2025: AI, Advanced Shiny, and More

April 9-11, 2025; Zoom. Appsilon's ShinyConf is excited to host speakers who will unveil advancements and practical applications of Shiny. Attendees will get hands-on experience with AI-driven Shiny apps, Shiny for Python, and modern UI techniques. Attendees will learn from keynotes, real-world case studies, and expert sessions tailored to developers, scientists, and innovators. To register, visit the ShinyConf 2025 webpage.

2025 North American School of Information Theory

Register by April 25- The University of Minnesota Twin Cities will host the 2025 North American School of Information Theory from June 16 to June 20 in Minneapolis. The school will allow graduate students and postdoctoral researchers to engage with leading experts in information theory through short lectures and the chance to present their own work. Graduate students and postdoctoral researchers in North America working on problems in information theory and learning are encouraged to apply with the title and a short abstract of a poster that they would like to present.

Midwest Optimization and Statistical Learning Conference 2025

Register by April 25- Join Northwestern University's Center for Optimization and Statistical Learning for this biannual workshop on May 16th that brings together researchers from the region to discuss research at the interface of optimization and machine learning. Hear from keynote speaker Léon Bottou and speakers from each participating institution. To learn more, view the conference flyer. To register, complete the RSVP form.

Registration open for 2025 Mass Spec Summer School

July 21-24; Discovery Building. Join the National Center for Quantitative Biology of Complex Systems for the annual North American Mass Spectrometry Summer School. Students will experience an engaging and inspiring program covering fundamentals of mass spectrometry and the latest in its application to the analyses of plants (NSF) and animals (NIH). Tutorial lecture topics include Mass Analyzers, Ionization, Tandem MS, Proteomics, Metabolomics, Data Analysis, and PTMs. Also planned are lectures and workshops for scientific and professional development. Registration closes on May 31, 2025 or when capacity is reached.
 

JOBS AND OPPORTUNITIES

STUDENT

Maternal Child Health Graduate Project Assistant

Apply by April 9- UW-Madison’s Regional Community Health Team (RCHT) provides training, technical assistance, and resource support to build capacity of community-based partners in identifying and responding to community-driven health priorities. This role will support the RCHT in providing technical assistance to build skills and capacity in youth engagement and voice. The student will assist with evaluation survey development, data review, and data visualization tools. To apply, visit the Maternal Child Health Graduate Project Assistant job posting.

Transportation Program Assistant

Apply by April 11- The assistant will work outdoors across campus, using a cell phone to collect bicycle parking and UW-Madison campus bus ridership data, and will staff tables at summer campus events, including SOAR at Union South, to provide transportation information. They will also assist with parking product sales fulfillment tasks, including matching inventory numbers, stuffing envelopes, and performing data entry. To apply, visit the Transportation Program Assistant job posting.

PROFESSIONAL

Research Specialist

Apply by April 7- The research specialist will provide technical assistance for research projects in the Werling Lab in the Department of Genetics at UW-Madison. The researcher will advance the lab's study of the role of genetic variation and sex differences in brain development and risk for neurodevelopmental and psychiatric disorders. To apply, visit the Research Specialist job posting.

Data Scientist III

Apply by April 8- The successful candidate will advance UW-Madison's Department of Emergency Medicine by providing expert statistical consulting and data analytics support. This role will collaborate closely with faculty investigators, research staff, and other members of study teams to ensure robust study design, accurate data interpretation, and high-quality research outputs. The role may also involve mentoring trainees on statistical methodologies and fostering growth for learners in health professions and associated fields. To apply, visit the Data Scientist III job posting.

Breast Surgical Oncology Research Specialist

Apply by April 11- The Division of Surgical Oncology at UW-Madison is seeking a researcher who will improve our understanding of breast cancer development and management. Working closely with the PI, medical student trainees, and collaborative lab members, the successful candidate will play an active role in advancing research projects in clinical research, basic science, and translational research. To apply, visit the Breast Surgical Oncology Research Specialist job posting.

Two postdoctoral positions with the Functional Gene Control group at MRC London Institute of Medical Sciences (LMS)/Imperial College London

Apply by April 21- The Functional Gene Control group led by Dr. Mikhail Spivakov in London combines wet-lab and computational approaches to study how enhancers, enhancer-promoter contacts, and cis-regulatory networks mediate the effects of disease-linked genetic variants and extracellular signals on gene expression.

MRC Postdoctoral Research Scientist

The group is looking for a collaborative postdoctoral scientist with a wet-lab and/or computational background. A computational scientist must have experience in applying advanced statistical and AI approaches to epigenomics data arising from single-cell and bulk assays, and knowledge of genomics data manipulation in R/Python. A wet-lab scientist must have experience in applying and troubleshooting high-throughput epigenomics and/or CRISPR targeting (activation/interference) assays in application to mammalian stem or immune cells. To apply, visit the MRC Postdoctoral Research Scientist job posting.

Research Associate in Computational Genomics

The group is also looking for a researcher who will lead the integration and conceptualisation of multimodal genomics data arising from single-cell genomics, population genetics, and chromosome conformation capture assays produced in-house and externally using state-of-the-art methodologies and participate in the development of new computational algorithms. To apply, visit the Research Associate in Computational Genomics job posting.
 

DATA VISUALIZATION OF THE WEEK

Testing citation skills and overconfidence of AI chatbots

When you enter a query in a traditional search engine, you get a list of results. They are possible answers to your question, and you decide which resources you want to trust. When you query an AI chatbot, on the other hand, you get a limited number of answers, written as sentences that sound confident in context. For Columbia Journalism Review, Klaudia Jaźwińska and Aisvarya Chandrasekar tested this accuracy and confidence by asking several chatbots to cite articles.

Overall, the chatbots often failed to retrieve the correct articles. Collectively, they provided incorrect answers to more than 60% of queries. Across different platforms, the level of inaccuracy varied, with Perplexity answering 37% of the queries incorrectly, while Grok 3 had a much higher error rate, answering 94% of the queries incorrectly.

Reposted from the Data Science Community Newsletter, an Academic Data Science Alliance project, and Nathan Yau of Flowing Data discussing a Tow Center (Columbia University Journalism School) study by Klaudia Jaźwińska and Aisvarya Chandrasekar.
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub.

Use our submission form to send us your news, events, opportunities and data visualizations for future issues.

Feedback, questions and accessibility issues: newsletter@datascience.wisc.edu