Data Science Updates is the University of Wisconsin–Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute (powered by American Family Insurance) and the Data Science Hub.
June 11, 2025
 
Members of the UW community who worked on Condor in the 1980s were recognized at the anniversary celebration on June 3. Standing are Miron Livny, Guri Sohi, Tim Theisen, Mike Litzkow, Matt Mutka, and Michael Ferris.

Center for High Throughput Computing Celebrates Condor Anniversary

The Center for High Throughput Computing (CHTC) recently celebrated a major milestone: forty years ago, the first Condor job ran at UW–Madison. Since then, Condor has been deployed on every continent, has given rise to the concept of high throughput computing, and has powered scientific discovery. Twenty years ago, the Open Science Grid ran its first job, extending high throughput computing into a distributed environment. CHTC celebrated the 40th anniversary of the Condor Project on June 3, during Throughput Computing Week.

Faculty Milestones

Ilias Diakonikolas, Sheldon B. Lubar Professor of Computer Sciences, received the 2024 Association for Computing Machinery (ACM) Grace Murray Hopper Award. Diakonikolas is the first UW–Madison faculty member to win this award, one of the highest honors for young researchers in computing. In its award announcement, ACM recognized Diakonikolas's breakthrough techniques in algorithm design.
 
Emeritus Professor Grace Wahba was recently awarded the 2025 International Prize in Statistics, one of the highest honors in the field, for her foundational contributions to smoothing splines (a method for drawing smooth curves through messy data) and for the impact her ideas have had on modern data science and machine learning. Wahba joined the UW–Madison statistics department in 1967 as its first female faculty member.
 

CAMPUS WORKSHOPS

Data Wrangling in Python

TODAY June 10-12, 10:00 a.m. - 3:00 p.m.; Sewell Social Sciences, 4218. This hands-on course teaches data wrangling skills, mostly using the tools of the pandas package in Python. Pandas is a collection of functions and methods for working with data, comparable to R's tidyverse. The course covers importing and cleaning data, creating and transforming variables, merging data, and basic data visualization. Registration is required.

Linux Basics For NGS Data Analysis

June 13, 9:30 a.m. - 4:30 p.m.; Biotech Center, 1360. Students will learn essential techniques for interacting with the Linux operating system through bash shell commands, which are needed to use many of the latest bioinformatics tools for Next Generation DNA Sequencing (NGS) analysis. This workshop prepares participants for all other Summer Bioinformatics Workshops and introduces essential Linux commands. For most applicants, it is a required session. Registration is required.

Intro to NGS Data Analysis

June 16, 9:30 a.m. - 4:30 p.m.; Biotech Center, 1360. This workshop is for researchers interested in using open source tools to analyze Next Generation DNA Sequencing (NGS) data. After learning essential Linux and bash shell techniques in the Linux Basics workshop, students will apply these skills to identify single nucleotide polymorphisms (SNPs) from NGS data using command-line open source software, then explore the results with graphical visualization tools. Registration is required.

Pilot Workshop: Reproducible ML Workflows for Scientists

June 16-17, 9:00 a.m. - 12:30 p.m.; Discovery Building, 1145. This pilot workshop introduces tools and practices for reproducing machine learning (ML) and scientific workflows across different machines and computing platforms. We'll use Pixi to create fully reproducible, portable software environments (with GPU support) and show how to distribute those environments in Linux containers for production or cloud deployment. Participants should be comfortable with basic programming and navigating file systems. Registration is required.

Data Visualization in R: ggplot2 Basics

June 17, 10:00 a.m. - 12:00 p.m.; Sewell Social Sciences, 3218. This workshop will teach you how to create and modify data visualizations using ggplot2, a popular plotting package in R. Emphasis is placed on using plots to understand distributions of different numbers and types of variables. Registration is required.

Dates and Times in Stata

June 17, 1:00 p.m. - 3:00 p.m.; Sewell Social Sciences, 4218. In this workshop, we'll discuss working with dates and times in Stata, including how Stata stores dates and times, converting dates and times into Stata's format, and using dates and times in Stata code. Registration is required.

Regression Diagnostics with R

June 18, 9:00 a.m. - 12:00 p.m.; Sewell Social Sciences, 3218. The usefulness and accuracy of regression models depend on whether several assumptions are satisfied, but many researchers do not check whether their model assumptions are met. In this workshop, we'll learn why each regression assumption matters, how to check for assumption violations with statistical and visual tests, and how to correct any violations. This workshop assumes you are familiar with the basics of fitting models as taught in any introductory statistics course. Registration is required.

Functions and Iteration in R

June 19, 9:00 a.m. - 12:00 p.m.; Sewell Social Sciences, 3218. Do you find yourself copying and pasting blocks of code? This workshop covers the basics of writing functions: turning existing code into reusable functions and returning multiple or conditional values. We'll apply these functions iteratively to perform tasks like calculating the standard error of each column in a dataframe, running simulations, and writing and reading files. We will also discuss how to parallelize iterative processes on the SSCC Slurm cluster. You should know the fundamentals of working with data in R. Registration is required.

Presentable Bar Graphs in Stata

June 19, 1:00 p.m. - 3:00 p.m.; Sewell Social Sciences, 4218. Bar graphs are simple and powerful tools, but Stata bar graphs often require a good bit of tweaking to make them presentable. In this workshop, we'll learn some of the tricks (including user-written commands) needed to make bar graphs look good and convey useful information. Registration is required.

mRNAseq – Tuxedo Suite 2

June 23, 9:30 a.m. - 4:30 p.m.; Biotech Center, 1360. This workshop explores Tuxedo Suite #2 (HISAT2, StringTie, Ballgown) for transcript-level analysis of bulk mRNA-seq experiments, including differential gene expression. We'll cover read quality control, read alignment, transcript quantification, and much more. Students should be familiar with typing commands at the command line in a terminal; knowledge of R and RStudio would be helpful. Registration is required.

AI and Society: Community Impacts and New Directions

June 27-28, 9:00 a.m. - 3:30 p.m.; Madison Concourse Hotel and Governor’s Club. This interactive workshop offers a comprehensive exploration of artificial intelligence (AI) and its wide-reaching impact across sectors, focusing on how these advances intersect with education, ethics, and community well-being. As AI rapidly reshapes industries from healthcare to journalism and the arts, educators, community organizations, and local businesses must consider how these changes affect long-term resilience and everyday life in communities across Wisconsin. Registration is required.

Single-cell RNA-seq

June 25 and 27, 9:30 a.m. - 5:00 p.m.; Biotech Center, 1360. Learn how to turn raw data into biological insight with hands-on analysis of real datasets using Seurat, one of the most popular scRNA-seq analysis packages for the R programming language. Other software options, such as R-based Bioconductor and the Python-based Scanpy module, may be reviewed. Participants will begin analyzing their scRNA-seq data and pursue self-directed learning of more advanced analysis techniques. Participants should have a working proficiency in R and a basic understanding of why we do single-cell experiments. Registration is required.

Microbial Shotgun Metagenomics: Taxonomy and Function

June 30, 9:30 a.m. - 4:30 p.m.; Biotech Center, 1360. This workshop consists of lectures and hands-on training in analyzing and interpreting shotgun metagenomic sequencing data. Participants will gain experience using modern tools to profile microbial communities' composition and assess their functional potential. Registration is required.

Analysis of Shotgun Metagenomics Results

July 7, 9:30 a.m. - 4:30 p.m.; Biotech Center, 1360. This workshop consists of lectures and hands-on training in analyzing the shotgun metagenomics results generated in the previous workshop. Registration is required.

Real World Stata Tables

July 18, 9:00 a.m. - 12:00 p.m.; Sewell Social Sciences, 4218. In this workshop, you'll learn how to make a set of real-world tables requested by real-world researchers using Stata's table and collect commands. You'll learn new tricks and gain experience with these powerful and flexible tools. Registration is required.
 
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Tuesdays and Thursdays from 2:30-4:30 p.m. CT. To participate, join the Slack workspace at data-science-hubgroup.slack.com.
 
 

SEMINARS AND EVENTS

Explore AI in Teaching: Summer Reading Sessions

TODAY June 11, 18, 25, 12:00 p.m. - 1:00 p.m.; Zoom. Join UW–Madison's Center for Teaching, Learning & Mentoring for a light-lift, high-impact discussion series exploring how AI is reshaping teaching and learning. We’ll discuss AI’s role in teaching and assessment, equity and academic integrity, and disciplinary perspectives. Each session will center on a short journal article or podcast episode, offering space to connect, reflect, and imagine what's next. Open to anyone who teaches or supports instruction.

SILO: Mitigating Spurious Correlations in Fake Image Detection

TODAY June 11, 4:00 p.m. - 5:00 p.m.; Memorial Union - Council Room (4th Floor). Anirudh Sundara Rajan, a UW–Madison Computer Sciences student, will discuss why detecting AI-generated images is a challenging yet essential task. Rajan argues that an image should be classified as fake if and only if it contains artifacts introduced by the generative model. Based on this premise, Rajan proposes Stay Positive, an algorithm designed to constrain the detector's focus to generative artifacts while disregarding artifacts associated with real data.

ML4MI Seminar: Medical Image Foundation Models for Cardiovascular Disease Risk Prediction

June 17, 10:00 a.m. - 11:00 a.m.; Zoom. Dr. Pingkun Yan, P.K. Lashmet Career Development Chair Professor of Biomedical Engineering at Rensselaer Polytechnic Institute, will present a series of recent efforts to develop medical image foundation models for cardiovascular disease risk assessment using chest imaging.

The Sky’s the Limit STEM Camp

June 20 - July 25, 1:00 p.m. - 4:00 p.m. The Sky's the Limit STEM Camp, hosted by the Nelson Institute Center for Climatic Research, broadens science opportunities for autistic youth in grades 5-12 with a medical diagnosis, self-diagnosis, or suspected diagnosis. The camp provides nature-based and interactive learning opportunities to build interest and appreciation for STEM (science, technology, engineering, and mathematics). Attendees, accompanied by their caregivers, will participate in science experiments and outdoor activities. Registration is limited to 20 participants.

Exploring AI in Teaching: Perspective-Taking & Productivity

June 27, 1:00 p.m. - 2:30 p.m.; Zoom. Dr. Ben Rush, UW–Madison, will share how generative AI can be used to gain insights and efficiently create actionable plans for course design and implementation. He will share examples, discuss prompt engineering principles, and demonstrate brainstorming and real-time problem solving with Copilot and ChatGPT.

Exploring AI in Teaching: Improving Communication

July 30, 12:00 p.m. - 1:30 p.m.; Zoom. Discover how AI tools can help you communicate clearly and concisely to enhance student understanding, promote equity, and improve course materials. You will learn to use AI to write straightforward instructions for complex assignments and to communicate effectively with students who are not yet fluent in the language of your discipline.

Register for posit::conf(2025)

September 16-18; Atlanta, Georgia & Zoom. Registration for the posit::conf(2025) virtual experience is now open! Virtual attendees will have access to live-streamed keynotes and 100+ insightful talks, along with a more streamlined agenda and expanded opportunities to connect with others in the community.

Check out more data science seminars and events at the data science @ uw website.

 

JOBS AND OPPORTUNITIES

STUDENT
  • Data & Tableau Assistant, UW–Madison College of Letters & Science, Department of French & Italian
  • Gernsbacher Lab Undergraduate Research Assistant, UW–Madison College of Letters & Science, Psychology Department
  • IT Helpdesk Specialist (LTE), Disability Rights Wisconsin
  • Research Assistant, UW–Madison School of Medicine and Public Health, Department of Medicine
  • Student Library Instructor in Teaching & Learning Programs, UW–Madison General Library
  • Student Worker – National Atmospheric Deposition Program (NADP) Sample & Data Processing, Wisconsin State Lab of Hygiene
PROFESSIONAL
  • Associate Dean of Libraries for Information Technology and Administration, UW–Madison General Library/Director's Office
  • Bioinformatics Scientist, UW–Madison School of Medicine and Public Health, Neurological Surgery
  • Clinical Database Coordinator, UW–Madison School of Medicine and Public Health, Department of Medicine
  • Clinical Research Study Coordinator, UW–Madison School of Medicine and Public Health, Department of Medicine
  • Survey Research Supervisor, UW–Madison College of Letters and Science, University of Wisconsin Survey Center
  • WIDA Data Warehouse Manager, UW–Madison School of Education, Wisconsin Center for Education Research
 

DATA VISUALIZATION OF THE WEEK

AI systems are improving quickly on longer software tasks

The chart shows that AI systems are rapidly becoming able to complete longer software-related tasks, where task length is measured by how long the task takes human professionals.
Reposted from the Data Science Community Newsletter, an Academic Data Science Alliance project, and Our World in Data, based out of Oxford University.
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub. This newsletter was originally created by the Data Science Hub and published as Hub Updates.

Use our submission form to send us your news, events, opportunities and data visualizations for future issues.

Feedback, questions and accessibility issues: newsletter@datascience.wisc.edu