|
Data Science Updates is the University of Wisconsin-Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute, powered by American Family Insurance, and the Data Science Hub.
March 5, 2025
|
|
|
|
Health Sciences Data Carpentry Workshop
Join the Data Science Hub and Ebling Library on March 24-27 from 1:00 - 5:00 p.m. for a Health Sciences Data workshop on the fundamental data skills needed to conduct research using tabular health science data (spreadsheets). The workshop uses a real data set of dementia patients to teach how to work with data in spreadsheets, clean data with a tool called OpenRefine, query data in a SQL database, and work with R to analyze, manipulate, and plot data. This workshop is for researchers who have little to no prior computational experience. To preview the lesson and register, visit the Health Sciences Data Carpentry Workshop website.
|
|
|
|
Intermediate Research Software Dev. In Python
Join the Data Science Hub on March 24-27, 9:00 a.m. - 12:30 p.m., for a hands-on workshop covering intermediate software development skills, including virtual environments, PyCharm, unit testing, and collaborative GitHub workflows! This workshop focuses on practical, team-based software engineering practices essential for research environments, using Python as the example language. The skills taught are not comprehensive; they are a selective set of tried-and-tested collaborative development practices that form a firm foundation for participants' continued learning.
Prerequisites: Basic knowledge of Python programming, Unix Shell, and Git is required to attend. Take this self-assessment quiz to see if you qualify. To preview the lesson and register, visit the workshop website.
|
|
|
|
Faces of Data Science: Adeline Lo
Adeline Lo, Assistant Professor of Political Science, studies how conflict and cooperation between groups impact politics, especially the politics of migration. Her methods combine data science techniques for analyzing media data, such as convolutional neural networks for TV images, with randomized field projects and surveys. Lo designs statistical tools to work with “odd, complicated, and messy” data, and she has created open-source R packages for social scientists. Learn more about her work in Faces of Data Science.
|
|
|
|
Calling all data enthusiasts! Get ready to dive into the world of AI and machine learning at the 6th annual Data Science Research Bazaar, hosted by the Data Science Hub on March 19-20. This year’s event will explore the potential and limitations of AI and ML in research. Presentations will also highlight fundamental and applied data science across research fields and industries. Learn more and register by March 12.
|
|
|
|
Research Object Storage (S3): An Overview for Researchers and Support Staff
March 6, 10:00 a.m. - 11:00 a.m.; Zoom. This workshop will explore Research Object Storage (S3), a new campus-hosted data storage service offering 50TB of no-cost storage to eligible researchers. This session will be geared toward technical support staff and highlight access methods (such as API), practical applications, and potential issues using the service. We will discuss common IT tasks associated with this service and how to know if this is a service you should recommend to your researchers. For more information, visit the Research Object Storage (S3) calendar listing.
|
|
|
|
Learning Lab
March 7, 12:00 p.m. - 1:00 p.m.; Zoom. The Center for Teaching, Learning, and Mentoring (CTLM) hosts informal monthly Zoom gatherings where you can drop in at any point to ask a question or get personalized support related to teaching and learning. The CTLM has expertise in effective teaching practices, incorporating technology (including AI) in your teaching, and instructional and media design. For more information, visit the Learning Lab calendar listing.
|
|
|
|
Exploring NLP with Hugging Face and Foundation Models
March 10, 5:30 p.m. - 7:30 p.m.; 2257 College Library. Whether you’re new to Natural Language Processing (NLP) or looking to deepen your understanding of transformer-based architectures, this workshop will offer an interactive introduction to Hugging Face’s ecosystem. Participants will explore model inferencing, customizations, and practical techniques for efficient deployment. For more information and to register, visit the Exploring NLP with Hugging Face and Foundation Models calendar listing.
|
|
|
|
Linux Essentials For NGS Data Analysis
March 21, 9:30 a.m. - 4:30 p.m.; Biotechnology Center, Room 1360. This workshop is for researchers interested in using open source tools for analyzing Next Generation DNA Sequencing (NGS) data. Participants will learn the essential techniques to interact with the powerful Linux operating system via the bash shell commands that are necessary to leverage the capabilities of many of the latest, most popular bioinformatics tools used in NGS analysis. A central goal of the workshop is to make users more comfortable with what is likely an unfamiliar computing environment so that they can more confidently understand and employ these methods in the context of their independent research projects. For most applicants, this workshop is a prerequisite for all other Bioinformatics Resource Core workshops this semester and provides an introduction to essential Linux commands. To learn more and register, visit the Linux Basics For NGS Data Analysis workshop webpage.
|
|
|
|
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Tuesdays and Thursdays from 2:30-4:30 p.m. CT. To join Coding Meetup, join data-science-hubgroup.slack.com
|
|
|
|
ML+Coffee — Connect, Share, and Explore Machine Learning
TODAY March 5, 9:00 a.m. - 11:00 a.m.; Room 1145, Discovery Building. ML+Coffee offers a supportive, casual space to discuss machine learning projects, tools, and ideas. Whether you’re seeking feedback, showcasing a tool, or exploring an ML-related paper, all experience levels and backgrounds are welcome.
This month, in addition to open discussions, Linqi Lu, Ph.D. candidate and lecturer in the School of Journalism and Mass Communication, will share insights on detecting online cannabis marketing and developing a multimodal chatbot for diet and health.
If you're interested in leading a discussion (e.g., getting feedback on an ongoing project or discussing a paper) or presenting a demo at the monthly ML+Coffee event, fill out the coffee discussion sign-up form and select dates that work for you.
|
|
|
|
Exploring AI in Teaching: Supporting Student Success
TODAY March 5, 12:00 p.m. - 1:00 p.m.; Zoom. UW–Madison instructor Dr. Nathan Jung, with the Program for Engineering Communication, will explore practical methods for incorporating AI-driven resources into your classroom. Through concrete examples and interactive discussions, you will gain insights into selecting and implementing AI tools. You will leave with actionable strategies and best practices to harness AI as a catalyst for improved engagement, performance, and long-term student success. To learn more and register, visit the Exploring AI in Teaching calendar listing.
|
|
|
|
Nourishing Knowledge
TODAY March 5, 12:00 p.m. - 1:00 p.m.; Zoom. In this webinar we will explore how Gale Digital Scholar Lab can be used to unlock the full potential of Gale's food history archives. You will learn how to analyze, visualize, and manipulate culinary data, uncovering patterns and connections within recipe books. With real-life case studies and practical examples, this webinar is a must-attend for digital humanities scholars, cultural historians, and researchers interested in exploring food culture. To learn more and register, visit the Nourishing Knowledge calendar listing.
|
|
|
|
SILO Seminar: “Polynomial Graph Neural Networks: Theoretical Limits and Graph Noise Impact”
TODAY March 5, 12:30 p.m. - 1:30 p.m.; Discovery Building, Orchard Room 3280 & Zoom. Dr. Arash Amini, Associate Professor of Statistics at UCLA, will discuss the theoretical foundations of Graph Neural Networks (GNNs), focusing on polynomial GNNs (Poly-GNNs).
Dr. Amini analyzes Poly-GNNs within a contextual stochastic block model, addressing a key question: Does increasing GNN depth improve class separation in node representations? Dr. Amini's results show that for large graphs, the rate of class separation remains constant regardless of network depth. His team demonstrates how “graph noise” can overpower other signals in deeper networks, negating the benefits of additional feature aggregation.
If you have not signed up to attend in person, please refrain from taking pizza, as catering is arranged beforehand. For more information, view the full abstract on SILO's upcoming talks page.
|
|
|
|
Statistical Challenges in Modern Machine Learning and their Algorithmic Consequences
March 6, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences. The success of modern machine learning relies heavily on large-scale datasets, but their vast size makes effective curation challenging. Many classical estimators, designed for clean data, struggle in such settings, raising statistical and algorithmic challenges: What are the statistical limits of estimation, and can they be achieved efficiently?
Dr. Yeshwanth Cherapanamjeri, postdoctoral researcher at MIT, explores these challenges in two complementary settings: extreme noise and extreme bias. First, Dr. Cherapanamjeri will examine estimation with heavy-tailed data, where optimal statistical estimators exist but are computationally impractical. Next, he will address extreme bias in the classical Roy model of self-selection bias and will introduce efficient algorithms that counteract this bias. Finally, Dr. Cherapanamjeri will discuss future directions on constructing high-quality datasets from diverse sources of varying quality and quantity. To read the full abstract, visit the Statistical Challenges in Modern Machine Learning and their Algorithmic Consequences calendar listing.
|
|
|
|
Protein Language Models: Teaching Old Sequences New Tricks
March 6, 1:30 p.m. - 2:00 p.m.; Biotechnology Center Auditorium & Zoom. Join the Center for Genomic Science Innovation for their Genomics Seminar Series with Anthony Gitter, Associate Professor of Biostatistics and Medical Informatics at the University of Wisconsin–Madison.
Dr. Gitter will give an overview of how protein language models are being used to generate new proteins, assess the impacts of protein mutations, and predict protein structures. Dr. Gitter will emphasize his approach of combining classic biophysical protein simulations with protein language models. Review the protein language model flyer to read the full abstract and access the Zoom link.
|
|
|
|
OSG School Applications Open
Apply by March 7 - Applications are open for the 2025 OSG School. During this program — June 23–27 — you will learn to use high-throughput computing (HTC) systems to run large-scale computing applications that are at the heart of today’s cutting-edge science. Through lectures, discussions, and hands-on activities, you will learn how HTC systems work, how to run and manage long lists of computing tasks and work with huge datasets to implement a scientific computing workflow, and where to turn for more information and help.
OSG School is ideal for students, postdocs, staff, and postsecondary instructors who currently or potentially use, support, or teach large-scale computing. People accepted to this program will receive financial support for basic travel and local costs. Apply by March 7. For more information, visit the program website.
|
|
|
|
Scalable Piecewise Smoothing in High Dimensions with BART
March 7, 12:00 p.m. - 1:00 p.m.; UW Biotechnology Center Auditorium & Zoom. Sameer Deshpande, UW-Madison Department of Statistics, will introduce ridgeBART, an extension of Bayesian Additive Regression Trees (BART), a nonparametric regression model that approximates unknown functions with a sum of binary regression trees. ridgeBART is built with trees that can assign multiple categorical levels to both branches of a decision tree node and output linear combinations of ridge functions. For more information, visit the Biostatistics and Medical Informatics Department Seminar webpage.
|
|
|
|
Recent Results on Harmonic Maps with Free Boundaries and Applications
March 7, 1:10 p.m. - 2:10 p.m.; 901 Van Vleck Hall. Yannick Sire, Director of Graduate Studies at Johns Hopkins, will report on harmonic maps with free boundaries and explore a new viewpoint reformulating some maps into a suitable framework of pseudo-differential equations (PDE). Dr. Sire will explain how to tackle questions about existence, regularity, and blow-up analysis for elliptic and parabolic problems. Dr. Sire will also show how those maps can be used to investigate issues related to rigidity/flexibility for manifolds with boundaries. For more information, visit the Recent Results on Harmonic Maps with Free Boundaries and Applications calendar listing.
|
|
|
|
Privacy, Copyright, and Data Integrity: The Cascading Implications of Generative AI
March 10, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences & Zoom. The rise of generative AI has created a continuous cycle where personal data flows between people, models, applications, and online platforms, making traditional safeguards like "don’t train on this data" inadequate. Given training data scarcity and AI’s deep integration into daily life, we must consider data, people, and models together.
Niloofar Mireshghallah, post-doctoral scholar at the University of Washington, explores three key research directions: (1) measuring the imprint of data on models through novel membership inference attacks and uncovering memorization patterns, (2) developing algorithmic approaches to help people control the exposure of their data while preserving utility, and (3) grounding model evaluations in legal and social frameworks. For the full abstract, visit the Privacy, Copyright, and Data Integrity calendar listing.
|
|
|
|
TEO Speaker Series: Navigating Success with Insights from Zack Howe
March 10, 4:00 p.m. - 5:00 p.m.; 1003 (Tong), Engineering Centers Building. Join UW-Madison's Technology Entrepreneurship Office (TEO) for an inspiring session where industry experts share their tips for startup success. Topics include entering your desired field, advancing in your career, and the importance of networking. Special guest Zack Howe, CEO of Nyx Studios and UW-Madison alum, will share his journey from freelance writer to successful entrepreneur. For more information, visit the TEO Speaker Series event listing.
|
|
|
|
Center for High Throughput Computing (CHTC) Information Session
March 11, 10:30 a.m. - 11:15 a.m.; 1170 Discovery Building & Zoom. CHTC staff will present an overview of CHTC services and how these services can help researchers accomplish their research and computational goals. CHTC staff will also help attendees identify the next steps for getting started, whether that is getting an account, logging in, or starting to run work. For more information and to register, visit the CHTC Information Session calendar listing.
|
|
|
|
How you think on a function defined on 0,1,…,N-1?
March 13, 11:00 a.m. - 12:00 p.m.; 901 Van Vleck Hall. About one million times per day, your cellphone calculates the Fourier Transform (FT) of certain functions defined on 0,1,…,N-1, with N large (on the order of thousands or more). The calculation is done using the Fast Fourier Transform (FFT). Shamgar Gurevich, professor of mathematics at the University of Wisconsin–Madison, will explain how to obtain the FFT by answering the question of how to think about the space of functions on the set 0,1,…,N-1.
Engineers tell us that there are two answers to this question, but Professor Gurevich will explain that there is a not so well-known third space, of arithmetic nature, that also gives an answer to the above question. To view the full abstract, visit the Applied Algebra Seminar Spring website.
|
|
|
|
Vision-Language Models for Radiology AI with Dr. Akshay Chaudhari
March 17, 10:00 a.m. - 11:00 a.m.; Zoom. Join Dr. Akshay Chaudhari, Assistant Professor of Radiology and Biomedical Data Science at Stanford University, for the March Machine Learning for Medical Imaging Seminar Series.
Dr. Chaudhari will describe the impact that language models can have for the field of radiology. Dr. Chaudhari will describe the use of some pure large language models (LLMs) for radiology text and transition to how these domain-specific LLMs can enable improved image understanding, image captioning, and synthetic image generation tasks. Finally, Dr. Chaudhari will describe how our evaluation techniques need to keep pace with new model developments. For more information and the Zoom link, visit the Vision-Language Models for Radiology AI seminar webpage.
|
|
|
|
What Do We (Not) Know About AI and Child Development?
March 17, 10:00 a.m. - 11:00 a.m.; 1240 Computer Sciences. Ying Xu, Assistant Professor of AI in Learning and Education at Harvard, will discuss the role and impact of artificial intelligence (AI) on children’s cognitive and social development. Dr. Xu will discuss how children interact with, perceive, and learn from AI, as well as their trust in their "AI companions." Dr. Xu will explore emerging directions about how generative AI tools influence curiosity, creativity, and critical thinking. Dr. Xu will show how the research community can amplify our collective voice and establish a stronger presence to ensure that AI is developed and implemented in ways that are safe for children. For more information, visit the What Do We (Not) Know About AI and Child Development calendar listing.
|
|
|
|
Bridging the Semantic Gap between Autonomous System Requirements and Complex Sensor Data
March 18, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences. Autonomous systems rely on advanced sensor inputs like LiDAR and cameras to interpret their environments and perform safety-critical tasks. To operate safely, they must translate sensor data into an accurate world representation. Simultaneously, developers must be able to utilize the internal representation to assess the system’s adherence to safety requirements. The key challenge lies in bridging the semantic gap by effectively leveraging the proper internal world representation.
|
|
|
|
Weston Roundtable – AI and Computing for Local Food Systems
March 20, 4:15 p.m. - 5:15 p.m.; Zoom. Alfonso Morales, Vilas Distinguished Achievement Professor of Planning and Landscape Architecture, will review topics that broadly deal with the use of computing solutions to address the mounting challenges we face in securing our food systems. The lecture focuses on three dimensions:
- Precision agriculture: This includes micro weather modeling, crop selection and adaptation, land management, real-time sensing for efficient crop watering, fertilization, and pest control.
- Intelligent food distribution systems: This covers transportation optimization, local sourcing promotion, distribution and markets, and waste management and avoidance.
- Inter-silo connections: This includes connections to public health, marketing and consumer behavior, ecological / ecosystems management, and services of farm production.
|
|
|
|
Johnson & Johnson's (J&J) Open Source Journey with R in Clinical Trials
March 12, 11:00 a.m. - 12:00 p.m.; YouTube Live. Join Posit to learn how J&J is embracing R and open source in clinical trials. In this live web event, the data science team at J&J will walk through their open-source journey in R, building an open-source culture, lessons learned, best practices, and future roadmap opportunities. Stay tuned after the presentation for a live Q&A with the team. For more information and to register, visit the Johnson & Johnson's Open Source Journey with R in Clinical Trials registration page.
|
|
|
|
Machine Learning Theory Summer School at Princeton
Summer school is a chance to learn about some exciting new ideas from math, physics, and statistics for understanding machine learning. For more information and to apply for scholarships, view the summer school flyer or visit the summer school webpage.
|
|
|
|
Learning Support Assistant (LSA) for LIS 407 Data Storytelling with Visualization - 2025 Summer
Apply by March 10 - Under the general supervision of the Director of the Information School (iSchool), the LSA will teach LIS 407: Data Storytelling with Visualization in a web-based/online format from June 9 - August 10, 2025. The LSA will prepare a pre-approved syllabus in consultation with iSchool staff, create and grade assignments and examinations, hold synchronous office hours, and communicate with students via email and virtual meetings. For more information and to apply, visit the LSA for LIS 407 Data Storytelling with Visualization - 2025 Summer job posting.
|
|
|
|
Research Cyberinfrastructure Project Assistant
Apply by March 12 - The Office of Research Cyberinfrastructure within DoIT (Division of Information Technology) at the University of Wisconsin-Madison is hiring a graduate student Project Assistant to provide administrative and analytical support for services within Research Cyberinfrastructure, including ResearchDrive, Cloud Platforms, Electronic Lab Notebooks (ELN), and Data Science Platform. Qualified applicants must have experience with data analysis and visualization.
The Project Assistant will analyze survey data and produce visualizations and reports, gather and summarize user requirements for new systems, and develop and update online communications materials for Research Cyberinfrastructure services. For more information and to apply, visit the Research Cyberinfrastructure Project Assistant job posting.
|
|
|
|
Data Analytics Intern
Apply by March 16 - The data analytics intern will work closely with the data and policy analyst for the undergraduate program within the Wisconsin School of Business. This position will assist with the development of presentations, reports, and statistical analyses of administrative operations for the School of Business.
The analytics intern will pull, clean, reformat, and analyze data from campus data tools to answer staff questions, generate lists for staff needs, and prepare data for other projects. The analytics intern will also assist with the development of presentations/reports including data visualizations. For more information and to apply, visit the Data Analytics Intern job posting.
|
|
|
|
Teaching Assistant (TA) for LIS 461 Data and Algorithms: Ethics and Policy - 2025 Summer
Apply by March 16 - The TA will assist the course instructor in operation of the summer class LIS 461 Data and Algorithms: Ethics and Policy. The TA will lead two online discussion sections from June 16 to August 10, 2025.
The TA will lead exercise sessions, provide technological support to the students in the class, grade assignments, post materials, troubleshoot issues with Canvas, and manage communications with students via email and discussion boards. The TA is expected to attend and participate in all class sessions. For more information and to apply, visit the TA for LIS 461 Data and Algorithms: Ethics and Policy job posting.
|
|
|
|
Data Integration and Warehouse Engineer
Apply by March 10 - The Data Integration and Warehouse Engineer develops, maintains, and troubleshoots data integrations, primarily in Informatica Cloud (IICS), and supports multiple database systems for the UW–Madison School of Medicine and Public Health. The successful candidate will monitor, respond to, and resolve existing database access and performance issues.
The person in this role has primary responsibility for authoring and maintaining ETL/ELT mappings, mapping tasks, and workflows from a wide variety of sources. The person in this role will also assist in the initial setup and configuration of a new data warehouse as part of the UW-Madison transition to Workday in July 2025. For more information and to apply, visit the Data Integration and Warehouse Engineer job posting.
|
|
|
|
Software Engineer - Student Enterprise Applications (SEA)
Apply by March 10 - The Software Engineer will join a team of developers and other IT professionals at UW–Madison to analyze, design, develop, troubleshoot, and maintain the Student Information System (SIS) and Student Administration System PeopleSoft Applications in support of establishing a sustainable institutional technological infrastructure.
The Software Engineer will conduct system analysis, contribute to strategy development, and provide training and technical guidance to less experienced staff. The successful candidate will resolve the most complex problems independently or by guiding team members. The Software Engineer is the principal contact or lead for development of technical solutions, products or suite of products. For more information and to apply, visit the Software Engineer - SEA job posting.
|
|
|
|
Data Scientist I
Apply by March 15 - The Data Scientist will work within Dr. Andrew South's laboratory in the UW–Madison Department of Dermatology to support projects associated with tissue damage-driven cancer initiation and progression. The successful candidate will work with existing and newly generated raw and processed sequencing data to generate library QC, annotate reads, and perform comparative analysis to assess somatic mutation, mutational signatures, and RNA expression in a range of human tumor and normal tissue samples. For more information and to apply, visit the Data Scientist I job posting.
|
|
|
|
|
DATA VISUALIZATION OF THE WEEK
|
|
|
|
Many remote workers say they’d be likely to leave their job if they could no longer work from home
A growing number of U.S. companies are requiring workers to return to the office, and President-elect Donald Trump’s incoming administration has signaled it may do the same with federal employees. But many American workers say they’d rather find a new job than give up working from home. Among employed adults who have a job that can be done from home, 75% are working remotely at least some of the time. Nearly half of workers in this group (46%) say that if their employer no longer allowed them to work from home, they would be unlikely to stay at their current job.
|
|
|
|
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub.
Use our submission form to send us your news, events, opportunities and data visualizations for future issues.
|
|
|
|
|