Data Science Updates is the University of Wisconsin-Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute, powered by American Family Insurance, and the Data Science Hub.
March 19, 2025
|
|
|
|
Awards Recognize Achievement in Open Source, Open Scholarship
The Data Science Research Bazaar will host the inaugural UW-Madison Open Awards on Thursday, March 20th at 9:30 a.m. in the Wisconsin Institute for Discovery DeLuca Forum. The Open Awards, which recognize achievement in open source and open scholarship, are sponsored by the Data Science Institute, Open Source Program Office, and the Libraries, with support from the Alfred P. Sloan Foundation. All are welcome! You don’t need to be registered for the Research Bazaar to attend the Open Awards. Learn more at the OSPO website.
|
|
|
|
Health Sciences Data Carpentry Workshop
Register by 5:00 p.m. today to join the Data Science Hub and Ebling Library on March 24-27 from 1:00 - 5:00 p.m. for a Health Sciences Data workshop on the fundamental data skills needed to conduct research using tabular health science data (spreadsheets). The workshop uses a real data set of dementia patients to teach how to work with data in spreadsheets, clean data with a tool called OpenRefine, query data in a SQL database, and work with R to analyze, manipulate, and plot data. This workshop is for researchers who have little to no prior computational experience. To preview the lesson and register, visit the Health Sciences Data Carpentry Workshop website.
|
|
|
|
Thank You, Research Bazaar Sponsors!
This week’s Data Science Research Bazaar would not be possible without the support of our generous campus and industry sponsors. We are grateful to these partners for supporting the Research Bazaar: UW-Madison Data Science Institute, UW Libraries, Department of Computer Sciences, Department of Industrial and Systems Engineering, Department of Statistics, iSchool, Division of Information Technology, School of Computer, Data and Information Sciences, School of Medicine and Public Health, Technology Entrepreneurship Office, Capital Data, and the Wisconsin Alumni Research Foundation. Learn more about these sponsors at the Research Bazaar website.
|
|
|
|
Excel 2: Data-Driven Excel: Unlocking Insights with Analysis Tools and Techniques
TODAY March 19, 5:30 p.m. - 7:00 p.m.; 2257 College Library. This workshop introduces more advanced analysis tools that can be used to summarize data, perform complex calculations, and maximize the computing power of Microsoft Excel. To learn more and register, visit the Excel 2 calendar listing.
|
|
|
|
Linux Essentials for NGS Data Analysis
March 21, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop is for researchers interested in using open source tools for analyzing Next Generation DNA Sequencing (NGS) data. Participants will learn the essential techniques for interacting with the Linux operating system through the bash shell commands needed to use the most popular bioinformatics tools in NGS analysis. For most applicants, this workshop is a prerequisite for all other Bioinformatics Resource Core workshops this semester and provides insight into essential Linux commands. To register, visit the Linux Basics For NGS Data Analysis workshop webpage.
|
|
|
|
Python 1: Foundational Python for Beginners
March 31, 5:30 p.m. - 7:00 p.m.; 2257 College Library. Python is a powerful, cross-platform, easy-to-use programming language. Its high-level syntax and dynamic typing make it especially beginner-friendly, and its interpreter makes troubleshooting and debugging a breeze. This workshop offers an introduction to basic Python concepts that will help you get coding in no time – no previous programming experience required! To register, visit the Python 1 calendar listing.
|
|
|
|
Python Programming: Intermediate Python
April 1, 10:00 a.m. - 12:00 p.m.; Zoom. This workshop explores additional Python functionality, building on skills learned in the first four workshops hosted by the Libraries. Prior workshops included lessons on automating tasks with loops, lists, and functions; spreadsheets and data manipulation; and data visualization with Seaborn. An understanding of basic Python concepts is helpful. To register, visit the Python Programming: Intermediate Python webpage.
|
|
|
|
Excel 1: Introduction to Data Processing with Excel
April 1, 6:00 p.m. - 7:30 p.m.; 2257 College Library. The Excel 1 workshop introduces students to basic spreadsheet terminology and the simple navigation of the Excel desktop. This course teaches the skills necessary to construct an operative spreadsheet and create graphs or charts from data. To register, visit the Excel 1 calendar listing.
|
|
|
|
Microbiome Analysis Using QIIME2
April 4, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop will cover amplicon-based microbiome analysis using QIIME2. The workshop will consist of lectures and hands-on training that take participants from a raw dataset through publication-quality statistics and visualizations. Participants should have a basic knowledge of the Linux command line (bash shell) and a basic understanding of statistical methods. To register, visit the QIIME2 workshop page.
|
|
|
|
R Programming: Organizing Your Projects with GitLab + RStudio
April 4, 10:00 a.m. - 12:30 p.m.; Zoom. Do you have lots of versions of files/scripts in a folder and want to better organize them? You need a formal version control tool called Git! This workshop teaches learners to use RStudio and Git to keep track of file versions, switch back to old versions of a file, host version-controlled files on the campus GitLab instance, and synchronize files between different computers. No command line needed! A working knowledge of R and RStudio will help you get the most out of this session. To register, visit the R Programming: Organizing Your Projects with GitLab + RStudio webpage.
|
|
|
|
SQL: Introduction to Databases with SQL
April 7, 5:30 p.m. - 7:00 p.m.; 2257 College Library. Structured Query Language (SQL) is a programming language used to interact with databases and create datasets. SQL is a useful tool for data analysts, software developers, and researchers to create and structure data for analysis and modeling. This workshop teaches basic SQL queries and syntax, and how to create tables, perform analysis, and export tables. To register, visit the SQL calendar listing.
|
|
|
|
Getting Started with High Throughput Computing
April 8, 10:30 a.m. - 12:00 p.m.; Orchard View Room, Discovery Building & Zoom. New to high throughput computing or need a refresher? This workshop will introduce the Center for High Throughput Computing's (CHTC) High Throughput Computing system, components of a job, and how to handle data. Participants will practice transferring data, submitting jobs, and debugging simple job errors. A basic understanding of the command line is recommended. To register, visit the Getting Started with High Throughput Computing calendar listing.
|
|
|
|
R1: Basics of Data Management with R
April 8, 5:30 p.m. - 7:00 p.m.; 2257 College Library. R is a free, open-source software environment and programming language for statistical computing and graphics. This workshop will introduce R and RStudio. Participants will learn about basic syntax, vectors, matrices, and data frames, and how to import, work with, chart, and plot data. To register, visit the R1 calendar listing.
|
|
|
|
Analysis of QIIME2 Biome Results
April 11, 9:30 a.m. - 4:30 p.m.; 1360 Biotechnology Center. This workshop is a follow-up to the April 4th “Microbiome analysis using QIIME2” workshop. The R package “qiime2R” will be used to convert data files exported by QIIME2 for further analysis and graphical exploration. Results from the previous workshop will be used to demonstrate basic analysis of microbiota data in R to determine whether and how communities differ by variables of interest. Prerequisites include a basic understanding of R, statistical methods, and QIIME2. To register, visit the Analysis of QIIME2 Biome Results workshop webpage.
|
|
|
|
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Tuesdays and Thursdays from 2:30-4:30 p.m. CT. To join Coding Meetup, join data-science-hubgroup.slack.com
|
|
|
|
SILO Seminar: Characterizing the Power of Markov Chain Monte Carlo (MCMC) Methods for Sparse Estimation
TODAY March 19, 12:30 p.m. - 1:30 p.m.; Orchard View Room, Discovery Building & Zoom. Dr. Ilias Zadik, assistant professor of statistics and data science at Yale University, will discuss several recent results that characterize the performance of a large class of (low-temperature) MCMC methods for a series of canonical estimation models, such as sparse tensor PCA and sparse linear regression. This characterization reveals that in some models MCMC methods match the performance of the conjecturally optimal polynomial-time estimators, while in others they significantly underperform.
For those who have not signed up to attend in-person, please refrain from taking pizza, as catering is arranged beforehand. For more information, view the full abstract on SILO's upcoming talks page.
|
|
|
|
Model-Based Sampling for Admissible Quantification of Model Uncertainty
TODAY March 19, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Merlise Clyde, professor of statistical science at Duke University, will discuss how Markov Chain Monte Carlo can be viewed through the lens of Probability Proportional to Size (PPS) sampling from a finite population sampling perspective. Dr. Clyde will present a new adaptive independent Metropolis-Hastings algorithm and illustrate how it can also be used for adaptive importance sampling. To read the full abstract, visit the Model Based Sampling for Admissible Quantification of Model Uncertainty calendar listing.
|
|
|
|
Data & Social Sciences Talk with Professor Christine Schwartz
TODAY March 19, 6:00 p.m. - 7:00 p.m.; 1240 Computer Sciences. Christine Schwartz, Professor of Sociology and Director of the Sociology Concentration in Analysis and Research (CAR) Program, will discuss how data plays a crucial role in sociological research. Professor Schwartz will also share how students can engage with data-driven research in the field of sociology.
|
|
|
|
Modeling Language as Social and Cultural Data
March 20, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences. Lucy Li, Ph.D. candidate at UC Berkeley, will discuss how she's built reciprocal relationships between natural language processing (NLP), sociolinguistics, and education. Li will show how a sociolinguistic lens can inform model development by surfacing implicit social preferences of pretraining data curation practices. Within education, Li will discuss how LLMs can support content analyses of school curricula. To learn more, visit the Modeling Language as Social and Cultural Data calendar listing.
|
|
|
|
Tech Talk: Data Integration with Enterprise APIs
March 20, 12:00 p.m. - 1:00 p.m.; Zoom. To keep each other informed and to share our collective wealth of knowledge, each month a DoIT group provides a lunch-and-learn-style virtual presentation on a topic that helps people better perform their jobs, increases technical understanding across the organization, provides transparency by leadership regarding decision making, and creates an opportunity for learning and knowledge sharing. Join DoIT for the March Tech Talk! To learn more, visit the Tech Talk webpage.
|
|
|
|
The (Ultra) Long and the Short of it: Oxford Nanopore Sequencing
March 20, 1:00 p.m. - 2:30 p.m.; 1111 UW Biotechnology Center & Zoom. Join three guest speakers at the UW Biotechnology Center Bio-Tech Talk.
- Maddy Hartley, Field Applications Scientist at Oxford Nanopore, will discuss using nanopore sequencing to generate ultra-rich data and insights.
- Matthew Brian Couger, assistant professor at Harvard Medical School and associate professor at UC Riverside, will discuss ultra-long nanopore sequencing for assembling Y chromosomes and other highly repetitive content.
- Rachel Kirchner, MD-PhD student at the UW-Madison School of Medicine and Public Health, will discuss amplicon sequencing with Oxford Nanopore technologies.
|
|
|
|
Weston Roundtable – AI and Computing for Local Food Systems
March 20, 4:15 p.m. - 5:15 p.m.; Zoom. Alfonso Morales, Vilas Distinguished Achievement Professor of planning and landscape architecture, will review the use of computing solutions to address the mounting challenges we face in securing our food systems. The lecture focuses on three dimensions:
- Precision agriculture: This includes micro weather modeling, crop selection and adaptation, land management, and real-time sensing for efficient crop watering, fertilization, and pest control.
- Intelligent food distribution systems: This covers transportation optimization, local sourcing promotion, distribution and markets, and waste management and avoidance.
- Inter-silo connections: This includes connections to public health, marketing and consumer behavior, ecological/ecosystem management, and services of farm production.
|
|
|
|
Using Kinetic Theory and Simulations to Resolve Anomalies in Fusion Z-Pinch Experiment
March 21, 2:25 p.m. - 3:25 p.m.; 901 Van Vleck Hall. Dr. Genia Vogman, a computational scientist at the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, will explore how kinetic physics affects inertial confinement fusion Z-pinches and how kinetic theory and high-order accurate continuum kinetic Vlasov-Poisson simulations are being used to advance our understanding of multi-scale plasma dynamics. Dr. Vogman will show how new techniques for constructing kinetic equilibria, performing linear theory analysis, and performing quasilinear theory analysis shed light on nonlinear mass, momentum, and energy transport in multi-species magnetized collisionless plasmas. These insights will improve predictive modeling and ultimately bridge the gap between simulations and experiments. To learn more, visit the Applied and Computational Mathematics Seminar website.
|
|
|
|
Nuclear Engineering & Engineering Physics Department Seminar
April 1, 12:00 p.m. - 1:00 p.m.; 106 Engineering Research Building & Zoom. Dr. Jamie Baalis Coble, associate professor in the Nuclear Engineering department at the University of Tennessee, will discuss her specialization in statistical data analysis, empirical modeling, and advanced pattern recognition for equipment condition assessment, process and system monitoring, anomaly detection and diagnosis, failure prognosis, and integrated decision making. Dr. Coble's research incorporates system monitoring and life estimates into risk assessment, operations and maintenance planning, and optimal control algorithms. To learn more, visit the Nuclear Engineering & Engineering Physics Department Seminar calendar listing.
|
|
|
|
Tracing the Dynamic Distribution of Arsenic Across Reservoirs
April 1, 1:00 p.m. - 2:00 p.m.; 811 Atmospheric, Oceanic, and Space Sciences & Zoom. Athena Nghiem, assistant professor of Geoscience, UW–Madison, will explore arsenic (As) biogeochemical cycling in multiple reservoirs, including closing the knowledge gap on global atmospheric As cycling. Professor Nghiem’s goal is to determine the redistribution and legacy impacts of As in the environment between reservoirs by using a diversity of approaches. Arsenic poses a significant threat to human health because of its toxicity and likelihood of exposure, and the atmosphere is one exposure route. To learn more, visit the Tracing the Dynamic Distribution of Arsenic Across Reservoirs event page.
|
|
|
|
Using Bayesian Nonparametric Ideas and Spatial Statistics for Earth Sciences Applications
April 2, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Dr. Veronica Berrocal, statistics professor at UC Irvine, will present two papers that incorporate and revisit ideas proposed in the Bayesian nonparametrics literature in a spatial context to address problems in Earth Sciences. First, Dr. Berrocal will discuss how soil moisture influences land processes and will propose a statistical model that aims to learn about the scale of dependence and variability in the spatial process from observed data. Second, Dr. Berrocal will share useful information to design air pollution mitigation strategies. To read the full abstract, visit the Using Bayesian Nonparametric Ideas and Spatial Statistics for Earth Sciences Applications calendar listing.
|
|
|
|
Steering Machine Learning Ecosystems of Interacting Agents
April 7, 12:00 p.m. - 1:00 p.m.; 1240 Computer Sciences. Meena Jagadeesan, PhD student in Computer Science at UC Berkeley, will discuss her research on characterizing and steering ecosystem-level outcomes. Jagadeesan takes an economic and statistical perspective on ML ecosystems, tracing outcomes back to the incentives of interacting agents and to the ML pipeline for training models. To learn more, visit the Steering Machine Learning Ecosystems of Interacting Agents calendar listing.
|
|
|
|
Leveraging Gale Digital Scholar Lab for Engaging Narratives
April 8, 1:00 p.m. - 2:00 p.m.; Zoom. Join the Libraries for a webinar featuring Dr. Sarah Ketchley, Senior Digital Humanities Specialist, Egyptologist, and faculty member at the University of Washington. Dr. Ketchley will explore how to use the tools in Gale Digital Scholar Lab to create StoryMaps, a digital tool that combines maps, multimedia, and narrative to create interactive and engaging storytelling experiences. To register, visit the Leveraging Gale Digital Scholar Lab for Engaging Narratives event listing.
|
|
|
|
Advancing Assistive Robotics: Transforming Personal Care and Rehabilitation
March 27, 2:00 p.m. - 3:00 p.m.; Zoom. The Northwestern Mutual Data Science Institute (NMDSI) will host Dr. Mohammad Habibur Rahman, professor and chair of the department of Mechanical Engineering at the University of Wisconsin-Milwaukee.
Dr. Rahman will discuss advancements in assistive robotics for personal care, focusing on three pivotal areas: robot-assisted activities of daily living support, stroke rehabilitation, and telerehabilitation. Dr. Rahman's presentation will explore the integration of AI, machine learning, and advanced robotics to usher in a new era of personalized care. To register, visit the Advancing Assistive Robotics Eventbrite page.
|
|
|
|
Machine Learning Theory Summer School at Princeton
Apply for a scholarship by March 31- The Machine Learning Theory Summer School is for PhD students studying machine learning theory. The summer school will be held August 12 - 21, 2025, at Princeton. The main courses will be taught by Florent Krzakala, Yue Lu, Yury Polyanskiy, Theodor Misiakiewicz, and Jianfeng Lu. The summer school is a chance to learn about exciting new ideas from math, physics, and statistics for understanding machine learning. For more information and to apply for scholarships, view the summer school flyer or visit the summer school webpage.
|
|
|
|
2025 Geospatial Summit
April 16, 10:00 a.m. - 3:30 p.m.; Gordon Event Center & Zoom. The Geospatial Summit is an annual event for anyone interested in geospatial data and its many applications. We seek to discover how maps and data can change Wisconsin and the world. The Summit will feature speakers Dr. David Hart from Wisconsin Sea Grant, Janet Silbernagel of Silvernail Geodesign, and Christian Andresen from the UW-Madison Geography Department, who will share how they incorporate geospatial data, tools, and technologies into their work. There will also be a career panel and career fair featuring representatives from local GIS companies and agencies. The Geospatial Summit is FREE and open to all. To register, visit the Geospatial Summit website.
|
|
|
|
Data Analyst Intern for PEOPLE
Apply by March 28- The Precollege Enrichment Opportunity Program for Learning Excellence (PEOPLE) prepares Wisconsin students to succeed in college with an emphasis on enrollment at UW-Madison. The data analyst intern will update inventory management systems, collect data for in-house tutoring resources, and prepare reports to advocate for the efficacy of those programs. To apply, visit the Data Analyst Intern for PEOPLE job posting.
|
|
|
|
Student Computational Research Assistant
Apply by April 1- The Psychology Department at UW-Madison is hiring a student research assistant to maintain the computational aspects of the research and establish its pipelines (DeepSqueak/SLEAP). The research assistant will work with graduate students and a faculty member to continue developing machine learning for analyzing rodent vocalizations (DeepSqueak) and movements (SLEAP), along with synchronizing behaviors between dyads of mice. To apply, visit the Student Computational Research Assistant job posting.
|
|
|
|
Student Help Intern - Data Specialist
Apply by April 2- The Epistemic Analytics lab at UW creates novel computational techniques and statistical tools that enable researchers to construct quantitative models of rich qualitative data in education, healthcare, policy, and other contexts. The student intern will assist researchers with preparation, analysis, and presentation of large amounts of qualitative and quantitative data. The student intern will manipulate text data in Excel and in R or Python, and help run webinars and training workshops. To apply, visit the Student Help Intern job posting.
|
|
|
|
Financial/Data Review and Analytics Hero
Apply by April 4- The Department of Cell and Regenerative Biology within the UW-Madison School of Medicine and Public Health is hiring a student to assist with regular data review, sorting, and analysis as needed for the ongoing budgetary review process. To apply, visit the Financial/Data Review and Analytics Hero job posting.
|
|
|
|
Executive Director N+1 Institute
Apply by March 24- The N+1 Institute was founded by UW-Madison and 4490 Ventures, a Midwest venture capital firm, to enable the future of Web 3.0. The executive director serves as CEO of the N+1 Institute and drives the center's strategy and execution. Responsibilities include fundraising, recruiting board members, managing member companies, and driving execution for the center. The executive director will hire all the resources required for this center and build the operational structure and systems required to run it. To apply, visit the Executive Director N+1 Institute job posting.
|
|
|
|
Business Intelligence (BI) Developer
Apply by March 28- The BI Developer will play a key role in building out resources for the UW-Madison community, making data more available and usable to inform decisions. The BI Developer will work with the Office of Data, Academic Planning & Institutional Research (DAPIR) to identify and understand needs and collaborate toward a solution. Responsibilities include developing and maintaining data visualization dashboards and reports for DAPIR and its partners, troubleshooting issues with reports, administering the report environment, and supporting the data governance program. To apply, visit the BI Developer job posting.
|
|
|
|
Canada Research Chair (CRC) Tier II in Synthetic Biology and Biomanufacturing
Apply by March 31- The Department of Biology in the Faculty of Arts and Science at Concordia University is hiring for a research intensive, tenure-track position that will support and expand upon Concordia University’s research programs in synthetic biology, bioengineering, and bioprocessing with an applied focus on sustainability, food security, and/or human health. The successful candidate will establish a research program that will benefit from, and foster growth of, our state-of-the-art Centre for Applied Synthetic Biology, Centre for Structural and Functional Genomics, Genome Foundry, and Bioprocessing Facility. To apply, visit the CRC Tier II in Synthetic Biology and Biomanufacturing job posting.
|
|
|
|
|
DATA VISUALIZATION OF THE WEEK
|
|
|
|
Fewer Below-Freezing Days in Winter
January 2025 was the hottest January on record, beating the prior record set in January 2024 by a sizable margin. More alarming than the record itself is that it was set during a La Niña. In the fairly recent past, January warmth records were set in 2007, 2016, 2020, and 2024 during El Niño events, which boost global temperatures. This January, the world was in modest La Niña conditions that should, all things being equal, result in lower global temperatures.
Looking across the winter as a unit, our climate-changed winters are less frozen than they used to be. Across the contiguous 48 states, with a couple of exceptions, winter has fewer days below freezing. In some parts of the Rocky Mountain region and the desert Southwest, up to 30 days that used to be below freezing are now above freezing. The average number of freezing days lost is lower, at 13, but still impactful. Two points stand out. First, even though there may be more Arctic blasts due to a weakened jet stream, those are more than offset by warmer winters overall. Second, species that react to warming temperatures will be forced to adapt or die. Some appear to be doing more dying than adaptive thriving.
|
|
|
|
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub.
Use our submission form to send us your news, events, opportunities and data visualizations for future issues.