Data Science Updates is the University of Wisconsin-Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute, powered by American Family Insurance, and the Data Science Hub.
November 27, 2024
|
|
|
|
DSI and WUD Film Committee Present a Free Screening of The Thinking Game
The Data Science Institute and WUD Film Committee invite the UW–Madison campus community to a free screening of The Thinking Game on December 5 at 6:30 p.m. at the Union South Marquee Cinema. The Thinking Game tells the story of DeepMind’s founder Demis Hassabis and his team as they pursue the creation of AI that matches or surpasses human abilities across a wide range of tasks. The documentary captures the groundbreaking achievement of AlphaFold, an AI tool that can predict protein structures. Hassabis and John Jumper shared the 2024 Nobel Prize in Chemistry for protein structure prediction with AlphaFold; David Baker received the other half of the prize for computational protein design.
The screening will be followed by a panel discussion with Anthony Gitter, Biostatistics and Medical Informatics; Hannah Wayment-Steele, Biochemistry; and Kyle Cranmer, Data Science Institute. Hope to see you there!
|
|
|
|
Interested in Anomaly Detection? Join the NSF HDR ML Challenge
At the National Science Foundation’s Harnessing the Data Revolution ML Challenge, the brightest minds in machine learning and data science will tackle one of the most exciting frontiers in scientific research: anomaly detection. The challenge: develop algorithms that can identify subtle differences and anomalies across three provided datasets. The winner will be flown to the Association for the Advancement of Artificial Intelligence (AAAI) workshop in Philadelphia and receive a cash prize!
The NSF HDR ML Challenge is open to UW–Madison students, postdocs, faculty and staff, as well as members of the broader community. Learn more about this challenge at the NSF HDR Community website.
|
|
|
|
Present Your Work at the Data Science Research Bazaar
Researchers, students, and professionals at all levels of expertise are invited to showcase their work at the 6th annual Data Science Research Bazaar, March 19-20 at the Discovery Building. This interdisciplinary conference welcomes submissions for data-science-focused lightning talks, posters, workshops, and interactive discussions.
This year’s theme is AI and ML in Research: Navigating Opportunities and Boundaries, and we welcome presentations that explore both the potential and limitations of artificial intelligence (AI) and machine learning (ML) in research. While AI/ML will be a key focus, we encourage submissions from all areas of fundamental and applied data science and computational work. Proposals are due January 15th.
The Research Bazaar is an opportunity to connect with, learn from, and contribute to UW–Madison’s ever-expanding data science research community. Learn more and apply on the Data Science Research Bazaar webpage.
|
|
|
|
Trustworthy ML/AI – Explainability, Bias, Fairness, and Safety
December 2 - 4, 9:00 a.m. - 1:00 p.m.; Zoom. Join us for a free pilot of our new workshop, Trustworthy AI – Explainability, Bias, Fairness, and Safety. This lesson equips participants with trustworthy AI/ML practices, focusing on fairness, explainability, reproducibility, accountability, and safety across structured data, NLP, and computer vision tasks. Participants will learn to evaluate and enhance model trustworthiness and integrate ethical practices into research applications. As this is a pilot workshop, we’re looking for participant feedback to refine the lesson for future sessions.
The course is aimed at graduate students and other researchers at UW-Madison. Participants must have experience using Python and a basic understanding of machine learning (e.g., familiarity with concepts like train/test split and cross-validation) and should have trained at least one model in the past. To register and preview the schedule and materials, visit the Trustworthy AI registration page.
|
|
|
|
Methods for Biological Data Workshop
December 5, 1:00 p.m. - 2:30 p.m.; Russell Labs, Room 584 + Zoom.
Check out the workshop website for more information about the December 5th meeting.
|
|
|
|
|
|
Software Carpentry
January 13-16, 9:00 a.m. - 1:00 p.m.; Zoom. Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems. Register for the workshop on the Software Carpentry workshop website beginning on December 2nd at 9:00 a.m.
|
|
|
|
Health Sciences Data Carpentry
January 21-24, 9:00 a.m. - 1:00 p.m.; Health Sciences Learning Center, room 3330. Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Its target audience is researchers who have little to no prior computational experience, and its lessons are domain specific, building on learners' existing knowledge to enable them to quickly apply skills learned to their own research. Participants will be encouraged to help one another and to apply what they have learned to their own research problems. Register for the workshop on the Health Sciences Data Carpentry workshop website beginning on December 2nd at 9:00 a.m.
|
|
|
|
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Thursdays from 2:30-4:30 p.m. CT. To join Coding Meetup, join data-science-hubgroup.slack.com
|
|
|
|
SILO: American Family Funding Initiative Short Talks - Image Alignment, RL, Auto-labeling
TODAY November 27, 12:30 p.m. - 1:30 p.m.; Wisconsin Institute for Discovery, Orchard room 3280 + Zoom. SILO will host three talks on projects funded by American Family, presented by Anirudh Sundara Rajan, Brahma S. Pavse, and Harit Vishwakarma. Rajan will discuss the effectiveness of dataset alignment for fake image detection; Pavse will discuss reliable offline evaluation of reinforcement learning agents through abstraction; Vishwakarma will discuss improved confidence functions for auto-labeling.
|
|
|
|
Modern Recommendation Systems: When the Academic Meets the Industrial Practice
December 4, 3:00 p.m. - 4:00 p.m.; 1240 Computer Sciences + Zoom. Dr. Ji Liu, principal AI scientist at Instagram, Meta Platforms (and a UW–Madison Computer Sciences PhD graduate), will introduce the fundamental components of recommendation systems, trace their evolution in the data-driven era, and share his own attempts to upgrade current systems. He will also discuss key industry advancements, including training super large-scale recommendation models (100 trillion parameters), addressing optimal decision-making in recommendation systems, and exploring the relationship between recommendation systems and generative AI. To view the full abstract, visit the modern recommendation systems lecture calendar listing.
|
|
|
|
Job Talks for RISE-AI Candidates in Management
|
|
|
|
Statistics Seminar: Estimating Direct Effects under Interference: A Spectral Experimental Design
December 4, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. In this talk, Chris Harshaw explains the problem of direct effect estimation under interference. His team presents a new experimental design under which the variance of a Horvitz–Thompson style estimator is bounded as $\mathrm{Var} \le O(\lambda / n)$, where $\lambda$ is the largest eigenvalue of the adjacency matrix of the graph. This experimental approach achieves consistency when $\lambda = o(n)$, which is a much weaker condition on the network than that of most similar approaches, which require the maximum degree to be bounded. This experimental design establishes the best known rate of convergence for this problem. His team also presents a variance estimator and CLT that facilitate the construction of asymptotically valid confidence intervals. To view the full abstract, visit the estimating direct effects under interference statistics seminar calendar listing.
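To make the comparison between the two network conditions concrete, here is a minimal Python sketch. It is an illustration only, not the speaker's method or experimental design. It uses a star graph, chosen here as an assumed example, in which the maximum degree grows with $n$ while the largest adjacency eigenvalue $\lambda$ grows only like $\sqrt{n}$. The helper names `star_graph`, `multiply`, and `largest_eigenvalue` are hypothetical, and the code assumes a graph with at least one edge.

```python
import math


def star_graph(n):
    """Adjacency lists for a star: node 0 is linked to each of the other n - 1 nodes."""
    return [list(range(1, n))] + [[0] for _ in range(n - 1)]


def multiply(neighbors, vec):
    """Multiply the adjacency matrix by vec, using adjacency lists."""
    return [sum(vec[j] for j in nbrs) for nbrs in neighbors]


def largest_eigenvalue(neighbors, iterations=50):
    """Estimate the largest adjacency eigenvalue by power iteration on A squared.

    Squaring avoids the oscillation that plain power iteration shows on
    bipartite graphs, where -lambda is also an eigenvalue.
    """
    vec = [1.0] * len(neighbors)
    for _ in range(iterations):
        vec = multiply(neighbors, multiply(neighbors, vec))
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    squared = multiply(neighbors, multiply(neighbors, vec))
    rayleigh = sum(x * y for x, y in zip(vec, squared))  # approximates lambda squared
    return math.sqrt(rayleigh)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        neighbors = star_graph(n)
        max_degree = max(len(nbrs) for nbrs in neighbors)
        lam = largest_eigenvalue(neighbors)
        print(f"n={n}  max degree={max_degree}  lambda={lam:.3f}  lambda/n={lam / n:.4f}")
```

For a star graph, $\lambda = \sqrt{n - 1}$, so the ratio $\lambda / n$ shrinks toward zero even though the maximum degree $n - 1$ is unbounded. This is the kind of network that a bounded-degree condition excludes but the $\lambda = o(n)$ condition allows.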
|
|
|
|
RED Talk with Amy Thering
December 4, 4:30 p.m. - 6:00 p.m.; 1240 Computer Sciences. Join us for an exciting conversation with Amy Thering, Chief Information Officer at Optum Health-East Region. Thering will discuss breaking barriers in tech leadership and healthcare innovation. Pizza will be available while supplies last. To learn more about the event and to register, visit the School of Computer, Data & Information Sciences 24-25 RED Talks webpage.
|
|
|
|
Public Nutrition Programs and the Lasting Impacts of Early Life Lead Exposure
December 5, 12:15 p.m. - 1:30 p.m.; 5131 Nancy Nicholas Hall. Nicholas Sanders, Associate Professor, Brooks School of Public Policy and Department of Economics, Cornell University, will discuss how lead poisoning in children remains a large public health concern. His presentation explores an alternate pathway by which society can reduce the damages of lead: improvements in childhood nutrition. Matching early childhood test score data to geolocated birth records, historical information on exposure to lead, and access to social food programs provides insight into how nutrition-related welfare programs might help mitigate long-term impacts of childhood lead exposure. To view the full abstract, visit the Institute for Research on Poverty Seminar calendar listing.
|
|
|
|
How Intelligent Are Current Multimodal Video Models?
December 6, 12:30 p.m. - 1:30 p.m.; 1221 Computer Sciences. Everyone is invited to the weekly Machine Learning Lunch Meetings. Faculty members from Computer Sciences, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning.
This week, Yong Jae Lee (CS) will present two recent contributions from his lab that challenge and advance the capabilities of multimodal video models. First, he will introduce a new benchmark called Vinoground, which evaluates the temporal, counterfactual reasoning capabilities of existing models. Spoiler alert: they aren't great (to put it mildly). Second, he will present a novel approach inspired by the Matryoshka doll to improve the efficiency of multimodal models. It learns to compress the total number of visual tokens in a nested fashion, significantly reducing the number of tokens that the subsequent language model needs to process. For more information, visit the machine learning lunch meeting calendar listing.
|
|
|
|
Translational AI Applications in Prostate Cancer
December 10, 10:00 a.m. - 11:00 a.m.; Zoom. Join the next installment of the ML4MI Seminar Series with Dr. Stephanie Harmon, Stadtman Investigator at the National Cancer Institute. Machine learning applications have dominated major medical imaging journals in recent years, but how practical are these tools and how do we translate them to clinical use? In this talk, we will review recent ML applications developed at the NCI and discuss how clinical translation is considered during model development, validation, and deployment.
|
|
|
|
Workflows with Posit Team
TODAY November 27, 10:00 a.m. - 11:00 a.m.; streamed on YouTube Live. The Posit Team shows how Posit and Snowflake can help you streamline and accelerate your R modeling workflows. If you’re looking for a way to simplify model deployment and access high-performance compute for your analyses, this session is for you!
During this demo, we’ll cover how to fit an R model using Snowflake data for direct integration, version and evaluate your model to ensure consistency and reproducibility, perform fast model inference directly in Snowflake, and store and access models in Snowflake so they’re available to other users and tools (like Snowsight and Python). To learn more about the benefits of this workflow and to find the YouTube link for joining the event, check out the Posit Team's Workflows event description.
|
|
|
|
ML+X Nexus: Crowdsourced ML and AI Resources
Nexus is the ML+X community’s centralized hub for sharing machine learning (ML) and AI resources, designed to make the practice of ML/AI more connected, efficient, accessible, and reproducible. It is intended to host a wide range of resources (original or external), including educational materials, recorded talks across campus, model use guides, datasets, EDA case studies, and more. While practitioners can use Nexus to expand their expertise, educators and researchers will find it useful for sharing ML knowledge and procedures, reducing redundancy, and improving learning outcomes. Visit ML+X Nexus to begin expanding your knowledge, or visit our How to Contribute page to share a useful ML resource from your field.
|
|
|
|
|
WISCIENCE Evaluation Intern
Apply by December 6 - The Evaluation Intern will support the Director of Evaluation and Research in evaluating WISCIENCE programs and provide evaluation support to grant-funded initiatives relating to STEM education and research experiences in higher education. The person in this role will assist WISCIENCE evaluation efforts by building surveys and reports, contacting individuals within WISCIENCE, at UW-Madison, and other institutions by email, recruiting subjects, tracking and entering consent forms, maintaining and storing data, and assisting with the analysis of qualitative data. Visit the WISCIENCE Evaluation Intern job posting from the Student Jobs Site to learn more and apply.
|
CHTC Fellows Program for Summer 2025
Apply by December 20 - The Center for High Throughput Computing (CHTC) Fellows Program trains undergraduate and graduate students in the development and use of cyberinfrastructure through a summer program where participants will work with mentors on delivering a project that will make an impact on the nation’s science. The projects provide exciting and challenging opportunities for students interested in software development, infrastructure services, or research facilitation. Fellows work with a mentor to develop a project relevant to one of these areas. To view the program description and to apply, visit the CHTC Fellows Program webpage.
|
|
|
|
Spring course announcement: Physics 361: Machine Learning in Physics
As spring enrollment begins, consider enrolling in Physics 361: Machine Learning in Physics. This course covers some of the ways in which machine learning is used in physics, including classification, likelihood-free inference, and emulators with generative models. Students will also briefly cover some advanced topics such as graph neural networks, solving PDEs, and uses of LLMs. The professor will try to make the course accessible both to people with a physics background and to people with a machine learning background (which will require some review for both sides). The course load will be moderate, with a few problem sets and a final project, and no exams. To learn more about the class, review the learning outcomes on the physics course description webpage.
|
|
|
|
2025 National Big Data Health Science Student Case Competition
The University of South Carolina's Big Data Health Science Center is hosting their 6th annual Student Case Competition, a virtual event in which students compete in teams to solve an issue in healthcare using big data analytics for a chance to win cash prizes. The Big Data Health Science Student Case Competition is intended to provide enthusiastic teams of graduate and senior undergraduate students with the opportunity to apply their knowledge to analyzing big datasets in health care.
Each participating team will analyze the case and datasets to be released on Friday, February 7th, 2025, at noon EST. Through February 9th, team members will work together to present their methods, analyses, and results at the Big Data Health Science Center Case Competition virtually. A panel of industry and academic experts will judge the presentations based on each team’s use of full analytics tools/processes, from framing the problem to data use, model building, innovation, and communicating the solutions to decision-makers. Applications are open until January 29th, 2025. View the flyer for more details about this event.
|
|
|
|
Data Systems Administrator
Apply by December 1 - The Office of Data, Academic Planning & Institutional Research (DAPIR) is seeking a Data Systems Administrator to join their Enterprise Data Management team. The position will be instrumental in helping shape the future of UW-Madison's data landscape. Enterprise Data Management is in the process of modernizing several elements of our data platform and is looking for an experienced data professional to help us grow beyond traditional databases into new technologies and methodologies (data lake, data lakehouse, data fabric, etc.).
This role will be central to helping design, build, and maintain physical and virtual database systems and related data systems (AWS S3, etc.) in a hybrid cloud and on-premises environment, and will assist with the overall UW-Madison data platform to ensure data security and integrity. The Data Systems Administrator will develop related policies and procedures, facilitate training, and serve as a subject matter expert for related project planning. This person will also play a key role in proactively monitoring activity and performance to maintain responsible financial operations in cloud environments. Visit the job posting from DAPIR to learn more and apply.
|
Wisconsin Health Data Hub Program Manager
Under the direction of the Associate Dean for Informatics and Information Technology at SMPH, the Wisconsin Health Data Hub Program Manager will oversee the strategic direction, execution, and delivery of the WHDH projects. This role requires an experienced leader with a deep understanding of healthcare systems, health data, exceptional project management skills, and the ability to drive cross-functional collaboration. The ideal candidate will have a proven track record of leading large-scale health data projects and delivering impactful results for the successful and efficient operation of the healthcare sector. Visit the job posting from SMPH to learn more and apply.
|
|
|
|
New Research Topics: NVIDIA Academic Grant Program Accepting Proposals
Apply by December 31 - NVIDIA’s Academic Grant Program is calling for research proposals to advance work in three new interest areas: Data Science, Graphics and Vision, and Edge AI. NVIDIA will continue accepting submissions for projects related to Simulation and Modeling and Gen AI and LLMs. For more information, please see the NVIDIA FAQs webpage.
New areas of interest:
- Data science submissions can include data processing, operational research and route optimization, and graph neural networks
- Graphics and vision submissions can include augmented and virtual reality, ray tracing, rendering, and AI for graphics
- Edge AI submissions can include robotics, autonomous vehicles, 5G/6G, smart spaces, and federated learning
|
|
|
|
|
DATA VISUALIZATION OF THE WEEK
|
|
|
|
Detecting AI-Generated Images is Getting Harder
The Data Science Community Newsletter reports on Elizabeth Bik and her colleagues, who have been searching published scientific papers for images that may indicate fabricated results. Once they find an image that is either an exact copy or otherwise provably fake, they typically write to the authors and the journal’s editors to request a retraction. These days they are finding less scientific fakery, but they surmise that this is likely due to fakers’ use of more sophisticated AI image generators that can produce more convincing images. One clue “that fraudsters are using sophisticated image tools is that most of the problematic images sleuths are currently detecting are in papers that are several years old.”
In an interview, Bik explains that her work is more challenging now because “in the past couple of years, we’ve seen fewer and fewer image problems.” She thinks “most folks who have gotten caught doing image manipulation have moved on to creating cleaner images.” Scientific misinformation erodes trust and corrodes the information commons, so it’s a big deal if we don’t know what we shouldn’t trust. Read the “AI-generated images threaten science” Nature news article by Diana Kwon to learn more.
|
|
|
|
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub.
Use our submission form to send us your news, events, opportunities and data visualizations for future issues.