|
Data Science Updates is the University of Wisconsin-Madison's resource for news, training, events, and professional opportunities in data science, brought to you by the Data Science Institute, powered by American Family Insurance, and the Data Science Hub.
November 13, 2024
|
|
|
|
Present Your Work at the Data Science Research Bazaar
Researchers, students, and professionals at all levels of expertise are invited to showcase their work at the 6th annual Data Science Research Bazaar, March 19-20 at the Discovery Building. This interdisciplinary conference welcomes submissions for data-science-focused lightning talks, posters, workshops, and interactive discussions.
This year’s theme is AI and ML in Research: Navigating Opportunities and Boundaries, and we welcome presentations that explore both the potential and limitations of artificial intelligence (AI) and machine learning (ML) in research. While AI/ML will be a key focus, we encourage submissions from all areas of fundamental and applied data science and computational work.
The Research Bazaar is an opportunity to connect with, learn from, and contribute to UW–Madison’s ever-expanding data science research community. Learn more and apply on the Data Science Research Bazaar webpage.
|
|
|
|
MadPrompts Showcases Generative AI
Following the success of MadPrompts 2023, the first generative artificial intelligence (AI) prompt battle on an American university campus, WARF and the UW-Madison Data Science Institute (DSI) hosted MadPrompts 2024 on October 18. This year’s event featured four competitors from majors and programs across campus using generative AI tools to create detailed images, and it introduced a new AI video showcase. MadPrompts is a one-of-a-kind experience that highlights the best of generative AI and leaves people inspired. Learn more about the MadPrompts event in the news article from WARF.
|
|
|
|
AI for Science Seminar: Charton to Present on AI for the Working Mathematician
The Data Science Institute is pleased to launch a new seminar series focused on AI for Science, which will highlight advances in science fueled by machine learning and AI as well as advances in AI inspired by scientific challenges. Our inaugural speaker is François Charton, a research engineer at Meta AI who is at the forefront of using AI for symbolic problems found in mathematics, cryptography, and theoretical physics. Charton will speak on AI for the Working Mathematician on November 20, 4-5pm in the Chemistry Learning Studio (1435 Chemistry). Cookies and coffee will be provided.
|
|
|
|
Posit Day 2: Intro to Shiny Apps – Python Focus
November 19, 1:00 p.m. - 3:00 p.m.; Zoom. This is the second installment of the Fall 2024 Posit Days. Ryan Johnson will talk about Shiny apps using Python and touch on concepts such as user interface (inputs and outputs), server logic, reactivity, and layout and style. This event will be online and open to everyone affiliated with UW–Madison. Learn more and register for Posit Day 2.
|
|
|
|
Registration for Mini-Workshop Series
Registration is open for the Data Science Hub's Fall 2024 Mini-Workshop Series. The final workshop in the series is an introduction to machine learning. Ticket sales close on November 15. To learn more and register, visit the mini-workshop event page.
We are also looking for a couple of ML-savvy folks interested in helping out at the workshop (during one or both days). Helpers will help troubleshoot bugs and answer questions from learners during the workshop. We typically try to have at least 1 helper for every 5 learners at the workshop. This ratio helps us make sure that no learner gets left behind during the workshop! Email ml-community-of-practice@g-groups.wisc.edu if you are interested.
Topic | Date & Time | Location
Intro to Machine Learning with Sklearn | November 20-21, 9:00 a.m. - 1:00 p.m. | Zoom
|
|
|
|
Trustworthy ML/AI – Explainability, Bias, Fairness, and Safety
December 2 - 4, 9:00 a.m. - 1:00 p.m.; Zoom. Join us for a free pilot of our new workshop, Trustworthy AI – Explainability, Bias, Fairness, and Safety. This lesson equips participants with trustworthy AI/ML practices, focusing on fairness, explainability, reproducibility, accountability, and safety across structured data, NLP, and computer vision tasks. Participants will learn to evaluate and enhance model trustworthiness and integrate ethical practices into research applications. As this is a pilot workshop, we’re looking for participant feedback to refine the lesson for future sessions.
The course is aimed at graduate students and other researchers at UW-Madison. Participants must have experience using Python and a basic understanding of machine learning (e.g., familiar with the concepts like train/test split and cross-validation) and should have trained at least one model in the past. To register and preview the schedule / materials, visit the Trustworthy AI registration page.
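For readers checking themselves against the prerequisites, the two concepts named above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not material from the workshop: the helper names `train_test_split` and `k_fold_splits` are hypothetical, and in practice a library such as scikit-learn would normally handle this.

```python
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    # The first n_test shuffled indices are held out; the rest are used for training.
    return indices[n_test:], indices[:n_test]


def k_fold_splits(indices, k=5):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hold out 20% of 100 samples as a final test set, then cross-validate on the rest.
train_idx, test_idx = train_test_split(100, test_fraction=0.2)
folds = list(k_fold_splits(train_idx, k=5))
```

The test set is set aside once and never used for model selection; cross-validation then rotates a validation fold through the remaining training data.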
|
|
|
|
Methods for Biological Data Workshop
December 5, 1:00 p.m. - 2:30 p.m.; Russell Labs, Room 584 + Zoom. Do you have too much data or computation to run on your laptop or lab server? One campus resource to scale up your computing is CHTC – the Center for High Throughput Computing. In this workshop, we will briefly cover when CHTC is a good home for your computing workflows and then do a hands-on demonstration of how data analysis can be run in CHTC. Check out the workshop website for more information.
Please create an account with this form before the workshop and bring your laptop. You will need access to a Unix (macOS) or Linux bash command line and basic shell computing knowledge. If anyone needs help getting set up, facilitators Emile and Claudia will be in Russell Labs 584 and the Zoom room 15 minutes before the workshop starts.
|
|
|
|
|
Have questions about anything data science-related? Come see the Data Science Hub facilitators at Coding Meetup on Thursdays from 2:30-4:30 p.m. CT. To join Coding Meetup, join data-science-hubgroup.slack.com
|
|
|
|
SILO Yuqian Zhang - Optimal Vintage Factor Analysis with Deflation Varimax
TODAY November 13, 12:30 p.m. - 1:30 p.m.; Wisconsin Institute for Discovery, Researchers’ Link 2nd floor + Zoom. Join Professor Yuqian Zhang, assistant professor in the Department of Electrical and Computer Engineering at Rutgers University, in a talk about her work in optimal vintage factor analysis with deflation varimax.
The most widely used vintage factor analysis is Principal Component Analysis (PCA) followed by the varimax rotation. However, few theoretical guarantees can be established for the varimax rotation. Zhang's team proposed a deflation varimax procedure that solves each row of an orthogonal matrix sequentially, and her team fully establishes theoretical guarantees for the proposed procedure in a broader context. Adopting this new deflation varimax as the second step after PCA, they further analyze this two-step procedure under a general class of factor models. Their results show that it estimates the factor loading matrix at the minimax optimal rate when the signal-to-noise ratio (SNR) is moderate or large.
|
|
|
|
Statistics Seminar: Robustly Estimating Heterogeneity in Factorial Data using Rashomon Partitions: Tyler McCormick
TODAY November 13, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. Many statistical analyses, in both observational data and randomized controlled trials, ask: how does the outcome of interest vary with combinations of observable covariates? How do various drug combinations affect health outcomes, or how does technology adoption depend on incentives and demographics? McCormick's team aims to partition this factorial space into "pools" of covariate combinations where the outcome differs across the pools (but not within a pool).
Existing approaches have significant limitations, so McCormick's team developed an alternative perspective, called Rashomon Partition Sets (RPSs). Join McCormick as he applies this method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length), and the introduction of microfinance. To view the full abstract, visit Tyler McCormick's statistics seminar calendar listing.
|
|
|
|
AI and Computing for Local Food Systems
November 14, 4:00 p.m. - 5:00 p.m.; 1240 Computer Sciences + Zoom. Join Professor Alfonso Morales, from the Department of Planning, Landscape, and Architecture, to learn about computing (and sensing) solutions to address the mounting challenges we face in securing our food systems. This lecture focuses on precision agriculture (micro weather modeling, crop selection and adaptation, land management, real-time sensing for efficient crop watering, fertilization and pest control), intelligent food distribution systems (transportation optimization, local sourcing promotion, distribution and markets, waste management and avoidance), and inter-silo connections (connections to public health, marketing and consumer behavior, and ecological / ecosystems management and services of farm production). To learn more, visit the AI and Computing for Local Food Systems calendar listing.
|
|
|
|
Panel Discussion of Research Data and Integrity in the Social Sciences
November 14, 4:00 p.m. - 5:30 p.m.; Grainger Hall, Room 4151. The Office of the Vice Chancellor for Research and Wisconsin School of Business are hosting a panel discussion of research data and integrity in the social sciences. The panel will discuss data in the social sciences, authorship, and best practices. There will also be ample opportunities for Q&A with the audience. Mark Rickenbach, Ph.D., Interim Associate Vice Chancellor for Research Policy and Integrity, will give additional remarks. To view the list of attending panelists, visit the UW–Madison Research panel discussion announcement.
|
|
|
|
Decoding the Human Genome by Multi-Omics in Cell-Free DNA and Single-Cells
November 15, 12:00 p.m. - 1:00 p.m.; Biotech Center Auditorium + Zoom. Join us for the BMI Seminar Series with speaker Yaping Liu from Northwestern University. Epigenetic modifications, such as DNA methylation, histone modifications, and three-dimensional (3D) genome topology, shape gene regulation and transcription factor binding. Liu's team developed single-cell Methyl-HiC to reveal the heterogeneity of DNA methylation, long-range DNA methylation concordance, and 3D genome in the same cells.
Additionally, to non-invasively monitor the dynamics of regulatory elements in vivo, they developed a set of computational methods to study the cellular epigenomes from cell-free DNA (cfDNA) fragmentation patterns. Their lab will pave the road for our understanding of the variation of cis-regulatory elements non-invasively across different physiological and pathological conditions. To view the full abstract, please review the BMI Seminar Series poster.
|
|
|
|
MadSystems Seminar: Single Level Stores: Providing Checkpointing as an OS Service
November 19, 4:00 p.m. - 5:00 p.m.; 4310 Computer Sciences + Zoom. Emil Tsalapatis, researcher at the University of Waterloo, will discuss his work on Single Level Stores. He will present the Aurora single level store, an OS design that uses continuous checkpointing for application persistence and deployment. Aurora provides submillisecond application checkpoint and restore operations to efficiently turn applications into on-disk images and back. Fast checkpointing/restore as an OS service also serves as a foundation for further research into open problems like efficient persistence APIs for memory-mapped data and serverless computing.
Tsalapatis describes three systems that demonstrate the efficiency and flexibility of the single level store paradigm: Aurora (SOSP 2021), an OS design capable of continuous application checkpointing at a fast enough granularity to provide transparent persistence; MemSnap (ASPLOS 2024), an OS single level store API and associated virtual memory mechanism; and Metropolis, a serverless invoker that uses the single level store paradigm to create serverless function instances at submillisecond latency. To read the full abstract, visit the MadSystems Seminar calendar listing.
|
|
|
|
Statistics Seminar: Graph-regularized topic modeling: Extending pLSI with Document Similarity: Claire Donnat
November 20, 4:00 p.m. - 5:00 p.m.; 133 Service Memorial Institute. To address limitations of topic modeling, Donnat's team proposes an extension of probabilistic latent semantic indexing (pLSI), a frequentist framework for topic modeling, that incorporates document-level covariates or known similarities between documents using a graph formalism. Modeling documents as nodes in a network with edges denoting similarities, they propose a new estimator based on a fast graph-regularized iterative Singular Value Decomposition (SVD) that encourages similar documents to share similar topic proportions.
Donnat's team further characterizes the estimation error of both the topics and the topic assignment matrix under their proposed method by deriving high-probability bounds, and validates the model through comprehensive experiments on synthetic datasets and real-world corpora. To view the full abstract, visit the graph-regularized topic modeling statistics seminar calendar listing.
|
Modeling Learning in the Age of AI
|
|
|
|
Public Cloud Office Hours
November 14, 2:00 p.m. - 3:00 p.m.; Zoom. The UW–Madison public cloud team holds weekly office hours for open discussion on topics related to the use of Amazon Web Services, Microsoft Azure, and/or Google Cloud Platform (a.k.a. "Public Cloud", as opposed to on-premises virtual computing).
Drop in for help with an issue or for general discussion on using public cloud, and work with your peers to learn best practices for experimenting, operating, and innovating in the public cloud. Visit the Public Cloud Office Hours calendar listing for the Zoom link.
|
|
|
|
StatShare Community of Practice
November 19, 10:30 a.m. - 11:30 a.m.; WARF 1st Floor, room 132. StatShare is a community of practice for non-faculty biostatisticians to share knowledge, learn, network, and support one another in solving difficult research problems they encounter in their work. Join Tom Cook, professor CHS, for his part 3 discussion of Model Assumptions of Goodness of Fit. Learn more about StatShare and the upcoming event on the Biostatistics and Medical Informatics webpage.
|
|
|
|
NMDSI AI Ethics Symposium: From Policy to Practice
November 21, 8:00 a.m. - 6:00 p.m.; Marquette University - Alumni Memorial Union
This event will have two keynote addresses from Dr. Alondra Nelson, former deputy assistant to President Joe Biden and director of the White House Office of Science and Technology Policy, as well as Dr. Casey Fiesler, associate professor of information science at the University of Colorado Boulder. The event will be hybrid. Learn more and register via Eventbrite.
|
|
|
|
SMPH Collaborate
November 22, 3:00 p.m. - 5:00 p.m.; Health Sciences Learning Center Room 1345. All individuals from across campus involved in research — including undergraduates, graduate students, postdocs, research staff, and faculty — are welcome at SMPH Collaborate, an event series from the SMPH Office of Basic Research, Biotechnology and Graduate Studies. The Fall 2024 edition of SMPH Collaborate is focused on postdoc research.
|
|
|
|
SMPH Collaborate fosters connections among investigators, researchers, and learners through the sharing of research discoveries and building of community. Events, typically held twice a year, include two components. The first hour features four postdoc researchers presenting novel research and innovative technologies. A social hour then offers time for discussion to promote collaboration, along with a postdoc poster session to engage with further research being done across the SMPH community. To register and learn more, visit the SMPH Collaborate webpage on the SMPH Intranet.
|
|
|
|
ML+X Nexus: Crowdsourced ML and AI Resources
Nexus is the ML+X community’s centralized hub for sharing machine learning (ML) and AI resources, designed to make the practice of ML/AI more connected, efficient, accessible, and reproducible. It is intended to host a wide range of resources (original or external), including educational materials, recorded talks across campus, model use guides, datasets, EDA case studies, and more. While practitioners can use Nexus to expand their expertise, educators and researchers will find it useful for sharing ML knowledge and procedures, reducing redundancy, and improving learning outcomes. Visit ML+X Nexus to begin expanding your knowledge, or visit our How to Contribute page to share a useful ML resource from your field.
|
|
|
|
|
Data Curation Specialist
Responsibilities
- Extract data from Web of Science database searches, curate a working database, manipulate data by removing irrelevant and duplicate information, and extract contact author information from academic articles.
- Create multiple working databases and curated lists of relevant journal articles with contact information for an ongoing social science project.
Requirements & Qualifications
- Proficiency in data programming and curation.
- Knowledge of programming languages (Python, Java, Javascript, C/C++, PHP) and data visualization (e.g., R) for data extraction, manipulation, curation, and transformation.
- Experience extracting, manipulating, curating, and transforming data with the Web of Science database preferred.
- Ability to work iteratively with social scientists on an ongoing research project.
|
|
|
|
Spring course announcement: Botany/Pl Path 563: Phylogenetic Analysis of Molecular Data
As spring enrollment begins, consider enrolling in Botany/Pl Path 563: Phylogenetic Analysis of Molecular Data. This class will focus on the theory and practice of phylogenetic inference from DNA sequence data. Students will explain all the steps in the pipeline for phylogenetic inference and how different data and model choices affect the inference outcomes. Students will also plan and produce reproducible scripts for the analysis of real biological data. Students will justify the data and model choices and interpret and present their results.
|
|
|
|
Spring course announcement: ECE/MATH 888: Nonparametric Methods in Data Science
As spring enrollment begins, consider enrolling in ECE/MATH 888: Topics in Mathematical Data Science. Class meeting times are Monday and Wednesday, 11:00 a.m. - 12:15 p.m.
This course explores the theoretical foundations of nonparametric methods in data science. It covers classical techniques like reproducing kernel Hilbert spaces and splines, as well as modern approaches such as deep learning. If you're interested in the mathematical underpinnings of these methods and their interconnections, this course is for you! The course will feature lectures, readings, problem sets, and homework, addressing topics such as: applied functional analysis, approximation theory, smoothness spaces (e.g., Sobolev, Besov, Bounded Variation, Barron), nonparametric regression and classification, deep learning, statistical learning theory, and minimax lower bounds.
|
2025 National Big Data Health Science Student Case Competition
|
|
|
|
The University of South Carolina's Big Data Health Science Center is hosting their 6th annual Student Case Competition, a virtual event in which students compete in teams to solve an issue in healthcare using big data analytics for a chance to win cash prizes. The Big Data Health Science Student Case Competition is intended to provide enthusiastic teams of graduate and senior undergraduate students with the opportunity to apply their knowledge to analyzing big datasets in health care.
Each participating team will analyze the case and datasets to be released on Friday, February 7th, 2025, at noon EST. Through February 9th, team members will work together to present their methods, analyses, and results at the Big Data Health Science Center Case Competition virtually. A panel of industry and academic experts will judge the presentations based on each team’s use of full analytics tools/processes, from framing the problem to data use, model building, innovation, and communicating the solutions to decision-makers. Applications are open until January 29th, 2025. View the flyer for more details about this event.
|
|
|
|
Cybersecurity Monitoring Analyst
Apply by November 17 - This position assists in providing data security and regulatory compliance for the School of Medicine and Public Health at the University of Wisconsin-Madison. The incumbent will evaluate alerts from multiple data sources, analyze and follow up on alerts, and design and update reports and dashboards to inform priorities. Additionally, this position will contribute to the information security management program to support the school's mission of teaching and research, consistent with risk tolerance. Visit the job posting from the School of Medicine and Public Health to learn more and apply.
Responsibilities
- Assists with monitoring daily system operations using intrusion detection and prevention systems and assesses findings.
- Assists in providing cybersecurity training and security awareness programs.
- Reports application security concerns and escalates security incidents to senior staff and generates notification alerts for compromised assets.
- Conducts vulnerability-scanning analysis and tests security controls and implements security change requests.
Requirements & Qualifications
- Must hold, or be able to obtain within six months, an industry accepted cybersecurity certification (e.g. GIAC, ISACA, ISC2).
- Working knowledge of cybersecurity industry standards (e.g. HIPAA, NIST, ISO, CIS, etc.) and current IT risks.
- Experience using cybersecurity tools to monitor the IT environment.
- Proven ability to document process, procedures, and validation of remediation.
|
|
|
|
Software Engineer/Developer
Apply by November 20 - The Wisconsin Geological and Natural History Survey (WGNHS) is seeking a self-motivated, talented technology professional with a passion for software development. The Software Developer will build and improve modern tools for geoscience information access and management, such as our Publications Catalog and Data Viewer. Working with WGNHS staff and information consumers, this person will also identify needs and lead the effort to build new tools and applications. This position and new software development will be supported by IT professionals at WGNHS and UW-Madison who are responsible for servers and network infrastructure. Visit the job posting from the Wisconsin Geological and Natural History Survey to learn more and apply.
Responsibilities
- Maintain and extend an existing CKAN-based data management system, including the supporting Docker and SQL Server systems and content update workflows.
- Maintain application development and deployment infrastructure, including Git-based version control and Jenkins automation.
- Design, implement, and maintain a new web application for managing geoscience data, including data-entry, quality control, and reporting.
- Contribute to technical planning and oversight of WGNHS IT infrastructure as related to software development.
Requirements & Qualifications
- Two years of professional experience in web development or information technology.
- Experience in modern, responsive frontend web development, including proficiency in HTML, CSS, and Javascript.
- Experience with: Python, ASP.NET/C#, or Java programming languages; Apache, IIS, or nginx web servers; version control systems and continuous integration workflows; Database systems like SQL Server or PostgreSQL.
- Interest and motivation to build the preferred skills as needed, especially Docker or similar technologies.
|
|
|
|
AI in Health Faculty Cluster Hire at University of Texas at Austin
Responsibilities
- Strengthen our leadership in research, development and implementation of health-outcomes oriented artificial intelligence (AI) capabilities.
- Complement the ongoing and future expansion of Dell Medical School’s and McCombs School of Business’s research and teaching mission.
- Produce scholarly research that enhances both schools as well as the focal disciplines of their scholarly works.
- Foster a collaborative, engaging, and dynamic environment comprised of scholars with a range of backgrounds, skills, and perspectives.
Requirements & Qualifications
- PhD or equivalent before the start date; those with ABD status will be considered at the application/interviewing stage.
- Doctoral training in one or more of the following areas will be preferred: information systems, computer science (in particular, machine learning and AI), (bio)statistics, economics, operations research, and decision sciences.
- Specific interest in cross-disciplinary research in clinical, health-care delivery, and/or public health settings.
- Can excel in teaching students from a range of backgrounds and experiences.
|
|
|
|
New Research Topics: NVIDIA Academic Grant Program Accepting Proposals
Apply by December 31 - NVIDIA’s Academic Grant Program is calling for research proposals to advance work in three new interest areas: Data Science, Graphics and Vision, and Edge AI. NVIDIA will continue accepting submissions for projects related to Simulation and Modeling and Gen AI and LLMs. For more information, please see the NVIDIA FAQs webpage.
New areas of interest:
- Data science submissions can include data processing, operational research and route optimization, and graph neural networks
- Graphics and vision submissions can include augmented and virtual reality, ray tracing, rendering, and AI for graphics
- Edge AI submissions can include robotics, autonomous vehicles, 5G/6G, smart spaces, and federated learning
|
|
|
|
|
DATA VISUALIZATION OF THE WEEK
|
|
|
|
"The New American Dream Should be a Townhouse."
This Washington Post article by Amanda Shendruk and Heather Long emphasizes that owning a home is a core part of the American Dream. Unfortunately, that ideal is increasingly out of reach because there are not enough affordable options. The typical household makes nearly $30,000 a year less than what is necessary to afford a median-price home.
There's a sensible way to address this shortfall, but it requires moving beyond the antiquated vision of a big house with a fenced yard in the suburbs. The new American Dream should be a townhouse: a two- or three-story home that shares walls with a neighbor. In big urban areas, the median sale price for a townhouse is substantially cheaper than a single-family home. Townhomes offer community amenities and are more energy efficient than single-family homes. However, zoning ordinances complicate new townhome construction.
|
|
|
|
Data Science Updates is a collaborative effort of the Data Science Institute and Data Science Hub.
Use our submission form to send us your news, events, opportunities and data visualizations for future issues.
|
|
|
|
|