February 9, 2022


Data Science News

Learn About Epic Cosmos at the Research Bazaar on February 16

Cosmos brings together health care data from across the community, pulling together over 100 million longitudinal patient records from more than 100 contributing organizations across the United States. To build Cosmos, Epic had to solve a number of technical problems and meet several governance needs. These solutions had to work at scale and across disparate pre-existing customer systems with minimal configuration. The data must be reviewed to make sure it is high quality, consistent, timely, and protected. Epic also needs tools that enable researchers to dive into the data and turn it into knowledge. John Komenda, from Epic, will talk about how Cosmos works, the challenges of such a novel data set, and the promise of evidence-based medicine.

This discussion about Epic Cosmos will take place on February 16, from 1:00-2:30 p.m. CST on Zoom. To attend, register for free here.

Upcoming Trainings & Workshops

February 10th, 1:00 p.m., This workshop will teach you how to create and modify data visualizations using ggplot2, a popular plotting package in R. Emphasis is placed on using plots to understand distributions of different numbers and types of variables. This workshop assumes you understand the fundamentals of working with data in R, such as those taught in our Data Wrangling in R workshop or online curriculum. Register here.

February 17th and 24th, 9:00 a.m., This workshop is a refresher in specifying, interpreting, and diagnosing problems with linear models using R. Register here.

February 17th, 1:00 p.m., This workshop will teach you how to use ggplot2 to check regression assumptions, and ggeffects to produce plots of predicted values and margins. This workshop assumes you understand the basics of using the ggplot2 R package as taught in Data Visualization in R: ggplot2 Basics, the fundamentals of working with data in R as taught in Data Wrangling in R, and how to fit models in R as taught in R Workshop: Regression Models & Diagnostics. Register here.

Upcoming Seminars & Events

SILO Seminar Series
This semester there will be hybrid seminars, with both in-person and virtual (Zoom) participation. In-person attendance will be limited to 20 people (this number may increase later in the semester). In-person attendees will be notified around 10 p.m. on the Tuesday before each SILO.
  • February 16th, 12:30 p.m., TBA with Victor Zavala
  • February 23rd, 12:30 p.m., TBA with Mengdi Wang
February 15th, 12:00 p.m., FAIR (findable, accessible, interoperable, reusable) is a high-level framework for communicating metadata quality so that data can be used by someone who was not directly involved in the sampling. This webinar will explore how the use of the EML schema for metadata, a congruence checker in the submission process, and regular best-practice exchanges within the LTER/EDI community ensure a high degree of metadata quality. We’ll introduce the specific criteria and discuss how they may be applied in EDI or to a particular project or single data set. Register here.

RStudio Blog Events
  • February 10th, 11:00 a.m., The intent is to foster a space where we can chat about popular data science topics each week. No hard agenda or predetermined talk tracks: just an expert or two willing to share their perspectives on what’s really going on in data science at an organizational level. Zoom link available here.
  • February 16th, 11:00 a.m., The increasing volume, variety, and velocity of sports data provides both great opportunities and challenges for data scientists working in sports. Using R with Google Cloud data science tools like BigQuery can help practitioners scale their analysis and impact in this "new era" of sports analytics. This presentation will include a demonstration of using R and Google Cloud together with an NCAA basketball data example, as well as a discussion of the application of such metrics and tools in the sports media and technology industries. Zoom link available here.
March 14th-19th, OSG has an annual All-Hands Meeting for users, resource providers, staff, and other people who are interested in what we do and how we do it. The meeting is usually hosted by an OSG resource provider and so the location varies each year. Topics for this year's meeting include Democratizing Access to Cyberinfrastructure, Integrating a Diversity of Capacity Resources into dHTC Pools, and much more. For more information and to register, visit this site.

The Machine Learning Community of Practice brings together machine learning (ML) practitioners across a variety of departments and disciplines, and provides a space for both novice and experienced ML practitioners to advance their ML skillsets, share knowledge and resources, and form collaborations. The ML Community is looking for UW-Madison students, faculty, and staff to help organize this community. The time commitment is 1-2 hours per month, and it is a great opportunity to help make this community an excellent resource for all ML practitioners on campus. If you're interested, please fill out this form. Get in touch with any questions.

Student Opportunities

Machine Learning Internship, John Deere
The internship position requires a background in optimization, deep learning, and machine learning. Potential projects are in the domain of emerging control schemes using reinforcement learning, developing virtual sensors for state estimation, deep learning/machine learning/physics-based models for prognostics, and deep learning for radar-based applications. The intern will also develop documentation for each design and work with other engineering disciplines and project team members as directed during feature/product development. The intern will develop and present technical and business presentations to leadership and engineering personnel on the work performed during the internship. Interested students should contact Lav Thyagarajan for more information.

This internship gives seven highly motivated undergraduate students an opportunity to apply their coding abilities in support of cutting-edge atmospheric research. From year to year, projects may include a combination of designing, monitoring, and debugging data ingest and archival systems; creating and maintaining data analysis tools; and calibrating, processing, and visualizing atmospheric data collected from ground-, aircraft-, or satellite-based instrumentation, in collaboration with partners and funders that include NOAA, NASA, NSF, and DOE. Apply by February 25.

The University of Washington eScience Institute opened applications on January 10, 2022 for Student Fellows to participate in the 2022 Data Science for Social Good (DSSG) summer program. The program brings together data scientists and domain researchers to work on focused, collaborative projects for societal benefit. Up to sixteen DSSG Student Fellows will be selected to work with academic researchers, data scientists, and public stakeholder groups on data-intensive research projects. This year’s projects will leverage data science approaches to address societal challenges in areas such as human services, public policy, health and safety, environmental impacts, transportation, accessibility, social justice, and urban informatics. Apply by February 14.

February 15th, 10:00 a.m. - 5:00 p.m., All UW students with computer, data, or information science skills are invited to attend the CDIS Spring 2022 Career Fair. This year’s participating companies range from rapidly growing start-ups to industry-leading enterprises, and from firms whose primary products are software to those using technology to support infrastructure. Participating companies include local businesses, national industry leaders, and government agencies. Notable companies recruiting this year include Epic, Milliman, DataChat, Sentry, Esker, Gainwell Technologies, Uline, and more! The career fair will be held in person at Union South. For more information, visit this site.

February 21st, 5:30 p.m., At this 90-minute virtual program, connect with alumni as they speak about their educational and career journeys as women in the field of technology. After introductions and opening questions, we will open the floor for Q&A from students. Register here.

Professional Opportunities


On Campus

Computational Research Assistant, The Mandel Lab
The Mandel Lab is seeking someone for a pangenomics project. This primarily computational position would examine patterns of bacterial colonization factors across diverse microbes and/or conduct a pangenomic analysis of 100+ Vibrio fischeri strains. Requirements are proficiency with the command line, a basic understanding of bacterial genomics, and some experience with (or the ability to learn) Python and/or R, Jupyter notebooks, and Anvio. Interested students should contact Mark Mandel for more information.

The individual will use machine learning and computational methods for representing and analyzing data from the electronic health record for clinical applications to improve health outcomes in hospitalized patients. The diversity of subject matter will require a candidate with a background in machine learning, statistics, health information technology, and data management to derive and validate clinical decision support tools for the electronic health record. We are looking for a candidate who could fill a Deep Learning subspecialty in our lab. The Research Associate will be expected to lead manuscripts and participate in multicenter collaborations under close mentorship. For more information, access this document.

Join the team at the American Family Insurance Data Science Institute (DSI)! The Administrative Associate Director will assist with the planning and direction of administrative operations to advance DSI’s program goals and objectives. This position will provide leadership for the research administration, human resources, campus and external relations, and financial management functions of DSI. The administrator has day-to-day responsibility for areas including, but not limited to, DSI’s operating budget, grants management, recruitment, human resources, and payroll administration. Apply by February 27.

Off Campus

This position will support research data and digital scholarship by developing and maintaining applications and workflows that support the lifecycle of digital scholarship materials, research data, and researcher identities. The role will help PUL provide enhanced support for managing scholarship and research materials within our existing systems, and will also play an integral role in sustainably expanding PUL’s digital scholarship support strategies. The successful candidate will work with stakeholders to address existing and emerging needs for digital scholarship and the data lifecycle, developing and supporting research- and digital scholarship-focused services and consulting with faculty on them. For more information and to apply, visit this site.

The Financial & Economic Data Analysis Librarian guides faculty, students, and staff with data analysis software; statistical and qualitative data analysis techniques; and data collection, management, and visualization. This position conceptualizes, develops, deploys, and assesses frontline data support services tailored to members of the Stern School of Business and the broader business, economics, and entrepreneurship research community at NYU. The successful candidate may serve as the primary subject liaison to one or more departments at NYU, depending upon background and interest. For more information and to apply, visit this site.

STScI is seeking a Science Metrics Data Analyst II to join the Library team in the Science Mission Office. Under the general direction of the Branch Manager and with support from the Science Metrics Lead, the Science Metrics Data Analyst contributes to the metrics program by gathering key science metric information and making those metrics available through database tools and reports. For more information and to apply, visit this site.

The Advanced Cyberinfrastructure for Education and Research (ACER) group seeks a qualified Data Scientist to join its impactful team and advance scientific discovery. If you are looking for an inclusive environment that provides challenges, learning opportunities, and work-life balance, consider this opportunity. ACER also supports its staff's pursuit of further education through UIC's significant tuition waiver benefit, and it encourages staff exploration of emerging technologies such as deep learning, blockchain, IoT, and quantum computing. For more information, visit this site.
