NRP Engagement webinar
Typically on the fourth Monday of the month at 1 ET - 12 CT - 11 MT - 10 PT
To hear about future activities, please join the NRP engagement email list via this link.
Feel free to share this with anyone who may be interested. Calls will typically be on the fourth Monday of the month.
If you would like to present or have a suggestion for a session, please write to Dana Brunson.
Monday, April 27, 2020 at 1 ET - 12 CT - 11 MT - 10 PT
Monday, March 23, 2020 at 1 ET - 12 CT - 11 MT - 10 PT
Inter-campus collaboration in the era of big data enabled by the Pacific Research Platform with UCSC, WUSTL, and UCSF - David Parks
UCSC is using the PRP in collaboration with both Washington University in St. Louis (WUSTL) and UCSF. These cross-campus collaborations involve datasets well into the multi-terabyte range, and the PRP provides the enabling platform for them. The Henglab, led by Keith Hengen at Washington University in St. Louis, is producing terabyte- and even petabyte-scale data from novel longitudinal studies of neural activity in live mouse subjects. The PRP has enabled a collaboration between UCSC and WUSTL in which one campus provides the data and the other provides the algorithms and compute, and both meet seamlessly on the PRP. UCSF is recording organoid cell cultures of human glial neural cells, producing individual recordings that can reach terabyte scale; the PRP enables standard tools and analysis of data shared between UCSC and UCSF. At UCSC we are scaling up live organoid cell culture experimentation so that hundreds or even thousands of experiments can run in parallel, generating massive datasets for which the PRP enables scalable processing and streaming solutions. In this talk, we will introduce the science being performed at the scale of big data and how the PRP is enabling both the collaboration and the science in novel ways.
David Parks is a graduate student researcher at UCSC pursuing his Ph.D. in Biomolecular Engineering with a focus on deep learning technologies. He works under Professor Haussler in the Braingeneers lab, a multi-disciplinary lab scaling up cell culture experimentation and bringing it into the open-source ecosystem. David has over a decade of experience in Silicon Valley working in enterprise software and deploying big data systems on platforms such as Hadoop.
Monday, February 24, 2020 at 1 ET - 12 CT - 11 MT - 10 PT
IceCube Computing Grid - Benedikt Riedel
Abstract: We present how the IceCube computing efforts have evolved over the last 15 years from mostly home-grown solutions to a globally distributed computing infrastructure. The talk will highlight the advantages and disadvantages of such an approach for an experiment with broad science goals ranging from astrophysics to particle physics to geophysics, and what we see in our future as we engage in more external collaborations in computing.
Bio: Benedikt Riedel is the Global Computing Coordinator for the IceCube Neutrino Observatory and Computing Manager for the Wisconsin IceCube Particle Astrophysics Center. Previously he worked on the Open Science Grid at the University of Chicago. He received a Ph.D. in 2014 from the University of Wisconsin-Madison, working on supernova neutrino signals in the IceCube Neutrino Observatory.
Monday, January 27, 2020 at 1 ET - 12 CT - 11 MT - 10 PT
Running a 380PFLOP32s GPU burst for Multi-Messenger Astrophysics with IceCube across all available GPUs in the Cloud
Igor Sfiligoi and Frank Würthwein
The IceCube Neutrino Observatory is the National Science Foundation (NSF)’s premier facility to detect neutrinos with energies above approximately 10 GeV and a pillar of NSF’s Multi-Messenger Astrophysics (MMA) program, one of NSF’s 10 Big Ideas. The detector is located at the geographic South Pole and is designed to detect interactions of neutrinos of astrophysical origin by instrumenting over a gigaton of polar ice with 5160 optical sensors. The sensors are buried between 1450 and 2450 meters below the surface of the South Pole ice sheet. To understand the impact of ice properties on the detection of incoming neutrinos and the determination of their origin, photon propagation simulations on GPUs are used. We report on a few-hour GPU burst across Amazon Web Services, Microsoft Azure, and Google Cloud Platform that harvested all GPUs available for sale across the three cloud providers the weekend before SC19, reaching over 51k GPUs in total and 380 PFLOP32s. GPU types span the full range of generations, from the NVIDIA GRID K520 to the most modern NVIDIA T4 and V100. We report the scale and science performance achieved across the various GPU types, as well as the science motivation for doing so.
Igor Sfiligoi is Lead Scientific Software Developer and Researcher at UCSD/SDSC. He has been active in distributed computing for over 20 years. He started in real-time systems, moved to local clusters, and worked with leadership HPC systems, but has spent most of his career in computing spanning continents. For about 10 years, he has been working on one such worldwide system, called glideinWMS, which he brought from the design table to being the de facto standard for many scientific communities. He has recently turned his attention to supporting users on top of Kubernetes clusters and Cloud resources. He holds the equivalent of an M.S. in Computer Science from the Università degli Studi di Udine, Italy. He has presented at many workshops and conferences over the years, with several published papers.
Frank Würthwein is the Executive Director of the Open Science Grid, a national cyberinfrastructure to advance the sharing of resources, software, and knowledge, and a physics professor at UC San Diego. He received his Ph.D. from Cornell in 1995. After holding appointments at Caltech and MIT, he joined the UC San Diego faculty in 2003. His research focuses on experimental particle physics and distributed high-throughput computing. His primary physics interests lie in searching for new phenomena at the high energy frontier with the CMS detector at the Large Hadron Collider. His topics of interest include, but are not limited to, the search for dark matter, supersymmetry, and electroweak symmetry breaking. As an experimentalist, he is interested in instrumentation and data analysis. In the last few years, this meant developing, deploying, and now operating a worldwide distributed computing system for high-throughput computing with large data volumes. In 2010, "large" data volumes are measured in Petabytes. By 2025, they are expected to grow to Exabytes.
Monday, October 28, 2019 at 1 ET - 12 CT - 11 MT - 10 PT
Running Genomics Workflows on the Pacific Research Platform’s Nautilus Kubernetes Cluster
Alex Feltus, Ph.D.
Abstract: Our core biological research mission is to discover causal alleles underlying complex trait expression in plants and animals. Active projects include (A) discovery of genetic subsystems driving legume-microbe symbiosis that can be engineered into other plants so they can make their own fertilizer, (B) elucidation of gene expression pattern shifts between normal and disordered brain tissue for better diagnosis of intellectual disability, and (C) detection of tumor-specific alterations in kidney and other tumors of relevance to precision medicine. Our scientific instrument is the high performance/throughput computer, where we run bioinformatic, machine learning, and network biology workflows on tens to thousands of terabytes of in-house and open source deep DNA sequencing datasets. In recent years, we have wrapped applications in containerized Nextflow workflows and now run data-intensive experiments on Kubernetes (K8s) clusters, including the PRP Nautilus cluster (we have added a node at Clemson) and the Google Cloud Platform. In this webinar we will (A) present results from a large tumor biomarker screen generated with the Nautilus cluster, (B) describe broadly useful open source genomics workflows (GEMmaker, KINC, and Gene Oracle) with Nautilus-specific usage documentation, (C) outline a grassroots strategy to add nodes to the Nautilus cluster and train people how to use that super-awesome system, and (D) discuss a business model where one can build an elastic K8s cluster for a small virtual organization that can be dynamically linked to larger national compute fabric platforms via aggregation or federation.
Dr. F. Alex Feltus received a B.Sc. in Biochemistry from Auburn University in 1992, served two years in the Peace Corps, and then completed advanced training in biomedical sciences at Vanderbilt and Emory. Since 2002, he has performed research in bioinformatics, high-performance computing, cyberinfrastructure, network biology, genome assembly, systems genetics, paleogenomics, and bioenergy feedstock genetics. Currently, Feltus is a Professor in Clemson University's Dept. of Genetics & Biochemistry, CEO of Allele Systems LLC, Core Faculty in the CU-MUSC Biomedical Data Science and Informatics (BDSI) program, a member of the Center for Human Genetics, and serves on the Internet2 Board of Trustees as well as various "Advance Research Computing" engagement workgroups. Feltus has published numerous scientific articles in peer-reviewed journals and teaches undergrad and PhD students in bioinformatics, biochemistry, and genetics. At present, he is funded by multiple NSF grants and is engaged in tethering together extremely smart people from diverse technical backgrounds in an effort to propel genomics research from the Excel-scale towards the Exascale.