Keren Zhou
Deep Learning
Profiling and Debugging GPU-accelerated AI Applications
Presented our research on debugging and profiling of GPU-accelerated AI applications.
Oct 24, 2024 9:41 PM
Virtual
Keren Zhou
Triton Update
Presented a talk about Triton and called for contributions to improve the language.
Aug 13, 2024 10:56 PM
Lake Tahoe, California
Keren Zhou
Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor
For an extended period, graphics processing units (GPUs) have stood as the exclusive choice for training deep neural network (DNN) …
Zhen Xie, Murali Emani, Xiaodong Yu, Dingwen Tao, Xin He, Pengfei Su, Keren Zhou, Venkatram Vishwanath
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
This paper introduces two extensions to the popular PyTorch machine learning framework, TorchDynamo and TorchInductor, which implement …
Jason Ansel, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, Peter Bell, David Berard, Evgeni Burovski, Geeta Chauhan, Anjali Chourdia, Will Constable, Alban Desmaison, Zachary DeVito, Elias Ellison, Will Feng, Jiong Gong, Michael Gschwind, Brian Hirsh, Sherlock Huang, Kshiteej Kalambarkar, Laurent Kirsch, Michael Lazos, Mario Lezcano, Yanbo Liang, Jason Liang, Yinghai Lu, C. K. Luk, Bert Maher, Yunjie Pan, Christian Puhrsch, Matthias Reso, Mark Saroufim, Marcos Yukio Siraichi, Helen Suk, Shunting Zhang, Michael Suo, Phil Tillet, Xu Zhao, Eikan Wang, Keren Zhou, Richard Zou, Xiaodong Wang, Ajit Mathews, William Wen, Gregory Chanan, Peng Wu, Soumith Chintala
Technical Review on PyTorch 2.0 and Triton
High-level overview of PyTorch 2.0 and Triton integration
Aug 7, 2023 10:03 PM
Virtual
Keren Zhou
Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
Advancements in deep learning are often associated with increasing model sizes. Training and deploying large models require …
Aditya Desai, Keren Zhou, Anshumali Shrivastava
Towards Agile Development of Efficient Deep Learning Operators (Hardware Insights)
Presented a talk about Triton and requested feedback from Intel engineers.
Jun 29, 2023 10:56 PM
Virtual
Keren Zhou
Towards Agile Development of Efficient Deep Learning Operators (Call for Contributions)
Presented a talk about Triton and called for contributions to improve the language.
Jun 19, 2023 10:56 PM
Lake Tahoe, California
Keren Zhou
Paw-Net: Stacking Ensemble Deep Learning for Segmenting Scanning Electron Microscopy Images of Fine-grained Shale Samples
Segmentation of scanning electron microscopy (SEM) images is critical yet time-consuming for geological analyses, as it needs to …
Binqian Yin, Qinhong Hu, Yingying Zhu, Chen Zhao, Keren Zhou
A Performance Analysis Framework for Exploiting GPU Microarchitectural Capability
Presented our work on static performance analysis for GPUs at ICS17
Jul 20, 2017 9:36 PM
Chicago, IL, USA
Keren Zhou