Keren Zhou
GPU
Triton Update
Gave a talk on Triton and called for contributions to improve the language
Aug 13, 2024 10:56 PM
Lake Tahoe, California
Keren Zhou
Project
Slides
FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks
Presented the FASTEN work for accelerating segmented matrix multiplication
Jun 1, 2024 9:41 PM
Virtual
Keren Zhou
Project
Slides
Update on Triton's Interpreter
Reviewed the Triton interpreter's progress and future plans
Apr 3, 2024 10:03 PM
Virtual
Keren Zhou
Project
Slides
Proton: A Profiler for Triton
Presented an overview of Proton's design
Feb 20, 2024 10:03 PM
Virtual
Keren Zhou
Project
Slides
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
This paper introduces two extensions to the popular PyTorch machine learning framework, TorchDynamo and TorchInductor, which implement …
Jason Ansel, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, Peter Bell, David Berard, Evgeni Burovski, Geeta Chauhan, Anjali Chourdia, Will Constable, Alban Desmaison, Zachary DeVito, Elias Ellison, Will Feng, Jiong Gong, Michael Gschwind, Brian Hirsh, Sherlock Huang, Kshiteej Kalambarkar, Laurent Kirsch, Michael Lazos, Mario Lezcano, Yanbo Liang, Jason Liang, Yinghai Lu, C. K. Luk, Bert Maher, Yunjie Pan, Christian Puhrsch, Matthias Reso, Mark Saroufim, Marcos Yukio Siraichi, Helen Suk, Shunting Zhang, Michael Suo, Phil Tillet, Xu Zhao, Eikan Wang, Keren Zhou, Richard Zou, Xiaodong Wang, Ajit Mathews, William Wen, Gregory Chanan, Peng Wu, Soumith Chintala
Cite
Project
DOI
URL
Technical Review on PyTorch 2.0 and Triton
High-level overview of PyTorch 2.0 and Triton integration
Aug 7, 2023 10:03 PM
Virtual
Keren Zhou
Project
Slides
GPA
GPA is a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual lines, loops, and functions. GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program’s structure and the GPU to match inefficiency patterns with suggestions for optimization. GPA estimates each optimization’s speedup based on a PC sampling-based performance model.
Code
HPCToolkit
HPCToolkit provides a profile view and a trace view for GPU-accelerated applications. The profile view identifies where GPU APIs are invoked in the CPU calling context, approximates the calling context for GPU execution, and analyzes the instruction mix of GPU kernels. The trace view records CPU and GPU activities for a large number of processes and threads with minimal overhead.
Code
DOC
Triton
Triton is a language and compiler for writing highly efficient custom deep learning primitives. Triton aims to provide an open-source environment for expressing tensor math workloads that offers high flexibility, developer productivity, and end-to-end performance.
Code
DOC
GVProf
We implemented GVProf, the first value profiler that locates value redundancy problems in applications running on GPU-based clusters. Our experiments show that GVProf incurs acceptable overhead and scales to large executions. Guided by GVProf's insights, we optimized several HPC and machine learning workloads, obtaining speedups of up to 1.93x.
Code
DOC
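To make "value redundancy" concrete, here is a minimal CPU-side sketch in Python. It is not GVProf's implementation and uses none of its APIs; the function name `redundant_store_fraction` and the `(address, value)` trace format are assumptions made only for illustration. It counts stores that write a value already held at the same address, which is one simple kind of redundancy a value profiler might flag.

```python
def redundant_store_fraction(trace):
    """Return the fraction of stores that rewrite the value already in memory.

    `trace` is a hypothetical list of (address, value) store events;
    a real profiler would collect these from instrumented GPU kernels.
    """
    memory = {}      # last value written to each address
    redundant = 0    # stores whose value matches the current contents
    total = 0
    for address, value in trace:
        total += 1
        if address in memory and memory[address] == value:
            redundant += 1
        memory[address] = value
    return redundant / total if total else 0.0


# Address 0 is rewritten with 0.0 twice after its first store.
trace = [(0, 0.0), (1, 0.0), (0, 0.0), (1, 2.5), (0, 0.0)]
fraction = redundant_store_fraction(trace)
print(f"redundant stores: {fraction:.0%}")
```

A high fraction would suggest that some writes could be skipped, for example a buffer that is re-initialized to zero before every kernel launch.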