
Zesen Liu

Master's Student
Nanjing University
zesenliu (at) smail.nju.edu.cn



About Me

I am a Master’s student in the SPAR group at the Institute of Computer Software, Nanjing University. I am currently working under the supervision of Associate Professor Yanyan Jiang on reasoning about software’s performance characteristics in complex environments (e.g., mobile systems). My research interests also include analyzing and building high-performance solutions for heterogeneous platforms, including FPGAs and GPUs.
I try to stay ready to conceive (and implement) crazy ideas about performance!

News

Education

Nanjing University, Nanjing, China
Sep 2023 – Jul 2026 (expected)
M.S. in Computer Science & Technology — Institute of Computer Software
Advisor: Yanyan Jiang
South China University of Technology, Guangzhou, China
Sep 2019 – Jun 2023
B.Eng. in Computer Science & Technology — Embedded & Intelligent Robotics Lab
Advisor: Sheng Bi

Selected Projects

  1. FP8 GEMM, MoE, and MLA kernels tuned for AMD Instinct MI300X. The MoE kernel combines expert-centric token packing, hipBLASLt Grouped GEMM, and low-overhead reverse permutation to maximize throughput. The kernels deliver 8x speedups over AMD’s PyTorch references across challenge workloads.

  2. Designed a Cholesky factorization for sparse symmetric positive-definite matrices on Xilinx Kria using HLS. The design prioritizes low per-factorization latency for typical robot obstacle-avoidance navigation and is optimized for small matrices, with a fully pipelined load, compute, and store path, delivering up to a 148% performance improvement.

  3. DomCast: Towards Remote UI Rendering on OpenHarmony
    DomCast captures UI update events and streams compact UI/state patches to the receiver, which performs layout and rendering. This avoids video encode/decode and achieves low-latency, low-bandwidth mirroring (up to 80x compression on UI deltas). It automatically falls back to region-aware JPEG tile casting for highly dynamic apps such as games.

  4. An LLVM IR-compatible C99 subset compiler targeting ARM Thumb with an end-to-end pipeline (AST → IR → ASM). It implements a broad suite of IR-level optimizations: CFG simplification, inlining, tail-recursion elimination, loop unrolling and strength reduction, GVN/GCM, and aggressive DCE. It delivers strong performance while remaining simple.

Hobbies

