Research
Northern Lights, photographed by Yifu Ding in Yellowknife, Canada (December 2025)

PhD candidate at Beihang University, specializing in neural network compression and model quantization, with the goal of making deep learning more efficient and accessible for real-world deployment.

Research Focus

  • Efficiency optimization for large foundation models, especially LLMs, to reduce memory footprint, latency, and cost with minimal quality loss.
  • Low-bit quantization for LLMs, vision models, and multimodal systems, including practical post-training quantization (PTQ) and quantization-aware training (QAT) techniques.
  • Structured pruning and model slimming, focusing on hardware-friendly structures (for example, channels, attention heads, FFN dimensions, and MoE experts).
  • Efficient AI systems and deployment, translating model-side compression into real speedups via hardware-aware design and optimization.

Research Interests

  • LLM efficiency as an end-to-end, budgeted problem across the full lifecycle, covering training and inference as well as serving and edge scenarios.
  • Long-context and KV-cache-aware inference, targeting memory growth, bandwidth limits, and decoding efficiency.
  • Edge efficiency under strict power and memory limits, including heterogeneous collaboration and deployment feasibility.
  • Co-design across algorithms, systems, and hardware, with measurable evaluation of quality, latency, peak memory, bandwidth, and energy.
Citations: 1343 · Papers: 32 · h-index: 17 · i10-index: 23

All Publications

View publications on Google Scholar →