Northern Lights, photographed by Yifu Ding in Yellowknife, Canada, December 2025
Research
PhD Candidate at Beihang University, specializing in neural network compression and model quantization.
Making deep learning more efficient and accessible for real-world deployment.
Research Focus
Efficiency optimization for large foundation models, especially LLMs, to reduce memory footprint, latency, and cost with minimal quality loss.
Low-bit quantization for LLMs, vision models, and multimodal systems, including practical post-training quantization (PTQ) and quantization-aware training (QAT) techniques.
Structured pruning and model slimming, focusing on hardware-friendly structures (e.g., channels, attention heads, FFN dimensions, and MoE experts).
Efficient AI systems and deployment, translating model-side compression into real speedups via hardware-aware design and optimization.
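To make the low-bit quantization theme above concrete, here is a minimal sketch of symmetric per-tensor int8 PTQ; the function names and the per-tensor granularity are illustrative choices, not a description of any specific method from my work.

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric per-tensor PTQ: map float weights to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from ints and the scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric(w)
w_hat = dequantize(q, s)
err = np.max(np.abs(w - w_hat))   # rounding error is bounded by scale / 2
```

Real deployments typically refine this with per-channel or per-group scales and calibration data, which is where much of the PTQ design space lies.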
Research Interests
LLM efficiency as an end-to-end, budgeted problem across the full lifecycle, covering training and inference as well as datacenter serving and edge scenarios.
Long-context and KV-cache-aware inference, targeting memory growth, bandwidth limits, and decoding efficiency.
Edge efficiency under strict power and memory limits, including collaboration across heterogeneous devices and deployment feasibility.
Co-design across algorithms, systems, and hardware, with measurable evaluation of quality, latency, peak memory, bandwidth, and energy.
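The KV-cache memory growth mentioned above is easy to see with back-of-the-envelope arithmetic; the sketch below assumes Llama-2-7B-like hyperparameters (32 layers, 32 KV heads, head dimension 128, fp16), which are stated here only as an example configuration.

```python
def kv_cache_bytes(batch, seq_len, n_layers=32, n_kv_heads=32,
                   head_dim=128, bytes_per_elem=2):
    """KV-cache size: K and V tensors (factor of 2) for every layer,
    each of shape [batch, seq_len, n_kv_heads, head_dim]."""
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# A single 4096-token sequence already needs 2 GiB of cache at fp16.
gib = kv_cache_bytes(batch=1, seq_len=4096) / 2**30
```

Because the cache scales linearly with both batch size and context length, long-context serving quickly becomes memory- and bandwidth-bound, motivating KV-cache quantization, eviction, and grouped-query attention.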