Research

Papers and academic interests

Efficient Memory Management for LLM Serving

Exploring paged attention mechanisms and prefix caching strategies that reduce memory overhead in large-scale inference deployments.

Compiler Optimizations for ML Workloads

Investigating how compiler techniques such as operator fusion and memory planning can accelerate neural network inference on commodity hardware.

Adversarial Robustness in Production Systems

Studying practical attack vectors and defense mechanisms for ML models deployed in security-critical environments.