Papers and academic interests
Exploring paged attention and prefix caching strategies that reduce KV-cache memory overhead in large-scale LLM inference deployments.
Investigating how compiler techniques like operator fusion and memory planning can accelerate neural network inference on commodity hardware.
Studying practical attack vectors and defense mechanisms for ML models deployed in security-critical environments.