Papers and academic interests
Exploring paged attention and prefix caching strategies that reduce KV-cache memory overhead in large-scale LLM inference deployments.
Investigating how compiler techniques like operator fusion and memory planning can accelerate neural network inference on commodity hardware.
Studying practical attack vectors and defense mechanisms for ML models deployed in security-critical environments.