Seungwoo Son

Machine Learning Engineer @ Samsung Research

I am a Machine Learning Engineer at Samsung Research (AI System Team). Previously, I worked at Google (CoreML Team) as a Student Researcher Intern.

My research interests lie in Model Compression and On-Device Personalization. Recently, I have been exploring ways to internalize retrieval-augmented generation (e.g., GraphRAG) on edge devices for personalized AI. I also have experience developing quantization methods that significantly reduce model size and latency while maintaining accuracy.

Work Experience

  • Machine Learning Engineer, Samsung Research (AI System Team) Oct. 2024 - Present
    Developing quantized models for Galaxy edge devices. Reduced model size by 75% and latency by 30%.
  • Student Researcher Intern, Google (CoreML Team) Aug. 2023 - Jul. 2024
    Implemented advanced quantization methods for LLMs, achieving 50% improvement in zero-shot accuracy.
  • Graduate Research Assistant, POSTECH Mar. 2022 - Jul. 2024
    Researched neural network compression techniques (knowledge distillation, quantization).

Education

  • Pohang University of Science and Technology (POSTECH) Mar. 2022 - Aug. 2024
    M.S. in Electrical Engineering
  • Inha University Mar. 2016 - Feb. 2022
    B.S. in Electronic Engineering (Total GPA: 4.33/4.5, Major GPA: 4.4/4.5, Summa Cum Laude)

Publications

TurboBoA: Faster and Exact Attention Aware Quantization without Backpropagation
Junhan Kim, Yeo Jeong Park, Seungwoo Son, Chungman Lee, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon  |  ICLR 2026 (Submitted)
Proposed a backpropagation-free quantization algorithm that achieves a 4x speedup over state-of-the-art methods by jointly quantizing multiple output channels and correcting propagated distortions, delivering superior accuracy in low-bit regimes.
Work done at Samsung Research
On the Importance of a Multiscale Calibration for Quantization
Seungwoo Son, Junhan Kim, Ingyu Seong, Hyemi Jang, Yongkweon Jeon  |  ICASSP 2026 (Submitted)
Introduced MaCa, a length-aware calibration method that incorporates multiscale sequence-length information into Hessian estimation to improve quantization accuracy for variable-length inputs in LLMs.
Work done at Samsung Research
Two Stage Grid Optimization for Groupwise Quantization of LLMs
Junhan Kim, Seungwoo Son, Jeewook Kim, Gukryeol Lee, Yongkweon Jeon  |  ICASSP 2026 (Submitted)
Developed a two-stage optimization strategy for groupwise quantization that initializes group scales from input statistics and refines them via closed-form coordinate descent, efficiently minimizing the layerwise reconstruction loss (a toy sketch of the alternating update is given below).
Work done at Samsung Research
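A rough idea of the alternating scale refinement can be given with a toy sketch (this is not the paper's implementation: the paper initializes scales from input statistics and minimizes a layerwise reconstruction loss, whereas the sketch below initializes from the group's max-abs value and minimizes plain weight MSE):

```python
import torch

def quantize_group(w, bits=4, iters=10):
    """Toy group-wise uniform quantization: initialize the scale from the
    group's max-abs value, then alternate rounding with a closed-form scale
    update (argmin_s ||w - s*q||^2 gives s = <w, q> / <q, q>)."""
    qmax = 2 ** (bits - 1) - 1
    s = w.abs().max() / qmax + 1e-8                         # stage 1: statistics-based init
    for _ in range(iters):                                  # stage 2: coordinate descent
        q = torch.clamp(torch.round(w / s), -qmax - 1, qmax)
        s = (w * q).sum() / (q * q).sum().clamp(min=1e-8)   # closed-form scale update
    return s, q

w = torch.randn(64)            # one quantization group
s, q = quantize_group(w)
print("reconstruction error:", torch.norm(w - s * q).item())
```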
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee  |  EMNLP 2024
Revealed that prepending attention sink tokens mitigates activation outliers in LLMs by absorbing massive attention scores, enabling effective activation quantization.
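The recipe is simple to prototype. The minimal sketch below (model choice, hooked layer, and the use of an extra BOS token as the sink prefix are illustrative assumptions, not the paper's exact setup) compares per-token activation magnitudes at one layer with and without a prepended sink token:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only: model, hooked layer, and sink token are assumptions.
name = "facebook/opt-125m"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

acts = []
model.model.decoder.layers[6].fc1.register_forward_hook(
    lambda module, inputs, output: acts.append(output.detach()))

def per_token_act_range(text, prefix=""):
    """Max-abs activation per token at the hooked layer; a few outlier tokens
    with huge values force coarse activation quantization scales."""
    acts.clear()
    ids = tok(prefix + text, return_tensors="pt").input_ids
    with torch.no_grad():
        model(ids)
    return acts[0].abs().amax(dim=-1).squeeze(0)

text = "Quantization of large language models is sensitive to activation outliers."
print("baseline  :", per_token_act_range(text))
print("with sink :", per_token_act_range(text, prefix=tok.bos_token))
```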
The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
Seungwoo Son, Jegwang Ryu, Namhoon Lee, Jaeho Lee  |  ECCV 2024, ICLR 2023 Workshop on Sparsity in Neural Networks
Developed a cost-efficient distillation framework for Vision Transformers by masking input tokens to the teacher.
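The core recipe can be sketched with generic components (the tiny stand-in encoders and random patch masking below are illustrative assumptions; the paper's models, masking ratio, and patch-selection criterion may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViT(nn.Module):
    """Stand-in for a ViT: patch tokens -> transformer encoder -> mean pool -> logits."""
    def __init__(self, dim=64, depth=2, num_classes=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):                              # tokens: (B, N, dim)
        return self.head(self.encoder(tokens).mean(dim=1))

teacher, student = TinyViT().eval(), TinyViT()

tokens = torch.randn(8, 196, 64)                            # 8 images, 14x14 patch embeddings
keep = torch.rand(8, 196).argsort(dim=1)[:, :98]            # keep 50% of patches for the teacher
masked = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, 64))

with torch.no_grad():
    t_logits = teacher(masked)                              # cheaper teacher forward on masked input
s_logits = student(tokens)                                  # student still sees all patches

kd_loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                   F.softmax(t_logits, dim=-1), reduction="batchmean")
print("distillation loss:", kd_loss.item())
```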
DSP: Distill The Knowledge Only By A Subset of Patches
Seungwoo Son, Jaeho Lee  |  IPIU 2023 (Oral)
Investigated a methodology for efficiently distilling model knowledge using only a subset of image patches.

Invited Talks

  • Naver-Intel Joint Lab Workshop: Lightweighting for Hyperscale AI, Jun. 2024

Academic Services

  • Reviewer: ACL 2026, EACL 2026, ACL 2025, EMNLP 2025

Honors & Awards

  • Best M.S. Dissertation Award, POSTECH (Feb. 2025)
  • IPIU Best Paper Award, Korea Computer Vision Society (Feb. 2023)
  • National Science and Engineering Undergraduate Scholarship, Ministry of Science and ICT (Mar. 2020)

Technical Skills

  • Languages: C/C++, Python
  • Frameworks & Tools: PyTorch, JAX, Git, Overleaf