
Dissertation Defense
Algorithm-Hardware Co-design For Efficiency Optimization Of Machine Learning Workloads
This event is free and open to the public

Hybrid Event: 3941 BBB / Zoom
Abstract: As computational demands continue to grow, optimizing performance and efficiency has become a critical challenge, particularly in large-scale machine learning (ML) and graph mining. Traditional approaches struggle to meet the increasing requirements for scalability, power efficiency, and processing speed, especially with the rising complexity of Convolutional Neural Networks (CNNs), Large Language Models (LLMs), and large-scale graphs.
This thesis explores techniques to improve the efficiency of deep learning and graph mining by leveraging approximate computing methods such as quantization, early termination, and sampling. While approximation strategies improve efficiency, they introduce challenges in ensuring end-to-end efficiency and managing accuracy and quality loss. To address these challenges, this thesis proposes integrated algorithm-hardware co-design approaches that balance computational savings with system-level performance across different ML workloads.
For CNN inference, this thesis presents BitSET, a software-hardware co-design that employs a prediction-based bit-level early termination technique to reduce energy consumption with minimal accuracy degradation. For LLM training, this thesis presents a fine-grained mixed-precision training framework that dynamically selects layer-wise precision. It introduces loss divergence and weight divergence as proxy quality metrics and formulates precision selection as an Integer Linear Programming (ILP) problem, balancing computational efficiency against model quality. For graph mining, this thesis presents TIMEST, which tackles the challenge of efficiently counting small temporal motifs in large graphs. TIMEST leverages a spanning tree sampler, relaxed constraints, and a sliding window technique to construct sampling weights, enabling fast and scalable motif estimation in large temporal graphs.
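To make the ILP view of layer-wise precision selection concrete, here is a minimal sketch. It is not the formulation used in the thesis, which the abstract does not spell out. It assumes each layer picks exactly one precision, each choice has a known compute cost and a proxy divergence score, and the goal is to minimize total cost while keeping total divergence within a budget. The function name `select_precisions`, the cost and divergence tables, and the budget are all hypothetical, and the tiny problem is solved by exhaustive search rather than by an ILP solver.

```python
from itertools import product


def select_precisions(cost, divergence, budget):
    """Pick one precision per layer, minimizing total cost.

    cost[l][p] and divergence[l][p] are hypothetical per-layer tables for
    layer l and precision choice p. The constraint is that the summed
    divergence must not exceed `budget`. Exhaustive search is used here
    only because the example is tiny; a real ILP solver would replace it.
    """
    n_layers = len(cost)
    n_choices = len(cost[0])
    best_cost, best_assign = None, None
    # Enumerate every assignment of one precision index per layer.
    for assign in product(range(n_choices), repeat=n_layers):
        total_div = sum(divergence[l][p] for l, p in enumerate(assign))
        if total_div > budget:
            continue  # violates the quality constraint
        total_cost = sum(cost[l][p] for l, p in enumerate(assign))
        if best_cost is None or total_cost < best_cost:
            best_cost, best_assign = total_cost, assign
    return best_assign, best_cost


# Illustrative numbers only: index 0 = low precision, index 1 = high precision.
cost = [[1, 2], [2, 4], [1, 2]]        # made-up compute cost per layer
divergence = [[5, 0], [1, 0], [4, 0]]  # made-up proxy quality loss per layer

assignment, total = select_precisions(cost, divergence, budget=5)
print(assignment, total)
```

With these made-up tables, the search keeps the first layer at high precision and lowers the other two, since that is the cheapest assignment whose total divergence stays within the budget.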