This is a study group covering any and all techniques for curbing the rising costs of ever-growing AI models.
- 권세중 ([email protected])
- 김정훈 ([email protected])
- As a rule, we cover recent papers published in 2020 or later. We encourage you to aim for the newest work whenever possible. A list of NeurIPS 2022 papers screened for model-efficiency keywords is provided below, so you may also pick from it.
- Because the study covers such a wide range of techniques, papers can be hard to follow without the relevant background. Presenters should expect to devote at least 20–50% of their talk to explaining the context and significance of the technique.
- A member who misses three sessions in a row may be regarded as no longer wishing to participate in the study.
- Whether slides and recordings are made public is up to each presenter. (If you prefer not to release your slides, please publish a summary review instead.)
The schedule currently runs through 2022, but it may be extended as more presenters join. Once your topic is decided, create an Issue and add the topic to the table below along with the Issue number. Any further discussion should take place in that Issue.
When | Who | What | Links | Issue # | Etc. |
---|---|---|---|---|---|
10/11 | 권세중 | Introduction to Efficient AI | - | #1 | - |
10/18 | 이승현 | Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning | arxiv.org/abs/2203.02651 | #4 | - |
10/25 | | | | | |
11/1 | | | | | |
11/8 | 이경준 | Hiddenite: 4K-PE Hidden Network Inference 4D-Tensor Engine Exploiting On-Chip Model Construction Achieving 34.8-to-16.0TOPS/W for CIFAR-100 and ImageNet | ieeexplore.ieee.org/document/9731668 | #5 | - |
11/15 | 이제민 | PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization | arxiv.org/abs/2111.12293 | #3 | - |
11/22 | 김형준 | EfficientViT | arxiv.org/abs/2205.14756 | - | - |
11/29 | | | | | |
12/6 | 박준규 | It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher | arxiv.org/abs/2203.17008 | - | - |
12/13 | | | | | |
12/20 | 박상수 | Reconfigurable array for flexible GEMM accelerator | arxiv.org/abs/2101.04799 | #2 | - |
12/27 | | | | | |
Many of these papers are not publicly available yet, but we expect them to be released over time.
Title | Link | Keyword |
---|---|---|
FP8 Quantization: The Power of the Exponent | https://arxiv.org/abs/2208.09225 | Quantization |
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | https://arxiv.org/abs/2206.01861 | Quantization |
Extreme Compression for Pre-trained Transformers Made Simple and Efficient | https://arxiv.org/abs/2206.01859 | Quantization |
Towards Efficient Post-training Quantization of Pre-trained Language Models | https://arxiv.org/abs/2206.01859 | Quantization |
Leveraging Inter-Layer Dependency for Post-Training Quantization | https://nips.cc/Conferences/2022/Schedule?showEvent=54389 | Quantization |
Entropy-Driven Mixed-Precision Quantization for Deep Network Design on IoT Devices | https://neurips.cc/Conferences/2022/ScheduleMultitrack?event=54104 | Quantization |
Redistribution of Weights and Activations for AdderNet Quantization | https://nips.cc/Conferences/2022/Schedule?showEvent=54812 | Quantization |
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning | https://arxiv.org/abs/2208.11580 | Quantization |
ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences | https://nips.cc/Conferences/2022/Schedule?showEvent=55162 | Quantization |
Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer | https://arxiv.org/abs/2201.07703 | Quantization |
Structural Pruning via Latency-Saliency Knapsack | https://nips.cc/Conferences/2022/Schedule?showEvent=52841 | Pruning |
Advancing Model Pruning via Bi-level Optimization | https://neurips.cc/Conferences/2022/ScheduleMultitrack?event=55360 | Pruning |
Pruning has a disparate impact on model accuracy | https://arxiv.org/abs/2205.13574 | Pruning |
Data-Efficient Structured Pruning via Submodular Optimization | https://arxiv.org/abs/2203.04940 | Pruning |
SAViT: Structure-Aware Vision Transformer Pruning via Collaborative Optimization | https://nips.cc/Conferences/2022/Schedule?showEvent=55067 | Pruning |
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm | https://arxiv.org/abs/2206.02976 | Pruning |
Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions | https://arxiv.org/abs/2209.08554 | Pruning |
Robust Binary Models by Pruning Randomly-initialized Networks | https://arxiv.org/abs/2202.01341 | Pruning |
On Neural Network Pruning's Effect on Generalization | https://nips.cc/Conferences/2022/Schedule?showEvent=54812 | Pruning |
A Fast Post-Training Pruning Framework for Transformers | https://arxiv.org/abs/2204.09656 | Pruning |
VTC-LFC: Vision Transformer Compression with Low-Frequency Components | https://neurips.cc/Conferences/2022/ScheduleMultitrack?event=54752 | Compression |
Lossless Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach | https://nips.cc/Conferences/2022/Schedule?showEvent=55429; https://arxiv.org/abs/2001.00218 | Compression |
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees | https://arxiv.org/abs/2206.01299 | Compression |
Deep Compression of Pre-trained Transformer Models | https://nips.cc/Conferences/2022/Schedule?showEvent=53013 | Compression |