This repository contains the code and implementation details for the research paper titled Neural Network Diffusion.
The paper studies diffusion models as a way to generate high-performing neural network parameters.
- Kai Wang¹, Dongwen Tang¹, Boya Zeng², Yida Yin³, Zhaopan Xu¹, Yukun Zhou, Zelin Zang¹, Trevor Darrell³, Zhuang Liu*⁴, and Yang You*¹ (* equal advising)
- ¹National University of Singapore, ²University of Pennsylvania, ³University of California, Berkeley, and ⁴Meta AI
Abstract: Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a diffusion model. The autoencoder extracts latent representations of a subset of the trained neural network parameters. Next, a diffusion model is trained to synthesize these latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters. Across various architectures and datasets, our approach consistently generates models with comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained ones. Our results encourage more exploration into the versatile use of diffusion models.
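The two-stage idea in the abstract can be illustrated with a toy, dependency-free sketch. It flattens a chosen subset of parameters into one vector (standing in for the autoencoder latent; the autoencoder itself is omitted) and applies the standard DDPM forward-noising step that a diffusion model learns to invert. The parameter names, the linear beta schedule, and the helper names are illustrative assumptions, not the repository's code.

```python
import math
import random


def flatten_subset(state_dict, keys):
    """Concatenate the selected parameters into one flat vector (lists stand in for tensors)."""
    return [value for key in keys for value in state_dict[key]]


def alpha_bars_linear(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed; num_steps >= 2)."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(z0, t, alpha_bars, rng):
    """Forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps  # a denoiser is trained to predict eps from (zt, t)


# Hypothetical toy checkpoint: only the selected subset of parameters is modeled.
state = {"fc.weight": [0.1, -0.2, 0.3], "fc.bias": [0.05], "conv.weight": [1.0, 2.0]}
z0 = flatten_subset(state, ["fc.weight", "fc.bias"])  # stands in for the autoencoder latent
zt, eps = q_sample(z0, t=500, alpha_bars=alpha_bars_linear(), rng=random.Random(0))
```

Noising the flat vector directly keeps the example short; in the described method the diffusion model operates on the autoencoder's latent representation instead.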
We support all versions of pytorch>=2.0.0, but we recommend using python==3.11 and pytorch==2.5.1, which we have fully tested.
conda create -n pdiff python=3.11
conda activate pdiff
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
git clone https://github.com/NUS-HPC-AI-Lab/Neural-Network-Diffusion.git --depth=1
cd Neural-Network-Diffusion
pip install -r requirements.txt
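After installation, a small standard-library check can confirm that the environment matches the version requirements above. This is a hypothetical helper, not part of the repository, and its version parser is simplified (it ignores pre-release and local tags).

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v):
    """'2.5.1+cu121' -> (2, 5, 1). Simplified: stops at the first non-numeric piece."""
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that "2.0" compares equal to "2.0.0"
    return tuple(parts)


def check_environment(min_torch="2.0.0"):
    """Return notes about the Python and PyTorch versions (an empty list means all good)."""
    notes = []
    if sys.version_info[:2] != (3, 11):
        notes.append(f"Python {sys.version.split()[0]} (3.11 is the tested version)")
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        notes.append("torch is not installed")
    else:
        if release_tuple(torch_version) < release_tuple(min_torch):
            notes.append(f"torch {torch_version} is older than {min_torch}")
    return notes


if __name__ == "__main__":
    print("\n".join(check_environment()) or "environment looks OK")
```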
This runs three steps sequentially: preparing the dataset, training p-diff, and evaluating.
The results are saved in the root directory, and the checkpoints are saved in ./checkpoint.
cd workspace
bash run_all.sh main cifar100_resnet18 0
# bash run_all.sh <category> <tag> <device>
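To make the order of operations explicit, the following Python sketch chains the same commands listed in the detailed usage below. It assumes run_all.sh simply runs those commands in sequence; we have not verified the script line by line, and pdiff_steps and run_steps are hypothetical names.

```python
import os
import subprocess
from pathlib import Path


def pdiff_steps(repo_root, category, tag, device):
    """Return (working_dir, argv, needs_cuda_env) for each step, in order."""
    root = Path(repo_root)
    data_dir = root / "dataset" / category / tag
    workspace = root / "workspace"
    return [
        # Step 1: prepare the checkpoint dataset.
        (data_dir, ["python", "train.py"], True),
        (data_dir, ["python", "finetune.py"], True),
        # Step 2: train p-diff (launch.sh takes the device as an argument), then generate.
        (workspace, ["bash", "launch.sh", category, tag, str(device)], False),
        (workspace, ["python", "generate.py", category, tag], True),
        # Step 3: evaluate the original and generated checkpoints.
        (workspace, ["python", "evaluate.py", category, tag], True),
    ]


def run_steps(steps, device):
    for cwd, argv, needs_cuda_env in steps:
        env = dict(os.environ)
        if needs_cuda_env:
            env["CUDA_VISIBLE_DEVICES"] = str(device)
        # check=True stops the pipeline at the first failing step.
        subprocess.run(argv, cwd=cwd, env=env, check=True)
```

From the repository root, `run_steps(pdiff_steps(".", "main", "cifar100_resnet18", 0), 0)` would mirror `bash run_all.sh main cifar100_resnet18 0` under the assumption above.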
Prepare the checkpoint dataset.
cd ./dataset/main/cifar100_resnet18
rm performance.cache # optional
CUDA_VISIBLE_DEVICES=0 python train.py
CUDA_VISIBLE_DEVICES=0 python finetune.py
Train pdiff and generate models.
cd ../../../workspace
bash launch.sh main cifar100_resnet18 0
# bash launch.sh <category> <tag> <device>
CUDA_VISIBLE_DEVICES=0 python generate.py main cifar100_resnet18
# CUDA_VISIBLE_DEVICES=<device> python generate.py <category> <tag>
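Generation starts from random noise and iteratively denoises it into a new latent, which the autoencoder's decoder then turns into parameters. We do not know which sampler generate.py uses; this sketch assumes standard DDPM ancestral sampling with a linear beta schedule, and predict_eps is a hypothetical stand-in for the trained denoiser. The decoding step is omitted.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (an assumption, not necessarily p-diff's setting; num_steps >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]


def ddpm_sample(predict_eps, dim, betas, rng):
    """Ancestral DDPM sampling from N(0, I) down to a clean latent of size dim."""
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in reversed(range(len(betas))):
        beta, alpha = betas[t], 1.0 - betas[t]
        eps_hat = predict_eps(z, t)
        coef = beta / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (z_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(z, eps_hat)]
        if t > 0:
            sigma = math.sqrt(beta)  # variance choice sigma_t^2 = beta_t
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # no noise is added at the final step
    return z
```

For a quick smoke test, `ddpm_sample(lambda z, t: [0.0] * len(z), dim=8, betas=linear_betas(50), rng=random.Random(0))` runs the loop with a dummy denoiser; with a real model, each sampled latent would be decoded into one new set of parameters.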
Test the original and generated checkpoints and measure their similarity.
CUDA_VISIBLE_DEVICES=0 python evaluate.py main cifar100_resnet18
# CUDA_VISIBLE_DEVICES=<device> python evaluate.py <category> <tag>
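evaluate.py reports how similar the generated checkpoints are to the original ones. We have not reproduced its exact metric; as a hedged illustration, one way to compare two classifiers is the intersection over union (IoU) of the test samples each one gets wrong. A generated model with a very high IoU against an original one makes nearly the same mistakes, which hints at copying. Predictions and labels are plain lists here, and the function names are hypothetical.

```python
def wrong_indices(preds, labels):
    """Indices of the samples a model misclassifies."""
    return {i for i, (p, y) in enumerate(zip(preds, labels)) if p != y}


def wrong_prediction_iou(preds_a, preds_b, labels):
    """IoU of the two models' misclassified sets: 1.0 = identical mistakes, 0.0 = disjoint."""
    wrong_a = wrong_indices(preds_a, labels)
    wrong_b = wrong_indices(preds_b, labels)
    union = wrong_a | wrong_b
    if not union:
        return 1.0  # neither model makes a mistake; treated as identical (our convention)
    return len(wrong_a & wrong_b) / len(union)


labels = [0, 1, 2, 3]
original = [0, 0, 2, 0]   # wrong on samples 1 and 3
generated = [1, 0, 2, 3]  # wrong on samples 0 and 1
print(wrong_prediction_iou(original, generated, labels))  # 1 shared mistake / 3 in the union
```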
All our <category> and <tag> values can be found in ./dataset/<category>/<tag>.
- Create a directory that mimics the dataset folder and contains three items:
mkdir ./dataset/main/<tag>
cd ./dataset/main/<tag>
checkpoint: A directory containing many .pth files, each holding a dictionary of parameters.
generated: An empty directory where the generated models will be stored.
test.py: A script that tests a checkpoint. It should be callable as follows:
CUDA_VISIBLE_DEVICES=0 python test.py ./checkpoint/checkpoint001.pth
# CUDA_VISIBLE_DEVICES=<device> python test.py <checkpoint_file>
- Register a dataset.
Add a class to the last line of the dataset file.
cd ../../../dataset
vim __init__.py
# This __init__.py is the dataset file.
# on line 392
+ class <Tag>(MainDataset): pass
- Create your launch script.
You can change other hyperparameters here.
cd ../workspace/main
cp cifar10_resnet18.py main_<tag>.py
vim main_<tag>.py
# on line 33
- from dataset import Cifar100_ResNet18 as Dataset
+ from dataset import <Tag> as Dataset
- Train pdiff and generate models.
Follow the "Detail Usage" section.
- Test the original and generated checkpoints and their similarity.
Follow the "Detail Usage" section.
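The manual steps for a custom dataset (create the folder, register the class, retarget the launch script) can also be scripted. The helpers below are a hypothetical sketch, not part of the repository: check_layout only verifies the folder structure described above, register_dataset appends the class line to the end of the dataset file, and retarget_import rewrites the `from dataset import ... as Dataset` line in a copied launch script.

```python
import re
from pathlib import Path

# Matches the dataset import line in a launch script, whatever class it currently names.
IMPORT_LINE = re.compile(r"^from dataset import \w+ as Dataset$", re.MULTILINE)


def check_layout(tag_dir):
    """List problems with ./dataset/main/<tag> (an empty list means it looks OK)."""
    root = Path(tag_dir)
    problems = []
    ckpt = root / "checkpoint"
    if not ckpt.is_dir() or not any(ckpt.glob("*.pth")):
        problems.append("checkpoint/ must contain .pth files")
    generated = root / "generated"
    if not generated.is_dir():
        problems.append("generated/ is missing")
    elif any(generated.iterdir()):
        problems.append("generated/ should start empty")
    if not (root / "test.py").is_file():
        problems.append("test.py is missing")
    return problems


def register_dataset(init_file, class_name):
    """Append `class <Tag>(MainDataset): pass` to the dataset file once."""
    path = Path(init_file)
    text = path.read_text()
    line = f"class {class_name}(MainDataset): pass"
    if line in text.splitlines():
        return False  # already registered
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True


def retarget_import(source, class_name):
    """Point a copied launch script at the new dataset class."""
    new_source, count = IMPORT_LINE.subn(f"from dataset import {class_name} as Dataset", source)
    if count != 1:
        raise ValueError(f"expected one 'from dataset import ... as Dataset' line, found {count}")
    return new_source
```

With a hypothetical tag my_tag and class My_Tag, usage would be `check_layout("./dataset/main/my_tag")`, `register_dataset("./dataset/__init__.py", "My_Tag")`, and writing `retarget_import(Path("./workspace/main/cifar10_resnet18.py").read_text(), "My_Tag")` to ./workspace/main/main_my_tag.py. Matching any class name in the regex means the rewrite works regardless of which dataset the copied script originally imported.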
We thank Kaiming He, Dianbo Liu, Mingjia Shi, Zheng Zhu, Bo Zhao, Jiawei Liu, Yong Liu, Ziheng Qin, Zangwei Zheng, Yifan Zhang, Xiangyu Peng, Hongyan Chang, Zirui Zhu, Dave Zhenyu Chen, Ahmad Sajedi and George Cazenavette for valuable discussions and feedback.
If you find our work useful, please consider citing us.
@misc{wang2024neural,
title={Neural Network Diffusion},
author={Kai Wang and Dongwen Tang and Boya Zeng and Yida Yin and Zhaopan Xu and Yukun Zhou and Zelin Zang and Trevor Darrell and Zhuang Liu and Yang You},
year={2024},
eprint={2402.13144},
archivePrefix={arXiv},
primaryClass={cs.LG}
}