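☪️ CVPR 2022 paper download: add WeChat nvshenj125 with the note "CVPR 2022" to get PDFs of all the papers.
☪️ Bonus: register to receive 200 yuan of computing resources: https://www.bkunyun.com/wap/console?source=aistudy (usage instructions)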
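Follow our WeChat official account: AI算法与图像处理 (AI Algorithms and Image Processing)
🌟 CVPR 2023: continuously updated with the latest papers and their open-source code!
Bilibili demo: https://space.bilibili.com/288489574
✋ Note: everyone is welcome to submit an issue sharing CVPR 2022 papers and open-source projects, and to help improve this project!
Collections of top-conference papers from previous years:
The CVPR 2023 paper discussion group is now open! If your paper has been accepted, add WeChat nvshenj125 with a note in the format CVPR + name + school/company. Requests must follow this format to be added to the group.
Table of contents (click on the right to collapse)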
- Backbone
- Dataset
- Diffusion Model
- NAS
- NeRF
- Knowledge Distillation
- Multimodal
- Contrastive Learning
- Graph Neural Networks
- Capsule Network
- Image Classification
- Object Detection
- Object Tracking
- Trajectory Prediction
- Segmentation
- Weakly Supervised Semantic Segmentation
- Medical Image Segmentation
- Video Object Segmentation
- Interactive Video Object Segmentation
- Visual Transformer
- Depth Estimation
- Face Recognition
- Face Detection
- Face Anti-Spoofing
- Age Estimation
- Facial Expression Recognition
- Facial Attribute Recognition
- Facial Editing
- Face Reconstruction
- Talking Face
- Face Swap
- Human Pose Estimation
- 6D Pose Estimation
- Hand Pose Estimation (Hand Mesh Recovery)
- Video Action Detection
- Sign Language Translation
- 3D Human Reconstruction
- Person Re-identification
- Person Search
- Crowd Counting
- GAN
- Color-Pattern Makeup Transfer
- Font Generation
- Scene Text Detection/Recognition
- Image Retrieval/Video Retrieval
- Image Animation
- Image Matting
- Super Resolution
- Image Restoration
- Image Inpainting
- Image Denoising
- Image Editing
- Image Stitching
- Image Matching
- Image Blending
- Image Dehazing
- Image Compression
- Reflection Removal
- Lane Detection
- Autonomous Driving
- Fluid Reconstruction
- Scene Reconstruction
- 3D Reconstruction
- Frame Interpolation
- Video Super-Resolution
- 3D Point Cloud
- Label-Noise
- Adversarial Examples
- Anomaly Detection
- Other
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo
- 论文/Paper: http://arxiv.org/pdf/2303.01943
- 代码/Code: None
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- 论文/Paper: http://arxiv.org/pdf/2303.02760
- 代码/Code: None
Unifying Layout Generation with a Decoupled Diffusion Model
- 论文/Paper: http://arxiv.org/pdf/2303.05049
- 代码/Code: None
DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
- 论文/Paper: http://arxiv.org/pdf/2303.06885
- 代码/Code: None
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
- 论文/Paper: http://arxiv.org/pdf/2303.08137
- 代码/Code: https://github.com/CyberAgentAILab/layout-dm
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2303.07938
- 代码/Code: None
Decomposed Diffusion Models for High-Quality Video Generation
- 论文/Paper: http://arxiv.org/pdf/2303.08320
- 代码/Code: None
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
- 论文/Paper: http://arxiv.org/pdf/2303.09119
- 代码/Code: https://github.com/advocate99/diffgesture
Leapfrog Diffusion Model for Stochastic Trajectory Prediction
- 论文/Paper: http://arxiv.org/pdf/2303.10895
- 代码/Code: https://github.com/mediabrain-sjtu/led
Conditional Image-to-Video Generation with Latent Flow Diffusion Models
- 论文/Paper: https://arxiv.org/pdf/2303.13744.pdf
- 代码/Code: https://github.com/nihaomiao/CVPR23_LFDM
Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
- 论文/Paper: http://arxiv.org/pdf/2303.03361
- 代码/Code: None
NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
- 论文/Paper: http://arxiv.org/pdf/2303.06919
- 代码/Code: None
PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
- 论文/Paper: http://arxiv.org/pdf/2303.09554
- 代码/Code: None
StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
- 论文/Paper: http://arxiv.org/pdf/2303.10598
- 代码/Code: None
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
- 论文/Paper: http://arxiv.org/pdf/2303.13277
- 代码/Code: None
Generic-to-Specific Distillation of Masked Autoencoders
- 论文/Paper: https://arxiv.org/abs/2302.14771
- 代码/Code: https://github.com/pengzhiliang/G2SD
X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.02203
- 代码/Code: None
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
- 论文/Paper: https://arxiv.org/abs/2302.07387
- 代码/Code: None
Multimodal Industrial Anomaly Detection via Hybrid Fusion
- 论文/Paper: http://arxiv.org/pdf/2303.00601
- 代码/Code: https://github.com/nomewang/m3dm
Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
- 论文/Paper: http://arxiv.org/pdf/2303.00462
- 代码/Code: https://github.com/toytiny/cmflow
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
- 论文/Paper: http://arxiv.org/pdf/2303.00865
- 代码/Code: None
Multimodal Prompting with Missing Modalities for Visual Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.03369
- 代码/Code: https://github.com/yilunlee/missing_aware_prompts
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
- 论文/Paper: http://arxiv.org/pdf/2303.02483
- 代码/Code: None
Virtual Sparse Convolution for Multimodal 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.02314
- 代码/Code: https://github.com/hailanyi/virconv
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
- 论文/Paper: http://arxiv.org/pdf/2303.03595
- 代码/Code: https://github.com/sankin97/LoGoNet
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
- 论文/Paper: http://arxiv.org/pdf/2303.05952
- 代码/Code: None
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
- 论文/Paper: http://arxiv.org/pdf/2303.07284
- 代码/Code: None
Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
- 论文/Paper: http://arxiv.org/pdf/2303.09164
- 代码/Code: None
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
- 论文/Paper: http://arxiv.org/pdf/2303.10971
- 代码/Code: https://github.com/dongliangcao/Self-Supervised-Multimodal-Shape-Matching
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
- 论文/Paper: http://arxiv.org/pdf/2303.12501
Twin Contrastive Learning with Noisy Labels
- 论文/Paper: http://arxiv.org/pdf/2303.06930
- 代码/Code: https://github.com/Hzzone/TCL
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
- 论文/Paper: http://arxiv.org/pdf/2303.06819
- 代码/Code: https://github.com/kali-hac/transg
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- 论文/Paper: http://arxiv.org/pdf/2303.07815
- 代码/Code: None
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
- 论文/Paper: http://arxiv.org/pdf/2303.11302
- 代码/Code: https://github.com/weixuansun/FNAC-AVL
Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.10904
- 代码/Code: None
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
- 论文/Paper: http://arxiv.org/pdf/2303.10323
- 代码/Code: https://github.com/mlii0117/dcl
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
- 论文/Paper: http://arxiv.org/pdf/2303.12793
- 代码/Code: https://github.com/FangyunWei/SLRT
MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
- 论文/Paper: http://arxiv.org/pdf/2303.12756
- 代码/Code: https://github.com/MrChenFeng/MaskCon_CVPR2023
Fine-Grained Classification with Noisy Labels
- 论文/Paper: http://arxiv.org/pdf/2303.02404
- 代码/Code: None
Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
- 论文/Paper: http://arxiv.org/pdf/2303.08446
- 代码/Code: https://github.com/invoker-LL/WSI-finetuning
Boosting Verified Training for Robust Image Classifications via Abstraction
- 论文/Paper: http://arxiv.org/pdf/2303.11552
- 代码/Code: https://github.com/zhangzhaodi233/abscert
Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification
- 论文/Paper: http://arxiv.org/pdf/2303.12307
- 代码/Code: None
Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
- 论文/Paper: http://arxiv.org/pdf/2303.01686
- 代码/Code: None
Virtual Sparse Convolution for Multimodal 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.02314
- 代码/Code: https://github.com/hailanyi/virconv
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
- 论文/Paper: http://arxiv.org/pdf/2303.03595
- 代码/Code: https://github.com/sankin97/LoGoNet
NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
- 论文/Paper: http://arxiv.org/pdf/2303.04958
- 代码/Code: None
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.05892
- 代码/Code: https://github.com/LutingWang/OADP
Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.05886
- 代码/Code: https://github.com/PJLab-ADG/3DTrans
Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.06880
- 代码/Code: https://github.com/PJLab-ADG/3DTrans
Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
- 论文/Paper: http://arxiv.org/pdf/2303.07335
- 代码/Code: https://github.com/IDEA-Research/Lite-DETR
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.08129
- 代码/Code: https://github.com/blvlab/pimae
Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
- 论文/Paper: http://arxiv.org/pdf/2303.08686
- 代码/Code: https://github.com/weakmono3d/weakmono3d
Active Teacher for Semi-Supervised Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.08348
- 代码/Code: https://github.com/hunterj-lin/activeteacher
MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
- 论文/Paper: http://arxiv.org/pdf/2303.08316
- 代码/Code: https://github.com/skyhehe123/MSF
MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.09061
- 代码/Code: https://github.com/lliuz/MixTeacher
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.09674
- 代码/Code: https://github.com/Phoenix-V/DiGeo
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
- 论文/Paper: http://arxiv.org/pdf/2303.11301
- 代码/Code: https://github.com/dvlab-research/VoxelNeXt
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
- 论文/Paper: http://arxiv.org/pdf/2303.11040
- 代码/Code: https://github.com/kkkcx/3D_Corruptions_AD
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.10209
- 代码/Code: https://github.com/PaddlePaddle/Paddle3D
STDLens: Model Hijacking-resilient Federated Learning for Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.11511
- 代码/Code: https://github.com/git-disl/stdlens
MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- 论文/Paper: http://arxiv.org/pdf/2303.13018
- 代码/Code: None
Dense Distinct Query for End-to-End Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.12776
- 代码/Code: https://github.com/jshilong/ddq
OcTr: Octree-based Transformer for 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.12621
- 代码/Code: None
Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection
- 论文/Paper: https://arxiv.org/abs/2209.01589
- 代码/Code: https://github.com/Adamdad/ConsistentTeacher
Referring Multi-Object Tracking
- 论文/Paper: http://arxiv.org/pdf/2303.03366
- 代码/Code: https://github.com/wudongming97/rmot
Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
- 论文/Paper: http://arxiv.org/pdf/2303.08364
- 代码/Code: https://github.com/junbongjang/contour-tracking
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
- 论文/Paper: http://arxiv.org/pdf/2303.11301
- 代码/Code: https://github.com/dvlab-research/VoxelNeXt
Visual Prompt Multi-Modal Tracking
- 论文/Paper: http://arxiv.org/pdf/2303.10826
- 代码/Code: https://github.com/jiawen-zhu/ViPT
MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
- 论文/Paper: http://arxiv.org/pdf/2303.10404
- 代码/Code: None
IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
- 论文/Paper: http://arxiv.org/pdf/2303.00575
- 代码/Code: None
Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
- 论文/Paper: http://arxiv.org/pdf/2303.05095
- 代码/Code: None
Leapfrog Diffusion Model for Stochastic Trajectory Prediction
- 论文/Paper: http://arxiv.org/pdf/2303.10895
- 代码/Code: https://github.com/mediabrain-sjtu/led
Interactive Segmentation as Gaussian Process Classification
- 论文/Paper: http://arxiv.org/pdf/2302.14578
- 代码/Code: None
Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2302.14250
- 代码/Code: None
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
- 论文/Paper: https://arxiv.org/abs/2302.07387
- 代码/Code: None
ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution
- 论文/Paper: http://arxiv.org/pdf/2303.00246
- 代码/Code: None
Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
- 论文/Paper: https://arxiv.org/abs/2301.05709
- 代码/Code: None
Delivering Arbitrary-Modal Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.01480
- 代码/Code: None
Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.01276
- 代码/Code: https://github.com/xiaoyao3302/CCVC
Token Contrast for Weakly-Supervised Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.01267
- 代码/Code: https://github.com/rulixiang/toco
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2303.04803
- 代码/Code: None
MP-Former: Mask-Piloted Transformer for Image Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.07336
- 代码/Code: https://github.com/IDEA-Research/MP-Former
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
- 论文/Paper: http://arxiv.org/pdf/2303.07224
- 代码/Code: https://github.com/thu-lyj-lab/ar-seg
InstMove: Instance Motion for Object-centric Video Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.08132
- 代码/Code: https://github.com/wjf5203/vnext
DynaMask: Dynamic Mask Selection for Instance Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.07868
- 代码/Code: https://github.com/lslrh/dynamask
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- 论文/Paper: http://arxiv.org/pdf/2303.07815
- 代码/Code: None
MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
- 论文/Paper: http://arxiv.org/pdf/2303.08600
- 代码/Code: https://github.com/jialeli1/lidarseg3d
FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.08594
- 代码/Code: https://github.com/junjiehe96/FastInst
SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.08578
- 代码/Code: https://github.com/lslrh/sim
Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.10100
- 代码/Code: https://github.com/0liliulei/Mask-VOS
Generative Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.11316
- 代码/Code: https://github.com/fudan-zvg/gss
Reliability in Semantic Segmentation: Are We on the Right Track?
- 论文/Paper: http://arxiv.org/pdf/2303.11298
- 代码/Code: https://github.com/naver/relis
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.11203
- 代码/Code: https://github.com/l1997i/lim3d
Explicit Visual Prompting for Low-Level Structure Segmentations
- 论文/Paper: http://arxiv.org/pdf/2303.10883
- 代码/Code: https://github.com/nifangbaage/explict-visual-prompt
Two-shot Video Object Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.12078
- 代码/Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation
Focused and Collaborative Feedback Integration for Interactive Image Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.11880
- 代码/Code: https://github.com/veizgyauzgyauz/fcfi
Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.13090
- 代码/Code: https://github.com/hengcai-nju/desco
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- 论文/Paper: http://arxiv.org/pdf/2302.14746
- 代码/Code: None
ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
- 论文/Paper: http://arxiv.org/pdf/2302.14435
- 代码/Code: https://github.com/I2-Multimedia-Lab/ProxyFormer
Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
- 论文/Paper: http://arxiv.org/pdf/2303.01112
- 代码/Code: None
MP-Former: Mask-Piloted Transformer for Image Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.07336
- 代码/Code: https://github.com/IDEA-Research/MP-Former
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
- 论文/Paper: http://arxiv.org/pdf/2303.06819
- 代码/Code: https://github.com/kali-hac/transg
BiFormer: Vision Transformer with Bi-Level Routing Attention
- 论文/Paper: http://arxiv.org/pdf/2303.08810
- 代码/Code: https://github.com/rayleizhu/biformer
Making Vision Transformers Efficient from A Token Sparsification View
- 论文/Paper: http://arxiv.org/pdf/2303.08685
- 代码/Code: None
Rotation-Invariant Transformer for Point Cloud Matching
- 论文/Paper: http://arxiv.org/pdf/2303.08231
- 代码/Code: None
Graph Transformer GANs for Graph-Constrained House Generation
- 论文/Paper: http://arxiv.org/pdf/2303.08225
- 代码/Code: None
PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
- 论文/Paper: http://arxiv.org/pdf/2303.09187
- 代码/Code: None
Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
- 论文/Paper: http://arxiv.org/pdf/2303.09164
- 代码/Code: None
Dual-path Adaptation from Image to Video Transformers
- 论文/Paper: http://arxiv.org/pdf/2303.09857
- 代码/Code: https://github.com/park-jungin/DualPath
Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
- 论文/Paper: http://arxiv.org/pdf/2303.13434
- 代码/Code: None
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
- 论文/Paper: http://arxiv.org/pdf/2303.13357
- 代码/Code: None
MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- 论文/Paper: http://arxiv.org/pdf/2303.13018
- 代码/Code: None
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
- 论文/Paper: http://arxiv.org/pdf/2303.13009
- 代码/Code: https://github.com/mlvlab/MELTR
Spherical Transformer for LiDAR-based 3D Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.12766
- 代码/Code: https://github.com/dvlab-research/sphereformer
OcTr: Octree-based Transformer for 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.12621
- 代码/Code: None
Text with Knowledge Graph Augmented Transformer for Video Captioning
- 论文/Paper: http://arxiv.org/pdf/2303.12423
- 代码/Code: None
MAGVLT: Masked Generative Vision-and-Language Transformer
- 论文/Paper: http://arxiv.org/pdf/2303.12208
- 代码/Code: None
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
- 论文/Paper: https://arxiv.org/abs/2211.13202
- 代码/Code: https://github.com/noahzn/Lite-Mono
Fully Self-Supervised Depth Estimation from Defocus Clue
- 论文/Paper: http://arxiv.org/pdf/2303.10752
- 代码/Code: https://github.com/ehzoahis/dered
Data-Free Sketch-Based Image Retrieval
- 论文/Paper: http://arxiv.org/pdf/2303.07775
- 代码/Code: https://github.com/abhrac/data-free-sbir
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
- 论文/Paper: http://arxiv.org/pdf/2303.13440
- 代码/Code: None
OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution
- 论文/Paper: http://arxiv.org/pdf/2303.01091
- 代码/Code: None
Super-Resolution Neural Operator
- 论文/Paper: http://arxiv.org/pdf/2303.02584
- 代码/Code: https://github.com/2y7c3/super-resolution-neural-operator
Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
- 论文/Paper: http://arxiv.org/pdf/2303.05156
- 代码/Code: None
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
- 论文/Paper: http://arxiv.org/pdf/2303.08331
- 代码/Code: https://github.com/coulsonlee/STDO-CVPR2023.git
Masked Image Training for Generalizable Deep Image Denoising
- 论文/Paper: http://arxiv.org/pdf/2303.13132
- 代码/Code: https://github.com/haoyuc/maskeddenoising
CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
- 论文/Paper: http://arxiv.org/pdf/2303.05031
- 代码/Code: None
Context-Based Trit-Plane Coding for Progressive Image Compression
- 论文/Paper: http://arxiv.org/pdf/2303.05715
- 代码/Code: https://github.com/seungminjeon-github/ctc
Attribute-preserving Face Dataset Anonymization via Latent Code Optimization
- 论文/Paper: http://arxiv.org/pdf/2303.11296
- 代码/Code: https://github.com/chi0tzp/falco
Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- 论文/Paper: http://arxiv.org/pdf/2303.10896
- 代码/Code: None
Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.12512
- 代码/Code: None
DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
- 论文/Paper: http://arxiv.org/pdf/2303.06885
- 代码/Code: None
A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
- 论文/Paper: https://arxiv.org/abs/2302.14434
- 代码/Code: https://younglbw.github.io/HRN-homepage/
TriDet: Temporal Action Detection with Relative Boundary Modeling
- 论文/Paper: http://arxiv.org/pdf/2303.07347
- 代码/Code: https://github.com/sssste/tridet
Continuous Sign Language Recognition with Correlation Network
- 论文/Paper: http://arxiv.org/pdf/2303.03202
- 代码/Code: None
Natural Language-Assisted Sign Language Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.12080
- 代码/Code: https://github.com/FangyunWei/SLRT
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
- 论文/Paper: http://arxiv.org/pdf/2303.06819
- 代码/Code: https://github.com/kali-hac/transg
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
- 论文/Paper: https://arxiv.org/abs/2303.14481
- 代码/Code: https://github.com/ZYK100/LLCM
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
- 论文/Paper: https://arxiv.org/abs/2211.12194
- 代码/Code: https://github.com/Winfredy/SadTalker
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
- 论文/Paper: http://arxiv.org/pdf/2303.07337
- 代码/Code: https://github.com/qihao067/poseexaminer
Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
- 论文/Paper: http://arxiv.org/pdf/2303.08475
- 代码/Code: None
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
- 论文/Paper: http://arxiv.org/pdf/2303.09095
- 代码/Code: None
Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
- 论文/Paper: http://arxiv.org/pdf/2303.11180
- 代码/Code: None
3D Human Mesh Estimation from Virtual Markers
- 论文/Paper: http://arxiv.org/pdf/2303.11726
- 代码/Code: https://github.com/ShirleyMaxx/VirtualMarker
Rigidity-Aware Detection for 6D Object Pose Estimation
- 论文/Paper: http://arxiv.org/pdf/2303.12396
- 代码/Code: None
Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation
- 论文/Paper: http://arxiv.org/pdf/2303.12246
- 代码/Code: None
Improving GAN Training via Feature Space Shrinkage
- 论文/Paper: http://arxiv.org/pdf/2303.01559
- 代码/Code: https://github.com/WentianZhang-ML/AdaptiveMix
Scaling up GANs for Text-to-Image Synthesis
- 论文/Paper: http://arxiv.org/pdf/2303.05511
- 代码/Code: None
Graph Transformer GANs for Graph-Constrained House Generation
- 论文/Paper: http://arxiv.org/pdf/2303.08225
- 代码/Code: None
Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
- 论文/Paper: http://arxiv.org/pdf/2303.10774
- 代码/Code: https://github.com/mattolson93/cross_gan_auditing
Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes
- 论文/Paper: http://arxiv.org/pdf/2302.14348
- 代码/Code: https://github.com/jyunlee/Im2Hands
ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2303.05938
- 代码/Code: https://github.com/zhengdiyu/arbitrary-hands-3d-reconstruction
Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
- 论文/Paper: http://arxiv.org/pdf/2303.01999
- 代码/Code: None
MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
- 论文/Paper: http://arxiv.org/pdf/2303.01932
- 代码/Code: None
HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
- 论文/Paper: http://arxiv.org/pdf/2303.02700
- 代码/Code: None
NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2303.02375
- 代码/Code: None
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2303.05937
- 代码/Code: None
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
- 论文/Paper: http://arxiv.org/pdf/2303.00440
- 代码/Code: https://github.com/MCG-NJU/EMA-VFI
ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution
- 论文/Paper: http://arxiv.org/pdf/2303.00246
- 代码/Code: None
Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
- 论文/Paper: https://arxiv.org/abs/2301.05709
- 代码/Code: None
Neural Intrinsic Embedding for Non-rigid Point Cloud Matching
- 论文/Paper: http://arxiv.org/pdf/2303.01038
- 代码/Code: None
ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion
- 论文/Paper: http://arxiv.org/pdf/2303.01979
- 代码/Code: https://github.com/Sangminhong/ACL-SPC_PyTorch
PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees
- 论文/Paper: http://arxiv.org/pdf/2303.01959
- 代码/Code: None
SCPNet: Semantic Scene Completion on Point Cloud
- 论文/Paper: http://arxiv.org/pdf/2303.06884
- 代码/Code: None
Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
- 论文/Paper: http://arxiv.org/pdf/2303.08134
- 代码/Code: https://github.com/zrrskywalker/point-nn
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.08129
- 代码/Code: https://github.com/blvlab/pimae
Frequency-Modulated Point Cloud Rendering with Easy Editing
- 论文/Paper: http://arxiv.org/pdf/2303.07596
- 代码/Code: None
MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
- 论文/Paper: http://arxiv.org/pdf/2303.08316
- 代码/Code: https://github.com/skyhehe123/MSF
Rotation-Invariant Transformer for Point Cloud Matching
- 论文/Paper: http://arxiv.org/pdf/2303.08231
- 代码/Code: None
Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
- 论文/Paper: http://arxiv.org/pdf/2303.09950
- 代码/Code: https://github.com/qinzheng93/GraphSCNet
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.11203
- 代码/Code: https://github.com/l1997i/lim3d
Novel Class Discovery for 3D Point Cloud Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2303.11610
- 代码/Code: https://github.com/luigiriz/nops
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
- 论文/Paper: http://arxiv.org/pdf/2303.13290
- 代码/Code: https://github.com/gfmei/udpreg
Diversity-Measurable Anomaly Detection
- 论文/Paper: http://arxiv.org/pdf/2303.05047
- 代码/Code: None
PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
- 论文/Paper: http://arxiv.org/pdf/2302.14772
- 代码/Code: https://github.com/ShunLu91/PA-DA
Generic-to-Specific Distillation of Masked Autoencoders
- 论文/Paper: http://arxiv.org/pdf/2302.14771
- 代码/Code: https://github.com/pengzhiliang/G2SD
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- 论文/Paper: http://arxiv.org/pdf/2302.14677
- 代码/Code: None
Turning a CLIP Model into a Scene Text Detector
- 论文/Paper: http://arxiv.org/pdf/2302.14338
- 代码/Code: None
Adversarial Attack with Raindrops
- 论文/Paper: http://arxiv.org/pdf/2302.14267
- 代码/Code: None
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
- 论文/Paper: http://arxiv.org/pdf/2302.14115
- 代码/Code: None
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
- 论文/Paper: http://arxiv.org/pdf/2302.14685
- 代码/Code: None
Neural Video Compression with Diverse Contexts
- 论文/Paper: http://arxiv.org/pdf/2302.14402
- 代码/Code: https://github.com/microsoft/DCVC
Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- 论文/Paper: http://arxiv.org/pdf/2302.14290
- 代码/Code: None
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
- 论文/Paper: http://arxiv.org/pdf/2303.00748
- 代码/Code: https://github.com/ofsoundof/grl-image-restoration
Quality-aware Pre-trained Models for Blind Image Quality Assessment
- 论文/Paper: http://arxiv.org/pdf/2303.00521
- 代码/Code: None
Renderable Neural Radiance Map for Visual Navigation
- 论文/Paper: http://arxiv.org/pdf/2303.00304
- 代码/Code: None
Single Image Backdoor Inversion via Robust Smoothed Classifiers
- 论文/Paper: http://arxiv.org/pdf/2303.00215
- 代码/Code: https://github.com/locuslab/smoothinv
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
- 论文/Paper: http://arxiv.org/pdf/2303.00040
- 代码/Code: None
Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
- 论文/Paper: http://arxiv.org/pdf/2303.01311
- 代码/Code: None
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
- 论文/Paper: http://arxiv.org/pdf/2303.01239
- 代码/Code: https://github.com/jingjing12110/MixPHM
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
- 论文/Paper: http://arxiv.org/pdf/2303.00971
- 代码/Code: https://github.com/zhijieshen-bjtu/dopnet
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- 论文/Paper: http://arxiv.org/pdf/2303.00914
- 代码/Code: None
Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision
- 论文/Paper: http://arxiv.org/pdf/2303.00885
- 代码/Code: None
Geometric Visual Similarity Learning in 3D Medical Image Self-supervised Pre-training
- 论文/Paper: http://arxiv.org/pdf/2303.00874
- 代码/Code: https://github.com/yutinghe-list/gvsl
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
- 论文/Paper: http://arxiv.org/pdf/2303.01052
- 代码/Code: None
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
- 论文/Paper: http://arxiv.org/pdf/2303.00938
- 代码/Code: None
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
- 论文/Paper: http://arxiv.org/pdf/2303.02151
- 代码/Code: https://github.com/ZrrSkywalker/CaFo
Zero-shot Object Counting
- 论文/Paper: http://arxiv.org/pdf/2303.02001
- 代码/Code: https://github.com/cvlab-stonybrook/zero-shot-counting
EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization
- 论文/Paper: http://arxiv.org/pdf/2303.01904
- 代码/Code: None
Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering
- 论文/Paper: http://arxiv.org/pdf/2303.01903
- 代码/Code: https://github.com/MILVLG/prophet
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
- 论文/Paper: http://arxiv.org/pdf/2303.01869
- 代码/Code: None
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
- 论文/Paper: http://arxiv.org/pdf/2303.01788
- 代码/Code: None
Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement
- 论文/Paper: http://arxiv.org/pdf/2303.01765
- 代码/Code: None
Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems
- 论文/Paper: http://arxiv.org/pdf/2303.01669
- 代码/Code: None
Hierarchical discriminative learning improves visual representations of biomedical microscopy
- 论文/Paper: http://arxiv.org/pdf/2303.01605
- 代码/Code: None
A Meta-Learning Approach to Predicting Performance and Data Requirements
- 论文/Paper: http://arxiv.org/pdf/2303.01598
- 代码/Code: None
DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- 论文/Paper: http://arxiv.org/pdf/2303.01573
- 代码/Code: None
Detecting Human-Object Contact in Images
- 论文/Paper: http://arxiv.org/pdf/2303.03373
- 代码/Code: None
MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
- 论文/Paper: http://arxiv.org/pdf/2303.03315
- 代码/Code: None
Masked Images Are Counterfactual Samples for Robust Fine-tuning
- 论文/Paper: http://arxiv.org/pdf/2303.03052
- 代码/Code: None
UniHCP: A Unified Model for Human-Centric Perceptions
- 论文/Paper: http://arxiv.org/pdf/2303.02936
- 代码/Code: None
PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow
- 论文/Paper: http://arxiv.org/pdf/2303.02595
- 代码/Code: None
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- 论文/Paper: http://arxiv.org/pdf/2303.02489
- 代码/Code: None
DistilPose: Tokenized Pose Regression with Heatmap Distillation
- 论文/Paper: http://arxiv.org/pdf/2303.02455
- 代码/Code: None
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
- 论文/Paper: http://arxiv.org/pdf/2303.02165
- 代码/Code: https://github.com/alibaba/lightweight-neural-architecture-search
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
- 论文/Paper: http://arxiv.org/pdf/2303.03108
- 代码/Code: None
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
- 论文/Paper: http://arxiv.org/pdf/2303.04077
- 代码/Code: None
Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation
- 论文/Paper: http://arxiv.org/pdf/2303.03770
- 代码/Code: None
Learning Discriminative Representations for Skeleton Based Action Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.03729
- 代码/Code: None
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
- 论文/Paper: http://arxiv.org/pdf/2303.03684
- 代码/Code: None
RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes
- 论文/Paper: http://arxiv.org/pdf/2303.04456
- 代码/Code: https://github.com/twhui/rm-depth
A Light Weight Model for Active Speaker Detection
- 论文/Paper: http://arxiv.org/pdf/2303.04439
- 代码/Code: https://github.com/junhua-liao/light-asd
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
- 论文/Paper: http://arxiv.org/pdf/2303.04249
- 代码/Code: None
CUDA: Convolution-based Unlearnable Datasets
- 论文/Paper: http://arxiv.org/pdf/2303.04278
- 代码/Code: None
Masked Image Modeling with Local Multi-Scale Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2303.05251
- 代码/Code: None
Revisiting Rotation Averaging: Uncertainties and Robust Losses
- 论文/Paper: http://arxiv.org/pdf/2303.05195
- 代码/Code: https://github.com/zhangganlin/globalsfmpy
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
- 论文/Paper: http://arxiv.org/pdf/2303.04995
- 代码/Code: None
MVImgNet: A Large-scale Dataset of Multi-view Images
- 论文/Paper: http://arxiv.org/pdf/2303.06042
- 代码/Code: None
Neuron Structure Modeling for Generalizable Remote Physiological Measurement
- 论文/Paper: http://arxiv.org/pdf/2303.05955
- 代码/Code: https://github.com/lupaopao/nest
3D Cinemagraphy from a Single Image
- 论文/Paper: http://arxiv.org/pdf/2303.05724
- 代码/Code: None
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
- 论文/Paper: http://arxiv.org/pdf/2303.05675
- 代码/Code: https://github.com/OpenGVLab/HumanBench
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
- 论文/Paper: http://arxiv.org/pdf/2303.05762
- 代码/Code: https://github.com/chenweixin107/trojdiff
Modality-Agnostic Debiasing for Single Domain Generalization
- 论文/Paper: http://arxiv.org/pdf/2303.07123
- 代码/Code: None
Upcycling Models under Domain and Category Shift
- 论文/Paper: http://arxiv.org/pdf/2303.07110
- 代码/Code: https://github.com/ispc-lab/glc
Prototype-based Embedding Network for Scene Graph Generation
- 论文/Paper: http://arxiv.org/pdf/2303.07096
- 代码/Code: None
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
- 论文/Paper: http://arxiv.org/pdf/2303.07065
- 代码/Code: https://github.com/vimar-gu/MSINet
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
- 论文/Paper: http://arxiv.org/pdf/2303.06949
- 代码/Code: None
Progressive Open Space Expansion for Open-Set Model Attribution
- 论文/Paper: http://arxiv.org/pdf/2303.06877
- 代码/Code: https://github.com/tianyunyoung/pose
Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images
- 论文/Paper: http://arxiv.org/pdf/2303.06873
- 代码/Code: https://github.com/HHHedo/IBMIL
Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
- 论文/Paper: http://arxiv.org/pdf/2303.06870
- 代码/Code: https://github.com/megvii-research/US3L-CVPR2023
Adaptive Data-Free Quantization
- 论文/Paper: http://arxiv.org/pdf/2303.06869
- 代码/Code: https://github.com/hfutqian/adadfq
Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
- 论文/Paper: http://arxiv.org/pdf/2303.06859
- 代码/Code: https://github.com/lixinustc/casual-ir-dil
Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
- 论文/Paper: http://arxiv.org/pdf/2303.06856
- 代码/Code: None
Universal Instance Perception as Object Discovery and Retrieval
- 论文/Paper: http://arxiv.org/pdf/2303.06674
- 代码/Code: https://github.com/MasterBin-IIAU/UNINEXT
Iterative Geometry Encoding Volume for Stereo Matching
- 论文/Paper: http://arxiv.org/pdf/2303.06615
- 代码/Code: https://github.com/gangweix/igev
Regularized Vector Quantization for Tokenized Image Synthesis
- 论文/Paper: http://arxiv.org/pdf/2303.06424
- 代码/Code: None
Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
- 论文/Paper: http://arxiv.org/pdf/2303.06380
- 代码/Code: None
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
- 论文/Paper: http://arxiv.org/pdf/2303.06357
- 代码/Code: None
DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
- 论文/Paper: http://arxiv.org/pdf/2303.06285
- 代码/Code: https://github.com/yueming6568/deltaedit
Diversity-Aware Meta Visual Prompting
- 论文/Paper: http://arxiv.org/pdf/2303.08138
- 代码/Code: https://github.com/shikiw/dam-vp
Blind Video Deflickering by Neural Filtering with a Flawed Atlas
- 论文/Paper: http://arxiv.org/pdf/2303.08120
- 代码/Code: https://github.com/chenyanglei/all-in-one-deflicker
Non-Contrastive Unsupervised Learning of Physiological Signals from Video
- 论文/Paper: http://arxiv.org/pdf/2303.07944
- 代码/Code: None
DAA: A Delta Age AdaIN operation for age estimation via binary code transformer
- 论文/Paper: http://arxiv.org/pdf/2303.07929
- 代码/Code: None
You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos
- 论文/Paper: http://arxiv.org/pdf/2303.07863
- 代码/Code: None
NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
- 论文/Paper: http://arxiv.org/pdf/2303.07653
- 代码/Code: None
I²-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
- 论文/Paper: http://arxiv.org/pdf/2303.07634
- 代码/Code: None
V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
- 论文/Paper: http://arxiv.org/pdf/2303.07601
- 代码/Code: https://github.com/ucla-mobility/V2V4Real
Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
- 论文/Paper: http://arxiv.org/pdf/2303.08698
- 代码/Code: https://github.com/Zhicaiwww/Bi-VAEGAN
Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
- 论文/Paper: http://arxiv.org/pdf/2303.08658
- 代码/Code: https://github.com/Kebii/R2ET
Lana: A Language-Capable Navigator for Instruction Following and Generation
- 论文/Paper: http://arxiv.org/pdf/2303.08409
- 代码/Code: https://github.com/wxh1996/lana-vln
Rethinking Optical Flow from Geometric Matching Consistent Perspective
- 论文/Paper: http://arxiv.org/pdf/2303.08384
- 代码/Code: https://github.com/dqiaole/matchflow
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
- 论文/Paper: http://arxiv.org/pdf/2303.08536
- 代码/Code: https://github.com/joannahong/av-relscore
Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
- 论文/Paper: http://arxiv.org/pdf/2303.09352
- 代码/Code: https://github.com/uitml/nohub
A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
- 论文/Paper: http://arxiv.org/pdf/2303.09165
- 代码/Code: https://github.com/huitangtang/on_the_utility_of_synthetic_data
Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
- 论文/Paper: http://arxiv.org/pdf/2303.09483
- 代码/Code: https://github.com/kim-sanghwan/ancl
TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
- 论文/Paper: http://arxiv.org/pdf/2303.09998
- 代码/Code: None
Adversarial Counterfactual Visual Explanations
- 论文/Paper: http://arxiv.org/pdf/2303.09962
- 代码/Code: None
A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
- 论文/Paper: http://arxiv.org/pdf/2303.09875
- 代码/Code: None
TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
- 论文/Paper: http://arxiv.org/pdf/2303.09870
- 代码/Code: None
Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
- 论文/Paper: http://arxiv.org/pdf/2303.09757
- 代码/Code: https://github.com/jiaqixuac/MAP-Net
LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
- 论文/Paper: http://arxiv.org/pdf/2303.09665
- 代码/Code: None
On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
- 论文/Paper: http://arxiv.org/pdf/2303.09877
- 代码/Code: https://github.com/DanielTrosten/DeepMVC
3D Concept Learning and Reasoning from Multi-View Images
- 论文/Paper: http://arxiv.org/pdf/2303.11327
- 代码/Code: None
Picture that Sketch: Photorealistic Image Generation from Abstract Sketches
- 论文/Paper: http://arxiv.org/pdf/2303.11162
- 代码/Code: None
Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
- 论文/Paper: http://arxiv.org/pdf/2303.11101
- 代码/Code: None
Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data
- 论文/Paper: http://arxiv.org/pdf/2303.11066
- 代码/Code: https://github.com/megvii-research/FullMatch
Feature Alignment and Uniformity for Test Time Adaptation
- 论文/Paper: http://arxiv.org/pdf/2303.10902
- 代码/Code: None
EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
- 论文/Paper: http://arxiv.org/pdf/2303.10876
- 代码/Code: https://github.com/mediabrain-sjtu/eqmotion
Trainable Projected Gradient Method for Robust Fine-tuning
- 论文/Paper: http://arxiv.org/pdf/2303.10720
- 代码/Code: https://github.com/PotatoTian/TPGM
Partial Network Cloning
- 论文/Paper: http://arxiv.org/pdf/2303.10597
- 代码/Code: https://github.com/jngwenye/pncloning
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
- 论文/Paper: http://arxiv.org/pdf/2303.10482
- 代码/Code: https://github.com/szzexpoi/poem
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
- 论文/Paper: http://arxiv.org/pdf/2303.10449
- 代码/Code: https://github.com/lufan31/et-ood
DeAR: Debiasing Vision-Language Models with Additive Residuals
- 论文/Paper: http://arxiv.org/pdf/2303.10431
- 代码/Code: None
3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
- 论文/Paper: http://arxiv.org/pdf/2303.10406
- 代码/Code: https://github.com/colorful-liyu/3dqd
Sharpness-Aware Gradient Matching for Domain Generalization
- 论文/Paper: http://arxiv.org/pdf/2303.10353
- 代码/Code: https://github.com/wang-pengfei/sagm
Extracting Class Activation Maps from Non-Discriminative Features as well
- 论文/Paper: http://arxiv.org/pdf/2303.10334
- 代码/Code: https://github.com/zhaozhengchen/lpcam
Make Landscape Flatter in Differentially Private Federated Learning
- 论文/Paper: http://arxiv.org/pdf/2303.11242
- 代码/Code: None
Computationally Budgeted Continual Learning: What Does Matter?
- 论文/Paper: http://arxiv.org/pdf/2303.11165
- 代码/Code: https://github.com/drimpossible/BudgetCL
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
- 论文/Paper: http://arxiv.org/pdf/2303.11135
- 代码/Code: https://github.com/ziquanliu/cvpr2023-twins
Efficient Map Sparsification Based on 2D and 3D Discretized Grids
- 论文/Paper: http://arxiv.org/pdf/2303.10882
- 代码/Code: https://github.com/fishmarch/SLAM_Map_Compression
ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
- 论文/Paper: http://arxiv.org/pdf/2303.12071
- 代码/Code: None
Joint Visual Grounding and Tracking with Natural Language Specification
- 论文/Paper: http://arxiv.org/pdf/2303.12027
- 代码/Code: https://github.com/lizhou-cs/JointNLT
Automatic evaluation of herding behavior in towed fishing gear using end-to-end training of CNN and attention-based networks
- 论文/Paper: http://arxiv.org/pdf/2303.12016
- 代码/Code: None
Learning A Sparse Transformer Network for Effective Image Deraining
- 论文/Paper: http://arxiv.org/pdf/2303.11950
- 代码/Code: https://github.com/cschenxiang/drsformer
Context De-confounded Emotion Recognition
- 论文/Paper: http://arxiv.org/pdf/2303.11921
- 代码/Code: None
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
- 论文/Paper: http://arxiv.org/pdf/2303.11906
- 代码/Code: None
The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector
- 论文/Paper: http://arxiv.org/pdf/2303.11828
- 代码/Code: https://github.com/zhoucx117/uaed
Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
- 论文/Paper: http://arxiv.org/pdf/2303.11791
- 代码/Code: None
Detecting Everything in the Open World: Towards Universal Object Detection
- 论文/Paper: http://arxiv.org/pdf/2303.11749
- 代码/Code: None
Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
- 论文/Paper: http://arxiv.org/pdf/2303.11739
- 代码/Code: https://github.com/marialeyvallina/generalized_contrastive_loss
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
- 论文/Paper: http://arxiv.org/pdf/2303.11730
- 代码/Code: https://github.com/xu-jingyi/algebraicmr
Learning a 3D Morphable Face Reflectance Model from Low-cost Data
- 论文/Paper: http://arxiv.org/pdf/2303.11686
- 代码/Code: https://github.com/yxuhan/reflectancemm
Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns
- 论文/Paper: http://arxiv.org/pdf/2303.11678
- 代码/Code: None
ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
- 论文/Paper: http://arxiv.org/pdf/2303.11674
- 代码/Code: https://github.com/lingeringlight/aloft
Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark
- 论文/Paper: http://arxiv.org/pdf/2303.11642
- 代码/Code: https://github.com/MyNiuuu/VCSD
Human Pose as Compositional Tokens
- 论文/Paper: http://arxiv.org/pdf/2303.11638
- 代码/Code: https://github.com/Gengzigang/PCT
Equiangular Basis Vectors
- 论文/Paper: http://arxiv.org/pdf/2303.11637
- 代码/Code: https://github.com/njust-vipgroup/equiangular-basis-vectors
HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
- 论文/Paper: http://arxiv.org/pdf/2303.11616
- 代码/Code: None
Boundary Unlearning
- 论文/Paper: http://arxiv.org/pdf/2303.11570
- 代码/Code: None
One-to-Few Label Assignment for End-to-End Dense Detection
- 论文/Paper: http://arxiv.org/pdf/2303.11567
- 代码/Code: https://github.com/strongwolf/o2f
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
- 论文/Paper: http://arxiv.org/pdf/2303.11545
- 代码/Code: https://github.com/LeeDongYeun/FixNoise
PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
- 论文/Paper: http://arxiv.org/pdf/2303.11526
- 代码/Code: https://github.com/Zhang-VISLab
Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
- 论文/Paper: http://arxiv.org/pdf/2303.11502
- 代码/Code: None
Polynomial Implicit Neural Representations For Large Diverse Datasets
- 论文/Paper: http://arxiv.org/pdf/2303.11424
- 代码/Code: https://github.com/rajhans0/poly_inr
Persistent Nature: A Generative Model of Unbounded 3D Worlds
- 论文/Paper: http://arxiv.org/pdf/2303.13515
- 代码/Code: None
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
- 论文/Paper: http://arxiv.org/pdf/2303.13510
- 代码/Code: https://github.com/smartbot-pjlab/mv-jar
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
- 论文/Paper: http://arxiv.org/pdf/2303.13483
- 代码/Code: None
Egocentric Audio-Visual Object Localization
- 论文/Paper: http://arxiv.org/pdf/2303.13471
- 代码/Code: https://github.com/wikichao/ego-av-loc
Improving Generalization with Domain Convex Game
- 论文/Paper: http://arxiv.org/pdf/2303.13297
- 代码/Code: None
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- 论文/Paper: http://arxiv.org/pdf/2303.13283
- 代码/Code: https://github.com/htyao89/kgcoop
TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- 论文/Paper: http://arxiv.org/pdf/2303.13273
- 代码/Code: https://github.com/plusmultiply/taps3d
A Bag-of-Prototypes Representation for Dataset-Level Applications
- 论文/Paper: http://arxiv.org/pdf/2303.13251
- 代码/Code: None
CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
- 论文/Paper: http://arxiv.org/pdf/2303.13245
- 代码/Code: https://github.com/stegmuel/croc
Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- 论文/Paper: http://arxiv.org/pdf/2303.13232
- 代码/Code: None
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
- 论文/Paper: http://arxiv.org/pdf/2303.13223
- 代码/Code: https://github.com/jameslahm/SCPNet
Marching-Primitives: Shape Abstraction from Signed Distance Function
- 论文/Paper: http://arxiv.org/pdf/2303.13190
- 代码/Code: https://github.com/ChirikjianLab/Marching-Primitives.git
CP³: Channel Pruning Plug-in for Point-based Networks
- 论文/Paper: http://arxiv.org/pdf/2303.13097
- 代码/Code: None
Box-Level Active Detection
- 论文/Paper: http://arxiv.org/pdf/2303.13089
- 代码/Code: https://github.com/lyumengyao/blad
Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
- 论文/Paper: http://arxiv.org/pdf/2303.13087
- 代码/Code: https://github.com/zhuohuangai/sharpdro
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
- 论文/Paper: http://arxiv.org/pdf/2303.13076
- 代码/Code: https://github.com/tgxs002/cora
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
- 论文/Paper: http://arxiv.org/pdf/2303.13071
- 代码/Code: None
Human Guided Ground-truth Generation for Realistic Image Super-resolution
- 论文/Paper: http://arxiv.org/pdf/2303.13069
- 代码/Code: https://github.com/chrisdud0257/hggt
SIEDOB: Semantic Image Editing by Disentangling Object and Background
- 论文/Paper: http://arxiv.org/pdf/2303.13062
- 代码/Code: https://github.com/wuyangluo/siedob
Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
- 论文/Paper: http://arxiv.org/pdf/2303.13051
- 代码/Code: None
Top-Down Visual Attention from Analysis by Synthesis
- 论文/Paper: http://arxiv.org/pdf/2303.13043
- 代码/Code: None
Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
- 论文/Paper: http://arxiv.org/pdf/2303.13014
- 代码/Code: None
Backdoor Defense via Adaptively Splitting Poisoned Dataset
- 论文/Paper: http://arxiv.org/pdf/2303.12993
- 代码/Code: https://github.com/kuofenggao/asd
LightPainter: Interactive Portrait Relighting with Freehand Scribble
- 论文/Paper: http://arxiv.org/pdf/2303.12950
- 代码/Code: None
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
- 论文/Paper: http://arxiv.org/pdf/2303.12930
- 代码/Code: None
Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs
- 论文/Paper: http://arxiv.org/pdf/2303.13211
- 代码/Code: None
Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
- 论文/Paper: http://arxiv.org/pdf/2303.13031
- 代码/Code: https://github.com/andreguo/hdrtvdm
Open Set Action Recognition via Multi-Label Evidential Learning
- 论文/Paper: http://arxiv.org/pdf/2303.12698
- 代码/Code: None
Dense Network Expansion for Class Incremental Learning
- 论文/Paper: http://arxiv.org/pdf/2303.12696
- 代码/Code: None
VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
- 论文/Paper: http://arxiv.org/pdf/2303.12675
- 代码/Code: None
Correlational Image Modeling for Self-Supervised Visual Pre-Training
- 论文/Paper: http://arxiv.org/pdf/2303.12670
- 代码/Code: None
An Extended Study of Human-like Behavior under Adversarial Training
- 论文/Paper: http://arxiv.org/pdf/2303.12669
- 代码/Code: None
RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
- 论文/Paper: http://arxiv.org/pdf/2303.12564
- 代码/Code: None
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
- 论文/Paper: http://arxiv.org/pdf/2303.12513
- 代码/Code: None
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
- 论文/Paper: http://arxiv.org/pdf/2303.12419
- 代码/Code: https://github.com/xu5zhao/bicro
Balanced Spherical Grid for Egocentric View Synthesis
- 论文/Paper: http://arxiv.org/pdf/2303.12408
- 代码/Code: https://github.com/changwoonchoi/EgoNeRF
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
- 论文/Paper: http://arxiv.org/pdf/2303.12370
- 代码/Code: https://github.com/svip-lab/weaksvr
Re-thinking Federated Active Learning based on Inter-class Diversity
- 论文/Paper: http://arxiv.org/pdf/2303.12317
- 代码/Code: https://github.com/raymin0223/logo
Learning a Depth Covariance Function
- 论文/Paper: http://arxiv.org/pdf/2303.12157
- 代码/Code: None
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
- 论文/Paper: http://arxiv.org/pdf/2303.12112
- 代码/Code: https://github.com/aimagelab/pacscore
Music-Driven Group Choreography
- 论文/Paper: http://arxiv.org/pdf/2303.12337
- 代码/Code: None
Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
- 论文/Paper: https://arxiv.org/abs/2303.17602
- 代码/Code: https://github.com/tinyvision/SOLIDER
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects