Guanglu Song

Director of the Base Model R&D Department and the Base Model Services Department, SenseTime
guanglusong@foxmail.com
Google Scholar
Resume

News

[2024] Our AIGC product MiaoHua (QuPai) has garnered over 4,000,000 users, with a DAU exceeding 530,000.
[2024] 12 papers are accepted by ECCV/CVPR/NeurIPS.
[2023] 7 papers are accepted by TPAMI/ICCV.
[2022] 7 papers are accepted by ECCV/ICLR/NeurIPS.
[2021] We achieved Top-1 in the ICCV2021-MFR Glint360K Track, the ICCV2021-MFR Unconstrained Track, and the ICCV2021-MFR WebFace260M Track.
[2021] We achieved Top-1 in NIST FRVT 1:N Identification, an official 1:N face recognition algorithm evaluation platform.
[2021] We achieved Top-1 in NIST FRVT 1:1 Verification, an official 1:1 face recognition algorithm evaluation platform.
[2021] We achieved Top-1 in NIST FRVT Face Mask Effects, an official evaluation of face recognition accuracy on masked faces.
[2017-2021] 7 papers are accepted by CVPR, ECCV, AAAI, and ICCV.
[2020] We achieved Top-1 in the ActivityNet Challenge 2020 [Solutions]
[2019] We achieved Top-1 in the ICCV19 Multi-Moments in Time (MIT) Challenge [Solutions]
[2019] We achieved Top-1 in the ICCV19 OpenImage Instance Segmentation Challenge [Solutions]
[2019] We achieved Top-1 in the ICCV19 OpenImage Object Detection Challenge [Solutions] [Code]
[2019] We achieved Top-1 in the ICCV19 Lightweight Face Recognition Challenge [Model and Report]

About me

With 8 years of experience in AI model research and development, I have deep technical insight into large foundation models and extensive hands-on experience in frontline R&D.

Director of the Base Model R&D Department and Director of the Base Model Services Department at SenseTime.
A core member of the founding team of SenseTime's Visual Large Model, and part of the earliest team in China (2020) to work on and take responsibility for large-model training.
Led the frontline R&D and deployment of the Large Recognition Model (2020), Large Perception Model (2021), Large Multimodal Model (2021), and Large AIGC Model (2023-2024).

Managed a centralized platform team that supports over X production lines. This team has won the Group's highest research awards multiple times.

My research interests include large model design and optimization, large AIGC models, and basic computer vision topics (detection, classification, recognition, and video understanding).

I have also explored the design and optimization of supervised learning in DI-star.

Working Experience

Director of the Base Model R&D Department and the Base Model Services Department at SenseTime BaseModel. (2022 to present)
Working on large AIGC model design and application.
Senior researcher at SenseTime BaseModel. (2021 to 2022)
Working on large model design and optimization.
Researcher at SenseTime X-Lab. (2020 to 2021)
Working on large model design and optimization.
Research intern at SenseTime. (2017 to 2020)
Worked on object detection and recognition with Yu Liu.

Publications

* Equal contribution. For more publications, please refer to Google Scholar.

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
Fu-Yun Wang, Zhaoyang Huang, Qiang Ma, Guanglu Song, Xudong Lu, Weikang Bian, Yijin Li, Yu Liu, Hongsheng Li
ECCV2024
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
Manyuan Zhang, Guanglu Song, Xiaoyu Shi, Yu Liu, Hongsheng Li
ECCV2024
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li
ECCV2024
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li
ECCV2024
AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li
SIGGRAPH Asia 2024
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu
NeurIPS2024
Phased Consistency Model
Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang
NeurIPS2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu
NeurIPS2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li
NeurIPS2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
NeurIPS2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li
ECCV2024
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen, Guanglu Song*, Zeyue Xue, Fu-Yun Wang, Yu Liu
CVPR2024
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
NeurIPS2023
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu
ICCV2023
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition
Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao
TPAMI2023
Rethinking Robust Representation Learning Under Fine-grained Noisy Faces [Code (coming soon)]
Bingqi Ma, Guanglu Song*, Boxiao Liu, Yu Liu
2022 European Conference on Computer Vision (ECCV)
Unifying Visual Perception by Dispersible Points Learning [Code]
Jianming Liang, Guanglu Song, Biao Leng, Yu Liu
2022 European Conference on Computer Vision (ECCV)
Self-slimmed Vision Transformer [Code]
Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu
2022 European Conference on Computer Vision (ECCV)
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [Code]
Jihao Liu, Xin Huang, Guanglu Song, Hongsheng Li, Yu Liu
2022 European Conference on Computer Vision (ECCV)
Towards Robust Face Recognition with Comprehensive Search [Code (coming soon)]
Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
2022 European Conference on Computer Vision (ECCV)
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning [Extended version] [Code]
Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao
2022 International Conference on Learning Representations (ICLR)
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Tech Report
Switchable K-class Hyperplanes for Noise-robust Representation Learning
Boxiao Liu*, Guanglu Song*, Manyuan Zhang, Haihang You, Yu Liu
2021 International Conference on Computer Vision (ICCV)
Rectifying the Data Bias in Knowledge Distillation
Boxiao Liu, Shenghan Zhang, Guanglu Song, Haihang You, Yu Liu
(Best Workshop Paper) 2021 International Conference on Computer Vision (ICCV) Masked Face Recognition Challenge & Workshop
Discriminability Distillation in Group Representation Learning
Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu
2020 European Conference on Computer Vision (ECCV)
Revisiting the Sibling Head in Object Detector [Code]
Guanglu Song, Yu Liu, Xiaogang Wang
(OpenImage 2019 Champion) 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
KPNet: Towards Minimal Face Detector
Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
(Oral) 2020 AAAI Conference on Artificial Intelligence (AAAI)
Towards Flops-constrained Face Recognition [Code]
Yu Liu*, Guanglu Song*, Manyuan Zhang*, Jihao Liu*, Yucong Zhou, Junjie Yan
(Top-1 Solution) 2019 ICCV Lightweight Face Recognition Challenge & Workshop
Transductive Centroid Projection for Semi-supervised Large-scale Recognition
Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang
2018 European Conference on Computer Vision (ECCV)
Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy
Guanglu Song*, Yu Liu*, Ming Jiang, Yujie Wang, Junjie Yan, Biao Leng
2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Region-based Quality Estimation Network for Large-Scale Person Re-identification
Guanglu Song, Biao Leng, Yu Liu, Congrui Hetang, Shaofan Cai
2018 AAAI Conference on Artificial Intelligence (AAAI)
Top-1 Solution of Multi-Moments in Time Challenge 2019
Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan
Top-1 Solution of Multi-Moments in Time Challenge 2019
1st Place Solutions for OpenImage2019--Object Detection and Instance Segmentation
Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
(Top-1 Solution) 1st Place Solutions for OpenImage2019--Object Detection and Instance Segmentation
Team Efficient Multi-Moments in Time Challenge 2019 Technical Report
Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan
(Top-1 Solution) Team Efficient Multi-Moments in Time Challenge 2019 Technical Report

Projects & Datasets

TSD, OpenImage Top-1 solutions.
Labeled Pedestrian in the Wild, a large-scale pedestrian re-identification benchmark