Huang Qidong

shikiw

https://shikiw.github.io/

AI & ML interests

multi-modal LLMs

Organizations

None yet

Recent Activity

upvoted a paper 9 days ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 11 days ago • 139

upvoted a collection 3 months ago

Qwen3.5

Collection

21 items • Updated Mar 9 • 1.67k

liked a model 3 months ago

Qwen/Qwen3.5-397B-A17B

Image-Text-to-Text • 403B • Updated Apr 24 • 1.08M • 1.5k

upvoted a collection 5 months ago

Qwen3-VL

Collection

37 items • Updated Dec 31, 2025 • 738

authored a paper 6 months ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 163

upvoted a paper 6 months ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 163

authored 5 papers 6 months ago

Diversity-Aware Meta Visual Prompting

Paper • 2303.08138 • Published Mar 14, 2023

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

Paper • 2308.10315 • Published Aug 20, 2023

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Paper • 2502.08590 • Published Feb 12, 2025 • 43

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Paper • 2502.11903 • Published Feb 17, 2025

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

Paper • 2509.22647 • Published Sep 26, 2025 • 37

liked a model 6 months ago

Qwen/Qwen3-VL-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 15, 2025 • 8.15M • 941

liked 2 models 7 months ago

Qwen/Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Oct 15, 2025 • 3.92M • 394

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 6.2k • 29

liked 2 models 9 months ago

Qwen/Qwen3-VL-235B-A22B-Instruct

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 1.79M • 391

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 7.69k • 396

authored a paper 12 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24, 2025 • 27

liked a dataset 12 months ago

long-xing1/ScaleCap-450k

Viewer • Updated Jun 25, 2025 • 455k • 231 • 5

upvoted a paper 12 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24, 2025 • 27

upvoted a paper about 1 year ago

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10, 2025 • 46
