VEFX-Bench — Video Editing Benchmark
A comprehensive benchmark for evaluating video editing models across 300 videos, 9 task categories, and 3 quality dimensions.
Benchmark version: v0
Download Dataset
Download the benchmark dataset from Hugging Face. It contains 300 source videos with their corresponding editing instructions.
Dataset: https://huggingface.co/datasets/xiangbog/VEFX-Bench
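A minimal download sketch, assuming the third-party `huggingface_hub` package is installed (`pip install huggingface_hub`). It calls that library's `snapshot_download` function to fetch the whole dataset repository named in the URL above. The page does not document the repository's internal file layout, so the sketch makes no assumptions about where the videos or instructions live inside it. The function name `download_benchmark` and its default `local_dir` are hypothetical.

```python
from pathlib import Path


def download_benchmark(local_dir: str = "VEFX-Bench") -> Path:
    """Fetch a full snapshot of the VEFX-Bench dataset repo from the Hugging Face Hub.

    Hypothetical helper name and default folder; requires `huggingface_hub`.
    """
    # Imported lazily so this module can be loaded without the dependency installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="xiangbog/VEFX-Bench",  # taken from the dataset URL
        repo_type="dataset",            # a dataset repo, not a model repo
        local_dir=local_dir,            # where the files are written locally
    )
    return Path(path)
```

Calling `download_benchmark()` writes the repository contents into `./VEFX-Bench` and returns that path. Because the import happens inside the function, the module itself still loads where `huggingface_hub` is absent.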
Submission Format
After running your video editing model on all benchmark videos, package the results as follows:
Create a .zip file containing 300 edited videos.
Name the files 0000.mp4, 0001.mp4, …, 0299.mp4.
Each video should be the edited version of the corresponding original video in the benchmark dataset.
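The packaging steps above can be sketched with Python's standard library. This is an illustration under stated assumptions: the edited videos sit in one local folder, and they should be stored at the root of the archive (the page does not say whether a top-level folder inside the .zip is accepted). The helper names `expected_names` and `package_submission` are hypothetical.

```python
import zipfile
from pathlib import Path

NUM_VIDEOS = 300  # the benchmark has 300 source videos


def expected_names(n: int = NUM_VIDEOS) -> list[str]:
    """Return the required file names: 0000.mp4, 0001.mp4, ..., 0299.mp4."""
    return [f"{i:04d}.mp4" for i in range(n)]


def package_submission(results_dir: str, zip_path: str, n: int = NUM_VIDEOS) -> None:
    """Check that every expected edited video exists, then zip them (hypothetical helper)."""
    src = Path(results_dir)
    names = expected_names(n)
    missing = [name for name in names if not (src / name).is_file()]
    if missing:
        # Fail before writing anything so an incomplete archive is never produced.
        raise FileNotFoundError(f"{len(missing)} videos missing, e.g. {missing[:3]}")
    # MP4 data is already compressed, so store the files without re-deflating them.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in names:
            # Assumption: files go at the archive root (arcname has no folder prefix).
            zf.write(src / name, arcname=name)
```

For example, `package_submission("outputs", "submission.zip")` fails fast with a `FileNotFoundError` if any of the 300 files is missing, rather than producing a partial archive.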
Task Categories
The benchmark covers 9 distinct editing categories:
Evaluation Metrics
Each video is evaluated along three complementary dimensions using the VEFX-Reward model. The Overall score is the average of all three.
Instruction Following (IF)
Measures how well the edited video follows the editing instruction.
Range: 1-4 · Higher is better
Render Quality (RQ)
Measures the visual rendering quality of the edited video.
Range: 1-4 · Higher is better
Edit Exclusivity (EE)
Measures whether the edit is confined to the intended region or attribute, with no unintended changes elsewhere in the video.
Range: 1-4 · Higher is better
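The Overall score described above can be expressed as a small data structure. This is a sketch, not the VEFX-Reward implementation: the class and field names are hypothetical, and because the page does not say whether the reward model emits integer levels or continuous values, the sketch accepts any value in the 1–4 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoScores:
    """Per-video scores on the 1-4 scale (hypothetical names, not the VEFX-Reward API)."""

    instruction_following: float  # IF
    render_quality: float         # RQ
    edit_exclusivity: float       # EE

    def __post_init__(self) -> None:
        # Reject values outside the documented 1-4 range for every dimension.
        for value in (self.instruction_following, self.render_quality, self.edit_exclusivity):
            if not 1.0 <= value <= 4.0:
                raise ValueError(f"score {value} is outside the 1-4 range")

    @property
    def overall(self) -> float:
        """Overall score: the plain average of IF, RQ and EE."""
        return (self.instruction_following + self.render_quality + self.edit_exclusivity) / 3
```

For instance, a video scored IF = 4, RQ = 3, EE = 2 has an Overall score of (4 + 3 + 2) / 3 = 3.0.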
Scoring Rubric
Each dimension is scored on a 1–4 scale:
| Score | Level | Description |
|---|---|---|
| 4 | Excellent | Fully satisfies the criterion with no noticeable issues. |
| 3 | Good | Mostly satisfies the criterion with minor shortcomings. |
| 2 | Fair | Partially satisfies the criterion with noticeable issues. |
| 1 | Poor | Fails to satisfy the criterion or has severe issues. |