VEFX-Bench — Video Editing Benchmark

A comprehensive benchmark for evaluating video editing models across 300 videos, 9 task categories, and 3 quality dimensions.

Benchmark version: v0

Download Dataset

Download the benchmark dataset from Hugging Face. It contains 300 source videos, each paired with its editing instruction.

Dataset: https://huggingface.co/datasets/xiangbog/VEFX-Bench
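One way to fetch the dataset programmatically is through the `huggingface_hub` Python package. The sketch below is illustrative rather than an official download script: the repository ID is taken from the URL above, while the helper name `download_benchmark` and the target directory `vefx_bench` are assumptions. The layout of files inside the repository is not described here, so inspect the downloaded folder before wiring it into a pipeline.

```python
from pathlib import Path

# Repository ID taken from the dataset URL above.
REPO_ID = "xiangbog/VEFX-Bench"


def download_benchmark(local_dir: str = "vefx_bench") -> Path:
    """Fetch a local copy of the dataset repository and return its path.

    `local_dir` is an illustrative default, not a name the benchmark requires.
    """
    # Imported lazily so this module loads even if huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # this is a dataset repo, not a model repo
        local_dir=local_dir,
    )
    return Path(path)


if __name__ == "__main__":
    root = download_benchmark()
    print(f"Dataset downloaded to {root}")
```

Running the script requires `pip install huggingface_hub` and network access.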

After running your video editing model on all benchmark videos, package the results as follows:

Create a .zip file containing the 300 edited videos.

Name the files 0000.mp4, 0001.mp4, …, 0299.mp4 (four digits, zero-padded).

Each file should be the edited version of the corresponding source video in the benchmark dataset.

Tip: Ensure all 300 files are present and correctly numbered. Missing or misnamed files will receive a score of 0.
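The packaging rules above can be checked before uploading. The following is a minimal sketch, not an official submission tool: the helper names (`expected_names`, `package_results`, `validate_zip`) are hypothetical, and it assumes the videos sit at the top level of the archive with no subfolder, which the instructions do not state explicitly.

```python
import zipfile
from pathlib import Path

NUM_VIDEOS = 300  # number of benchmark videos stated above


def expected_names(n: int = NUM_VIDEOS) -> list[str]:
    """Return 0000.mp4 ... (n-1).mp4 with four-digit zero padding."""
    return [f"{i:04d}.mp4" for i in range(n)]


def package_results(video_dir: str, zip_path: str, n: int = NUM_VIDEOS) -> Path:
    """Zip the edited videos, refusing to proceed if any file is missing."""
    src = Path(video_dir)
    names = expected_names(n)
    missing = [name for name in names if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"{len(missing)} missing file(s), e.g. {missing[:5]}"
        )
    out = Path(zip_path)
    # MP4 data is already compressed, so the files are stored as-is.
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in names:
            # Assumption: flat layout, files at the archive root.
            zf.write(src / name, arcname=name)
    return out


def validate_zip(zip_path: str, n: int = NUM_VIDEOS) -> list[str]:
    """Return a list of naming problems; an empty list means all names match."""
    with zipfile.ZipFile(zip_path) as zf:
        found = {info.filename for info in zf.infolist() if not info.is_dir()}
    expected = set(expected_names(n))
    problems = [f"missing: {name}" for name in sorted(expected - found)]
    problems += [f"unexpected: {name}" for name in sorted(found - expected)]
    return problems
```

A typical call would be `package_results("outputs", "submission.zip")` followed by `validate_zip("submission.zip")`, where `outputs` and `submission.zip` are placeholder paths. The check covers file names only; it does not verify that each video is a valid edit of its source.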

The benchmark covers 9 distinct editing categories:

Attribute Editing

Camera Angle Editing

Camera Motion Editing

Creative Edit

Instance Editing

Instance Motion Editing

Quantity Editing

Style Editing

Visual Effect Editing

Each video is evaluated along three complementary dimensions using the VEFX-Reward model. The Overall score is the average of all three.

score_geoagg

Weighted geometric aggregate of IF (weighted 2×), RQ, and EE. This is the primary ranking metric.

Range: 1-4 · Higher is better

IF: Measures how well the edited video follows the editing instruction.

Range: 1-4 · Higher is better

RQ: Measures the visual rendering quality of the edited video.

Range: 1-4 · Higher is better

EE: Measures whether only the intended region or attribute was edited, without side effects.

Range: 1-4 · Higher is better
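To make the two aggregates concrete, here is a hedged sketch of how they could be computed from the three per-video scores. The exact formula behind score_geoagg is not spelled out above, so the code assumes the common normalized form of a weighted geometric mean, (IF² · RQ · EE)^(1/4), with weights 2, 1, and 1. The function names are illustrative and are not part of the VEFX-Reward model.

```python
import math


def overall_mean(if_score: float, rq_score: float, ee_score: float) -> float:
    """Arithmetic mean of the three dimension scores."""
    return (if_score + rq_score + ee_score) / 3.0


def score_geoagg(
    if_score: float, rq_score: float, ee_score: float, if_weight: float = 2.0
) -> float:
    """Weighted geometric mean with IF counted `if_weight` times.

    Assumption: weights are normalized by their sum, so the result stays
    within the same 1-4 range as the inputs.
    """
    scores = (if_score, rq_score, ee_score)
    weights = (if_weight, 1.0, 1.0)
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    # Work in log space: exp(sum(w * log(s)) / sum(w)).
    log_sum = sum(w * math.log(s) for w, s in zip(weights, scores))
    return math.exp(log_sum / sum(weights))
```

Under this assumption, a video scoring IF = 1, RQ = 4, EE = 4 has an arithmetic mean of 3 but a geometric aggregate of 2, which shows how the geometric form penalizes a failure to follow the instruction more heavily than a plain average does.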

Each dimension is scored on a 1–4 scale:

Score	Level	Description
4	Excellent	Fully satisfies the criterion with no noticeable issues.
3	Good	Mostly satisfies the criterion with minor shortcomings.
2	Fair	Partially satisfies the criterion with noticeable issues.
1	Poor	Fails to satisfy the criterion or has severe issues.