Higgsfield AI

Benchmark Testing Results: Our Video Model Leads in Quality and Prompt Adherence

Alex Mashrabov (Chief Executive Officer, Co-Founder)

Yerzat Dulat (Chief Research Officer, Co-Founder)

The video generation space is booming, and consumers and creators have many options to choose from. We know that creators want three things from any video model: excellent video quality, faithful representation of their prompt, and plenty of motion.

We recently completed a training run and are excited to share the results of our latest round of benchmark testing against other players in this space.

Our testing methodology was as follows:

Compare our video model to two other leading models: Pika Labs and Luma Labs.

Analyze three dimensions: Video / Visual Quality, Amount of Action, and Prompt Adherence.
Video / Visual Quality: a subjective review of how a video looks compared to the other models' output. The labelers looked for artifacts, proper generation, amount of detail, and other visual cues that properly represent the user-submitted text prompt.
Amount of Action: how well the video followed the physical actions specified in the prompt. We also looked at how fluid and real-time the motion was when the prompt did not specify a speed modifier (e.g. "a man moves slowly" vs. "a man is running").
Prompt Adherence: how well the video adheres to the user-submitted prompt. We looked for adherence to details (e.g. the color of clothing, the setting, the pose, etc.).
Test each video generator with dozens of user-submitted prompts.

The process was straightforward:

Generate a video with each model for each user-submitted text prompt.
Ask a group of human labelers to grade each output on each dimension.
Each human labeler scores each video. Labelers do not know the source of a video, only that it is AI-generated.
Every labeler grades the same set of videos, which reduces the risk of bias or skewed results and keeps the scores comparable across models.
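
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline we ran: the function names (`make_blind_batch`, `aggregate`), the `video_NNN` ID format, the tuple layouts, and the choice of a plain mean per model and dimension are all hypothetical. The idea is that labelers only ever see opaque IDs, and the mapping back to each model is applied after grading.

```python
import random
from collections import defaultdict
from statistics import mean


def make_blind_batch(videos, seed=0):
    """Shuffle videos and hide their source behind opaque IDs.

    `videos` is a list of (prompt, model_name) pairs (hypothetical layout).
    Returns the blind IDs shown to labelers and a private key that maps
    each blind ID back to its (prompt, model_name) pair.
    """
    rng = random.Random(seed)
    order = list(videos)
    rng.shuffle(order)  # shuffle so position does not reveal the model
    key = {f"video_{i:03d}": item for i, item in enumerate(order)}
    return list(key), key


def aggregate(ratings, key):
    """Average scores per (model, dimension).

    `ratings` is a list of (labeler, blind_id, dimension, score) tuples.
    Using the mean is an assumption made for this sketch.
    """
    buckets = defaultdict(list)
    for _labeler, blind_id, dimension, score in ratings:
        _prompt, model = key[blind_id]  # un-blind only at aggregation time
        buckets[(model, dimension)].append(score)
    return {k: round(mean(v), 2) for k, v in buckets.items()}


if __name__ == "__main__":
    # Placeholder data: two made-up models, one prompt, two labelers.
    videos = [("prompt A", "model_x"), ("prompt A", "model_y")]
    blind_ids, key = make_blind_batch(videos)
    ratings = [
        (labeler, blind_id, "Video Quality", 4)
        for labeler in ("labeler_1", "labeler_2")
        for blind_id in blind_ids  # every labeler grades the same full set
    ]
    print(aggregate(ratings, key))
```

Keeping the key private until aggregation is what makes the grading blind; having every labeler score the full set means each model's average is built from the same raters.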

Here are the results from our internal tests:

Model Quality Comparison Test Results

Dimension | Higgsfield | Luma Labs | Pika Labs
Video Quality (scale of 1-5) | 4.42 | 4.11 | 3.79
Amount of Action (scale of 1-5) | 3.47 | 3.79 | 2.74
Prompt Adherence | 79% | 58% | 53%
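
To make the gaps between models easier to read, here is a short Python sketch that takes the scores reported above and computes, for each dimension, which model scored highest and by how much over the runner-up. The numbers are copied from the results; the helper name `leader_and_margin` and the dictionary layout are illustrative choices only.

```python
# Scores as reported in the results above.
results = {
    "Video Quality (1-5)": {"Higgsfield": 4.42, "Luma Labs": 4.11, "Pika Labs": 3.79},
    "Amount of Action (1-5)": {"Higgsfield": 3.47, "Luma Labs": 3.79, "Pika Labs": 2.74},
    "Prompt Adherence (%)": {"Higgsfield": 79, "Luma Labs": 58, "Pika Labs": 53},
}


def leader_and_margin(scores):
    """Return (top model, runner-up, difference between their scores)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_score), (second, second_score) = ranked[0], ranked[1]
    return top, second, round(top_score - second_score, 2)


for dimension, scores in results.items():
    top, second, margin = leader_and_margin(scores)
    print(f"{dimension}: {top} leads {second} by {margin}")
```

On these numbers, Higgsfield leads Luma Labs by 0.31 points on Video Quality and by 21 percentage points on Prompt Adherence, while Luma Labs leads Higgsfield by 0.32 points on Amount of Action.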

And some example outputs:

This example highlights adherence to both motion and the prompt itself. Luma and Pika had limited success in generating the polar bear and in adding the motion needed to play the guitar.

[Video outputs, in order: Higgsfield, Luma Labs, Pika]

Prompt: A majestic polar bear strums a wooden guitar beside a cascading waterfall, set against a serene Arctic backdrop. The scene captures the bear sitting on a large ice floe, playful yet focused on the melody it creates. Snowflakes gently fall around, adding a whimsical touch to the surreal, frosty landscape bathed in the soft light of the polar twilight

In this human-centered example, both Higgsfield and Luma accurately portrayed a fun time at the disco and the correct anatomy of a hand holding the can. The swinging camera movement also gave the scene more life compared to Pika's output.

[Video outputs, in order: Higgsfield, Luma Labs, Pika]

Prompt: A girl with a 80's hairstyle and denim jumpsuit, holds pink cans of soda in her hands at a disco

In Conclusion:

We’re excited about the results of the test and how well our model performs in terms of visual quality, action, and prompt adherence. We believe these are fundamental needs of video generators regardless of the use case, and we continue to prioritize these areas in model development and training.

Create the Unseen

Company

Legal

Socials

535 Mission St, 14th floor, San Francisco, CA, 94105
