PosterReward: Unlocking Accurate Evaluation for High-Quality Graphic Design Generation

Recent advancements in the text-rendering capabilities of image generation models have made the end-to-end creation of graphic design content, such as posters, increasingly feasible. However, existing reward models fall short of accurately assessing design quality, as they primarily focus on global image aesthetics while overlooking the critical dimensions of typography and layout. Furthermore, the scarcity of domain-specific preference data remains a significant bottleneck, limiting the further development of graphic design evaluation and generation.

To bridge this gap, we design an automated pipeline to construct a high-quality dataset of 70k poster preferences by leveraging the consensus of multiple Multi-modal Large Language Models (MLLMs) to simulate human-like judgment. Based on this dataset, we propose PosterReward, a reward model specifically designed for high-precision poster assessment through a cascaded, multi-stage training strategy. We also provide multiple variants of the model to cater to different application scenarios. Finally, we introduce PosterRewardBench and PosterBench to evaluate the performance of existing reward models in poster assessment and the generation capabilities of current text-to-image models in poster creation, respectively.

Capability Showcase

What is PosterReward

PosterReward

Image + prompt are first analyzed, then mapped to scalar reward scores via the scoring module.

Shared Prompt: "A serene minimalist poster with Japanese negative space, showcasing a pagoda, skyscraper, cherry blossom tree, and stone lantern in a quiet urban park. At the top is the title "Tokyo Landmark Check-In Tour", and below the tree is the text "Discover Iconic Spots"."

Analysis (Chosen): The image exhibits excellent fundamental integrity. As a digital illustration, it is free from technical flaws like pixelation, unintended blur, or digital noise. → Score: 17.38

Analysis (Rejected): The image exhibits strong fundamental quality. However, a significant flaw is the inclusion of extraneous, nonsensical text ("Landmark Uleevpa.") directly below the main title. → Score: 1.15

PosterReward-Lite

A lightweight pointwise evaluator that predicts scores directly from image + prompt with minimal latency.

Scores for the two example images: 14.38 and -1.66.

PosterReward-Pairwise

A pairwise judge that compares two candidates and outputs preference decisions with interpretable textual reasoning.

Question: Is Image 1 better than Image 2? Please answer Yes or No first, then provide the reason.

Answer: Yes. Image 1 is significantly better than Image 2 primarily due to its superior adherence to the "serene minimalist poster with Japanese negative space" aesthetic and its flawless text rendering.
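
Because the pairwise judge is asked to answer Yes or No before giving its reason, the verdict can be read from the start of the reply. The following is a minimal sketch of such a parser; the function name `parse_verdict` is hypothetical and is not part of any released PosterReward code.

```python
import re
from typing import Optional


def parse_verdict(response: str) -> Optional[bool]:
    """Read the leading Yes/No from a pairwise judge reply.

    Returns True for "Yes" (Image 1 preferred), False for "No",
    and None when the reply does not start with either word.
    """
    # \b keeps words such as "Nothing" or "Yesterday" from matching.
    match = re.match(r"\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"
```

Returning `None` for malformed replies lets the caller discard or retry them instead of silently counting them as a preference.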

Automated Preference Data Construction

From Raw Image Pools to Reliable Preferences Dataset

Automated AI-judged preference data pipeline with cascaded filtering, pair generation, and multi-model consensus validation.

Representative chosen-vs-rejected preference pairs, with dimensions such as aesthetics, text readability, layout coherence, and prompt alignment.

Large-scale pool generation and staged filtering

We build cinematic and non-cinematic sample pools from multiple generators, then progressively reduce noise using aesthetic scoring, similarity constraints, and stability checks before expensive model voting.

Consensus-based pair selection with bias control

Preference labels are verified by multiple advanced MLLMs under order-swapped pairwise prompting, which suppresses positional bias and retains only stable, high-confidence comparisons.
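
The order-swapped consensus check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Judge` callable stands in for an MLLM query, and requiring unanimous agreement across judges is an assumed consensus rule (the source only says that stable, high-confidence comparisons are retained).

```python
from typing import Callable, Optional, Sequence

# Hypothetical judge interface: returns True when the judge prefers the
# image shown FIRST for the given prompt.
Judge = Callable[[str, str, str], bool]


def stable_preference(judge: Judge, prompt: str, img_a: str, img_b: str) -> Optional[str]:
    """Query one judge in both presentation orders.

    Returns the preferred image if the verdict survives the swap,
    otherwise None (the judge followed position, not content).
    """
    forward = judge(prompt, img_a, img_b)   # img_a shown first
    backward = judge(prompt, img_b, img_a)  # img_b shown first
    if forward and not backward:
        return img_a
    if backward and not forward:
        return img_b
    return None


def consensus_label(judges: Sequence[Judge], prompt: str, img_a: str, img_b: str) -> Optional[str]:
    """Keep a pair only if every judge is order-stable and all agree (assumed rule)."""
    votes = [stable_preference(j, prompt, img_a, img_b) for j in judges]
    if not votes or votes[0] is None:
        return None
    return votes[0] if all(v == votes[0] for v in votes) else None
```

A judge that always picks the first image yields contradictory verdicts after the swap, so its pairs are dropped rather than entering the dataset with a position-driven label.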

Poster-Preference-70K

The final dataset emphasizes typography, layout, and instruction faithfulness, addressing the underrepresentation of graphic design quality signals in general-purpose visual preference datasets.

Unified Cascaded Training Framework

Pairwise and Pointwise Joint Training Framework

The complete cascaded framework that unifies discriminative and pairwise reward learning through SFT, rejection sampling, score-module optimization, and reinforcement fine-tuning.

Joint SFT and rejection-sampling refinement

Single-image analysis and paired-image comparison are trained together, then refined by rejection sampling to improve answer quality and preference consistency.

Score module training with pairwise supervision

The scoring head is optimized on (prompt, chosen, rejected) triplets with the Bradley-Terry loss, converting rich analysis into stable scalar reward differences.
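
For reference, the Bradley-Terry loss for one chosen/rejected pair is -log σ(s_chosen - s_rejected), where σ is the logistic function and s denotes the scalar score. The sketch below computes it for plain floats in a numerically stable way; it illustrates the standard formula only and makes no claim about the authors' training code (batching, reduction, and any regularization are not specified in the source).

```python
import math


def bradley_terry_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood that the chosen image beats the rejected one.

    Equals -log(sigmoid(margin)) = log(1 + exp(-margin)),
    with margin = score_chosen - score_rejected.
    """
    margin = score_chosen - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp(-margin).
    return -margin + math.log1p(math.exp(margin))
```

The loss depends only on the score difference, which is why the resulting rewards are meaningful as relative rather than absolute values: it is log 2 when the two scores tie and approaches zero as the chosen score pulls ahead.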

RL-based analysis enhancement

A GRPO stage improves analysis outputs while keeping the scorer frozen, making final rewards more robust for downstream post-training.
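
GRPO replaces a learned value baseline with statistics of a group of samples drawn for the same input. A minimal sketch of that group-relative advantage step is shown below. How the reward for each sampled analysis is obtained is not stated here; the sketch simply takes a list of rewards as given (for example, values derived from the frozen scorer, which is an assumption), and the use of the population standard deviation and the `eps` constant are likewise illustrative choices.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    better than their siblings get positive advantages and worse ones negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.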

Experiments

Experimental Results

Main Comparisons

We evaluate PosterReward against existing reward models on poster assessment tasks, demonstrating superior performance in both pointwise accuracy and pairwise preference prediction.

Pointwise Reward Models

Performance comparison across various benchmarks. All values are accuracy in percent (higher is better). PRB abbreviates PosterRewardBench; PRB-Basic and PRB-Ad are its basic and advanced subsets.

Model	MMRB2	HPDv3	PRB-Basic	PRB-Ad
ImageReward	53.0	58.6	60.7	49.3
PickScore	57.6	65.6	66.7	44.1
HPSv2	55.0	65.3	70.8	43.7
UnifiedReward*	56.9	59.4	60.0	52.7
HPSv3	58.5	76.9	72.9	41.2
PosterReward-Lite	60.5	77.1	83.9	85.0
PosterReward	59.6	77.8	86.7	86.0
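
For a pointwise reward model, accuracy on a preference benchmark is commonly measured as the share of pairs in which the chosen image receives the higher score. The sketch below assumes that definition, and assumes ties count as errors; the exact protocol behind the table above is not given in this text.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(scored_pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (score_chosen, score_rejected) pairs ranked correctly.

    A tie counts as incorrect (an assumption of this sketch).
    """
    pairs = list(scored_pairs)
    if not pairs:
        raise ValueError("at least one scored pair is required")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return 100.0 * correct / len(pairs)
```

Under this definition a scorer that ignores its input lands near 50, which gives a reference point for the values in the PRB-Ad column.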

Pairwise Reward Models

Performance comparison on PosterRewardBench (PRB). "Yes" and "No" refer to the accuracy on samples with positive and negative ground truth labels, respectively.

Model	PRB-Basic Yes ↑	PRB-Basic No ↑	PRB-Basic Avg. ↑	PRB-Ad Yes ↑	PRB-Ad No ↑	PRB-Ad Avg. ↑
UnifiedReward-think	75.1	61.5	68.3	52.6	48.5	50.6
Qwen3-VL-Plus	89.9	39.2	64.5	98.7	14.2	56.4
Gemini-2.5-Flash	94.7	33.3	64.0	95.2	28.8	62.0
Gemini-2.5-Pro	75.6	83.1	79.3	81.8	68.6	75.2
GPT-5	90.4	80.5	85.4	89.8	75.9	82.9
PosterReward-Pairwise	82.0	84.0	83.0	84.1	83.6	83.8
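
The Yes/No split makes answer bias visible: a judge that tends to say "Yes" scores high on Yes samples and low on No samples. The sketch below computes the three columns from (ground truth, prediction) records. Taking Avg. as the unweighted mean of the two class accuracies is an assumption, although it agrees with the rows above up to rounding (for example, (82.0 + 84.0) / 2 = 83.0).

```python
from typing import Dict, Iterable, Tuple


def yes_no_accuracy(records: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Per-class and averaged accuracy, in percent.

    Each record is (ground_truth_is_yes, predicted_yes).
    """
    totals = {True: 0, False: 0}
    hits = {True: 0, False: 0}
    for truth, predicted in records:
        totals[truth] += 1
        hits[truth] += int(predicted == truth)
    if totals[True] == 0 or totals[False] == 0:
        raise ValueError("need at least one Yes sample and one No sample")
    yes_acc = 100.0 * hits[True] / totals[True]
    no_acc = 100.0 * hits[False] / totals[False]
    return {"yes": yes_acc, "no": no_acc, "avg": (yes_acc + no_acc) / 2}
```

A large gap between the two class accuracies signals a judge whose verdicts depend on something other than the images, such as a default tendency to agree with the question.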

User Study on Analysis Module

Human evaluation demonstrates the progressive improvement of our cascaded training strategy.

Overall preference comparison across different training stages. In each comparison, the win rate of the right-hand model increases progressively as components are added.

Ablation Study

We analyze the contribution of each component in our cascaded training framework.

PosterReward-Pairwise

Method	Advanced Yes ↑	Advanced No ↑	Advanced Avg. ↑	Basic Yes ↑	Basic No ↑	Basic Avg. ↑
SFT (Single)	81.77	82.09	81.93	80.85	82.59	81.72
SFT (Joint)	82.09	83.32	82.71	80.08	83.75	81.92
+ RSFT (Single)	82.67	83.24	82.96	80.66	83.56	82.11
+ RSFT (Joint)	84.06	83.57	83.82	82.01	83.95	82.98

"Yes" and "No" refer to the ground-truth answer of each sample.

PosterReward Components

Model / Component	HPDv3	PRB-Basic	PRB-Ad
PosterReward-Lite	77.1	83.9	85.0
+ Analysis	77.5	85.7	85.8
+ Analysis + GRPO	77.8	86.7	86.0

Cumulative impact of each component on key benchmarks.

PosterBench Leaderboard v1.0, 2026/02

PosterReward supports stronger decision quality at inference through richer analysis traces and demonstrates high agreement with design-centric quality criteria on poster generation benchmarks.

Model	Mean ↑	Median ↑	Std ↓	Best-of-8 Avg ↑
Closed-Source Models
Nano-Banana	11.60	11.69	4.94	14.49
Seedream4.0	11.46	11.44	4.95	13.93
GPT-Image-1	11.16	11.38	4.85	13.43
Seedream3.0	5.01	5.13	6.28	9.75
Open-Source Models
Z-Image-Turbo	7.65	7.31	6.00	10.47
Qwen-Image-2512	11.86	11.63	5.28	13.85
Qwen-Image	7.69	7.72	5.57	11.06
Flux.1-dev	2.55	2.42	7.10	7.81
Flux-Krea	5.00	5.14	6.87	9.58
SD3.5-L	-2.90	-3.92	5.76	1.24

PosterBench evaluation results. Higher Mean and Median indicate better average quality, higher Best-of-8 Avg indicates better results under test-time scaling, and lower Std indicates more stable quality across prompts.
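
One plausible way to derive the four leaderboard columns from raw reward scores is sketched below. It assumes each prompt is sampled several times (8 for Best-of-8), that Mean, Median, and Std are taken over all samples, and that Best-of-8 Avg is the mean of each prompt's highest score. These aggregation choices are assumptions; the actual PosterBench protocol may differ.

```python
from statistics import mean, median, pstdev
from typing import Dict, Sequence


def summarize_scores(scores_per_prompt: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Aggregate reward scores, given one list of sample scores per prompt."""
    flat = [score for group in scores_per_prompt for score in group]
    return {
        "mean": mean(flat),
        "median": median(flat),
        "std": pstdev(flat),  # population std over all samples (assumed)
        # Average of the best sample per prompt: what reward-guided
        # selection among the samples would achieve.
        "best_of_n_avg": mean([max(group) for group in scores_per_prompt]),
    }
```

The gap between `best_of_n_avg` and `mean` indicates how much a generator gains from sampling several candidates and keeping the top-scored one.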

Application

Reward Model in Post Training and Test-Time Scaling

A practical reward source for RL alignment

PosterReward provides dense, design-aware scalar feedback over typography, composition, and semantic faithfulness, making it suitable as a reward function in post-training loops for image generation systems.

Bridging pairwise preference and pointwise optimization

The pairwise model improves preference reliability and interpretability, while the discriminative scorer supports efficient large-batch optimization, offering a unified path from data curation to policy improvement.

Transferable to diverse post-training scenarios

Beyond benchmark ranking, the reward can guide rejection sampling, reranking, and reinforcement fine-tuning for poster generation, helping models move from generic aesthetics to task-specific design quality.

Visual Comparison

PosterReward enables effective post-training improvement through Best-of-8 selection and GRPO training. Below we show qualitative comparisons between models before and after applying PosterReward-guided optimization.

Best-of-8 Selection with PosterReward

Using PosterReward to select the best from 8 candidates significantly improves output quality compared to random selection.

Left: Random selection from 8 candidates. Right: PosterReward-guided selection.
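
Best-of-N selection itself is a one-line reranking step once a scalar reward is available. In the sketch below, `reward_fn` is a hypothetical stand-in for a call to a pointwise reward model on one candidate image and the fixed prompt; it is not an actual PosterReward API.

```python
from typing import Callable, Sequence, TypeVar

Candidate = TypeVar("Candidate")


def best_of_n(candidates: Sequence[Candidate], reward_fn: Callable[[Candidate], float]) -> Candidate:
    """Return the candidate with the highest reward.

    reward_fn is a placeholder for scoring one generated image against the
    prompt; when scores tie, the earliest candidate is returned.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=reward_fn)
```

With N = 8 this costs eight generations and eight reward evaluations per prompt, and requires no change to the generator's weights.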

Flow-GRPO Fine-tuning on Flux.1-dev

Visual comparison of Flux.1-dev fine-tuned with various reward models. From left to right, the columns show the outputs of the original Flux.1-dev, followed by models fine-tuned using PosterReward, HPSv3, UnifiedReward, and PickScore. The corresponding prompt is given below each row.

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: This historical poster for "DC 9/11 Time of Crisis" features a serious portrait of actor Timothy Bottoms portraying George W. Bush, his face partially veiled by the American flag on the left. Behind him, a hazy, sky-blue background is subtly patterned with faint stars and stylized depictions of the New York City skyline and the World Trade Center towers. The overall style is earnest and somber, conveying the gravity of the events being portrayed. Bold, blocky sans-serif text dominates the right side of the poster. At the top, centered horizontally and rendered in black, is "TIMOTHY BOTTOMS". Below this, with a black horizontal line above and below, and occupying the middle of the right half of the frame, is the title "DC 9/11", also in large black sans-serif. Underneath this, between another black horizontal line and the bottom, is the impactful subtitle "TIME OF CRISIS", rendered in the same large, bold black sans-serif font. All text is oriented horizontally. The layout strategically places the actor's portrait and the American flag on the left, drawing the viewer's eye to the central figure, while the bold, prominent text on the right immediately conveys the film's subject matter and tone. The visual elements and typography work together to create a sense of historical weight and national significance, focusing on the human face of leadership during a defining moment.

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: Study abroad info session poster, Neue Bauhaus style, geometric composition, globe, laptop, digital tablet, muted blue-gray palette, modern informative atmosphere, title "Study Abroad Info Session" at top, subtitle "Culture & Tech Pathways" at bottom

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: A minimalist poster for a new single-origin coffee launch, featuring a glass pour-over dripper on the left, a ceramic mug filled with dark coffee to its right, a pile of roasted coffee beans in the foreground, and a small paper bag labeled "Single Origin" behind the beans. The background is a soft beige, creating a serene, sophisticated atmosphere. At the top is the title "New Single-Origin Coffee", and below the subjects is the text "Freshly Roasted & Ready to Brew".

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: A neumorphic poster for a new tech product launch, featuring a sleek silver smartphone centered, a matte black earbud case to its right, and a compact gold-accented power bank below. Soft gray backgrounds with metallic blue highlights create a futuristic mood. At the top is the title "Introducing NextGen Innovations", and in the center below the devices is the text "Available Worldwide".

Qwen-GRPO Fine-tuning

Visual comparison of the Qwen model fine-tuned with various reward models. From left to right, the columns show the outputs of the original Qwen model, followed by models fine-tuned using PosterReward, HPSv3, UnifiedReward, and PickScore. The corresponding prompt is given below each row.

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: A striking Art Deco poster for "Eye Protection Action" in nature and public welfare, showcasing a golden magnifying glass at top left, a vibrant green leaf at center right, and an open amber book at bottom left. At the top, the bold title "Eye Protection Action"; in the center, the tagline "See Nature, Protect Your Vision".

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: The poster depicts a close-up, three-quarters view of a woman's face in the foreground, her gaze directed upwards and to the right, set against a vast, icy landscape under a clear, pale blue sky. Scattered across the snow and ice are subtle streaks of light blue, suggesting cracks or meltwater, with a small figure in the distance providing a sense of scale against the immense backdrop. The style is cinematic and naturalistic, emphasizing the human element against a formidable natural environment. Prominently displayed is the main title, "MOUNTAIN QUEEN," positioned at the top left in two lines. To the right, spread across several lines, is the subtitle, "THE SUMMITS OF LHAKPA SHERPA." Both titles are in a clear, sans-serif font, with "MOUNTAIN QUEEN" in red and "THE SUMMITS OF LHAKPA SHERPA" in a deep blue, echoing the colors of the landscape and the subject's attire. The text is sharp and legible against the light background. The title "MOUNTAIN QUEEN" is positioned horizontally, while the subtitle is also horizontal but arranged vertically across the upper right quadrant, drawing the eye upwards towards the distant figure and the vastness of the mountains. The layout cleverly uses the subject's upward gaze to lead the viewer's attention from her face towards the text and the expansive landscape beyond, creating a sense of aspiration and scale. In the bottom left, the official selection laurel of the "tiff toronto international film festival" is present in white, providing context for the film's recognition.

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: This dramatic poster for "Wind River" presents a stark, layered landscape under a deep blue, nearly black sky illuminated by a single white moon. Bare, snow-covered branches of trees dominate the foreground on both sides, framing a central view of jagged, snow-capped mountains receding into the distance in shades of blue and purple, suggesting a chilling and remote setting. The style is a blend of illustration and cinematic mood, emphasizing the harsh beauty and isolation of a winter landscape. The bold, white sans-serif title "WIND RIVER" is prominently displayed at the top left, conveying the film's name with directness and strength. Below the title, in smaller sans-serif white text, is "TAYLOR SHERIDAN," the director's name, positioned just beneath and slightly to the right of "RIVER." Further down, centered below the mountains, the text "From the Writer of SICARIO & HELL OR HIGH WATER" appears in smaller white sans-serif font, emphasizing the film's pedigree and suggesting a similar tone of intensity and grit. The typography throughout is clean and modern, starkly contrasting with the natural, organic shapes of the landscape, and is consistently horizontal. The layout draws the eye from the foreground branches towards the imposing mountains in the middle distance and finally to the dark expanse of the sky, with the text strategically placed to be easily readable against the varying shades of blue and white, creating a sense of depth and highlighting the key information about the film.

Before RL	PosterReward	HPSv3	UnifiedReward	PickScore

Prompt: This movie poster for "No Time To Die" employs a gritty, illustrative style, reminiscent of classic spy thrillers, featuring prominent portraits of the main cast. A large, head-and-shoulders shot of Daniel Craig as James Bond, facing right and looking intently off-frame, dominates the top and center, superimposed over a target graphic marked with bullet holes. Below and to the left, a close-up portrait of Rami Malek as the villain looms, while to Bond's left, Ana de Armas in a black dress holds a pistol. A smaller image of Ben Whishaw as Q is positioned below Rami Malek, and to the right of the text, Léa Seydoux as Madeleine Swann stands with her arms crossed. Action scenes are layered throughout the lower portion: a motorcycle chase with Bond at the forefront, a distant explosion, and a vintage car driving down a road. The title, "NO TIME TO DIE," is presented in large, blocky, all-caps sans-serif font, rendered in a stark white with no texture, positioned horizontally across the center of the poster, appearing to break through the composite image. The layout is a dynamic collage, with the large central figures establishing a hierarchy and the surrounding elements, including the action scenes and supporting characters, creating a sense of depth and narrative, all anchored by the bold, central title which draws the eye and clearly communicates the film's name.
