Phase-1 — character face-quality experiment
The UltraShape comparison showed that geometry refinement doesn't sharpen faces, because at this scale a face is
texture- and source-sprite-bound, not geometry-bound. Phase-1 tested two cheap, sprite-side levers on the
archmage to see which one measurably improves the face: control vs. each variant, with the same shape and paint settings otherwise.
Pipeline under test:
arcane sprite → Hunyuan3D-2.1 shape → decimate + PBR paint @200K → textured GLB
Only the paint input image (A) or the sprite framing (B) changed; all downstream settings were held constant.
Variants
| Variant | Lever tested | What changed | Cost |
| --- | --- | --- | --- |
| control | — | the shipped archmage (1024² sprite → shape → paint) | ~4.5 min |
| A | higher-res input | RealESRGAN-upscale the sprite to 2048² before paint (same coarse mesh re-painted) | +~30 s |
| B | face-emphasis sprite | regenerate the sprite with a waist-up / "detailed face" prompt → new shape → paint | +~3.5 min |
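The one-lever-at-a-time design in the table can be written down as a small config sketch. The field names, the prompt strings, and the reading of "@200K" as a face count are illustrative assumptions, not the pipeline's actual settings.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConfig:
    # All field names are hypothetical; they only mirror the table above.
    sprite_prompt: str     # text prompt for the 2D sprite generator
    sprite_size: int       # sprite edge length in pixels
    input_upscale: int     # 1 = none, 2 = 1024 -> 2048 before paint
    decimate_target: int   # "@200K" in the pipeline line, assumed to be faces


control = RunConfig(
    sprite_prompt="archmage, full body",  # placeholder prompt
    sprite_size=1024,
    input_upscale=1,
    decimate_target=200_000,
)
# Variant A: only the paint input resolution changes.
variant_a = replace(control, input_upscale=2)
# Variant B: only the sprite prompt changes (placeholder wording).
variant_b = replace(control, sprite_prompt="archmage, waist-up, detailed face")


def changed_fields(base: RunConfig, other: RunConfig) -> list[str]:
    """Names of the fields on which two configs differ."""
    return [
        f.name
        for f in fields(base)
        if getattr(base, f.name) != getattr(other, f.name)
    ]


print(changed_fields(control, variant_a))
print(changed_fields(control, variant_b))
```

Deriving each variant from the control with `replace` makes it hard to change a second setting by accident, which is the property the comparison relies on.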
Figures: full front; face crop (zoomed).
Verdict.
A (no gain): Upscaling the input sprite does nothing; A is visually identical to
control. The paint pipeline downsamples its conditioning input, so the extra input resolution is discarded. Don't repeat it.
B (inconclusive): The face-emphasis regeneration isn't meaningfully sharper, for two reasons:
(1) Z-Image largely ignored the "waist-up" framing and drew a full figure anyway, so the face gained no pixels;
(2) the archmage's face is occluded by the hat brim and beard, which makes it a poor test subject with little visible face to improve.
Conclusion & next lever. The two cheapest sprite-side tricks don't help. The remaining promising lever is
higher-resolution paint (more views + larger paint render resolution), which adds texels directly to whatever
face exists — to be tested on an unobstructed-face character (apprentice / witch), with a forceful portrait
prompt if framing is retested.
Phase-1b — forcing the framing (Z-Image vs SANA-Sprint)
The "waist-up" attempt (B) failed because the phrase was weak and buried in full-body context — not because
Z-Image can't frame. Re-run with a portrait-first prompt (full-body tokens removed), 4 seeds each, on an
unobstructed-face apprentice:
Both generators obey the strong prompt. Z-Image (top) frames the close-up crisply and on-style; SANA-Sprint
(bottom) frames it correctly but renders softer and more realistic (style-drift risk). Conclusion: the framing fix is
prompt surgery, not a generator swap, so keep Z-Image. A true close-up carries much more face detail (more
pixels land on the face), but it reconstructs only a bust, so in-world full-body characters still need higher-res
paint; the portrait is a separate hero/close-up asset.
On SANA for speed: its 2-step advantage applies only to the 2D sprite step (~55 s of a ~270 s character). Shape and
paint dominate, so the saving is marginal (~18%) for 3D characters, but it is a large win for 2D-only mass sprite generation.
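The speed argument is an Amdahl-style bound: speeding up one stage can save at most that stage's share of the total. A minimal sketch using the ~55 s and ~270 s figures from the text; the 8x sprite speedup is an illustrative assumption, not a measured number.

```python
def saving_fraction(stage_s: float, total_s: float, stage_speedup: float) -> float:
    """Fraction of total time saved when one stage runs `stage_speedup` times faster."""
    saved_s = stage_s * (1.0 - 1.0 / stage_speedup)
    return saved_s / total_s


SPRITE_S = 55.0   # sprite step, from the text
TOTAL_S = 270.0   # whole character, from the text

# Upper bound: the sprite step becomes free.
print(f"upper bound: {saving_fraction(SPRITE_S, TOTAL_S, float('inf')):.1%}")
# Illustrative only: an assumed 8x faster sprite step.
print(f"8x sprite speedup: {saving_fraction(SPRITE_S, TOTAL_S, 8.0):.1%}")
```

The bound comes out near 20%, so the quoted ~18% is consistent with a sprite step that gets several times faster while shape and paint stay unchanged.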
Phase-1b — reconstructed in 3D
Best seed from each generator (s2) → Hunyuan shape + paint. Both reconstruct cleanly as busts — the
close-up framing yields far more face detail than a full-body sprite carries.
Z-Image (left) reconstructs a crisp, on-style game face. SANA-Sprint (right) is softer / more
photoreal and comes out slightly waxier (single-view reconstruction punishes realism), and it drifts from the
stylized cast. Timings were identical bar the sprite step — shape ~114 s + paint ~90 s dominate — so
SANA's 2-step speed is marginal for 3D characters. Keep Z-Image for 3D; SANA's speed only pays off for
2D-only mass sprite work.
Phase-2 — higher-res paint on the full body
Re-painted the full-body apprentice's shape with more views + higher per-view resolution
(8 views @768 vs the baseline 6 @512; texture already 4096). Left = baseline, right = Phase-2:
Marginal. The Phase-2 face is slightly cleaner but not transformative — at ~2.5× the paint time
(187 s vs 76 s). Higher paint resolution can only resample the source sprite's ~70 px face; it can't add
detail that isn't there.
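A quick back-of-the-envelope check of the Phase-2 settings, assuming square views and treating views × resolution² as the rendered pixel budget (a simplification of whatever the paint stage actually does):

```python
def rendered_pixels(views: int, res: int) -> int:
    """Total pixels rendered across all paint views (square views assumed)."""
    return views * res * res


baseline_px = rendered_pixels(6, 512)   # baseline: 6 views @512
phase2_px = rendered_pixels(8, 768)     # Phase-2: 8 views @768

pixel_ratio = phase2_px / baseline_px   # growth in rendered pixels
time_ratio = 187 / 76                   # measured paint times from the text

SOURCE_FACE_PX = 70  # approximate face size in the source sprite, unchanged by paint settings

print(f"rendered pixels: x{pixel_ratio:.2f}")
print(f"paint time:      x{time_ratio:.2f}")
print(f"source face:     ~{SOURCE_FACE_PX} px in both runs")
```

The render budget roughly triples and paint time rises about 2.5×, while the source face the paint stage samples from stays at ~70 px in both runs.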
Every downstream lever so far (geometry refinement, input upscaling, higher-res paint) gives little or nothing;
all signs point at the source sprite's small face as the bottleneck. Phase-2b asks: with paint at the higher
resolution, can we just upscale that source?
Phase-2b — does upscaling the source help?
Re-painted the same full-body shape @768 from an ESRGAN-upscaled 2048 source (face ~70 → ~140 px)
vs the plain 1024 source — same geometry, same paint res, only the source differs:
No — near-identical. Upscaling adds pixels but not face information; ESRGAN can't invent detail the
generator never drew.
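A toy 1-D illustration of "more pixels, same information". It uses nearest-neighbour duplication as a stand-in and is deliberately not ESRGAN, which is a learned upscaler that can synthesize plausible texture. The point it shares with ESRGAN is that the output is computed only from the same source pixels. The sample values are made up.

```python
def upscale_2x(row: list[float]) -> list[float]:
    """Nearest-neighbour 2x upscale: every sample is duplicated."""
    return [value for value in row for _ in range(2)]


def downscale_2x(row: list[float]) -> list[float]:
    """Box-filter 2x downscale: average each adjacent pair."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


# Hypothetical intensities along one row of a small face.
face_row = [12, 200, 35, 180, 90, 60, 240, 15]

upscaled = upscale_2x(face_row)
print(len(face_row), "->", len(upscaled), "samples")

# The upscaled row collapses back to exactly the original, so it carries
# no detail that the original did not already contain.
print(downscale_2x(upscaled) == face_row)
```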
Corrected final conclusion. Face quality is bounded by the face detail the generator actually draws,
which is set by framing — a close-up makes the model render a real, detailed face; a full-body shot doesn't.
No post-hoc pixel trick (upscaling, hi-res paint, geometry refinement) recovers it. So:
hero / close-up → portrait busts;
world full-body with a sharp face → a generative face pass (img2img / inpaint the face region at high detail
before reconstruction), not upscaling.
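The generative face pass is a proposal here, not something these experiments ran. As a sketch of its first step only, the helper below computes a padded square crop around the face so that an img2img / inpaint model could redraw it at a larger working size. The function name, the face-box coordinates, the padding factor, and the 512 px working resolution are all assumptions for illustration.

```python
def face_crop_box(face_box, image_size, pad=1.0):
    """Padded square crop (left, top, right, bottom) around a face box.

    face_box   -- (left, top, right, bottom) of the detected face, in pixels
    image_size -- edge length of the square sprite
    pad        -- extra context as a fraction of the face size
    """
    left, top, right, bottom = face_box
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    side = min(max(right - left, bottom - top) * (1 + pad), image_size)
    # Shift the square so it stays fully inside the image.
    x0 = min(max(center_x - side / 2, 0), image_size - side)
    y0 = min(max(center_y - side / 2, 0), image_size - side)
    return (int(x0), int(y0), int(x0 + side), int(y0 + side))


# Hypothetical ~70 px face near the top centre of a 1024 px sprite.
box = face_crop_box((477, 100, 547, 170), image_size=1024, pad=1.0)
crop_side = box[2] - box[0]

WORKING_RES = 512  # assumed resolution for the face pass
scale = WORKING_RES / crop_side
print(box, f"scale x{scale:.2f}", f"face ~{70 * scale:.0f} px in the working crop")
```

After regeneration, the crop would be scaled back down and pasted into the sprite before shape and paint; that part depends on the chosen img2img tool and is left out.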
Result models
control.glb
A_upscaled-input.glb
B_face-emphasis.glb
B source sprite
portrait_Z-Image.glb
portrait_SANA.glb