A working field guide for architects and interior designers — ten principles, three applied workflows, and the honest limits of a model that has finally learned to listen.
OpenAI released gpt-image-2 on 21 April 2026, and for the first time architects and interior designers have a general-purpose image model that treats a prompt the way a drafting brief treats a drawing — with hierarchy, materials, constraints, and intent.
The model accepts flexible resolutions up to an experimental 2K, takes up to sixteen reference images in a single edit call, and, most relevant to design work, preserves everything outside the edit region with pixel-stable fidelity. Iterative material and furniture studies no longer redraw the room around you.
It is available through the /v1/images/generations and /v1/images/edits endpoints, inside the Responses API, and through Codex. This guide distills OpenAI's official Cookbook, the developers.openai.com guides, and early field use from practising architects into ten prompting principles and three applied workflows, tuned specifically for design practice.
What changed matters. Gpt-image-2 keeps the instruction-following strengths of its predecessors but retires the warm colour cast, lifts the resolution ceiling, improves text rendering to near-production accuracy, and — critically for iterative design work — preserves everything outside the edit region with structural fidelity.
Early case studies from Archtene, Rendair, and the fal.ai prompting guide converge on the same finding: the model rewards specific visual facts over adjectives, and rewards constraint language over decorative praise. OpenAI's own Cookbook is explicit — for photorealism, include the word "photorealistic" to engage the model's photorealistic mode; for edits, repeat what must be preserved on every turn.
Parameter-level realities you should internalise before writing a brief — because the quickest way to waste credits is to ask the model for something it structurally cannot do.
The architects and designers getting useful work out of gpt-image-2 in its first week share a common practice: they treat a prompt like a specification document, not a wish. What follows are the ten rules their briefs have in common.
OpenAI's Cookbook and the fal.ai guide converge on the same five-part spine: scene, subject, key details, constraints, and use case. For a render, scene is the site and time of day, subject is the building or room, key details are materials and camera framing, constraints are what must not change, and the use case ("competition board," "client presentation," "magazine editorial") sets the polish level.
Use short labelled segments with line breaks, not a single run-on paragraph. The model treats a structured brief as a brief, and an adjective soup as an adjective soup.
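A minimal sketch of that structure in Python, assuming nothing about the API itself: it only assembles the prompt string. The `Brief` class, its field names, and the example values are hypothetical illustrations, not part of the Cookbook or the fal.ai guide.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """One labelled segment per element of the five-part spine (hypothetical helper)."""
    scene: str
    subject: str
    key_details: list[str]
    constraints: list[str]
    use_case: str

    def render(self) -> str:
        # Short labelled segments separated by line breaks, always in the same order.
        return "\n".join([
            f"Scene: {self.scene}",
            f"Subject: {self.subject}",
            f"Key details: {'; '.join(self.key_details)}",
            f"Constraints: {'; '.join(self.constraints)}",
            f"Use case: {self.use_case}",
        ])


# Illustrative values only.
brief = Brief(
    scene="sloping coastal site, late afternoon, low sun from the west",
    subject="two-storey timber-clad house with a cantilevered upper floor",
    key_details=[
        "rough-sawn spotted gum cladding",
        "corner perspective at eye level",
        "50mm lens feel",
    ],
    constraints=["no people", "no text", "realistic proportions"],
    use_case="client presentation",
)
prompt = brief.render()
print(prompt)
```

Keeping the order fixed in code means every brief in a project reads the same way, which makes it easier to see which segment changed between two generations.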
"Minimalist," "brutalist," and "Japandi" on their own are weak triggers. The fal.ai guide's "visual facts over vague praise" rule is the single highest-leverage technique for architectural output. Every canonical style compresses to a finite list of materials, palette, and silhouette rules — write those rules.
The OpenAI Cookbook states directly: including "photorealistic" engages the photorealistic mode. Pair it with texture-forward nouns — "real skin texture, pores, subtle film grain, brushed aluminium with micro-scratches, weathered copper patina, chipped paint, worn travertine" — and explicit anti-CGI language: "no glamorisation, no heavy retouching, no cinematic colour grading, no studio polish, avoid plastic AI look."
Archtene's production phrasing — "smooth realistic quality like 3Ds Max and V-Ray rendering, accurate shadows, reflections and architectural realism, materials feel natural and premium, less plastic" — consistently pulls renders away from the default oversaturated gloss.
The Cookbook warns that detailed camera specs are interpreted loosely: lens lengths influence look, not physics. A prompt like "medium close-up at eye level, 50mm lens feel, shallow depth of field, 35mm film aesthetic" is more reliable than a sensor-and-aperture recipe.
For composition, be literal. "Corner perspective at eye level, slight three-quarter angle, hero object centred with generous negative space, horizon line in the lower third." Archtene notes corner perspectives outperform flat front elevations because they reveal depth and form.
This is the most consequential discipline for design iteration. Every edit prompt should state explicitly what changes and — repeated every turn — what must be preserved.
Without the preservation clause, the model drifts saturation, reflections, and background over multiple iterations. Without repeating it on every turn, the drift compounds.
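One way to make the repetition automatic is to fix the preserve list once and have every turn's prompt restate it. The sketch below is an assumption about how a studio might script this; `EditSession` and its wording are hypothetical, and it builds prompt text only.

```python
class EditSession:
    """Builds edit prompts that restate the preserve list on every turn (hypothetical helper)."""

    def __init__(self, preserve: list[str]):
        if not preserve:
            raise ValueError("state at least one thing to preserve")
        # The clause is built once so it is worded identically on every turn.
        self.preserve_clause = "Preserve exactly: " + "; ".join(preserve) + "."
        self.history: list[str] = []

    def next_prompt(self, change: str) -> str:
        # Every turn states what changes, then repeats what must not.
        prompt = f"Change only: {change}.\n{self.preserve_clause}"
        self.history.append(prompt)
        return prompt


session = EditSession([
    "camera position and framing",
    "wall and floor finishes",
    "furniture layout",
    "window view and background",
])
first = session.next_prompt("warm the lighting to early-evening tones")
second = session.next_prompt("replace the pendant with a linear brass fitting")
print(second)
```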
Both fal.ai and Archtene arrived at the same workflow rule independently: a single surgical edit — "warm the lighting," "mature the trees," "soften the façade finish" — outperforms a large rewrite every time.
This compounds with Principle V: the preserve list stabilises what you already like while one knob moves. Think of it as CAD versioning, not wish-making.
For signage mockups, wayfinding studies, or presentation boards with captions, put the literal string in quotes or ALL CAPS, specify font family (Inter, condensed sans, humanist serif), size, colour, placement, and kerning, and explicitly add "render the text exactly once, no duplicate text, no extra words."
Use quality: "high" for small text. PixVerse's informal fifty-prompt test found roughly nineteen of twenty generations returned fully legible first-pass text on gpt-image-2 when prompted this way.
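As a sketch, the text rules above can be folded into a small prompt builder. The function name, the segment labels, and the example sign are hypothetical. The request dictionary only mirrors the parameters this guide mentions, and passing it to an SDK call such as `client.images.generate(**request)` is an assumption rather than something shown here.

```python
def signage_prompt(text: str, font: str, size: str, colour: str,
                   placement: str, kerning: str) -> str:
    """Assemble a text-rendering brief with the literal string in quotes."""
    return "\n".join([
        f'Text (verbatim): "{text}"',
        f"Typography: {font}, {size}, {colour}, {kerning} kerning",
        f"Placement: {placement}",
        # The anti-duplication sentence is taken from the guidance above.
        "Render the text exactly once, no duplicate text, no extra words.",
    ])


prompt = signage_prompt(
    text="LEVEL 2 STUDIO",
    font="condensed sans",
    size="120 mm cap height",
    colour="matte white",
    placement="centred on the charcoal wall panel beside the lift",
    kerning="wide",
)

# Small text is the case where the guide recommends high quality.
request = {"model": "gpt-image-2", "prompt": prompt, "quality": "high"}
print(prompt)
```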
Use the word "photorealistic" for renders and "architectural render" for concept boards. Archtene's tested template — "[building type], [style], [materials], [camera angle], [lighting], [site context], architectural render, realistic proportions, clean presentation, design-focused composition" — consistently produces the flat, even, competition-board aesthetic designers want for concept work.
"Photorealistic candid photograph" produces client-facing heroes with weather and atmosphere. Both modes exist. Name which one you want.
For multi-reference compositions — a site photo plus a finishes board plus a furniture swatch — label them explicitly in the prompt.
Gpt-image-2 accepts up to sixteen references per edit call and processes every one at high fidelity.
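A minimal sketch of explicit labelling, assuming the references are uploaded in the same order as the labels. The `label_references` helper and the role descriptions are hypothetical; the cap of sixteen comes from the limit stated above.

```python
MAX_REFERENCES = 16  # per-call limit stated in this guide


def label_references(roles: list[str]) -> str:
    """Return one 'Image N: role' line per reference, numbered from 1."""
    if not roles:
        raise ValueError("at least one reference image is required")
    if len(roles) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references per edit call")
    return "\n".join(f"Image {i}: {role}" for i, role in enumerate(roles, start=1))


labels = label_references([
    "site photo, use for context, horizon, and light direction",
    "finishes board, use for cladding and paving materials",
    "furniture swatch, use for upholstery colour and weave",
])
prompt = labels + "\nCompose the terrace from Image 1 using the materials in Image 2 and Image 3."
print(prompt)
```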
The model does not render to construction dimensions and never will. Rendair's long-standing caution — "the model tends to dream over geometry" — still applies. Seed scale explicitly.
Human and object references calibrate the model's sense of storey height, door widths, and furniture depth more effectively than any numeric prompt.
The ten principles are a grammar. A grammar without sentences is inert. What follows are the three design workflows where gpt-image-2 most clearly earns its place inside a studio's existing process — concept boards, photoreal renders, and material studies.
For early-stage ideation, gpt-image-2 functions as what Rendair calls a "digital sketchpad." The key is the Cookbook's n=4 setting: generating four variants from one brief and comparing them is faster than writing four separate prompts.
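A sketch of a four-variant request, built as a plain dictionary so it runs without network access. The parameter names follow the generations endpoint as described in this guide. The helper name, the chosen size and quality, and the suggestion that the dictionary could be passed to `client.images.generate(**request)` in the OpenAI Python SDK are assumptions to check against the current API reference.

```python
def variants_request(brief: str, n: int = 4) -> dict:
    """Build one generation request that asks for n variants of the same brief."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "model": "gpt-image-2",
        "prompt": brief,
        "n": n,                # four variants from one brief
        "size": "1536x1024",   # landscape; an assumed default for concept boards
        "quality": "medium",   # comparison sprint, not a client-facing render
    }


request = variants_request(
    "Scene: inner-city laneway, overcast morning\n"
    "Subject: three-storey brick infill studio\n"
    "Use case: competition board"
)
print(request["n"], request["size"])
```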
For mood boards, shift the centre of gravity from geometry to atmosphere, palette, and texture. Ask for "a 2×2 grid of material vignettes: matte terracotta glaze against raw linen, polished black marble beside brushed brass, aged oak with natural oil finish, and rough-sawn spotted gum. Soft north-window light, flat-lay composition, magazine-quality editorial feel, no labels, no text."
Gpt-image-2's grid and image-set capabilities make this a single-call operation.
Archtene's published sketch-to-render recipe is the strongest documented workflow for turning Revit, SketchUp, Rhino, or Archicad screenshots into client-ready renders.
Then iterate one pass at a time — materials, then trees, then lighting, then glazing reflections, then detail sharpening.
For exterior heroes, these fragments produce reliable output at 1536×1024 high-quality:
For interiors, expand "luxury minimalist kitchen, marble island, oak joinery, soft daylight, magazine-quality realism" with specific stone (Calacatta Oro vs. Carrara vs. Taj Mahal quartzite), specific timber (rift-sawn white oak, spotted gum, American walnut), and specific light conditions.
Floorplan-to-perspective remains one of the most valuable workflows for client presentation. The Architizer method — draw an arrow on the floorplan indicating camera position and direction, upload the annotated plan, and prompt the model — works reliably on gpt-image-2 because of the model's improved spatial reasoning.
Boyuan Chen, the gpt-image-2 research lead, described the model's ability to handle "3D-style perspective shifts and complex spatial reasoning through simple text prompts" as a headline capability at launch.
Material studies are where gpt-image-2's edit endpoint earns its cost. The canonical pattern — tested across fal.ai, Archtene, and the OpenAI Cookbook — is a three-sentence structure: change, preserve, physical realism.
Reuse the same source image across parallel prompts and change one material at a time. Each comparison should share its preserve clause verbatim so the room reads as one room with four finish schemes — rather than four different rooms.
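The parallel study can be sketched as a loop that varies only the change sentence while the preserve and realism sentences stay byte-for-byte identical. The wording of the three sentences and the fourth finish are illustrative assumptions, and the code builds prompts only.

```python
# Shared verbatim across every comparison so the room reads as one room.
PRESERVE = ("Preserve the camera angle, room geometry, lighting, furniture, "
            "and every surface other than the benchtop.")
REALISM = ("Keep reflections, shadows, and edge details physically plausible "
           "for the new material.")

FINISHES = [
    "Calacatta Oro marble",
    "Carrara marble",
    "Taj Mahal quartzite",
    "polished black marble",
]


def material_prompt(finish: str) -> str:
    """Three sentences: change, preserve, physical realism."""
    change = f"Change the kitchen benchtop to {finish}."
    return " ".join([change, PRESERVE, REALISM])


# One prompt per finish, each to be run against the same source image.
prompts = {finish: material_prompt(finish) for finish in FINISHES}
for text in prompts.values():
    print(text)
```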
Single-image edits use /v1/images/edits with the change-preserve-physics pattern. Multi-image composition passes an array of reference images and addresses each by index. Masked edits pass a mask image; the Cookbook notes that masking on gpt-image-2 is "prompt-based," meaning the model treats the mask as guidance rather than a strict boundary.
Always reinforce the mask with words: "edit only the region under the mask; leave every other pixel unchanged."
The highest-reliability workflow for material transfer: upload the existing room as Image 1, upload a tight product photo of the new finish as Image 2.
The short list of things gpt-image-2 genuinely cannot do, so you stop asking it to, and the two behavioural pitfalls that show up in nearly every design team's first week.
Use quality: "high" whenever text appears in the image, when small-scale details (door hardware, stair nosings, tile grout) carry the design, or when the render is client-facing.
Drop to medium for fast comparison sprints. Use low for ideation at volume.
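The quality rules above reduce to a short decision function. This is a sketch of one reading of them: the function and argument names are hypothetical, and treating high as taking precedence over low is an assumption.

```python
def pick_quality(has_text: bool = False,
                 small_details: bool = False,
                 client_facing: bool = False,
                 ideation_at_volume: bool = False) -> str:
    """Map the guide's quality rules onto the low/medium/high setting."""
    # Any of the three "high" triggers wins, even during a volume run.
    if has_text or small_details or client_facing:
        return "high"
    if ideation_at_volume:
        return "low"
    # Default: fast comparison sprints.
    return "medium"


print(pick_quality(has_text=True))            # signage mockup
print(pick_quality(ideation_at_volume=True))  # early massing options
print(pick_quality())                         # comparison sprint
```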
Stay at or below 2560×1440 for reliable output; treat anything above as experimental. Use landscape for exteriors and interior wide shots, portrait for towers and tall interior details, and square only for social and mood-board tiles.
All edges must be multiples of 16, and the aspect ratio is capped at 3:1.
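These size rules are easy to check before spending credits. The sketch below assumes the 2560×1440 guidance applies to the long and short edge regardless of orientation, which is an interpretation rather than something the guide states. `check_size` is a hypothetical helper.

```python
def check_size(width: int, height: int) -> list[str]:
    """Return a list of problems with a requested size; empty means it passes."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    problems = []
    if width % 16 or height % 16:
        problems.append("edges must be multiples of 16")
    long_edge, short_edge = max(width, height), min(width, height)
    # Integer comparison avoids floating-point ratio checks.
    if long_edge > 3 * short_edge:
        problems.append("aspect ratio exceeds 3:1")
    # Assumption: the reliable ceiling is 2560 on the long edge, 1440 on the short.
    if long_edge > 2560 or short_edge > 1440:
        problems.append("above 2560x1440: treat as experimental")
    return problems


for size in [(1536, 1024), (1440, 2560), (1000, 1000), (2048, 512), (3072, 1024)]:
    print(size, check_size(*size))
```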
The teams getting useful work out of the model write in a fixed order. They translate style names into material and palette rules. They name what preserves across iterations. They trigger photorealism by name and anchor it with honest texture language. They iterate one knob at a time.
Gpt-image-2 is not replacing V-Ray, Enscape, or the detail-level work of a visualiser. It is replacing the sketch phase, the comparison board, the first-pass material study, and the Monday-morning client conversation that used to take two days of modelling. Used that way — with the ten principles above as a checklist — it is the first general-purpose image model that genuinely fits inside a design studio's existing workflow rather than demanding a new one.