Prompting GPT Image 2 — A Field Guide for Architects & Interior Designers

Foreword

A note before we begin

OpenAI released gpt-image-2 on 21 April 2026, and for the first time architects and interior designers have a general-purpose image model that treats a prompt the way a drafting brief treats a drawing — with hierarchy, materials, constraints, and intent.

The model accepts flexible resolutions up to an experimental 2K, edits up to sixteen reference images in a single call, and — most relevant to design work — preserves everything outside the edit region with pixel-stable fidelity. Iterative material and furniture studies no longer redraw the room around you.

It is available through the /v1/images/generations and /v1/images/edits endpoints, inside the Responses API, and through Codex. This guide distills OpenAI's official Cookbook, the developers.openai.com guides, and early field use from practicing architects into ten prompting principles and three applied workflows, tuned specifically for design practice.

What changed matters. The model keeps the instruction-following strengths of its predecessors but retires the warm colour cast, lifts the resolution ceiling, and improves text rendering to near-production accuracy.

Early case studies from Archtene, Rendair, and the fal.ai prompting guide converge on the same finding: the model rewards specific visual facts over adjectives, and rewards constraint language over decorative praise. OpenAI's own Cookbook is explicit — for photorealism, include the word "photorealistic" to engage the model's photorealistic mode; for edits, repeat what must be preserved on every turn.

The Essentials

Specifications at a glance

Before a single prompt, know the machine.

These are the parameter-level realities to internalise before writing a brief, because the quickest way to waste credits is to ask the model for something it structurally cannot do.

Max Resolution

2560 × 1440

Edges in multiples of 16. Ratio ≤ 3:1. Longest edge < 3840 px.

Quality Tiers

low · med · high

High for client-facing, medium for sprints, low for ideation volume.

Reference Images

Up to 16

Each processed at high fidelity. Address by index in the prompt.

Cost / Render

~ $0.041

For a 1536×1024 high-quality landscape. Cheap enough for 20-image studies.

PNG · JPG

PNG is the default output. WebP and JPEG are faster and support output_compression. Transparent backgrounds are not supported: use background: "opaque" and remove the background downstream.

n = 4

Variant generation. One brief yields four parallel variants. Faster than writing four separate prompts. Best friend of concept-board work.

~ 2 min

Complex prompts. Can take up to two minutes. Plan iteration cycles accordingly; don't expect instant turnaround on heavy briefs.
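The size rules in the card above can be checked before a request is sent. What follows is a minimal sketch in Python, assuming the limits are exactly as quoted in this guide (edges in multiples of 16, aspect ratio at most 3:1, longest edge under 3840 px). The helper name check_size is hypothetical and not part of any SDK.

```python
def check_size(width: int, height: int) -> list[str]:
    """Return the rule violations for a requested output size.

    The limits are the ones quoted in this guide and are treated
    here as assumptions, not as an authoritative API contract.
    """
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    problems = []
    if width % 16 or height % 16:
        problems.append("both edges must be multiples of 16")
    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge > 3 * short_edge:
        problems.append("aspect ratio must not exceed 3:1")
    if long_edge >= 3840:
        problems.append("longest edge must be under 3840 px")
    return problems


print(check_size(1536, 1024))  # passes every rule
print(check_size(1920, 1080))  # 1080 is not a multiple of 16
```

An empty list means the size passes all three rules. A common trap is 1920×1080, which fails because 1080 is not divisible by 16.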

Part One — The Ten Principles

Ten principles that actually move design output.

The architects and designers getting useful work out of gpt-image-2 in its first week share a common practice: they treat a prompt like a specification document, not a wish. What follows are the ten rules their briefs have in common.

Principles · 01 — 03

01

Write in a fixed order. Scene → subject → details → constraints → use case.

Both OpenAI's Cookbook and the fal.ai guide converge on this spine. For a render, scene is the site and time of day, subject is the building or room, key details are materials and camera framing, constraints are what must not change, and the use case — "competition board," "client presentation," "magazine editorial" — sets the polish level.

Use short labelled segments with line breaks, not a single run-on paragraph. The model treats a structured brief as a brief, and an adjective soup as an adjective soup.
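One way to keep the order fixed is to assemble the brief from labelled parts instead of typing it freehand. The sketch below assumes the five segment labels named in this principle. build_brief and FIELD_ORDER are hypothetical names used only for illustration.

```python
FIELD_ORDER = ("Scene", "Subject", "Details", "Constraints", "Use case")


def build_brief(parts: dict[str, str]) -> str:
    """Emit one labelled line per segment, always in FIELD_ORDER."""
    unknown = set(parts) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unknown segments: {sorted(unknown)}")
    lines = [
        f"{label}: {parts[label].strip()}"
        for label in FIELD_ORDER
        if parts.get(label)  # skip segments that were not supplied
    ]
    return "\n".join(lines)


brief = build_brief({
    "Use case": "concept presentation board",
    "Scene": "small coastal town site, autumn afternoon",
    "Subject": "public library with timber louvered façade",
})
print(brief)
```

Whatever order the parts are supplied in, the output starts with the scene and ends with the use case, one segment per line.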

02

Replace style words with their visual atoms.

"Minimalist," "brutalist," and "Japandi" on their own are weak triggers. The fal.ai guide's "visual facts over vague praise" rule is the single highest-leverage technique for architectural output. Every canonical style compresses to a finite list of materials, palette, and silhouette rules — write those rules.

Instead of: "brutalist façade"

Board-formed exposed concrete with visible wooden formwork striations, aggregate texture, cantilevers, deep recesses, raw concrete grey with rust staining, modular repetition, high-contrast raking shadows.

Instead of: "Japandi interior"

Matte white walls, light oak floor, low black steel-frame furniture, handcrafted ceramics on slim shelves, noren curtain at the entrance, paper lantern diffusing soft indirect light, sand and beige palette.
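A studio can keep these translations in a small lookup so that a style label never reaches the model on its own. This is a sketch only: the two entries reuse the wording above, and STYLE_ATOMS and expand_style are hypothetical names.

```python
STYLE_ATOMS = {
    "brutalist façade": [
        "board-formed exposed concrete with visible wooden formwork striations",
        "aggregate texture",
        "cantilevers and deep recesses",
        "raw concrete grey with rust staining",
        "modular repetition",
        "high-contrast raking shadows",
    ],
    "japandi interior": [
        "matte white walls",
        "light oak floor",
        "low black steel-frame furniture",
        "handcrafted ceramics on slim shelves",
        "paper lantern diffusing soft indirect light",
        "sand and beige palette",
    ],
}


def expand_style(label: str) -> str:
    """Swap a style label for its recorded materials, palette and silhouette rules."""
    key = label.strip().lower()
    if key not in STYLE_ATOMS:
        # Fail loudly so nobody ships a bare style word by accident.
        raise KeyError(f"no visual atoms recorded for {label!r}; write them out first")
    return ", ".join(STYLE_ATOMS[key])


print(expand_style("Japandi interior"))
```

The lookup refuses unknown labels on purpose: the point of the principle is that the atoms get written down, not guessed by the model.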

03

Trigger photorealism by name — then anchor it with texture.

The OpenAI Cookbook states directly: including "photorealistic" engages the photorealistic mode. Pair it with texture-forward nouns — "real skin texture, pores, subtle film grain, brushed aluminium with micro-scratches, weathered copper patina, chipped paint, worn travertine" — and explicit anti-CGI language: "no glamorisation, no heavy retouching, no cinematic colour grading, no studio polish, avoid plastic AI look."

Archtene's production phrasing — "smooth realistic quality like 3Ds Max and V-Ray rendering, accurate shadows, reflections and architectural realism, materials feel natural and premium, less plastic" — consistently pulls renders away from the default oversaturated gloss.

Principles · 04 — 05

04

Specify camera language loosely. Specify composition precisely.

The Cookbook warns that detailed camera specs are interpreted loosely: lens lengths influence look, not physics. A prompt like "medium close-up at eye level, 50mm lens feel, shallow depth of field, 35mm film aesthetic" is more reliable than a sensor-and-aperture recipe.

For composition, be literal. "Corner perspective at eye level, slight three-quarter angle, hero object centred with generous negative space, horizon line in the lower third." Archtene notes corner perspectives outperform flat front elevations because they reveal depth and form.

Fig. 1 Corner perspective, eye level

05

Edits are a two-column contract: change + preserve.

This is the most consequential discipline for design iteration. Every edit prompt should state explicitly what changes and — repeated every turn — what must be preserved.

OpenAI Cookbook template

In this room photo, replace ONLY the white chairs with chairs made of wood. Preserve camera angle, room lighting, floor shadows, and surrounding objects. Keep all other aspects of the image unchanged. Photorealistic contact shadows and fabric texture.

Without the preservation clause, the model drifts saturation, reflections, and background over multiple iterations. Without repeating it on every turn, the drift compounds.

Principles · 06 — 07

06

Iterate with one change per turn.

Both fal.ai and Archtene arrived at the same workflow rule independently: a single surgical edit — "warm the lighting," "mature the trees," "soften the façade finish" — outperforms a large rewrite every time.

This compounds with Principle 05: the preserve list stabilises what you already like while one knob moves. Think of it as CAD versioning, not wish-making.

"A prompt is a specification document,
not a wish."

07

Treat in-image text as typography, not content.

For signage mockups, wayfinding studies, or presentation boards with captions, put the literal string in quotes or ALL CAPS, specify font family (Inter, condensed sans, humanist serif), size, colour, placement, and kerning, and explicitly add "render the text exactly once, no duplicate text, no extra words."

Use quality: "high" for small text. PixVerse's informal fifty-prompt test found roughly nineteen of twenty generations returned fully legible first-pass text on gpt-image-2 when prompted this way.

Principles · 08 — 10

08

Name the render mode. Photorealistic, or architectural?

Use the word "photorealistic" for renders and "architectural render" for concept boards. Archtene's tested template — "[building type], [style], [materials], [camera angle], [lighting], [site context], architectural render, realistic proportions, clean presentation, design-focused composition" — consistently produces the flat, even, competition-board aesthetic designers want for concept work.

"Photorealistic candid photograph" produces client-facing heroes with weather and atmosphere. Both modes exist. Name which one you want.

09

Reference images are indexed inputs — address them by number.

For multi-reference compositions — a site photo plus a finishes board plus a furniture swatch — label them explicitly in the prompt.

Image 1 is the existing room to preserve. Image 2 is the wood grain reference. Image 3 is the sofa reference. Apply the wood from Image 2 to the flooring in Image 1; replace the sofa in Image 1 with the sofa from Image 3. Match scale, cast shadows, and white balance to Image 1.

Gpt-image-2 accepts up to sixteen references per edit call and processes every one at high fidelity.
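Index labels are easy to get wrong by hand once a call carries more than three or four images. The sketch below generates the "Image N is ..." preamble from a list of roles and enforces the sixteen-image limit quoted in this guide. label_references is a hypothetical helper, and it assumes the roles are listed in the same order the images are uploaded.

```python
MAX_REFERENCES = 16  # per-call limit as quoted in this guide


def label_references(roles: list[str], instruction: str) -> str:
    """Prefix an instruction with one 'Image N is <role>.' sentence per image."""
    if not roles:
        raise ValueError("at least one reference image is required")
    if len(roles) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} reference images per call")
    labels = [f"Image {i} is {role}." for i, role in enumerate(roles, start=1)]
    return " ".join(labels + [instruction.strip()])


print(label_references(
    ["the existing room to preserve", "the wood grain reference", "the sofa reference"],
    "Apply the wood from Image 2 to the flooring in Image 1; "
    "replace the sofa in Image 1 with the sofa from Image 3.",
))
```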

10

Add scale cues. Do not trust proportion.

The model does not render to construction dimensions and never will. Rendair's long-standing caution — "the model tends to dream over geometry" — still applies. Seed scale explicitly.

One parked car at the curb, two pedestrians walking at adult height, café tables and chairs on the terrace, bicycles against the wall, garden lighting at knee height.

Human and object references calibrate the model's sense of storey height, door widths, and furniture depth more effectively than any numeric prompt.

Part Two — Applied Workflows

Three workflows

From principles to practice.

The ten principles are a grammar. A grammar without sentences is inert. What follows are the three design workflows where gpt-image-2 most clearly earns its place inside a studio's existing process — concept boards, photoreal renders, and material studies.

Workflow 01 · Concept Boards

01

Concept boards & ideation

The digital sketchpad

Generate four variants, compare, commit.

For early-stage ideation, gpt-image-2 functions as what Rendair calls a "digital sketchpad." The n=4 setting highlighted in the Cookbook is the key: one request returns four variants of the same brief, and comparing them is faster than writing four separate prompts.

Competition-stage library, concept brief

Scene: small coastal town site, autumn afternoon, overcast natural light.

Subject: public library with timber louvered façade and a cantilevered reading room.

Details: spotted gum battens with visible grain, blackened steel window frames, board-formed concrete plinth, clerestory glazing admitting raking light, mature fig trees framing entry.

Constraints: no text, no watermarks, no people, eye-level corner perspective.

Use case: concept presentation board, architectural render, realistic proportions, clean presentation.
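As a sketch of how a four-variant request might be assembled, the helper below builds the request body as a plain dictionary and sends nothing. The field names (model, prompt, n, quality, size) and the "1024x1024" size string are assumptions based on the parameters this guide mentions; check them against the current API reference before use. variant_request is a hypothetical name.

```python
ALLOWED_QUALITY = {"low", "medium", "high"}


def variant_request(prompt: str, n: int = 4, quality: str = "low",
                    size: str = "1024x1024") -> dict:
    """Build (but do not send) a request body for n parallel variants.

    Field names are assumed, not confirmed against the live API.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if quality not in ALLOWED_QUALITY:
        raise ValueError(f"quality must be one of {sorted(ALLOWED_QUALITY)}")
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "n": n,
        "quality": quality,
        "size": size,
    }


body = variant_request("Scene: small coastal town site, autumn afternoon.")
print(body["n"], body["quality"], body["size"])
```

The defaults follow this guide's own advice for ideation: low quality, square tiles, four variants per brief.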

Fig. 2 Ideation, early stage

For mood boards, shift the centre of gravity from geometry to atmosphere, palette, and texture. Ask for "a 2×2 grid of material vignettes: matte terracotta glaze against raw linen, polished black marble beside brushed brass, aged oak with natural oil finish, and rough-sawn spotted gum. Soft north-window light, flat-lay composition, magazine-quality editorial feel, no labels, no text."

Gpt-image-2's grid and image-set capabilities make this a single-call operation.

Workflow 02 · Photoreal Renders

02

Photoreal renders & visualisation

Turning Revit screenshots into client-ready images.

Sketch to render

Archtene's published sketch-to-render recipe is the strongest documented workflow for turning Revit, SketchUp, Rhino, or Archicad screenshots into client-ready renders.

Upload the source view, then prompt

Do a realistic render of this photo. Keep the same shapes and forms of buildings, fences, windows and doors. Add lush foliage to planter boxes and landscaping. Use a slightly blurred suburban streetscape background. Make materials feel natural and premium, less plastic. Smooth realistic quality like 3Ds Max and V-Ray rendering. Use accurate shadows, reflections and architectural realism.

Then iterate one pass at a time — materials, then trees, then lighting, then glazing reflections, then detail sharpening.

Tested skeletons

For exterior heroes, these fragments produce reliable output at 1536×1024 high-quality:

Modern luxury home, travertine façade, warm lighting, premium landscaping, sunset mood.

Contemporary coastal home, limestone textures, windswept grasses, soft ocean light.

Mid-rise apartment façade, active ground floor, realistic pedestrians, urban streetscape.

For interiors, expand "luxury minimalist kitchen, marble island, oak joinery, soft daylight, magazine-quality realism" with specific stone (Calacatta Oro vs. Carrara vs. Taj Mahal quartzite), specific timber (rift-sawn white oak, spotted gum, American walnut), and specific light conditions.

Gpt-image-2 is not replacing V-Ray.
It is replacing the sketch phase.

Floorplan to perspective

Draw an arrow. Get a room.

Floorplan-to-perspective remains one of the most valuable workflows for client presentation. The Architizer method — draw an arrow on the floorplan indicating camera position and direction, upload the annotated plan, and prompt the model — works reliably on gpt-image-2 because of the model's improved spatial reasoning.

After uploading the annotated plan

Create an image of a 3D space from the angle shown on the floorplan as if you are a human standing there.

Boyuan Chen, the gpt-image-2 research lead, described the model's ability to handle "3D-style perspective shifts and complex spatial reasoning through simple text prompts" as a headline capability at launch.

Fig. 3 Timber façade — exterior hero

Fig. 4 Material flat-lay, mood reference

Workflow 03 · Material Studies

03

Material, finish & detail studies

Where the edit endpoint earns its cost.

Material studies are where gpt-image-2's edit endpoint earns its cost. The canonical pattern — tested across fal.ai, Archtene, and the OpenAI Cookbook — is a three-sentence structure: change, preserve, physical realism.

Kitchen finish study

Replace only the white cabinetry with natural oak with a matte finish and visible straight grain.

Preserve the camera angle, table shape, window light, floor shadows, reflections on the countertop, refrigerator geometry, and all surrounding objects.

Match lighting and contact shadows to the original photo; render believable oak grain with soft directional highlights.

Fig. 5 Board-formed concrete — detail

Comparative studies

Reuse the same source image across parallel prompts and change one material at a time. Each comparison should share its preserve clause verbatim so the room reads as one room with four finish schemes — rather than four different rooms.

Façade

Dark brick · off-form concrete · charcoal-stained timber battens · terracotta rainscreen — on the same building, same camera, same light.

Kitchen

Sage green cabinets with terracotta tile · navy cabinets with Calacatta marble · warm oak cabinets with travertine.

Light

Morning raking light · overcast midday · golden hour · blue hour with warm interior spill · rainy evening with neon reflection.
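A comparative study is a small fan-out: one target, several options, one shared preserve clause. The sketch below generates the parallel prompts so the clause is guaranteed to be identical in each. comparison_prompts is a hypothetical helper, and the target and preserve wording are illustrative.

```python
def comparison_prompts(target: str, options: list[str], preserve_clause: str) -> list[str]:
    """One edit prompt per option, each ending with the same preserve clause."""
    clause = preserve_clause.strip()
    return [f"Replace only {target} with {option}. {clause}" for option in options]


prompts = comparison_prompts(
    target="the façade cladding",
    options=[
        "dark brick",
        "off-form concrete",
        "charcoal-stained timber battens",
        "terracotta rainscreen",
    ],
    preserve_clause="Preserve the building form, camera angle, glazing, and lighting.",
)
for p in prompts:
    print(p)
```

Each prompt is then run against the same source image, so differences between the results come from the material alone.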

Reference Boards

Edit modes

Three modes for most design work.

Single-image edits use /v1/images/edits with the change-preserve-physics pattern. Multi-image composition passes an array of reference images and addresses each by index. Masked edits pass a mask image; the Cookbook notes masking on gpt-image-2 is "prompt-based" — the model treats the mask as guidance rather than strict boundary.

Always reinforce the mask with words: "edit only the region under the mask; leave every other pixel unchanged."

Material transfer

The highest-reliability workflow for material transfer: upload the existing room as Image 1, upload a tight product photo of the new finish as Image 2.

Material transfer prompt

Apply the material shown in Image 2 to the floor surface in Image 1. Preserve the floor plan geometry, furniture positions, wall paint, window openings, and lighting direction. Match scale of the grain pattern to realistic plank dimensions. Render natural contact shadows and believable reflections consistent with the original lighting.

The Honest Limits

What not to expect

A CAD tool it is not — and never will be.

The short list of things gpt-image-2 genuinely cannot do, so you stop asking it to — and the two behavioural pitfalls that show up on nearly every design team's first week.

i. It dreams over geometry. Rendair's warning remains accurate for any prompt that asks for dimensional precision, exact floor plans, or repeating fenestration grids with consistent mullion spacing. If it has to be measured, it has to be modelled.

ii. Logo reproduction is unreliable. Composite brand marks in Photoshop afterward rather than asking the model to render them. The same applies to any specific text longer than a short signage string.

iii. Complex prompts can take two minutes. Plan your iteration cycles accordingly. Don't stack four kinds of change into a single brief and expect speed.

iv. Overloaded prompts hurt output. Stacking "stunning, ultra-detailed, cinematic, masterpiece, 8K, award-winning" produces worse output than terse visual facts. Adjective inflation is not intensity; it is noise.

v. Vague preservation compounds. A prompt that says "keep the room the same" while adding furniture will quietly shift the wall colour, floor reflection, and window framing. Name what must stay, and name it again on every iteration.
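The two behavioural pitfalls (iv and v) lend themselves to a quick pre-flight check. The sketch below is a deliberately blunt linter: it flags the inflation words listed above and warns when an edit prompt never uses the word "preserve". lint_prompt and HYPE_WORDS are hypothetical names, and the word list is only the one quoted in this guide.

```python
import re

HYPE_WORDS = {"stunning", "ultra-detailed", "cinematic", "masterpiece", "8k", "award-winning"}


def lint_prompt(prompt: str, is_edit: bool = False) -> list[str]:
    """Return warnings for adjective inflation and missing preserve clauses."""
    # Tokenise on letters, digits and internal hyphens, case-insensitively.
    words = set(re.findall(r"[a-z0-9][a-z0-9-]*", prompt.lower()))
    warnings = []
    hype = sorted(words & HYPE_WORDS)
    if hype:
        warnings.append("hype adjectives: " + ", ".join(hype))
    if is_edit and "preserve" not in words:
        warnings.append("edit prompt has no explicit preserve clause")
    return warnings


print(lint_prompt("Stunning, ultra-detailed, cinematic kitchen, 8K"))
print(lint_prompt("Add a sofa. Keep the room the same.", is_edit=True))
```

Because the check is purely lexical, it will also flag deliberate negatives such as "no cinematic colour grading". Treat a warning as a cue to reread the prompt, not as a verdict.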

Parameter Discipline

A brief word on quality settings.

When to spend

Use quality: "high" whenever text appears in the image, when small-scale details (door hardware, stair nosings, tile grout) carry the design, or when the render is client-facing.

Drop to medium for fast comparison sprints. Use low for ideation at volume.

Aspect & ratio

Stay at or below 2560×1440 for reliable output; treat anything above as experimental. Use landscape for exteriors and interior wide shots, portrait for towers and tall interior details, and square only for social and mood-board tiles.

All edges must be multiples of 16. Ratio capped at 3:1.

Client Hero

high · 1536×1024

Comparison Sprint

medium · 1024×1024

Ideation Volume

low · 1024×1024

Mood Board

high · square
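The spending rules above reduce to a short decision function. This is a sketch of that logic as stated in this section, not an official recommendation. pick_quality is a hypothetical helper, and its return values are the three tier names used throughout this guide.

```python
def pick_quality(client_facing: bool = False, has_text: bool = False,
                 fine_detail: bool = False, ideation: bool = False) -> str:
    """Choose a quality tier from the rules in this section."""
    # Text, small-scale detail, or a client audience all justify "high".
    if client_facing or has_text or fine_detail:
        return "high"
    # Volume ideation drops to "low"; everything else is a sprint.
    return "low" if ideation else "medium"


print(pick_quality(client_facing=True))  # high
print(pick_quality())                    # medium
print(pick_quality(ideation=True))       # low
```

Note the precedence: an ideation image that contains signage text still resolves to high, because legible text is the stricter requirement.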

Closing the Loop

In conclusion

The architects and designers getting useful work out of gpt-image-2 share a common practice: they treat a prompt like a specification document, not a wish.

They write in a fixed order. They translate style names into material and palette rules. They name what preserves across iterations. They trigger photorealism by name and anchor it with honest texture language. They iterate one knob at a time.

Gpt-image-2 is not replacing V-Ray, Enscape, or the detail-level work of a visualiser. It is replacing the sketch phase, the comparison board, the first-pass material study, and the Monday-morning client conversation that used to take two days of modelling. Used that way — with the ten principles above as a checklist — it is the first general-purpose image model that genuinely fits inside a design studio's existing workflow rather than demanding a new one.
