Flight Path vs Storyboard
Testing Seedance 2 and Google Omni Flash on the same aerial brief.
The output: an FPV aerial flythrough of Melbourne's Yarra River corridor, generated from a flight path plus a four-frame storyboard.
Getting an AI video model to follow a specific flight path over a real city is harder than it looks. The brief seems simple: fly along the Yarra River in Melbourne, move south toward the CBD, descend to river level, pass under the bridges, pull back to reveal the skyline.
I tested two models on this brief: Seedance 2 and Google Omni Flash. Both were given the same prompt. Both were given the same reference material. The question was not which model produces the better image. The question was which input structure produces the most accurate spatial result.
The answer surprised me. And the fix was not what I expected.
A site-specific aerial sequence.
The shot was an FPV drone sequence along the Yarra River in Melbourne, starting at high altitude over Southbank, tracking south-east along the river corridor toward Flinders Street Station and Federation Square, descending to a low river-level pass under the pedestrian bridges, then pulling back to reveal the full CBD skyline at golden hour.
This is the kind of shot that would cost several thousand dollars to commission as a real drone production. As an architectural communication tool, it is the sort of sequence that establishes site context, orientation, and scale in ten seconds, far more legibly than a plan or an aerial photograph alone.
Flight path reference only.
The first attempt used a flight path arrow overlaid on a Google Maps aerial as the reference image. A red arrow traced the intended camera trajectory from Southbank south-east toward the CBD. The prompt described the FPV movement, the altitude change, and the target landmarks.
Both models produced a video that was technically on-path. The camera moved in roughly the right direction. But the spatial accuracy was poor.
Camera direction: correct. Building positions: wrong. Scale: shifting mid-flight. Landmark placement: drifting frame to frame. The flight path told the model where to go. It did not tell the model what to see.
The Yarra River read as a generic city waterway. The distinctive bend at Southbank was absent. Federation Square was not recognisable. The CBD skyline geometry was plausible but architecturally fictional.
Both Seedance 2 and Google Omni Flash produced this class of result. The failure was not model-specific. It was input-specific.
A flight path is a vector, not a storyboard.
A flight path arrow communicates direction and trajectory. It does not communicate what the camera should be looking at from moment to moment. It does not describe altitude changes. It does not establish the spatial sequence of landmarks. It gives the model a vector, not a storyboard.
For a generic city, this might be acceptable. The model can hallucinate a plausible skyline and the viewer will not notice the inaccuracy.
For a specific site, particularly one as visually distinctive as Melbourne's Yarra River corridor, with its characteristic bridges, the Arts Centre spire, the curve of the river at Princes Bridge, and the contrast between Southbank's hospitality strip and the CBD's glass towers, the hallucinated result is not usable.
An architect or project manager presenting this video to a client who knows Melbourne would immediately see the problem. The building positions are wrong. The spatial logic is approximate. The communication value is undermined.
Adding a four-frame storyboard.
The fix was to add a four-frame storyboard as a second reference input, alongside the flight path map. The storyboard used actual reference photographs of the Yarra River corridor to establish the visual sequence the camera needed to pass through.
Frame 1: High aerial looking south down the Yarra toward the CBD. Establishes altitude, orientation, and the river geometry.
Frame 2: Descending FPV approach over Southbank. Camera angle drops. The hospitality strip and pedestrian bridges come into frame.
Frame 3: Low river-level flight passing under the bridges. The underside of the bridge deck, the reflected skyline in the water, the transition from open sky to enclosed structural space.
Frame 4: Pulling back to reveal the full skyline at golden hour. Camera lifts. The CBD geometry and Arts Centre spire anchor the final frame.
The improvement in both models was significant. The spatial logic held. Landmark positions stabilised. The altitude progression read correctly. The river geometry was recognisable.
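The four-frame sequence above can be written down as ordered data before any reference photographs are collected. The following is a minimal sketch, not the workflow used in the test: the `StoryboardFrame` class, the `altitude` labels, and the `storyboard_to_prompt` helper are all hypothetical names introduced for illustration, and the frame descriptions are condensed from the text above. It does not call either model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryboardFrame:
    """One reference frame in the flight sequence (hypothetical structure)."""
    order: int        # position along the flight path, starting at 1
    altitude: str     # coarse label chosen for this sketch, not a model parameter
    description: str  # what the camera should see at this point


# The four frames described above, in flight order.
STORYBOARD = [
    StoryboardFrame(1, "high", "High aerial looking south down the Yarra toward the CBD."),
    StoryboardFrame(2, "descending", "Descending FPV approach over Southbank."),
    StoryboardFrame(3, "river level", "Low river-level flight passing under the bridges."),
    StoryboardFrame(4, "rising", "Pull back to reveal the full skyline at golden hour."),
]


def storyboard_to_prompt(frames):
    """Render the frames as numbered prompt lines, sorted by flight order."""
    ordered = sorted(frames, key=lambda f: f.order)
    return "\n".join(
        f"Frame {f.order} ({f.altitude}): {f.description}" for f in ordered
    )


print(storyboard_to_prompt(STORYBOARD))
```

Sorting by `order` inside the helper means the frames can be collected in any sequence and still be presented to the model in the order the camera passes through them.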
Seedance 2 vs Google Omni Flash.
With the storyboard input in place, the models diverged in their strengths.
| Criterion | Seedance 2 | Google Omni Flash |
|---|---|---|
| Spatial accuracy | Strong | Strong |
| Lighting consistency | Strong: held warm golden hour throughout | Variable: cooler grade, lens flare mid-flight |
| Building materiality | Strong: glass and concrete read correctly | Adequate: slightly stylised surfaces |
| Camera physics | Smooth: FPV motion felt grounded | Cinematic: more dramatic, less realistic |
| Artefacts | Minimal | Lens flare and geometry drift in bridge passage |
| Overall (with storyboard) | Production-adjacent | Strong concept, needs iteration |
Neither model is a clear winner on every dimension. Seedance 2 produced more consistent results with fewer artefacts. Google Omni Flash produced a more cinematic visual language but required more iteration to control the lighting and geometry drift.
Critically, both models improved markedly from the first round (flight path only) to the second (flight path plus storyboard). The storyboard was the variable that mattered most, not which model was used.
The flight path says where. The storyboard says what.
A flight path tells the model where to go. A storyboard tells it what to see along the way.
For site-specific aerial work over a real city, the flight path reference is necessary but not sufficient. It establishes the macro trajectory. The storyboard establishes the spatial sequence: the altitude at each moment, the camera orientation, the landmark hierarchy, and the visual logic of the shot.
Without the storyboard, both models default to a plausible but generic interpretation of the path. The city becomes a hallucinated approximation. For architectural communication, where the specific site, the specific view, and the specific landmark positions carry meaning, that approximation is not acceptable.
The practical implication: for any AI aerial video brief over a real site, the minimum viable input set is a flight path reference plus a storyboard. The storyboard does not need to be polished. Four reference photographs taken from Google Street View or a drone footage library, sequenced to match the intended camera path, are enough to stabilise the output significantly.
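That minimum input set can be checked mechanically before a brief goes to any model. The sketch below is illustrative only: `check_input_set` is a hypothetical helper, the file names are placeholders, and the default of four frames mirrors this test; it is not a requirement of either model.

```python
def check_input_set(flight_path_image, storyboard_images, min_frames=4):
    """Return a list of problems with an aerial brief's reference inputs.

    An empty list means the brief has a flight path reference plus a
    storyboard of at least `min_frames` images. The threshold is an
    assumption taken from this test, not a model requirement.
    """
    problems = []
    if not flight_path_image:
        problems.append("missing flight path reference")
    if len(storyboard_images) < min_frames:
        problems.append(
            f"storyboard has {len(storyboard_images)} frame(s), "
            f"expected at least {min_frames}"
        )
    return problems


# Placeholder file names; the first brief has no storyboard, the second has four frames.
path_only = check_input_set("flight_path.png", [])
path_plus_storyboard = check_input_set(
    "flight_path.png",
    ["01_high_aerial.jpg", "02_descent.jpg", "03_under_bridge.jpg", "04_pull_back.jpg"],
)

print(path_only)             # ['storyboard has 0 frame(s), expected at least 4']
print(path_plus_storyboard)  # []
```

The check only counts inputs. Whether the frames are the right frames, in the right order, for the site in question remains a judgment call.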
The tool accelerates. The judgment is still the architect's.
The ability to generate a credible aerial flythrough of a real site, from a defined camera path, with controlled lighting and recognisable landmark positions, is a meaningful shift in how early-stage architectural communication can work.
It does not replace drone production. For a completed building or a major presentation, commissioned drone footage is still the right tool. But for a feasibility study, a planning submission, a community consultation document, or an early client presentation, an AI-generated aerial sequence, produced in a few hours rather than a few days, is a useful addition to the communication toolkit.
The workflow requires judgment at the input stage. Choosing the right storyboard frames, understanding what the model responds to, and knowing when the output is architecturally accurate enough for the purpose are not automated decisions. They require the same spatial literacy and site understanding that an architect brings to any form of site representation.