Generative AI
Generative AI in the context of architectural design is a rapidly evolving design tool that can be broadly divided into three categories.
Blank canvas: Sets the tone in the early concept phase using only text to describe the desired image.
Reference: Uses reference images or linework to guide the image generation, more clearly defining the shape and the color or material palette.
Details: Refines a previously rendered image overall or modifies specific areas of the image.
Blank Canvas
Programs: Midjourney, Stable Diffusion, Adobe Firefly, Microsoft Designer
Midjourney Prompt: A bronze metallic screen facade tower with diamond shape support frame backlit at night.
In the early concept phase, it is common to use reference images from personal past work or publicly available work to communicate a design intent. The images could be used to discuss massing, materials, structure, historical precedents, style, etc. Text-to-image generative AI enables the user to produce an image from a text prompt describing the desired outcome. Midjourney and Stable Diffusion are two of the most popular text-to-image machine learning models.
Experimenting with text prompts is some of the most fun a designer can have with AI tools. Within seconds you have a highly detailed image of almost anything you can imagine, given the right description. This is how the concept of "prompt engineering" came about. These models use natural language to interpret a prompt, but each model, and even each version of a model, can interpret the same prompt differently depending on how it is structured, making prompt engineering a critical part of any generative AI workflow.
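For readers running Stable Diffusion locally rather than through a hosted service, the text-to-image workflow above can be sketched with Hugging Face's diffusers library. This is a minimal sketch, not a definitive setup: the checkpoint name, prompt wording, and output filename are illustrative, and it assumes a CUDA GPU with the model weights downloaded on first run.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (downloads weights on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# At the blank-canvas stage, the prompt is the entire design input;
# small wording changes can produce very different images.
image = pipe(
    "A bronze metallic screen facade tower with diamond shape "
    "support frame, backlit at night",
    negative_prompt="blurry, low detail, distorted geometry",
    num_inference_steps=30,   # more steps refine the image but run slower
    guidance_scale=7.5,       # how strictly to follow the prompt
).images[0]
image.save("tower_concept.png")
```

Rerunning the same prompt with a different random seed is the quickest way to explore variations before committing to a direction.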
Reference
Programs: Midjourney, Stable Diffusion using ControlNet, Diffusion plugin for SketchUp, Veras plugin for Revit
SketchUp Diffusion Prompt: A house with light blue lap siding and gray roof in a grass field. Dramatic clouds.
One of the major limitations of a program like Midjourney is the relative lack of control over the output given a descriptive input. The user may want to narrow the number of variations or control the shape of the content in the image, and this can be done with linework. Using an image-to-image model with a layer of descriptive text on top gives the program rules to follow, making the final image more predictable depending on the other parameters.
In the example above, I used SketchUp to model a friend's house and then used SketchUp's built-in AI image generator to produce a rendering with a certain feeling or atmosphere. It works at a "thumbnail" size to communicate the mood, but zooming in reveals all sorts of errors and odd details that may be acceptable for internal team use but would probably not be appropriate to share with a client. I have no doubt that the quality of these images will improve over time, and the convenience of built-in AI tools will make these apps much more valuable.
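For those using Stable Diffusion with ControlNet directly instead of a plugin, the linework-guided workflow looks roughly like the sketch below. It is a sketch under stated assumptions: the input file "house_linework.png" is a hypothetical edge or hidden-line export from a modeling tool, the model names are common public checkpoints, and a CUDA GPU is assumed.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A ControlNet trained on Canny edges constrains generation to linework.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical input: linework exported from the 3D model,
# e.g. a hidden-line view of the house.
linework = load_image("house_linework.png")

image = pipe(
    "A house with light blue lap siding and gray roof in a grass "
    "field, dramatic clouds",
    image=linework,  # the edges fix the massing; the text sets mood and material
).images[0]
image.save("house_render.png")
```

The division of labor is the point: the linework locks down geometry the designer has already decided, while the prompt stays free to explore atmosphere and palette.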
Details
Programs: Stable Diffusion's inpainting, Photoshop's Generative Fill
Stable Diffusion Prompt: A woman walking upstairs viewed from behind.
This third broad category of AI modifies parts of an image by substituting them with a more detailed version, changing the content in a small area, or adding and removing content. Photorealistic renderings are made more convincing by introducing more detailed elements like higher-quality assets or entourage. This can be done within rendering programs like D5 Render or in post with Photoshop's latest generative AI tools.
Using AI at this stage not only increases the quality of the final image, but it also saves time, either by avoiding a re-render when a mistake is found or by adding details with text prompts rather than hours of modeling.
For example, 3D entourage assets have seen a lot of improvement in quality, but they still tend not to hold up in a close-up shot in a photorealistic environment. Using Stable Diffusion, I can swap out a built-in entourage person for a more detailed, realistic version that retains the qualities of the original character, like their posture and outfit.
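Done in code, this entourage swap is an inpainting pass. The sketch below shows the shape of that workflow with diffusers; the filenames are hypothetical stand-ins (the original rendering plus a black-and-white mask painted over the person), the inpainting checkpoint is a common public one, and a CUDA GPU is assumed.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Hypothetical inputs: the original rendering, and a mask image where
# white marks the region to regenerate (here, the entourage person).
render = load_image("stair_render.png")
mask = load_image("person_mask.png")

image = pipe(
    prompt="A woman walking upstairs viewed from behind, photorealistic",
    image=render,
    mask_image=mask,  # only the masked pixels are replaced
).images[0]
image.save("stair_render_fixed.png")
```

Because the mask confines the change, the rest of the rendering, including lighting and materials already approved by the team, stays untouched.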