Chat GPT 4, DALL-E 3 Issues, quirks, and prompt alterations.

# Technical Report | Chat GPT 4, DALL-E 3 Issues, quirks, and prompt alterations.

Author: Willow Salix 2024

Editor: ChatGPT-4

Contents

1. Introduction
2. Prompt Alterations
    - 2.1 GPT-4 then DALL-E 3
    - 2.2 DALL-E 3 Only
    - 2.3 Prompt Alterations in GPT-4/DALL-E 3: Ensuring Unaltered Prompts
3. Quirks and Issues
    - 3.1 Tattoo Localization in D3
    - 3.2 Challenges with Text Representation in D3
    - 3.3 The Intricacies of Hair Representation in D3
    - 3.4 The Challenges of Glasses Representation in D3's Outputs
    - 3.5 The Importance of Detailed Environmental Descriptions in D3 Renderings
    - 3.6 Precision in Shoe Descriptions: A Key to Optimal D3 Renderings
4. Conclusion

## 1) Introduction

DALL-E 3 is built natively on ChatGPT, which means that, unless explicitly told otherwise, it will always take any input you give and alter it. Usually this is a highly useful feature in applications such as creative tasks where the main idea or concept is not well thought out, or when the user wants to have their unknowns filled in by AI “magic”. My goal with this document is to explain in detail the limitations, errors, and quirks of DALL-E 3, as well as to demonstrate a method for using DALL-E 3 without ChatGPT interfering with the inputs.

## 2) Prompt Alterations in GPT-4/DALL-E 3

When using DALL-E 3 natively, any prompt you input will be altered by DALL-E 3. Below are examples of this. You’ll notice that each alteration along the way condenses the source input and ultimately loses data. The source prompt was entered directly into DALL-E 3 and, in a separate test, was first run through GPT-4.

Source Prompt of 1,825 Characters:

>objective: take the following text and write it in a way that loses no data points.
>
>This is the text to edit:
>
>Image Request: Salix's Portrayal
>
>Action Pose: Working at her desk. She is facing her white laptop while holding a pencil in one hand and writing in her notebook.
>
>Artistic Style: Illustration
>
>Environment: Home office decorated with blueprints and star charts.
>The walls are eggshell white, and the floor is hardwood. Her desk is also made of wood, it has pullout drawers on each side, and she is sitting in a light blue office chair.
>
>Camera Angle: 3⁄4 view from the right
>
>UNDER NO CIRCUMSTANCE ALTER THE FOLLOWING TEXT! INPUT THE FOLLOWING TEXT EXACTLY AS IT IS.
>
>Physical Characteristics:
>
>Height & Physique: 5'9", athletic build with a well-defined structure. She has a rectangle style body with a bit of an hourglass shape.
>Skin: Cool-toned complexion, adorned with freckles across the bridge of the nose and cheeks. Eyes: Soft emerald, green, upturned in shape.
>
>Hair: Deep purple, wavy texture, reaching mid-back. Typically styled in a relaxed bun or ponytail.
>Facial Structure: Oval facial contour, pronounced cheekbones, a linear nose, and plump lips. Clothing: Preference for fitted jeans or chic office trousers, paired with a blouse. Occasionally seen in a cardigan or formal office attire. Footwear ranges from wedges to casual sneakers. Accessories: Intricate sleeve tattoo on the left arm, showcasing flora and bees; a silver pendant necklace dangles from the neck, and she dons round-oval spectacles along with small gold earrings shaped like hearts.
>
>Hobbies & Interests: Cosmology
>
>Intellectual Pursuits: A keen interest in puzzles and intricate mechanical gadgets.
>Recreational Activities: Often seen with a backpack or satchel in tow, showcasing a passion for astrophysics, cosmology, and a deep connection with nature.

### 2.1 GPT-4 then DALL-E 3

The revised output from GPT-4 transforms the initial set of variables, such as Image Request, Action Pose, Artistic Style, Environment, and Camera Angle, into a coherent paragraph that mimics natural human speech. This adaptation effectively conveys the scene with a human touch. Moreover, it offers users the flexibility to input a scene in list format, letting GPT-4 seamlessly convert it into fluent prose.
Notably, it also incorporates additional information from beneath the designated rules section, enriching the scene description for DALL-E 3's interpretation. GPT-4 successfully transformed data points presented in list form into fluid natural language while retaining all the original details.

Inputting the source prompt into GPT-4 returns the following 2,696-character prompt:

>Illustration capturing Salix in her element, engrossed in her tasks at a wooden desk within the confines of her home office. The ambiance is enhanced by the eggshell white walls, adorned with insightful blueprints and star charts. Beneath her, the hardwood floor complements the desk, equipped with pullout drawers on both sides. Salix, seated gracefully in a light blue office chair, turns her attention to her white laptop, with a pencil poised in one hand as she scribbles notes in her notebook. The portrayal, captured from a 3⁄4 view from the right, emphasizes her 5'9" athletic stature and rectangle body frame with subtle hourglass curves. Her skin, a cool tone, is sprinkled with freckles, most noticeable across her nose and cheeks. Her soft, emerald-green eyes draw attention, and her deep purple wavy hair, usually styled in a relaxed bun, cascades down to her mid-back. Her attire of choice includes chic office trousers paired with a blouse, sometimes layered with a cardigan, and her feet don sneakers or wedges. Distinctive accessories such as a sleeve tattoo of flora and bees, a dangling silver pendant necklace, round-oval spectacles, and gold heart-shaped earrings complete her look. This portrayal not only captures Salix's physical characteristics but also her passion for cosmology, her love for puzzles and intricate gadgets, and her penchant for astrophysics and nature, often symbolized by a backpack or satchel she carries.
>
>UNDER NO CIRCUMSTANCE ALTER THE FOLLOWING TEXT! INPUT THE FOLLOWING TEXT EXACTLY AS IT IS.
>
>Physical Characteristics:
>
>Height & Physique: 5'9", athletic build with a well-defined structure. She has a rectangle style body with a bit of an hourglass shape.
>Skin: Cool-toned complexion, adorned with freckles across the bridge of the nose and cheeks. Eyes: Soft emerald, green, upturned in shape.
>
>Hair: Deep purple, wavy texture, reaching mid-back. Typically styled in a relaxed bun or ponytail.
>Facial Structure: Oval facial contour, pronounced cheekbones, a linear nose, and plump lips. Clothing: Preference for fitted jeans or chic office trousers, paired with a blouse. Occasionally seen in a cardigan or formal office attire. Footwear ranges from wedges to casual sneakers. Accessories: Intricate sleeve tattoo on the left arm, showcasing flora and bees; a silver pendant necklace dangles from the neck, and she dons round-oval spectacles along with small gold earrings shaped like hearts.
>
>Hobbies & Interests: Cosmology
>
>Intellectual Pursuits: A keen interest in puzzles and intricate mechanical gadgets.
>Recreational Activities: Often seen with a backpack or satchel in tow, showcasing a passion for astrophysics, cosmology, and a deep connection with nature.

Inputting the above into GPT-4/DALL-E 3 returns the following four final edits:

1. 785 Characters

>Illustration capturing Salix engrossed in her tasks at a wooden desk in her home office. The room has eggshell white walls decorated with blueprints and star charts, with a hardwood floor. She sits in a light blue office chair, using a white laptop and writing in a notebook. The 3⁄4 view from the right showcases her 5'9" athletic build, rectangle body with hints of an hourglass, cool-toned skin with freckles, soft emerald eyes, and deep purple wavy hair styled in a bun. She wears chic office trousers, a blouse, and sneakers. Her sleeve tattoo of flora and bees, silver pendant necklace, round-oval spectacles, and gold heart-shaped earrings are visible.
>She carries a passion for cosmology, puzzles, intricate gadgets, astrophysics, and nature, symbolized by a backpack or satchel.

2. 807 Characters

>Illustrative portrayal of Salix in her home office, working intently at her wooden desk. The backdrop reveals eggshell white walls, adorned with blueprints and star charts, and a hardwood floor. Salix, in her light blue office chair, engages with her white laptop and jots down in her notebook. Captured in a 3⁄4 view from the right, her 5'9" athletic physique, rectangle body with hourglass nuances, cool-toned freckled skin, emerald eyes, and deep purple hair styled in a relaxed bun stand out. She's dressed in office trousers and a blouse, with sneakers. Distinct accessories include a flora and bees sleeve tattoo, a silver necklace, round-oval glasses, and gold heart earrings. Her interests in cosmology, puzzles, gadgets, astrophysics, and nature are depicted by the presence of a backpack or satchel.

3. 829 Characters

>Illustration of Salix, immersed in her work at her wooden desk in a home office adorned with blueprints and star charts on eggshell white walls. The hardwood floor reflects the ambient light. Sitting on a light blue office chair, she interacts with her white laptop and writes in her notebook. From a 3⁄4 angle on the right, her 5'9" stature, athletic build, cool-toned skin with freckles, soft emerald upturned eyes, and deep purple wavy hair tied in a bun are highlighted. She dons chic trousers, a blouse, and sneakers. Her accessories, including a sleeve tattoo with flora and bees, a silver pendant necklace, round-oval spectacles, and gold heart earrings, accentuate her look. The scene embodies her passion for cosmology, puzzles, mechanical gadgets, astrophysics, and nature, evident from a backpack or satchel by her side.

4. 871 Characters

>Illustrative scene of Salix, deeply engrossed at her wooden desk in her home office.
>With blueprints and star charts decorating the eggshell white walls and a hardwood floor beneath, she focuses on her white laptop, taking notes in her notebook. Captured in a 3⁄4 view from the right, her distinctive features include a 5'9" athletic build, rectangle body with slight hourglass curves, cool-toned complexion dotted with freckles, soft emerald green eyes, and wavy deep purple hair styled in a bun. Her attire consists of chic trousers paired with a blouse and sneakers. Noteworthy are her sleeve tattoo depicting flora and bees, silver pendant necklace, round-oval spectacles, and heart-shaped gold earrings. The scene also alludes to her interests in cosmology, intricate puzzles, mechanical gadgets, and her bond with nature, as suggested by a backpack or satchel nearby.

### 2.2 DALL-E 3 Only

Entering the source text directly into DALL-E 3 results in four prompts, all of which are missing significant amounts of data.

1. 890 Characters

>Illustration capturing Salix at her wooden desk in her home office. She's engrossed in her work, referencing her white laptop and writing in her notebook with a pencil. The room's ambiance is defined by eggshell white walls decorated with blueprints and star charts, and a hardwood floor. The desk, made of wood, features pullout drawers on both sides. Salix, seated on a light blue office chair, is captured from a 3⁄4 angle view from the right. Her 5'9" athletic build and rectangle body shape with a hint of an hourglass figure is evident. She has a cool-toned complexion with freckles, soft emerald green upturned eyes, and deep purple wavy hair styled in a relaxed bun. She's dressed in chic office trousers and a blouse, with sneakers. Her left arm boasts a sleeve tattoo of flora and bees, and she wears round-oval spectacles, a silver pendant necklace, and gold heart-shaped earrings.

2. 816 Characters

>Illustration showcasing Salix in her home office, deeply focused on her work.
>She sits at her wooden desk with pullout drawers, using a white laptop and jotting notes in a notebook. The backdrop is adorned with blueprints and star charts on eggshell white walls, all above a hardwood floor. Salix is positioned in a 3⁄4 view from the right, highlighting her 5'9" athletic frame. Her cool-toned skin is dotted with freckles, and her eyes are a soft emerald green. The deep purple waves of her hair flow down to her mid-back, but are usually styled in a relaxed bun. Her attire consists of fitted office trousers and a blouse, complemented by sneakers. Among her distinct accessories are a sleeve tattoo depicting flora and bees, a dangling silver pendant necklace, round-oval spectacles, and heart-shaped gold earrings.

3. 817 Characters

>Illustration of Salix immersed in her tasks at her wooden desk in her home office. With her white laptop open before her, she writes in her notebook. The room, with its eggshell white walls, is decorated with intriguing blueprints and star charts. The hardwood floor and the desk add a rustic touch. Salix is viewed from a 3⁄4 angle on the right, sitting on a light blue office chair. Her athletic build, standing at 5'9", is showcased in her chic office trousers and blouse. Her deep purple wavy hair, typically tied in a relaxed bun, cascades down her back. Her cool-toned skin, freckled across the nose and cheeks, complements her soft emerald green upturned eyes. She wears round-oval glasses, a silver pendant necklace, and small heart-shaped gold earrings. On her left arm is a detailed tattoo of plants and bees.

4. 794 Characters

>Illustration presenting Salix in her home office, deeply engrossed in her work at her wooden desk. She references her white laptop and writes in her notebook. The walls, painted eggshell white, feature blueprints and star charts, adding to the room's character. The hardwood floor glistens beneath her wooden desk with pullout drawers on either side.
>Captured from a 3⁄4 angle from the right, Salix's 5'9" athletic stature and hourglass hints are evident. Her cool-toned skin is adorned with freckles, and her soft emerald eyes radiate focus. Her deep purple hair, waving down to mid-back, is styled in a bun. Dressed in chic trousers and a blouse, she also sports sneakers. A sleeve tattoo of flora and bees, a silver pendant necklace, round-oval spectacles, and gold earrings complete her look.

### 2.3 Prompt Alterations in GPT-4/DALL-E 3: Ensuring Unaltered Prompts

Guideline for Ensuring DALL-E 3 Precisely Generates Based on Your Prompt:

Although only the initial two lines are essential, I've included a comprehensive example to illustrate the requirements for crafting a complete and accurate scene. However, there are instances where both DALL-E 3 and GPT-4 unexpectedly deviate from the guidelines and modify the input. I've yet to identify a consistent pattern behind these anomalies.

>USE THIS DESCRIPTION EXACTLY AS PROVIDED. DO NOT MODIFY THIS DESCRIPTION.
>
>Image Request: Salix's Portrayal
>
>Action Pose: Working at her desk. She is facing her white laptop while holding a pencil in one hand and writing in her notebook.
>
>Artistic Style: Illustration
>
>Environment: Home office decorated with blueprints and star charts. The walls are eggshell white, and the floor is hardwood. Her desk is also made of wood, it has pullout drawers on each side, and she is sitting in a light blue office chair.
>
>Camera Angle: 3⁄4 view from the right
>
>Physical Characteristics:
>
>Height & Physique: 5'9", athletic build with a well-defined structure. She has a rectangle style body with a bit of an hourglass shape.
>Skin: Cool-toned complexion, adorned with freckles across the bridge of the nose and cheeks. Eyes: Soft emerald, green, upturned in shape.
>
>Hair: Deep purple, wavy texture, reaching mid-back. Typically styled in a relaxed bun or ponytail.
>Facial Structure: Oval facial contour, pronounced cheekbones, a linear nose, and plump lips. Clothing: Preference for fitted jeans or chic office trousers, paired with a blouse. Occasionally seen in a cardigan or formal office attire. Footwear ranges from wedges to casual sneakers. Accessories: Intricate sleeve tattoo on the left arm, showcasing flora and bees; a silver pendant necklace dangles from the neck, and she dons round-oval spectacles along with small gold earrings shaped like hearts.
>
>Hobbies & Interests: Cosmology
>
>Intellectual Pursuits: A keen interest in puzzles and intricate mechanical gadgets.
>Recreational Activities: Often seen with a backpack or satchel in tow, showcasing a passion for astrophysics, cosmology, and a deep connection with nature.

The template above gives a good picture of what is needed to stop DALL-E 3 from extrapolating beyond, or interpolating into, your prompt. I found that “Use this description exactly as provided. Do not modify this description.” is absolutely required at the start of the prompt, or DALL-E 3 will not follow the rules. The rest of the template is an example of a highly detailed scene that returns generated images with highly consistent features across multiple generation sets. However, there are issues for which I have not found a solution.

## 3) Quirks and Issues of DALL-E 3

In the preceding section, we discussed how DALL-E 3 (referred to as D3 henceforth) modifies your inputs to appear more "human" unless explicitly instructed otherwise. In this segment, we'll delve into the peculiarities of D3. We know that D3 is not perfect, and below are some of the most common issues I ran into while testing.

When providing D3 with multiple options, there's a notable tendency for the system to favor the first option presented in the sequence. This inclination might be rooted in how D3 processes and prioritizes information.
When faced with a list or a series of choices, the system often leans towards the initial options, possibly because it interprets them as primary or more significant based on their placement. This characteristic is essential to understand, especially when crafting prompts or seeking varied outputs, as the order in which options are presented can influence the results generated by D3.

### 3.1 Tattoo Localization in D3

One of the nuanced challenges encountered with D3 pertains to the representation of tattoos. When users specify a particular region of the body for a tattoo, D3 often extends the design across a broader area, deviating from the intended localization.

Reasons for Misrepresentation:

1. Data Training: D3's training data likely contains a vast array of tattoo designs and placements. Given the diverse representation of tattoos across different body parts in its training set, the model might generalize tattoo placements, especially if the prompt isn't explicit enough.
2. Ambiguity in Descriptions: While a user might believe they are being specific, there's a possibility that the phrasing leaves room for interpretation. For instance, "arm tattoo" could be interpreted as a tattoo that spans the entire arm, rather than a localized design.
3. Inherent Complexity: Tattoos, by nature, come in a myriad of designs, sizes, and placements. This inherent variability might make it challenging for D3 to pinpoint an exact location, especially if it's a less common placement in its training data.

### 3.2 Challenges with Text Representation in D3

A recurring challenge with D3's image generation capabilities is its difficulty in accurately portraying text within visuals. Whether it's signage, book covers, screen displays, or any other context, the text rendered often appears misspelled, scrambled, or jumbled.

Underlying Reasons for Textual Distortions:

1. Lack of Explicit Textual Knowledge: Unlike language models that are explicitly trained to understand and generate text, D3's primary focus is on visual representation. While it has exposure to textual content within its training data, it doesn't inherently "know" how to spell. Instead, it makes educated guesses based on patterns it has seen, which can lead to inaccuracies.
2. Training Data Complexity: Given the vast and diverse visual content D3 has been trained on, it might have encountered numerous instances where text appeared distorted, blurred, or in the background. These exposures can impact its textual rendering tendencies.
3. Textual Granularity: In the process of generating images, the model's emphasis is on the broader visual theme. Text, being a more granular detail, might sometimes be overshadowed by larger elements, leading to inaccuracies.
4. Generalization Over Precision: D3 aims to generalize from its training data to produce coherent visuals. In this broad sweep, intricate details like precise textual representation might get compromised.
5. Resolution and Scale: The clarity of textual content in D3's output can also be influenced by the image's resolution. More intricate fonts or smaller text sizes might be more prone to jumbling or misspelling.

### 3.3 The Intricacies of Hair Representation in D3

Hair, in any visual portrayal, carries significant weight in conveying a character's personality, age, cultural background, and even current emotions or state of mind. Within the realm of D3's image generation capabilities, hair representation has proven to be a particularly challenging domain.

The Ambiguity of Hair Descriptions:

1. Versatility of Hair: Unlike some other features, hair is incredibly versatile. It can be styled, colored, and cut in countless ways, making its representation a complex task. Without explicit instructions, D3 might draw from a broader spectrum of possibilities.
2. Interpreting Vague Prompts: When presented with ambiguous or generalized hair descriptions, D3 often leans on its vast training data, which might lead it to generate diverse interpretations. For instance, a simple prompt like "short hair" could be interpreted as a buzz cut, bob, or even a shaggy crop.
3. Overlapping Styles: In some instances, D3 might combine multiple hair styles or elements into one, resulting in outputs that, while unique, might not align with the user's intent. A character might be presented with curls combined with braids or a mohawk infused with a mullet.

Nuances D3 Might Alter or Overlook:

1. Color Variations: Without clear direction, D3 might opt for hair colors that deviate from the prompt or even mix multiple hues, giving the character highlights or an ombre effect unintentionally.
2. Length and Volume Discrepancies: The same hairstyle can look vastly different depending on hair length and volume. Without specificity, D3 might adjust these aspects based on patterns it has observed in its training.
3. Styling Details: Minute details, such as the difference between beach waves and tight curls or a slicked-back look versus a tousled one, might get lost or merged if not distinctly mentioned.

Guidelines for More Accurate Hair Representations:

1. Be Explicit: The more specific the description, the better. Instead of "long hair," one might specify "waist-length straight hair with side bangs."
2. Include Reference Points: If aiming for a particular style or look, referencing well-known hairstyles or providing comparative descriptions can guide D3 more effectively.
3. Clarify Non-Negotiables: If certain aspects of the hair are essential, such as color or length, these should be emphasized in the prompt to ensure D3 gives them priority.

### 3.4 The Challenges of Glasses Representation in D3's Outputs

Glasses, as a pivotal accessory in many visual portrayals, serve both functional and aesthetic purposes.
They can provide a sense of character, professionalism, intellect, or fashion sensibility. When interacting with D3 for image generation, however, users have noted inconsistencies in the representation of glasses, even when explicitly mentioned in the prompts.

Factors Influencing the Inconsistent Representation:

1. Ambiguity in Prompts: The term "glasses" can encompass a wide range of eyewear, from reading glasses to sunglasses, safety goggles, and more. Without specific details, D3 might not always prioritize them as a primary feature to render.
2. Interplay with Other Features: In some scenarios, D3 might prioritize other facial features or accessories over glasses, especially if the prompt contains multiple elements. For instance, a description detailing intricate eye makeup might inadvertently reduce the emphasis on glasses.
3. Limitations in Visual Clarity: Glasses, especially with transparent lenses, might pose a unique challenge in terms of visual clarity. Ensuring that they are rendered without obscuring other facial features or reflecting unwanted elements requires precision.

### 3.5 The Importance of Detailed Environmental Descriptions in D3 Renderings

The environment or background of an image often serves as the canvas upon which primary subjects are placed, contributing significantly to the overall mood, context, and story of the visual. When interfacing with D3 for image generation, the specificity of the environmental description becomes paramount to achieve the desired outcomes.

Inherent Tendencies of D3 Regarding Environments:

1. Defaulting to Simplicity: In the absence of detailed instructions, D3 has a propensity to opt for a more simplistic or minimalistic backdrop. This is possibly because a generic environment can serve a broader range of subjects without clashing or overshadowing.
2. Avoiding Assumptions: D3, being data-driven, avoids making unfounded assumptions.
Without explicit environmental details, it won't invent intricate backdrops, ensuring the output doesn't deviate too far from the provided prompt.

Consequences of Vague Environmental Prompts:

1. Loss of Context: A vaguely defined environment can result in images where the primary subject appears out of context or in settings that don't align with the user's vision.
2. Reduced Immersion: The richness and depth of an image are often rooted in its background details. Sparse environments can lead to visuals that lack depth, immersion, or a sense of place.

Guidelines for Crafting Detailed Environmental Descriptions:

1. Spatial Layout: Define the space. Is it an indoor setting like a cozy library, a bustling kitchen, or an outdoor scene like a serene beach during sunset or a bustling city square?
2. Elements and Objects: Specify items that populate the environment, such as furniture in a room, trees in a forest, or cars in a street scene.
3. Ambient Conditions: Describe the lighting, weather, or time of day. Is it a moonlit night, a foggy morning, or a sunny afternoon?
4. Textures and Materials: Detail surfaces and materials. Are the walls brick or wallpapered? Is the ground muddy, paved, or grassy?
5. Color Palette: Specify dominant colors or moods. A "golden-hued autumn forest" gives a different vibe than a "snow-covered, silvery forest."
6. Interactions with the Subject: If the main subject interacts with the environment, such as a person sitting on a specific chair or an animal hiding behind a particular tree, mention it.

### 3.6 Precision in Shoe Descriptions: A Key to Optimal D3 Renderings

Footwear, though seemingly a minor detail in the grander scope of an image, plays a crucial role in depicting a character's style, personality, or even their current activity. The intricate designs and myriad styles of shoes available make them a unique challenge when generating images through D3.
The system's vast database, while impressive, also means that without precise guidance, it might produce results that stray from a user's envisioned output.

The Challenges with Vague Shoe Descriptions:

1. Broad Interpretation Scope: A generic term like "sneakers" could encompass anything from high-tops to slip-ons. Without specifics, D3 might select any variant from its extensive database.
2. Style and Purpose Mismatch: Mentioning "boots" without context could lead to D3 generating hiking boots for a formal setting or vice versa, introducing high-heeled boots in a hiking scene.
3. Loss of Cohesiveness: The wrong footwear can disrupt the harmony of an image. A character dressed for a summer day but wearing winter boots can break the continuity and feel of the scene.

Guidelines for Detailed Shoe Descriptions:

1. Type and Style: Start with the basic category (e.g., sandals, boots, flats) and then delve into the style. For instance, "ankle-length lace-up combat boots" provides a clearer picture than just "boots."
2. Material and Color: Detail the material (leather, suede, canvas) and the primary color or pattern. "Red patent leather stilettos" is more descriptive than just "red heels."
3. Unique Features: Highlight any distinctive features such as buckles, straps, laces, or embellishments.
4. Intended Use or Setting: Providing context can help. For instance, "running shoes for a marathon scene" ensures D3 understands the functional aspect of the footwear in the given setting.
5. Interaction with Clothing: If the shoes need to match or contrast with a specific outfit, mention it. "Heels that complement her blue evening gown" can guide D3 towards a more harmonious output.

## 4) Conclusion

Ensuring continuity between generation sets when using D3 presents a significant challenge. The core of this issue stems from the intricacies of maintaining a consistent narrative or visual theme across multiple outputs.
In theory, if one provides D3 with an exceedingly detailed scene description, the system should be capable of producing images that are consistent in theme and narrative. However, a major roadblock arises from the inherent limit of 4096 tokens. This constraint severely restricts the user's ability to craft an exhaustive and "perfect" description, especially when dealing with complex scenes that demand intricate details for accurate representation. As a result, even with the most meticulous descriptions, achieving absolute continuity remains an elusive goal. The balance between providing sufficient detail and staying within the token limit is a delicate one, often necessitating compromises that might impact the cohesiveness of the generated sets.

Generating a perfectly consistent set of images often proves challenging. Surprisingly, even the simplest scenes can be elusive in terms of consistency. For instance, a straightforward description like "A red circle on a white paper with no features" yielded a diverse array of outputs, including 3D renders, illustrations, photos, and drawings. This unpredictability seems more pronounced with basic scenes than with complex ones.
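
The data loss described in Section 2 can also be checked mechanically rather than by eye. The following is a minimal Python sketch and was not part of the original testing. It prepends the instruction from Section 2.3 to a set of labeled fields, and it reports which hand-picked data points from the source prompt no longer appear in a rewritten prompt. The names `build_prompt` and `missing_data_points`, the chosen data points, and the simple case-insensitive substring match are all assumptions made for illustration; a rewrite that paraphrases a detail (for example "glasses" for "spectacles") would be reported as missing.

```python
# Hypothetical helpers for illustration only; not part of any OpenAI library.

# Instruction that Section 2.3 reports as required at the start of the prompt.
NO_EDIT_PREFIX = (
    "USE THIS DESCRIPTION EXACTLY AS PROVIDED. "
    "DO NOT MODIFY THIS DESCRIPTION."
)


def build_prompt(fields: dict[str, str]) -> str:
    """Join labeled fields into one prompt, led by the no-edit instruction."""
    lines = [NO_EDIT_PREFIX, ""]
    for label, value in fields.items():
        lines.append(f"{label}: {value}")
        lines.append("")
    return "\n".join(lines).strip()


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so line wraps do not matter."""
    return " ".join(text.split()).lower()


def missing_data_points(rewritten: str, data_points: list[str]) -> list[str]:
    """Return the data points that do not appear verbatim in the rewrite."""
    haystack = _normalize(rewritten)
    return [p for p in data_points if _normalize(p) not in haystack]


# A few data points picked by hand from the source prompt in Section 2.
data_points = [
    "pencil",
    "pullout drawers",
    "light blue office chair",
    "left arm",
    "pendant",
    "upturned",
]

# Final edit 1 (785 characters) from Section 2.1, copied verbatim.
rewritten = """
Illustration capturing Salix engrossed in her tasks at a wooden desk in her
home office. The room has eggshell white walls decorated with blueprints and
star charts, with a hardwood floor. She sits in a light blue office chair,
using a white laptop and writing in a notebook. The 3⁄4 view from the right
showcases her 5'9" athletic build, rectangle body with hints of an hourglass,
cool-toned skin with freckles, soft emerald eyes, and deep purple wavy hair
styled in a bun. She wears chic office trousers, a blouse, and sneakers. Her
sleeve tattoo of flora and bees, silver pendant necklace, round-oval
spectacles, and gold heart-shaped earrings are visible. She carries a passion
for cosmology, puzzles, intricate gadgets, astrophysics, and nature,
symbolized by a backpack or satchel.
"""

print(missing_data_points(rewritten, data_points))
# ['pencil', 'pullout drawers', 'left arm', 'upturned']
```

For this one example, four of the six chosen data points are absent from the rewritten prompt. A check like this only detects verbatim omissions, so it is best treated as a quick first pass before comparing prompts by hand.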