Why your LLM outputs are boring (and how to fix it)
Discover creative ways to generate diverse, high-quality outputs without sacrificing coherence.
How to Increase Diversity while Maintaining Accuracy in LLM Outputs
When working with large language models (LLMs), it's common to want varied, diverse outputs rather than the model repeatedly generating similar responses. The go-to solution is often to increase the temperature parameter, which makes outputs more random by flattening the next-token probability distribution. However, simply raising the temperature can lead to incoherent or low-quality outputs.
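To see concretely what temperature does, here's a small standalone sketch (plain Python, no LLM required) of how temperature scaling reshapes a softmax distribution over some hypothetical next-token logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with temperature scaling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical next-token logits

low = softmax(logits, temperature=0.5)   # sharper: the top token dominates
high = softmax(logits, temperature=2.0)  # flatter: tail tokens gain probability mass

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At high temperature, low-probability (often low-quality) continuations become much more likely to be sampled, which is exactly why quality degrades.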
Fortunately, there are several alternative techniques we can use to generate a wider variety of outputs while still maintaining coherence and quality. In this post, we'll explore 5 practical strategies you can implement today to get more diverse results from your LLMs.
1. Shuffle Input Elements
One straightforward way to generate different outputs is to simply change the order of elements in your input prompt. LLMs are highly sensitive to the order of input tokens. Shuffling the order of items in a list, paragraphs in a document, or examples in a prompt can lead to surprisingly varied outputs, even with the same temperature setting.
For example, if your prompt includes a list of the user's past purchases as context, shuffling the order of those purchases each time will result in different product recommendations being generated. This is a simple trick that can generate variety without any complex changes.
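The shuffle itself is trivial. As a quick standalone illustration, `random.sample` with the full list length returns a shuffled copy without mutating the original context:

```python
import random

purchases = ["smartphone", "laptop", "wireless earbuds"]

# random.sample with len(list) returns a shuffled copy,
# leaving the original list untouched
shuffled = random.sample(purchases, len(purchases))

assert sorted(shuffled) == sorted(purchases)  # same items, possibly new order
```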
Here's a Python example using Gemini Flash and Instructor:
```python
import random
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class ProductRecommendation(BaseModel):
    product_name: str
    reason: str

class Recommendations(BaseModel):
    recommendations: List[ProductRecommendation]

# Initialize the instructor client with Gemini
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def get_recommendations(past_purchases: List[str]) -> Recommendations:
    # Shuffle the past purchases
    shuffled_purchases = random.sample(past_purchases, len(past_purchases))

    prompt = f"""
    Based on the user's past purchases, recommend 3 products they might like:
    Past purchases: {', '.join(shuffled_purchases)}

    Provide recommendations in the following format:
    1. Product name: [product name]
       Reason: [reason for recommendation]
    2. Product name: [product name]
       Reason: [reason for recommendation]
    3. Product name: [product name]
       Reason: [reason for recommendation]
    """

    response = client.messages.create(
        messages=[
            {"role": "user", "content": prompt}
        ],
        response_model=Recommendations,
    )
    return response

# Example usage
if __name__ == "__main__":
    past_purchases = ["smartphone", "laptop", "wireless earbuds", "fitness tracker", "smart home speaker"]
    recommendations = get_recommendations(past_purchases)
    for i, rec in enumerate(recommendations.recommendations, 1):
        print(f"{i}. {rec.product_name}: {rec.reason}")
```
??? note "Output"

    1. Smartwatch: Since you own a fitness tracker, you might be interested in a smartwatch with fitness tracking capabilities and other smart features.
    2. Noise-canceling Headphones: Given your purchase of wireless earbuds, you might appreciate the superior noise cancellation of higher-end headphones.
    3. Smart Home Hub: Your purchase of a smart home speaker suggests interest in building a complete smart home ecosystem, so a smart home hub to centrally control your devices may be useful.
2. Manage a Cache of Recent Outputs
LLMs have no built-in memory of their recent outputs. To avoid repetitive responses, we can maintain a cache of the most recently generated outputs for a given task. Before returning a new generation, check it against the cache. If it's too similar to a recent response, prompt the model to try again.
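The similarity check at the heart of this cache needs no LLM at all. A minimal standalone sketch using `difflib.SequenceMatcher` from the standard library looks like this (the 0.6 threshold is an arbitrary starting point worth tuning):

```python
import difflib
from collections import deque

def is_similar(text: str, cache, threshold: float = 0.6) -> bool:
    """Return True if text closely matches any cached entry."""
    return any(
        difflib.SequenceMatcher(None, text.lower(), old.lower()).ratio() > threshold
        for old in cache
    )

# A bounded deque automatically evicts the oldest entries
cache = deque(maxlen=5)
cache.append("Why did the chicken cross the road?")

print(is_similar("Why did the chicken cross the road?", cache))   # near-duplicate
print(is_similar("What do you call a fish with no eyes?", cache)) # distinct
```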
Here's an example implementation:
```python
import random
from collections import deque
import time
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
import difflib

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Configure Gemini client
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

class Joke(BaseModel):
    setup: str
    punchline: str

class SimpleRateLimiter:
    def __init__(self, rpm_limit=15):
        self.rpm_limit = rpm_limit
        self.request_times = deque(maxlen=rpm_limit)

    def wait(self):
        current_time = time.time()
        if len(self.request_times) == self.rpm_limit:
            time_to_wait = 60 - (current_time - self.request_times[0])
            if time_to_wait > 0:
                time.sleep(time_to_wait)
        self.request_times.append(time.time())

class JokeGenerator:
    def __init__(self, cache_size=5, max_attempts=10):
        self.cache = deque(maxlen=cache_size)
        self.rate_limiter = SimpleRateLimiter()
        self.max_attempts = max_attempts

    def generate_joke(self) -> Joke:
        for attempt in range(self.max_attempts):
            self.rate_limiter.wait()
            joke = self._get_joke_from_model(attempt)
            if not self._is_similar_to_cached(joke):
                self.cache.append(joke)
                return joke
            # If the joke is similar, we'll loop and try again
        # If we've reached this point, we couldn't generate a unique joke
        raise Exception("Unable to generate a unique joke after maximum attempts")

    def _get_joke_from_model(self, attempt) -> Joke:
        themes = ["technology", "food", "animals", "sports", "music", "work", "school"]
        theme = random.choice(themes)
        prompt = f"Tell me a short, original joke about {theme}. This is attempt {attempt + 1}, so make it different from common jokes. Provide the setup and punchline separately."
        response = client.messages.create(
            messages=[
                {"role": "user", "content": prompt}
            ],
            response_model=Joke,
        )
        return response

    def _is_similar_to_cached(self, joke: Joke, similarity_threshold=0.6) -> bool:
        for cached_joke in self.cache:
            setup_similarity = difflib.SequenceMatcher(None, joke.setup.lower(), cached_joke.setup.lower()).ratio()
            punchline_similarity = difflib.SequenceMatcher(None, joke.punchline.lower(), cached_joke.punchline.lower()).ratio()
            if setup_similarity > similarity_threshold or punchline_similarity > similarity_threshold:
                return True
        return False

# Example usage
generator = JokeGenerator()
for i in range(5):
    try:
        joke = generator.generate_joke()
        print(f"Joke {i+1}:")
        print(f"Setup: {joke.setup}")
        print(f"Punchline: {joke.punchline}")
        print()
    except Exception as e:
        print(f"Error: {e}")
        break
```
??? note "Output"

    Joke 1:
    Setup: Why was the JavaScript developer sad?
    Punchline: Because they didn't Node how to Express themselves!

    Joke 2:
    Setup: Why did the employee bring a ladder to the meeting?
    Punchline: Because they heard it was going to be a high-level discussion!

    Joke 3:
    Setup: Why did the sloth get fired from his job as a zookeeper?
    Punchline: Because he was always moving at a snail's pace!

    Joke 4:
    Setup: Why was the smartphone sweating?
    Punchline: Because it had so many apps!

    Joke 5:
    Setup: Why did the drummer break up with the singer?
    Punchline: Because they couldn't find a common beat.
3. Vary Prompt Phrasing
The phrasing of your prompt has a huge influence on the style and content of the generated text. Experiment with different ways of framing the task to elicit different types of responses.
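As a minimal standalone illustration of the idea, you can rotate through a fixed set of framings so that successive calls for the same input get a different phrasing. The templates below are just placeholders:

```python
from itertools import cycle

# Hypothetical framings of the same underlying task
FRAMINGS = [
    "Describe {product} for a technical audience.",
    "Explain {product} in plain language.",
    "Pitch {product} to a budget-conscious shopper.",
]

# cycle() rotates framings across successive calls, so repeated
# requests for the same product get a different angle each time
framing_iter = cycle(FRAMINGS)

def next_prompt(product: str) -> str:
    return next(framing_iter).format(product=product)

print(next_prompt("SmartHome Hub 2000"))
print(next_prompt("SmartHome Hub 2000"))
```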
Here's an example that generates product descriptions with different phrasings:
```python
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class ProductDescription(BaseModel):
    description: str

class Product(BaseModel):
    name: str
    features: List[str]

# Initialize the Gemini client with instructor
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def generate_descriptions(product: Product) -> List[ProductDescription]:
    prompts = [
        f"Describe the {product.name} to a tech-savvy audience, highlighting its advanced features.",
        f"Explain what the {product.name} does in simple terms for a general audience.",
        f"Write a persuasive product description for the {product.name} aimed at budget-conscious consumers.",
        f"Create an enthusiastic product description for the {product.name} focused on its unique selling points."
    ]

    descriptions = []
    for prompt in prompts:
        response = client.chat.completions.create(
            messages=[
                {"role": "system", "content": f"You are a product copywriter. Product features: {', '.join(product.features)}"},
                {"role": "user", "content": prompt}
            ],
            response_model=ProductDescription,
        )
        descriptions.append(response)
    return descriptions

# Example usage
if __name__ == "__main__":
    product = Product(
        name="SmartHome Hub 2000",
        features=["Voice control", "Energy monitoring", "Smart device integration", "Mobile app"]
    )
    descriptions = generate_descriptions(product)
    for i, desc in enumerate(descriptions, 1):
        print(f"Description {i}:")
        print(desc.description)
        print()
```
??? note "Output"

    Description 1:
    Introducing the SmartHome Hub 2000: Effortlessly manage your smart home with seamless voice control, precise energy monitoring, and extensive smart device integration. Control your connected devices with simple voice commands, track energy consumption in real-time via our intuitive mobile app, and effortlessly integrate a wide array of smart devices into a unified ecosystem. The SmartHome Hub 2000 offers unparalleled control and efficiency for the modern connected home.

    Description 2:
    Introducing the SmartHome Hub 2000, your new home's central control system! Effortlessly manage your smart devices, monitor your energy usage, all with the convenience of voice control and a user-friendly mobile app. Make your home smarter and more efficient – all from the palm of your hand or the sound of your voice.

    Description 3:
    Introducing the SmartHome Hub 2000 – affordable smart home control, without compromise! Take control of your home's energy usage with our built-in energy monitoring system, easily managing your consumption and saving money. Use your voice to effortlessly manage lights, appliances, and more via voice control. Seamlessly integrate your existing smart devices through our dedicated mobile app. The SmartHome Hub 2000: Big features, small price tag.

    Description 4:
    Revolutionize your home with the SmartHome Hub 2000! Effortlessly control your smart devices with the power of your voice – adjust lighting, set thermostats, and more, all hands-free. Track your energy usage with precision, saving you money and reducing your carbon footprint. Seamlessly integrate all your favorite smart devices into one intuitive system, and manage everything from our convenient mobile app. The SmartHome Hub 2000: Smart living, simplified.
4. Use Contrastive Prompts
You can use prompts that explicitly ask the model to generate a contrasting or different response from a provided example. This is similar to the KATE contrastive LM technique[^1].
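Stripped of the API call, the technique is just a prompt wrapper. Here's a minimal sketch of building the contrastive prompt string; the exact wording is an illustration, not a fixed recipe:

```python
def contrastive_prompt(product_name: str, original_review: str) -> str:
    """Wrap an existing review in instructions asking for a contrasting take."""
    return (
        f'Here is a typical review of {product_name}:\n'
        f'"{original_review}"\n\n'
        "Write a review from a different perspective, covering aspects "
        "the original does not mention, and disagree where appropriate."
    )

prompt = contrastive_prompt("UltraBook Pro", "Fast, light, great battery.")
print(prompt)
```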
Here's an example implementation:
```python
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel

# Load environment variables
load_dotenv()

# Configure Gemini with API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Define the ProductReview model
class ProductReview(BaseModel):
    review: str

# Initialize the Gemini client with instructor
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def generate_contrastive_review(product_name: str, original_review: str) -> ProductReview:
    prompt = f"""
    Here is an example of a typical product review for {product_name}:

    "{original_review}"

    Please generate a review that takes a different perspective and highlights different aspects of the product.
    Focus on aspects not mentioned in the original review and provide a contrasting opinion where appropriate.
    """
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a product reviewer who provides diverse perspectives."},
            {"role": "user", "content": prompt}
        ],
        response_model=ProductReview
    )
    return response

if __name__ == "__main__":
    # Example usage
    product_name = "UltraBook Pro Laptop"
    original_review = """The UltraBook Pro Laptop is incredibly fast and has a beautiful display.
    Its sleek design makes it perfect for professionals on the go.
    The battery life is impressive, lasting all day on a single charge."""

    contrastive_review = generate_contrastive_review(product_name, original_review)

    print("Original Review:")
    print(original_review)
    print("\nContrastive Review:")
    print(contrastive_review.review)
```
5. Ensemble Multiple Models
If you have access to a set of different LLMs, an ensemble approach can be effective for generating diverse outputs. Prompt each model separately and combine the results.
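At its core, the pattern is: fan out one prompt to several model callables, tolerate individual failures, and deduplicate the results. Here's a provider-agnostic sketch using stand-in functions in place of real model clients:

```python
from typing import Callable, List

def ensemble(prompt: str, generators: List[Callable[[str], str]]) -> List[str]:
    """Query each model-like callable and keep the unique answers."""
    outputs = []
    for generate in generators:
        try:
            text = generate(prompt)
        except Exception:
            continue  # skip models that fail; the rest still contribute
        if text not in outputs:
            outputs.append(text)
    return outputs

# Stand-in "models" for illustration; real code would call different LLM APIs
fake_models = [
    lambda p: f"Answer A to: {p}",
    lambda p: f"Answer B to: {p}",
    lambda p: f"Answer A to: {p}",  # duplicate, will be filtered out
]

print(ensemble("What is diversity?", fake_models))
```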
Here's an example that uses different Gemini and Groq models to generate diverse responses:
```python
import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel
import google.generativeai as genai
import instructor
from groq import Groq

# Load environment variables
load_dotenv()

# Configure APIs
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
groq_client = Groq(api_key=os.getenv('GROQ_API_KEY'))

class Response(BaseModel):
    text: str
    model: str

def create_gemini_client():
    return instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest"
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )

def ensemble_generation(prompt: str) -> List[Response]:
    responses = []
    models_config = {
        'gemini': {
            'name': 'models/gemini-1.5-flash-latest',
            'provider': 'gemini'
        },
        'mixtral': {
            'name': 'mixtral-8x7b-32768',
            'provider': 'groq'
        },
        'llama3': {
            'name': 'llama-3.3-70b-versatile',
            'provider': 'groq'
        }
    }

    for model_id, config in models_config.items():
        try:
            if config['provider'] == 'gemini':
                client = create_gemini_client()
                response = client.chat.completions.create(
                    messages=[{"role": "user", "content": prompt}],
                    response_model=Response,
                )
                response.model = model_id
                responses.append(response)
            elif config['provider'] == 'groq':
                completion = groq_client.chat.completions.create(
                    messages=[{"role": "user", "content": prompt}],
                    model=config['name'],
                    temperature=0.7,
                    max_tokens=500
                )
                response = Response(
                    text=completion.choices[0].message.content,
                    model=model_id
                )
                responses.append(response)
        except Exception as e:
            print(f"Error with {model_id}: {str(e)}")
            continue

    return responses

def synthesize_responses(responses: List[Response], prompt: str) -> str:
    # Create a prompt for synthesis using Gemini
    synthesis_prompt = f"""
    I have collected responses from multiple AI models about this prompt: "{prompt}"

    Here are their responses:
    {chr(10).join([f"{r.model.upper()}:\n{r.text}\n" for r in responses])}

    Please create a comprehensive synthesis that:
    1. Combines the unique insights from each model
    2. Eliminates redundancy
    3. Presents information in a clear, structured way
    4. Maintains the most accurate and valuable points
    5. Creates a cohesive narrative

    Synthesized response:
    """
    try:
        client = create_gemini_client()
        synthesis = client.chat.completions.create(
            messages=[{"role": "user", "content": synthesis_prompt}],
            response_model=Response,
        )
        return synthesis.text
    except Exception as e:
        print(f"Error in synthesis: {str(e)}")
        return "Synthesis failed. Using individual responses instead."

def main():
    # Example prompt
    prompt = "Provide a short explanation of how climate change affects ocean ecosystems."

    # Generate responses from multiple models
    responses = ensemble_generation(prompt)

    # Print individual responses
    print("\nIndividual Model Responses:")
    print("=" * 50)
    for response in responses:
        print(f"\n{response.model.upper()} response:")
        print(response.text)
        print("-" * 50)

    # Generate and print synthesized response
    print("\nSynthesized Response:")
    print("=" * 50)
    synthesized = synthesize_responses(responses, prompt)
    print(synthesized)

if __name__ == "__main__":
    main()
```
Conclusion
Generating a diverse set of coherent, high-quality outputs is an important challenge when building applications with LLMs. While it's tempting to just turn up the temperature, this can lead to inconsistent quality.
Instead, I recommend experimenting with prompt engineering techniques like shuffling inputs, using contrastive prompts, and varying prompt phrasing to generate diversity in a more targeted way. Implementing a simple output cache can also help avoid repetitive outputs over time.
For more advanced use cases, explore ensembling multiple models to combine their diverse knowledge and perspectives. This allows you to leverage their differences in a productive way.
The key is to get creative and embrace a spirit of experimentation. With modern LLMs, relatively small changes can have an outsized impact on the generated text. Try different approaches, see what works for your application, and iterate from there. Focus on techniques that generate valuable diversity while still maintaining a high standard of quality and coherence.
By combining these strategies thoughtfully, you can build LLM-powered applications that consistently generate fresh, varied, and engaging outputs to delight your users. The possibilities are endless, so go out there and start exploring!