How to Increase Diversity while Maintaining Accuracy in LLM Outputs

When working with large language models (LLMs), it's common to want varied outputs rather than the model repeatedly generating similar responses. The go-to solution is often to increase the temperature parameter, which makes outputs more random by flattening the probability distribution. However, simply increasing temperature can lead to incoherent or low-quality outputs.
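
To see what temperature actually does, note that sampling draws from softmax(logits / T): as T rises above 1 the distribution flattens, and as it drops toward 0 it sharpens around the top token. Here's a minimal, self-contained sketch of the effect (using NumPy, with made-up logits for illustration):

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by T before the softmax: T > 1 flattens the
    # distribution, T < 1 sharpens it around the most likely token
    scaled = np.array(logits) / temperature
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.2]
print(softmax_with_temperature(logits, 0.5))  # peaked: mostly the top token
print(softmax_with_temperature(logits, 1.0))  # the model's raw distribution
print(softmax_with_temperature(logits, 1.5))  # flatter: more variety, more risk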

Fortunately, there are several alternative techniques we can use to generate a wider variety of outputs while still maintaining coherence and quality. In this post, we'll explore 5 practical strategies you can implement today to get more diverse results from your LLMs.

1. Shuffle Input Elements

One straightforward way to generate different outputs is to simply change the order of elements in your input prompt. LLMs are highly sensitive to the order of input tokens. Shuffling the order of items in a list, paragraphs in a document, or examples in a prompt can lead to surprisingly varied outputs, even with the same temperature setting.

For example, if your prompt includes a list of the user's past purchases as context, shuffling the order of those purchases on each call will yield different product recommendations. It's a simple trick that introduces variety without touching any sampling parameters.

Here's a Python example using Gemini Flash and Instructor:

import random
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class ProductRecommendation(BaseModel):
    product_name: str
    reason: str

class Recommendations(BaseModel):
    recommendations: List[ProductRecommendation]

# Initialize the instructor client with Gemini
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def get_recommendations(past_purchases: List[str]) -> Recommendations:
    # Shuffle the past purchases
    shuffled_purchases = random.sample(past_purchases, len(past_purchases))

    prompt = f"""
    Based on the user's past purchases, recommend 3 products they might like:
    Past purchases: {', '.join(shuffled_purchases)}

    Provide recommendations in the following format:
    1. Product name: [product name]
       Reason: [reason for recommendation]
    2. Product name: [product name]
       Reason: [reason for recommendation]
    3. Product name: [product name]
       Reason: [reason for recommendation]
    """

    response = client.messages.create(
        messages=[
            {"role": "user", "content": prompt}
        ],
        response_model=Recommendations,
    )

    return response

# Example usage
if __name__ == "__main__":
    past_purchases = ["smartphone", "laptop", "wireless earbuds", "fitness tracker", "smart home speaker"]
    recommendations = get_recommendations(past_purchases)
    for i, rec in enumerate(recommendations.recommendations, 1):
        print(f"{i}. {rec.product_name}: {rec.reason}")
Output
  1. Smartwatch: Since you own a fitness tracker, you might be interested in a smartwatch with fitness tracking capabilities and other smart features.
  2. Noise-canceling Headphones: Given your purchase of wireless earbuds, you might appreciate the superior noise cancellation of higher-end headphones.
  3. Smart Home Hub: Your purchase of a smart home speaker suggests interest in building a complete smart home ecosystem, so a smart home hub to centrally control your devices may be useful.
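
One small design note: random.sample(past_purchases, len(past_purchases)) returns a shuffled copy, leaving the caller's list untouched, whereas random.shuffle would reorder it in place.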

2. Manage a Cache of Recent Outputs

LLMs have no built-in memory of their recent outputs. To avoid repetitive responses, we can maintain a cache of the most recently generated outputs for a given task. Before returning a new generation, check it against the cache. If it's too similar to a recent response, prompt the model to try again.

Here's an example implementation:

import random
from collections import deque
import time
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
import difflib

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Configure Gemini client
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

class Joke(BaseModel):
    setup: str
    punchline: str

class SimpleRateLimiter:
    def __init__(self, rpm_limit=15):
        self.rpm_limit = rpm_limit
        # Timestamps of the most recent requests; the oldest falls off
        # automatically once the deque reaches rpm_limit entries
        self.request_times = deque(maxlen=rpm_limit)

    def wait(self):
        current_time = time.time()
        if len(self.request_times) == self.rpm_limit:
            # If the oldest request in the window is less than 60s old,
            # sleep until it ages out so we stay under the RPM limit
            time_to_wait = 60 - (current_time - self.request_times[0])
            if time_to_wait > 0:
                time.sleep(time_to_wait)
        self.request_times.append(time.time())

class JokeGenerator:
    def __init__(self, cache_size=5, max_attempts=10):
        self.cache = deque(maxlen=cache_size)
        self.rate_limiter = SimpleRateLimiter()
        self.max_attempts = max_attempts

    def generate_joke(self) -> Joke:
        for attempt in range(self.max_attempts):
            self.rate_limiter.wait()
            joke = self._get_joke_from_model(attempt)
            if not self._is_similar_to_cached(joke):
                self.cache.append(joke)
                return joke
            # If the joke is similar, we'll loop and try again

        # If we've reached this point, we couldn't generate a unique joke
        raise Exception("Unable to generate a unique joke after maximum attempts")

    def _get_joke_from_model(self, attempt) -> Joke:
        themes = ["technology", "food", "animals", "sports", "music", "work", "school"]
        theme = random.choice(themes)
        prompt = f"Tell me a short, original joke about {theme}. This is attempt {attempt + 1}, so make it different from common jokes. Provide the setup and punchline separately."
        response = client.messages.create(
            messages=[
                {"role": "user", "content": prompt}
            ],
            response_model=Joke,
        )
        return response

    def _is_similar_to_cached(self, joke: Joke, similarity_threshold=0.6) -> bool:
        for cached_joke in self.cache:
            setup_similarity = difflib.SequenceMatcher(None, joke.setup.lower(), cached_joke.setup.lower()).ratio()
            punchline_similarity = difflib.SequenceMatcher(None, joke.punchline.lower(), cached_joke.punchline.lower()).ratio()
            if setup_similarity > similarity_threshold or punchline_similarity > similarity_threshold:
                return True
        return False

# Example usage
generator = JokeGenerator()
for i in range(5):
    try:
        joke = generator.generate_joke()
        print(f"Joke {i+1}:")
        print(f"Setup: {joke.setup}")
        print(f"Punchline: {joke.punchline}")
        print()
    except Exception as e:
        print(f"Error: {e}")
        break
Output

Joke 1:
Setup: Why was the JavaScript developer sad?
Punchline: Because they didn't Node how to Express themselves!

Joke 2:
Setup: Why did the employee bring a ladder to the meeting?
Punchline: Because they heard it was going to be a high-level discussion!

Joke 3:
Setup: Why did the sloth get fired from his job as a zookeeper?
Punchline: Because he was always moving at a snail's pace!

Joke 4:
Setup: Why was the smartphone sweating?
Punchline: Because it had so many apps!

Joke 5:
Setup: Why did the drummer break up with the singer?
Punchline: Because they couldn't find a common beat.
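
A note on the similarity check: difflib.SequenceMatcher compares characters, so it catches near-verbatim repeats but not paraphrases. If semantically identical jokes slip through, swapping the check for cosine similarity over sentence embeddings, while keeping the same cache structure, is a natural upgrade.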

3. Vary Prompt Phrasing

The phrasing of your prompt has a huge influence on the style and content of the generated text. Experiment with different ways of framing the task to elicit different types of responses.

Here's an example that generates product descriptions with different phrasings:

import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class ProductDescription(BaseModel):
    description: str

class Product(BaseModel):
    name: str
    features: List[str]

# Initialize the Gemini client with instructor
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def generate_descriptions(product: Product) -> List[ProductDescription]:
    prompts = [
        f"Describe the {product.name} to a tech-savvy audience, highlighting its advanced features.",
        f"Explain what the {product.name} does in simple terms for a general audience.",
        f"Write a persuasive product description for the {product.name} aimed at budget-conscious consumers.",
        f"Create an enthusiastic product description for the {product.name} focused on its unique selling points."
    ]

    descriptions = []
    for prompt in prompts:
        response = client.chat.completions.create(
            messages=[
                {"role": "system", "content": f"You are a product copywriter. Product features: {', '.join(product.features)}"},
                {"role": "user", "content": prompt}
            ],
            response_model=ProductDescription,
        )
        descriptions.append(response)

    return descriptions

# Example usage
if __name__ == "__main__":
    product = Product(
        name="SmartHome Hub 2000",
        features=["Voice control", "Energy monitoring", "Smart device integration", "Mobile app"]
    )

    descriptions = generate_descriptions(product)
    for i, desc in enumerate(descriptions, 1):
        print(f"Description {i}:")
        print(desc.description)
        print()
Output

Description 1: Introducing the SmartHome Hub 2000: Effortlessly manage your smart home with seamless voice control, precise energy monitoring, and extensive smart device integration. Control your connected devices with simple voice commands, track energy consumption in real-time via our intuitive mobile app, and effortlessly integrate a wide array of smart devices into a unified ecosystem. The SmartHome Hub 2000 offers unparalleled control and efficiency for the modern connected home.

Description 2: Introducing the SmartHome Hub 2000, your new home's central control system! Effortlessly manage your smart devices, monitor your energy usage, all with the convenience of voice control and a user-friendly mobile app. Make your home smarter and more efficient – all from the palm of your hand or the sound of your voice.

Description 3: Introducing the SmartHome Hub 2000 – affordable smart home control, without compromise! Take control of your home's energy usage with our built-in energy monitoring system, easily managing your consumption and saving money. Use your voice to effortlessly manage lights, appliances, and more via voice control. Seamlessly integrate your existing smart devices through our dedicated mobile app. The SmartHome Hub 2000: Big features, small price tag.

Description 4: Revolutionize your home with the SmartHome Hub 2000! Effortlessly control your smart devices with the power of your voice – adjust lighting, set thermostats, and more, all hands-free. Track your energy usage with precision, saving you money and reducing your carbon footprint. Seamlessly integrate all your favorite smart devices into one intuitive system, and manage everything from our convenient mobile app. The SmartHome Hub 2000: Smart living, simplified.
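
You can push this further by sampling the framing at random on each call instead of cycling through a fixed list, which also composes nicely with the shuffling idea from strategy 1. A minimal sketch (the audiences and tones below are illustrative, not from the example above):

import random

audiences = ["tech-savvy enthusiasts", "first-time buyers", "budget-conscious shoppers"]
tones = ["enthusiastic", "matter-of-fact", "persuasive"]

def random_phrasing(product_name: str) -> str:
    # Sample a fresh tone/audience pairing for each generation
    return (
        f"Write a {random.choice(tones)} product description for the "
        f"{product_name} aimed at {random.choice(audiences)}."
    )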

4. Use Contrastive Prompts

You can write prompts that explicitly ask the model to generate a response that contrasts with a provided example, steering it toward aspects and opinions the example doesn't cover.

Here's an example implementation:

import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel

# Load environment variables
load_dotenv()

# Configure Gemini with API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Define the ProductReview model
class ProductReview(BaseModel):
    review: str

# Initialize the Gemini client with instructor
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def generate_contrastive_review(product_name: str, original_review: str) -> ProductReview:
    prompt = f"""
    Here is an example of a typical product review for {product_name}:
    "{original_review}"

    Please generate a review that takes a different perspective and highlights different aspects of the product.
    Focus on aspects not mentioned in the original review and provide a contrasting opinion where appropriate.
    """

    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a product reviewer who provides diverse perspectives."},
            {"role": "user", "content": prompt}
        ],
        response_model=ProductReview
    )

    return response

if __name__ == "__main__":
    # Example usage
    product_name = "UltraBook Pro Laptop"
    original_review = """The UltraBook Pro Laptop is incredibly fast and has a beautiful display. 
    Its sleek design makes it perfect for professionals on the go. 
    The battery life is impressive, lasting all day on a single charge."""

    contrastive_review = generate_contrastive_review(product_name, original_review)

    print("Original Review:")
    print(original_review)
    print("\nContrastive Review:")
    print(contrastive_review.review)
Output

Original Review: The UltraBook Pro Laptop is incredibly fast and has a beautiful display. Its sleek design makes it perfect for professionals on the go. The battery life is impressive, lasting all day on a single charge.

Contrastive Review: While the UltraBook Pro boasts speed and aesthetics, I found its keyboard cramped and uncomfortable for extended use. The trackpad, though responsive, lacked the satisfying click of my previous laptop. The build quality feels premium, but the limited port selection proved frustrating. Ultimately, its portability comes at the cost of functionality for serious work. I also experienced some minor overheating issues during intensive tasks.
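
A natural extension is to chain these calls, feeding each new review back in as the example for the next, so that every generation is pushed away from the one before it. A minimal sketch building on generate_contrastive_review from the example above:

# Each iteration contrasts against the most recent review, accumulating
# a set of reviews with distinct perspectives on the same product
reviews = [original_review]
for _ in range(3):
    next_review = generate_contrastive_review(product_name, reviews[-1])
    reviews.append(next_review.review)

for i, review in enumerate(reviews):
    print(f"Review {i}:\n{review}\n")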

5. Ensemble Multiple Models

If you have access to a set of different LLMs, an ensemble approach can be effective for generating diverse outputs. Prompt each model separately and combine the results.

Here's an example that uses different Gemini and Groq models to generate diverse responses:

import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel
import google.generativeai as genai
import instructor
from groq import Groq

# Load environment variables
load_dotenv()

# Configure APIs
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
groq_client = Groq(api_key=os.getenv('GROQ_API_KEY'))

class Response(BaseModel):
    text: str
    model: str

def create_gemini_client():
    return instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest"
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )

def ensemble_generation(prompt: str) -> List[Response]:
    responses = []

    models_config = {
        'gemini': {
            'name': 'models/gemini-1.5-flash-latest',
            'provider': 'gemini'
        },
        'mixtral': {
            'name': 'mixtral-8x7b-32768',
            'provider': 'groq'
        },
        'llama3': {
            'name': 'llama-3.3-70b-versatile',
            'provider': 'groq'
        }
    }

    for model_id, config in models_config.items():
        try:
            if config['provider'] == 'gemini':
                client = create_gemini_client()
                response = client.chat.completions.create(
                    messages=[{"role": "user", "content": prompt}],
                    response_model=Response,
                )
                response.model = model_id
                responses.append(response)

            elif config['provider'] == 'groq':
                completion = groq_client.chat.completions.create(
                    messages=[{"role": "user", "content": prompt}],
                    model=config['name'],
                    temperature=0.7,
                    max_tokens=500
                )
                response = Response(
                    text=completion.choices[0].message.content,
                    model=model_id
                )
                responses.append(response)

        except Exception as e:
            print(f"Error with {model_id}: {str(e)}")
            continue

    return responses

def synthesize_responses(responses: List[Response], prompt: str) -> str:
    # Join the model outputs up front: backslashes inside f-string
    # expressions are a syntax error before Python 3.12
    model_outputs = "\n".join(f"{r.model.upper()}:\n{r.text}\n" for r in responses)

    # Create a prompt for synthesis using Gemini
    synthesis_prompt = f"""
    I have collected responses from multiple AI models about this prompt: "{prompt}"

    Here are their responses:

    {model_outputs}

    Please create a comprehensive synthesis that:
    1. Combines the unique insights from each model
    2. Eliminates redundancy
    3. Presents information in a clear, structured way
    4. Maintains the most accurate and valuable points
    5. Creates a cohesive narrative

    Synthesized response:
    """

    try:
        client = create_gemini_client()
        synthesis = client.chat.completions.create(
            messages=[{"role": "user", "content": synthesis_prompt}],
            response_model=Response,
        )
        return synthesis.text
    except Exception as e:
        print(f"Error in synthesis: {str(e)}")
        return "Synthesis failed. Using individual responses instead."

def main():
    # Example prompt
    prompt = "Provide a short explanation of how climate change affects ocean ecosystems."

    # Generate responses from multiple models
    responses = ensemble_generation(prompt)

    # Print individual responses
    print("\nIndividual Model Responses:")
    print("=" * 50)
    for response in responses:
        print(f"\n{response.model.upper()} response:")
        print(response.text)
        print("-" * 50)

    # Generate and print synthesized response
    print("\nSynthesized Response:")
    print("=" * 50)
    synthesized = synthesize_responses(responses, prompt)
    print(synthesized)

if __name__ == "__main__":
    main()
Output

Individual Model Responses:


GEMINI response: Climate change significantly affects ocean ecosystems through rising temperatures, ocean acidification, and altered circulation patterns. Warming waters cause coral bleaching and shifts in species distribution, disrupting established food webs. Increased CO2 absorption leads to ocean acidification, harming shell-forming organisms like corals and shellfish. Changes in currents and upwelling affect nutrient availability, impacting primary productivity and the entire marine food chain.


MIXTRAL response: Climate change significantly impacts ocean ecosystems in various ways. Rising ocean temperatures cause coral bleaching, leading to the death of coral reefs that provide habitat for numerous marine species. Warmer waters also affect the distribution and abundance of fish and other marine life, disrupting food chains.

Ocean acidification, another consequence of climate change, results from increased carbon dioxide absorption by seawater. This causes a decrease in pH levels, which can harm shell-forming marine organisms such as mollusks, crustaceans, and corals, further threatening biodiversity.

Additionally, climate change contributes to sea level rise, causing coastal habitat loss and increased coastal erosion. More frequent and intense marine heatwaves, storms, and ocean deoxygenation also negatively impact ocean ecosystems, causing mass die-offs and disrupting ecosystem services.


LLAMA3 response: Climate change significantly impacts ocean ecosystems in several ways:

  1. Ocean Warming: Rising temperatures alter the distribution and behavior of marine species, disrupting delicate food webs and potentially leading to extinctions.
  2. Ocean Acidification: Increased CO2 absorption from the atmosphere causes a decrease in pH levels, making it harder for organisms like corals and shellfish to build their calcium carbonate shells and skeletons.
  3. Sea-Level Rise: Coastal erosion and flooding threaten marine habitats, such as mangroves, salt marshes, and seagrasses, which provide crucial nursery grounds for many species.
  4. Changes in Ocean Circulation: Shifts in ocean currents and circulation patterns can impact the distribution of heat, nutrients, and species, leading to changes in marine productivity and biodiversity.
  5. Increased Disease and Predation: Climate stress can weaken marine organisms, making them more susceptible to disease and predation, which can have cascading effects on ecosystem balance.

These changes can have far-reaching consequences, including losses in fisheries, decreased water quality, and reduced ecosystem resilience.


Synthesized Response:
Climate change significantly impacts ocean ecosystems through rising temperatures, ocean acidification, altered circulation patterns, and sea level rise. Warming waters cause coral bleaching and shifts in species distribution, disrupting food webs. Increased CO2 absorption leads to ocean acidification, harming shell-forming organisms. Changes in currents and upwelling affect nutrient availability and marine productivity. Sea level rise causes coastal habitat loss and erosion. More frequent and intense marine heatwaves, storms, and ocean deoxygenation further negatively impact ocean ecosystems, causing mass die-offs and disrupting ecosystem services. These changes weaken marine organisms, increasing susceptibility to disease and predation, with far-reaching consequences such as losses in fisheries, decreased water quality, and reduced ecosystem resilience.
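
One practical note: ensemble_generation calls each provider sequentially, so latency grows with every model you add. Since the calls are independent, a thread pool is a simple way to issue them concurrently. Here's a sketch of the idea, reusing create_gemini_client, groq_client, and Response from the example above, and taking the same models_config dict as a parameter:

from concurrent.futures import ThreadPoolExecutor
from typing import List

def call_one(model_id: str, config: dict, prompt: str) -> Response:
    # Same logic as one iteration of the loop in ensemble_generation
    if config['provider'] == 'gemini':
        client = create_gemini_client()
        response = client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            response_model=Response,
        )
        response.model = model_id
        return response
    completion = groq_client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=config['name'],
        temperature=0.7,
        max_tokens=500,
    )
    return Response(text=completion.choices[0].message.content, model=model_id)

def parallel_ensemble(prompt: str, models_config: dict) -> List[Response]:
    # Issue all provider calls at once; failed models are skipped, as before
    with ThreadPoolExecutor(max_workers=len(models_config)) as pool:
        futures = {pool.submit(call_one, m, c, prompt): m
                   for m, c in models_config.items()}
        responses = []
        for future, model_id in futures.items():
            try:
                responses.append(future.result())
            except Exception as e:
                print(f"Error with {model_id}: {str(e)}")
        return responses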

Conclusion

Generating a diverse set of coherent, high-quality outputs is an important challenge when building applications with LLMs. While it's tempting to just turn up the temperature, this can lead to inconsistent quality.

Instead, I recommend experimenting with prompt engineering techniques like shuffling inputs, using contrastive prompts, and varying prompt phrasing to generate diversity in a more targeted way. Implementing a simple output cache can also help avoid repetitive outputs over time.

For more advanced use cases, explore ensembling multiple models to combine their diverse knowledge and perspectives. This allows you to leverage their differences in a productive way.

The key is to get creative and embrace a spirit of experimentation. With modern LLMs, relatively small changes can have an outsized impact on the generated text. Try different approaches, see what works for your application, and iterate from there. Focus on techniques that generate valuable diversity while still maintaining a high standard of quality and coherence.

By combining these strategies thoughtfully, you can build LLM-powered applications that consistently generate fresh, varied, and engaging outputs to delight your users. The possibilities are endless - go out there and start exploring!