Why your LLM outputs are boring (and how to fix it)
Discover creative ways to generate diverse, high-quality outputs without sacrificing coherence.
How to Increase Diversity while Maintaining Accuracy in LLM Outputs
When working with large language models (LLMs), it's common to want varied, diverse outputs rather than the model repeatedly generating similar responses. The go-to solution is often to increase the temperature parameter, which makes outputs more random by flattening the next-token probability distribution. However, simply raising the temperature can lead to incoherent or low-quality outputs.
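To see concretely what temperature does, here's a small standalone sketch (plain Python, no LLM required) of how temperature scaling reshapes a softmax distribution over some hypothetical next-token logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with temperature scaling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical next-token logits

low = softmax(logits, temperature=0.5)   # sharper: the top token dominates
high = softmax(logits, temperature=2.0)  # flatter: tail tokens gain probability mass

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At high temperature, low-probability (often low-quality) continuations become much more likely to be sampled, which is exactly why quality degrades.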
Fortunately, there are several alternative techniques we can use to generate a wider variety of outputs while still maintaining coherence and quality. In this post, we'll explore 5 practical strategies you can implement today to get more diverse results from your LLMs.
1. Shuffle Input Elements
One straightforward way to generate different outputs is to simply change the order of elements in your input prompt. LLMs are highly sensitive to the order of input tokens. Shuffling the order of items in a list, paragraphs in a document, or examples in a prompt can lead to surprisingly varied outputs, even with the same temperature setting.
For example, if your prompt includes a list of the user's past purchases as context, shuffling the order of those purchases each time will result in different product recommendations being generated. This is a simple trick that can generate variety without any complex changes.
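The shuffle itself is trivial. As a quick standalone illustration, `random.sample` with the full list length returns a shuffled copy without mutating the original context:

```python
import random

purchases = ["smartphone", "laptop", "wireless earbuds"]

# random.sample with len(list) returns a shuffled copy,
# leaving the original list untouched
shuffled = random.sample(purchases, len(purchases))

assert sorted(shuffled) == sorted(purchases)  # same items, possibly new order
```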
Here's a Python example using Gemini Flash and Instructor:
```python
import random
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class ProductRecommendation(BaseModel):
    product_name: str
    reason: str

class Recommendations(BaseModel):
    recommendations: List[ProductRecommendation]

# Initialize the instructor client with Gemini
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def get_recommendations(past_purchases: List[str]) -> Recommendations:
    # Shuffle the past purchases
    shuffled_purchases = random.sample(past_purchases, len(past_purchases))

    prompt = f"""
    Based on the user's past purchases, recommend 3 products they might like:
    Past purchases: {', '.join(shuffled_purchases)}

    Provide recommendations in the following format:
    1. Product name: [product name]
       Reason: [reason for recommendation]
    2. Product name: [product name]
       Reason: [reason for recommendation]
    3. Product name: [product name]
       Reason: [reason for recommendation]
    """

    response = client.messages.create(
        messages=[
            {"role": "user", "content": prompt}
        ],
        response_model=Recommendations,
    )
    return response

# Example usage
if __name__ == "__main__":
    past_purchases = ["smartphone", "laptop", "wireless earbuds", "fitness tracker", "smart home speaker"]
    recommendations = get_recommendations(past_purchases)
    for i, rec in enumerate(recommendations.recommendations, 1):
        print(f"{i}. {rec.product_name}: {rec.reason}")
```
??? note "Output"

    1. Smartwatch: Since you own a fitness tracker, you might be interested in a smartwatch with fitness tracking capabilities and other smart features.
    2. Noise-canceling Headphones: Given your purchase of wireless earbuds, you might appreciate the superior noise cancellation of higher-end headphones.
    3. Smart Home Hub: Your purchase of a smart home speaker suggests interest in building a complete smart home ecosystem, so a smart home hub to centrally control your devices may be useful.
2. Manage a Cache of Recent Outputs
LLMs have no built-in memory of their recent outputs. To avoid repetitive responses, we can maintain a cache of the most recently generated outputs for a given task. Before returning a new generation, check it against the cache. If it's too similar to a recent response, prompt the model to try again.
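The similarity check at the heart of this cache needs no LLM at all. A minimal standalone sketch using `difflib.SequenceMatcher` from the standard library looks like this (the 0.6 threshold is an arbitrary starting point worth tuning):

```python
import difflib
from collections import deque

def is_similar(text: str, cache, threshold: float = 0.6) -> bool:
    """Return True if text closely matches any cached entry."""
    return any(
        difflib.SequenceMatcher(None, text.lower(), old.lower()).ratio() > threshold
        for old in cache
    )

# A bounded deque automatically evicts the oldest entries
cache = deque(maxlen=5)
cache.append("Why did the chicken cross the road?")

print(is_similar("Why did the chicken cross the road?", cache))   # near-duplicate
print(is_similar("What do you call a fish with no eyes?", cache)) # distinct
```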
Here's an example implementation:
```python
import random
from collections import deque
import time
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
import difflib

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Configure Gemini client
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

class Joke(BaseModel):
    setup: str
    punchline: str

class SimpleRateLimiter:
    def __init__(self, rpm_limit=15):
        self.rpm_limit = rpm_limit
        self.request_times = deque(maxlen=rpm_limit)

    def wait(self):
        current_time = time.time()
        if len(self.request_times) == self.rpm_limit:
            time_to_wait = 60 - (current_time - self.request_times[0])
            if time_to_wait > 0:
                time.sleep(time_to_wait)
        self.request_times.append(time.time())

class JokeGenerator:
    def __init__(self, cache_size=5, max_attempts=10):
        self.cache = deque(maxlen=cache_size)
        self.rate_limiter = SimpleRateLimiter()
        self.max_attempts = max_attempts

    def generate_joke(self) -> Joke:
        for attempt in range(self.max_attempts):
            self.rate_limiter.wait()
            joke = self._get_joke_from_model(attempt)
            if not self._is_similar_to_cached(joke):
                self.cache.append(joke)
                return joke
            # If the joke is similar, we'll loop and try again
        # If we've reached this point, we couldn't generate a unique joke
        raise Exception("Unable to generate a unique joke after maximum attempts")

    def _get_joke_from_model(self, attempt) -> Joke:
        themes = ["technology", "food", "animals", "sports", "music", "work", "school"]
        theme = random.choice(themes)
        prompt = f"Tell me a short, original joke about {theme}. This is attempt {attempt + 1}, so make it different from common jokes. Provide the setup and punchline separately."
        response = client.messages.create(
            messages=[
                {"role": "user", "content": prompt}
            ],
            response_model=Joke,
        )
        return response

    def _is_similar_to_cached(self, joke: Joke, similarity_threshold=0.6) -> bool:
        for cached_joke in self.cache:
            setup_similarity = difflib.SequenceMatcher(None, joke.setup.lower(), cached_joke.setup.lower()).ratio()
            punchline_similarity = difflib.SequenceMatcher(None, joke.punchline.lower(), cached_joke.punchline.lower()).ratio()
            if setup_similarity > similarity_threshold or punchline_similarity > similarity_threshold:
                return True
        return False

# Example usage
generator = JokeGenerator()
for i in range(5):
    try:
        joke = generator.generate_joke()
        print(f"Joke {i+1}:")
        print(f"Setup: {joke.setup}")
        print(f"Punchline: {joke.punchline}")
        print()
    except Exception as e:
        print(f"Error: {e}")
        break
```
??? note "Output"

    Joke 1:
    Setup: Why was the JavaScript developer sad?
    Punchline: Because they didn't Node how to Express themselves!

    Joke 2:
    Setup: Why did the employee bring a ladder to the meeting?
    Punchline: Because they heard it was going to be a high-level discussion!

    Joke 3:
    Setup: Why did the sloth get fired from his job as a zookeeper?
    Punchline: Because he was always moving at a snail's pace!

    Joke 4:
    Setup: Why was the smartphone sweating?
    Punchline: Because it had so many apps!

    Joke 5:
    Setup: Why did the drummer break up with the singer?
    Punchline: Because they couldn't find a common beat.
3. Vary Prompt Phrasing
The phrasing of your prompt has a huge influence on the style and content of the generated text. Experiment with different ways of framing the task to elicit different types of responses.
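As a minimal standalone illustration of the idea, you can rotate through a fixed set of framings so that successive calls for the same input get a different phrasing. The templates below are just placeholders:

```python
from itertools import cycle

# Hypothetical framings of the same underlying task
FRAMINGS = [
    "Describe {product} for a technical audience.",
    "Explain {product} in plain language.",
    "Pitch {product} to a budget-conscious shopper.",
]

# cycle() rotates framings across successive calls, so repeated
# requests for the same product get a different angle each time
framing_iter = cycle(FRAMINGS)

def next_prompt(product: str) -> str:
    return next(framing_iter).format(product=product)

print(next_prompt("SmartHome Hub 2000"))
print(next_prompt("SmartHome Hub 2000"))
```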
Here's an example that generates product descriptions with different phrasings:
```python
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class ProductDescription(BaseModel):
    description: str

class Product(BaseModel):
    name: str
    features: List[str]

# Initialize the Gemini client with instructor
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def generate_descriptions(product: Product) -> List[ProductDescription]:
    prompts = [
        f"Describe the {product.name} to a tech-savvy audience, highlighting its advanced features.",
        f"Explain what the {product.name} does in simple terms for a general audience.",
        f"Write a persuasive product description for the {product.name} aimed at budget-conscious consumers.",
        f"Create an enthusiastic product description for the {product.name} focused on its unique selling points."
    ]

    descriptions = []
    for prompt in prompts:
        response = client.chat.completions.create(
            messages=[
                {"role": "system", "content": f"You are a product copywriter. Product features: {', '.join(product.features)}"},
                {"role": "user", "content": prompt}
            ],
            response_model=ProductDescription,
        )
        descriptions.append(response)
    return descriptions

# Example usage
if __name__ == "__main__":
    product = Product(
        name="SmartHome Hub 2000",
        features=["Voice control", "Energy monitoring", "Smart device integration", "Mobile app"]
    )
    descriptions = generate_descriptions(product)
    for i, desc in enumerate(descriptions, 1):
        print(f"Description {i}:")
        print(desc.description)
        print()
```
??? note "Output"

    Description 1:
    Introducing the SmartHome Hub 2000: Effortlessly manage your smart home with seamless voice control, precise energy monitoring, and extensive smart device integration. Control your connected devices with simple voice commands, track energy consumption in real-time via our intuitive mobile app, and effortlessly integrate a wide array of smart devices into a unified ecosystem. The SmartHome Hub 2000 offers unparalleled control and efficiency for the modern connected home.

    Description 2:
    Introducing the SmartHome Hub 2000, your new home's central control system! Effortlessly manage your smart devices, monitor your energy usage, all with the convenience of voice control and a user-friendly mobile app. Make your home smarter and more efficient – all from the palm of your hand or the sound of your voice.

    Description 3:
    Introducing the SmartHome Hub 2000 – affordable smart home control, without compromise! Take control of your home's energy usage with our built-in energy monitoring system, easily managing your consumption and saving money. Use your voice to effortlessly manage lights, appliances, and more via voice control. Seamlessly integrate your existing smart devices through our dedicated mobile app. The SmartHome Hub 2000: Big features, small price tag.

    Description 4:
    Revolutionize your home with the SmartHome Hub 2000! Effortlessly control your smart devices with the power of your voice – adjust lighting, set thermostats, and more, all hands-free. Track your energy usage with precision, saving you money and reducing your carbon footprint. Seamlessly integrate all your favorite smart devices into one intuitive system, and manage everything from our convenient mobile app. The SmartHome Hub 2000: Smart living, simplified.
4. Use Contrastive Prompts
You can use prompts that explicitly ask the model to generate a contrasting or different response from a provided example. This is similar to the KATE contrastive LM technique[^1].
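Stripped of the API call, the technique is just a prompt wrapper. Here's a minimal sketch of building the contrastive prompt string; the exact wording is an illustration, not a fixed recipe:

```python
def contrastive_prompt(product_name: str, original_review: str) -> str:
    """Wrap an existing review in instructions asking for a contrasting take."""
    return (
        f'Here is a typical review of {product_name}:\n'
        f'"{original_review}"\n\n'
        "Write a review from a different perspective, covering aspects "
        "the original does not mention, and disagree where appropriate."
    )

prompt = contrastive_prompt("UltraBook Pro", "Fast, light, great battery.")
print(prompt)
```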
Here's an example implementation:
```python
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel

# Load environment variables
load_dotenv()

# Configure Gemini with API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Define the ProductReview model
class ProductReview(BaseModel):
    review: str

# Initialize the Gemini client with instructor
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def generate_contrastive_review(product_name: str, original_review: str) -> ProductReview:
    prompt = f"""
    Here is an example of a typical product review for {product_name}:

    "{original_review}"

    Please generate a review that takes a different perspective and highlights different aspects of the product.
    Focus on aspects not mentioned in the original review and provide a contrasting opinion where appropriate.
    """
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a product reviewer who provides diverse perspectives."},
            {"role": "user", "content": prompt}
        ],
        response_model=ProductReview
    )
    return response

if __name__ == "__main__":
    # Example usage
    product_name = "UltraBook Pro Laptop"
    original_review = """The UltraBook Pro Laptop is incredibly fast and has a beautiful display.
    Its sleek design makes it perfect for professionals on the go.
    The battery life is impressive, lasting all day on a single charge."""

    contrastive_review = generate_contrastive_review(product_name, original_review)

    print("Original Review:")
    print(original_review)
    print("\nContrastive Review:")
    print(contrastive_review.review)
```
5. Ensemble Multiple Models
If you have access to a set of different LLMs, an ensemble approach can be effective for generating diverse outputs. Prompt each model separately and combine the results.
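At its core, the pattern is: fan out one prompt to several model callables, tolerate individual failures, and deduplicate the results. Here's a provider-agnostic sketch using stand-in functions in place of real model clients:

```python
from typing import Callable, List

def ensemble(prompt: str, generators: List[Callable[[str], str]]) -> List[str]:
    """Query each model-like callable and keep the unique answers."""
    outputs = []
    for generate in generators:
        try:
            text = generate(prompt)
        except Exception:
            continue  # skip models that fail; the rest still contribute
        if text not in outputs:
            outputs.append(text)
    return outputs

# Stand-in "models" for illustration; real code would call different LLM APIs
fake_models = [
    lambda p: f"Answer A to: {p}",
    lambda p: f"Answer B to: {p}",
    lambda p: f"Answer A to: {p}",  # duplicate, will be filtered out
]

print(ensemble("What is diversity?", fake_models))
```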
Here's an example that uses different Gemini and Groq models to generate diverse responses:
```python
import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel
import google.generativeai as genai
import instructor
from groq import Groq

# Load environment variables
load_dotenv()

# Configure APIs
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
groq_client = Groq(api_key=os.getenv('GROQ_API_KEY'))

class Response(BaseModel):
    text: str
    model: str

def create_gemini_client():
    return instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest"
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )

def ensemble_generation(prompt: str) -> List[Response]:
    responses = []
    models_config = {
        'gemini': {
            'name': 'models/gemini-1.5-flash-latest',
            'provider': 'gemini'
        },
        'mixtral': {
            'name': 'mixtral-8x7b-32768',
            'provider': 'groq'
        },
        'llama3': {
            'name': 'llama-3.3-70b-versatile',
            'provider': 'groq'
        }
    }

    for model_id, config in models_config.items():
        try:
            if config['provider'] == 'gemini':
                client = create_gemini_client()
                response = client.chat.completions.create(
                    messages=[{"role": "user", "content": prompt}],
                    response_model=Response,
                )
                response.model = model_id
                responses.append(response)
            elif config['provider'] == 'groq':
                completion = groq_client.chat.completions.create(
                    messages=[{"role": "user", "content": prompt}],
                    model=config['name'],
                    temperature=0.7,
                    max_tokens=500
                )
                response = Response(
                    text=completion.choices[0].message.content,
                    model=model_id
                )
                responses.append(response)
        except Exception as e:
            print(f"Error with {model_id}: {str(e)}")
            continue

    return responses

def synthesize_responses(responses: List[Response], prompt: str) -> str:
    # Create a prompt for synthesis using Gemini
    synthesis_prompt = f"""
    I have collected responses from multiple AI models about this prompt: "{prompt}"

    Here are their responses:
    {chr(10).join([f"{r.model.upper()}:\n{r.text}\n" for r in responses])}

    Please create a comprehensive synthesis that:
    1. Combines the unique insights from each model
    2. Eliminates redundancy
    3. Presents information in a clear, structured way
    4. Maintains the most accurate and valuable points
    5. Creates a cohesive narrative

    Synthesized response:
    """
    try:
        client = create_gemini_client()
        synthesis = client.chat.completions.create(
            messages=[{"role": "user", "content": synthesis_prompt}],
            response_model=Response,
        )
        return synthesis.text
    except Exception as e:
        print(f"Error in synthesis: {str(e)}")
        return "Synthesis failed. Using individual responses instead."

def main():
    # Example prompt
    prompt = "Provide a short explanation of how climate change affects ocean ecosystems."

    # Generate responses from multiple models
    responses = ensemble_generation(prompt)

    # Print individual responses
    print("\nIndividual Model Responses:")
    print("=" * 50)
    for response in responses:
        print(f"\n{response.model.upper()} response:")
        print(response.text)
        print("-" * 50)

    # Generate and print synthesized response
    print("\nSynthesized Response:")
    print("=" * 50)
    synthesized = synthesize_responses(responses, prompt)
    print(synthesized)

if __name__ == "__main__":
    main()
```
Conclusion
Generating a diverse set of coherent, high-quality outputs is an important challenge when building applications with LLMs. While it's tempting to just turn up the temperature, this can lead to inconsistent quality.
Instead, I recommend experimenting with prompt engineering techniques like shuffling inputs, using contrastive prompts, and varying prompt phrasing to generate diversity in a more targeted way. Implementing a simple output cache can also help avoid repetitive outputs over time.
For more advanced use cases, explore ensembling multiple models to combine their diverse knowledge and perspectives. This allows you to leverage their differences in a productive way.
The key is to get creative and embrace a spirit of experimentation. With modern LLMs, relatively small changes can have an outsized impact on the generated text. Try different approaches, see what works for your application, and iterate from there. Focus on techniques that generate valuable diversity while still maintaining a high standard of quality and coherence.
By combining these strategies thoughtfully, you can build LLM-powered applications that consistently generate fresh, varied, and engaging outputs to delight your users. The possibilities are endless, so go out there and start exploring!