How to Unlock Advanced Reasoning in LLMs
Chain-of-thought (CoT) prompting encourages the LLM to break its reasoning down into a step-by-step process before providing a final answer. This has several key benefits:
- Improved accuracy on complex reasoning tasks
- Greater transparency into the model's thought process
- Reduced hallucination by grounding the output in a logical sequence
While simply adding "Let's think step by step" to your prompts can help (a minimal baseline is sketched below), there are more advanced techniques that make CoT even more effective. Here are three key strategies.
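For reference, here's what that zero-shot baseline might look like with the same Gemini setup used in the examples below; the model name, the GOOGLE_API_KEY environment variable, and the sample question are assumptions carried over from (or invented for) those examples.
import os
from dotenv import load_dotenv
import google.generativeai as genai

# Load environment variables and configure the Gemini API (assumed setup)
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Zero-shot CoT: simply append "Let's think step by step" to the question
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash-latest")
question = "A shirt costs $40 after a 20% discount. What was the original price?"
response = model.generate_content(f"{question}\n\nLet's think step by step.")
print(response.text)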
Technique 1: Target the Reasoning Chain
The default CoT approach relies on the model to come up with reasoning steps on its own. But you can guide it to a more focused, relevant chain of reasoning.
Instead of a generic "Let's think step by step", provide a targeted reasoning framework in your prompt:
import instructor
import google.generativeai as genai
from pydantic import BaseModel
from typing import List
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Configure Gemini API
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class AnalysisResult(BaseModel):
    key_data_points: List[str]
    calculations: List[str]
    conclusion: str

def targeted_cot_analysis(problem: str) -> AnalysisResult:
    # Initialize Gemini client with instructor
    client = instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest",
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )

    prompt = f"""
    Analyze the following problem using this targeted reasoning framework:
    1. First, identify the key data points we need to answer the question.
    2. Then, walk through the calculations required step-by-step.
    3. Finally, state the conclusion clearly and concisely.

    Problem: {problem}

    Provide your analysis:
    """

    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are an expert data analyst."},
            {"role": "user", "content": prompt}
        ],
        response_model=AnalysisResult,
    )
    return response

# Example usage
if __name__ == "__main__":
    problem = "A company's revenue increased by 20% last year, from $1 million to $1.2 million. If expenses remained constant at $800,000, what was the change in profit?"
    result = targeted_cot_analysis(problem)

    print("Key Data Points:")
    for point in result.key_data_points:
        print(f"- {point}")

    print("\nCalculations:")
    for step in result.calculations:
        print(f"- {step}")

    print(f"\nConclusion: {result.conclusion}")
Output
Key Data Points:
- Initial revenue ($1,000,000)
- Revenue increase (20%)
- Final revenue ($1,200,000)
- Constant expenses ($800,000)
Calculations:
- Initial profit = Initial revenue - Expenses = $1,000,000 - $800,000 = $200,000
- Final profit = Final revenue - Expenses = $1,200,000 - $800,000 = $400,000
- Change in profit = Final profit - Initial profit = $400,000 - $200,000 = $200,000
Conclusion: The company's profit increased by $200,000 last year.
By outlining the key steps, you focus the model on the most important aspects of the problem. This leads to clearer, more direct outputs.
Technique 2: Include Checkpoints and Verifications
To further reduce hallucination, add checkpoints into your CoT prompts that ask the model to verify its work. For example:
import os
from dotenv import load_dotenv
import instructor
import google.generativeai as genai
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini API
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class VerifiedCalculation(BaseModel):
    extracted_numbers: List[float]
    values_check: str
    calculation_steps: List[str]
    final_answer: float
    reasonability_check: str

def verified_cot_calculation(problem: str) -> VerifiedCalculation:
    # Initialize Gemini client with instructor
    client = instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest",
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )

    prompt = f"""
    Solve this problem step-by-step with verification:
    1. Extract the relevant numbers from the problem statement.
    2. Before calculating, check that we have all necessary values.
    3. Perform the calculation, showing each step.
    4. Sense check the final answer to verify it's reasonable.

    Problem: {problem}

    Provide your solution:
    """

    response = client.chat.completions.create(
        messages=[
            {"role": "user", "content": prompt}
        ],
        response_model=VerifiedCalculation,
    )
    return response

def main():
    # Example usage
    problem = "If a train travels 120 miles in 2 hours, what is its average speed in miles per hour?"

    try:
        result = verified_cot_calculation(problem)
        print(f"Extracted numbers: {result.extracted_numbers}")
        print(f"Values check: {result.values_check}")
        print("Calculation steps:")
        for step in result.calculation_steps:
            print(f"- {step}")
        print(f"Final answer: {result.final_answer}")
        print(f"Reasonability check: {result.reasonability_check}")
    except Exception as e:
        print(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    main()
Output
Extracted numbers: [120.0, 2.0]
Values check: All necessary values (distance and time) are present.
Calculation steps:
- Average speed is calculated by dividing distance by time.
- Average speed = Distance / Time
- Average speed = 120 miles / 2 hours
- Average speed = 60 miles per hour
Final answer: 60.0
Reasonability check: The answer (60 mph) is reasonable. It's a typical speed for a train, and it aligns with the expectation that covering 120 miles in 2 hours requires a substantial speed.
Here's another simple example of how you might implement a CoT prompt with verification, this time asking the model to list its assumptions and report a confidence score for its answer:
import os
from dotenv import load_dotenv
import instructor
import google.generativeai as genai
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini API
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))

class VerifiedAnswer(BaseModel):
    reasoning_steps: List[str]
    assumptions: List[str]
    final_answer: str
    confidence_score: float

# Initialize Gemini client
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

def cot_with_verification(question: str) -> VerifiedAnswer:
    prompt = f"""
    Answer the following question using chain-of-thought reasoning.
    Include verification steps and list any assumptions you make.

    Question: {question}

    Follow this format:
    1. Break down the problem
    2. List any assumptions
    3. Show your reasoning step-by-step
    4. Verify your logic
    5. State your final answer
    6. Provide a confidence score (0-1) for your answer

    Your response:
    """

    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        response_model=VerifiedAnswer,
    )
    return response

def main():
    # Example questions for demonstration
    questions = [
        "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?",
        "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?",
    ]

    for question in questions:
        try:
            result = cot_with_verification(question)
            print(f"\nQuestion: {question}")
            print("\nReasoning Steps:")
            for step in result.reasoning_steps:
                print(f"- {step}")
            print(f"\nAssumptions: {result.assumptions}")
            print(f"Final Answer: {result.final_answer}")
            print(f"Confidence Score: {result.confidence_score}")
            print("\n---")
        except Exception as e:
            print(f"Error processing question: {str(e)}")

if __name__ == "__main__":
    main()
Output
Question: If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
Reasoning Steps:
- It takes 5 machines 5 minutes to make 5 widgets. This means that each machine makes one widget in 5 minutes.
- If each machine makes one widget in 5 minutes, then 100 machines will also take 5 minutes to make 100 widgets.
Assumptions: ['Each machine works at the same rate.', 'There are no bottlenecks or other factors that would slow down production.']
Final Answer: It will take 5 minutes for 100 machines to make 100 widgets.
Confidence Score: 1.0
Question: A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
Reasoning Steps:
- Let's use variables to represent the unknowns: Let the cost of the ball be 'x' dollars. The bat costs $1.00 more than the ball, so the bat costs 'x + $1.00' dollars.
- The total cost of the bat and ball is $1.10. Therefore, we can set up an equation: x + (x + $1.00) = $1.10
- Combine like terms: 2x + $1.00 = $1.10
- Subtract $1.00 from both sides: 2x = $0.10
- Divide both sides by 2: x = $0.05
Assumptions: ["The problem's statement is accurate and complete.", 'There are no hidden costs or discounts.'] Final Answer: The ball costs $0.05. Confidence Score: 1.0
Technique 3: Provide Worked Examples
Few-shot prompting with examples of effective chains of reasoning can steer the model to produce higher quality CoT outputs. In your prompt, include 2-3 samples demonstrating the kind of step-by-step logic you want.
Here's an example for a data analysis task:
import os
from dotenv import load_dotenv
import google.generativeai as genai
import instructor
from pydantic import BaseModel
from typing import List

# Load environment variables
load_dotenv()

# Configure Gemini with API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

class DataAnalysis(BaseModel):
    steps: List[str]
    conclusion: str

def data_analysis_with_examples(data: str) -> DataAnalysis:
    # Initialize the Gemini client with instructor
    client = instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest",
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )

    prompt = f"""
    Here are some examples of step-by-step data analysis:

    Example 1:
    Step 1: Identify the key metrics - daily active users and conversion rate.
    Step 2: Calculate the baseline values - average over the past 30 days.
    Step 3: Compare to industry benchmarks to evaluate current performance.
    Step 4: Recommend 10% improvement in conversion to reach top quartile.

    Example 2:
    Step 1: Determine goal - improve customer retention.
    Step 2: Pull monthly churn data and cohort it by signup month.
    Step 3: Identify the cohorts with the highest churn rates.
    Step 4: Dig into reasons for churn - analyze cancellation surveys.
    Step 5: Hypothesize that improving onboarding will reduce early churn.

    Now, analyze this data step-by-step to determine the top growth opportunities:
    {data}

    Provide your analysis:
    """

    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are an expert data analyst."},
            {"role": "user", "content": prompt}
        ],
        response_model=DataAnalysis,
    )
    return response

# Example usage
if __name__ == "__main__":
    data = """
    Monthly Active Users: 50,000
    New Sign-ups per Month: 5,000
    Churn Rate: 8%
    Average Revenue per User: $25
    Customer Satisfaction Score: 7.5/10
    """

    result = data_analysis_with_examples(data)

    print("Analysis Steps:")
    for step in result.steps:
        print(f"- {step}")

    print(f"\nConclusion: {result.conclusion}")
Output
Analysis Steps:
- Step 1: Identify Key Metrics: Focus on Monthly Active Users (MAU), New Sign-ups, Churn Rate, Average Revenue Per User (ARPU), and Customer Satisfaction Score (CSAT).
- Step 2: Analyze Current Metrics: MAU is 50,000, with 5,000 new sign-ups monthly. Churn rate is at 8%, ARPU is $25, and CSAT is 7.5/10.
- Step 3: Identify Areas for Improvement: The 8% churn rate presents a significant opportunity for growth. While MAU is stable, increasing ARPU and CSAT could significantly boost revenue.
- Step 4: Explore Growth Opportunities: Investigate reasons behind the 8% churn rate through surveys or in-app feedback. Explore strategies to improve onboarding and customer support to increase CSAT and reduce churn. Consider potential upselling or cross-selling opportunities to boost ARPU.
- Step 5: Prioritize Actions: Focus on reducing churn to increase profitability and MAU. Then, explore opportunities to increase ARPU and CSAT to further drive revenue growth. Implement A/B testing for any proposed changes to quantify their impact.
Conclusion: The analysis indicates that focusing on reducing the 8% churn rate presents the most significant opportunity for growth. Improving customer onboarding, enhancing customer support and exploring strategies to increase customer lifetime value through upselling or cross-selling will likely yield significant improvements. Continuous monitoring of key metrics and A/B testing will be crucial to optimize growth strategies.
By demonstrating what you want, you help the model produce similarly well-structured reasoning. The examples act as guardrails that keep the CoT on track.
Implementing Advanced Chain-of-Thought
Here's a quick guide to implementing these techniques in your LLM application:
- Audit your existing prompts and identify areas where chain-of-thought can add value - look for complex, multi-step problems.
- Experiment with adding targeted reasoning frameworks to guide the model's thought process. Start simple and iterate.
- Build verification steps and checkpoints into longer chains of reasoning, and have the model flag when it is making assumptions.
- Curate a library of examples of effective chains of thought for common problem types, and inject the relevant examples into your prompts (a minimal sketch follows this list).
- Monitor the quality and hallucination rate of CoT outputs versus regular prompts, and tweak your prompting approach based on the results (a rough comparison harness is also sketched below).
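To make the fourth step concrete, here is a minimal sketch of such a library, assuming you key the worked examples by a hand-picked problem type; the categories, example text, and build_cot_prompt helper are all hypothetical and only illustrate the injection pattern.
from typing import Dict

# Hypothetical library of worked reasoning examples, keyed by problem type
COT_EXAMPLES: Dict[str, str] = {
    "data_analysis": (
        "Step 1: Identify the key metrics.\n"
        "Step 2: Calculate baseline values over the past 30 days.\n"
        "Step 3: Compare against industry benchmarks.\n"
        "Step 4: Recommend a concrete improvement target."
    ),
    "math_word_problem": (
        "Step 1: Extract the relevant numbers.\n"
        "Step 2: Set up the equation.\n"
        "Step 3: Solve it and sense-check the result."
    ),
}

def build_cot_prompt(problem: str, problem_type: str) -> str:
    """Inject the relevant worked example (if any) ahead of the new problem."""
    example = COT_EXAMPLES.get(problem_type)
    preamble = f"Here is an example of step-by-step reasoning:\n{example}\n\n" if example else ""
    return f"{preamble}Now, analyze this problem step-by-step:\n{problem}\n\nProvide your analysis:"

# Example usage
print(build_cot_prompt("Monthly churn is 8%. Where should we focus to grow revenue?", "data_analysis"))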
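For the last step, a rough comparison harness might look like the sketch below; answer_fn and score_fn are placeholders you would swap for your own LLM call and evaluation method (exact match, rubric grading, human review, and so on).
from typing import Callable, Dict, List

def compare_prompt_styles(
    questions: List[str],
    references: List[str],
    answer_fn: Callable[[str], str],        # placeholder: wraps your LLM call
    score_fn: Callable[[str, str], float],  # placeholder: returns a quality score in [0, 1]
) -> Dict[str, float]:
    """Run each question with and without a CoT instruction and average the scores."""
    totals = {"plain": 0.0, "cot": 0.0}
    for question, reference in zip(questions, references):
        totals["plain"] += score_fn(answer_fn(question), reference)
        totals["cot"] += score_fn(answer_fn(f"{question}\n\nLet's think step by step."), reference)
    return {style: total / len(questions) for style, total in totals.items()}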
By systematically leveling up your chain-of-thought prompting, you can make your LLM applications more robust, accurate, and explainable. The incremental effort to implement advanced CoT techniques pays off in significantly better outputs.