Your audio recordings are a database you never queried
One prompt turns a folder of meeting recordings, sales calls, and customer interviews into structured insights you can actually act on.
You have hours of sales calls, team meetings, and customer interviews piling up in a folder somewhere. Nobody is going to relisten to any of it. The action items, objections, and feature requests that justified the recording in the first place are sitting in audio nobody will ever open again.
You do not need a transcription service. You need one prompt that pulls the structure you actually care about out of the recording:
import mantis
action_items = mantis.extract(
    "quarterly_planning_q2.mp3",
    "Extract all action items, who is responsible, and deadlines",
)
That is the whole pattern. By the end of this post you will know:
- How to turn any recording into structured output with one prompt
- Two extraction shapes that pay back the day you ship them
- The three things you will run into in the first month
- One Python library that does this in five lines so you can skip the plumbing
The pattern
Audio is text you have not transcribed yet. The moment you treat a recording as queryable data, the question stops being "how do I summarise this" and becomes "what do I want to know."
Three pieces:
- A recording (mp3, wav, m4a — whatever your phone or Zoom produced).
- A prompt that names the fields you want.
- A model that handles the audio in one shot, so you skip the transcribe-then-prompt round trip.
The library I wrote for this is Mantis AI. It wraps Gemini 1.5 Flash, which accepts audio directly, so the extraction is one call instead of two. You will need a Google AI Studio API key. Install it with pip install mantisai. The rest of this post is what to do with it.
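Since the recordings usually sit in a folder, a small loop covers the batch case. This is a minimal sketch and not part of Mantis: `extract_folder` and `AUDIO_SUFFIXES` are names made up for illustration. The extraction function is passed in as a parameter, on the assumption that it takes a file path and a prompt and returns text, as `mantis.extract` does in the example above.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical list of extensions to pick up; adjust to whatever your tools produce.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a"}


def extract_folder(
    folder: str,
    prompt: str,
    extract_fn: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run one prompt over every audio file in a folder, keyed by file name."""
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip notes, slides, and anything else that is not audio
        try:
            results[path.name] = extract_fn(str(path), prompt)
        except Exception as exc:
            # One unreadable file should not stop the rest of the batch.
            results[path.name] = f"ERROR: {exc}"
    return results
```

With Mantis installed, the call would look something like `extract_folder("calls", "Extract: 1) Action items, 2) Owners, 3) Deadlines", mantis.extract)`. Each file is still one model call, so a large folder costs one call per recording.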
Sales calls
Sales teams record calls for "training and analysis" and then never play them again. The recordings are the most expensive audio in your organisation — they cost a closing rep an hour each — and the insight density is high. Pain points, objections, competitive mentions, decision-makers, next steps. All of it is in the call. None of it is in your CRM.
One prompt:
import mantis
sales_intelligence = mantis.extract(
    "enterprise_client_call.wav",
    "Analyze this sales call and extract: 1) Customer pain points, 2) Objections raised, 3) Product features of interest, 4) Competitive mentions, 5) Budget considerations, 6) Decision-makers, 7) Next steps",
)
print(sales_intelligence)
The output is the kind of write-up a rep would produce if they had two spare hours and the discipline to use them, and they have neither:
Objections:
- Implementation timeline (need solution before Q3)
- Price point higher than competing solution
- Concerned about user training requirements
Budget: $75K allocated, payment terms spread across fiscal year, ROI within 6 months.
Decision-makers: Final approval from CTO. Evaluation committee includes IT Director
and Operations Manager. Board presentation April 10.
That output is the CRM update your rep was never going to write. Pipe it into Salesforce or HubSpot and your forecast is built on what was actually said, not what the rep remembered to type.
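Before the text can go into CRM fields, it has to be split by section. The sketch below assumes the model returns text shaped like the sample above: a `Name:` header, optional `- ` bullets, and wrapped continuation lines. That shape is not guaranteed from call to call, and `parse_sections` is an illustrative helper, not a Mantis function.

```python
import re
from typing import Dict, List, Optional

# A header is a short label made of letters, spaces, or hyphens, followed by a colon.
HEADER = re.compile(r"^([A-Za-z][A-Za-z \-]{0,40}):\s*(.*)$")


def parse_sections(text: str) -> Dict[str, List[str]]:
    """Split 'Header:' / '- bullet' text into {header: [items]}."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current is not None:
                sections[current].append(line[2:].strip())
            continue
        match = HEADER.match(line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
            if match.group(2):  # inline content after the colon
                sections[current].append(match.group(2))
        elif current is not None and sections[current]:
            sections[current][-1] += " " + line  # wrapped continuation line
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """Objections:
- Implementation timeline (need solution before Q3)
- Price point higher than competing solution
- Concerned about user training requirements
Budget: $75K allocated, payment terms spread across fiscal year, ROI within 6 months.
Decision-makers: Final approval from CTO. Evaluation committee includes IT Director
and Operations Manager. Board presentation April 10."""

parsed = parse_sections(sample)
```

Here `parsed["Objections"]` holds the three bullets, and the wrapped "Decision-makers" line is joined back into a single item. If the model changes its layout, the parser will quietly miss sections, so check the keys before writing anything to a CRM.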
The compound benefit: once you run this across a quarter of calls, you can ask aggregate questions. Which objection appears in 60% of lost deals? Which competitor name shows up most? Which feature request keeps surfacing from the same segment? The first call gives you a CRM update. The hundredth call gives you a product strategy.
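The "which objection appears in most lost deals" question is a counting exercise once each call has a deal outcome and a list of objections. The sketch below assumes the objections have already been normalised to short labels such as "price", because free-text objections will rarely match word for word. The record layout, the labels, and the `objection_share` name are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def objection_share(calls: Iterable[dict], outcome: str = "lost") -> Dict[str, float]:
    """Fraction of calls with the given outcome in which each objection appears."""
    matching: List[dict] = [c for c in calls if c["outcome"] == outcome]
    if not matching:
        return {}
    counts: Counter = Counter()
    for call in matching:
        counts.update(set(call["objections"]))  # count each objection once per call
    return {label: n / len(matching) for label, n in counts.most_common()}


# Made-up records for illustration only.
calls = [
    {"outcome": "lost", "objections": ["price", "timeline"]},
    {"outcome": "lost", "objections": ["price"]},
    {"outcome": "lost", "objections": ["training"]},
    {"outcome": "won", "objections": ["price"]},
]
shares = objection_share(calls)
```

In this toy data "price" appears in two of the three lost deals, so it comes out on top. The outcome field has to come from your CRM, since the recording itself does not know whether the deal closed.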
Customer interviews
Product and research teams run interviews to learn what to build next. The interviews are recorded, mostly transcribed, occasionally summarised, and almost never queried at the level the team needs to make a decision. "What did all five users say about onboarding" is a 90-minute exercise in scrolling.
Same pattern, different prompt:
import mantis
interview_insights = mantis.extract(
    "customer_feedback_session.m4a",
    "Analyze this customer interview and extract: 1) Pain points, 2) Feature requests, 3) Positive feedback, 4) Use cases, 5) Notable quotes, 6) Overall sentiment",
)
print(interview_insights)
The output puts the parts that survive a roadmap meeting at the top:
Pain points:
- Search returns irrelevant results
- Mobile experience significantly worse than desktop
- Export fails on datasets over 500 records
Notable quotes:
- "The mobile experience feels like an afterthought rather than a core part of the product."
- "When it works, it saves me about 5 hours every week."
Two things change once this is running. First, the team stops arguing about what users said and starts arguing about what to do about it — because the quotes are right there. Second, the cost of a customer interview drops far enough that you run more of them, which is the actual moat.
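Once each interview has been reduced to sections of short items, the "what did all five users say about onboarding" question becomes a filter, not a scrolling session. A minimal sketch, assuming the extracted text has already been split into a dictionary of sections per interview; `find_mentions` is a made-up helper, and the way the sample items are divided between two interviews is invented for the example.

```python
from typing import Dict, List, Tuple

Insights = Dict[str, Dict[str, List[str]]]  # interview -> section -> items


def find_mentions(insights: Insights, keyword: str) -> List[Tuple[str, str, str]]:
    """Return (interview, section, item) for every item that mentions the keyword."""
    needle = keyword.lower()
    hits: List[Tuple[str, str, str]] = []
    for interview, sections in insights.items():
        for section, items in sections.items():
            for item in items:
                if needle in item.lower():
                    hits.append((interview, section, item))
    return hits


insights: Insights = {
    "interview_01.m4a": {
        "Pain points": [
            "Search returns irrelevant results",
            "Mobile experience significantly worse than desktop",
        ],
        "Notable quotes": [
            "The mobile experience feels like an afterthought rather than a core part of the product.",
        ],
    },
    "interview_02.m4a": {
        "Pain points": ["Export fails on datasets over 500 records"],
    },
}
mobile_hits = find_mentions(insights, "mobile")
```

A plain substring match is crude: it will miss "phone" or "app" when you search for "mobile". It is still enough to put the relevant quotes side by side before a roadmap meeting.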
The prompt is the product
The code above is five lines. The thing that determines whether the output is useful is not the library or the model — it is the field list in the prompt. "Summarise this call" gets you something a junior would write. "Extract the seven things my pipeline review needs by Friday" gets you the pipeline review.
Two rules that hold up across recordings:
- Number your fields. Models follow numbered lists more reliably than prose. "Extract: 1) X, 2) Y, 3) Z" beats "extract X, Y, and Z" in every test I have run.
- Name the decision the output feeds. A prompt written for "what the CRM needs" produces different output than "what the call summary needs," even when the fields overlap. Decide the consumer first, then write the field list.
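Both rules can be baked into a small prompt builder so nobody on the team has to remember them. This is a sketch of one way to do it. The `build_prompt` name and its exact wording are my own choices for illustration, not something the model or the library requires.

```python
from typing import Sequence


def build_prompt(consumer: str, fields: Sequence[str], subject: str = "recording") -> str:
    """Build an extraction prompt that names its consumer and numbers its fields."""
    if not fields:
        raise ValueError("Write the field list first: at least one field is required.")
    numbered = ", ".join(f"{i}) {field}" for i, field in enumerate(fields, start=1))
    return f"Analyze this {subject} for {consumer} and extract: {numbered}"


crm_prompt = build_prompt(
    consumer="the CRM update",
    fields=["Customer pain points", "Objections raised", "Next steps"],
    subject="sales call",
)
```

Here `crm_prompt` reads "Analyze this sales call for the CRM update and extract: 1) Customer pain points, 2) Objections raised, 3) Next steps". Keeping the field list as data also means it can live in version control next to the pipeline that consumes the output.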
If you cannot write the field list, you do not have a clear enough idea of what you want from the recording. Run one call by hand first. The fields you take notes on are the fields the prompt should extract.
What you will hit in the first month
Three predictions for the team that starts piping calls through this:
- Your first month of action items will show that the same three people commit to 70% of the work. Action item extraction does not just save time. It is an accountability instrument the team has never had before. Expect mild discomfort from whoever has been coasting.
- You will find promises in old sales calls that you did not know you had made. Reps say things in the moment. When you extract "commitments made by the seller" across a quarter of calls, some of those commitments will be ones legal would not have signed off on. This is a feature, not a bug, but it is a feature that needs a process.
- Your sales team will resist call recording until the first time it saves them a deal. The resistance is predictable. The reversal is also predictable. The deal that gets saved is usually one where the rep forgot the prospect's actual concern and the extracted summary surfaced it before the follow-up. After that, the resistance disappears.
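The first prediction is easy to check against your own data. A minimal sketch, assuming the extracted action items have already been turned into records with an "owner" field; the record layout and the `owner_share` name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def owner_share(action_items: Iterable[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Share of all action items held by each owner, largest first."""
    counts = Counter(item.get("owner", "unassigned") for item in action_items)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(owner, n / total) for owner, n in counts.most_common()]


# Made-up records for illustration only.
items = [
    {"owner": "Ana", "task": "Send revised quote"},
    {"owner": "Ana", "task": "Book follow-up demo"},
    {"owner": "Ana", "task": "Draft rollout plan"},
    {"owner": "Ben", "task": "Confirm training dates"},
]
shares_by_owner = owner_share(items)
```

Items with no named owner are grouped under "unassigned", which is often the more useful number to watch.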
The real lesson
You already have the data. You recorded it. The cost of extracting structure from it is now one model call and a numbered list. The question is no longer "should we transcribe this" — the question is "what would we ask a database of every call we have ever had, if we had one."
The answer to that question is the prompt. Write it, point it at a recording, and you have one.
If you have a folder of calls you keep meaning to do something with, send me one transcript and the question you want answered from it, and I will write the extraction prompt that gets you there. [email protected].