
Your LLM Can Book Flights Now — Here's How Tool Calling Works
An LLM by itself can only generate text. It can't check the weather, query a database, or send an email. Tool calling gives it hands. Here's the mechanism, the agentic loop, and the security rules.
An LLM by itself can only do one thing: generate text.
It can't check the weather. It can't query a database. It can't send an email. It can't book an appointment. It can't pull up patient records. It can't transfer a phone call. It has no hands.
Tool calling gives it hands.
What Tool Calling Actually Is
When you send a message to an LLM with tool calling enabled, you also send a list of available tools:
{
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"location": {"type": "string", "description": "City name"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
}
},
{
"name": "book_appointment",
"description": "Book an appointment at the clinic",
"parameters": {
"patient_id": {"type": "string"},
"date": {"type": "string", "format": "YYYY-MM-DD"},
"time": {"type": "string", "format": "HH:MM"},
"doctor": {"type": "string"}
}
},
{
"name": "send_sms",
"description": "Send an SMS to a phone number",
"parameters": {
"to": {"type": "string"},
"message": {"type": "string"}
}
}
]
}
The LLM sees this tool list as part of its context. When the user says something that requires action, the model doesn't generate a text response. Instead, it generates a structured tool call:
{
"tool_call": {
"name": "book_appointment",
"arguments": {
"patient_id": "P12345",
"date": "2026-03-25",
"time": "10:00",
"doctor": "Dr. Mehta"
}
}
}
Your application receives this tool call, executes it (actually books the appointment in your system), and sends the result back to the LLM:
{
"tool_result": {
"name": "book_appointment",
"result": "Appointment confirmed. Booking ID: APT-789."
}
}
The LLM then generates a text response to the user: "I've booked your appointment with Dr. Mehta on March 25th at 10 AM. Your booking ID is APT-789."
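The application side of this cycle — receive a tool call, dispatch it to a real function, package the result — can be sketched in a few lines. This is a minimal, framework-free sketch: the tool name mirrors the example above, but the handler body and the `handle_tool_call` helper are illustrative stubs, not a real clinic API.

```python
# Minimal sketch of the application-side round trip:
# receive a tool call, dispatch it to a handler, return a tool result.
# The handler here is a stub standing in for a real scheduling system.

def book_appointment(patient_id: str, date: str, time: str, doctor: str) -> str:
    # In a real system this would write to the clinic's scheduling database.
    return "Appointment confirmed. Booking ID: APT-789."

# Registry mapping tool names (as declared to the model) to handlers.
TOOL_HANDLERS = {
    "book_appointment": book_appointment,
}

def handle_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and package the result for the model."""
    name = tool_call["name"]
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Models occasionally hallucinate tool names; fail gracefully.
        return {"tool_result": {"name": name,
                                "result": f"Error: unknown tool '{name}'"}}
    result = handler(**tool_call["arguments"])
    return {"tool_result": {"name": name, "result": result}}

call = {
    "name": "book_appointment",
    "arguments": {"patient_id": "P12345", "date": "2026-03-25",
                  "time": "10:00", "doctor": "Dr. Mehta"},
}
print(handle_tool_call(call)["tool_result"]["result"])
```

Note the unknown-tool branch: returning an error string the model can read is more robust than raising an exception, because the model can recover in its next turn.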
How the Model Decides When to Call a Tool
This is the part that feels like magic but isn't.
The model doesn't have special "tool-calling logic." It's still doing next-token prediction. But during training (specifically, during fine-tuning for tool use), it learned patterns like:
- User asks about weather -> generate a get_weather tool call
- User asks to book something -> generate a book_appointment tool call
- User asks a general question -> generate a text response (no tool)
The model learned to produce structured JSON when the context suggests an action is needed, and to produce natural text when it doesn't.
The decision happens at the token level. After processing the user's message and the tool definitions, the model's next-token prediction either starts generating a tool call structure ({"tool_call":) or starts generating a text response ("Sure, I can help...").
The tool descriptions matter enormously. The model uses the description field to decide when to use a tool. A vague description leads to the model calling the tool at wrong times or not calling it when it should.
// BAD: Vague description
{"name": "lookup", "description": "Look up information"}
// GOOD: Specific description
{"name": "get_patient_record", "description": "Retrieve a patient's medical record by phone number. Use this when the patient asks about their appointments, medications, allergies, or treatment history."}
The more specific your description, the more accurately the model decides when to use the tool.
The Agentic Loop
Single tool calls are useful. But the real power is in chaining — the model calls a tool, reads the result, and decides what to do next. Maybe it calls another tool. Maybe it responds to the user. Maybe it calls three tools in sequence.
This creates a loop:
User message
-> LLM decides: call a tool
-> Application executes tool, returns result
-> LLM reads result, decides: call another tool
-> Application executes tool, returns result
-> LLM reads result, decides: respond to user
-> Text response to user
This is what makes an "agent." Not a chatbot that can call one function. A system that can chain multiple actions, react to results, and adapt its plan based on what it learns.
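In code, the loop above is literally a while loop around the model call. The `call_model` function below is a scripted stand-in for a real LLM API (the actual decision of tool-call-versus-text comes from the model, not from code like this) so that the control flow is visible and runnable.

```python
# Sketch of the agentic loop. call_model is a scripted stand-in for a real
# LLM API call: it returns either a tool call or a final text response.

def call_model(messages: list) -> dict:
    # Stand-in logic: if the last message is a tool result, answer in text;
    # otherwise request a lookup first. A real model decides this itself.
    if messages and messages[-1].get("role") == "tool":
        return {"type": "text",
                "content": "Your next appointment is Mar 22 at 10 AM."}
    return {"type": "tool_call", "name": "get_appointments",
            "arguments": {"patient_id": "P12345"}}

def execute_tool(name: str, arguments: dict) -> str:
    # Stub standing in for real backend systems.
    return "next_appointment: Mar 22, 10 AM, Dr. Mehta"

def run_agent(user_message: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):        # hard cap prevents runaway loops
        reply = call_model(messages)
        if reply["type"] == "text":   # model decided to answer the user
            return reply["content"]
        # Model decided to act: execute the tool, feed the result back in.
        result = execute_tool(reply["name"], reply["arguments"])
        messages.append({"role": "assistant", "tool_call": reply})
        messages.append({"role": "tool", "content": result})
    return "Sorry, I couldn't complete that request."

print(run_agent("When is my next appointment?"))
```

The `max_steps` cap is not optional in production: it is the only thing standing between you and a model that retries a failing tool forever.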
Example: IVF clinic voice agent
Patient: "I need to reschedule my monitoring appointment, and can you also tell me my last estradiol level?"
The agent thinks:
- I need to look up this patient first
- Then get their current appointment
- Then get their lab results
- Then ask when they want to reschedule
- Then book the new appointment
- Then send a confirmation
Step 1: tool_call -> get_patient_by_phone("+91-9876543210")
Result: {patient_id: "P12345", name: "Priya Sharma"}
Step 2: tool_call -> get_appointments("P12345")
Result: {next_appointment: "Mar 22, 10 AM, Dr. Mehta, Monitoring"}
Step 3: tool_call -> get_lab_results("P12345", type="estradiol")
Result: {latest: "450 pg/mL", date: "Mar 18"}
Step 4: text_response -> "Priya, your last estradiol was 450 on March 18th.
Your current monitoring is March 22 at 10 AM with Dr. Mehta.
What date would you like to reschedule to?"
[Patient says "Thursday the 27th, same time"]
Step 5: tool_call -> reschedule_appointment("P12345", new_date="2026-03-27", time="10:00")
Result: {confirmed: true, booking_id: "APT-801"}
Step 6: tool_call -> send_sms("+91-9876543210", "Monitoring rescheduled to Mar 27, 10 AM, Dr. Mehta. Booking: APT-801")
Result: {sent: true}
Step 7: text_response -> "Done. Your monitoring is rescheduled to Thursday March 27th at 10 AM.
I've sent you an SMS confirmation."
Five tool calls and two text responses, across two short conversational turns. No human coded the sequence. The model decided the order, the tools, and the parameters based on the conversation context and tool descriptions.
Parallel Tool Calling
Some requests need multiple independent lookups. The model can call multiple tools simultaneously:
{
"tool_calls": [
{"name": "get_appointments", "arguments": {"patient_id": "P12345"}},
{"name": "get_lab_results", "arguments": {"patient_id": "P12345", "type": "all"}},
{"name": "get_medication_schedule", "arguments": {"patient_id": "P12345"}}
]
}
These three calls are independent — they don't need each other's results. Your application can execute them concurrently and return all results at once. The model then synthesizes all three results into a single response.
Parallel tool calling reduces latency significantly for multi-lookup requests. Instead of three sequential round-trips (model -> tool -> model -> tool -> model -> tool -> model), you get one round-trip with three parallel executions.
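On the application side, "execute them concurrently" can be as simple as a thread pool, since tool calls are usually I/O-bound. A sketch with stub handlers (the handler bodies are placeholders for real lookups):

```python
from concurrent.futures import ThreadPoolExecutor

# Stub handlers standing in for real I/O-bound lookups.
def get_appointments(patient_id):        return "Mar 22, 10 AM, Dr. Mehta"
def get_lab_results(patient_id, type):   return "estradiol: 450 pg/mL"
def get_medication_schedule(patient_id): return "Gonal-F 225 IU nightly"

HANDLERS = {
    "get_appointments": get_appointments,
    "get_lab_results": get_lab_results,
    "get_medication_schedule": get_medication_schedule,
}

def execute_parallel(tool_calls: list) -> list:
    """Run independent tool calls concurrently; return results in call order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(HANDLERS[c["name"]], **c["arguments"])
                   for c in tool_calls]
        return [{"name": c["name"], "result": f.result()}
                for c, f in zip(tool_calls, futures)]

calls = [
    {"name": "get_appointments", "arguments": {"patient_id": "P12345"}},
    {"name": "get_lab_results", "arguments": {"patient_id": "P12345", "type": "all"}},
    {"name": "get_medication_schedule", "arguments": {"patient_id": "P12345"}},
]
for r in execute_parallel(calls):
    print(r["name"], "->", r["result"])
```

Total wall-clock time is roughly the slowest single lookup, not the sum of all three.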
Structured Output: Guaranteeing the Format
Tool calling relies on the model producing valid JSON. But LLMs are probabilistic — they can produce malformed JSON, missing fields, or wrong types.
Structured output (also called "constrained generation") forces the model to produce valid JSON that matches a specific schema. The model's token generation is constrained at each step — it can only produce tokens that result in valid JSON according to the schema.
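True constrained generation happens inside the sampler — tokens that would break the schema are masked out at each step — and you only get that from the provider or an inference library. What you can always do application-side is validate the model's arguments against the schema before executing, and re-prompt on failure. A minimal validator, hand-rolled here to stay dependency-free (a real system might use a JSON Schema library instead):

```python
# Application-side schema check for tool-call arguments, as a fallback when
# you can't rely on provider-enforced structured output.

SCHEMA = {
    "location": {"type": str, "required": True},
    "unit": {"type": str, "required": False,
             "enum": ["celsius", "fahrenheit"]},
}

def validate_arguments(arguments: dict, schema: dict) -> list:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    for field, rules in schema.items():
        if field not in arguments:
            if rules.get("required"):
                errors.append(f"missing required field '{field}'")
            continue
        value = arguments[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"'{field}' has wrong type {type(value).__name__}")
        elif "enum" in rules and value not in rules["enum"]:
            errors.append(f"'{field}' must be one of {rules['enum']}")
    return errors

# A hallucinated enum value is caught before the tool ever runs.
print(validate_arguments({"location": "Mumbai", "unit": "kelvin"}, SCHEMA))
```

Feeding the error list back to the model as a tool result usually gets a corrected call on the next turn.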
Why Tool Calling Changes Everything
Before tool calling, LLMs were sophisticated text generators. Smart, capable, but ultimately passive. They could tell you what to do but couldn't do it.
Tool calling inverts this. The LLM becomes a decision engine that controls external systems.
- A voice agent that books real appointments in a real calendar
- A coding assistant that runs real tests on real code
- A customer support bot that issues real refunds through a real payment system
- A healthcare agent that sends real lab results via real SMS
- A trading assistant that places real orders on a real exchange
The LLM's job shifts from "generate text" to "decide which action to take and with what parameters." The text it generates is often secondary to the tool calls it makes.
This is the foundation of the "AI agent" concept. An agent is an LLM in a loop, with access to tools, making decisions about which tools to call based on the current context and its objectives.
The tools are the hands. The LLM is the brain. The loop is what makes it an agent.
Practical Considerations
Security
The LLM decides what tools to call and with what parameters. If your tool list includes delete_patient_record and the model hallucinates the wrong patient ID, a real record gets deleted. Treat every tool call as untrusted input: validate parameters against the authenticated session rather than trusting the model, scope each tool to the minimum permissions it needs, and gate destructive or irreversible actions behind explicit user confirmation.
Cost
Each tool call round-trip means re-sending the entire conversation context to the LLM. If your context is 10,000 tokens and the agent makes 5 tool calls, you're processing 50,000+ input tokens. At GPT-4o's input rate of $2.50 per million tokens, that's $0.125 for one conversation turn.
For voice AI, where latency matters and calls happen at volume, tool call costs add up. Optimizations: cache frequent lookups, pre-fetch likely information, use cheaper models (GPT-4o-mini) for simple tool decisions.
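The arithmetic is worth making explicit: input tokens scale roughly linearly with the number of round trips, because the whole context goes back each time. A quick sketch (the price constant is GPT-4o's published input rate at the time of writing — substitute your model's):

```python
# Rough cost of one agentic conversation turn: every tool-call round trip
# re-sends the (growing) context as input tokens.

CONTEXT_TOKENS = 10_000       # tokens in the conversation context
TOOL_CALLS = 5                # round trips in this turn
INPUT_PRICE_PER_M = 2.50      # USD per million input tokens (GPT-4o, illustrative)

# Lower bound: context re-sent once per round trip (ignores per-step growth).
total_input = CONTEXT_TOKENS * TOOL_CALLS
cost = total_input / 1_000_000 * INPUT_PRICE_PER_M
print(f"{total_input:,} input tokens -> ${cost:.3f} per turn")
```

This is a lower bound — each tool result makes the context slightly longer for the next round trip. Prompt caching, where available, cuts the re-sent portion's price substantially.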
Reliability
LLMs sometimes:
- Call the wrong tool (misidentify intent)
- Pass wrong parameters (hallucinate a patient ID)
- Get stuck in loops (tool fails, model retries indefinitely)
- Skip a tool call that's needed (decide to generate text instead of looking up information)
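Each of these failure modes has a standard application-side guard: intent checks, parameter validation, retry caps, and required-tool enforcement. Here is a sketch of the guard for the "stuck in loops" case — cap retries and hand the model a readable error instead of crashing (the flaky tool is an illustrative stub):

```python
# Guard against "tool fails, model retries indefinitely": cap retries per
# tool and surface a clean error the model can reason about.

def flaky_tool(fail_times: int, state: dict) -> str:
    # Illustrative stub: fails the first `fail_times` calls, then succeeds.
    state["calls"] = state.get("calls", 0) + 1
    if state["calls"] <= fail_times:
        raise RuntimeError("upstream timeout")
    return "ok"

def execute_with_retry(fn, *args, max_retries: int = 3, **kwargs) -> str:
    """Run a tool, retrying on failure up to max_retries times."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except RuntimeError as exc:
            if attempt == max_retries:
                # Return a result the model can act on, not an exception.
                return f"Error: tool failed after {max_retries} attempts ({exc})"
    return "unreachable"

state = {}
print(execute_with_retry(flaky_tool, 1, state))  # fails once, then succeeds
```

The same pattern applies upstream: validate parameters before execution, and log every tool call so wrong-tool and wrong-parameter failures are auditable.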
The Complete Picture
Four posts on LLMs:
- How they work: Tokenization, embedding, attention, next-token prediction. The mechanism.
- Why they fail: Hallucination, context limits, memory loss. The constraints.
- What reasoning means: Chain of thought, test-time compute, thinking tokens. The scaling trick.
- Tool calling: Function execution, agentic loops, structured output. The hands.
The mechanism produces text. The constraints define where it breaks. The reasoning extends what it can solve. The tools extend what it can do.
Together, they're the building blocks of every AI application being built today. Voice agents, coding assistants, customer support bots, medical triage systems, trading interfaces. Different combinations of the same four building blocks.
Understanding them is understanding the foundation. Everything else is engineering on top.
This is post 8 of the AI Engineering Explained series.
Next post: How Image Generation Works. From noise to art — diffusion models, latent space, and why AI still can't draw hands.