OpenAI function calling tutorial: building tools for GPT
A complete guide to OpenAI function calling: defining tools, handling parallel calls, streaming, and building a tool-using agent from scratch.
TL;DR: I spent two days debugging why my agent kept calling tools with wrong parameters. The problem was not the model. It was how I defined the tool schemas. This guide covers OpenAI function calling from scratch: JSON Schema, parallel calls, streaming, and the exact patterns that work.
Function calling is the single most important primitive in building AI agents. It’s what turns a chat model from a text generator into something that can do things: query databases, call APIs, send emails, compute results.
I’ve built agents using both OpenAI’s and Anthropic’s tool use APIs. Here’s my complete guide to OpenAI function calling, built from production experience rather than documentation examples.
Key takeaways:
- Function calling lets the model request structured function execution: it doesn’t execute functions itself, it asks you to do it
- Define tools as JSON Schema objects in the tools parameter alongside messages
- Parallel function calling means the model can request multiple tools in a single response: handle them all before returning results
- Streaming with function calls works by collecting partial argument deltas and grouping them by the index of the call they belong to
- A complete agent loop needs just OpenAI’s SDK: no frameworks required
OpenAI’s function calling API set the pattern most tool-use APIs now follow: the model accepts structured tool definitions and returns function invocations for your code to execute. It is one of the most widely adopted tool-use formats in the industry.
What function calling is
The name is misleading. OpenAI’s function calling doesn’t mean the model calls functions on your computer. The model outputs a structured request that says “I want to call this function with these arguments.” Your code decides whether to execute it.
The flow looks like this:
User: "What's the weather in Bengaluru?"
Model: "I should check the weather API."
↓
Model outputs: { "function": "get_weather", "args": { "location": "Bengaluru" } }
↓
Your code executes get_weather("Bengaluru") → "26°C, partly cloudy"
↓
You send the result back to the model
Model: "The weather in Bengaluru is 26°C and partly cloudy."
The model never touches your API keys, never executes code on your server. It just requests tool execution. You control what runs.
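To make the flow concrete, here is roughly what the two halves of that exchange look like as Responses API items, written as plain Python dicts. The IDs and the stub get_weather are made up for illustration; only the field names (type, call_id, name, arguments, output) follow the API. The detail that trips people up is that arguments arrives as a JSON string, not a dict.

```python
import json

# A function_call item as the model might emit it (IDs are illustrative).
# "arguments" is a JSON *string*, so it must be parsed before use.
function_call = {
    "type": "function_call",
    "id": "fc_example",
    "call_id": "call_example",
    "name": "get_weather",
    "arguments": json.dumps({"location": "Bengaluru"}),
}

def get_weather(location: str) -> dict:
    """Hypothetical stub standing in for a real weather API."""
    return {"location": location, "temperature_c": 26, "conditions": "partly cloudy"}

def build_output_item(call: dict, result: dict) -> dict:
    """Wrap a tool result so the model can match it to the call that produced it."""
    return {
        "type": "function_call_output",
        "call_id": call["call_id"],    # must match the request's call_id
        "output": json.dumps(result),  # results go back as a string
    }

args = json.loads(function_call["arguments"])
reply = build_output_item(function_call, get_weather(**args))
print(reply["output"])
# → {"location": "Bengaluru", "temperature_c": 26, "conditions": "partly cloudy"}
```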
How do I define tools for OpenAI function calling?
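Tools are defined as JSON Schema objects. Each tool has a name, a description, and a parameters schema. The description is critical: it’s how the model knows when to call the tool. With the Responses API used throughout this guide, these fields sit directly on the tool object; the Chat Completions API nests the same fields under a "function" key.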
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a given location. Returns temperature, conditions, humidity, and wind speed.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'Bengaluru, India' or 'San Francisco, CA'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units. Defaults to celsius for India, fahrenheit for US."
                }
            },
            "required": ["location"]
        }
    },
    {
        "type": "function",
        "name": "get_air_quality",
        "description": "Get air quality index and PM2.5 data for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
]
Rule of thumb for descriptions: Describe when to call the function, not just what it does. A function name like get_weather is obvious. The description should clarify edge cases:
- “Call when user asks about weather, temperature, or climate conditions”
- “Call for both current conditions and short-term forecasts”
- “Does NOT support historical weather data”
This prevents the model from calling the wrong tool or calling a tool for tasks it can’t handle.
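One way to make the "when, not just what" rule hard to forget is to assemble descriptions from separate parts. The function_tool helper below is hypothetical (it only builds a dict in the flat Responses tool shape and never calls the API); it also catches a common schema mistake, listing a required parameter that isn't in properties.

```python
def function_tool(name, purpose, properties, required=(), call_when=(), limitations=()):
    """Build a Responses-style function tool whose description says when to call it.

    Hypothetical helper: it only assembles a dict; it does not talk to the API.
    """
    missing = [p for p in required if p not in properties]
    if missing:
        raise ValueError(f"required parameters not in properties: {missing}")

    # Purpose first, then when to call it, then what it cannot do
    parts = [purpose]
    parts += [f"Call when {cond}." for cond in call_when]
    parts += [f"Does NOT support {lim}." for lim in limitations]

    return {
        "type": "function",
        "name": name,
        "description": " ".join(parts),
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(required),
        },
    }

weather_tool = function_tool(
    "get_weather",
    "Get the current weather for a given location.",
    {"location": {"type": "string", "description": "City name, e.g. 'Bengaluru, India'"}},
    required=["location"],
    call_when=["the user asks about weather, temperature, or climate conditions"],
    limitations=["historical weather data"],
)
print(weather_tool["description"])
```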
How does the basic function calling loop work?
Here’s a working agent loop from scratch: no frameworks, just OpenAI’s SDK:
import json
import openai

def agent_loop(user_input: str, tools: list, system_prompt: str = None):
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_input})

    while True:
        response = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto",
            parallel_tool_calls=False  # one call per turn; parallel calls are handled below
        )

        # Check if the model wants to call a tool (the call isn't always output[0])
        function_calls = [item for item in response.output if item.type == "function_call"]

        if function_calls:
            tool_call = function_calls[0]

            # Extract function name and arguments (arguments arrive as a JSON string)
            func_name = tool_call.name
            func_args = json.loads(tool_call.arguments)
            print(f"  → Calling: {func_name}({func_args})")

            # Execute the function
            if func_name == "get_weather":
                result = get_weather(**func_args)
            elif func_name == "get_air_quality":
                result = get_air_quality(**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            # Echo the function call back, then add its result, linked by call_id
            messages.append(tool_call)
            messages.append({
                "type": "function_call_output",
                "call_id": tool_call.call_id,
                "output": json.dumps(result)
            })

            # Continue the loop: the model will use the tool result
            continue

        # No tool calls: return the text response
        return response.output_text
This is the core pattern. The loop:
- Sends messages to the model with available tools
- If the model requests a function call, executes it and sends the result back
- If the model returns text, we’re done
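The loop's control flow can be exercised without an API key by swapping the model call for a scripted fake. In this sketch, fake_model and the dict-shaped items are assumptions standing in for openai.responses.create() and its output objects (real message items carry their text inside a content list; a flat "text" field is a simplification here). The loop logic itself, execute every call, append its output, repeat until the model answers in text, follows agent_loop above.

```python
import json

def run_loop(call_model, functions, user_input, max_steps=5):
    """Minimal agent loop over dict-shaped Responses items (no network)."""
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        output = call_model(history)
        calls = [item for item in output if item["type"] == "function_call"]
        if not calls:
            # No tool calls: the (simplified) message item holds the final text
            return next(item["text"] for item in output if item["type"] == "message")
        history.extend(output)  # echo the calls back before their results
        for call in calls:
            result = functions[call["name"]](**json.loads(call["arguments"]))
            history.append({
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result),
            })
    return "stopped: max steps reached"

def fake_model(history):
    """Scripted stand-in for the API: asks for the weather once, then answers."""
    outputs = [h for h in history if h.get("type") == "function_call_output"]
    if not outputs:
        return [{"type": "function_call", "call_id": "call_1", "name": "get_weather",
                 "arguments": json.dumps({"location": "Bengaluru"})}]
    temp = json.loads(outputs[-1]["output"])["temperature_c"]
    return [{"type": "message", "text": f"It's {temp}°C in Bengaluru."}]

functions = {"get_weather": lambda location: {"location": location, "temperature_c": 26}}
print(run_loop(fake_model, functions, "What's the weather in Bengaluru?"))
# → It's 26°C in Bengaluru.
```

The max_steps guard matters even in this toy: a model that keeps requesting tools would otherwise loop forever.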
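I'm using the newer openai.responses.create() API here (the Responses API), which is cleaner for agent loops than the older Chat Completions API. If you're on Chat Completions (client.chat.completions.create(), or openai.ChatCompletion.create() in pre-1.0 SDKs), the structure is similar: calls arrive in response.choices[0].message.tool_calls, you append that assistant message, and each result goes back as a {"role": "tool", "tool_call_id": ...} message. In the Responses API you instead append the function_call item itself, followed by a {"type": "function_call_output", "call_id": ...} item.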
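Since the two APIs use the same fields in different shapes, tool definitions can be kept in one format and converted. This is a minimal sketch assuming function tools only; the helper names are hypothetical. Any other per-function fields you set (such as strict) move along with name, description, and parameters.

```python
def to_chat_completions_tool(tool: dict) -> dict:
    """Nest a flat Responses-API function tool under "function" for Chat Completions."""
    fields = {k: v for k, v in tool.items() if k != "type"}
    return {"type": "function", "function": fields}

def to_responses_tool(tool: dict) -> dict:
    """Inverse: flatten a Chat Completions function tool for the Responses API."""
    return {"type": "function", **tool["function"]}

responses_tool = {
    "type": "function",
    "name": "get_air_quality",
    "description": "Get air quality index and PM2.5 data for a location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

chat_tool = to_chat_completions_tool(responses_tool)
print(chat_tool["function"]["name"])  # → get_air_quality
assert to_responses_tool(chat_tool) == responses_tool  # round-trips cleanly
```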
How does parallel function calling work?
One of the biggest improvements in recent OpenAI models is parallel function calling: the model can request multiple function calls at once. This matters for latency, because independent lookups no longer cost one model round trip each.
When a user asks “What’s the weather and air quality in Bengaluru?”, the model can request both get_weather and get_air_quality in a single response instead of one per turn. Whether they then execute concurrently is up to your code.
def agent_loop_parallel(user_input: str, tools: list):
    messages = [{"role": "user", "content": user_input}]

    while True:
        response = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto"
        )

        # Collect all function calls
        function_calls = [item for item in response.output if item.type == "function_call"]

        if function_calls:
            # Keep the model's output (including every function call) in the history
            messages.extend(response.output)

            # Execute ALL function calls (sequentially here; see the thread pool below)
            for fc in function_calls:
                func_name = fc.name
                func_args = json.loads(fc.arguments)
                print(f"  → Calling: {func_name}({func_args})")

                if func_name == "get_weather":
                    result = get_weather(**func_args)
                elif func_name == "get_air_quality":
                    result = get_air_quality(**func_args)
                else:
                    result = {"error": f"Unknown function: {func_name}"}

                # One output item per call, matched by call_id
                messages.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": json.dumps(result)
                })

            continue

        return response.output_text
The key insight: execute all parallel calls before returning to the model. The next request should contain one function_call_output for every call the model made.
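A simple way to honor that rule is to produce an output for every call no matter what happens, turning unknown tools and exceptions into error results instead of dropping them. A follow-up request that leaves a call unanswered is likely to be rejected by the API. In this sketch, plain dicts stand in for the SDK's function_call objects, and get_stock_price is a hypothetical tool the model "hallucinated".

```python
import json

def outputs_for_all_calls(function_calls, functions):
    """Return exactly one function_call_output per call, even for unknown or failing tools."""
    outputs = []
    for fc in function_calls:
        func = functions.get(fc["name"])
        try:
            if func is None:
                result = {"error": f"Unknown function: {fc['name']}"}
            else:
                result = func(**json.loads(fc["arguments"]))
        except Exception as e:  # report the failure instead of dropping the call
            result = {"error": str(e)}
        outputs.append({"type": "function_call_output",
                        "call_id": fc["call_id"],
                        "output": json.dumps(result)})
    return outputs

def unanswered(function_calls, outputs):
    """call_ids that still lack an output; this should be empty before the next request."""
    answered = {o["call_id"] for o in outputs}
    return [fc["call_id"] for fc in function_calls if fc["call_id"] not in answered]

calls = [
    {"call_id": "call_a", "name": "get_weather", "arguments": '{"location": "Bengaluru"}'},
    {"call_id": "call_b", "name": "get_air_quality", "arguments": '{"location": "Bengaluru"}'},
    {"call_id": "call_c", "name": "get_stock_price", "arguments": "{}"},
]
functions = {
    "get_weather": lambda location: {"temperature_c": 26},
    "get_air_quality": lambda location: 1 / 0,  # simulated failure
}

outs = outputs_for_all_calls(calls, functions)
print([json.loads(o["output"]) for o in outs])
# → [{'temperature_c': 26}, {'error': 'division by zero'}, {'error': 'Unknown function: get_stock_price'}]
```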
For performance, I run parallel calls with concurrent.futures.ThreadPoolExecutor:
import concurrent.futures

def execute_parallel_calls(function_calls, execute_function):
    """Execute multiple function calls in parallel using threads.

    execute_function(fc) runs a single call, e.g.
    lambda fc: safe_execute_function(fc.name, json.loads(fc.arguments)).
    Returns (call_id, result) pairs in completion order, not request order.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_call = {
            executor.submit(execute_function, fc): fc
            for fc in function_calls
        }
        results = []
        for future in concurrent.futures.as_completed(future_to_call):
            fc = future_to_call[future]
            try:
                result = future.result()
                results.append((fc.call_id, result))
            except Exception as e:
                results.append((fc.call_id, {"error": str(e)}))
    return results
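To see why threads help here, the sketch below simulates three I/O-bound tools with time.sleep (slow_tool and get_forecast are hypothetical). Because the sleeps overlap, the batch takes about as long as the slowest call rather than the sum. Threads suit network-bound tools like HTTP calls; CPU-heavy tools would gain little in CPython because of the GIL.

```python
import concurrent.futures
import time

def slow_tool(name: str, delay: float) -> dict:
    """Hypothetical I/O-bound tool: sleeps to simulate a network call."""
    time.sleep(delay)
    return {"tool": name}

calls = [("get_weather", 0.2), ("get_air_quality", 0.2), ("get_forecast", 0.2)]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() yields results in submission order, unlike as_completed()
    results = list(executor.map(lambda c: slow_tool(*c), calls))
elapsed = time.perf_counter() - start

print(results)
# → [{'tool': 'get_weather'}, {'tool': 'get_air_quality'}, {'tool': 'get_forecast'}]
# The three 0.2s sleeps overlap, so elapsed is close to one delay, not three
print(f"elapsed: {elapsed:.2f}s")
```

executor.map is the simpler choice when you want results in request order; as_completed, used above, lets you handle each result as soon as it finishes.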
How does streaming work with function calls?
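Streaming complicates function calling because the arguments arrive as a series of partial JSON string deltas instead of one complete object. In the Responses API, a response.output_item.added event announces each function call (with its name and call_id), and response.function_call_arguments.delta events carry argument fragments tagged with an output_index that tells you which call they belong to. Chat Completions works the same way, grouping deltas by delta.tool_calls[i].index.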
def agent_loop_streaming(user_input: str, tools: list):
    messages = [{"role": "user", "content": user_input}]

    while True:
        stream = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto",
            stream=True
        )

        # Collect streaming events
        text_content = ""
        tool_calls = {}  # output_index → {"call_id", "name", "arguments"}

        for event in stream:
            if event.type == "response.output_text.delta":
                text_content += event.delta
            elif event.type == "response.output_item.added" and event.item.type == "function_call":
                # A new function call starts: its name and call_id arrive here
                tool_calls[event.output_index] = {
                    "call_id": event.item.call_id,
                    "name": event.item.name,
                    "arguments": ""
                }
            elif event.type == "response.function_call_arguments.delta":
                # Argument fragment for the call at this output_index
                tool_calls[event.output_index]["arguments"] += event.delta

        # After streaming completes, process tool calls
        if tool_calls:
            for call in tool_calls.values():
                # Echo the reassembled call back, then add its result
                messages.append({"type": "function_call", **call})
                func_args = json.loads(call["arguments"])
                if call["name"] == "get_weather":
                    result = get_weather(**func_args)
                else:
                    result = {"error": f"Unknown function: {call['name']}"}
                messages.append({
                    "type": "function_call_output",
                    "call_id": call["call_id"],
                    "output": json.dumps(result)
                })
            continue

        return text_content
When streaming, always check the stream event type before accessing fields. Different SDK versions structure streaming events differently. I've been burnt by this twice: test your stream parsing against the actual SDK version you're using.
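One way to test stream parsing without a live connection is to feed the accumulator hand-built events. The sketch below uses types.SimpleNamespace objects that mimic the Responses event fields used above (type, output_index, item, delta); the event values are made up, and the fragments are deliberately interleaved to show that keying by output_index keeps two parallel calls apart.

```python
import json
from types import SimpleNamespace as Event

def collect_tool_calls(events):
    """Reassemble streamed function calls, keyed by output_index."""
    text, calls = "", {}
    for event in events:
        if event.type == "response.output_text.delta":
            text += event.delta
        elif event.type == "response.output_item.added" and event.item.type == "function_call":
            calls[event.output_index] = {"call_id": event.item.call_id,
                                         "name": event.item.name, "arguments": ""}
        elif event.type == "response.function_call_arguments.delta":
            calls[event.output_index]["arguments"] += event.delta
    # Return calls in the order they appear in the response
    return text, [calls[i] for i in sorted(calls)]

# Hand-built events mimicking two parallel calls (values are illustrative)
events = [
    Event(type="response.output_item.added", output_index=0,
          item=Event(type="function_call", call_id="call_w", name="get_weather")),
    Event(type="response.output_item.added", output_index=1,
          item=Event(type="function_call", call_id="call_aq", name="get_air_quality")),
    Event(type="response.function_call_arguments.delta", output_index=0, delta='{"locat'),
    Event(type="response.function_call_arguments.delta", output_index=1, delta='{"location": "Beng'),
    Event(type="response.function_call_arguments.delta", output_index=0, delta='ion": "Bengaluru"}'),
    Event(type="response.function_call_arguments.delta", output_index=1, delta='aluru"}'),
]

text, calls = collect_tool_calls(events)
for call in calls:
    print(call["name"], json.loads(call["arguments"]))
# → get_weather {'location': 'Bengaluru'}
# → get_air_quality {'location': 'Bengaluru'}
```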
How do I handle errors in function calls?
Function calls fail. APIs return 500s. Networks drop. Models send invalid arguments. Your agent needs to handle these gracefully.
def safe_execute_function(func_name: str, func_args: dict) -> dict:
    """Execute a function with error handling. Returns a result dict regardless of outcome."""
    try:
        if func_name == "get_weather":
            return get_weather(**func_args)
        elif func_name == "get_air_quality":
            return get_air_quality(**func_args)
        else:
            return {"error": f"Unknown function: {func_name}", "success": False}
    except TypeError as e:
        # Raised by **func_args for missing, unexpected, or misnamed parameters
        return {"error": f"Invalid arguments: {e}", "success": False, "args": func_args}
    except KeyError as e:
        # Raised if the tool itself looks up a key that isn't there
        return {"error": f"Missing key: {e}", "success": False}
    except Exception as e:
        return {"error": f"Function execution failed: {str(e)}", "success": False}
When a function fails, return a structured error message to the model. The model can then:
- Explain the error to the user
- Try again with corrected arguments
- Try a different approach
Models handle errors surprisingly well if you return clear error messages. I’ve had the model suggest fixes for API credential issues based on the error text alone.
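The clearer the error, the better the model's retry. One option is to check arguments against the tool's own parameters schema before executing anything, so the model is told exactly which field was wrong. The check_args helper below is a hypothetical, deliberately minimal check (required keys, unknown keys, enums only); for full JSON Schema validation, the third-party jsonschema package is the usual choice.

```python
def check_args(schema: dict, args: dict) -> list:
    """Minimal pre-execution check against a tool's parameters schema.

    Covers only required keys, unknown keys, and enums; not a full JSON Schema validator.
    """
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required parameter '{key}'")
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected parameter '{key}'")
        elif "enum" in props[key] and value not in props[key]["enum"]:
            problems.append(f"'{key}' must be one of {props[key]['enum']}, got {value!r}")
    return problems

weather_params = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

problems = check_args(weather_params, {"units": "kelvin"})
error_result = {"error": "; ".join(problems), "success": False}
print(error_result["error"])
# → missing required parameter 'location'; 'units' must be one of ['celsius', 'fahrenheit'], got 'kelvin'
```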
How does OpenAI function calling compare to Anthropic?
I build with both providers. Here’s how they compare for function calling:
| Aspect | OpenAI | Anthropic |
|---|---|---|
| Tool definition | JSON Schema in each tool's parameters field | JSON Schema in each tool's input_schema field |
| Response format | function_call output items (Responses API); tool_calls array on the message (Chat Completions) | content blocks with tool_use type |
| Parallel calls | Native in one response | Native in one response |
| Streaming | Argument deltas grouped by output index | Content block deltas (input_json_delta) |
| Thinking before tools | Not with gpt-4o; reasoning models (e.g. o3) reason before calling | Optional extended thinking before tool calls |
| Error recovery | Good with clear messages | Better in my experience: Claude is more cautious about retrying |
In my experience, Anthropic’s key difference is Claude’s optional thinking before tool calls, which produces better results for complex multi-step reasoning. OpenAI’s non-reasoning models such as gpt-4o tend to call tools more eagerly, but also more prematurely.
I use OpenAI for simpler tool use (fetch data, compute results) and Anthropic when the agent needs to reason deeply before acting (multi-step analysis, research agents).
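Because both providers use JSON Schema, one set of tool definitions can serve both. A minimal sketch, assuming function tools in the flat Responses format; to_anthropic_tool is a hypothetical helper. Only the definitions map this directly: on the response side Anthropic returns tool_use blocks (id, name, and an already-parsed input object) and expects tool_result blocks carrying the matching tool_use_id back in a user message.

```python
def to_anthropic_tool(tool: dict) -> dict:
    """Map a flat Responses-API function tool to Anthropic's tool shape.

    Anthropic expects the same JSON Schema under "input_schema" instead of "parameters".
    """
    converted = {"name": tool["name"], "input_schema": tool["parameters"]}
    if "description" in tool:
        converted["description"] = tool["description"]
    return converted

openai_tool = {
    "type": "function",
    "name": "get_air_quality",
    "description": "Get air quality index and PM2.5 data for a location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

anthropic_tool = to_anthropic_tool(openai_tool)
print(sorted(anthropic_tool))  # → ['description', 'input_schema', 'name']
```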
When function calling breaks
After months of production use, here’s what causes function calling to fail:
Ambiguous schemas. If two functions have overlapping descriptions (e.g., search_documents and search_web), the model gets confused about which to call. I’ve seen the model call search_documents when it should call search_web simply because the descriptions weren’t distinct enough.
Fix: Make descriptions mutually exclusive. “Use for searching the local document store” vs “Use for searching the internet.”
Contradictory instructions. If your system prompt says “Never make up information” but you also have a generate_report function that expects complete data, the model may refuse to call the function because it can’t satisfy both constraints.
Fix: Review your system prompt for conflicts with tool descriptions.
Omitted optional parameters. The model sometimes omits optional parameters it should include. Marking the parameter as required in the JSON Schema forces the model to provide it, but increases the chance of hallucinated values.
Fix: Accept reasonable defaults in your function implementation instead of requiring the model to provide every parameter.
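Applied to the get_weather schema from earlier, that means leaving units optional and choosing a default in code. The default_units rule and the short list of US suffixes below are hypothetical and deliberately incomplete (note that "CA" could also mean Canada), and the temperature is a stub value rather than a real API call.

```python
from typing import Optional

# Hypothetical: a few US suffixes, just enough for the example (not exhaustive)
US_SUFFIXES = {"US", "USA", "CA", "NY", "TX", "WA"}

def default_units(location: str) -> str:
    """Guess units from a 'City, Region' string: Fahrenheit for US, else Celsius."""
    suffix = location.rsplit(",", 1)[-1].strip().upper()
    return "fahrenheit" if suffix in US_SUFFIXES else "celsius"

def get_weather(location: str, units: Optional[str] = None) -> dict:
    """Hypothetical tool: 'units' stays optional in the schema and defaults here."""
    if units is None:
        units = default_units(location)
    temperature_c = 26.0  # stub reading in place of a real weather API call
    if units == "fahrenheit":
        temperature = round(temperature_c * 9 / 5 + 32, 1)
    else:
        temperature = temperature_c
    return {"location": location, "units": units, "temperature": temperature}

print(get_weather("Bengaluru, India"))
print(get_weather("San Francisco, CA"))
```

The model can still pass units explicitly when the user asks for them; the default only fills the gap when it doesn't.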
How do I build an agent from scratch with function calling?
Here’s the complete agent pattern I use for production. It’s about 60 lines of Python with no framework dependencies:
import json
import openai
from datetime import datetime

class FunctionCallingAgent:
    def __init__(self, tools: list, functions: dict, model="gpt-4o", max_steps=10):
        self.tools = tools
        self.functions = functions  # {"function_name": callable}
        self.model = model
        self.max_steps = max_steps
        self.messages = []

    def run(self, user_input: str) -> str:
        self.messages = [
            {"role": "system", "content": f"You are a helpful assistant. Today is {datetime.now().strftime('%Y-%m-%d')}. Use tools when needed."},
            {"role": "user", "content": user_input}
        ]

        # The step budget is per run, so reusing the agent doesn't inherit old steps
        for _ in range(self.max_steps):
            response = openai.responses.create(
                model=self.model,
                input=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )

            function_calls = [o for o in response.output if o.type == "function_call"]
            if not function_calls:
                return response.output_text

            # Keep the model's output (including all function calls) before the results
            self.messages.extend(response.output)

            # Execute all tool calls
            for fc in function_calls:
                func = self.functions.get(fc.name)
                if not func:
                    result = {"error": f"Unknown function: {fc.name}"}
                else:
                    try:
                        args = json.loads(fc.arguments)
                        result = func(**args)
                    except Exception as e:
                        result = {"error": str(e)}

                self.messages.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": json.dumps(result)
                })

        return "Agent stopped: max steps reached."

# Usage: assumes the tools list and the get_weather / get_air_quality
# implementations from the earlier sections
functions = {
    "get_weather": get_weather,
    "get_air_quality": get_air_quality,
}

agent = FunctionCallingAgent(tools, functions)
result = agent.run("What's the weather and air quality in Bengaluru?")
That’s it. No LangChain. No LangGraph. One class, about 60 lines, production-ready once you add logging and more robust error handling on top.
Function calling is the foundation. Everything else (state machines, multi-agent orchestration, monitoring) is built on top of this pattern. Master this first, and you can build anything.
FAQ
What is OpenAI function calling? Function calling is an OpenAI API feature where the model can request execution of functions you define. Instead of generating text only, the model outputs structured JSON tool call requests that your application executes and returns results for.
How do I handle parallel function calls? When the model requests multiple function calls in a single response, iterate through all tool calls, execute each one, append all results to the messages array, and send back to the API. The model processes all results together.
Does function calling work with streaming? Yes. In streaming mode, function call arguments arrive as partial deltas tagged with the index of the call they belong to (output_index in the Responses API, tool_calls[i].index in Chat Completions). You accumulate the full arguments by concatenating the deltas for each index. Once the stream finishes, you parse the arguments, execute the function, and send the result back.
How is OpenAI function calling different from Anthropic tool use? Both use JSON Schema for tool definitions (parameters in OpenAI, input_schema in Anthropic). OpenAI returns function calls as structured objects (function_call items in the Responses API, tool_calls in Chat Completions), while Anthropic returns tool_use content blocks, optionally preceded by a thinking block when extended thinking is enabled, and expects results back as tool_result blocks in a user message. OpenAI’s approach is more straightforward for simple use cases.
This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at hello@agenticup.dev.