Most of the time you already have an API — you built or subscribed to it in the dashboard — and you just want to call it. That’s the first example. The second shows how to create an API from code, for when you want the whole flow automated.
Set your key first:
export PARSE_API_KEY="pmx_your_key_here"
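The examples on this page read the key with `os.environ["PARSE_API_KEY"]`, which fails with a bare `KeyError` if the variable is missing. A minimal sketch of a friendlier check follows; `load_api_key` is a hypothetical helper name used for illustration, not part of any SDK.

```python
import os


def load_api_key(env=None) -> str:
    """Return PARSE_API_KEY, or raise with a hint on how to set it.

    `load_api_key` is a hypothetical helper, not part of any SDK.
    `env` defaults to os.environ and exists only to make testing easy.
    """
    env = os.environ if env is None else env
    key = env.get("PARSE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'PARSE_API_KEY is not set; run: export PARSE_API_KEY="pmx_your_key_here"'
        )
    return key
```

You could then pass `load_api_key()` wherever the examples use `os.environ["PARSE_API_KEY"]`.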
Using an API
Say you have an arxiv.org API with three endpoints:
| Method | Endpoint | Parameters |
|---|---|---|
| GET | search_papers | query, author, title, category, sort_by, start |
| GET | get_paper | arxiv_id |
| GET | get_category_taxonomy | (none) |
You call each one at https://api.parse.bot/scraper/{scraper_id}/{endpoint_name}. GET endpoints take query-string params; the response is the endpoint’s own JSON. Grab your scraper_id from the API’s page in the dashboard (or the “Now plug it in” snippet).
import os

import httpx

SCRAPER_ID = "9380e1b0-fae2-4340-9056-3d416f86c775"

client = httpx.Client(
    base_url=f"https://api.parse.bot/scraper/{SCRAPER_ID}",
    headers={"X-API-Key": os.environ["PARSE_API_KEY"]},
    timeout=60,
)


def call(endpoint: str, **params):
    r = client.get(f"/{endpoint}", params=params)
    r.raise_for_status()  # raises on 4xx/5xx; see the Errors guide
    return r.json()


# Search for papers
results = call("search_papers", query="diffusion models", sort_by="relevance")
print(results)

# Fetch one paper's metadata
paper = call("get_paper", arxiv_id="2301.00001")
print(paper)

# Browse the category taxonomy (no params)
taxonomy = call("get_category_taxonomy")
print(taxonomy)
Some endpoints are POST instead of GET. For those, send the params in a JSON body instead of the query string: -d '{"page": 1}' with curl, json={...} with httpx, or body: JSON.stringify(...) in JavaScript. The endpoint’s page in the dashboard tells you which method it uses.
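As a rough sketch of that GET/POST split, the helper below builds the keyword arguments for httpx's `Client.request()` from the method. `request_kwargs` and the endpoint name in the usage comment are hypothetical names chosen for illustration.

```python
def request_kwargs(method: str, params: dict) -> dict:
    """Build keyword arguments for httpx's Client.request().

    GET parameters go in the query string (`params`); POST parameters go in
    a JSON body (`json`). `request_kwargs` is a hypothetical helper name.
    """
    method = method.upper()
    if method == "GET":
        return {"method": "GET", "params": params}
    if method == "POST":
        return {"method": "POST", "json": params}
    raise ValueError(f"unsupported method: {method}")


# Usage with the `client` from the example above (endpoint name is made up):
#   r = client.request(url="/some_post_endpoint", **request_kwargs("POST", {"page": 1}))
#   r.raise_for_status()
```

Keeping the method decision in one place means the calling code does not need separate GET and POST branches.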
Creating an API from code
This is the automated build flow: submit a URL, poll until it’s ready, then call it. Use it when you want to spin up APIs programmatically rather than in the dashboard.
import os
import time

import httpx

BASE = "https://api.parse.bot"

client = httpx.Client(
    base_url=BASE,
    headers={"X-API-Key": os.environ["PARSE_API_KEY"]},
    timeout=60,
)


def create_api(url: str, task: str | None = None) -> str:
    """Kick off a build and return the task_id."""
    r = client.post("/dispatch", json={"url": url, "task": task})
    r.raise_for_status()
    body = r.json()
    print(f"task_id={body['task_id']} matched={body['matched']}")
    return body["task_id"]


def wait_for_completion(task_id: str, interval: float = 4.0, timeout: float = 300):
    """Poll until the task reaches a terminal state. Returns the generated_api."""
    # Use a monotonic clock so system clock changes cannot skew the deadline.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        r = client.get(f"/dispatch/tasks/{task_id}")
        r.raise_for_status()  # fail loudly instead of parsing an error body
        task = r.json()
        status = task["status"]
        print(f"status={status}")
        if status == "completed":
            return task["generated_api"]
        if status == "failed":
            raise RuntimeError(f"Build failed: {task.get('error')}")
        if status == "cancelled":
            raise RuntimeError("Build was cancelled")
        if status == "needs_input":
            # Answer the agent's question. Inspect task["user_input_prompt"]
            # to see what it's asking; here we send a generic example.
            print(f"agent needs input: {task.get('user_input_prompt')}")
            reply = client.post(f"/dispatch/{task_id}", json={
                "user_response": {"search_term": "example"},
            })
            reply.raise_for_status()
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} did not finish in {timeout}s")


if __name__ == "__main__":
    task_id = create_api("https://books.toscrape.com", "get book titles, prices, and ratings")
    api = wait_for_completion(task_id)

    print(f"\nBuilt '{api['name']}': {len(api['endpoints'])} endpoint(s)")
    for ep in api["endpoints"]:
        print(f"  {ep['method']} {ep['endpoint_name']}: {ep['description']}")

    # Now call it, using the same pattern as "Using an API" above.
    # "get_books" is an example name; use an endpoint and method printed above.
    scraper_id = api["scraper_id"]
    r = client.post(f"/scraper/{scraper_id}/get_books", json={"page": 1})
    r.raise_for_status()
    print("\nResult:", r.json())
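The final call in the script hard-codes an endpoint name and method, but the `endpoints` list in `generated_api` already carries both. One possible way to look them up instead is sketched below. `find_endpoint` is a hypothetical helper, and it assumes only the fields the script itself reads (`scraper_id`, `method`, `endpoint_name`).

```python
from typing import Optional, Tuple


def find_endpoint(api: dict, name: Optional[str] = None) -> Tuple[str, str]:
    """Return (method, path) for an endpoint listed in a generated_api dict.

    With no name, the first listed endpoint is used. `find_endpoint` is a
    hypothetical helper; it relies only on fields the script above reads.
    """
    endpoints = api["endpoints"]
    if not endpoints:
        raise LookupError("generated API has no endpoints")
    if name is None:
        ep = endpoints[0]
    else:
        matches = [e for e in endpoints if e["endpoint_name"] == name]
        if not matches:
            known = ", ".join(e["endpoint_name"] for e in endpoints)
            raise LookupError(f"no endpoint {name!r}; available: {known}")
        ep = matches[0]
    path = f"/scraper/{api['scraper_id']}/{ep['endpoint_name']}"
    return ep["method"].upper(), path
```

The returned pair can be passed to `client.request(method, path, ...)`, with the parameters sent as a query string or JSON body depending on the method.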
The JavaScript equivalent follows the same shape — POST /dispatch, poll GET /dispatch/tasks/{id}, then call /scraper/{id}/{endpoint}.
Notes
- Reuse one HTTP client so connections are pooled across calls.
- Back off on 429. Honor Retry-After and the X-RateLimit-* headers.
- Check the HTTP status, not just the body. A 502 means the target site failed, a 500 means the scraper bugged out; they call for different handling. See Errors.
- The standard library’s urllib and the third-party requests package work just as well as httpx.
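The backoff and status notes above could be sketched roughly as follows. The retry policy here is an assumption, not documented behavior: it treats 429 and 502 as worth retrying and 500 as something to report rather than retry. `should_retry` and `retry_delay` are hypothetical helper names, and the sketch reads only the standard `Retry-After` header because the exact `X-RateLimit-*` header names are not given on this page.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed policy: rate limits (429) and target-site failures (502) may clear
# up on their own; a scraper bug (500) probably will not.
RETRYABLE_STATUSES = {429, 502}


def should_retry(status_code: int) -> bool:
    """True if the assumed policy says this status is worth retrying."""
    return status_code in RETRYABLE_STATUSES


def retry_delay(headers, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Honors Retry-After (delta-seconds or an HTTP date) when present;
    otherwise falls back to capped exponential backoff.
    """
    value = headers.get("Retry-After")
    if value is not None:
        value = value.strip()
        if value.isdigit():
            return float(value)
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    return min(cap, base * 2 ** attempt)
```

In a retry loop you would call `time.sleep(retry_delay(r.headers, attempt))` when `should_retry(r.status_code)` is true. Adding random jitter to the fallback delay is a common refinement; it is left out here to keep the sketch deterministic.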