Set up some basic imports and variables:
import requests
import json
import time
# API configuration
BASE_URL = 'https://api.usesieve.com'
# Authentication credentials
email = "your.email@example.com"
password = "your_password"
user_type = "requester"
full_name = "Tutorial User"
registration_response = requests.post(f"{BASE_URL}/api/v1/auth/register", json={
    "email": email,
    "password": password,
    "user_type": user_type,
    "full_name": full_name
})
print(registration_response.json())
Authenticate with Sieve:
login_result = requests.post(f"{BASE_URL}/api/v1/auth/login", json={
    "email": email,
    "password": password
})
if login_result.status_code == 200:
    print("Login successful!")
else:
    print("Login failed. Check your credentials.")

# Save the token for future API calls
token = login_result.json().get('token')

# Set the token in the request headers
headers = {
    'Authorization': f'Bearer {token}',
    'Content-Type': 'application/json'
}
Let's extract specific data points from a PDF document:
pdf_request_response = requests.post(f'{BASE_URL}/api/v1/process', headers=headers, json={
    "document_type": "pdf",
    "metadata": {
        "url": "https://www.cecafe.com.br/site/wp-content/uploads/2023/03/CECAFE-Monthly-Coffee-Report-JANUARY-2025.pdf",
        "data_points": ["Coffee export volume in most recent period (USD)"]
    }
})
pdf_request_response.raise_for_status()
pdf_job_id = pdf_request_response.json().get("job_id")
print("job id: ", pdf_job_id)
status_url = f'{BASE_URL}/api/v1/status/{pdf_job_id}'
pdf_response = requests.get(status_url, headers=headers)
pdf_response.raise_for_status()
pdf_result = pdf_response.json()
print("status:", pdf_result.get("status"))
print("result:", pdf_result)
if pdf_result.get("status") == "completed":
    print("response:", pdf_result.get("ai_result").get("result"))
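Jobs are processed asynchronously, so a status call made immediately after submission will usually not yet report completion. A simple polling loop can wait for the result (this is a sketch: the `wait_for_result` helper and its `fetch` parameter are our own names, and status values other than `completed` are assumptions based on the responses above):

```python
import time

BASE_URL = 'https://api.usesieve.com'

def wait_for_result(job_id, fetch, timeout=120, interval=5):
    """Poll the status endpoint until the job completes or the timeout expires.

    `fetch` is a callable taking a URL and returning the parsed JSON body,
    which keeps the polling logic independent of the HTTP client.
    """
    status_url = f'{BASE_URL}/api/v1/status/{job_id}'
    deadline = time.time() + timeout
    while time.time() < deadline:
        body = fetch(status_url)
        if body.get("status") == "completed":
            return body.get("ai_result", {}).get("result")
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not complete within {timeout}s")
```

With the authenticated session above, `fetch` could be `lambda u: requests.get(u, headers=headers).json()`, e.g. `wait_for_result(pdf_job_id, lambda u: requests.get(u, headers=headers).json())`.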
Extract specific financial data points from SEC filings or other financial documents:
financial_request_response = requests.post(f'{BASE_URL}/api/v1/process', headers=headers, json={
    "document_type": "sec",
    "content": "None",
    "metadata": {
        "ticker": "EBAY",
        "metrics": ["Cost of goods sold"]
    }
})
financial_request_response.raise_for_status()
financial_job_id = financial_request_response.json().get("job_id")
print("job id: ", financial_job_id)
status_url = f'{BASE_URL}/api/v1/status/{financial_job_id}'
financial_response = requests.get(status_url, headers=headers)
financial_response.raise_for_status()
financial_result = financial_response.json()
print("status:", financial_result.get("status"))
if financial_result.get("status") == "completed":
    print("response:", financial_result.get("ai_result").get("result"))
Map a credit card transaction description to a merchant's stock ticker:
cc_request_response = requests.post(f'{BASE_URL}/api/v1/process', headers=headers, json={
    "document_type": "credit_card",
    "metadata": {
        "transaction": "WHOLEFDS MKT 10259 AUSTIN TX",
        "transaction_date": "2024-01-15",
        "amount": 156.78,
        "data_points": ["merchant stock ticker"]
    }
})
cc_request_response.raise_for_status()
cc_job_id = cc_request_response.json().get("job_id")
print("job id: ", cc_job_id)
status_url = f'{BASE_URL}/api/v1/status/{cc_job_id}'
cc_response = requests.get(status_url, headers=headers)
cc_response.raise_for_status()
cc_result = cc_response.json()
print("status:", cc_result.get("status"))
if cc_result.get("status") == "completed":
    print("response:", cc_result.get("ai_result").get("result"))
Retrieve data from a specified website:
web_request_response = requests.post(
    f"{BASE_URL}/api/v1/process",
    headers=headers,
    json={
        "document_type": "website",
        "metadata": {
            "url": "https://barometricpressure.app/new-york",
            "data_points": ["Barometric pressure in nyc"]
        }
    }
)
web_request_response.raise_for_status()
web_job_id = web_request_response.json().get("job_id")
print("job id: ", web_job_id)
status_url = f'{BASE_URL}/api/v1/status/{web_job_id}'
web_response = requests.get(status_url, headers=headers)
web_response.raise_for_status()
web_result = web_response.json()
print("result:", web_result)
print("status:", web_result.get("status"))
if web_result.get("status") == "completed":
    print("response:", web_result.get("ai_result").get("result"))
Retrieve a verified earnings date for any public company:
earnings_response = requests.post(
    f"{BASE_URL}/api/v1/process",
    headers=headers,
    json={
        "document_type": "earnings_search",
        "metadata": {
            "company": "COST"  # company name or ticker
        }
    }
)
earnings_response.raise_for_status()
earnings_job_id = earnings_response.json()["job_id"]
print(f"Job ID: {earnings_job_id}")
earnings_status_response = requests.get(
    f"{BASE_URL}/api/v1/status/{earnings_job_id}",
    headers=headers
)
earnings_status_response.raise_for_status()
earnings_result = earnings_status_response.json()
print("status:", earnings_result.get("status"))
if earnings_result.get("status") == "completed":
    print("response:", earnings_result.get("ai_result").get("result"))
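Every example above follows the same submit-then-poll pattern, so the submission step can be factored into a small helper. This is a sketch against the `/api/v1/process` endpoint shown above; the `submit_request` helper and its `post` parameter are our own names, and error handling is deliberately minimal:

```python
BASE_URL = 'https://api.usesieve.com'

def submit_request(document_type, metadata, post):
    """Submit a processing request and return the job id.

    `post` is a callable taking (url, payload) and returning the parsed
    JSON response, keeping the helper independent of the HTTP client.
    """
    payload = {"document_type": document_type, "metadata": metadata}
    body = post(f"{BASE_URL}/api/v1/process", payload)
    return body.get("job_id")
```

With the authenticated session above, `post` could be `lambda url, payload: requests.post(url, headers=headers, json=payload).json()`, after which each example reduces to one call, e.g. `submit_request("website", {"url": "...", "data_points": ["..."]}, post)`.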
This section demonstrates how to use Sieve's web UI to extract structured data from various document types. We'll cover how to register, log in, and submit a data request, walking through an example that pulls a data point from Nvidia's latest 10-K report.
In your browser, navigate to app.usesieve.com, then click on the Register tab. You should see the following screen.
Enter your preferred email, password, and full name. Under User Type, select requester so you can submit data augmentation requests. See the example below.
Once you have registered, click on the Login tab and enter the email and password you registered with. Click the Login button to access the product. After you have logged in, you should see the following screen:
Now you are ready to start using Sieve! Let's extract specific data points from a PDF document. In this example, we will extract gaming revenue from Nvidia's 10-K report for the fiscal year ending January 26, 2025.
To submit this request, we will leave "Document Type" set to `pdf`.
Paste Nvidia's 10-K URL, shown below, into the PDF URL field.
https://s201.q4cdn.com/141608511/files/doc_financials/2025/q4/177440d5-3b32-4185-8cc8-95500a9dc783.pdf
Under Data Points to Extract, add the text `Gaming Revenue for year ending Jan 26, 2025 in USD`. Click Submit. You should see the page load for a few seconds before it confirms that the document has been submitted for processing. The example request is shown below.
Now that you have submitted a request, the data point is being identified by AI and validated by data professionals. You can watch for your result by clicking on the Request History tab and pressing the Refresh button. Once the data has been retrieved, the status will be updated to complete and the data will be available under Results.