Income Tax OCR APIs
OCR APIs are advanced document processing services that extract structured data from tax documents. Use these APIs to automatically parse Form 16 and Form 26AS documents for tax compliance and income verification workflows.
Key Features
Form 16 Extraction
Extract salary details, TDS information, and investment declarations from employer-issued TDS certificates.
Form 26AS Extraction
Extract creditable TDS amounts, deductor details, and tax payment information from annual TDS statements.
High Accuracy
Advanced AI models with confidence scoring ensure reliable data extraction from various document formats.
Bulk Processing
Process multiple documents asynchronously with job-based processing for scalability.
How It Works
Upload Document
Submit a PDF or image file, with optional metadata such as the taxpayer PAN and financial year.
AI Processing
The system automatically detects the document type and extracts text, tables, and structured data.
API Categories
Form 16 APIs
Extract data from TDS certificates issued by employers.
Form 26AS APIs
Extract data from annual TDS statements from the Income Tax Department.
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /it/ocr/form-16/pdf | POST | Extract data from Form 16 PDF documents |
| /it/ocr/form-26as/pdf | POST | Extract data from Form 26AS PDF documents |
Common Use Cases
- Income Verification: Extract salary and TDS details for loan applications and account opening
- Tax Compliance: Auto-populate ITR forms with extracted data from tax documents
- Payroll Reconciliation: Validate employee salary information against Form 16
- Document Archival: Index and search historical tax documents for audit purposes
- Bulk Processing: Process multiple employee documents for large organizations
Integration Examples
Basic Form 16 Extraction
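The upload step can be sketched as follows. This is a minimal illustration, not the documented request format: only the `/it/ocr/form-16/pdf` path and the POST method come from this page. The host `api.example.com`, the `x-api-key` header, the multipart encoding, and the field names `file`, `pan`, and `financial_year` are all assumptions, so check the API reference for the real values before using it.

```python
import uuid
from typing import Optional
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)
FORM16_PATH = "/it/ocr/form-16/pdf"   # path taken from the endpoint table


def build_form16_request(
    pdf_bytes: bytes,
    api_key: str,
    pan: Optional[str] = None,
    financial_year: Optional[str] = None,
) -> Request:
    """Build (but do not send) a multipart POST request for a Form 16 PDF."""
    boundary = uuid.uuid4().hex
    parts = []

    # Optional metadata fields; the field names are hypothetical.
    for name, value in (("pan", pan), ("financial_year", financial_year)):
        if value is not None:
            parts.append(
                (
                    f"--{boundary}\r\n"
                    f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                    f"{value}\r\n"
                ).encode("utf-8")
            )

    # The document itself; "file" is an assumed field name.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="file"; filename="form16.pdf"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode("utf-8")
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "x-api-key": api_key,  # assumed authentication header
    }
    return Request(
        BASE_URL + FORM16_PATH, data=b"".join(parts), headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the encoding logic easy to inspect and test without network access.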
Response Structure
Best Practices
- Document Quality: Upload clear, well-lit scans or digital PDFs for best extraction results
- Complete Documents: Ensure all pages are included for multi-page documents
- Metadata: Provide taxpayer PAN and financial year for better validation
- Confidence Thresholds: Set appropriate confidence thresholds based on your use case
- Validation: Always validate extracted data, especially for high-value decisions
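As one illustration of the metadata and validation practices above, the sketch below checks a PAN and a financial year on the client side before they are submitted or trusted. The PAN pattern follows the standard layout of five letters, four digits, and one letter. The `YYYY-YY` financial-year notation (for example `2023-24`) is an assumption, as the API may expect a different format, and both function names are hypothetical.

```python
import re

# Standard PAN layout: five letters, four digits, one letter.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Assumed financial-year notation "YYYY-YY", e.g. "2023-24".
FY_PATTERN = re.compile(r"(\d{4})-(\d{2})")


def is_valid_pan(pan: str) -> bool:
    """Return True if the string has the structural shape of a PAN."""
    return PAN_PATTERN.fullmatch(pan.strip().upper()) is not None


def is_valid_financial_year(fy: str) -> bool:
    """Return True if the end year is exactly one more than the start year."""
    match = FY_PATTERN.fullmatch(fy.strip())
    if match is None:
        return False
    start, end = int(match.group(1)), int(match.group(2))
    return (start + 1) % 100 == end
```

These checks confirm structure only. They do not prove that a PAN exists or belongs to the taxpayer, so they complement rather than replace review of the extracted values.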
Related Documentation
What document formats are supported?
The APIs support PDF documents and common image formats (JPEG, PNG). For best results, use original digital PDFs rather than scanned images.
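A client can check a file's type before uploading by looking at its leading bytes rather than trusting the file extension. The sketch below is a client-side convenience and not part of the API: the function name `detect_format` is hypothetical, while the byte signatures are the standard ones for PDF, PNG, and JPEG files.

```python
from typing import Optional

# Standard leading byte signatures for the formats listed above.
SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return "pdf", "png", or "jpeg" based on the file's first bytes, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

Reading only the first few bytes of a file (for example `open(path, "rb").read(8)`) is enough input for this check.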
How accurate is the OCR extraction?
Accuracy varies with document quality and type; confidence scores typically range from 85% to 95%. Each extracted field includes a confidence score that you can use for validation.
Can I process multiple documents at once?
Yes. The APIs support asynchronous, job-based processing for bulk workloads. Submit multiple documents and track progress via the job status endpoints.
What happens if extraction confidence is low?
Fields with low confidence scores are flagged in the response. You can set minimum confidence thresholds or implement manual review workflows for critical data.
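A minimum-confidence threshold with a manual review fallback might look like the sketch below. The response shape is hypothetical: it assumes each extracted field arrives as a mapping with `value` and `confidence` keys, and that confidence is expressed on a 0 to 1 scale. The actual response may use different key names or a 0 to 100 scale, so adapt the threshold accordingly.

```python
from typing import Any, Dict, Tuple


def split_by_confidence(
    fields: Dict[str, Dict[str, Any]], threshold: float = 0.85
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate fields that meet the threshold from those needing manual review."""
    accepted: Dict[str, Any] = {}
    needs_review: Dict[str, Any] = {}
    for name, field in fields.items():
        # A missing confidence score is treated as 0.0 so the field is reviewed.
        confidence = float(field.get("confidence", 0.0))
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.get("value")
    return accepted, needs_review


# Made-up sample data for illustration only; not real API output.
sample = {
    "employee_pan": {"value": "ABCDE1234F", "confidence": 0.97},
    "gross_salary": {"value": "1200000", "confidence": 0.62},
}
accepted, needs_review = split_by_confidence(sample, threshold=0.85)
```

Here `employee_pan` lands in `accepted` and `gross_salary` in `needs_review`. For high-value decisions, a stricter threshold sends more fields to manual review.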
Is there a file size limit?
Individual documents should be under 10MB. For larger files or bulk processing, contact support for optimized processing options.
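A pre-upload size check avoids sending files that exceed the limit. In this sketch the helper names are hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
import os

# Assumed interpretation of "10MB"; the service may count 10,000,000 bytes instead.
MAX_BYTES = 10 * 1024 * 1024


def within_size_limit(size_bytes: int, limit: int = MAX_BYTES) -> bool:
    """Return True for a non-empty file strictly under the limit."""
    return 0 < size_bytes < limit


def file_within_limit(path: str, limit: int = MAX_BYTES) -> bool:
    """Check the size of a file on disk against the limit."""
    return within_size_limit(os.path.getsize(path), limit)
```

Files that fail this check can be set aside for the optimized processing options mentioned above instead of being uploaded directly.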