Income Tax OCR APIs
The OCR (Optical Character Recognition) APIs automate the extraction and classification of data from tax documents. These APIs parse PDFs and images to extract structured information for tax workflows.
Supported Documents
Form 16
TDS Certificate from employers. Extract salary details, tax deducted, and investment declarations.
Form 26AS
TDS statement showing all tax deductions. Extract creditable tax amounts and deductor details.
How It Works
- Upload Document: Submit PDF or image file
- Processing: System extracts and classifies data
- Get Results: Receive structured JSON with confidence scores
Processing Flow
- Upload Document: Submit PDF/image with metadata
- Initial Classification: Auto-detect document type and version
- OCR Processing: Extract text and tabular data using advanced AI models
- Data Validation: Validate extracted fields against expected patterns
- Normalization: Standardize dates, numbers, and formats
- Return Results: Return structured JSON with confidence scores
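The normalization step can be illustrated with a minimal sketch. The function names, the accepted date formats, and the currency prefixes below are assumptions chosen for illustration; the source does not specify which input formats the service handles or what its normalized output looks like.

```python
from datetime import datetime
from decimal import Decimal

# Assumed input formats; the real service may accept others.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d-%b-%Y", "%Y-%m-%d")


def normalize_date(raw: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Strip grouping commas and rupee markers, then parse as Decimal."""
    cleaned = raw.replace(",", "").replace("₹", "").replace("Rs.", "").strip()
    return Decimal(cleaned)  # raises decimal.InvalidOperation on bad input
```

Using `Decimal` rather than `float` avoids rounding drift when amounts are later summed or compared.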
Response Structure
Successful Extraction
Confidence Scores
Each extracted field includes a confidence score (0.0 to 1.0):
- 0.95+: Very high confidence, can be used directly
- 0.85-0.95: High confidence, minimal validation needed
- 0.75-0.85: Moderate confidence, recommend review
- <0.75: Low confidence, manual verification recommended
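The bands above could be applied on the client side roughly as follows. This is a sketch: the tier names and the field shape (`{"value": ..., "confidence": ...}`) are hypothetical, and because the listed ranges share endpoints, assigning each boundary value to the higher tier is an assumption.

```python
from __future__ import annotations


def triage(confidence: float) -> str:
    """Map a confidence score to a handling tier (tier names are hypothetical)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence >= 0.95:
        return "use_directly"
    if confidence >= 0.85:
        return "minimal_validation"
    if confidence >= 0.75:
        return "review"
    return "manual_verification"


def fields_needing_review(fields: dict[str, dict]) -> list[str]:
    """Return the sorted names of fields in the two lowest tiers."""
    return sorted(
        name
        for name, field in fields.items()
        if triage(field["confidence"]) in ("review", "manual_verification")
    )
```

For example, a result with `pan` at 0.99 and `gross_salary` at 0.80 would route only `gross_salary` to a reviewer.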
Partial Extraction
If some fields cannot be extracted:
Integration Patterns
Customer KYC/Income Verification
- Accept Form 16 upload from customer
- Extract income and employer details
- Verify against other sources
- Use for lending decisions or account opening
Tax Compliance Workflows
- Employee uploads Form 16 from multiple employers
- System extracts TDS details from each
- Verify total TDS matches Form 26AS (reconcile discrepancies)
- Auto-populate ITR with extracted data
- Flag any mismatches for manual review
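The reconciliation step in this workflow might look like the sketch below. It assumes the TDS amounts have already been extracted as `Decimal` values; the function name, the result keys, and the one-rupee rounding tolerance are illustrative choices, not part of the API.

```python
from __future__ import annotations

from decimal import Decimal


def reconcile_tds(
    form16_tds: list[Decimal],
    form26as_tds: Decimal,
    tolerance: Decimal = Decimal("1"),  # assumed rounding allowance, in rupees
) -> dict:
    """Compare summed Form 16 TDS against the Form 26AS total."""
    total = sum(form16_tds, Decimal("0"))
    difference = total - form26as_tds
    return {
        "form16_total": total,
        "form26as_total": form26as_tds,
        "difference": difference,
        # Flag for manual review when the gap exceeds the tolerance.
        "needs_review": abs(difference) > tolerance,
    }
```

A positive `difference` means the Form 16s report more TDS than Form 26AS credits, which is the case most likely to need follow-up with the deductor.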
Payroll Integration
- HR uploads employee Form 16 PDFs in bulk
- Extract salary information and TDS
- Validate against payroll records
- Reconcile any discrepancies
- Generate compliance reports
Document Archival
- Upload historical tax documents
- Extract and index searchable data
- Create audit trail of document processing
- Enable quick retrieval and verification
Async Processing Pattern
Like Calculator APIs, OCR operations follow async job-based processing:
- Submit Document: POST upload request → Get job_id
- Check Status: GET job status endpoint → Receive processing progress
- Fetch Results: GET results endpoint → Download extraction results
This pattern enables:
- Processing of large documents without timeout
- Bulk processing of multiple documents
- Progress tracking for UI feedback
- Retry mechanisms for failed extractions
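A client-side polling loop for this pattern could be sketched as follows. The status-fetching call is passed in as a function because the actual endpoint paths are not shown here; the terminal status names (`completed`, `failed`) and the backoff parameters are assumptions.

```python
import time
from typing import Callable

# Hypothetical terminal statuses; check the job status reference for real values.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    poll_interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) with exponential backoff until a terminal status."""
    deadline = time.monotonic() + timeout
    interval = poll_interval
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
        # Double the wait each round, capped at max_interval.
        interval = min(interval * 2, max_interval)
```

In practice `get_status` would wrap the GET job status request and return the status string from its response; once it returns the completed status, the caller fetches results from the results endpoint.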
Error Handling & Validation
Common Issues & Solutions
| Issue | Cause | Solution |
|---|---|---|
| Low Confidence Score | Poor document quality or handwriting | Manual review or re-upload with better quality |
| Missing Fields | Document doesn’t contain expected data | Verify correct document type, check document completeness |
| Incorrect Extraction | Wrong field mapping or poor OCR | Provide feedback for model improvement, manual correction |
| Unsupported Format | Document version not recognized | Ensure document is original/official copy |
Quality Checks
- Page Count: Verify all pages uploaded (multi-page documents)
- Document Completeness: Ensure all required fields present
- Data Validity: Validate extracted dates, amounts, PAN formats
- Cross-field Validation: Verify relationships between fields (e.g., salary components sum to gross)
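Two of these checks can be sketched briefly. The PAN pattern reflects the standard ten-character layout (five letters, four digits, one letter); the function names, the component keys in the usage note, and the one-rupee tolerance are hypothetical.

```python
from __future__ import annotations

import re
from decimal import Decimal

# PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def is_valid_pan(pan: str) -> bool:
    """Check the PAN format after trimming whitespace and upper-casing."""
    return PAN_PATTERN.fullmatch(pan.strip().upper()) is not None


def components_match_gross(
    components: dict[str, Decimal],
    gross: Decimal,
    tolerance: Decimal = Decimal("1"),  # assumed rounding allowance, in rupees
) -> bool:
    """Cross-field check: salary components should sum to the gross amount."""
    total = sum(components.values(), Decimal("0"))
    return abs(total - gross) <= tolerance
```

For instance, components of 600000 (basic), 240000 (HRA), and 160000 (special allowance) pass against a gross of 1000000 and fail against 900000. The format check only confirms the shape of the PAN, not that it was actually issued.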
Best Practices
- Document Quality: Upload clear, well-lit scans or digital PDFs for best results
- Complete Documents: Ensure all pages included, especially if multi-page
- Metadata: Provide taxpayer PAN and financial year for better validation
- Confidence Thresholds: Set appropriate confidence thresholds for your use case
- Validation: Always validate extracted data, especially for high-value decisions
- Feedback: Report extraction errors to improve model accuracy over time
- Privacy: Handle sensitive documents securely, follow data privacy regulations