Income Tax OCR APIs

The Income Tax OCR APIs extract structured data from tax documents. Use them to parse Form 16 and Form 26AS documents automatically for tax compliance and income verification workflows.

Key Features

Form 16 Extraction

Extract salary details, TDS information, and investment declarations from employer-issued TDS certificates.

Form 26AS Extraction

Extract creditable TDS amounts, deductor details, and tax payment information from annual TDS statements.

High Accuracy

AI models return a confidence score with the extracted data, so you can judge how reliable an extraction is across different document formats.

Bulk Processing

Submit multiple documents as asynchronous jobs to handle large volumes.

How It Works

1. Upload Document

Submit PDF or image file with optional metadata like taxpayer PAN and financial year.

2. AI Processing

System automatically detects document type and extracts text, tables, and structured data.

3. Data Validation

Extracted information is validated against expected patterns and formats.

4. Get Results

Receive structured JSON response with confidence scores and extracted data.
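
The validation step can be mirrored on the client before upload. The service's own validation rules are not documented on this page, so the following is only a minimal sketch: it checks the standard PAN layout (five letters, four digits, one letter) and assumes the financial year always uses the "YYYY-YY" form seen in the request example. The function names are illustrative and not part of the API.

```python
import re

# Standard PAN layout: five letters, four digits, one letter (e.g. "AAAPI0000A").
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Assumed financial-year form "YYYY-YY", as in this page's example "2024-25".
FY_PATTERN = re.compile(r"(\d{4})-(\d{2})")


def is_valid_pan(pan: str) -> bool:
    """Return True if the string matches the PAN layout."""
    return PAN_PATTERN.fullmatch(pan) is not None


def is_valid_financial_year(fy: str) -> bool:
    """Return True if fy looks like "2024-25" and the two years are consecutive."""
    match = FY_PATTERN.fullmatch(fy)
    if match is None:
        return False
    start_year = int(match.group(1))
    end_suffix = int(match.group(2))
    # The suffix must be the last two digits of the following calendar year.
    return (start_year + 1) % 100 == end_suffix


if __name__ == "__main__":
    print(is_valid_pan("AAAPI0000A"))
    print(is_valid_financial_year("2024-25"))
```

Rejecting malformed metadata locally avoids spending an extraction call on a request that carries a mistyped PAN or financial year.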

API Categories

API Endpoints

Endpoint | Method | Description
/it/ocr/form-16/pdf | POST | Extract data from Form 16 PDF documents
/it/ocr/form-26as/pdf | POST | Extract data from Form 26AS PDF documents

Common Use Cases

  • Income Verification: Extract salary and TDS details for loan applications and account opening
  • Tax Compliance: Auto-populate ITR forms with extracted data from tax documents
  • Payroll Reconciliation: Validate employee salary information against Form 16
  • Document Archival: Index and search historical tax documents for audit purposes
  • Bulk Processing: Process multiple employee documents for large organizations

Integration Examples

Basic Form 16 Extraction

{
  "file": "Base64 encoded PDF",
  "taxpayer_pan": "AAAPI0000A",
  "financial_year": "2024-25"
}
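
As a rough illustration, the request body above could be assembled in Python as follows. This is a sketch under stated assumptions: the base URL is a placeholder, the 10 MB limit is interpreted as 10 * 1024 * 1024 bytes, and authentication is left out because this page does not describe it. The helper name `build_form16_request` is hypothetical. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, replace with the real host
FORM_16_PATH = "/it/ocr/form-16/pdf"  # endpoint path from the table above
MAX_FILE_BYTES = 10 * 1024 * 1024     # assumed reading of the 10 MB guidance


def build_form16_request(pdf_bytes, taxpayer_pan=None, financial_year=None):
    """Build (but do not send) a POST request for Form 16 extraction."""
    if len(pdf_bytes) >= MAX_FILE_BYTES:
        raise ValueError("document should be under 10 MB")

    # The "file" field carries the PDF as a Base64 string.
    payload = {"file": base64.b64encode(pdf_bytes).decode("ascii")}

    # Metadata is optional, so include it only when provided.
    if taxpayer_pan is not None:
        payload["taxpayer_pan"] = taxpayer_pan
    if financial_year is not None:
        payload["financial_year"] = financial_year

    return urllib.request.Request(
        BASE_URL + FORM_16_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_form16_request(b"%PDF-1.4 sample", "AAAPI0000A", "2024-25")
    print(request.get_method(), request.full_url)
```

To send the request, add whatever authentication headers your account requires and pass the object to `urllib.request.urlopen`.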

Response Structure

{
  "code": 200,
  "data": {
    "document_type": "form_16",
    "confidence": 0.95,
    "extraction_details": {
      "employer": {
        "name": "ABC Technologies Pvt Ltd",
        "pan": "AABCT1234K"
      },
      "employee": {
        "name": "Rajesh Kumar",
        "pan": "BXRPK5678A"
      },
      "salary": {
        "gross_salary": 2400000,
        "basic": 1000000,
        "tds_deducted": 350000
      }
    }
  }
}
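
A response of this shape might be consumed as sketched below. The field names come from the sample above; the 0.90 confidence threshold, the PAN cross-check, and the function name `summarize_form16` are illustrative assumptions rather than documented behaviour.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "code": 200,
  "data": {
    "document_type": "form_16",
    "confidence": 0.95,
    "extraction_details": {
      "employer": {"name": "ABC Technologies Pvt Ltd", "pan": "AABCT1234K"},
      "employee": {"name": "Rajesh Kumar", "pan": "BXRPK5678A"},
      "salary": {"gross_salary": 2400000, "basic": 1000000, "tds_deducted": 350000}
    }
  }
}
"""


def summarize_form16(response, expected_pan, min_confidence=0.90):
    """Pull out key figures and flag the document for manual review if needed."""
    if response.get("code") != 200:
        raise ValueError(f"unexpected response code: {response.get('code')}")

    data = response["data"]
    details = data["extraction_details"]
    salary = details["salary"]
    employee_pan = details["employee"]["pan"]

    return {
        "employee_pan": employee_pan,
        # Cross-check the extracted PAN against the one supplied at upload.
        "pan_matches": employee_pan == expected_pan,
        "gross_salary": salary["gross_salary"],
        "tds_deducted": salary["tds_deducted"],
        # Share of gross salary deducted as TDS.
        "effective_tds_rate": salary["tds_deducted"] / salary["gross_salary"],
        # Route low-confidence documents to a human reviewer.
        "needs_review": data["confidence"] < min_confidence,
    }


if __name__ == "__main__":
    summary = summarize_form16(json.loads(SAMPLE_RESPONSE), "BXRPK5678A")
    print(summary)
```

The threshold is a parameter so that stricter workflows, such as loan underwriting, can raise it without changing the parsing logic.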

Best Practices

  • Document Quality: Upload clear, well-lit scans or digital PDFs for best extraction results
  • Complete Documents: Ensure all pages are included for multi-page documents
  • Metadata: Provide taxpayer PAN and financial year for better validation
  • Confidence Thresholds: Set appropriate confidence thresholds based on your use case
  • Validation: Always validate extracted data, especially for high-value decisions

The APIs support PDF documents and common image formats (JPEG, PNG). For best results, use original digital PDFs rather than scanned images.

Accuracy varies by document quality and type; confidence scores typically range from 85% to 95%. Each extracted field includes a confidence score for validation.

The APIs support asynchronous, job-based processing for bulk workloads. Submit multiple documents and track progress via job status endpoints.

Fields with low confidence scores are flagged in the response. You can set minimum confidence thresholds or implement manual review workflows for critical data.

Individual documents should be under 10 MB. For larger files or bulk processing, contact support for optimized processing options.