OCR Processing

Easy Scanlate uses RapidOCR (based on PP-OCRv5) with separate detection and recognition engines to detect and extract text from your images. The engine runs on ONNX Runtime for cross-platform compatibility. You can run OCR on all images at once (Batch OCR) or on specific, user-defined areas (Manual OCR).

1. Batch OCR

This is the primary method for automatically extracting text from all images in your project.

  1. Load your project in the Main Application Window.
  2. Click the Process OCR button in the Right Panel’s top controls.
  3. The OCR Progress Bar will show the status as images are processed.
  4. Once complete, detected text boxes will appear on the images in the Left Panel, and the extracted text will populate the Results Widget.
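The batch flow above can be sketched as a loop that collects one result dictionary per detected text box. This is an illustrative sketch only: `run_ocr` stands in for the app's RapidOCR detection + recognition call, and the exact numbering scheme (here, integers increasing across all images) is an assumption based on the result structure described later in this guide.

```python
def batch_ocr(filenames, run_ocr):
    """Sketch of Batch OCR: run_ocr(name) is a hypothetical stand-in that
    yields (box, text, confidence) tuples for one image."""
    results = []
    row = 1
    for name in filenames:
        for box, text, score in run_ocr(name):
            results.append({
                "row_number": row,       # batch results get integer row numbers
                "coordinates": box,      # four [x, y] corner points
                "text": text,
                "confidence": score,
                "filename": name,
                "is_manual": False,      # batch results are never manual
                "translations": {},
            })
            row += 1
    return results
```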

2. Manual OCR

Use Manual OCR to target specific areas that Batch OCR may have missed or to correct inaccurate detections.

  1. Click the Manual OCR toggle button in the Right Panel. Your cursor will change to a crosshair.
  2. In the Left Panel, click and drag to draw a rectangle around the text you want to extract.
  3. Release the mouse button. The tool will run OCR only on that selected region.
  4. The new result will appear as a text box on the image and as a new entry in the Results Widget. Manual OCR results are marked with the is_manual flag and assigned floating-point row numbers (e.g., 1.1, 2.1) to insert them between existing batch results based on their vertical position.
  5. Click the Manual OCR toggle button again to exit Manual OCR mode.
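The floating-point row numbers mentioned in step 4 can be derived from the vertical position of the new selection. The sketch below shows one plausible scheme (take the last batch row whose box starts above the new box, then add 0.1); the app's actual insertion logic may differ.

```python
def manual_row_number(results, new_top_y):
    """Pick a float row number (e.g. 1.1, 2.1) so a Manual OCR result sorts
    after the last existing result whose box top is above new_top_y.
    Illustrative sketch, not the app's exact algorithm."""
    prev = 0
    for r in sorted(results, key=lambda r: min(p[1] for p in r["coordinates"])):
        top = min(p[1] for p in r["coordinates"])
        if top <= new_top_y:
            prev = int(r["row_number"])  # remember the last row above us
    return prev + 0.1
```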

3. OCR Parameters

You can fine-tune the OCR engine’s behavior in Settings (Ctrl+,) > OCR Processing. Adjusting these settings can significantly improve accuracy and performance for your specific project.
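As a rough mental model, the tunable parameters map onto the pipeline stages described later. Only `ocr_adjust_contrast` and `ocr_resize_threshold` are named in this guide; the remaining keys and every value below are illustrative assumptions, not the app's actual setting names or defaults.

```python
# Hypothetical settings snapshot -- only the first two keys are documented;
# the rest (and all values) are assumed for illustration.
ocr_settings = {
    "ocr_adjust_contrast": True,    # preprocessing: contrast enhancement
    "ocr_resize_threshold": 1024,   # resize images wider than this (px)
    "min_confidence": 0.5,          # post-processing: drop low-confidence results
    "min_text_height": 8,           # post-processing: drop tiny boxes
    "merge_distance": 20,           # post-processing: merge boxes within this gap
}
```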

Additional OCR-related options are also available in Settings > General.

4. OCR Result Structure

Each OCR result is stored as a dictionary with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| row_number | int or float | Unique identifier used for ordering. Batch OCR assigns integers (1, 2, 3, …); Manual OCR assigns floats (1.1, 2.1, …) to insert between existing results. |
| coordinates | List[[x, y]] | Four corner points of the text bounding box: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]. |
| text | str | The extracted text content. |
| confidence | float | OCR confidence score (0.0 to 1.0). |
| filename | str | The source image filename. |
| is_manual | bool | True if added via Manual OCR; False if from Batch OCR. |
| translations | dict | Translated versions of the text, keyed by profile name. |
| custom_style | dict (optional) | Set when Auto Context Fill is enabled; applies a transparent background ({'bg_color': '#00000000'}). |

Results are sorted vertically (top-to-bottom) based on the minimum y-coordinate of their bounding boxes.
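That vertical ordering can be expressed in one line: sort on the smallest y value among each result's four corner points. A minimal sketch:

```python
def sort_results(results):
    """Order OCR results top-to-bottom by the minimum y-coordinate
    of each bounding box."""
    return sorted(results, key=lambda r: min(y for _, y in r["coordinates"]))
```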

5. OCR Processing Pipeline

The OCR engine follows this pipeline for each image:

  1. Image Preprocessing

    • Convert to RGB mode if needed
    • Apply contrast enhancement if enabled (ocr_adjust_contrast)
    • Resize if image width exceeds threshold (ocr_resize_threshold)
  2. Detection Phase

    • Uses PP-OCRv5 Mobile Detection model (ch_PP-OCRv5_mobile_det.onnx)
    • Detects text regions in the image
  3. Recognition Phase

    • Each detected region is cropped using perspective transformation
    • Uses Korean PP-OCRv5 Recognition model (korean_PP-OCRv5_rec_mobile_infer.onnx)
    • Maps model outputs to characters using the Korean dictionary (korean_dict.txt)
  4. Post-Processing

    • Scale coordinates back to original image dimensions (if resized)
    • Filter results by text height and confidence thresholds
    • Merge nearby text boxes based on distance threshold
    • Sort results vertically (top-to-bottom)
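The post-processing stage (step 4) can be sketched as follows. This is a simplified illustration: the thresholds are placeholders, the merge step is omitted, and the real engine's heuristics may differ.

```python
def postprocess(results, scale, min_conf, min_height):
    """Sketch of OCR post-processing: rescale coordinates to the original
    image, filter by confidence and text height, and sort top-to-bottom.
    results: iterable of (box, text, confidence); scale: resize factor
    applied during preprocessing (e.g. 0.5 if the image was halved)."""
    out = []
    for box, text, conf in results:
        # scale detection coordinates back to the original image size
        box = [[x / scale, y / scale] for x, y in box]
        ys = [y for _, y in box]
        if conf < min_conf or (max(ys) - min(ys)) < min_height:
            continue  # filter out low-confidence and too-small boxes
        out.append({"coordinates": box, "text": text, "confidence": conf})
    # sort vertically, top-to-bottom, by each box's minimum y
    out.sort(key=lambda r: min(y for _, y in r["coordinates"]))
    return out
```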