Apr 28, 2026 · 2 min read

Lessons from building a lottery OCR

It's way harder than I thought. Here's what I've learned so far.


LottoBuddy started as a simple idea: take a photo of a lottery ticket, extract the numbers automatically. Sounds like a weekend project. It was not a weekend project.

The first version used a standard OCR library. It worked fine on clean, high-contrast scans. Real lottery tickets are printed on thermal paper, photographed under harsh lighting, sometimes partially crumpled, sometimes wet. The error rate was embarrassing.

What I underestimated

Lottery ticket formats differ by region, by game type, by print run. A number that looks like "8" in one font is a "6" in another at 72 dpi. The spacing varies. The ink bleeds. Users don't hold their phones steady.

I spent two weeks tuning image pre-processing: grayscale conversion, adaptive thresholding, morphological cleanup. Each pass improved accuracy on one subset and broke another. Classic whack-a-mole.
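To make the pipeline concrete, here's a minimal sketch of the adaptive-thresholding step, the part that matters most for unevenly lit thermal paper. This is an illustrative numpy-only implementation, not the actual LottoBuddy code, and the `block`/`c` parameters are placeholder values, not the tuned ones:

```python
import numpy as np

def adaptive_threshold(gray: np.ndarray, block: int = 15, c: float = 10.0) -> np.ndarray:
    """Binarize by comparing each pixel against the mean of its local
    block x block neighborhood, minus a small offset c. Unlike a single
    global threshold, this tolerates the lighting gradients you get in
    phone photos of thermal paper."""
    g = gray.astype(np.float64)
    pad = block // 2
    p = np.pad(g, pad, mode="edge")
    # Integral image: every local sum becomes four lookups instead of
    # a block*block loop per pixel.
    ii = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = g.shape
    sums = (ii[block:block + h, block:block + w]
            - ii[:h, block:block + w]
            - ii[block:block + h, :w]
            + ii[:h, :w])
    local_mean = sums / (block * block)
    # Pixel survives if it's brighter than its neighborhood mean minus c.
    return ((g > local_mean - c) * 255).astype(np.uint8)
```

The whack-a-mole problem shows up exactly here: any fixed `block` and `c` that cleans up one lighting condition tends to eat digits in another.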

The shift that actually worked was training a small custom model on ticket-specific patterns rather than fighting with general-purpose OCR. Smaller data requirement, much higher precision for the specific domain.
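To see why narrowing the domain helps so much, consider a toy version of the idea: if tickets use one known print font, even a nearest-template classifier over that font is robust to a few flipped pixels, because the space of possible glyphs is tiny. This sketch uses synthetic random "templates" as stand-ins for real digit glyphs; it is not the LottoBuddy model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "font": one fixed 8x8 binary template per digit class 0-9.
# (Synthetic placeholders; a real system would crop these from tickets.)
templates = (rng.random((10, 8, 8)) > 0.5).astype(np.float64)

def classify(patch: np.ndarray) -> int:
    """Nearest-template classification: return the digit whose template
    is closest in pixel space. With a single known print run, this
    sidesteps the 8-vs-6 ambiguity a general-purpose OCR model faces."""
    dists = ((templates - patch) ** 2).sum(axis=(1, 2))
    return int(dists.argmin())

# Simulate ink bleed on a "7": flip a few random pixels.
noisy = templates[7].copy()
flips = rng.integers(0, 8, size=(3, 2))
noisy[flips[:, 0], flips[:, 1]] = 1 - noisy[flips[:, 0], flips[:, 1]]
```

A general OCR model has to separate thousands of glyph shapes; here each class only has to beat nine alternatives, which is why a small amount of domain data goes a long way.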

What I'd do differently

I'd prototype with five real user photos before writing a single line of OCR code. Not five perfect scans - five photos the way users actually take them: bad angle, bad light, one thumb in the frame. Scope the problem first, build second.

The real lesson isn't OCR-specific. It's that every "simple" feature has a hidden physical world attached to it. The complexity isn't in the algorithm - it's in the variance of reality.

Tags
#building