Manufacturing / CPG

Automated SKU Detection for PSAG's Production Line

PSAG operators classified 200+ paper product SKUs by eye on a moving production line. We shipped a real-time deep learning pipeline that replaced the manual step and raised detection accuracy to 99.2%.

Computer Vision · Deep Learning · MLOps · Edge Deployment
85% Error reduction
200+ SKUs detected
3.5x Throughput gain
99.2% Detection accuracy

PSAG runs a high-mix paper products line with more than 200 active SKUs at any moment. Correct SKU identification is the hinge that every downstream step depends on — labeling, palletizing, invoicing, and warehouse routing all fail silently when a unit is misclassified upstream.

Before this engagement, classification was fully manual. A trained operator stood at the end of the line and called out SKU codes as products passed. It worked, but it was slow, inconsistent across shifts, and impossible to scale. Every SKU catalog refresh forced weeks of retraining.

sesgo.ai shipped a real-time computer vision pipeline that reads each unit as it moves, classifies it against the live SKU catalog, and reports the result to the existing MES in under 50 milliseconds. The system went live in 14 weeks and has been the system of record for SKU identification on the line ever since.

01 · The Challenge

A production line running on human eyes

PSAG's plant produces paper goods in tight cycles — short runs, frequent changeovers, and a catalog that evolves on a weekly basis as customers rotate promotions and seasonal SKUs. That context created a set of compounding problems that off-the-shelf vision systems could not solve.

Manual SKU classification was the first bottleneck. A single trained operator could process roughly 40 units per minute before accuracy degraded. The target throughput for the line was 140 units per minute. That gap was filled by over-staffing, queueing units off-line for batch checks, and accepting a baseline mislabeling rate of around 9%, which showed up downstream as customer complaints and returns.

The environment itself was hostile to canned computer vision. Lighting shifted across shifts, the camera enclosure picked up paper dust, and products arrived in non-deterministic orientations. More importantly, the catalog changed faster than any vendor system could keep up with. Some new SKUs differed only by a small printed detail that a generic object-detection model would miss without targeted training.

The team had tried pilot-grade solutions before. Notebook-trained models looked accurate on clean validation sets but collapsed on the line. None of them had the MLOps scaffolding to retrain, redeploy, and monitor fast enough to keep pace with the catalog. The result was a drawer of promising demos and a line still running on human eyes.

The executive ask was simple and measurable: cut the mislabeling rate by 80% or more, hit line throughput, and make the system owned by PSAG's own team within the same quarter — no black box, no vendor lock-in.

02 · Our Approach

KPI-first scoping, staged rollout, continuous learning by design

We opened the engagement with a one-week KPI-first scoping sprint. Three metrics defined the contract: error rate on a representative SKU sample, sustained throughput under real line conditions, and end-to-end inference latency on the edge. If the system could not defend those three numbers, nothing else mattered.

Before any model work started, we ran a three-week data-collection protocol on the live line. We built a lightweight labeling station next to the inspection post, instrumented the conveyor cameras, and captured a stratified sample across shifts, lighting conditions, and catalog churn. Active learning was baked in from day one — the labeling queue prioritized the units the evolving model was least confident about.
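The prioritization behind that labeling queue can be sketched in a few lines. This is an illustrative stand-in, not PSAG's production code: it assumes detections arrive as hypothetical (frame_id, confidence) pairs and simply surfaces the frames the model is least sure about first.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class QueueItem:
    priority: float                       # lower = less confident = labeled first
    frame_id: str = field(compare=False)  # excluded from ordering

def prioritize_for_labeling(detections):
    """Order frames for human labeling by model confidence, lowest first.

    `detections` is a list of (frame_id, confidence) pairs from the
    current model. Least-confident frames come out first, so labeling
    effort concentrates where the evolving model is weakest.
    """
    heap = [QueueItem(conf, fid) for fid, conf in detections]
    heapq.heapify(heap)
    return [heapq.heappop(heap).frame_id for _ in range(len(heap))]
```

In practice the queue would be refilled continuously as new frames stream off the line, but the ordering principle is the same.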

Model choice was a deliberate trade-off. We picked a YOLO-style single-stage detector (fine-tuned YOLOv8) because it hit the latency budget on the target hardware without sacrificing accuracy on the long-tail SKUs. Two-stage detectors tested higher on clean benchmarks but failed the 50ms latency ceiling under realistic load.
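A latency ceiling like that is only defensible if it is measured the same way every time. The harness below is a minimal sketch of that kind of benchmark; the `infer` callable is a placeholder for any detector, and the production numbers came from the TensorRT build on the target hardware, not from this code.

```python
import time
import statistics

LATENCY_BUDGET_MS = 50.0  # contract ceiling from the scoping sprint

def measure_latency_ms(infer, frames, warmup=2):
    """Time an inference callable per frame; return (p50, p95) in milliseconds.

    `infer` stands in for the deployed detector: any callable that takes
    a frame works, so the harness can point at a real engine or a stub.
    """
    for f in frames[:warmup]:
        infer(f)  # warm caches / lazy initialization before timing
    samples = []
    for f in frames:
        t0 = time.perf_counter()
        infer(f)
        samples.append((time.perf_counter() - t0) * 1000.0)
    cuts = statistics.quantiles(samples, n=20)     # 19 cut points: 5%..95%
    return statistics.median(samples), cuts[18]    # p50, p95

def within_budget(p95_ms, budget_ms=LATENCY_BUDGET_MS):
    """Judge the tail latency, not the average: the line feels the p95."""
    return p95_ms <= budget_ms
```

Gating on p95 rather than the mean matters here: a two-stage detector with a fast average but a heavy tail would still pass a mean-based check and then stall units on the line.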

  • KPI-first scoping: Error rate, sustained throughput, and inference latency locked as the three contract numbers before any model was trained.
  • Data collection on the real line: Three-week in-situ labeling protocol with active learning — the model queued its own uncertain cases for human review.
  • Edge inference on industrial hardware: TensorRT-optimized deployment on NVIDIA Jetson nodes, sub-50ms budget, isolated network for line-floor reliability.
  • Continuous learning loop: Operator confirmations flow back into weekly retraining. Catalog refreshes are absorbed in days, not weeks.

The rollout itself was staged to avoid the classic "big bang" failure pattern. For the first two weeks the system ran in shadow mode, scoring every unit but keeping the human operator as the source of truth. The next two weeks it ran as an advisor — its classification was visible to the operator and confirmed or corrected in a click. Only after three weeks of advisor mode with measured accuracy above threshold did the system graduate to autonomous operation, with the operator shifting to exception handling and QA coaching.
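The graduation rules can be expressed as a small state machine. The stage durations below follow the timeline described above; the accuracy gate value is an illustrative placeholder, not the contract threshold.

```python
from enum import Enum

class Mode(Enum):
    SHADOW = "shadow"          # system scores; operator remains source of truth
    ADVISOR = "advisor"        # prediction shown; operator confirms or corrects
    AUTONOMOUS = "autonomous"  # system decides; operator handles exceptions

MIN_SHADOW_DAYS = 14    # two weeks of shadow mode
MIN_ADVISOR_DAYS = 21   # three weeks of advisor mode
ACCURACY_GATE = 0.99    # illustrative; the real gate came from the KPI contract

def next_mode(mode, days_in_mode, rolling_accuracy):
    """Advance the rollout one stage only when the current stage has earned it.

    Shadow graduates on time served alone; advisor additionally requires
    measured accuracy above the gate, so a regression keeps the human in
    the loop rather than silently going autonomous.
    """
    if mode is Mode.SHADOW and days_in_mode >= MIN_SHADOW_DAYS:
        return Mode.ADVISOR
    if (mode is Mode.ADVISOR and days_in_mode >= MIN_ADVISOR_DAYS
            and rolling_accuracy >= ACCURACY_GATE):
        return Mode.AUTONOMOUS
    return mode
```

The asymmetry is deliberate: time alone moves the system out of shadow mode, but only demonstrated accuracy moves the human out of the loop.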

03 · The Solution

Four cameras, four edge nodes, one weekly learning loop

The deployed system is a closed loop between the line and the training environment. Four industrial cameras cover the inspection post from complementary angles. Each camera is paired with an NVIDIA Jetson edge node running a TensorRT-optimized build of the detection model. Classification is local, sub-50ms, and resilient to network outages.
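The outage resilience comes down to how each node sources its weights. A sketch of the pattern, assuming a hypothetical `fetch_latest` registry call and a local cache file — the real nodes pull from the model registry, but the fallback logic is the same shape:

```python
import os

def load_model_weights(fetch_latest, cache_path):
    """Prefer the newest registered weights; fall back to the on-disk copy.

    `fetch_latest` stands in for a registry client call: it returns the
    weight bytes or raises OSError when the network is unreachable. On
    success the local cache is refreshed atomically, so a node that loses
    the network mid-shift keeps classifying with the last good model.
    """
    try:
        weights = fetch_latest()
        tmp = cache_path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(weights)
        os.replace(tmp, cache_path)  # atomic swap: cache is never half-written
        return weights
    except OSError:
        with open(cache_path, "rb") as f:
            return f.read()
```

The atomic rename is the detail that matters on a factory floor: a power cut during the write leaves either the old cache or the new one, never a corrupt file.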

Detections stream into a results queue that feeds two downstream consumers. The first is the production reporting dashboard, where line supervisors see SKU counts, accuracy, and exception flags in real time. The second is the feedback loop — the mechanism that keeps the system from degrading as the catalog evolves.

               PRODUCTION LINE                           LEARNING LOOP

   +-----------+    +---------------+    +-----------+
   |  Cameras  |--->|    Edge       |--->|  Results  |-----+
   |  (x4)     |    |  Inference    |    |   Queue   |     |
   |           |    | (TensorRT)    |    |           |     |
   +-----------+    +-------+-------+    +-----+-----+     |
                            |                  |           |
                            |                  v           v
                            |           +-------------+  +----------+
                            |           |  Dashboard  |  | Feedback |
                            |           |  (Grafana)  |  | Capture  |
                            |           +-------------+  +----+-----+
                            |                                 |
                            |                                 v
                            |                          +--------------+
                            |                          |  Labeling    |
                            |                          |  Station     |
                            |                          +------+-------+
                            |                                 |
                            |                                 v
                            |                          +--------------+
                            |                          |  Weekly      |
                            |                          |  Training    |
                            |                          |  (Airflow)   |
                            |                          +------+-------+
                            |                                 |
                            |                                 v
                            |                          +--------------+
                            +<-------------------------+   Model      |
                              Deploy updated weights   |   Registry   |
                                                       |   (MLflow)   |
                                                       +--------------+
Edge inference closes the loop with a weekly retraining pipeline. Catalog refreshes propagate to the line within days.

The feedback loop is the piece that makes the system durable. Every time an operator corrects a detection, the correction is logged with the source frame and pushed to the labeling station. An Airflow DAG runs weekly, combines new labels with the existing training set, retrains the model, runs a regression eval against a held-out set, and — if the regression passes — pushes the new weights to the MLflow model registry. The edge nodes pull the latest registered model automatically.
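The promotion gate at the heart of that pipeline reduces to a per-SKU comparison. The sketch below is illustrative: `max_drop` is a hypothetical tolerance, and `register` is a generic callback standing in for the real MLflow registry client.

```python
def passes_regression(candidate_metrics, production_metrics, max_drop=0.002):
    """Gate a retrained model against the one in production.

    Both metric dicts map SKU id -> held-out accuracy. The candidate
    passes only if no tracked SKU regresses by more than `max_drop`,
    so a catalog refresh cannot silently trade old SKUs for new ones.
    """
    return all(
        candidate_metrics.get(sku, 0.0) >= production_metrics[sku] - max_drop
        for sku in production_metrics
    )

def promote_if_passing(candidate, production, register):
    """Push new weights to the registry only on a regression pass."""
    if passes_regression(candidate["metrics"], production["metrics"]):
        register(candidate["weights_uri"])
        return True
    return False
```

Checking per SKU rather than aggregate accuracy is the important choice: with 200+ SKUs, a model can improve its overall number while quietly breaking a low-volume product.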

Observability was treated as a first-class requirement, not a bolt-on. A Grafana dashboard exposes per-SKU precision and recall, per-camera throughput, and drift indicators against a reference distribution. PSAG's ops team gets the same view we do, and alerts fire to their on-call rotation, not ours.
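One common way to compute a drift indicator of that kind is the population stability index over the detected-SKU distribution. A minimal sketch, using the standard rule-of-thumb thresholds rather than PSAG's actual alert configuration:

```python
import math

def population_stability_index(reference, observed, eps=1e-6):
    """PSI between a reference SKU-frequency distribution and the live one.

    Both inputs map SKU id -> count. Conventional heuristic: < 0.1 is
    stable, 0.1-0.25 warrants a look, > 0.25 means the live mix has
    shifted enough that the model's training distribution is suspect.
    `eps` guards the log against SKUs absent from one side.
    """
    skus = set(reference) | set(observed)
    ref_total = sum(reference.values()) or 1
    obs_total = sum(observed.values()) or 1
    psi = 0.0
    for sku in skus:
        r = reference.get(sku, 0) / ref_total + eps
        o = observed.get(sku, 0) / obs_total + eps
        psi += (o - r) * math.log(o / r)
    return psi
```

A catalog refresh naturally spikes this metric, which is exactly the point: the alert fires before accuracy degrades, prompting the retraining loop rather than reacting to mislabeled units.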

04 · Results

Production-grade accuracy, real throughput, a team that owns the system

85% Error reduction

Mislabeling rate dropped from ~9% to ~1.4% within the first full month of autonomous operation, measured against independent audits.

Throughput gain

Sustained line throughput went from 40 units per minute under manual inspection to 142 units per minute — above the original engineering target.

99.2% Detection accuracy

Measured against a weekly held-out sample labeled by two independent reviewers. Accuracy held steady across shifts and catalog refreshes.

45ms Average inference latency

End-to-end, camera-to-result, on the production hardware. Five milliseconds under the contract budget with headroom for model upgrades.

Beyond the headline numbers, the engagement shifted where human attention goes. Operators moved from rote classification to exception handling, QA coaching, and maintaining the labeling station — higher-leverage work that the team had been chasing for two years. Catalog refresh turnaround fell from three weeks to four days, because new SKUs are absorbed into the weekly retraining cycle instead of waiting on a vendor release.

The system now inspects over 12,000 units per day and produces the audit trail it once took humans to compile; PSAG's QA function passes external inspections using that trail as the primary evidence.

ROI was defended at the CFO level within 60 days of autonomous go-live. The cost of the engagement was recovered inside one quarter based on reduced mislabeling and throughput gains alone — before any credit was taken for operator time redirected or audit time saved.

"They brought us from pilot to production. Computer vision running on the line, 85% error reduction, and a team that owns the system from day one. That is rare."
Technology Leadership · PSAG

Technologies deployed

  • PyTorch
  • YOLOv8
  • TensorRT
  • Docker
  • Kubernetes
  • MLflow
  • NVIDIA Jetson
  • Airflow
  • AWS
  • Postgres
  • Grafana

Planning a similar system?

Book a 30-minute strategy call. We will stress-test the use case and share how we would approach it before you commit.