Measuring Mattresses from Photos Using a Reference Object

TL;DR

Four-stage computer vision pipeline measures mattress length × width from 3 smartphone photos in ~20 seconds.
Median combined accuracy of ~97% across 36 mattress sizes; replaces manual tape measurement on the factory floor.
Depth Anything v3 (DA3) generates a 3D point cloud; SAM3 segments the mattress; Hough Lines detect edges; distance is computed directly in 3D metric space.
An A4 sheet (29.7×21 cm) placed on the mattress acts as the scale reference for the otherwise relative DA3 depth output.
Switching corner detection from approxPolyDP to hough_lines removed a systematic underestimation bias and pushed median accuracy to ~98%.
DA3 inference costs ~9.7s/sample and SAM3 segmentation plus measurement ~10.1s/sample; the total pipeline is ~19.8s/sample on a single GPU instance.

The Problem We Solved

Every mattress that leaves a production line needs to be measured for QC sign-off. The traditional answer — a worker with a tape measure — is slow, inconsistent, and disconnected from digital factory systems. At scale, it becomes a bottleneck.

We built a system where a worker places one A4 sheet on the mattress, takes 3 photos with a smartphone, and walks away. The pipeline runs in the background and returns length and width automatically. No tape measure. No manual entry. No contact with the product.

	Manual (Tape Measure)	This System
Time per mattress	~30–60 seconds	~20 seconds
Requires physical contact	Yes	No
Human error	Present	Eliminated
Connects to digital systems	Manual entry	Automatic
Works unattended	No	Yes
Accuracy	Operator-dependent	~97%

Background

Quality control on mattress production lines currently requires manual measurement with a tape measure — a process that is slow, inconsistent, and incompatible with digital factory management systems. We investigated whether smartphone imagery, combined with modern depth estimation and segmentation models, could replace manual measurement with acceptable accuracy.

The core difficulty is that mattress surfaces are textureless by design — uniformly white, flat fabric that provides minimal visual features for depth estimation models. No existing off-the-shelf API or model handled this reliably. We therefore composed a custom pipeline addressing each sub-problem individually.

Pipeline Overview

The system processes 3 input photographs and returns length and width in centimeters and inches.

Input: 3 smartphone photos (factory floor, overhead angle)
  │
  ├─ Stage 1  Texture Enhancement      CLAHE + Unsharp Masking
  ├─ Stage 2  3D Mesh Reconstruction   Depth Anything v3 (DA3)
  ├─ Stage 3  Top-Down Projection      Orthographic plane fitting
  └─ Stage 4  Segmentation + Measure   SAM3 + Hough Lines
  │
Output: Length × Width (cm and inches)

A printed A4 sheet (29.7×21 cm) placed on the mattress surface serves as a metric scale reference throughout, since DA3 produces relative rather than metric depth.
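
One way the A4 anchoring could work is to compare the sheet's known edge lengths with the same edges measured in DA3's relative units. The sketch below assumes the four sheet corners have already been located in the point cloud; the function name `cm_per_unit`, the corner ordering, and the sample coordinates are illustrative and not taken from the pipeline.

```python
import math

A4_LONG_CM = 29.7   # ISO 216 A4 long edge
A4_SHORT_CM = 21.0  # ISO 216 A4 short edge


def cm_per_unit(corners):
    """Estimate centimeters per relative unit from four A4 corners.

    `corners` holds the sheet corners as (x, y, z) tuples in relative
    units, ordered around the sheet (hypothetical input format).
    """
    edges = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    # Opposite edges share a physical length; which pair is the long one
    # depends on how the sheet was laid down, so compare instead of assuming.
    pair_a = (edges[0] + edges[2]) / 2
    pair_b = (edges[1] + edges[3]) / 2
    long_units, short_units = max(pair_a, pair_b), min(pair_a, pair_b)
    # Average the two independent estimates to damp corner noise.
    return (A4_LONG_CM / long_units + A4_SHORT_CM / short_units) / 2


# Made-up corners: a sheet spanning 0.297 x 0.210 relative units.
sheet = [(0.0, 0.0, 1.0), (0.297, 0.0, 1.0), (0.297, 0.21, 1.0), (0.0, 0.21, 1.0)]
scale = cm_per_unit(sheet)
```

Multiplying any distance in the point cloud by `scale` would then give centimeters, under the assumption that the relative scale is uniform across the scene.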

Methodology

Stage 1 — Input and Texture Enhancement

Figure 1: Left — raw original photo from the factory floor. Right — after CLAHE + Unsharp Masking: the quilted fabric pattern, stitching lines and surface relief are visibly amplified. The A4 reference sheet (bottom-right on the mattress) is used for metric scale calibration.

Plain white mattress fabric provides insufficient visual texture for DA3 to produce reliable depth estimates. We apply CLAHE (Contrast Limited Adaptive Histogram Equalization) to each input image to recover latent micro-texture — stitching patterns, quilted surface relief, edge shadows — followed by Unsharp Masking (USM) to amplify high-frequency edges. This pre-processing step is load-bearing: without it, Stage 2 consistently produces flat or noisy depth maps on uniform fabric.
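
To show why unsharp masking helps on faint fabric detail, here is a toy one-dimensional illustration in plain Python. It is not the pipeline's code: the real step runs on full images through OpenCV, CLAHE is omitted entirely, and the 3-tap box blur, the `amount` value, and the sample intensities are assumptions chosen for readability.

```python
def unsharp_1d(row, amount=1.0):
    """Unsharp-mask a row of intensities using a 3-tap box blur."""
    n = len(row)
    out = []
    for i, value in enumerate(row):
        # Replicate the border pixel when a neighbour falls outside the row.
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        blurred = (left + value + right) / 3
        # Add back the detail that the blur removed, scaled by `amount`.
        sharpened = value + amount * (value - blurred)
        out.append(min(255.0, max(0.0, sharpened)))
    return out


# A faint seam: intensity steps by only 4 levels across the stitch line.
faint_seam = [200, 200, 200, 204, 204, 204]
sharp = unsharp_1d(faint_seam, amount=1.5)
step_before = faint_seam[3] - faint_seam[2]
step_after = sharp[3] - sharp[2]
```

Flat regions come out unchanged while the step across the seam grows, which is the kind of local contrast a depth model can latch onto.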

Stage 2 — Depth Estimation and 3D Reconstruction

Figure 2a: Rotating 3D mesh generated by DA3 from 3 smartphone photos. The mattress surface, conveyor frame, and factory floor are all reconstructed in 3D space from images alone — no depth sensor required.

Figure 2b: Left: original input photograph. Right: DA3 monocular depth map. Warm tones (orange/yellow) indicate surfaces closer to the camera; cool tones (blue/purple) indicate greater distance. The mattress surface reads as a geometrically consistent flat warm plane, distinct from the conveyor and factory floor below.

We use Depth Anything v3 (DA3) to estimate per-pixel depth from the texture-enhanced images. The resulting depth map is converted into a 3D point cloud and exported as a GLB mesh. The depth DA3 produces in this setup is relative rather than metric; metric scale is recovered in Stage 4 using the A4 reference.

This stage takes ~9.7 seconds per sample on a GPU instance, roughly half of the total runtime. All downstream accuracy is bounded by the quality of the depth map produced here.

Stage 3 — Top-Down Orthographic Projection

Figure 3: Top-down orthographic projection of the 3D point cloud. The mattress surface texture is visible from directly above. Factory surroundings (conveyor rails, floor, rack structures) are present but spatially distinct from the mattress region.

To expose length and width, we fit a dominant plane to the mattress surface within the 3D point cloud and re-project all points onto a 2D plane aligned with that surface normal. The quality of this plane-fitting step directly determines measurement accuracy; misalignment in the projection plane is the primary source of systematic error in our current results.
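
A minimal sketch of the plane-fit-and-reproject idea follows. It assumes a plain least-squares fit of z = a·x + b·y + c, which only works when the surface is not close to vertical in the camera frame; the pipeline's actual fitting method is not specified here, and a robust estimator such as RANSAC may be what is used in practice. All function names and the sample cloud are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points."""
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    sz = sum(z for _, _, z in points)
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    d = det3(lhs)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (collinear or too few)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: swap one column of lhs for rhs.
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def project_top_down(points, a, b):
    """Express points in 2D coordinates measured within the fitted plane."""
    u_len = math.sqrt(1 + a * a)
    n_len = math.sqrt(a * a + b * b + 1)
    u = (1 / u_len, 0.0, a / u_len)            # unit axis lying in the plane
    v = (-a * b / (n_len * u_len),             # v = normal x u, also in-plane
         (1 + a * a) / (n_len * u_len),
         b / (n_len * u_len))
    return [(_dot(p, u), _dot(p, v)) for p in points]


# Made-up tilted surface z = 0.5*x + 2 standing in for the mattress plane.
cloud = [(0, 0, 2.0), (2, 0, 3.0), (2, 1, 3.0), (0, 1, 2.0)]
a, b, c = fit_plane(cloud)
flat = project_top_down(cloud, a, b)
edge = math.dist(flat[0], flat[1])
```

In this example the first edge spans 2 units along x in the camera frame, but its length within the fitted plane is √5 ≈ 2.24 units. Viewing the cloud along the wrong normal foreshortens edges in exactly this way, which is why projection misalignment shows up directly as measurement error.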

Stage 4 — Segmentation

Figure 4: SAM3 segmentation output. The green overlay indicates the detected mattress region (confidence: 0.9612, area: 467,182 px). SAM3 cleanly separates the mattress from the conveyor, rack structure, and factory floor with no task-specific fine-tuning.

We use SAM3 (Segment Anything Model 3, Meta AI) with no fine-tuning to produce a binary mask of the mattress boundary from the top-down projection. Mask quality — particularly edge cleanliness — directly affects measurement precision.

Stage 4 — Measurement: Finding the Right Approach

Getting from a clean SAM3 mask to accurate centimeter measurements required more iteration than any other stage. We tried two fundamentally different approaches; the second is the current solution.

Approach 1 — Pixel Distance Between Points (Discarded)

Our initial approach was straightforward: detect the four corners of the mattress in the top-down image, calculate the pixel distance between adjacent corners along each edge, and convert to centimeters using the A4 reference sheet as a scale factor.

This worked in ideal conditions but broke down quickly in practice. Pixel distance is sensitive to camera angle, projection distortion, and any tilt in the top-down view. Small errors in the projection plane (Stage 3) caused disproportionate errors in pixel distances. We also tried minAreaRect and approxPolyDP from OpenCV — both operate purely in 2D pixel space and inherited the same sensitivity.

minAreaRect — fits minimum-area bounding rectangle to the contour. Median accuracy ~98% but with significant outlier cases.
approxPolyDP — approximates the contour as a polygon, extracts longest edges. Showed the widest variance of all approaches tested, particularly on non-rectangular projections.

Neither pixel-based method held up. The core problem: measuring in pixels and scaling back to real-world units amplifies any upstream geometric error.

The Root Cause: Why Predictions Were Systematically Smaller Than Ground Truth

While analyzing logs from early runs, we noticed a non-random pattern: ~70% of predictions were consistently smaller than the ground truth. This wasn't noise — it was a systematic bias pointing to a specific algorithmic flaw.

We pulled the annotated corner detection images and the problem became clear immediately.

[Image pair: close-up of the TR corner as placed by approxPolyDP (left) and by hough_lines (right)]

approxPolyDP simplifies the mask contour with the Douglas-Peucker algorithm, so every vertex it returns is a point that already lies on the contour. Where the mask corner is rounded or ragged, the chosen vertex sits on the curve rather than at the point where the two straight edges would meet. Each corner therefore lands slightly inward from the actual mattress boundary, and every measurement ends up shorter than reality.

hough_lines works differently — it detects the dominant straight lines along the mask edges, then computes corners as the intersection of those lines. The corner lands at the true geometric boundary of the mattress, not inside it.
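
The corner-as-intersection step can be sketched as follows. This assumes the edge lines arrive in the (rho, theta) normal form that OpenCV's `HoughLines` returns, where each line satisfies x·cos(theta) + y·sin(theta) = rho; the function name and the sample lines are illustrative only.

```python
import math


def intersect(line_a, line_b):
    """Intersect two lines given as (rho, theta) in Hough normal form.

    Returns the (x, y) intersection, or None when the lines are
    (nearly) parallel and no stable corner exists.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    c1, s1 = math.cos(theta1), math.sin(theta1)
    c2, s2 = math.cos(theta2), math.sin(theta2)
    det = c1 * s2 - s1 * c2
    if abs(det) < 1e-9:
        return None
    # Cramer's rule on the 2x2 system formed by the two line equations.
    x = (rho1 * s2 - rho2 * s1) / det
    y = (rho2 * c1 - rho1 * c2) / det
    return x, y


# Made-up edges: a vertical line x = 40 and a horizontal line y = 25.
left_edge = (40.0, 0.0)
top_edge = (25.0, math.pi / 2)
corner = intersect(left_edge, top_edge)
```

Because the corner comes from extrapolating the two straight edges, it does not depend on where the mask contour happens to round off.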

Figure 5a: approxPolyDP — corners (cyan dots) are placed slightly inside the actual mattress edge. Measurements: 202.1 × 177.4 cm.

Figure 5b: hough_lines — corners align precisely with the detected edge lines. Measurements: 198.5 × 176.0 cm — closer to ground truth.

Switching from approxPolyDP to hough_lines resolved the systematic underestimation and pushed average accuracy from ~95.5% to ~97%.

Approach 2 — 3D Point Distance via DA3 Features (Current)

The key insight was that DA3 doesn't just produce a depth image: it yields a full 3D point cloud where every pixel has an associated (x, y, z) coordinate. Once the cloud is scaled with the A4 reference, those coordinates are metric. Instead of measuring pixel distances in 2D and converting, we can directly compute the Euclidean distance between two 3D points in the point cloud.

We use hough_lines to detect the dominant edge lines on the SAM3 mask, identify the endpoints of those lines, look up their corresponding 3D coordinates from the DA3 point cloud, and compute the real-world distance directly:

d = sqrt((x2-x1)² + (y2-y1)² + (z2-z1)²)

This removes the 2D pixel-to-cm conversion step: the A4 reference scales the point cloud once, and every distance is then read directly off the 3D geometry. Small positional errors in 2D no longer cascade into large metric errors, because measurement happens in 3D space.
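
A minimal sketch of the lookup-and-measure step, under stated assumptions: `point_map` is a hypothetical pixel-to-(x, y, z) lookup standing in for the DA3 point cloud, `cm_per_unit` is a scale factor derived from the A4 reference, and averaging opposite edges is a choice made for this example rather than something the pipeline is known to do.

```python
import math

CM_PER_INCH = 2.54


def measure_edges(corner_pixels, point_map, cm_per_unit):
    """Return (length_cm, width_cm) from four corner pixels.

    corner_pixels: (u, v) pixels ordered around the mattress outline.
    point_map: hypothetical lookup from pixel to (x, y, z) in DA3 units.
    cm_per_unit: scale factor derived from the A4 reference.
    """
    corners = [point_map[px] for px in corner_pixels]
    edges = [math.dist(corners[i], corners[(i + 1) % 4]) * cm_per_unit
             for i in range(4)]
    # Averaging opposite edges is an assumption made for this sketch.
    side_a = (edges[0] + edges[2]) / 2
    side_b = (edges[1] + edges[3]) / 2
    return max(side_a, side_b), min(side_a, side_b)


# Made-up data: a surface tilted away from the camera, so z varies along it.
point_map = {
    (10, 10): (0.0, 0.0, 0.0),
    (90, 10): (1.6, 0.0, 1.2),
    (90, 70): (1.6, 1.5, 1.2),
    (10, 70): (0.0, 1.5, 0.0),
}
pixels = [(10, 10), (90, 10), (90, 70), (10, 70)]
length_cm, width_cm = measure_edges(pixels, point_map, cm_per_unit=100.0)
length_in, width_in = length_cm / CM_PER_INCH, width_cm / CM_PER_INCH
```

With these made-up inputs the result is 200 × 150 cm. The z component contributes to the long edge, which is the part a purely 2D measurement would lose.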

Figure 5c: hough_lines + DA3 3D point distance, 194×180 cm. Edge lines detected on the SAM3 mask; endpoints resolved to 3D coordinates from the DA3 point cloud; Euclidean distance computed directly in metric space. Green box shows the A4 reference used for DA3 scale anchoring.

Results

Batch run: 2026-05-19 | Dataset: 38 mattress sizes | Success: 36/38 | Total time: 753.7s (~19.8s per attempt)

The results support using the system for batch QC. At ~97% median accuracy across 36 sizes and under 20 seconds per measurement, the pipeline is fast enough for a QC station, although real-time line integration still needs lower latency. The 2 pipeline failures (no result returned) are preferable to silent wrong answers: they surface immediately for re-measurement.

Algorithm Comparison

We benchmarked the three viable corner-detection algorithms on the full batch — all operating on top of the SAM3 mask and DA3 3D point cloud — to make the production choice explicit. Results across 36 successful samples:

Algorithm	Avg Long Acc	Avg Short Acc	MAE	Status
`hough_lines` + DA3 3D distance	96.63%	96.11%	6.49 cm	Production approach
`minAreaRect` + DA3 3D distance	97.10%	96.92%	5.32 cm	Competitive alternative
`approxPolyDP` + DA3 3D distance	96.18%	95.53%	7.35 cm	Discarded — systematic underestimation bias

The means are pulled down to ~96–97% by two outlier sizes (78X66X6, 81X66X6). The median per-sample accuracy is ~98% for hough_lines and ~97–98% across all three algorithms, as the distribution plot below shows. minAreaRect posts a slightly higher raw average only because it happens to avoid the 81X66X6 failure, but hough_lines has a higher median, a tighter inter-quartile range, and wins on more individual sizes. We prioritized per-case stability and the cleaner edge-aligned corners hough_lines produces, so it remains the production choice.
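
The gap between mean and median is easy to reproduce with a small example. The accuracy formula below (100 × (1 − |predicted − true| / true)) is an assumption, since the exact definition used in the benchmark is not stated here, and the sample values are invented for illustration rather than taken from the batch run.

```python
from statistics import mean, median


def accuracy_pct(predicted_cm, true_cm):
    """Assumed accuracy definition: 100 * (1 - relative absolute error)."""
    return 100.0 * (1.0 - abs(predicted_cm - true_cm) / true_cm)


# Invented per-sample accuracies: six typical cases and one outlier.
samples = [98.5, 98.1, 97.9, 98.3, 97.6, 98.8, 70.0]
avg = mean(samples)
mid = median(samples)
example = accuracy_pct(198.0, 200.0)
```

One bad case pulls the mean several points below the median, while the median still reflects what a typical measurement looks like. That is the reason both statistics are reported.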

Accuracy Distribution by Algorithm

Figure 6: Combined accuracy distribution across 36 mattress sizes. Box shows IQR; orange line is median; whiskers extend to 1.5× IQR; open circles are outliers. hough_lines posts the highest median (~98%) and tightest IQR; minAreaRect is close behind; approxPolyDP has the widest spread.

Accuracy by Mattress Dimension

Figure 7: Per-size accuracy for minAreaRect, shown as the strongest non-production comparator. Blue bars exceed the 95% target; the single red bar is the 78X66X6 outlier. hough_lines follows a similar per-size profile but additionally drops on 81X66X6 — the one case where minAreaRect is more robust.

The per-dimension breakdown shows strong performance across the range. Failures cluster on the 66 cm width class — 78X66X6 fails on every algorithm (combined ~65–79%), and hough_lines additionally drops on 81X66X6. Since the failure on 78X66X6 is consistent across all algorithms, the root cause is upstream — most likely a distorted top-down projection at that aspect ratio. Two further sizes failed to return any measurement at all (78X36X8, 84X78X6) — these are pipeline errors, not inaccurate results.

Timing Breakdown

Phase	Total (38 attempts)	Avg per sample
Phase 1 — DA3 depth + mesh	368.0s	~9.7s
Phase 2 — SAM3 segment + measure	385.6s	~10.1s
Total	753.7s	~19.8s
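
The per-sample figures follow directly from the batch totals. The short calculation below reproduces them and adds a rough hourly throughput, assuming samples run strictly one after another on a single GPU with no overlap between phases (an assumption, not a measured figure).

```python
ATTEMPTS = 38  # includes the 2 attempts that returned no result

phase_totals_s = {
    "Phase 1 (DA3 depth + mesh)": 368.0,
    "Phase 2 (SAM3 segment + measure)": 385.6,
}
batch_total_s = 753.7

per_sample_s = {name: total / ATTEMPTS for name, total in phase_totals_s.items()}
total_per_sample_s = batch_total_s / ATTEMPTS

# Sequential throughput estimate: one mattress at a time, no pipelining.
mattresses_per_hour = 3600 / total_per_sample_s

for name, seconds in per_sample_s.items():
    print(f"{name}: {seconds:.1f} s/sample")
print(f"Total: {total_per_sample_s:.1f} s/sample, "
      f"about {mattresses_per_hour:.0f} mattresses/hour")
```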

At ~20 seconds per mattress, the pipeline is fast enough for batch QC at a factory station and a meaningful improvement over earlier prototypes. Keeping pace with a moving production line will require reaching the <10s target.

Challenges Encountered

Textureless surfaces. The fundamental difficulty of this problem — not solvable by any single model — is the lack of visual texture on white mattress fabric. Our CLAHE + USM pre-processing is a practical workaround, not a principled solution. A depth sensor (structured light or ToF) would eliminate this constraint entirely.

No existing solution to adapt. This problem does not map cleanly onto any standard CV benchmark task. No single off-the-shelf component solved it end to end: DA3 needed texture enhancement and an external scale reference, and the 2D corner detectors we tried first had to be replaced. The system works through careful composition of stages, not through any single powerful model.

Outlier sizes in the 66 cm width class. 78X66X6 fails across every algorithm (combined ~65–79%), and hough_lines additionally drops on 81X66X6. The cross-algorithm consistency on 78X66X6 points to an upstream root cause, likely the top-down projection producing a distorted view at that aspect ratio. Two further sizes (78X36X8, 84X78X6) returned None from the measurement stage, which points to a pipeline failure (possibly in segmentation) rather than an inaccurate measurement; the root cause is still open.

Limitations

Benchmark covers 36 mattress sizes; production inventory contains 200+. Results may not generalize to all sizes.
Average inference time is ~19.8s per mattress (Phase 1: ~9.7s, Phase 2: ~10.1s). This is viable for batch QC but requires further reduction for real-time line integration.
The A4 scale reference must be placed on the mattress surface before imaging. This adds a manual step to the current workflow.
All results reported are from a single benchmark run (2026-05-19). Cross-run variance has not been fully characterized.
approxPolyDP was evaluated and discarded due to systematic underestimation; minAreaRect is competitive on average but less stable per-case than hough_lines.

Current Status

The pipeline is deployed as a Streamlit application on a GPU inference server and is actively used for measurement validation. Key open items:

Root-cause analysis for the 66 cm width outliers (78X66X6, 81X66X6) and None results on 78X36X8, 84X78X6
Inference latency is now ~19.8s/sample; target is <10s for line integration
Benchmark expansion to full production mattress size inventory (currently 36/200+ sizes covered)

Technical Stack

Component	Technology
Depth estimation	Depth Anything v3 (DA3)
3D mesh	Open3D, trimesh
Segmentation	SAM3 (Meta AI, no fine-tuning)
Texture enhancement	OpenCV CLAHE + Unsharp Masking
Measurement	OpenCV `HoughLines` + DA3 3D point distance
Scale calibration	Printed A4 fiducial reference (29.7×21 cm)
Application	Streamlit
Infrastructure	AWS GPU instance (g-series)