
Computer Vision Development Services for US & EU Industrial and Product Teams

We build computer vision systems for product, industrial, and consumer use cases, from defect detection on a production line to in-app object recognition with p95 latency under 80 ms. We work with YOLO11, SAM 2, CLIP, and DINOv2, and train custom heads on ViT or Swin backbones when the domain demands it. Edge deployment on Jetson and mobile NPUs, cloud serving on NVIDIA Triton, full annotation pipelines in Label Studio or Roboflow, and MLOps with drift monitoring. Feasibility from 8,000 EUR, pilot from 42,000 EUR, production retainer from 14,000 EUR/month.

Computer vision projects fail in predictable ways: someone picks a model from a blog post before anyone looks at the actual frames, annotation is treated as a one-time cost rather than an ongoing investment, edge versus cloud is decided by preference instead of latency and unit economics, and nobody monitors for drift until accuracy quietly collapses in month four. We work in the opposite order. The first deliverable is a written model-selection memo based on your real frames. Annotation is a pipeline with active learning, not a one-off job. Edge versus cloud is benchmarked, not assumed. Drift is a tracked SLO with a retraining workflow in place before launch, not a fire drill three months later.

What we deliver in a computer vision engagement

Use-case scoping & dataset strategy

Workshop on the real business decision the model needs to support, frame sourcing plan, class taxonomy, target precision and recall by class, and a written feasibility memo with a go/no-go on the dataset before any training.
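To make "target precision and recall by class" concrete, here is a minimal sketch of a per-class target check. It assumes true-positive, false-positive, and false-negative counts per class have already been computed (for detection that normally means IoU-based matching of predictions to ground truth, which is not shown). The class names and target values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision_recall(c: ClassCounts) -> tuple[float, float]:
    # Guard against empty denominators (class never predicted / never present)
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return precision, recall


def check_targets(counts: dict[str, ClassCounts],
                  targets: dict[str, tuple[float, float]]) -> dict[str, bool]:
    """Per class: True if both precision and recall meet their (hypothetical) targets."""
    result = {}
    for cls, (p_target, r_target) in targets.items():
        p, r = precision_recall(counts[cls])
        result[cls] = p >= p_target and r >= r_target
    return result


# Hypothetical eval-set counts for two defect classes
counts = {
    "scratch": ClassCounts(tp=90, fp=10, fn=5),
    "dent": ClassCounts(tp=40, fp=2, fn=20),
}
targets = {"scratch": (0.85, 0.90), "dent": (0.90, 0.80)}  # (precision, recall)
print(check_targets(counts, targets))
```

Here "dent" is precise but misses a third of the real dents, so it fails its recall target; that is the kind of per-class gap a single aggregate mAP number hides.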

Model selection (YOLO/SAM/CLIP/custom)

Side-by-side benchmark on your real frames: YOLO11 or YOLOv8 for detection, SAM 2 for segmentation, CLIP for zero-shot classification, CLIP or DINOv2 for retrieval, and Detectron2 or custom heads when the domain demands it. Cost, latency, and accuracy compared in writing.

Edge vs cloud deployment

Benchmark on real hardware: NVIDIA Jetson Orin, OAK-D, Coral, iOS Core ML, Android NNAPI for edge; NVIDIA Triton on T4/A10G/H100, AWS Rekognition, GCP Vision, Azure Vision for cloud. Recommendation backed by numbers.
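A latency benchmark like this reports percentiles rather than averages, since budgets such as "p95 under 80 ms" are about the tail. The sketch below is a hypothetical harness, not a specific vendor tool: `fake_infer` stands in for a real model call, and the warm-up and run counts are assumptions.

```python
import statistics
import time
from typing import Callable


def benchmark_latency(infer: Callable[[], object], warmup: int = 10,
                      runs: int = 200) -> dict[str, float]:
    """Time repeated calls to `infer`; report p50, p95, and max latency in ms."""
    for _ in range(warmup):  # excluded from stats: lazy init, caches, clock ramp-up
        infer()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points: cuts[49]=p50, cuts[94]=p95
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "max_ms": max(samples_ms)}


def fake_infer() -> None:
    # Hypothetical stand-in for a TensorRT engine, Core ML model, or Triton request
    sum(i * i for i in range(10_000))


stats = benchmark_latency(fake_infer)
print(stats)
print("meets 80 ms p95 budget:", stats["p95_ms"] < 80.0)
```

For GPU or NPU runtimes that execute asynchronously, `infer` has to block until outputs are ready; otherwise the timings measure only kernel launch and look far too optimistic.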

Annotation pipelines

Foundation-model pre-labelling (SAM 2, GroundingDINO, CLIP), human-in-the-loop review in Label Studio, CVAT, or Roboflow, inter-annotator agreement tracking (Cohen's kappa above 0.85), and active learning to choose the next batch.
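The agreement gate can be sketched as follows, assuming two annotators assign categorical labels to the same frames (for boxes or masks, predictions are usually matched by IoU first and kappa is computed on the matched class labels). The "ok"/"defect" labels are hypothetical; the 0.85 threshold is the one stated above.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both used one identical label everywhere, kappa undefined
    return (observed - expected) / (1.0 - expected)


KAPPA_GATE = 0.85


def batch_passes_gate(labels_a: list[str], labels_b: list[str],
                      threshold: float = KAPPA_GATE) -> bool:
    return cohens_kappa(labels_a, labels_b) > threshold


a = ["ok", "ok", "defect", "ok", "defect", "ok", "ok", "defect", "ok", "ok"]
b = ["ok", "ok", "defect", "defect", "defect", "ok", "ok", "defect", "ok", "ok"]
print(cohens_kappa(a, b), batch_passes_gate(a, b))
```

The two annotators agree on 9 of 10 frames, yet kappa is about 0.78 because much of that agreement is expected by chance, so the batch would not pass the gate.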

MLOps & drift monitoring

Output distribution tracking, embedding-space drift detection with MMD or Kolmogorov-Smirnov (KS) tests on CLIP or DINOv2 features, per-slice precision/recall dashboards in Grafana, MLflow experiment tracking, scheduled retraining, and documented rollback paths.
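Embedding-space drift detection with MMD can be sketched as below: an unbiased squared-MMD estimate with a Gaussian kernel, the median-distance bandwidth heuristic, and a permutation test for a p-value. This is one reasonable setup, not the only one. It assumes numpy is available and that image embeddings are precomputed; synthetic Gaussian vectors stand in for real CLIP or DINOv2 features, which are higher-dimensional.

```python
import numpy as np


def pairwise_sq_dists(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 x_i.y_j, clipped at 0 for float error
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.maximum(sq, 0.0)


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, sigma: float) -> float:
    """Unbiased estimate of squared MMD with a Gaussian (RBF) kernel."""
    gamma = 1.0 / (2.0 * sigma**2)
    kxx = np.exp(-gamma * pairwise_sq_dists(x, x))
    kyy = np.exp(-gamma * pairwise_sq_dists(y, y))
    kxy = np.exp(-gamma * pairwise_sq_dists(x, y))
    m, n = len(x), len(y)
    # Self-similarity terms on the diagonal are dropped to remove the estimator's bias
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * kxy.mean())


def mmd_drift_test(reference: np.ndarray, production: np.ndarray,
                   n_permutations: int = 200, seed: int = 0) -> tuple[float, float]:
    """Return (MMD^2, permutation p-value) for reference vs production embeddings."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([reference, production])
    # Median heuristic: kernel bandwidth = median pairwise distance in the pooled sample
    iu = np.triu_indices(len(pooled), k=1)
    sigma = float(np.median(np.sqrt(pairwise_sq_dists(pooled, pooled)[iu])))
    observed = mmd2_unbiased(reference, production, sigma)
    m = len(reference)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))  # shuffle group membership under the null
        if mmd2_unbiased(pooled[perm[:m]], pooled[perm[m:]], sigma) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


rng = np.random.default_rng(42)
reference = rng.normal(size=(200, 32))         # stand-in for training-set embeddings
same = rng.normal(size=(200, 32))              # production batch, same distribution
shifted = rng.normal(loc=0.5, size=(200, 32))  # production batch after a domain shift
print(mmd_drift_test(reference, same))
print(mmd_drift_test(reference, shifted))
```

The permutation test costs one kernel evaluation per shuffle, so in production the reference set is typically subsampled to a few hundred or thousand embeddings per check.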

Privacy & compliance for biometric data

DPIA co-authored with your privacy team, on-device inference where feasible, hashed face templates instead of raw embeddings, and age-out retention. Covers GDPR Article 9, BIPA, CUBI, and Washington H.B. 1493.

Stack we use

PyTorch, TensorFlow, YOLO11, YOLOv8, Detectron2, Segment Anything (SAM 2), CLIP, DINOv2, OpenCV, ONNX, TensorRT, NVIDIA Triton, Roboflow, CVAT, Label Studio, AWS Rekognition, GCP Vision, Azure Vision, Modal, Replicate, MLflow

How a computer vision engagement works

  1. Feasibility

    Weeks 1–3: scoping workshop, dataset audit on your real frames, model-selection memo, edge-vs-cloud benchmark, target precision/recall per class, and a written delivery plan. Go/no-go decision before the pilot.

  2. Dataset & baseline

    Weeks 4–7: annotation pipeline with foundation-model pre-labelling, golden eval set, and a baseline model (YOLO/SAM/CLIP/custom) trained on the dataset. Per-slice precision/recall report before iteration begins.

  3. Training & ablations

    Weeks 8–11: ablations on architecture, augmentation, loss, and class balance. Active learning to focus annotation on uncertain frames. Quantization and export with TensorRT or ONNX for the chosen deployment target.

  4. Deployment & monitoring

    Weeks 12–14: edge or cloud deployment, load testing, drift dashboards in Grafana, retraining workflow in MLflow, runbooks, rollback path, and handover. Optional retainer for production support.

Engagement models

CV feasibility

Two to three weeks, fixed scope. Use-case scoping, dataset audit, model-selection memo against real frames, edge-vs-cloud benchmark, and a written delivery plan with cost projection. The fee is credited toward the pilot if you proceed. 8,000 EUR fixed.

CV pilot

10–14 weeks. One model, full annotation pipeline, dataset construction, training and ablations, deployment to one target (edge device or cloud endpoint), drift monitoring, runbooks, and 30 days of post-launch support. 42,000 EUR fixed.

Production support retainer

Drift response, periodic retraining, dataset expansion, model upgrades, additional classes or use cases, edge fleet management, on-call. One senior CV engineer plus MLE support, six-month minimum. From 14,000 EUR/month.

Pricing excludes GPU compute, annotation labour for high-volume datasets, and edge hardware, which are billed directly to your accounts. Typical pilot GPU spend is 3,000–9,000 EUR.

Why US & EU teams pick YuSMP for computer vision

GDPR-aligned · ISO 27001 ready · SOC 2 Type II in progress · HIPAA-capable · CCPA-acknowledged

Numbers before models

No model is chosen before we benchmark candidates on your real frames. The first deliverable is a written model-selection memo with cost, latency, and per-class accuracy — not a slide deck citing benchmarks on COCO.

Annotation is a pipeline, not a one-off

Foundation-model pre-labelling, human-in-the-loop review with inter-annotator-agreement gates, active learning for the next batch. The pipeline keeps running after launch, because drift will not pause for your roadmap.

Biometric compliance done right

DPIA co-authored before any frame is processed. On-device inference where feasible, hashed templates instead of raw embeddings, age-out retention. GDPR Article 9, BIPA, CUBI, and Washington H.B. 1493 walked through with you.

For regulated workloads we sign HIPAA BAAs, run only in HIPAA-eligible regions, and integrate with your existing DLP and data governance rather than running in parallel to it.

Frequently asked questions

When should we use YOLO, SAM 2, CLIP, or a custom-trained model?

It comes down to the task and the data. YOLO11 and YOLOv8 are our defaults for object detection and instance segmentation when you have box or mask labels; YOLO11 is faster and more accurate, while YOLOv8 has the larger ecosystem of pretrained checkpoints. SAM 2 is what we reach for when you need segmentation masks without click-level labelling, especially for video. CLIP is the pick for zero-shot classification, and CLIP or DINOv2 for image retrieval and visual search. Custom training (Detectron2, MMDetection, custom heads on ViT/Swin backbones) earns its keep when the domain is far from natural images: X-rays, satellite imagery, semiconductor wafers, microscopy. The first deliverable is always a written model-selection memo, not a chosen model.

Should the model run at the edge or in the cloud?

Latency, privacy, and unit economics decide. Edge (NVIDIA Jetson, OAK-D, Coral, mobile NPUs) wins when you need sub-100 ms response, when bandwidth is constrained, or when sending video to the cloud is a privacy or compliance non-starter. Cloud (NVIDIA Triton on GPU instances, AWS Rekognition for commodity tasks, GCP Vision, Azure Vision) wins when you need centralized model updates, when accuracy matters more than latency, or when devices cannot host a 200 MB model. Many production systems do both: a small detector on-device for triage, a larger model in the cloud for verification. We benchmark both paths on your real frames before recommending.
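The hybrid "triage on-device, verify in the cloud" pattern can be sketched as a two-threshold cascade. Everything here is hypothetical: the `Detection` type, the stub models, and the 0.90/0.20 thresholds are illustrative assumptions, not a specific product API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    label: str
    confidence: float


Model = Callable[[bytes], Detection]


def cascade(frame: bytes, edge_model: Model, cloud_model: Model,
            accept: float = 0.90, discard: float = 0.20) -> tuple[Optional[Detection], str]:
    """Triage on-device; escalate only ambiguous frames to the larger cloud model."""
    det = edge_model(frame)
    if det.confidence >= accept:
        return det, "edge"  # confident positive: act locally, no round trip
    if det.confidence < discard:
        return None, "edge"  # confident negative: the frame never leaves the device
    return cloud_model(frame), "cloud"  # ambiguous: pay latency and bandwidth to verify


# Hypothetical stand-ins for a small on-device detector and a larger cloud model
def edge_stub(frame: bytes) -> Detection:
    return Detection("defect", 0.55 if frame == b"blurry" else 0.97)


def cloud_stub(frame: bytes) -> Detection:
    return Detection("defect", 0.88)


print(cascade(b"sharp", edge_stub, cloud_stub))   # handled on the edge
print(cascade(b"blurry", edge_stub, cloud_stub))  # escalated to the cloud
```

The two thresholds set the trade-off directly: widening the band between them sends more frames to the cloud (higher cost and latency, better accuracy), so they are worth tuning on the golden eval set rather than fixing up front.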

How do you handle annotation when our team does not have labelled data yet?

Three-step playbook. First, pre-label with foundation models: SAM 2 for masks, GroundingDINO for boxes, CLIP for classification, and frontier VLMs (GPT-4o, Claude 3.7 Sonnet) for hard cases. This cuts annotation time by 60 to 80 percent. Second, human-in-the-loop review in Label Studio, CVAT, or Roboflow, with an inter-annotator agreement target of Cohen's kappa above 0.85 before any frame enters training. Third, active learning: the model picks the next batch to label based on uncertainty rather than random sampling. We can run the annotation team ourselves or set up the pipeline and hand it over to yours.
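The active-learning step can be sketched with entropy-based uncertainty sampling, one common choice (least-confidence and margin sampling are alternatives). It assumes each unlabelled frame already has a predicted class-probability vector; for detection models, per-box scores would first need to be aggregated into a per-frame score, which is not shown. Frame IDs and probabilities are hypothetical.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labelling(predictions: dict[str, list[float]], budget: int) -> list[str]:
    """Return the `budget` frame IDs whose predictions are most uncertain."""
    ranked = sorted(predictions, key=lambda fid: entropy(predictions[fid]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs on unlabelled frames (three classes)
predictions = {
    "frame_001": [0.98, 0.01, 0.01],
    "frame_002": [0.40, 0.35, 0.25],
    "frame_003": [0.70, 0.20, 0.10],
    "frame_004": [0.34, 0.33, 0.33],
}
print(select_for_labelling(predictions, budget=2))
```

The near-uniform predictions on frame_004 and frame_002 are selected first; the confident frame_001 would add little new information if labelled.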

How do you monitor a CV model in production and catch data drift?

Three signals tracked daily. First, output distribution: per-class confidence histograms, detection-count drift, and mask-area drift, plotted against a seven-day baseline in Grafana. Second, input drift: embedding shift in CLIP or DINOv2 feature space, measured with MMD or KS tests against the training set. Third, ground-truth feedback: a tunable percentage of inference frames routed to human review (or to a downstream business signal that proxies for ground truth), with weekly precision/recall reports per slice. Alerts fire on threshold breach and trigger the retraining workflow in MLflow, with a documented rollback path.
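For the output-distribution signal, a two-sample KS test on one class's confidence scores against the seven-day baseline is a simple starting point. This sketch uses only the standard library and the large-sample 5% critical value; the score lists are hypothetical, and in practice the check would run once per class.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    gap = 0.0
    for x in a + b:  # the largest gap is always reached at one of the observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_critical_value(n: int, m: int, c_alpha: float = 1.358) -> float:
    # Large-sample critical value; c_alpha = 1.358 corresponds to a 5% significance level
    return c_alpha * math.sqrt((n + m) / (n * m))


def confidence_drift_alert(baseline: list[float], current: list[float]) -> bool:
    """True when today's confidence distribution differs from the baseline at about 5%."""
    return ks_statistic(baseline, current) > ks_critical_value(len(baseline), len(current))


# Hypothetical confidence scores for one class: seven-day baseline vs. today
baseline = [i / 100 for i in range(100)]
today = [0.5 * i / 100 for i in range(100)]  # confidences roughly halved
print(ks_statistic(baseline, today), confidence_drift_alert(baseline, today))
```

With daily volumes in the tens of thousands, even tiny shifts become statistically significant, so teams often alert on the KS statistic itself exceeding a tuned threshold rather than on significance alone.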

What about GDPR and biometric data — can you handle face or person detection?

Yes, with the compliance work scoped in from week one. Under GDPR Article 9, biometric data processed to uniquely identify a person is special category data, and processing needs one of the Article 9(2) conditions, in practice usually explicit consent, vital interests, or substantial public interest. We co-author the DPIA with your privacy team before any frame is processed. Technical safeguards include on-device inference where feasible, hashed face templates instead of raw embeddings, age-out retention, and IAM-segregated storage. For US deployments we follow BIPA (Illinois), CUBI (Texas), and Washington H.B. 1493. We are GDPR-aligned, ISO 27001 ready, SOC 2 Type II in progress, HIPAA-capable, and CCPA-acknowledged.

How long does a typical CV pilot take and what does it cost?

Feasibility is a fixed 8,000 EUR over two to three weeks: use-case scoping, dataset audit, model-selection memo, edge-vs-cloud benchmark on sample frames, and a written delivery plan with cost projection. A pilot (one model, dataset construction, training, and a production deployment to one target, either an edge device or a cloud endpoint) is a fixed 42,000 EUR over 10 to 14 weeks. Production support, drift monitoring, periodic retraining, and model upgrades run from 14,000 EUR/month with a six-month minimum. GPU compute, annotation labour for high-volume datasets, and edge hardware are billed directly to your accounts.

Have a CV use case and need a written feasibility memo first?

Book a discovery call