Building a Semi-Production-Ready AR Object Detection App with Real-Time Video Overlays in iOS

When AR Meets Metallica: Real-Time Object Detection with Video Overlays


Introduction

I was sitting at my desk one morning, Metallica blasting, coffee in hand. My mug — bright yellow with the Metallica logo — stared back at me. And that’s when the thought hit: what if my iPhone could recognize this mug and instantly start playing a Metallica video right on top of it? Like, a personal AR concert every time I poured coffee.

That’s the spark that kicked off this project. The mission: build an iOS AR app that detects an object in real-time and overlays contextual video content. Not just as a flashy demo, but something that feels production-ready. That means:

  • Sustained 30+ FPS
  • Memory footprint under 200MB
  • Multiple overlays supported without lag

The tools: ARKit and SceneKit for tracking and rendering, Vision + Core ML for object detection, and a soup of Swift, SwiftUI, and UIKit to stitch everything together.

And because I love to push things, I wired in adaptive performance management, intelligent caching, and even circuit breaker patterns for ML inference. This post is a tour through that journey, from the first idea to a system that actually holds up under load. Code’s on GitHub — you’ll need your own ML model and video, but the scaffolding is all here.

Here’s a taste of the balancing act:

struct Config {
    static let maxConcurrentDetections = 5
    static let processingInterval: CFTimeInterval = 0.3  // Sweet spot for real-time
    static let videoPlaneWidth: Float = 0.4             // AR overlay dimensions
    static let maxDetectionAge: TimeInterval = 30.0     // Detection lifecycle
}

Coffee, AR, and Metallica. Let’s get into it.


Architecture Overview

When you start mixing AR and ML, complexity can explode quickly. I broke it down into layers, each with a clear job:

The Core Band Members

  • ARViewController: The frontman — orchestrates the whole show
  • VideoOverlayManager: The drummer — keeps the beat with resource management
  • ARState: The bassist — holds everything together
  • Performance Managers: The sound engineers — constantly tuning for optimal output

Why This Architecture Doesn’t Suck

  • Protocol-based design (swap components like changing guitar strings)
  • Separation of concerns (each component does one thing well)
  • Reactive data flow (no polling, no waste)
  • Testability baked in from day one

Protocol-Driven Design:

protocol PerformanceMonitorProtocol: AnyObject {
    var currentFPS: Double { get }
    var memoryUsageMB: Double { get }
    func startMonitoring()
    func stopMonitoring()
}

protocol LoggerProtocol {
    func info(_ message: String)
    func error(_ message: String, error: Error?)
    func debug(_ message: String)
}

Keeps things testable without turning into one monster God Class.
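
As a taste of what that buys you, here's a minimal sketch of the kind of test doubles these protocols allow. The mocks and the commented-out injection are illustrative, not lifted from the repo:

// Hypothetical test doubles conforming to the protocols above.
final class MockPerformanceMonitor: PerformanceMonitorProtocol {
    var currentFPS: Double = 60
    var memoryUsageMB: Double = 120
    private(set) var isMonitoring = false

    func startMonitoring() { isMonitoring = true }
    func stopMonitoring() { isMonitoring = false }
}

final class SilentLogger: LoggerProtocol {
    func info(_ message: String) {}
    func error(_ message: String, error: Error?) {}
    func debug(_ message: String) {}
}

// In a unit test, inject the doubles instead of spinning up a real AR session:
// let controller = ARViewController(performanceMonitor: MockPerformanceMonitor(),
//                                   logger: SilentLogger())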


Real-Time Object Detection Pipeline

The detection loop is where things get messy if you’re not careful. Running a Core ML model on every single frame? That’ll melt your phone faster than a Metallica solo. The trick is throttling and smart scheduling.

private func processCurrentFrame() {
    guard let frame = sceneView.session.currentFrame,
          !isProcessing else { return }

    let now = CACurrentMediaTime()
    guard now - lastProcessTime >= Config.processingInterval else { return }
    isProcessing = true
    lastProcessTime = now
    visionQueue.async { [weak self] in
        guard let self else { return }
        defer { self.isProcessing = false }  // always release the gate, even on failure
        let request = VNCoreMLRequest(model: self.visionModel) { req, _ in
            self.handleDetectionResults(req.results)
        }
        request.imageCropAndScaleOption = .scaleFill
        request.usesCPUOnly = false // lean on GPU/Neural Engine
        do {
            try self.mlCircuitBreaker.executeWithProtection {
                let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage)
                try handler.perform([request])
            }
        } catch {
            // Circuit open or inference failed; skip this frame and try again later.
        }
    }
}

Notice the processingInterval. At one detection pass every ~0.3s, the model keeps pace without frying the hardware. Anything tighter felt jittery. Anything looser made the detection lag.

Filtering matters too — we don’t want low-confidence junk flooding the pipeline:

private func filterDetections(_ observations: [VNClassificationObservation]) -> [DetectionInfo] {
    return observations
        .filter { $0.confidence >= confidenceThreshold }
        .prefix(Config.maxConcurrentDetections)
        .map { observation in
            DetectionInfo(
                identifier: observation.identifier,
                confidence: observation.confidence,
                timestamp: CACurrentMediaTime(),
                hasVideoOverlay: videoOverlayManager.hasVideo(for: observation.identifier)
            )
        }
}

The result: stable detections, smoothed over time, with bounding boxes tracked across frames.
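
The smoothing pass itself isn't shown above, so here's a minimal sketch of how stale detections might be aged out against Config.maxDetectionAge. The pruneStaleDetections name is mine, not the repo's:

// Illustrative smoothing pass: keep the most recent sighting per identifier
// and drop anything older than Config.maxDetectionAge.
private func pruneStaleDetections(_ detections: [DetectionInfo]) -> [DetectionInfo] {
    let now = CACurrentMediaTime()
    let latestPerIdentifier = Dictionary(grouping: detections, by: \.identifier)
        .compactMapValues { sightings in
            sightings.max { $0.timestamp < $1.timestamp }
        }
    return latestPerIdentifier.values
        .filter { now - $0.timestamp <= Config.maxDetectionAge }
        .sorted { $0.confidence > $1.confidence }
}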


Advanced Video Overlay Management

Once detection is stable, the fun begins — attaching videos to objects in AR space. But you can’t just spawn AVPlayer instances like candy. That’ll chew through memory. Instead, I built an LRU cache that keeps the last five overlays hot.

class VideoOverlayManager {
    private var playerCache: [String: AVPlayer] = [:]
    private let maxCacheSize = 5

    private func manageCacheSize() {
        guard playerCache.count > maxCacheSize else { return }
        let sortedByUsage = playerCache.sorted { first, second in
            first.value.currentTime().seconds < second.value.currentTime().seconds
        }
        let itemsToRemove = sortedByUsage.prefix(playerCache.count - maxCacheSize)
        itemsToRemove.forEach { key, player in
            player.pause()
            player.replaceCurrentItem(with: nil)
            playerCache.removeValue(forKey: key)
        }
    }
}

For rendering, SceneKit handles the 3D planes. Each plane gets a video texture with fade-in animations for polish.

private func createVideoNode(for detection: DetectionInfo, with player: AVPlayer) -> SCNNode {
    let videoNode = SCNNode()
    let videoScene = SKScene(size: CGSize(width: 1280, height: 720))
    let videoPlayer = SKVideoNode(avPlayer: player)

    videoPlayer.position = CGPoint(x: videoScene.size.width / 2, y: videoScene.size.height / 2)
    videoPlayer.size = videoScene.size
    videoScene.addChild(videoPlayer)

    let plane = SCNPlane(width: CGFloat(Config.videoPlaneWidth), height: CGFloat(Config.videoPlaneHeight))
    plane.firstMaterial?.diffuse.contents = videoScene
    plane.firstMaterial?.isDoubleSided = true
    videoNode.geometry = plane

    // Start invisible so the animated jump to full opacity reads as a fade-in.
    videoNode.opacity = 0.0
    SCNTransaction.begin()
    SCNTransaction.animationDuration = 0.3
    videoNode.opacity = 1.0
    SCNTransaction.commit()

    return videoNode
}

It feels wild — like the videos are glued right onto the mug and hanging out in your kitchen in real space.
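
One piece the snippets above skip is where the plane actually goes. Here's a minimal sketch of one way to pin it in world space using a raycast from the screen center; the repo may place nodes differently, and placeVideoNode is my name for it:

// Hypothetical placement: raycast from the screen center onto an estimated plane
// and drop the video node at the hit point.
private func placeVideoNode(_ videoNode: SCNNode) {
    let center = CGPoint(x: sceneView.bounds.midX, y: sceneView.bounds.midY)
    guard let query = sceneView.raycastQuery(from: center, allowing: .estimatedPlane, alignment: .any),
          let result = sceneView.session.raycast(query).first else { return }

    videoNode.simdTransform = result.worldTransform
    // A billboard constraint keeps the video facing the camera as you walk around the mug.
    videoNode.constraints = [SCNBillboardConstraint()]
    sceneView.scene.rootNode.addChildNode(videoNode)
}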


Adaptive Performance Management

Real devices aren’t test benches. They overheat. They get low on battery. They choke on memory. That’s where adaptive performance comes in.

I built a manager that scores device health across thermal, memory, and battery, then adjusts processing intervals and video quality on the fly.

@Observable
class AdaptivePerformanceManager {
    enum PerformanceLevel: String {
        case low, normal, high, auto
        var processingInterval: TimeInterval {
            switch self {
            case .low: return 0.6
            case .normal: return 0.3
            case .high: return 0.15
            case .auto: return 0.3
            }
        }
    }

    private func calculatePerformanceScore() -> Double {
        var score = 1.0
        switch thermalState {
        case .fair: score *= 0.9
        case .serious: score *= 0.7
        case .critical: score *= 0.5
        default: break
        }
        if batteryState == .unplugged && batteryLevel < 0.2 {
            score *= 0.7
        }
        return max(0.3, min(1.0, score))
    }
}

This guy’s basically the sound engineer at the concert, adjusting knobs live so the mix doesn’t blow the speakers.
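
One thing the snippet above doesn't show is how the score gets applied. Here's a minimal sketch of mapping it back onto a PerformanceLevel; the recommendedLevel property and its thresholds are my assumptions, not the repo's:

extension AdaptivePerformanceManager {
    // Illustrative mapping from the 0.3...1.0 health score onto a processing tier.
    var recommendedLevel: PerformanceLevel {
        switch calculatePerformanceScore() {
        case ..<0.5:  return .low     // throttle hard when thermals or battery are suffering
        case ..<0.85: return .normal
        default:      return .high
        }
    }
}

// When the level is left on .auto, the detection loop can read
// recommendedLevel.processingInterval instead of the static Config value.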


Memory Management Strategies

Memory pressure is sneaky. You’ll think things are fine until iOS nukes your app. So I built a bouncer — my MemoryManager — to kick stuff out when things get too rowdy.

class MemoryManager {
    enum MemoryPressureLevel { case normal, medium, high, critical }

    private func executeCleanupStrategy(for level: MemoryPressureLevel) {
        switch level {
        case .medium: clearDetectionsOlderThan(seconds: 20)
        case .high:
            videoOverlayManager.clearInactiveCache()
            clearDetectionsOlderThan(seconds: 10)
        case .critical:
            videoOverlayManager.keepOnlyActiveVideo()
            clearAllDetectionsExceptCurrent()
        default: break
        }
    }
}

The app stays lean, never ballooning past 200MB.
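
The detection side of the bouncer isn't shown above. Here's a minimal sketch of how pressure could be observed with a GCD memory-pressure source, assuming the extension sits in the same file as the private cleanup method; the mapping onto my MemoryPressureLevel cases is illustrative:

extension MemoryManager {
    // Illustrative pressure hook: a GCD source fires on .warning/.critical events.
    func startObservingMemoryPressure() {
        let source = DispatchSource.makeMemoryPressureSource(eventMask: [.warning, .critical], queue: .main)
        source.setEventHandler { [weak self] in
            guard let self else { return }
            if source.data.contains(.critical) {
                self.executeCleanupStrategy(for: .critical)
            } else if source.data.contains(.warning) {
                self.executeCleanupStrategy(for: .high)
            }
        }
        // The handler capturing the source keeps it alive for the app's lifetime.
        source.resume()
    }
}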


Error Recovery and Circuit Breaker Patterns

Nothing kills a demo like a crash. Instead of praying, I added circuit breakers around ML inference. If the failure rate spikes, the breaker flips open and blocks requests until the system chills out.

class MLInferenceCircuitBreaker {
    enum State { case closed, open, halfOpen }

    func executeWithProtection<T>(_ operation: () throws -> T) throws -> T {
        switch state {
        case .open: throw CircuitBreakerError.circuitOpen
        case .halfOpen:
            do {
                let result = try operation()
                recordSuccess()
                if successCount >= successThreshold { state = .closed }
                return result
            } catch { state = .open; throw error }
        case .closed:
            do { return try operation() }
            catch {
                recordFailure()
                if failureCount >= failureThreshold { state = .open }
                throw error
            }
        }
    }
}

Error recovery kicks in with retries and exponential backoff. It’s defensive programming, but it keeps the app alive.
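
The backoff helper isn't shown above; here's a minimal sketch of what it could look like, with retryWithBackoff being my name rather than the repo's:

import Foundation

// Illustrative recovery helper: retry a failing async operation, doubling the delay each time.
func retryWithBackoff<T>(maxAttempts: Int = 3,
                         initialDelay: TimeInterval = 0.5,
                         _ operation: () async throws -> T) async throws -> T {
    var delay = initialDelay
    var attempt = 1
    while true {
        do {
            return try await operation()
        } catch {
            guard attempt < maxAttempts else { throw error }
            attempt += 1
            try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
            delay *= 2  // exponential backoff: 0.5s, 1s, 2s, ...
        }
    }
}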


SwiftUI and UIKit Integration

SwiftUI isn’t ready to handle AR rendering directly, so UIKit does the heavy lifting. The UI stays reactive, while AR stays performant.

@Observable
final class ARState {
    var detectionCount = 0
    var currentFPS: Double = 0
    var memoryUsage: Double = 0
    // ... other properties
}

The pattern feels natural: UIKit where it’s strongest, SwiftUI where it shines.
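
For completeness, here's a minimal sketch of what that bridge can look like, assuming ARViewController takes the shared ARState in its initializer; the wrapper and the HUD are illustrative, not the repo's exact UI:

import SwiftUI

// Hypothetical bridge: UIKit owns the AR session, SwiftUI renders the HUD on top.
struct ARContainerView: UIViewControllerRepresentable {
    let state: ARState

    func makeUIViewController(context: Context) -> ARViewController {
        ARViewController(state: state)  // assumed initializer
    }

    func updateUIViewController(_ uiViewController: ARViewController, context: Context) {}
}

struct ContentView: View {
    @State private var arState = ARState()

    var body: some View {
        ARContainerView(state: arState)
            .ignoresSafeArea()
            .overlay(alignment: .top) {
                // @Observable keeps this label in sync with no manual publishing.
                Text("FPS: \(arState.currentFPS, specifier: "%.0f") | Detections: \(arState.detectionCount)")
                    .padding(8)
                    .background(.ultraThinMaterial, in: Capsule())
            }
    }
}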


Testing and Validation Strategies

To keep things production-ready, I built a lightweight test harness. It runs performance benchmarks, memory stress tests, and recovery drills.

class PerformanceTestHarness {
    func runPerformanceSuite() async -> TestResults {
        var results = TestResults()
        results.sustainedFPS = await measureSustainedFrameRate(duration: 60)
        results.peakMemory = await measurePeakMemory { await triggerDetection() }
        results.recoveryTime = await measureRecoveryTime { simulateMemoryPressure(.critical) }
        return results
    }
}

Running this across devices (from iPhone SE to Pro Max) helped tune defaults and catch thermal throttling early.
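
For reference, here's a minimal sketch of how a sustained frame rate sample could be collected with CADisplayLink; the harness's real measureSustainedFrameRate may be implemented differently:

import UIKit

// Illustrative FPS sampler: count display-link ticks over a window, then average.
final class FrameRateSampler: NSObject {
    private var frameCount = 0

    @MainActor
    func measure(duration: TimeInterval) async -> Double {
        frameCount = 0
        let link = CADisplayLink(target: self, selector: #selector(tick))
        link.add(to: .main, forMode: .common)

        try? await Task.sleep(nanoseconds: UInt64(duration * 1_000_000_000))

        link.invalidate()
        return Double(frameCount) / duration
    }

    @objc private func tick() {
        frameCount += 1
    }
}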


Production Deployment Considerations

A couple of final touches for production:

  • Keep that binary lean → strip dead weight and squash those assets.
  • Privacy manifest → ML models often need to declare data usage.
  • Analytics → track FPS, crashes, and user sessions.

Future ideas? Cloud-based model updates, multi-user AR sessions, maybe even real-time concerts mapped to mugs worldwide. Why not?

Key Takeaways

  • Adaptive Performance keeps your app alive on real devices.
  • Defensive Programming avoids meltdown when things fail.
  • Memory Awareness is everything in AR+ML.
  • Layered Architecture makes testing and scaling sane.
  • User Experience First — frame rate trumps everything else.

The full pattern brings it together:

class ProductionARSystem {
    private let performanceManager = AdaptivePerformanceManager()
    private let memoryManager = MemoryManager()
    private let errorRecovery = ErrorRecoveryManager()
    private let circuitBreaker = MLInferenceCircuitBreaker()

    func process(frame: ARFrame) async {
        guard performanceManager.canProcess else { return }
        do {
            try await circuitBreaker.executeWithProtection {
                try await processFrameWithOptimizations(frame)
            }
        } catch {
            await errorRecovery.attemptRecovery(from: error)
        }
    }
}

🎯 Bonus: More Real-World iOS Survival Stories

If you’re hungry for more tips, tricks, and a few battle-tested stories from the trenches of native mobile development, swing by my collection of articles: https://medium.com/@wesleymatlock. These posts are packed with real-world solutions, some laughs, and the kind of knowledge that’s saved me from a few late-night debugging sessions. Let’s keep building apps that rock — and if you’ve got questions or stories of your own, drop me a line. I’d love to hear from you.

☕️🎸✈️ — Wes


Get The Code on GitHub

Want to try this out yourself? I’ve put the full project on GitHub so you can tinker with the AR detection pipeline, video overlay manager, and performance tools. One thing to know: the repo doesn’t include my personal Metallica video or ML model. You’ll need to drop in your own Core ML model and a video of your choice, but the scaffolding is ready for you.

👉 AR-Object-Detection GitHub