When AR Meets Metallica: Real-Time Object Detection with Video Overlays
Introduction
I was sitting at my desk one morning, Metallica blasting, coffee in hand. My mug — bright yellow with the Metallica logo — stared back at me. And that’s when the thought hit: what if my iPhone could recognize this mug and instantly start playing a Metallica video right on top of it? Like, a personal AR concert every time I poured coffee.
That’s the spark that kicked off this project. The mission: build an iOS AR app that detects an object in real-time and overlays contextual video content. Not just as a flashy demo, but something that feels production-ready. That means:
- Sustained 30+ FPS
- Memory footprint under 200MB
- Multiple overlays supported without lag
The tools: ARKit for rendering, Vision + Core ML for object detection, and a soup of Swift, SwiftUI, and UIKit to stitch everything together.
And because I love to push things, I wired in adaptive performance management, intelligent caching, and even circuit breaker patterns for ML inference. This post is a tour through that journey, from the first idea to a system that actually holds up under load. Code’s on GitHub — you’ll need your own ML model and video, but the scaffolding is all here.
Here’s a taste of the balancing act:
struct Config {
    static let maxConcurrentDetections = 5
    static let processingInterval: CFTimeInterval = 0.3 // Sweet spot for real-time
    static let videoPlaneWidth: Float = 0.4 // AR overlay dimensions (meters)
    static let videoPlaneHeight: Float = 0.225 // 16:9, matching the 1280x720 video scene used later
    static let maxDetectionAge: TimeInterval = 30.0 // Detection lifecycle
}
Coffee, AR, and Metallica. Let’s get into it.
Architecture Overview
When you start mixing AR and ML, complexity can explode quickly. I broke it down into layers, each with a clear job:
The Core Band Members
- ARViewController: The frontman — orchestrates the whole show
- VideoOverlayManager: The drummer — keeps the beat with resource management
- ARState: The bassist — holds everything together
- Performance Managers: The sound engineers — constantly tuning for optimal output
Why This Architecture Doesn’t Suck
- Protocol-based design (swap components like changing guitar strings)
- Separation of concerns (each component does one thing well)
- Reactive data flow (no polling, no waste)
- Testability baked in from day one
Protocol-Driven Design:
protocol PerformanceMonitorProtocol: AnyObject {
    var currentFPS: Double { get }
    var memoryUsageMB: Double { get }
    func startMonitoring()
    func stopMonitoring()
}

protocol LoggerProtocol {
    func info(_ message: String)
    func error(_ message: String, error: Error?)
    func debug(_ message: String)
}
Keeps things testable without turning into one monster God Class.
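To see why that matters, here's the kind of test double it buys you (a sketch; this mock is mine, not from the repo):

// Hypothetical test double conforming to the same protocol as the real monitor
final class MockPerformanceMonitor: PerformanceMonitorProtocol {
    var currentFPS: Double = 60
    var memoryUsageMB: Double = 120
    private(set) var isMonitoring = false

    func startMonitoring() { isMonitoring = true }
    func stopMonitoring() { isMonitoring = false }
}

// Anything that depends on PerformanceMonitorProtocol can now be unit tested
// without spinning up CADisplayLink or reading real memory stats.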
Real-Time Object Detection Pipeline
The detection loop is where things get messy if you’re not careful. Running a Core ML model on every single frame? That’ll melt your phone faster than a Metallica solo. The trick is throttling and smart scheduling.
private func processCurrentFrame() {
    guard let frame = sceneView.session.currentFrame,
          !isProcessing else { return }

    let now = CACurrentMediaTime()
    guard now - lastProcessTime >= Config.processingInterval else { return }
    isProcessing = true
    lastProcessTime = now

    visionQueue.async { [weak self] in
        guard let self else { return }
        defer { self.isProcessing = false } // always release the gate, even on failure

        let request = VNCoreMLRequest(model: self.visionModel) { req, _ in
            self.handleDetectionResults(req.results)
        }
        request.imageCropAndScaleOption = .scaleFill
        request.usesCPUOnly = false // lean on GPU/Neural Engine

        do {
            try self.mlCircuitBreaker.executeWithProtection {
                let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage)
                try handler.perform([request])
            }
        } catch {
            // Either the breaker is open or inference failed; skip this tick and try again on the next one
        }
    }
}
Notice the processingInterval. At roughly one inference every 0.3 seconds, the model keeps pace without frying the hardware. Anything tighter felt jittery. Anything looser made the detection lag.
Filtering matters too — we don’t want low-confidence junk flooding the pipeline:
private func filterDetections(_ observations: [VNClassificationObservation]) -> [DetectionInfo] {
    return observations
        .filter { $0.confidence >= confidenceThreshold }
        .prefix(Config.maxConcurrentDetections)
        .map { observation in
            DetectionInfo(
                identifier: observation.identifier,
                confidence: observation.confidence,
                timestamp: CACurrentMediaTime(),
                hasVideoOverlay: videoOverlayManager.hasVideo(for: observation.identifier)
            )
        }
}
The result: stable detections, smoothed over time, with bounding boxes tracked across frames.
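For reference, DetectionInfo is just a small value type. A minimal sketch, with field types inferred from how the filter uses them (the repo version may carry more):

struct DetectionInfo {
    let identifier: String        // class label from the model
    let confidence: VNConfidence  // 0.0 to 1.0
    let timestamp: CFTimeInterval // when the detection was last seen
    let hasVideoOverlay: Bool     // is a video mapped to this label?
}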
Advanced Video Overlay Management
Once detection is stable, the fun begins — attaching videos to objects in AR space. But you can't just spawn AVPlayer instances like candy. That'll chew through memory. Instead, I built an LRU cache that keeps the last five overlays hot.
class VideoOverlayManager {
    private var playerCache: [String: AVPlayer] = [:]
    private var lastAccess: [String: CFTimeInterval] = [:] // recency bookkeeping for LRU eviction
    private let maxCacheSize = 5

    private func manageCacheSize() {
        guard playerCache.count > maxCacheSize else { return }

        // Evict the least recently used players first
        let sortedByRecency = playerCache.keys.sorted {
            (lastAccess[$0] ?? 0) < (lastAccess[$1] ?? 0)
        }
        sortedByRecency.prefix(playerCache.count - maxCacheSize).forEach { key in
            playerCache[key]?.pause()
            playerCache[key]?.replaceCurrentItem(with: nil)
            playerCache.removeValue(forKey: key)
            lastAccess.removeValue(forKey: key)
        }
    }
}
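The one piece not shown above is where lastAccess gets bumped. In this sketch the lookup path does it (the method name is an assumption):

func player(for identifier: String) -> AVPlayer? {
    guard let player = playerCache[identifier] else { return nil }
    lastAccess[identifier] = CACurrentMediaTime() // touch on every hit so eviction stays truly LRU
    return player
}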
For rendering, SceneKit handles the 3D planes. Each plane gets a video texture with fade-in animations for polish.
private func createVideoNode(for detection: DetectionInfo, with player: AVPlayer) -> SCNNode {
    let videoNode = SCNNode()

    // SpriteKit scene acts as the video texture
    let videoScene = SKScene(size: CGSize(width: 1280, height: 720))
    let videoPlayer = SKVideoNode(avPlayer: player)
    videoPlayer.position = CGPoint(x: videoScene.size.width / 2, y: videoScene.size.height / 2)
    videoPlayer.size = videoScene.size
    videoScene.addChild(videoPlayer)

    let plane = SCNPlane(width: CGFloat(Config.videoPlaneWidth), height: CGFloat(Config.videoPlaneHeight))
    plane.firstMaterial?.diffuse.contents = videoScene
    plane.firstMaterial?.isDoubleSided = true
    videoNode.geometry = plane

    // Start transparent, then fade in for polish
    videoNode.opacity = 0.0
    SCNTransaction.begin()
    SCNTransaction.animationDuration = 0.3
    videoNode.opacity = 1.0
    SCNTransaction.commit()

    return videoNode
}
It feels wild — like the videos are glued right onto the mug and hanging out in your kitchen in real space.
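The snippet above builds the node; getting it to sit on the mug is the other half. One way to place it (a sketch, not necessarily what the repo does) is to raycast from the detection's screen position and drop the node at the hit:

// Hypothetical placement helper; screenPoint would come from the detection's bounding box
private func place(_ videoNode: SCNNode, at screenPoint: CGPoint) {
    guard let query = sceneView.raycastQuery(from: screenPoint,
                                             allowing: .estimatedPlane,
                                             alignment: .any),
          let result = sceneView.session.raycast(query).first else { return }

    videoNode.simdTransform = result.worldTransform // snap the plane to the hit in world space
    sceneView.scene.rootNode.addChildNode(videoNode)
}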
Adaptive Performance Management
Real devices aren’t test benches. They overheat. They get low on battery. They choke on memory. That’s where adaptive performance comes in.
I built a manager that scores device health across thermal, memory, and battery, then adjusts processing intervals and video quality on the fly.
@Observable
class AdaptivePerformanceManager {
    enum PerformanceLevel: String {
        case low, normal, high, auto

        var processingInterval: TimeInterval {
            switch self {
            case .low: return 0.6
            case .normal: return 0.3
            case .high: return 0.15
            case .auto: return 0.3
            }
        }
    }

    // Snapshots refreshed by a monitoring loop (not shown); UIDevice battery monitoring must be enabled first
    private var thermalState = ProcessInfo.processInfo.thermalState
    private var batteryState = UIDevice.current.batteryState
    private var batteryLevel = UIDevice.current.batteryLevel

    private func calculatePerformanceScore() -> Double {
        var score = 1.0
        switch thermalState {
        case .fair: score *= 0.9
        case .serious: score *= 0.7
        case .critical: score *= 0.5
        default: break
        }
        if batteryState == .unplugged && batteryLevel < 0.2 {
            score *= 0.7
        }
        // Clamp so the pipeline slows down but never stalls completely
        return max(0.3, min(1.0, score))
    }
}
This guy’s basically the sound engineer at the concert, adjusting knobs live so the mix doesn’t blow the speakers.
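What the post doesn't show is how that score feeds back into the loop. A plausible mapping (the thresholds here are my assumptions) hands the detection pipeline a level, and the pipeline reads its processingInterval instead of the static Config value:

private var currentLevel: PerformanceLevel {
    switch calculatePerformanceScore() {
    case ..<0.5: return .low      // thermals or battery are struggling, throttle hard
    case ..<0.85: return .normal
    default: return .high         // plenty of headroom, tighten the loop
    }
}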
Memory Management Strategies
Memory pressure is sneaky. You’ll think things are fine until iOS nukes your app. So I built a bouncer — my MemoryManager — to kick stuff out when things get too rowdy.
class MemoryManager {
    enum MemoryPressureLevel { case normal, medium, high, critical }

    private func executeCleanupStrategy(for level: MemoryPressureLevel) {
        switch level {
        case .medium:
            clearDetectionsOlderThan(seconds: 20)
        case .high:
            videoOverlayManager.clearInactiveCache()
            clearDetectionsOlderThan(seconds: 10)
        case .critical:
            videoOverlayManager.keepOnlyActiveVideo()
            clearAllDetectionsExceptCurrent()
        default: break
        }
    }
}
The app stays lean, never ballooning past 200MB.
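How the pressure level gets detected isn't shown above. A minimal sketch, assuming a standard DispatchSource memory pressure handler feeding executeCleanupStrategy:

private var pressureSource: DispatchSourceMemoryPressure?

func startMemoryPressureMonitoring() {
    let source = DispatchSource.makeMemoryPressureSource(eventMask: [.warning, .critical], queue: .main)
    source.setEventHandler { [weak self] in
        guard let self, let event = self.pressureSource?.data else { return }
        // Map the system event onto our own pressure levels
        let level: MemoryPressureLevel = event.contains(.critical) ? .critical : .high
        self.executeCleanupStrategy(for: level)
    }
    source.activate()
    pressureSource = source
}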
Error Recovery and Circuit Breaker Patterns
Nothing kills a demo like a crash. Instead of praying, I added circuit breakers around ML inference. If the failure rate spikes, the breaker flips open and blocks requests until the system chills out.
class MLInferenceCircuitBreaker {
    enum State { case closed, open, halfOpen }
    private var state: State = .closed
    private var failureCount = 0
    private var successCount = 0
    private let failureThreshold = 3   // consecutive failures before tripping open
    private let successThreshold = 2   // successes in half-open before closing again

    func executeWithProtection<T>(_ operation: () throws -> T) throws -> T {
        switch state {
        case .open:
            // A cool-down timer (not shown) later moves open -> halfOpen to probe recovery
            throw CircuitBreakerError.circuitOpen
        case .halfOpen:
            do {
                let result = try operation()
                successCount += 1
                if successCount >= successThreshold { state = .closed; failureCount = 0 }
                return result
            } catch { state = .open; throw error }
        case .closed:
            do {
                let result = try operation()
                failureCount = 0
                return result
            } catch {
                failureCount += 1
                if failureCount >= failureThreshold { state = .open } // trip only when failures pile up
                throw error
            }
        }
    }
}
Error recovery kicks in with retries and exponential backoff. It’s defensive programming, but it keeps the app alive.
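The retry side is simple in spirit: wait, double the delay, try again. A sketch of the idea (the repo's ErrorRecoveryManager is more selective about which errors it retries, and this signature is my own):

final class ErrorRecoveryManager {
    func attemptRecovery(maxRetries: Int = 3, _ operation: () async throws -> Void) async {
        var delay: UInt64 = 500_000_000 // start at 0.5s (nanoseconds)
        for attempt in 1...maxRetries {
            try? await Task.sleep(nanoseconds: delay)
            do {
                try await operation()
                return // recovered, stop retrying
            } catch {
                delay *= 2 // exponential backoff: 0.5s, 1s, 2s, ...
                print("Recovery attempt \(attempt) failed: \(error)")
            }
        }
    }
}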
SwiftUI and UIKit Integration
SwiftUI isn’t ready to handle AR rendering directly, so UIKit does the heavy lifting. The UI stays reactive, while AR stays performant.
@Observable
final class ARState {
    var detectionCount = 0
    var currentFPS: Double = 0
    var memoryUsage: Double = 0
    // ... other properties
}
The pattern feels natural: UIKit where it’s strongest, SwiftUI where it shines.
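The glue between the two worlds is the usual representable wrapper. A sketch of how the hosting might look (ARViewController and ARState come from the post; the wrapper, the initializer, and the HUD are my own):

struct ARViewContainer: UIViewControllerRepresentable {
    let state: ARState

    func makeUIViewController(context: Context) -> ARViewController {
        ARViewController(state: state) // UIKit owns the ARSCNView and the session
    }

    func updateUIViewController(_ uiViewController: ARViewController, context: Context) {
        // SwiftUI pushes lightweight updates; heavy AR work never leaves UIKit
    }
}

// The HUD reads the observable state and stays reactive for free
struct StatsHUD: View {
    let state: ARState
    var body: some View {
        Text("\(Int(state.currentFPS)) FPS · \(Int(state.memoryUsage)) MB")
            .font(.caption.monospacedDigit())
    }
}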
Testing and Validation Strategies
To keep things production-ready, I built a lightweight test harness. It runs performance benchmarks, memory stress tests, and recovery drills.
class PerformanceTestHarness {
    func runPerformanceSuite() async -> TestResults {
        var results = TestResults()
        results.sustainedFPS = await measureSustainedFrameRate(duration: 60)
        results.peakMemory = await measurePeakMemory { await triggerDetection() }
        results.recoveryTime = await measureRecoveryTime { simulateMemoryPressure(.critical) }
        return results
    }
}
Running this across devices (from iPhone SE to Pro Max) helped tune defaults and catch thermal throttling early.
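Wired into XCTest, the harness turns the headline targets from the intro into assertions (a sketch; the test name is mine):

import XCTest

final class PerformanceTests: XCTestCase {
    func testSustainedFrameRateAndMemoryBudget() async {
        let results = await PerformanceTestHarness().runPerformanceSuite()

        // The two production targets from the intro: 30+ FPS, under 200MB
        XCTAssertGreaterThanOrEqual(results.sustainedFPS, 30)
        XCTAssertLessThanOrEqual(results.peakMemory, 200)
    }
}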
Production Deployment Considerations
A couple of final touches for production:
- Keep that binary lean → strip dead weight and squash those assets.
- Privacy manifest → declare any data collection and required-reason API usage (and the camera usage description is table stakes for ARKit).
- Analytics → track FPS, crashes, and user sessions.
Future ideas? Cloud-based model updates, multi-user AR sessions, maybe even real-time concerts mapped to mugs worldwide. Why not?
Key Takeaways
- Adaptive Performance keeps your app alive on real devices.
- Defensive Programming avoids meltdown when things fail.
- Memory Awareness is everything in AR+ML.
- Layered Architecture makes testing and scaling sane.
- User Experience First — frame rate trumps everything else.
The full pattern brings it together:
class ProductionARSystem {
    private let performanceManager = AdaptivePerformanceManager()
    private let memoryManager = MemoryManager()
    private let errorRecovery = ErrorRecoveryManager()
    private let circuitBreaker = MLInferenceCircuitBreaker()

    func process(frame: ARFrame) async {
        guard performanceManager.canProcess else { return }
        do {
            // Assumes an async-friendly overload of executeWithProtection
            try await circuitBreaker.executeWithProtection {
                try await processFrameWithOptimizations(frame)
            }
        } catch {
            await errorRecovery.attemptRecovery(from: error)
        }
    }
}
🎯 Bonus: More Real-World iOS Survival Stories
If you’re hungry for more tips, tricks, and a few battle-tested stories from the trenches of native mobile development, swing by my collection of articles: https://medium.com/@wesleymatlock. These posts are packed with real-world solutions, some laughs, and the kind of knowledge that’s saved me from a few late-night debugging sessions. Let’s keep building apps that rock — and if you’ve got questions or stories of your own, drop me a line. I’d love to hear from you.
☕️🎸✈️ — Wes
Get The Code on GitHub
Want to try this out yourself? I’ve put the full project on GitHub so you can tinker with the AR detection pipeline, video overlay manager, and performance tools. One thing to know: the repo doesn’t include my personal Metallica video or ML model. You’ll need to drop in your own Core ML model and a video of your choice, but the scaffolding is ready for you.