BLOG

An attempt to formulate cohesive thoughts, dictated to an orangutan in Cambodia.

Neural Networks: Magic Little Elves.

NeuralNetworks_20141203_1110800.jpg

So, my last article was relevant for about 3 days.

Shortly after musing on how Hide and Seek was AR’s greatest hurdle, Niantic, the developer of Pokemon Go, revealed an occlusion mode in its AR engine. Pikachu is dodging around people and hiding behind bushes. Pikachu is playing Hide and Seek. Bravo!

niantic-pokemon-go-matrix-mill-occlusion-demo.jpg

In 2017, a team of scientists from University College London specializing in computer vision and machine learning formed Matrix Mill to create “machines that think around occlusions.” A year later, they joined Niantic with a working model of this technology.

What is this dark magic? One of the main ingredients of computer vision is the convolutional neural network (CNN). Think of it as an elaborate game of connect-the-dots, a way for the computer to infer and rebuild what it’s seeing through its camera.

Deep CNNs work by consecutively modeling small pieces of information and combining them deeper in the network. One way to understand them: the first layer tries to detect edges and forms templates for edge detection. Subsequent layers then combine those edges into simple shapes, and eventually into templates for different object positions, illumination levels, scales, and so on. The final layers match an input image against all of these templates, and the final prediction is like a weighted sum of them all. This is how deep CNNs are able to model complex variation and behaviour, giving highly accurate predictions (1).
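To make that first layer concrete, here’s a toy sketch in Python/NumPy of a single convolution pass with a hand-made vertical-edge kernel. (A real CNN learns its kernels from data and stacks many layers; everything here is my own illustration, not any production code.)

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image (valid padding) - one CNN layer, no learning."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge template, like one a first layer might learn on its own
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# Toy image: dark on the left half, bright on the right half
image = np.zeros((5, 6))
image[:, 3:] = 1.0

response = conv2d(image, edge_kernel)
print(response)  # strongest (nonzero) responses sit where the brightness jumps
```

The output map is near zero over flat regions and spikes at the dark-to-bright boundary, which is exactly the “edge template” behaviour described above; deeper layers would treat this map as their input.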

Visual Effects artists have long been working with neural networks without realizing it. They are integral to image analysis techniques such as 3D motion tracking, advanced motion blur, and time remapping. Photogrammetry, the technique of building a 3D model from several photographs, is the perfect example of utilizing CNNs.

I see photogrammetry as the static form of what Niantic and Matrix Mill are doing with their Real World AR Occlusion. The game’s input video probably requires a bit of scene analysis to detect a ground plane and build up a rough 3D model of the scene. Computer vision models that can anticipate what will happen within a scene based on context would then be used (in the case of the Niantic demo) to handle a person walking through the frame in a public area.

A scene evaluated by computer vision; numbers are percent accuracy.

What’s particularly amazing about the Niantic Real World Occlusion prototype is that it’s working in real-time, on a mobile device. Computer-vision-assessed scenes are typically post-processed; what we’re experiencing here is live masking in real-time, all while continuing to lock to a ground plane AND rendering CG models with active lighting.

The occlusion prototype is in its infancy, but is showing great potential to break down the barrier between augmented and mixed reality. With photography, the best camera is the one that’s always in your hand - the same will hold true with the burgeoning field of extended reality.

  1. Aarshay Jain, Deep Learning for Computer Vision – Introduction to Convolution Neural Networks

Hide and Seek: AR’s greatest hurdle.

AR_Museum.jpg

The other day I was asked to explain the difference between augmented reality (AR) and mixed reality (MR). I had just given a presentation on the topic and clearly botched communicating the difference between the two, before spiralling into the inevitable log jam of virtual and extended reality. Big mistake.

What I’ve found is this: you can’t play hide and seek with content in AR. That’s it, there’s no way around it. A familiar character can help demonstrate:

AR_MR_CompareSM.jpg

What’s happening here? In the image on the left, the AR camera is successfully tracking the scene and properly inserting the object in terms of perspective and lighting. But the camera can’t “see” the tree in terms of depth. All it can really do is detect a collection of high-contrast points in the image raster; once it finds enough of them to form a planar grid, it can construct a virtual ground plane and lock to the target. All in real-time, at 60 frames per second.
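To make that “planar grid” idea concrete, here’s a toy sketch (Python/NumPy, and entirely my own invention — not ARKit’s or ARCore’s actual algorithm) of a RANSAC-style plane fit over tracked 3D feature points: repeatedly sample three points, propose the plane through them, and keep whichever plane the most points agree with.

```python
import numpy as np

def fit_ground_plane(points, iterations=100, threshold=0.05, seed=0):
    """RANSAC-style plane fit: sample 3 points, count inliers, keep the best.
    Returns (normal, d) for the plane n.p + d = 0, or None if nothing fits."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, 0
    for _ in range(iterations):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue  # degenerate sample (collinear points)
        n = n / norm
        d = -np.dot(n, a)
        inliers = np.sum(np.abs(points @ n + d) < threshold)
        if inliers > best_inliers:
            best_inliers, best = inliers, (n, d)
    return best

# Toy point cloud: mostly a flat floor at y = 0, plus a few "tree" points above
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(-1, 1, 40), np.zeros(40), rng.uniform(-1, 1, 40)])
tree = np.array([[0.0, 0.5, 0.0], [0.1, 1.0, 0.1], [0.0, 1.5, 0.0]])
points = np.vstack([floor, tree])

normal, d = fit_ground_plane(points)
print(abs(normal[1]))  # close to 1: the fitted plane's normal points straight up
```

The “tree” points are simply outvoted — which is also why the camera can lock to the floor yet remain oblivious to the tree’s depth.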

On the right, the Mixed Reality camera is assisted by a depth scanner, which sends light rays or lasers to read the depth of your environment, literally reconstructing the scene in 3D so it can composite the virtual object back into the real world, behind the tree. Still in real-time, at 60 frames per second. That’s a lot. Trust me. As a 3D artist, I find the potential for mixed reality mind-blowing. Holograms! In-camera previsualization! Real-time film-making!
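That compositing step boils down to a per-pixel depth test. Here’s a minimal sketch of it, assuming we already have depth for both the real scene (from the scanner) and the virtual render — the function and the tiny one-row “frame” are hypothetical illustrations, not anyone’s shipping code:

```python
import numpy as np

def composite_with_occlusion(camera_rgb, scene_depth, virtual_rgb, virtual_depth):
    """Per-pixel depth test: show the virtual object only where it is
    closer to the camera than the real scene is."""
    mask = virtual_depth < scene_depth   # True where the virtual object wins
    out = camera_rgb.copy()
    out[mask] = virtual_rgb[mask]
    return out

# Toy 1x4-pixel frame: a "tree" occupies pixel 2 at 2 m; everything else is far
scene_depth   = np.array([[10.0, 10.0, 2.0, 10.0]])
camera_rgb    = np.zeros((1, 4, 3))    # black background from the camera
virtual_rgb   = np.ones((1, 4, 3))     # white virtual character
virtual_depth = np.full((1, 4), 5.0)   # the character stands 5 m away

frame = composite_with_occlusion(camera_rgb, scene_depth, virtual_rgb, virtual_depth)
print(frame[0, :, 0])  # [1. 1. 0. 1.] - the character is hidden behind the tree at pixel 2
```

The character shows up everywhere except the one pixel where the tree is closer — that single comparison, run for every pixel at 60 fps, is the hide-and-seek trick.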

Currently, only a couple of consumer devices can do mixed reality: the Microsoft HoloLens and the yet-to-be-released Magic Leap One. Both pack a lot of tech, and both are a little pricey, which will probably keep MR at bay for the near future. Thankfully, we can expect the technology to improve in quality and drop in cost.

So, what about AR? Here I’m drawing a line in the sand in terms of fundamental definition. Will this limitation be the undoing of the technology? Nope - it’s actually helping move things forward. Apple’s ARKit and Google’s ARCore have opened up the playing field to any hobbyist with a modern device. These individuals, along with seasoned developers, are pushing boundaries, both in posing challenges for the medium and in finding solutions to them. AR on mobile devices has another advantage: superior rendering resolution.

There are workarounds: clever animation techniques that allow characters to reveal themselves. In Pokemon Go, for instance, a character will emerge from virtual shrubs. Or a pre-configured space (such as a gallery) can be loaded into the virtual scene, recreating the real-world environment. In theory, using geolocation along with some image recognition to augment a national monument or city skyline should also be possible.

Of course, in certain situations, such as wide-open spaces, the lack of a depth camera is a non-issue. There might also be a way around the hardware required for mixed reality, or at least a decent fake. Non-real-time software techniques such as photogrammetry let a camera reconstruct a scene when provided with an array of angles. Will this exist on your phone someday? Probably. I can guarantee it won’t be as effective as a hardware-based, true mixed reality experience. But adding this ability will open a lot of doors.

Personally, I’m eagerly anticipating a game of hide and seek on a mobile device. That will be a killer app, one that opens a trove of possibilities: treasure hunts, laser tag, and first-person shooters. The genre of Horror. The element of surprise, integrated into the real world, will make mixed reality a must-have experience.