On September 22, we published our second step into the world of Augmented Reality, and it's a big one.  (You can check out the story here, and the AR feature is viewable in the iOS Classic app, which you can download here.)  I wanted to take a step back and outline the decisions we made, how we see the AR landscape, and what we're doing now and in the future to participate.

Why AR

It was clear to us, from the beginning, that AR was different from a lot of the other flash-in-the-pan tech projects we try out.  It wasn’t just that it didn’t require fancy new hardware, or that it was a step beyond the incredibly popular selfie filters on Snapchat, or that the biggest players in consumer tech were involved.  What AR represented seemed particularly well suited to journalism.

Journalism, at its core, is about context.  Journalists don’t have a monopoly on the mission to inform the public, but our focus is on informing the public about what’s going on right now in their world: building and broadening the context in which they make decisions.  It’s the backbone of every story we tell and every question we ask.

AR is uniquely suited to this.  It’s a medium that requires the involvement of a person’s physical world and adds data on top of it: it literally adds context to their space.  Given the right placement, stories that involve a person’s daily life become much more impactful when presented this way.

AR can also break through a distinct issue in communication: scale.  Let’s say we’re trying to describe the height of an object: a wall, a meteor, a new building, perhaps.   It’s a newsworthy object, and we’re trying to show our readers how large this object is.  We could describe it with words, sure: as big as a football field is long, the size of twenty school buses stacked, etc. Or we could show pictures: here’s a picture of the object, with a tiny person next to it for scale.  A video could offer a drone-powered tour of the object, with the background in view.

The problem with all of these is that they make scale academic.  The reader has to translate that height intellectually, and misses the most amazing part of what they’re being shown: the feeling of standing next to something that dwarfs them.  With AR, we could place that object in front of them.  Scale is not hinted at, but fully communicated.

So, we decided to approach AR with the intention of turning it into a journalistic tool rather than a production format.  This was not a matter of taking a story and publishing an “AR version”: we would look for stories and graphics that could best, or only, be told with AR.

Publishing at Scale

This meant we couldn’t do what we normally do with new technologies like this: assemble a small team, produce a proof of concept showing how it worked, and not think much in terms of flexible, scalable tools.  We had to approach this asking, “What does it look like for any writer to publish an AR graphic?  How do we make this as flexible as video or images?”

Here are some requirements we knew we had to meet in order for AR to be a useful journalistic tool.

  1. Speed.  Not necessarily speed in performance (although we had a high standard in that regard), but speed in publishing.  Our graphics team needs to be able to publish this stuff quickly, and producers need to be able to embed it quickly, without a native developer on hand.  AR features need to be passed around quickly, like assets.
  2. Flexibility.  Although our general principle with engineering frameworks is to get as close to the metal as we can on any given platform, as soon as editorial gets involved, the focus changes.  Our writers can’t write different versions of a story for every platform, so why should we expect them to handle platform-specific AR components?  Whatever we work with needs to abstract away the platform-specific differences as much as possible, so we can write an asset that runs anywhere: iOS, Android, and the web (a rough sketch of that kind of abstraction follows this list).
  3. HTML5.  Whatever we use should leverage HTML5 tools and be written primarily in JS.  This was a hard pill to swallow for some of us, as we knew it could take a substantial chunk out of our performance budget.  Two things influenced the decision to look for HTML renderers.  First, our graphics team, the team that produces these stories and infographics, works primarily (and almost exclusively) on the web.  A high level of expertise in the language speeds up asset production.  Second, we looked at the roadmap for frameworks like AR.js and WebAR, and we’re optimistic about the near arrival of web-based AR experiences.  The ability to publish AR embeds in a browser makes these experiences discoverable, shareable, and available to a wider audience.  The native renderer could be whatever it needed to be, but the composition layer had to be in HTML5, or we doubted it would get any traction.
  4. Performance.  We explored all the AR experiences we could find and came away with one common sentiment: marker-based tracking sucked.  Passing around QR codes or banking on an image being present seemed kitschy and inelegant.  The real magic came from markerless tracking, surface recognition, and point-cloud mapping (or SLAM, as the industry began to call it).  You could pull out your camera, and the software would start looking for flat planes and surfaces by which to judge depth and on which to place objects, no QR codes needed.  This was undoubtedly the way to go: if you didn’t need a QR code, no one would really want one, would they?
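
To make the flexibility requirement concrete, here is a rough sketch of the kind of platform-agnostic contract we had in mind: one HTML/JS asset, with the platform differences hidden behind a small bridge the host injects.  Every name and type below is invented for illustration; none of it comes from a real SDK.

```typescript
// Hypothetical contract between an AR graphic and its host (iOS app,
// Android app, or browser).  Every name here is illustrative, not a real SDK.

interface ARSurface {
  // Position of a detected plane in metres, relative to the camera.
  position: { x: number; y: number; z: number };
}

interface ARBridge {
  // Start markerless (SLAM-style) tracking and report detected surfaces.
  startTracking(onSurface: (surface: ARSurface) => void): void;
  // Place a 3D asset from the bundle on a detected surface.
  placeAsset(assetUrl: string, surface: ARSurface): void;
  // Optional native hooks the host may expose (share sheet, screenshot, etc.).
  share?(message: string): void;
}

declare global {
  interface Window { arBridge?: ARBridge }
}

// The graphic itself is identical everywhere; only the injected bridge differs.
export function runGraphic(bridge: ARBridge | undefined = window.arBridge): void {
  if (!bridge) return; // host doesn't support AR; fall back to a flat graphic
  bridge.startTracking((surface) => {
    bridge.placeAsset("assets/object.glb", surface);
  });
}
```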

Even as we were having AR-related conversations with Apple and Google leading up to our first project, we knew that their solutions (ARKit and ARCore, respectively) wouldn’t work for our needs, at least not without substantial work on our part to abstract the native code into JS.  These were native frameworks that performed well, but they each required native development work specific to each platform, and an app release for each story: not exactly flexible.

We found a winner for our first major projects in Wikitude, a vendor who proved a fantastic partner over the course of these projects (and more to come).  Wikitude provided the on-device performance we needed, and was the only vendor to offer both an HTML layer connected to its native SDK and SLAM markerless tracking.  In addition, it works on a wider range of iOS devices than ARKit does, supporting models as far back as the iPhone 5s!

Wikitude solved the HTML5 and performance requirements.  Now it was on us to make the production process fast and flexible.

The AR Embed

Our first go at AR required graphics work to design and produce the story in HTML5, and native app engineering work to integrate the Wikitude SDK, render the view, and handle native events.  This meant we shipped the story as native code with an app release.  For a first project, that was good enough, but we obviously needed to come up with something better.

What we wanted was to take native app developers out of the loop entirely.  Graphics should be able to produce a feature using AR, test it, and publish it in a story without a native developer needing to touch it, and without an app release.

So, here’s what we decided to build: graphics would package HTML assets together in a zipped bundle, like a web archive.  They would then create a block inside a story that used data-attributes to tell the apps what rules to follow and what features the experience would need (platform isolation, native features, etc.).  Our apps would then listen for this feature the same way we listen for instructions to render a video or an image, and use the data-attributes to set up an embed and give users a graphic to tap on to begin.  Everything about entering and exiting would be controlled by native code, but everything inside the experience would ship in that bundle and be totally opaque to the app itself, save for any native functionality the app might provide via a JS shim (like opening the native share sheet or taking a screenshot).
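
As a rough illustration of that block, here is what reading the data-attributes might look like.  The attribute names, field names, and URL are invented for this sketch; the actual block format isn’t spelled out here.

```typescript
// A sketch of the embed concept.  Attribute and field names are invented.

// How the block might appear in a story body:
export const exampleBlock = `
  <div class="ar-embed"
       data-ar-bundle="https://example.com/ar/story-bundle.zip"
       data-ar-platforms="ios"
       data-ar-needs="share,screenshot">
  </div>
`;

export interface AREmbedConfig {
  bundleUrl: string;        // zipped HTML bundle, like a web archive
  platforms: string[];      // which apps should render the embed
  nativeFeatures: string[]; // native hooks exposed to the bundle via the JS shim
}

// Apps (or the web page) listen for these blocks the same way they listen for
// video or image embeds, then turn the data-attributes into a config.
export function parseEmbed(el: HTMLElement): AREmbedConfig {
  return {
    bundleUrl: el.dataset.arBundle ?? "",
    platforms: (el.dataset.arPlatforms ?? "").split(",").filter(Boolean),
    nativeFeatures: (el.dataset.arNeeds ?? "").split(",").filter(Boolean),
  };
}
```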

We built it, and with iOS Classic 3.11, we shipped it to our users, about a week before our next AR story was to publish.

Next

Building the embed in this way gave us some clear next steps, which we’ll be embarking on over the next few months.  

First, Android users finally get to be in on the fun.  Android Classic should listen to the same data attributes, take advantage of our uniform publishing system, and follow the same instructions.  That way, we have the option to segment between iOS and Android, but by default, we can write one experience and it will work everywhere.
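
Continuing the illustrative sketch from above (same invented names, and only conceptual, since the native apps themselves aren’t written in TypeScript), the segmentation check could be as simple as this:

```typescript
// Hypothetical platform check: an empty platforms list is the default case of
// one experience that runs everywhere; a populated list segments by app.
export function shouldRender(
  config: { platforms: string[] },
  platform: "ios" | "android",
): boolean {
  return config.platforms.length === 0 || config.platforms.includes(platform);
}
```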

Second, although we won’t be using either directly, we definitely want to leverage ARCore and ARKit under the hood.  Not only is that great for our relationships with Apple and Google, it lets us hook on to their moving platforms for performance gains down the road.  If Apple releases a new device or updates ARKit in iOS 12, we want those benefits for all our projects.

Third, as always, we keep listening.  This next stage of the AR lifecycle is where UX concepts get tested and winners emerge, and where users become accustomed to certain interface elements inside an AR experience.  Finding those winners and designing intuitive interfaces can make the difference between a good experience and a bad one, and, for journalists, between readers understanding and appreciating what we have to say and not understanding it at all.