A Picture Is Worth A Thousand Equations: A Technological Examination of a Screenshot From Ni No Kuni 2

Let’s talk about this image:

A blond young man holds his sword to the camera. In the background are a table, a marble floor, and a door. All rights to this image and the game Ni No Kuni 2 belong to Level-5 and Namco Bandai; it is used here to discuss the graphics technologies behind the game.

This is from a game released this year called Ni no Kuni II: Revenant Kingdom. It is a Japanese RPG developed by Level-5, which explains some of its art-style influences. The image is meant to evoke the feel of a Japanese animation house, specifically Studio Ghibli, the studio behind Spirited Away, Howl’s Moving Castle, Castle in the Sky, Nausicaä of the Valley of the Wind, and Ponyo, among others. The studio has a reputation for polished, detailed, and imaginative 2D animated films.

Ni no Kuni II: Revenant Kingdom is a 3D-rendered video game whose art style tries to evoke Studio Ghibli’s 2D style as closely as possible. In many frames, as the one above demonstrates, it gets very close, and there is a lot of fascinating technology behind the image it creates.

First, let’s compare images: here is a still from Spirited Away. Note the two distinct styles for the animated foreground and the static background. The background elements are all in a very painterly style. Colors are a little more muted but still vibrant. Special attention is paid to detail and shadow, and this is arguably a much more realistic style, despite containing all of the fantasy and imagination the foreground does. The characters, effects, and animated objects, by contrast, are done in a much simpler style, with fewer gradients and lines overall. That makes them feel much flatter: shadows are used mainly to help the viewer track the location of the light source (important for staying oriented across cuts) and to flesh out the contours of characters, giving them a little more definition. Their colors are much more eye-popping, which makes them easier to track and understand as they move.

To duplicate this, Level-5 needed to capture two very distinct graphical styles in one graphics engine, and the result, wherever the game fully commits to it, is one of the more memorable visual styles of the last decade.

Let’s break the image down into sections and talk about some of the technology driving each one. First, I would like to direct you to the left side of the image: the marble floor between the rug and the door. You can see the door reflected in the floor fairly clearly. In previous graphics generations, environmental artists would probably have captured a generic room from a single point and turned it into a cube map (six square textures, one for each face of a cube surrounding that point), which would then apply the same one-size-fits-all reflection to every marble surface in that area of the game. This is very efficient and costs almost no additional GPU or CPU time, but it doesn’t reflect any dynamic objects, and it doesn’t represent the actual surroundings of each reflective surface very well.
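To make that cube-map limitation concrete, here is a minimal Python sketch of the lookup. It assumes a y-up coordinate system, a simplified face and texture-coordinate convention of my own (real graphics APIs such as OpenGL define their own per-face orientation), and the hypothetical helper names `reflect` and `cube_lookup`. Notice that only the reflection direction enters the lookup; the position of the reflecting point on the floor never does, which is why every marble tile shows the same pre-captured room.

```python
from __future__ import annotations


def reflect(incident: tuple[float, float, float],
            normal: tuple[float, float, float]) -> tuple[float, float, float]:
    """Mirror a view direction about a unit-length surface normal: r = d - 2(d.n)n."""
    d_dot_n = sum(d * n for d, n in zip(incident, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(incident, normal))


def cube_lookup(direction: tuple[float, float, float]) -> tuple[str, float, float]:
    """Pick the cube-map face a direction points into, plus (u, v) on that face.

    The face is the axis with the largest magnitude. The (u, v) convention is a
    simplified one for illustration only.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, u, v, major = ("+x" if x > 0 else "-x"), y, z, ax
    elif ay >= az:
        face, u, v, major = ("+y" if y > 0 else "-y"), x, z, ay
    else:
        face, u, v, major = ("+z" if z > 0 else "-z"), x, y, az
    # Project onto the face and map the two minor components from [-1, 1] to [0, 1].
    return face, 0.5 * (u / major + 1.0), 0.5 * (v / major + 1.0)


# A view ray heading forward and down hits a floor whose normal points straight up.
bounce = reflect((0.0, -0.5, 1.0), (0.0, 1.0, 0.0))
print(bounce, cube_lookup(bounce))
```

Here the ray reflects to (0, 0.5, 1) and lands on the +z face at texture coordinates (0.5, 0.75), and it would do so no matter where on the floor it struck.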

If you don’t already know how this is done, take a look at the marble floor in this next screenshot before reading on, and think about what is different between the two.

Two young men sit across from each other at an ornate stone table. The room is filled with otherworldly light and a polished marble floor.

Notice how the reflection blurs significantly in this screenshot? The blur is stylish, but it also hides one of the biggest shortcomings of the effect being used here. All modern video cards use a concept called a frame buffer. Before we push a frame out to the screen, we store all of our pixel data in a buffer so that we can run post-processing filters on it: calculations that work on the finished 2D image and don’t require a full 3D calculation to pull off. Alongside each pixel’s color, we can also store its depth (its z-value), which tells us roughly how near to or far from the camera that pixel is compared with those around it. This is known as the z-buffer, or depth buffer.

Using that z-buffer data, we can calculate which pixels in our frame buffer a ray from the viewer’s position would encounter after bouncing off a surface marked as reflective in the game engine. This lets the engine produce a relatively complex real-time reflection effect at a fraction of the cost of other real-time methods, because it works entirely from data our 3D render has already generated, and at whatever resolution that initial render used. Other methods involve rendering the scene twice: once from the viewer’s perspective, and once more from the perspective of the reflective surface, as the viewer would see it. This produces more accurate results, but it usually costs at least as much as rendering the scene the first time, or it is done at a much lower resolution to limit that cost.

From that description, you can probably already tell that this method has some serious limitations. The largest by far is that because we are working from a 2D representation of 3D data, we have no idea what the wall behind that table looks like. It’s not in our frame buffer, so we can only guess. To get around this, the artists at Level-5 chose to apply a strong directional blur. This adds to the almost ethereal feeling of this scene and those like it, but it also hides the fact that the table appears in the reflection where, realistically, it shouldn’t.
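Here is a rough Python sketch of the ray march at the heart of this screen-space approach. It assumes the reflected ray has already been projected into screen space as a starting pixel, a depth, and a fixed per-step increment, and that larger depth means farther from the camera; `trace_ssr`, `thickness`, and `max_steps` are hypothetical names and tuning values, not Level-5’s implementation. When the ray leaves the screen, or slips behind something much nearer (such as the table in front of the wall), the buffer has no answer and the sketch returns None. That is exactly where an engine has to guess, and where a blur helps hide the guess.

```python
from __future__ import annotations


def trace_ssr(depth: list[list[float]], color: list[list[object]],
              start: tuple[float, float, float], step: tuple[float, float, float],
              max_steps: int = 64, thickness: float = 0.5) -> object | None:
    """March a reflected ray through the depth buffer and return the color it hits.

    depth, color: 2D lists indexed [y][x] (the z-buffer and the rendered frame).
    start: (x, y, z) screen-space position of the reflective pixel.
    step: (dx, dy, dz) per-iteration increment of the reflected ray in screen space.
    Returns None when the frame buffer holds no data for the reflection.
    """
    height, width = len(depth), len(depth[0])
    x, y, z = start
    for _ in range(max_steps):
        x, y, z = x + step[0], y + step[1], z + step[2]
        px, py = int(round(x)), int(round(y))
        if not (0 <= px < width and 0 <= py < height):
            return None  # the ray left the screen: nothing was rendered there
        scene_z = depth[py][px]
        if z >= scene_z:  # the ray has reached or passed the stored surface
            if z - scene_z <= thickness:
                return color[py][px]  # close enough to count as a hit
            return None  # it is hidden behind a nearer object we never saw past
    return None  # ran out of steps without hitting anything
```

Production versions typically take larger steps, refine the hit, and fade the result out near the screen edges, but the core comparison against the depth buffer is the same.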

This method is called screen space reflections: it lets the artists use low-cost, medium-accuracy dynamic reflective surfaces, a detail that would certainly appear in our Studio Ghibli style. We then compound that extra level of detail with a texturing technique called bump mapping. This is an older technology that has been in common use in real-time rendering engines for at least a decade. By storing a height map (a grayscale texture of surface depth, sometimes packed into a texture’s alpha channel), we can tilt the surface normals pixel by pixel and quickly calculate how a directional light would brighten and darken the bumps and grooves on an object. This allows high detail on objects without adding any polygons to the models. It also continues a theme: computers are relatively slow at rendering actual 3D data, but they can calculate 2D approximations of that data very quickly.
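A minimal Python sketch of the bump-mapping idea follows. It assumes a height map stored as a 2D list of floats, a flat surface whose unperturbed normal points along +z, a light direction given as a vector pointing toward the light, and a simple Lambert (diffuse) lighting model; `bumped_normal` and `shade` are hypothetical names. The slope of the height map tilts the normal, and the tilted normal changes how brightly the light hits each pixel, with no extra polygons involved.

```python
import math


def normalize(v: tuple[float, float, float]) -> tuple[float, float, float]:
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def bumped_normal(height: list[list[float]], x: int, y: int,
                  strength: float = 1.0) -> tuple[float, float, float]:
    """Tilt a flat +z normal using the height map's slope (central differences)."""
    rows, cols = len(height), len(height[0])

    def at(i: int, j: int) -> float:
        # Clamp lookups so pixels on the texture border still get a slope.
        return height[min(max(j, 0), rows - 1)][min(max(i, 0), cols - 1)]

    dh_dx = (at(x + 1, y) - at(x - 1, y)) / 2.0
    dh_dy = (at(x, y + 1) - at(x, y - 1)) / 2.0
    return normalize((-strength * dh_dx, -strength * dh_dy, 1.0))


def shade(height: list[list[float]], x: int, y: int,
          light_dir: tuple[float, float, float], strength: float = 1.0) -> float:
    """Lambert diffuse term: brightness falls off as the normal turns from the light."""
    n = bumped_normal(height, x, y, strength)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))


ramp = [[float(x) for x in range(5)] for _ in range(5)]  # rises toward +x
print(shade(ramp, 2, 2, (-1.0, 0.0, 1.0)), shade(ramp, 2, 2, (1.0, 0.0, 1.0)))
```

On this ramp, light from the upper left strikes the slope head-on (a value of about 1), while light from the upper right only grazes it (about 0), even though the underlying geometry is perfectly flat.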

At this point, we should also acknowledge the texture art in the background. To achieve that painterly effect, Level-5 took the obvious course of action and had their texture artists work in a painterly style. To our bump-mapped textures and screen space reflections, we then add screen space ambient occlusion (SSAO), a technique that reuses the same depth data as the reflections to estimate where surfaces are hemmed in by nearby geometry, then darkens those creases and edges a little to make them more defined. You can see this at the bottom of the door and in its corners: notice how the texture seems to darken as one edge approaches the other? Next, let’s get some real-time shadows going, probably a little sharper than we would use in a more realistic game, and add a material map (for example, a specular or metalness map) to the silver goblet so that it clearly reads as metallic.
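Real SSAO implementations typically sample points in a sphere or hemisphere around each pixel’s reconstructed 3D position. The Python sketch below is a deliberately crude stand-in for that idea, assuming only a depth buffer stored as a 2D list (larger values farther away): it counts how many nearby pixels sit noticeably closer to the camera, within a cutoff range, and darkens the pixel accordingly. The names `ssao`, `radius`, `bias`, and `max_range` are hypothetical.

```python
def ssao(depth: list[list[float]], radius: int = 1, bias: float = 0.05,
         max_range: float = 2.0) -> list[list[float]]:
    """Return a per-pixel ambient term in [0, 1], where 1.0 means unoccluded."""
    height, width = len(depth), len(depth[0])
    result = [[1.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = depth[y][x]
            occluders = samples = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    nx, ny = x + dx, y + dy
                    if (dx, dy) == (0, 0) or not (0 <= nx < width and 0 <= ny < height):
                        continue
                    samples += 1
                    # Positive diff: the neighbor is nearer the camera than this pixel.
                    diff = center - depth[ny][nx]
                    # Ignore tiny differences (bias) and far-away foreground (max_range).
                    if bias < diff < max_range:
                        occluders += 1
            if samples:
                result[y][x] = 1.0 - occluders / samples
    return result


# Floor at depth 5 with a nearer ledge at depth 4 along the bottom row.
for row in ssao([[5.0, 5.0, 5.0], [5.0, 5.0, 5.0], [4.0, 4.0, 4.0]]):
    print(row)
```

The center pixel sees 3 of its 8 neighbors as occluders and gets an ambient term of 0.625, the pixels beside it darken slightly, and the open floor along the top row stays at 1.0: darkening gathers where one surface meets another, as at the foot of the door.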

With all that done, we have a very solid background, but now we need to shift gears entirely to put our character together. The first thing we need is cel shading. The technique is named after cels, short for celluloid: the transparent sheets that character animation is painted on so that it can be laid over a background and photographed. To achieve the look, we need to throw away a great deal of our lighting detail. Cel shading is relatively simple from this standpoint. If we are already calculating self-shadowing and dynamic lighting, we can establish some light-level thresholds and, instead of blending smoothly between levels, snap each pixel to one of a few steps. Say we have 12 possible light levels. Levels 1-7 are full light, with no darkness applied; levels 8-9 are a half level, exactly halfway between full light and full shadow; and levels 10-12 are the darkest our shadows will ever get. This creates highly visible banding around the contours of our model and starts to give us that animation-cel look.

Next, we need to find the edges of our 3D object: pixels where the object borders the background, or where one part of the object overlaps another part farther behind it. In those positions, we draw a thin outline. Depending on the implementation, this can be done from the frame buffer if you want simpler, less accurate outlines, but it looks as if Level-5 have done the edge calculations during the rendering process to get the more accurate inner lines, such as prominent cheekbones or the folds of the character’s shirt in our main screenshot.
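The light-level thresholds and the outline pass described above can be sketched in Python as follows. Assumptions: the smooth lighting value arrives as a number from 0 to 1 and is split evenly into 12 levels (1 brightest, 12 darkest), the darkest shadow value of 0.35 is an arbitrary placeholder, and outlines are found from a depth buffer in which None marks background. That is the simpler frame-buffer approach, not the in-render edge calculation Level-5 appear to use. All names are hypothetical.

```python
from __future__ import annotations

FULL_LIGHT = 1.0
FULL_SHADOW = 0.35  # hypothetical placeholder: the darkest our shadows ever get
HALF_LIGHT = (FULL_LIGHT + FULL_SHADOW) / 2  # exactly halfway between the two


def light_level(intensity: float, levels: int = 12) -> int:
    """Map smooth lighting in [0, 1] to a level from 1 (brightest) to `levels` (darkest)."""
    intensity = min(max(intensity, 0.0), 1.0)
    return min(levels, 1 + int((1.0 - intensity) * levels))


def cel_band(level: int) -> float:
    """Collapse the 12 light levels into three flat bands instead of a smooth blend."""
    if level <= 7:
        return FULL_LIGHT  # levels 1-7: no darkness applied
    if level <= 9:
        return HALF_LIGHT  # levels 8-9: the half level
    return FULL_SHADOW  # levels 10-12: full shadow


def outline_mask(depth: list[list[float | None]], threshold: float = 0.5) -> list[list[bool]]:
    """Mark object pixels that touch the background (None) or sit on a large depth jump."""
    height, width = len(depth), len(depth[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = depth[y][x]
            if here is None:
                continue  # background pixels never get an outline
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < height:
                    there = depth[ny][nx]
                    if there is None or abs(there - here) > threshold:
                        mask[y][x] = True
                        break
    return mask
```

With these thresholds, a lighting value of 0.5 still falls in level 7 and stays fully lit, 0.4 drops to level 8 and gets the half tone, and 0.25 lands in level 10, full shadow. The hard jumps between those three values are what produce the banding.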

If you take nothing else from this article, just realize that a lot of very clever math went into making something as visually appealing as the screenshot above, and the craft on display is even more beautiful in motion. Everything I’ve described here covers only a few of the effects and technologies behind this game. There are sections of the game that are not as stylistically appealing, but for those who like JRPGs, I highly recommend it.

Here’s a video so you can see it in motion for yourself: