How To Process Media Data with the Metal Framework [Tutorial]
- May 17, 2018
Processing visual data like images and videos is an integral part of mobile development. We recently searched for a tool to do this in the simplest and most efficient manner. Now we want to share our experience and introduce our favorite ‒ Metal. What is the Apple Metal framework and how does it work? What benefits does it offer you? Let’s look under the hood of Metal.
What is Metal?
Metal is a framework that provides an API for rendering graphics and performing data-parallel computations on the graphics processing unit (GPU). The GPU is a specialized processor that draws graphics on the user's device, which reduces the load on the central processing unit (CPU).
Metal gives you hardware acceleration, so you can offload tasks that are too computationally heavy for conventional CPU-based solutions, such as cryptography, machine learning, or even VR. Unlike third-party APIs, the Metal API is optimized specifically for Apple devices running iOS and macOS.
Why do I need to use Metal?
With Metal, you write shaders that are compiled when you build your application and then used at runtime as needed. Shaders are written in a dedicated language, the Metal Shading Language, which is based on the C++14 specification (aka the ISO/IEC JTC1/SC22/WG21 N4431 language specification).
Metal also handles memory and resource management. The main types here are MTLBuffer and MTLTexture, which determine how data is represented in GPU memory. You can also use ready-made classes from Metal Performance Shaders to add effects to images, such as a blur filter or a threshold filter.
Because Metal is a low-level API, you can control GPU work directly instead of going through high-level abstractions, which can speed up program execution.
Here are a few more benefits of Metal for iOS devices:
- Lowest-overhead access to the GPU, reducing the bottlenecks usually caused by data transfer between the CPU and GPU in other frameworks
- Up to 10 times the number of draw calls compared to OpenGL
- Allows you to run compute applications with performance levels similar to technologies such as CUDA and OpenCL
- Built-in memory and resource management
Among the main competitors of Metal are APIs from other vendors that solve similar tasks, such as Vulkan, OpenGL, and DirectX. The significant disadvantage of Metal is that it's confined to Apple platforms. You can work around this with additional libraries such as MoltenVK, which implements the cross-platform Vulkan API on top of Metal, so apps written against Vulkan can still run on Apple devices.
Basic concepts of image processing
Let’s look through the main classes we’ll be using in our tutorial:
- MTLDevice is a protocol that defines the interface to a GPU. You use it to query the GPU's features on a specific device and to allocate Metal objects for your app.
- MTLLibrary is an object that contains compiled Metal shader functions, compiled either from your .metal files when you build the application or from a source string at runtime.
- MTLCommandQueue is a protocol that defines an object storing the ordered list of command buffers to execute.
- MTKTextureLoader is a helper class that allows you to load your textures into the application. It supports the following file types: PNG, JPEG, TIFF, KTX, and PVR (including advanced options).
- MTKView is a view class that simplifies drawing Metal content in your application.
In order to store the entire configuration in one place, we’ll create a context class:
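A minimal sketch of such a context class, assuming it only needs to hold the long-lived objects listed above. The name `MetalContext` and the `loadTexture(from:)` helper are hypothetical; the Metal and MetalKit calls themselves are standard API.

```swift
import Foundation
import Metal
import MetalKit

/// Hypothetical context that keeps the long-lived Metal objects in one place.
final class MetalContext {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    /// Shaders compiled from the app's .metal files; nil if the bundle has none.
    let library: MTLLibrary?
    let textureLoader: MTKTextureLoader

    /// Fails (returns nil) when no Metal-capable GPU is available.
    init?(device: MTLDevice? = MTLCreateSystemDefaultDevice()) {
        guard let device = device,
              let queue = device.makeCommandQueue() else { return nil }
        self.device = device
        self.commandQueue = queue
        self.library = device.makeDefaultLibrary()
        self.textureLoader = MTKTextureLoader(device: device)
    }

    /// Loads an image file (PNG, JPEG, ...) into a texture that shaders can read.
    func loadTexture(from url: URL) throws -> MTLTexture {
        let options: [MTKTextureLoader.Option: Any] = [
            // Load as a non-sRGB pixel format, so reads return the stored values as-is.
            .SRGB: false,
            .textureUsage: NSNumber(value: MTLTextureUsage.shaderRead.rawValue)
        ]
        return try textureLoader.newTexture(URL: url, options: options)
    }
}
```

Creating the device, queue, and texture loader once and passing the context around avoids re-creating these relatively expensive objects for every image.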
Now we need to create a file with the .metal extension containing a simple shader. Note the syntax of the code and the name of the function: this isn't Swift but Metal Shading Language code:
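As a sketch, here is a pass-through kernel that copies each pixel unchanged. To keep all examples in Swift, the Metal Shading Language source is embedded in a Swift string and compiled at runtime with `makeLibrary(source:options:)`; in an app you would normally put the same code in a .metal file so Xcode compiles it at build time. The kernel name `compute_shader` follows the article, while the texture indices 0 and 1 and the helper name `makePassThroughLibrary` are our assumptions.

```swift
import Metal

// Metal Shading Language source for a pass-through kernel. The kernel name
// "compute_shader" follows the article; texture indices 0 and 1 are assumptions.
let passThroughShaderSource = """
#include <metal_stdlib>
using namespace metal;

kernel void compute_shader(texture2d<float, access::read>  input  [[texture(0)]],
                           texture2d<float, access::write> output [[texture(1)]],
                           uint2 gid [[thread_position_in_grid]])
{
    // The dispatch grid may be rounded up, so ignore threads outside the texture.
    if (gid.x >= output.get_width() || gid.y >= output.get_height()) {
        return;
    }
    // Copy the pixel unchanged.
    output.write(input.read(gid), gid);
}
"""

/// Compiles the kernel at runtime (an alternative to a .metal file compiled at build time).
func makePassThroughLibrary(device: MTLDevice) throws -> MTLLibrary {
    return try device.makeLibrary(source: passThroughShaderSource, options: nil)
}
```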
Next, we’ll load the shader and try to process the texture:
First, pay special attention to the name "compute_shader": the value of this parameter must exactly match the name of the kernel function you've implemented in the shader file. Now let's move on to the settings for image rendering.
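A minimal sketch of that lookup, assuming you already have an `MTLLibrary` containing the kernel. `ShaderError` and `makeComputePipeline` are hypothetical names; `makeFunction(name:)` and `makeComputePipelineState(function:)` are standard Metal API. Because `makeFunction(name:)` simply returns nil for an unknown name, a mismatch only shows up at runtime, so the sketch turns it into an explicit error.

```swift
import Metal

/// Hypothetical error for a kernel name that isn't in the library.
enum ShaderError: Error {
    case functionNotFound(String)
}

/// Looks up a kernel by name and builds a compute pipeline state for it.
/// `name` must exactly match the kernel function's name in the shader source.
func makeComputePipeline(device: MTLDevice,
                         library: MTLLibrary,
                         functionName name: String = "compute_shader") throws -> MTLComputePipelineState {
    // makeFunction(name:) returns nil for an unknown name; typos only surface here.
    guard let function = library.makeFunction(name: name) else {
        throw ShaderError.functionNotFound(name)
    }
    // Building a pipeline is relatively expensive: create it once and reuse it.
    return try device.makeComputePipelineState(function: function)
}
```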
The makeBuffer(bytes:length:options:) method creates an MTLBuffer object by copying data from an existing memory allocation into a new allocation:
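A small sketch of that call for an array of `Float` values. The helper name `makeFloatBuffer` is hypothetical, and `.storageModeShared` is our choice of resource option; `makeBuffer(bytes:length:options:)` itself is the standard MTLDevice method.

```swift
import Metal

/// Copies `values` into a new buffer the GPU can read.
/// Returns nil for an empty array or if the allocation fails.
func makeFloatBuffer(device: MTLDevice, values: [Float]) -> MTLBuffer? {
    guard !values.isEmpty else { return nil }
    let length = values.count * MemoryLayout<Float>.stride
    // .storageModeShared: CPU and GPU share the memory, so no explicit sync is needed.
    return values.withUnsafeBytes { raw in
        device.makeBuffer(bytes: raw.baseAddress!, length: length, options: .storageModeShared)
    }
}
```

Because the data is copied, later changes to the Swift array don't affect the buffer; write through `buffer.contents()` if the GPU should see updated values.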
We need to consider four more terms:
- MTLCommandBuffer ‒ An object that stores encoded commands for execution on the GPU.
- MTLRenderPipelineState ‒ An object that stores the graphics functions and configuration state used for rendering.
- MTLBuffer ‒ A memory allocation that stores unformatted data accessible to the GPU.
- MTLComputeCommandEncoder ‒ An encoder that sets the data-parallel compute state and resources and dispatches compute functions.
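The following sketch ties these objects together for the compute case: it takes a compute pipeline (an `MTLComputePipelineState`, the compute counterpart of the render pipeline state), encodes one dispatch with an `MTLComputeCommandEncoder`, and commits the `MTLCommandBuffer`. The function name `runKernel` and the texture indices 0 and 1 are assumptions; the threadgroup sizing follows a common pattern based on the pipeline's `threadExecutionWidth`.

```swift
import Metal

/// Dispatches `pipeline` once per pixel of `output`, with `input` bound at
/// texture index 0 and `output` at index 1, then waits for the GPU.
/// Returns true if the command buffer completed without error.
func runKernel(pipeline: MTLComputePipelineState,
               queue: MTLCommandQueue,
               input: MTLTexture,
               output: MTLTexture) -> Bool {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return false }

    encoder.setComputePipelineState(pipeline)
    encoder.setTexture(input, index: 0)
    encoder.setTexture(output, index: 1)

    // Threadgroup shape derived from the pipeline's hardware limits.
    let w = pipeline.threadExecutionWidth
    let h = pipeline.maxTotalThreadsPerThreadgroup / w
    let threadsPerThreadgroup = MTLSize(width: w, height: h, depth: 1)
    // Round up so every pixel is covered; the kernel must skip out-of-range threads.
    let threadgroups = MTLSize(width: (output.width + w - 1) / w,
                               height: (output.height + h - 1) / h,
                               depth: 1)
    encoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerThreadgroup)
    encoder.endEncoding()

    commandBuffer.commit()
    // Blocking keeps the example simple; avoid this in a per-frame render loop.
    commandBuffer.waitUntilCompleted()
    return commandBuffer.status == .completed
}
```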
Our shader runs once for every pixel of the texture. In fact, a shader is just a small program that the GPU executes as part of its pipeline. The above shader is extremely simple: it just copies each pixel of the original texture into a new texture without changes.
Let’s slightly complicate the example and try to pass parameters to the shader code.
Here we create a simple array containing a single parameter.
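One way to pass such an array to the GPU, sketched here under the assumption that the kernel reads the value from buffer index 0, is `setBytes(_:length:index:)`. It copies small amounts of data (Apple recommends it for less than 4 KB) directly into the command encoder instead of allocating an MTLBuffer with makeBuffer. The helper name `setTintParameter` and the "red tint" meaning of the value are hypothetical.

```swift
import Metal

/// Passes a one-element parameter array to the kernel at buffer index 0.
/// Hypothetical helper: the index and the meaning of the value (a red tint)
/// are assumptions. Returns the number of bytes encoded.
@discardableResult
func setTintParameter(_ red: Float, on encoder: MTLComputeCommandEncoder) -> Int {
    let params: [Float] = [red]
    // setBytes copies small data into the command stream,
    // so no separate MTLBuffer is needed.
    return params.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Int in
        encoder.setBytes(raw.baseAddress!, length: raw.count, index: 0)
        return raw.count
    }
}
```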
Now we need to change the shader code a bit:
Note the new data parameter in the function signature. Through it, the shader receives the value we set above and can use it when computing the color it passes to the output.write function. This way, the original image gets a reddish shade.
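A sketch of what such a modified kernel might look like, assuming the parameter arrives as a `constant float*` at buffer index 0 and is added to the red channel. The exact arithmetic is our assumption, as is the helper name `makeTintFunction`. The source is kept in a Swift string so it can be compiled at runtime with `makeLibrary(source:options:)`.

```swift
import Metal

// Hypothetical tint kernel: the buffer index and the red-channel arithmetic
// are assumptions made for this sketch.
let tintShaderSource = """
#include <metal_stdlib>
using namespace metal;

kernel void compute_shader(texture2d<float, access::read>  input  [[texture(0)]],
                           texture2d<float, access::write> output [[texture(1)]],
                           constant float *data               [[buffer(0)]],
                           uint2 gid                          [[thread_position_in_grid]])
{
    if (gid.x >= output.get_width() || gid.y >= output.get_height()) {
        return;
    }
    float4 color = input.read(gid);
    // data[0] comes from the Swift side; adding it to red tints the image.
    color.r = saturate(color.r + data[0]);
    output.write(color, gid);
}
"""

/// Compiles the tint kernel at runtime and returns its function, if found.
func makeTintFunction(device: MTLDevice) throws -> MTLFunction? {
    let library = try device.makeLibrary(source: tintShaderSource, options: nil)
    return library.makeFunction(name: "compute_shader")
}
```

`saturate` clamps the result to the 0...1 range, so a large parameter value saturates the red channel instead of overflowing the texture format.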
Unary image kernels
A unary image kernel works on a single input texture to produce a single output texture. Metal Performance Shaders (MPS) provides several categories of unary image operations:
- Convolutional operations (Box, Tent, GaussianBlur, Sobel, Convolution)
- Threshold (ThresholdBinary, ThresholdBinaryInverse, ThresholdToZero, ThresholdToZeroInverse, ThresholdTruncate)
- Resampling (LanczosScale)
- Morphological operations (Erode, Dilate)
- Sliding-neighborhood operations (AreaMax, AreaMin, Median, Integral, IntegralOfSquares, Threshold)
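To make the list concrete, here is a sketch that creates one kernel from several of these categories. All classes and initializers come from the Metal Performance Shaders framework; the function name `makeSampleKernels` and the parameter values are our own arbitrary choices.

```swift
import Metal
import MetalPerformanceShaders

/// Creates one example kernel from several of the categories above.
/// The parameter values are arbitrary illustrations.
func makeSampleKernels(device: MTLDevice) -> [MPSUnaryImageKernel] {
    // Not every GPU supports MPS, so check before creating kernels.
    guard MPSSupportsMTLDevice(device) else { return [] }
    return [
        // Convolution: Sobel edge detection
        MPSImageSobel(device: device),
        // Threshold: pixels whose luminance exceeds 0.5 become 1.0, all others 0.0
        MPSImageThresholdBinary(device: device,
                                thresholdValue: 0.5,
                                maximumValue: 1.0,
                                linearGrayColorTransform: nil),
        // Resampling: Lanczos scaling to the destination texture's size
        MPSImageLanczosScale(device: device),
        // Sliding neighborhood: 3x3 median and 3x3 maximum
        MPSImageMedian(device: device, kernelDiameter: 3),
        MPSImageAreaMax(device: device, kernelWidth: 3, kernelHeight: 3)
    ]
}
```

Since every kernel shares the MPSUnaryImageKernel base class, they can all be encoded the same way.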
To instantiate an MPS kernel, you need a Metal device and a few kernel-specific parameters that determine how the kernel behaves. For example, the MPSImageGaussianBlur kernel takes a sigma parameter that determines its blur radius:
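A minimal sketch of creating such a blur. `MPSImageGaussianBlur(device:sigma:)` and the `edgeMode` property are part of MPS; the helper name `makeGaussianBlur` and the choice of `.clamp` edge handling are our assumptions.

```swift
import Metal
import MetalPerformanceShaders

/// Creates a Gaussian blur kernel. Larger sigma values produce a stronger blur
/// (the filter samples a wider neighborhood around each pixel).
func makeGaussianBlur(device: MTLDevice, sigma: Float) -> MPSImageGaussianBlur {
    let blur = MPSImageGaussianBlur(device: device, sigma: sigma)
    // Pixels outside the image edge are treated as copies of the nearest edge pixel.
    blur.edgeMode = .clamp
    return blur
}
```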
Most kernel parameters, such as sigma, can't be changed after initialization. Therefore, if you want to use different parameter values, you have to create multiple filter instances. Even though kernels are fairly lightweight objects, you will still notice a performance hit if you create new instances every frame, so reuse kernels as often as possible.
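A minimal sketch of that reuse advice, assuming blur kernels keyed by their sigma value. `BlurKernelCache` is a hypothetical helper, not part of MPS, and it isn't thread-safe; guard it with a lock if several threads encode work.

```swift
import Metal
import MetalPerformanceShaders

/// Hypothetical cache: one MPSImageGaussianBlur per sigma value,
/// created on first use and reused on later frames.
final class BlurKernelCache {
    private let device: MTLDevice
    private var kernels: [Float: MPSImageGaussianBlur] = [:]

    init(device: MTLDevice) {
        self.device = device
    }

    func kernel(sigma: Float) -> MPSImageGaussianBlur {
        if let cached = kernels[sigma] {
            return cached
        }
        let kernel = MPSImageGaussianBlur(device: device, sigma: sigma)
        kernels[sigma] = kernel
        return kernel
    }
}
```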
Encoding kernels to a command buffer
For a kernel to do its work, you must encode it into a command buffer. Kernels operate on Metal textures: they read from a source texture and write to a destination texture. Encoding a kernel into a command buffer and committing that buffer is all it takes to get the desired result:
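A sketch of the encoding step. `applyKernel` is a hypothetical helper that accepts any MPSUnaryImageKernel, so the same function works for blur, Sobel, or threshold kernels; `encode(commandBuffer:sourceTexture:destinationTexture:)` is the MPS method that records the work, which starts only after the command buffer is committed.

```swift
import Metal
import MetalPerformanceShaders

/// Encodes any MPS unary kernel into a fresh command buffer, submits it,
/// and waits for the result. The destination texture typically needs
/// .shaderWrite usage. Returns true if the GPU work completed.
func applyKernel(_ kernel: MPSUnaryImageKernel,
                 source: MTLTexture,
                 destination: MTLTexture,
                 queue: MTLCommandQueue) -> Bool {
    guard let commandBuffer = queue.makeCommandBuffer() else { return false }
    // Records "read source, write destination"; nothing runs until commit().
    kernel.encode(commandBuffer: commandBuffer,
                  sourceTexture: source,
                  destinationTexture: destination)
    commandBuffer.commit()
    // A blocking wait keeps the sketch simple; apps usually use addCompletedHandler(_:).
    commandBuffer.waitUntilCompleted()
    return commandBuffer.status == .completed
}
```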
We recently created MediaWatermark, an open-source GPU/CPU-based iOS watermark library, and now we want to update it to support the Apple Metal API. For more details about MediaWatermark, check out our article MediaWatermark: GPU/CPU-based iOS Watermark Library.
MediaWatermark library update
First, we gave the MediaItem class a new instance method that takes an object of the MediaFilter type as its parameter. We also added five new filters:
- Color filter
- Sepia filter
- Blur filter
- Threshold filter
- Sobel filter
Color filter (red parameter with value of 1)
Gaussian blur (sigma = 45)
Here is an example of how to use the new filters:
If your app depends on the latest and greatest in media data processing on iOS devices, Metal is the right choice for you. Metal frees the CPU to deal with things requiring complicated computations like physics engines, audio processing, and artificial intelligence, helping you build faster and more efficient apps.
Check out our MediaWatermark repository to see all the capabilities of Metal.