Introduction

Vulkan is known for being explicit and verbose. But the required verbosity has steadily decreased with each successive version, as new features are added and previous extensions are absorbed into the core API. Similarly, RAII has been a pillar of C++ since its inception, yet most existing Vulkan tutorials do not utilize it, instead choosing to "extend" the explicitness by manually cleaning up resources.

To fill that gap, this guide has the following goals:

  • Leverage modern C++, VulkanHPP, and Vulkan 1.3 features
  • Focus on keeping it simple and straightforward, not on performance
  • Develop a basic but dynamic rendering foundation

To reiterate, the focus is not on performance; it is on a quick introduction to the current standard multi-platform graphics API while utilizing modern paradigms and tools (at the time of writing). Even disregarding potential performance gains, Vulkan has a better and more modern design and ecosystem than OpenGL, eg: there is no global state machine; parameters are passed by filling structs with meaningful member variable names; multi-threading is largely trivial (yes, it is actually easier to do in Vulkan than in OpenGL); there is a comprehensive set of validation layers to catch misuse, which can be enabled without any changes to application code; etc.

For an in-depth Vulkan guide, the official tutorial is recommended. vkguide and the original Vulkan Tutorial are also very popular and intensely detailed.

Target Audience

The guide is for you if you:

  • Understand the principles of modern C++ and its usage
  • Have created C++ projects using third-party libraries
  • Are somewhat familiar with graphics
    • Having done OpenGL tutorials would be ideal
    • Experience with frameworks like SFML / SDL is great
  • Don't mind if all the information you need isn't monolithically in one place (ie, this guide)

Some examples of what this guide does not focus on:

  • GPU-driven rendering
  • Real-time graphics from the ground up
  • Considerations for tiled GPUs (eg mobile devices / Android)

Source

The source code for the project (as well as this guide) is located in this repository. Each section/* branch is intended to reflect the state of the code at the end of that particular section of the guide. Bugfixes / changes are generally backported, but there may be some divergence from the current state of the code (ie, in main). The source of the guide itself is only up-to-date on main; changes are not backported.

Getting Started

Vulkan is platform agnostic, which is one of the main reasons for its verbosity: it has to account for a wide range of implementations in its API. We shall be constraining our approach to Windows and Linux (x64 or aarch64), and focusing on discrete GPUs, enabling us to sidestep quite a bit of that verbosity. Vulkan 1.3 is widely supported by the target desktop platforms and reasonably recent graphics cards.

This doesn't mean that eg an integrated graphics chip will not be supported; it just won't be specifically designed or optimized for.

Technical Requirements

  1. Vulkan 1.3+ capable GPU and loader
  2. Vulkan 1.3+ SDK
    1. This is required for validation layers, a critical component/tool to use when developing Vulkan applications. The project itself does not use the SDK.
    2. Always using the latest SDK is recommended (1.4.x at the time of writing).
  3. Desktop operating system that natively supports Vulkan
    1. Windows and/or Linux (distros that use repos with recent packages) is recommended.
    2. macOS does not natively support Vulkan. It can be used through MoltenVK, but at the time of writing MoltenVK does not fully support Vulkan 1.3, so if you decide to take this route, you may face some roadblocks.
  4. C++23 compiler and standard library
    1. GCC 14+, Clang 18+, and/or the latest MSVC are recommended. MinGW/MSYS is not recommended.
    2. Using C++20 with replacements for C++23-specific features is possible, eg replace std::print() with fmt::print(), add () to lambdas, etc.
  5. CMake 3.24+

Overview

While support for C++ modules is steadily growing, tooling is not yet ready on all the platforms/IDEs we want to target, so we will unfortunately still be using headers. This might change in the near future, followed by a refactor of this guide.

The project uses a "Build the World" approach, enabling the use of sanitizers and reproducible builds on any supported platform, while requiring a minimum of pre-installed tools on target machines. Feel free to use pre-built binaries instead; it doesn't change anything about how you would use Vulkan.

Dependencies

  1. GLFW for windowing, input, and Surface creation
  2. VulkanHPP (via Vulkan-Headers) for interacting with Vulkan
    1. While Vulkan is a C API, it offers an official C++ wrapper library with many quality-of-life features. This guide almost exclusively uses that, except at the boundaries of other C libraries that themselves use the C API (eg GLFW and VMA).
  3. Vulkan Memory Allocator for dealing with Vulkan memory heaps
  4. GLM for GLSL-like linear algebra in C++
  5. Dear ImGui for UI

Project Layout

This page describes the layout used by the code in this guide. Everything here is just an opinionated option used by the guide, and is not related to Vulkan usage.

External dependencies are stuffed into a zip file that's decompressed by CMake during the configure stage. Using FetchContent is a viable alternative.

Ninja Multi-Config is the assumed generator, regardless of OS/compiler. This is set up in a CMakePresets.json file in the project root. Additional custom presets can be added via CMakeUserPresets.json.

On Windows, Visual Studio CMake Mode uses this generator and automatically loads presets. With Visual Studio Code, the CMake Tools extension automatically uses presets. For other IDEs, refer to their documentation on using CMake presets.
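
For reference, a minimal CMakePresets.json along these lines might look like the following (an illustrative sketch, not the project's actual file; preset schema version 5 corresponds to CMake 3.24):

```json
{
  "version": 5,
  "configurePresets": [
    {
      "name": "default",
      "generator": "Ninja Multi-Config",
      "binaryDir": "${sourceDir}/out/default"
    }
  ]
}
```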

Filesystem

.
|-- CMakeLists.txt         <== executable target
|-- CMakePresets.json
|-- [other project files]
|-- ext/
|   |-- CMakeLists.txt     <== external dependencies target
|-- src/
    |-- [sources and headers]

Validation Layers

The part of Vulkan that applications interact with directly, the loader, is very powerful and flexible. Read more about it here. Its design enables it to chain API calls through configurable layers, eg for overlays, and, most importantly for us, Validation Layers.

Vulkan Loader

As suggested by the Khronos Group, the guide strongly recommends using Vulkan Configurator (GUI) for validation layers. It is included in the Vulkan SDK; just keep it running while developing Vulkan applications, and ensure it is set up to inject validation layers into all detected applications, with Synchronization Validation enabled. This approach provides a lot of flexibility at runtime, including the ability to have VkConfig break the debugger on encountering an error, and also eliminates the need for validation-layer-specific code in the applications.

Note: modify your development (or desktop) environment's PATH (or use LD_LIBRARY_PATH on supported systems) to make sure the SDK's binaries (shared libraries) are visible first.

Vulkan Configurator

Application

class App will serve as the owner and driver of the entire application. While there will only be one instance, using a class enables us to leverage RAII and destroy all its resources automatically and in the correct order, and avoids the need for globals.

// app.hpp
namespace lvk {
class App {
 public:
  void run();
};
} // namespace lvk

// app.cpp
namespace lvk {
void App::run() {
  // TODO
}
} // namespace lvk

Main

main.cpp will not do much: it's mainly responsible for transferring control to the actual entry point, and catching fatal exceptions.

// main.cpp
#include "app.hpp"

#include <cstdio>
#include <cstdlib>
#include <exception>
#include <print>

auto main() -> int {
  try {
    lvk::App{}.run();
  } catch (std::exception const& e) {
    std::println(stderr, "PANIC: {}", e.what());
    return EXIT_FAILURE;
  } catch (...) {
    std::println(stderr, "PANIC!");
    return EXIT_FAILURE;
  }
}

Initialization

This section deals with initialization of all the systems needed, including:

  • Initializing GLFW and creating a Window
  • Creating a Vulkan Instance
  • Creating a Vulkan Surface
  • Selecting a Vulkan Physical Device
  • Creating a Vulkan logical Device
  • Creating a Vulkan Swapchain

If any step here fails, it is a fatal error as we can't do anything meaningful beyond that point.

GLFW Window

We will use GLFW (3.4) for windowing and related events. The library - like all external dependencies - is configured and added to the build tree in ext/CMakeLists.txt. GLFW_INCLUDE_VULKAN is defined for all consumers, to enable GLFW's Vulkan related functions (known as Window System Integration (WSI)). GLFW 3.4 supports Wayland on Linux, and by default it builds backends for both X11 and Wayland. For this reason it will need the development packages for both platforms (and some other Wayland/CMake dependencies) to configure/build successfully. A particular backend can be requested at runtime if desired via GLFW_PLATFORM.

Although it is quite feasible to have multiple windows in a Vulkan-GLFW application, that is out of scope for this guide. For our purposes GLFW (the library) and a single window are a monolithic unit - initialized and destroyed together. This can be encapsulated in a std::unique_ptr with a custom deleter, especially since GLFW returns an opaque pointer (GLFWwindow*).

// window.hpp
namespace lvk::glfw {
struct Deleter {
  void operator()(GLFWwindow* window) const noexcept;
};

using Window = std::unique_ptr<GLFWwindow, Deleter>;

// Returns a valid Window if successful, else throws.
[[nodiscard]] auto create_window(glm::ivec2 size, char const* title) -> Window;
} // namespace lvk::glfw

// window.cpp
void Deleter::operator()(GLFWwindow* window) const noexcept {
  glfwDestroyWindow(window);
  glfwTerminate();
}

GLFW can create fullscreen and borderless windows, but we will stick to a standard window with decorations. Since we cannot do anything useful if we are unable to create a window, all other branches throw a fatal exception (caught in main).

auto glfw::create_window(glm::ivec2 const size, char const* title) -> Window {
  static auto const on_error = [](int const code, char const* description) {
    std::println(stderr, "[GLFW] Error {}: {}", code, description);
  };
  glfwSetErrorCallback(on_error);
  if (glfwInit() != GLFW_TRUE) {
    throw std::runtime_error{"Failed to initialize GLFW"};
  }
  // check for Vulkan support.
  if (glfwVulkanSupported() != GLFW_TRUE) {
    throw std::runtime_error{"Vulkan not supported"};
  }
  auto ret = Window{};
  // tell GLFW that we don't want an OpenGL context.
  glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
  ret.reset(glfwCreateWindow(size.x, size.y, title, nullptr, nullptr));
  if (!ret) { throw std::runtime_error{"Failed to create GLFW Window"}; }
  return ret;
}

App can now store a glfw::Window and keep polling it in run() until it gets closed by the user. We will not be able to draw anything to the window for a while, but this is the first step in that journey.

Declare it as a private member:

private:
  glfw::Window m_window{};

Add some private member functions to encapsulate each operation:

void create_window();

void main_loop();

Implement them and call them in run():

void App::run() {
  create_window();

  main_loop();
}

void App::create_window() {
  m_window = glfw::create_window({1280, 720}, "Learn Vulkan");
}

void App::main_loop() {
  while (glfwWindowShouldClose(m_window.get()) == GLFW_FALSE) {
    glfwPollEvents();
  }
}

On Wayland you will not even see a window yet: it is only shown after the application presents a framebuffer to it.

Vulkan Instance

Instead of linking to Vulkan (via the SDK) at build-time, we will load Vulkan at runtime. This requires a few adjustments:

  1. In the CMake ext target VK_NO_PROTOTYPES is defined, which turns API function declarations into function pointers
  2. In app.cpp this line is added to the global scope: VULKAN_HPP_DEFAULT_DISPATCH_LOADER_DYNAMIC_STORAGE
  3. Before and during initialization VULKAN_HPP_DEFAULT_DISPATCHER.init() is called

The first thing to do in Vulkan is to create an Instance, which will enable enumeration of physical devices (GPUs) and creation of a logical device.

Since we require Vulkan 1.3, store that in a constant to be easily referenced:

namespace {
constexpr auto vk_version_v = VK_MAKE_VERSION(1, 3, 0);
} // namespace

In App, create a new member function create_instance() and call it after create_window() in run(). After initializing the dispatcher, check that the loader meets the version requirement:

void App::create_instance() {
  // initialize the dispatcher without any arguments.
  VULKAN_HPP_DEFAULT_DISPATCHER.init();
  auto const loader_version = vk::enumerateInstanceVersion();
  if (loader_version < vk_version_v) {
    throw std::runtime_error{"Loader does not support Vulkan 1.3"};
  }
}

We will need the WSI instance extensions, which GLFW conveniently provides for us. Add a helper function in window.hpp/cpp:

auto glfw::instance_extensions() -> std::span<char const* const> {
  auto count = std::uint32_t{};
  auto const* extensions = glfwGetRequiredInstanceExtensions(&count);
  return {extensions, static_cast<std::size_t>(count)};
}

Continuing with instance creation, create a vk::ApplicationInfo object and fill it up:

auto app_info = vk::ApplicationInfo{};
app_info.setPApplicationName("Learn Vulkan").setApiVersion(vk_version_v);

Create a vk::InstanceCreateInfo object and fill it up:

auto instance_ci = vk::InstanceCreateInfo{};
// need WSI instance extensions here (platform-specific Swapchains).
auto const extensions = glfw::instance_extensions();
instance_ci.setPApplicationInfo(&app_info).setPEnabledExtensionNames(
  extensions);

Add a vk::UniqueInstance member after m_window: this must be destroyed before terminating GLFW. Create it, and initialize the dispatcher against it:

glfw::Window m_window{};
vk::UniqueInstance m_instance{};

// ...
// initialize the dispatcher against the created Instance.
m_instance = vk::createInstanceUnique(instance_ci);
VULKAN_HPP_DEFAULT_DISPATCHER.init(*m_instance);

Make sure VkConfig is running with validation layers enabled, and debug/run the app. If "Information" level loader messages are enabled, you should see quite a bit of console output at this point: information about layers being loaded, physical devices and their ICDs being enumerated, etc.

If this line or equivalent is not visible in the logs, re-check your Vulkan Configurator setup and PATH:

INFO | LAYER:      Insert instance layer "VK_LAYER_KHRONOS_validation"

For instance, if libVkLayer_khronos_validation.so / VkLayer_khronos_validation.dll is not visible to the app / loader, you'll see a line similar to:

INFO | LAYER:   Requested layer "VK_LAYER_KHRONOS_validation" failed to load.

Congratulations, you have successfully initialized a Vulkan Instance!

Wayland users: seeing the window is still a long way off, these VkConfig/validation logs are your only feedback for now.

Vulkan Surface

Being platform agnostic, Vulkan interfaces with the WSI via the VK_KHR_surface extension. A Surface enables displaying images on the window through the presentation engine.

Add another helper function in window.hpp/cpp:

auto glfw::create_surface(GLFWwindow* window, vk::Instance const instance)
  -> vk::UniqueSurfaceKHR {
  VkSurfaceKHR ret{};
  auto const result =
    glfwCreateWindowSurface(instance, window, nullptr, &ret);
  if (result != VK_SUCCESS || ret == VkSurfaceKHR{}) {
    throw std::runtime_error{"Failed to create Vulkan Surface"};
  }
  return vk::UniqueSurfaceKHR{ret, instance};
}

Add a vk::UniqueSurfaceKHR member to App after m_instance, and create the surface:

void App::create_surface() {
  m_surface = glfw::create_surface(m_window.get(), *m_instance);
}

Vulkan Physical Device

A Physical Device represents a single complete implementation of Vulkan; for our purposes, a single GPU. (It could also be eg a software renderer like Mesa/lavapipe.) Some machines may have multiple Physical Devices available, like laptops with dual GPUs. We need to select the one we want to use, given our constraints:

  1. Vulkan 1.3 must be supported
  2. Vulkan Swapchains must be supported
  3. A Vulkan Queue that supports Graphics and Transfer operations must be available
  4. It must be able to present to the previously created Vulkan Surface
  5. (Optional) Prefer discrete GPUs

We wrap the actual Physical Device and a few other useful objects into struct Gpu. Since it will be accompanied by a hefty utility function, we put it in its own hpp/cpp files, and move the vk_version_v constant to this new header:

constexpr auto vk_version_v = VK_MAKE_VERSION(1, 3, 0);

struct Gpu {
  vk::PhysicalDevice device{};
  vk::PhysicalDeviceProperties properties{};
  vk::PhysicalDeviceFeatures features{};
  std::uint32_t queue_family{};
};

[[nodiscard]] auto get_suitable_gpu(vk::Instance instance,
                                    vk::SurfaceKHR surface) -> Gpu;

The implementation:

auto lvk::get_suitable_gpu(vk::Instance const instance,
               vk::SurfaceKHR const surface) -> Gpu {
  auto const supports_swapchain = [](Gpu const& gpu) {
    static constexpr std::string_view name_v =
      VK_KHR_SWAPCHAIN_EXTENSION_NAME;
    static constexpr auto is_swapchain =
      [](vk::ExtensionProperties const& properties) {
        return properties.extensionName.data() == name_v;
      };
    auto const properties = gpu.device.enumerateDeviceExtensionProperties();
    auto const it = std::ranges::find_if(properties, is_swapchain);
    return it != properties.end();
  };

  auto const set_queue_family = [](Gpu& out_gpu) {
    static constexpr auto queue_flags_v =
      vk::QueueFlagBits::eGraphics | vk::QueueFlagBits::eTransfer;
    for (auto const [index, family] :
       std::views::enumerate(out_gpu.device.getQueueFamilyProperties())) {
      if ((family.queueFlags & queue_flags_v) == queue_flags_v) {
        out_gpu.queue_family = static_cast<std::uint32_t>(index);
        return true;
      }
    }
    return false;
  };

  auto const can_present = [surface](Gpu const& gpu) {
    return gpu.device.getSurfaceSupportKHR(gpu.queue_family, surface) ==
         vk::True;
  };

  auto fallback = Gpu{};
  for (auto const& device : instance.enumeratePhysicalDevices()) {
    auto gpu = Gpu{.device = device, .properties = device.getProperties()};
    if (gpu.properties.apiVersion < vk_version_v) { continue; }
    if (!supports_swapchain(gpu)) { continue; }
    if (!set_queue_family(gpu)) { continue; }
    if (!can_present(gpu)) { continue; }
    gpu.features = gpu.device.getFeatures();
    if (gpu.properties.deviceType == vk::PhysicalDeviceType::eDiscreteGpu) {
      return gpu;
    }
    // keep iterating in case we find a Discrete Gpu later.
    fallback = gpu;
  }
  if (fallback.device) { return fallback; }

  throw std::runtime_error{"No suitable Vulkan Physical Devices"};
}

Finally, add a Gpu member in App and initialize it after create_surface():

create_surface();
select_gpu();

// ...
void App::select_gpu() {
  m_gpu = get_suitable_gpu(*m_instance, *m_surface);
  std::println("Using GPU: {}",
               std::string_view{m_gpu.properties.deviceName});
}

Vulkan Device

A Vulkan Device is a logical instance of a Physical Device, and will be the primary interface for everything Vulkan from now on. Vulkan Queues are owned by the Device; we will need one from the queue family stored in the Gpu to submit recorded command buffers. We also need to explicitly declare all the features we want to use, eg Dynamic Rendering and Synchronization2.

Setup a vk::QueueCreateInfo object:

auto queue_ci = vk::DeviceQueueCreateInfo{};
// since we use only one queue, it has the entire priority range, ie, 1.0
static constexpr auto queue_priorities_v = std::array{1.0f};
queue_ci.setQueueFamilyIndex(m_gpu.queue_family)
  .setQueueCount(1)
  .setQueuePriorities(queue_priorities_v);

Setup the core device features:

// nice-to-have optional core features, enable if GPU supports them.
auto enabled_features = vk::PhysicalDeviceFeatures{};
enabled_features.fillModeNonSolid = m_gpu.features.fillModeNonSolid;
enabled_features.wideLines = m_gpu.features.wideLines;
enabled_features.samplerAnisotropy = m_gpu.features.samplerAnisotropy;
enabled_features.sampleRateShading = m_gpu.features.sampleRateShading;

Setup the additional features, using setPNext() to chain them:

// extra features that need to be explicitly enabled.
auto sync_feature = vk::PhysicalDeviceSynchronization2Features{vk::True};
auto dynamic_rendering_feature =
  vk::PhysicalDeviceDynamicRenderingFeatures{vk::True};
// sync_feature.pNext => dynamic_rendering_feature,
// and later device_ci.pNext => sync_feature.
// this is 'pNext chaining'.
sync_feature.setPNext(&dynamic_rendering_feature);

Setup a vk::DeviceCreateInfo object:

auto device_ci = vk::DeviceCreateInfo{};
// we only need one device extension: Swapchain.
static constexpr auto extensions_v =
  std::array{VK_KHR_SWAPCHAIN_EXTENSION_NAME};
device_ci.setPEnabledExtensionNames(extensions_v)
  .setQueueCreateInfos(queue_ci)
  .setPEnabledFeatures(&enabled_features)
  .setPNext(&sync_feature);

Declare a vk::UniqueDevice member after m_gpu, create it, and initialize the dispatcher against it:

m_device = m_gpu.device.createDeviceUnique(device_ci);
// initialize the dispatcher against the created Device.
VULKAN_HPP_DEFAULT_DISPATCHER.init(*m_device);

Declare a vk::Queue member (order doesn't matter since it's just a handle, the actual Queue is owned by the Device) and initialize it:

static constexpr std::uint32_t queue_index_v{0};
m_queue = m_device->getQueue(m_gpu.queue_family, queue_index_v);

Scoped Waiter

A useful abstraction to have is an object that, in its destructor, waits/blocks until the Device is idle. It is incorrect usage to destroy Vulkan objects while they are in use by the GPU; such an object helps make sure the device is idle before some dependent resource gets destroyed.

Being able to do arbitrary things on scope exit will be useful in other spots too, so we encapsulate that in a basic class template Scoped. It's somewhat like a unique_ptr<Type, Deleter> that stores the value (Type) instead of a pointer (Type*), with some constraints:

  1. Type must be default constructible
  2. Assumes a default constructed Type is equivalent to null (does not call Deleter)

template <typename Type>
concept Scopeable =
  std::equality_comparable<Type> && std::is_default_constructible_v<Type>;

template <Scopeable Type, typename Deleter>
class Scoped {
 public:
  Scoped(Scoped const&) = delete;
  auto operator=(Scoped const&) = delete;

  Scoped() = default;

  constexpr Scoped(Scoped&& rhs) noexcept
    : m_t(std::exchange(rhs.m_t, Type{})) {}

  constexpr auto operator=(Scoped&& rhs) noexcept -> Scoped& {
    if (&rhs != this) { std::swap(m_t, rhs.m_t); }
    return *this;
  }

  explicit(false) constexpr Scoped(Type t) : m_t(std::move(t)) {}

  constexpr ~Scoped() {
    if (m_t == Type{}) { return; }
    Deleter{}(m_t);
  }

  [[nodiscard]] constexpr auto get() const -> Type const& { return m_t; }
  [[nodiscard]] constexpr auto get() -> Type& { return m_t; }

 private:
  Type m_t{};
};

Don't worry if this doesn't make a lot of sense: the implementation isn't important, what it does and how to use it is what matters.

A ScopedWaiter can now be implemented quite easily:

struct ScopedWaiterDeleter {
  void operator()(vk::Device const device) const noexcept {
    device.waitIdle();
  }
};

using ScopedWaiter = Scoped<vk::Device, ScopedWaiterDeleter>;

Add a ScopedWaiter member to App at the end of its member list: this must remain at the end to be the first member that gets destroyed, thus guaranteeing the device will be idle before the destruction of any other members begins. Initialize it after creating the Device:

m_waiter = *m_device;

Swapchain

A Vulkan Swapchain is an array of presentable images associated with a Surface, which acts as a bridge between the application and the platform's presentation engine (compositor / display engine). The Swapchain will be continually used in the main loop to acquire and present images. Since failing to create a Swapchain is a fatal error, its creation is part of the initialization section.

We shall wrap the Vulkan Swapchain into our own class Swapchain. It will also store a copy of the Images owned by the Vulkan Swapchain, and create (and own) an Image View for each Image. The Vulkan Swapchain may need to be recreated in the main loop, eg when the framebuffer size changes, or when an acquire/present operation returns vk::ErrorOutOfDateKHR. This will be encapsulated in a recreate() function which can simply be called during initialization as well.

// swapchain.hpp
class Swapchain {
 public:
  explicit Swapchain(vk::Device device, Gpu const& gpu,
                     vk::SurfaceKHR surface, glm::ivec2 size);

  auto recreate(glm::ivec2 size) -> bool;

  [[nodiscard]] auto get_size() const -> glm::ivec2 {
    return {m_ci.imageExtent.width, m_ci.imageExtent.height};
  }

 private:
  void populate_images();
  void create_image_views();

  vk::Device m_device{};
  Gpu m_gpu{};

  vk::SwapchainCreateInfoKHR m_ci{};
  vk::UniqueSwapchainKHR m_swapchain{};
  std::vector<vk::Image> m_images{};
  std::vector<vk::UniqueImageView> m_image_views{};
};

// swapchain.cpp
Swapchain::Swapchain(vk::Device const device, Gpu const& gpu,
           vk::SurfaceKHR const surface, glm::ivec2 const size)
  : m_device(device), m_gpu(gpu) {}

Static Swapchain Properties

Some Swapchain creation parameters, like the image extent (size) and count, depend on the Surface capabilities, which can change at runtime. We can set up the rest in the constructor, for which we need a helper function to obtain a desired Surface Format:

constexpr auto srgb_formats_v = std::array{
  vk::Format::eR8G8B8A8Srgb,
  vk::Format::eB8G8R8A8Srgb,
};

// returns a SurfaceFormat with SrgbNonLinear color space and an sRGB format.
[[nodiscard]] constexpr auto
get_surface_format(std::span<vk::SurfaceFormatKHR const> supported)
  -> vk::SurfaceFormatKHR {
  for (auto const desired : srgb_formats_v) {
    auto const is_match = [desired](vk::SurfaceFormatKHR const& in) {
      return in.format == desired &&
           in.colorSpace ==
             vk::ColorSpaceKHR::eVkColorspaceSrgbNonlinear;
    };
    auto const it = std::ranges::find_if(supported, is_match);
    if (it == supported.end()) { continue; }
    return *it;
  }
  return supported.front();
}

An sRGB format is preferred because that is the color space the screen operates in. This is indicated by the fact that the only core Color Space is vk::ColorSpaceKHR::eVkColorspaceSrgbNonlinear, which specifies support for images in the sRGB color space.

The constructor can now be implemented:

auto const surface_format =
  get_surface_format(m_gpu.device.getSurfaceFormatsKHR(surface));
m_ci.setSurface(surface)
  .setImageFormat(surface_format.format)
  .setImageColorSpace(surface_format.colorSpace)
  .setImageArrayLayers(1)
  // Swapchain images will be used as color attachments (render targets).
  .setImageUsage(vk::ImageUsageFlagBits::eColorAttachment)
  // eFifo is guaranteed to be supported.
  .setPresentMode(vk::PresentModeKHR::eFifo);
if (!recreate(size)) {
  throw std::runtime_error{"Failed to create Vulkan Swapchain"};
}

Swapchain Recreation

The constraints on Swapchain creation parameters are specified by Surface Capabilities. Based on the spec we add two helper functions and a constant:

constexpr std::uint32_t min_images_v{3};

// returns currentExtent if specified, else clamped size.
[[nodiscard]] constexpr auto
get_image_extent(vk::SurfaceCapabilitiesKHR const& capabilities,
                 glm::uvec2 const size) -> vk::Extent2D {
  constexpr auto limitless_v = 0xffffffff;
  if (capabilities.currentExtent.width < limitless_v &&
    capabilities.currentExtent.height < limitless_v) {
    return capabilities.currentExtent;
  }
  auto const x = std::clamp(size.x, capabilities.minImageExtent.width,
                capabilities.maxImageExtent.width);
  auto const y = std::clamp(size.y, capabilities.minImageExtent.height,
                capabilities.maxImageExtent.height);
  return vk::Extent2D{x, y};
}

[[nodiscard]] constexpr auto
get_image_count(vk::SurfaceCapabilitiesKHR const& capabilities)
  -> std::uint32_t {
  if (capabilities.maxImageCount < capabilities.minImageCount) {
    return std::max(min_images_v, capabilities.minImageCount);
  }
  return std::clamp(min_images_v, capabilities.minImageCount,
            capabilities.maxImageCount);
}

We want at least three images in order to have the option of setting up triple buffering. While it is possible for a Surface to have maxImageCount < 3, it is quite unlikely; it is in fact much more likely for minImageCount to be greater than 3.

The dimensions of Vulkan Images must be positive, so if the incoming framebuffer size is not, we skip the attempt to recreate. This can happen eg on Windows when the window is minimized. (Until it is restored, rendering will basically be paused.)

auto Swapchain::recreate(glm::ivec2 size) -> bool {
  // Image sizes must be positive.
  if (size.x <= 0 || size.y <= 0) { return false; }

  auto const capabilities =
    m_gpu.device.getSurfaceCapabilitiesKHR(m_ci.surface);
  m_ci.setImageExtent(get_image_extent(capabilities, size))
    .setMinImageCount(get_image_count(capabilities))
    .setOldSwapchain(m_swapchain ? *m_swapchain : vk::SwapchainKHR{})
    .setQueueFamilyIndices(m_gpu.queue_family);
  assert(m_ci.imageExtent.width > 0 && m_ci.imageExtent.height > 0 &&
       m_ci.minImageCount >= min_images_v);

  // wait for the device to be idle before destroying the current swapchain.
  m_device.waitIdle();
  m_swapchain = m_device.createSwapchainKHRUnique(m_ci);

  return true;
}

After successful recreation we want to fill up those vectors of images and views. For the images we use a more verbose approach to avoid having to assign the member vector to a newly returned one every time:

void require_success(vk::Result const result, char const* error_msg) {
  if (result != vk::Result::eSuccess) { throw std::runtime_error{error_msg}; }
}

// ...
void Swapchain::populate_images() {
  // we use the more verbose two-call API to avoid assigning m_images to a new
  // vector on every call.
  auto image_count = std::uint32_t{};
  auto result =
    m_device.getSwapchainImagesKHR(*m_swapchain, &image_count, nullptr);
  require_success(result, "Failed to get Swapchain Images");

  m_images.resize(image_count);
  result = m_device.getSwapchainImagesKHR(*m_swapchain, &image_count,
                                          m_images.data());
  require_success(result, "Failed to get Swapchain Images");
}

Creation of the views is fairly straightforward:

void Swapchain::create_image_views() {
  auto subresource_range = vk::ImageSubresourceRange{};
  // this is a color image with 1 layer and 1 mip-level (the default).
  subresource_range.setAspectMask(vk::ImageAspectFlagBits::eColor)
    .setLayerCount(1)
    .setLevelCount(1);
  auto image_view_ci = vk::ImageViewCreateInfo{};
  // set common parameters here (everything except the Image).
  image_view_ci.setViewType(vk::ImageViewType::e2D)
    .setFormat(m_ci.imageFormat)
    .setSubresourceRange(subresource_range);
  m_image_views.clear();
  m_image_views.reserve(m_images.size());
  for (auto const image : m_images) {
    image_view_ci.setImage(image);
    m_image_views.push_back(m_device.createImageViewUnique(image_view_ci));
  }
}

We can now call these functions in recreate() before returning true, and add a log for some feedback:

populate_images();
create_image_views();

size = get_size();
std::println("[lvk] Swapchain [{}x{}]", size.x, size.y);
return true;

The log can get a bit noisy on incessant resizing (especially on Linux).

To get the framebuffer size, add a helper function in window.hpp/cpp:

auto glfw::framebuffer_size(GLFWwindow* window) -> glm::ivec2 {
  auto ret = glm::ivec2{};
  glfwGetFramebufferSize(window, &ret.x, &ret.y);
  return ret;
}

Finally, add a std::optional<Swapchain> member to App after m_device, add the create function, and call it after create_device():

std::optional<Swapchain> m_swapchain{};

// ...
void App::create_swapchain() {
  auto const size = glfw::framebuffer_size(m_window.get());
  m_swapchain.emplace(*m_device, m_gpu, *m_surface, size);
}

Rendering

This section implements Render Sync, the Swapchain loop, performs Swapchain image layout transitions, and introduces Dynamic Rendering. Originally Vulkan only supported Render Passes, which are quite verbose to set up, require somewhat confusing subpass dependencies, and are ironically less explicit: they can perform implicit layout transitions on their framebuffer attachments. They are also tightly coupled to Graphics Pipelines: you need a separate pipeline object for each Render Pass, even if the pipelines are identical in all other respects. This RenderPass/Subpass model was primarily beneficial for GPUs with tiled renderers, and in Vulkan 1.3 Dynamic Rendering was promoted to the core API (previously it was an extension) as an alternative to using Render Passes.

Swapchain Loop

One part of rendering in the main loop is the Swapchain loop, which at a high level comprises these steps:

  1. Acquire a Swapchain Image
  2. Render to the acquired Image
  3. Present the Image (this releases the image back to the Swapchain)

WSI Engine

There are a few nuances to deal with, for instance:

  1. Acquiring (and/or presenting) will sometimes fail (eg because the Swapchain is out of date), in which case the remaining steps need to be skipped
  2. The acquire command can return before the image is actually ready for use, rendering needs to be synchronized to only start after the image is ready
  3. Similarly, presentation needs to be synchronized to only occur after rendering has completed
  4. The images need appropriate Layout Transitions at each stage

Additionally, the number of swapchain images can vary, whereas the engine should use a fixed number of virtual frames: 2 for double buffering, 3 for triple (more is usually overkill). More info is available here. It's also possible for the main loop to acquire the same image before a previous render command has finished (or even started), if the Swapchain is using Mailbox Present Mode. While FIFO will block until the oldest submitted image is available (also known as vsync), we should still synchronize and wait until the acquired image has finished rendering.

Virtual Frames

All the dynamic resources used during the rendering of a frame comprise a virtual frame. The application has a fixed number of virtual frames that it cycles through, one per rendered frame. For synchronization, each virtual frame will be associated with a vk::Fence, which will be waited on before that frame is rendered to again. Each will also have a pair of vk::Semaphores to synchronize the acquire, render, and present calls on the GPU (we don't need to wait for them on the CPU side / in C++). For recording commands, there will be a vk::CommandBuffer per virtual frame, where all rendering commands for that frame (including layout transitions) will be recorded.

Image Layouts

Vulkan Images have a property known as Image Layout. Most operations on images and their subresources require them to be in certain specific layouts, requiring transitions before (and after). A layout transition conveniently also functions as a Pipeline Barrier (think memory barrier on the GPU), enabling us to synchronize operations before and after the transition.

Vulkan Synchronization is arguably the most complicated aspect of the API, a good amount of research is recommended. Here is an article explaining barriers.

Render Sync

Create a new header resource_buffering.hpp:

// Number of virtual frames.
inline constexpr std::size_t resource_buffering_v{2};

// Alias for N-buffered resources.
template <typename Type>
using Buffered = std::array<Type, resource_buffering_v>;
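As a quick illustration of the cycling behavior, here is a minimal non-Vulkan sketch (next_frame_index is a hypothetical helper mirroring the index update performed after each submit; it is not part of the codebase):

```cpp
#include <array>
#include <cstddef>

inline constexpr std::size_t resource_buffering_v{2};

// Alias for N-buffered resources.
template <typename Type>
using Buffered = std::array<Type, resource_buffering_v>;

// the frame index cycles through [0, resource_buffering_v), wrapping
// back to 0 after the last virtual frame.
inline auto next_frame_index(std::size_t const current) -> std::size_t {
  return (current + 1) % resource_buffering_v;
}
```

With double buffering the index alternates 0, 1, 0, 1, ..., so each virtual frame's resources are reused every other frame.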

Add a private struct RenderSync to App:

struct RenderSync {
  // signaled when Swapchain image has been acquired.
  vk::UniqueSemaphore draw{};
  // signaled when image is ready to be presented.
  vk::UniqueSemaphore present{};
  // signaled with present Semaphore, waited on before next render.
  vk::UniqueFence drawn{};
  // used to record rendering commands.
  vk::CommandBuffer command_buffer{};
};

Add the new members associated with the Swapchain loop:

// command pool for all render Command Buffers.
vk::UniqueCommandPool m_render_cmd_pool{};
// Sync and Command Buffer for virtual frames.
Buffered<RenderSync> m_render_sync{};
// Current virtual frame index.
std::size_t m_frame_index{};

Add, implement, and call the create function:

void App::create_render_sync() {
  // Command Buffers are 'allocated' from a Command Pool (which is 'created'
  // like all other Vulkan objects so far). We can allocate all the buffers
  // from a single pool here.
  auto command_pool_ci = vk::CommandPoolCreateInfo{};
  // this flag enables resetting the command buffer for re-recording (unlike a
  // single-time submit scenario).
  command_pool_ci.setFlags(vk::CommandPoolCreateFlagBits::eResetCommandBuffer)
    .setQueueFamilyIndex(m_gpu.queue_family);
  m_render_cmd_pool = m_device->createCommandPoolUnique(command_pool_ci);

  auto command_buffer_ai = vk::CommandBufferAllocateInfo{};
  command_buffer_ai.setCommandPool(*m_render_cmd_pool)
    .setCommandBufferCount(static_cast<std::uint32_t>(resource_buffering_v))
    .setLevel(vk::CommandBufferLevel::ePrimary);
  auto const command_buffers =
    m_device->allocateCommandBuffers(command_buffer_ai);
  assert(command_buffers.size() == m_render_sync.size());

  // we create Render Fences as pre-signaled so that on the first render for
  // each virtual frame we don't wait on their fences (since there's nothing
  // to wait for yet).
  static constexpr auto fence_create_info_v =
    vk::FenceCreateInfo{vk::FenceCreateFlagBits::eSignaled};
  for (auto [sync, command_buffer] :
     std::views::zip(m_render_sync, command_buffers)) {
    sync.command_buffer = command_buffer;
    sync.draw = m_device->createSemaphoreUnique({});
    sync.present = m_device->createSemaphoreUnique({});
    sync.drawn = m_device->createFenceUnique(fence_create_info_v);
  }
}

Swapchain Update

Swapchain acquire/present operations can have various results. We constrain ourselves to the following:

  • eSuccess: all good
  • eSuboptimalKHR: also all good (not an error, and this is unlikely to occur on a desktop)
  • eErrorOutOfDateKHR: Swapchain needs to be recreated
  • Any other vk::Result: fatal/unexpected error

Expressing as a helper function in swapchain.cpp:

auto needs_recreation(vk::Result const result) -> bool {
  switch (result) {
  case vk::Result::eSuccess:
  case vk::Result::eSuboptimalKHR: return false;
  case vk::Result::eErrorOutOfDateKHR: return true;
  default: break;
  }
  throw std::runtime_error{"Swapchain Error"};
}

We also want to return the Image, Image View, and size upon successful acquisition of the underlying Swapchain Image. Wrapping those in a struct:

struct RenderTarget {
  vk::Image image{};
  vk::ImageView image_view{};
  vk::Extent2D extent{};
};

VulkanHPP's primary API throws if the vk::Result corresponds to an error (based on the spec). eErrorOutOfDateKHR is technically an error, but it's quite possible to get it when the framebuffer size doesn't match the Swapchain size. To avoid having to deal with exceptions here, we use the alternate API for the acquire and present calls (overloads distinguished by pointer arguments and/or out parameters, and returning a vk::Result).

Implementing the acquire operation:

auto Swapchain::acquire_next_image(vk::Semaphore const to_signal)
  -> std::optional<RenderTarget> {
  assert(!m_image_index);
  static constexpr auto timeout_v = std::numeric_limits<std::uint64_t>::max();
  // avoid VulkanHPP ErrorOutOfDateKHR exceptions by using alternate API that
  // returns a Result.
  auto image_index = std::uint32_t{};
  auto const result = m_device.acquireNextImageKHR(
    *m_swapchain, timeout_v, to_signal, {}, &image_index);
  if (needs_recreation(result)) { return {}; }

  m_image_index = static_cast<std::size_t>(image_index);
  return RenderTarget{
    .image = m_images.at(*m_image_index),
    .image_view = *m_image_views.at(*m_image_index),
    .extent = m_ci.imageExtent,
  };
}

Similarly, present:

auto Swapchain::present(vk::Queue const queue, vk::Semaphore const to_wait)
  -> bool {
  auto const image_index = static_cast<std::uint32_t>(m_image_index.value());
  auto present_info = vk::PresentInfoKHR{};
  present_info.setSwapchains(*m_swapchain)
    .setImageIndices(image_index)
    .setWaitSemaphores(to_wait);
  // avoid VulkanHPP ErrorOutOfDateKHR exceptions by using alternate API.
  auto const result = queue.presentKHR(&present_info);
  m_image_index.reset();
  return !needs_recreation(result);
}

It is the responsibility of the user (class App) to recreate the Swapchain on receiving std::nullopt / false return values for either operation. Users will also need to transition the layouts of the returned images between acquire and present operations. Add a helper to assist in that process, and extract the Image Subresource Range out as a common constant:

constexpr auto subresource_range_v = [] {
  auto ret = vk::ImageSubresourceRange{};
  // this is a color image with 1 layer and 1 mip-level (the default).
  ret.setAspectMask(vk::ImageAspectFlagBits::eColor)
    .setLayerCount(1)
    .setLevelCount(1);
  return ret;
}();

// ...
auto Swapchain::base_barrier() const -> vk::ImageMemoryBarrier2 {
  // fill up the parts common to all barriers.
  auto ret = vk::ImageMemoryBarrier2{};
  ret.setImage(m_images.at(m_image_index.value()))
    .setSubresourceRange(subresource_range_v)
    .setSrcQueueFamilyIndex(m_gpu.queue_family)
    .setDstQueueFamilyIndex(m_gpu.queue_family);
  return ret;
}

Dynamic Rendering

Dynamic Rendering enables us to avoid using Render Passes, which are quite a bit more verbose (but also generally more performant on tiled GPUs). Here we tie together the Swapchain, Render Sync, and rendering. We are not ready to actually render anything yet, but can clear the image to a particular color.

Add these new members to App:

auto acquire_render_target() -> bool;
auto begin_frame() -> vk::CommandBuffer;
void transition_for_render(vk::CommandBuffer command_buffer) const;
void render(vk::CommandBuffer command_buffer);
void transition_for_present(vk::CommandBuffer command_buffer) const;
void submit_and_present();

// ...
glm::ivec2 m_framebuffer_size{};
std::optional<RenderTarget> m_render_target{};

The main loop can now use these to implement the Swapchain and rendering loop:

while (glfwWindowShouldClose(m_window.get()) == GLFW_FALSE) {
  glfwPollEvents();
  if (!acquire_render_target()) { continue; }
  auto const command_buffer = begin_frame();
  transition_for_render(command_buffer);
  render(command_buffer);
  transition_for_present(command_buffer);
  submit_and_present();
}

Before acquiring a Swapchain image, we need to wait for the current frame's fence. If acquisition is successful, reset the fence (unsignal it):

auto App::acquire_render_target() -> bool {
  m_framebuffer_size = glfw::framebuffer_size(m_window.get());
  // minimized? skip loop.
  if (m_framebuffer_size.x <= 0 || m_framebuffer_size.y <= 0) {
    return false;
  }

  auto& render_sync = m_render_sync.at(m_frame_index);

  // wait for the fence to be signaled.
  static constexpr auto fence_timeout_v =
    static_cast<std::uint64_t>(std::chrono::nanoseconds{3s}.count());
  auto result =
    m_device->waitForFences(*render_sync.drawn, vk::True, fence_timeout_v);
  if (result != vk::Result::eSuccess) {
    throw std::runtime_error{"Failed to wait for Render Fence"};
  }

  m_render_target = m_swapchain->acquire_next_image(*render_sync.draw);
  if (!m_render_target) {
    // acquire failure => ErrorOutOfDate. Recreate Swapchain.
    m_swapchain->recreate(m_framebuffer_size);
    return false;
  }

  // reset fence _after_ acquisition of image: if it fails, the
  // fence remains signaled.
  m_device->resetFences(*render_sync.drawn);

  return true;
}

Since the fence has been reset, a queue submission must be made that signals it before continuing, otherwise the app will deadlock on the next wait (and eventually throw after 3s). Begin Command Buffer recording:

auto App::begin_frame() -> vk::CommandBuffer {
  auto const& render_sync = m_render_sync.at(m_frame_index);

  auto command_buffer_bi = vk::CommandBufferBeginInfo{};
  // this flag means recorded commands will not be reused.
  command_buffer_bi.setFlags(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
  render_sync.command_buffer.begin(command_buffer_bi);
  return render_sync.command_buffer;
}

Transition the image for rendering, ie Attachment Optimal layout. Set up the image barrier and record it:

void App::transition_for_render(vk::CommandBuffer const command_buffer) const {
  auto dependency_info = vk::DependencyInfo{};
  auto barrier = m_swapchain->base_barrier();
  // Undefined => AttachmentOptimal
  // the barrier must wait for prior color attachment operations to complete,
  // and block subsequent ones.
  barrier.setOldLayout(vk::ImageLayout::eUndefined)
    .setNewLayout(vk::ImageLayout::eAttachmentOptimal)
    .setSrcAccessMask(vk::AccessFlagBits2::eColorAttachmentRead |
              vk::AccessFlagBits2::eColorAttachmentWrite)
    .setSrcStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput)
    .setDstAccessMask(barrier.srcAccessMask)
    .setDstStageMask(barrier.srcStageMask);
  dependency_info.setImageMemoryBarriers(barrier);
  command_buffer.pipelineBarrier2(dependency_info);
}

Create a Rendering Attachment Info using the acquired image as the color target. We use a red clear color, make sure the Load Op clears the image, and Store Op stores the results (currently just the cleared image). Set up a Rendering Info object with the color attachment and the entire image as the render area. Finally, execute the render:

void App::render(vk::CommandBuffer const command_buffer) {
  auto color_attachment = vk::RenderingAttachmentInfo{};
  color_attachment.setImageView(m_render_target->image_view)
    .setImageLayout(vk::ImageLayout::eAttachmentOptimal)
    .setLoadOp(vk::AttachmentLoadOp::eClear)
    .setStoreOp(vk::AttachmentStoreOp::eStore)
    // temporarily red.
    .setClearValue(vk::ClearColorValue{1.0f, 0.0f, 0.0f, 1.0f});
  auto rendering_info = vk::RenderingInfo{};
  auto const render_area =
    vk::Rect2D{vk::Offset2D{}, m_render_target->extent};
  rendering_info.setRenderArea(render_area)
    .setColorAttachments(color_attachment)
    .setLayerCount(1);

  command_buffer.beginRendering(rendering_info);
  // draw stuff here.
  command_buffer.endRendering();
}

Transition the image for presentation:

void App::transition_for_present(vk::CommandBuffer const command_buffer) const {
  auto dependency_info = vk::DependencyInfo{};
  auto barrier = m_swapchain->base_barrier();
  // AttachmentOptimal => PresentSrc
  // the barrier must wait for prior color attachment operations to complete,
  // and block subsequent ones.
  barrier.setOldLayout(vk::ImageLayout::eAttachmentOptimal)
    .setNewLayout(vk::ImageLayout::ePresentSrcKHR)
    .setSrcAccessMask(vk::AccessFlagBits2::eColorAttachmentRead |
              vk::AccessFlagBits2::eColorAttachmentWrite)
    .setSrcStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput)
    .setDstAccessMask(barrier.srcAccessMask)
    .setDstStageMask(barrier.srcStageMask);
  dependency_info.setImageMemoryBarriers(barrier);
  command_buffer.pipelineBarrier2(dependency_info);
}

End the command buffer and submit it. The draw Semaphore will be signaled by the Swapchain when the image is ready, which will trigger this command buffer's execution. The submission will signal the present Semaphore and the drawn Fence on completion, with the latter being waited on the next time this virtual frame is processed. Finally, we increment the frame index and pass the present semaphore as the one for the subsequent present operation to wait on:

void App::submit_and_present() {
  auto const& render_sync = m_render_sync.at(m_frame_index);
  render_sync.command_buffer.end();

  auto submit_info = vk::SubmitInfo2{};
  auto const command_buffer_info =
    vk::CommandBufferSubmitInfo{render_sync.command_buffer};
  auto wait_semaphore_info = vk::SemaphoreSubmitInfo{};
  wait_semaphore_info.setSemaphore(*render_sync.draw)
    .setStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput);
  auto signal_semaphore_info = vk::SemaphoreSubmitInfo{};
  signal_semaphore_info.setSemaphore(*render_sync.present)
    .setStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput);
  submit_info.setCommandBufferInfos(command_buffer_info)
    .setWaitSemaphoreInfos(wait_semaphore_info)
    .setSignalSemaphoreInfos(signal_semaphore_info);
  m_queue.submit2(submit_info, *render_sync.drawn);

  m_frame_index = (m_frame_index + 1) % m_render_sync.size();
  m_render_target.reset();

  // an eErrorOutOfDateKHR result is not guaranteed if the
  // framebuffer size does not match the Swapchain image size, check it
  // explicitly.
  auto const fb_size_changed = m_framebuffer_size != m_swapchain->get_size();
  auto const out_of_date =
    !m_swapchain->present(m_queue, *render_sync.present);
  if (fb_size_changed || out_of_date) {
    m_swapchain->recreate(m_framebuffer_size);
  }
}

Wayland users: congratulations, you can finally see and interact with the window!

Cleared Image

Render Doc on Wayland

At the time of writing, RenderDoc doesn't support inspecting Wayland applications. Temporarily force X11 (XWayland) by calling glfwInitHint() before glfwInit():

glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11);

Setting up a command line option to conditionally call this is a simple and flexible approach: just set that argument in RenderDoc itself and/or pass it whenever an X11 backend is desired:

// main.cpp
// skip the first argument.
auto args = std::span{argv, static_cast<std::size_t>(argc)}.subspan(1);
while (!args.empty()) {
  auto const arg = std::string_view{args.front()};
  if (arg == "-x" || arg == "--force-x11") {
    glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11);
  }
  args = args.subspan(1);
}
lvk::App{}.run();

Dear ImGui

Dear ImGui does not have native CMake support. While adding its sources directly to the executable is an option, we will add it as an external library target, imgui, to isolate it (and its compile warnings etc) from our own code. This requires some changes to the ext target structure, since imgui will itself need to link to GLFW and Vulkan-Headers, have VK_NO_PROTOTYPES defined, etc. learn-vk-ext then links to imgui and any other libraries (currently only glm). We are using Dear ImGui v1.91.9, which has decent support for Dynamic Rendering.

class DearImGui

Dear ImGui has its own initialization and loop, which we encapsulate into class DearImGui:

struct DearImGuiCreateInfo {
  GLFWwindow* window{};
  std::uint32_t api_version{};
  vk::Instance instance{};
  vk::PhysicalDevice physical_device{};
  std::uint32_t queue_family{};
  vk::Device device{};
  vk::Queue queue{};
  vk::Format color_format{}; // single color attachment.
  vk::SampleCountFlagBits samples{};
};

class DearImGui {
  public:
  using CreateInfo = DearImGuiCreateInfo;

  explicit DearImGui(CreateInfo const& create_info);

  void new_frame();
  void end_frame();
  void render(vk::CommandBuffer command_buffer) const;

  private:
  enum class State : std::int8_t { Ended, Begun };

  struct Deleter {
    void operator()(vk::Device device) const;
  };

  State m_state{};

  Scoped<vk::Device, Deleter> m_device{};
};

In the constructor, we start by creating the ImGui Context, loading Vulkan functions, and initializing GLFW for Vulkan:

IMGUI_CHECKVERSION();
ImGui::CreateContext();

static auto const load_vk_func = +[](char const* name, void* user_data) {
  return VULKAN_HPP_DEFAULT_DISPATCHER.vkGetInstanceProcAddr(
    *static_cast<vk::Instance*>(user_data), name);
};
auto instance = create_info.instance;
ImGui_ImplVulkan_LoadFunctions(create_info.api_version, load_vk_func,
                  &instance);

if (!ImGui_ImplGlfw_InitForVulkan(create_info.window, true)) {
  throw std::runtime_error{"Failed to initialize Dear ImGui"};
}

Then initialize Dear ImGui for Vulkan:

auto init_info = ImGui_ImplVulkan_InitInfo{};
init_info.ApiVersion = create_info.api_version;
init_info.Instance = create_info.instance;
init_info.PhysicalDevice = create_info.physical_device;
init_info.Device = create_info.device;
init_info.QueueFamily = create_info.queue_family;
init_info.Queue = create_info.queue;
init_info.MinImageCount = 2;
init_info.ImageCount = static_cast<std::uint32_t>(resource_buffering_v);
init_info.MSAASamples =
  static_cast<VkSampleCountFlagBits>(create_info.samples);
init_info.DescriptorPoolSize = 2;
auto pipeline_rendering_ci = vk::PipelineRenderingCreateInfo{};
pipeline_rendering_ci.setColorAttachmentCount(1).setColorAttachmentFormats(
  create_info.color_format);
init_info.PipelineRenderingCreateInfo = pipeline_rendering_ci;
init_info.UseDynamicRendering = true;
if (!ImGui_ImplVulkan_Init(&init_info)) {
  throw std::runtime_error{"Failed to initialize Dear ImGui"};
}
ImGui_ImplVulkan_CreateFontsTexture();

Since we are using an sRGB format and Dear ImGui is not color-space aware, we need to convert its style colors to linear space (so that gamma correction shifts them back to their original values):

ImGui::StyleColorsDark();
// NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-array-to-pointer-decay)
for (auto& colour : ImGui::GetStyle().Colors) {
  auto const linear = glm::convertSRGBToLinear(
    glm::vec4{colour.x, colour.y, colour.z, colour.w});
  colour = ImVec4{linear.x, linear.y, linear.z, linear.w};
}
ImGui::GetStyle().Colors[ImGuiCol_WindowBg].w = 0.99f; // more opaque
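For reference, glm::convertSRGBToLinear applies the standard sRGB-to-linear transfer function per channel. A minimal scalar sketch of that conversion (srgb_to_linear is a hypothetical stand-in, not part of the codebase):

```cpp
#include <cmath>

// standard sRGB-to-linear transfer function for one channel in [0, 1]:
// values below the toe use the linear segment, the rest use the
// 2.4 power curve.
inline auto srgb_to_linear(float const channel) -> float {
  if (channel <= 0.04045f) { return channel / 12.92f; }
  return std::pow((channel + 0.055f) / 1.055f, 2.4f);
}
```

Mid-gray (0.5) maps to roughly 0.21 in linear space, which the sRGB framebuffer then encodes back to roughly 0.5 on write.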

Finally, create the deleter and its implementation:

m_device = Scoped<vk::Device, Deleter>{create_info.device};

// ...
void DearImGui::Deleter::operator()(vk::Device const device) const {
  device.waitIdle();
  ImGui_ImplVulkan_DestroyFontsTexture();
  ImGui_ImplVulkan_Shutdown();
  ImGui_ImplGlfw_Shutdown();
  ImGui::DestroyContext();
}

The remaining functions are straightforward:

void DearImGui::new_frame() {
  if (m_state == State::Begun) { end_frame(); }
  ImGui_ImplGlfw_NewFrame();
  ImGui_ImplVulkan_NewFrame();
  ImGui::NewFrame();
  m_state = State::Begun;
}

void DearImGui::end_frame() {
  if (m_state == State::Ended) { return; }
  ImGui::Render();
  m_state = State::Ended;
}

// NOLINTNEXTLINE(readability-convert-member-functions-to-static)
void DearImGui::render(vk::CommandBuffer const command_buffer) const {
  auto* data = ImGui::GetDrawData();
  if (data == nullptr) { return; }
  ImGui_ImplVulkan_RenderDrawData(data, command_buffer);
}

ImGui Integration

Update Swapchain to expose its image format:

[[nodiscard]] auto get_format() const -> vk::Format {
  return m_ci.imageFormat;
}

class App can now store a std::optional<DearImGui> member and add/call its create function:

void App::create_imgui() {
  auto const imgui_ci = DearImGui::CreateInfo{
    .window = m_window.get(),
    .api_version = vk_version_v,
    .instance = *m_instance,
    .physical_device = m_gpu.device,
    .queue_family = m_gpu.queue_family,
    .device = *m_device,
    .queue = m_queue,
    .color_format = m_swapchain->get_format(),
    .samples = vk::SampleCountFlagBits::e1,
  };
  m_imgui.emplace(imgui_ci);
}

Start a new ImGui frame after resetting the render fence, and show the demo window:

m_device->resetFences(*render_sync.drawn);
m_imgui->new_frame();

// ...
command_buffer.beginRendering(rendering_info);
ImGui::ShowDemoWindow();
// draw stuff here.
command_buffer.endRendering();

ImGui doesn't draw anything here (the actual draw command requires the Command Buffer); it's just a good customization point for all higher-level logic.

We use a separate render pass for Dear ImGui, again for isolation, and to enable us to change the main render pass later, eg by adding a depth buffer attachment (DearImGui is set up assuming its render pass will only use a single color attachment).

m_imgui->end_frame();
// we don't want to clear the image again, instead load it intact after the
// previous pass.
color_attachment.setLoadOp(vk::AttachmentLoadOp::eLoad);
rendering_info.setColorAttachments(color_attachment)
  .setPDepthAttachment(nullptr);
command_buffer.beginRendering(rendering_info);
m_imgui->render(command_buffer);
command_buffer.endRendering();

ImGui Demo

Shader Objects

A Vulkan Graphics Pipeline is a large object that encompasses the entire graphics pipeline: it consists of many stages, all of which are executed during a single draw() call. There is however an extension called VK_EXT_shader_object which enables avoiding graphics pipelines entirely. Almost all pipeline state becomes dynamic, ie set at draw time, and the only Vulkan handles to own are ShaderEXT objects. For a comprehensive guide, check out the Vulkan Sample from Khronos.

Vulkan requires shader code to be provided as SPIR-V (IR). We shall use glslc (part of the Vulkan SDK) to compile GLSL to SPIR-V manually when required.

Locating Assets

Before we can use shaders, we need to load them as asset/data files. To do that correctly, the assets directory must first be located. There are a few ways to go about this; we will use the approach of looking for a particular subdirectory, starting from the working directory and walking up the parent directory tree. This enables the app executable, in any of the project/build subdirectories in the example tree below, to locate assets/:

.
|-- assets/
|-- app
|-- build/
    |-- app
|-- out/
    |-- default/Release/
        |-- app
    |-- ubsan/Debug/
        |-- app

In a release package you would want to use the path to the executable instead (and probably not perform an "upfind" walk): the working directory could be anywhere, whereas assets shipped with the package will be in the vicinity of the executable.

Assets Directory

Add a member to App to store this path to assets/:

namespace fs = std::filesystem;

// ...
fs::path m_assets_dir{};

Add a helper function to locate the assets dir, and assign m_assets_dir to its return value at the top of run():

[[nodiscard]] auto locate_assets_dir() -> fs::path {
  // look for '<path>/assets/', starting from the working
  // directory and walking up the parent directory tree.
  static constexpr std::string_view dir_name_v{"assets"};
  for (auto path = fs::current_path();
     !path.empty() && path.has_parent_path(); path = path.parent_path()) {
    auto ret = path / dir_name_v;
    if (fs::is_directory(ret)) { return ret; }
  }
  std::println("[lvk] Warning: could not locate '{}' directory", dir_name_v);
  return fs::current_path();
}

// ...
m_assets_dir = locate_assets_dir();

Shader Program

To use Shader Objects we need to enable the corresponding feature and extension during device creation:

auto shader_object_feature =
  vk::PhysicalDeviceShaderObjectFeaturesEXT{vk::True};
dynamic_rendering_feature.setPNext(&shader_object_feature);

// ...
// we need two device extensions: Swapchain and Shader Object.
static constexpr auto extensions_v = std::array{
  VK_KHR_SWAPCHAIN_EXTENSION_NAME,
  "VK_EXT_shader_object",
};

Emulation Layer

It's possible that device creation now fails because the driver or physical device does not support VK_EXT_shader_object (this is especially likely with Intel). The Vulkan SDK provides a layer that implements this extension: VK_LAYER_KHRONOS_shader_object. Adding this layer to the Instance Create Info should unblock usage of this feature:

// ...
// add the Shader Object emulation layer.
static constexpr auto layers_v = std::array{
  "VK_LAYER_KHRONOS_shader_object",
};
instance_ci.setPEnabledLayerNames(layers_v);

m_instance = vk::createInstanceUnique(instance_ci);
// ...
This layer is not part of standard Vulkan driver installs; you must package it with the application for it to run in environments without the Vulkan SDK / Vulkan Configurator. Read more here.

Since desired layers may not be available, we can set up a defensive check:

[[nodiscard]] auto get_layers(std::span<char const* const> desired)
  -> std::vector<char const*> {
  auto ret = std::vector<char const*>{};
  ret.reserve(desired.size());
  auto const available = vk::enumerateInstanceLayerProperties();
  for (char const* layer : desired) {
    auto const pred = [layer = std::string_view{layer}](
                vk::LayerProperties const& properties) {
      return properties.layerName == layer;
    };
    if (std::ranges::find_if(available, pred) == available.end()) {
      std::println("[lvk] [WARNING] Vulkan Layer '{}' not found", layer);
      continue;
    }
    ret.push_back(layer);
  }
  return ret;
}

// ...
auto const layers = get_layers(layers_v);
instance_ci.setPEnabledLayerNames(layers);

class ShaderProgram

We will encapsulate both vertex and fragment shaders into a single ShaderProgram, which will also bind the shaders before a draw, and expose/set various dynamic states.

In shader_program.hpp, first add a ShaderProgramCreateInfo struct:

struct ShaderProgramCreateInfo {
  vk::Device device;
  std::span<std::uint32_t const> vertex_spirv;
  std::span<std::uint32_t const> fragment_spirv;
  std::span<vk::DescriptorSetLayout const> set_layouts;
};

Descriptor Sets and their Layouts will be covered later.

Start with a skeleton definition:

class ShaderProgram {
 public:
  using CreateInfo = ShaderProgramCreateInfo;

  explicit ShaderProgram(CreateInfo const& create_info);

 private:
  std::vector<vk::UniqueShaderEXT> m_shaders{};

  ScopedWaiter m_waiter{};
};

The definition of the constructor is fairly straightforward:

ShaderProgram::ShaderProgram(CreateInfo const& create_info) {
  auto const create_shader_ci =
    [&create_info](std::span<std::uint32_t const> spirv) {
      auto ret = vk::ShaderCreateInfoEXT{};
      ret.setCodeSize(spirv.size_bytes())
        .setPCode(spirv.data())
        // set common parameters.
        .setSetLayouts(create_info.set_layouts)
        .setCodeType(vk::ShaderCodeTypeEXT::eSpirv)
        .setPName("main");
      return ret;
    };

  auto shader_cis = std::array{
    create_shader_ci(create_info.vertex_spirv),
    create_shader_ci(create_info.fragment_spirv),
  };
  shader_cis[0]
    .setStage(vk::ShaderStageFlagBits::eVertex)
    .setNextStage(vk::ShaderStageFlagBits::eFragment);
  shader_cis[1].setStage(vk::ShaderStageFlagBits::eFragment);

  auto result = create_info.device.createShadersEXTUnique(shader_cis);
  if (result.result != vk::Result::eSuccess) {
    throw std::runtime_error{"Failed to create Shader Objects"};
  }
  m_shaders = std::move(result.value);
  m_waiter = create_info.device;
}

Expose some dynamic states via public members:

static constexpr auto color_blend_equation_v = [] {
  auto ret = vk::ColorBlendEquationEXT{};
  ret.setColorBlendOp(vk::BlendOp::eAdd)
    // standard alpha blending:
    // (alpha * src) + (1 - alpha) * dst
    .setSrcColorBlendFactor(vk::BlendFactor::eSrcAlpha)
    .setDstColorBlendFactor(vk::BlendFactor::eOneMinusSrcAlpha);
  return ret;
}();

// ...
vk::PrimitiveTopology topology{vk::PrimitiveTopology::eTriangleList};
vk::PolygonMode polygon_mode{vk::PolygonMode::eFill};
float line_width{1.0f};
vk::ColorBlendEquationEXT color_blend_equation{color_blend_equation_v};
vk::CompareOp depth_compare_op{vk::CompareOp::eLessOrEqual};

Encapsulate booleans into bit flags:

// bit flags for various binary states.
enum : std::uint8_t {
  None = 0,
  AlphaBlend = 1 << 0, // turn on alpha blending.
  DepthTest = 1 << 1,  // turn on depth write and test.
};

// ...
static constexpr auto flags_v = AlphaBlend | DepthTest;

// ...
std::uint8_t flags{flags_v};
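To show how these bit flags behave, here is a minimal sketch (the is_set and with_cleared helpers are hypothetical, used only for illustration; the real code tests bits inline):

```cpp
#include <cstdint>

// bit flags for various binary states.
enum : std::uint8_t {
  None = 0,
  AlphaBlend = 1 << 0, // turn on alpha blending.
  DepthTest = 1 << 1,  // turn on depth write and test.
};

// test whether a particular state bit is enabled.
inline auto is_set(std::uint8_t const flags, std::uint8_t const bit) -> bool {
  return (flags & bit) != 0;
}

// return a copy of flags with the given bit turned off.
inline auto with_cleared(std::uint8_t const flags, std::uint8_t const bit)
  -> std::uint8_t {
  return static_cast<std::uint8_t>(flags & ~bit);
}
```

A draw that wants blending but no depth testing would thus pass AlphaBlend alone, or clear DepthTest from the default flags.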

There is one more piece of pipeline state needed: vertex input. We will consider this to be constant per shader and store it in the constructor:

// shader_program.hpp

// vertex attributes and bindings.
struct ShaderVertexInput {
  std::span<vk::VertexInputAttributeDescription2EXT const> attributes{};
  std::span<vk::VertexInputBindingDescription2EXT const> bindings{};
};

struct ShaderProgramCreateInfo {
  // ...
  ShaderVertexInput vertex_input{};
  // ...
};

class ShaderProgram {
  // ...
  ShaderVertexInput m_vertex_input{};
  std::vector<vk::UniqueShaderEXT> m_shaders{};
  // ...
};

// shader_program.cpp
ShaderProgram::ShaderProgram(CreateInfo const& create_info)
  : m_vertex_input(create_info.vertex_input) {
  // ...
}

The API to bind will take the command buffer and the framebuffer size (to set the viewport and scissor):

void bind(vk::CommandBuffer command_buffer,
          glm::ivec2 framebuffer_size) const;

Add helper member functions and implement bind() by calling them in succession:

static void set_viewport_scissor(vk::CommandBuffer command_buffer,
                                 glm::ivec2 framebuffer_size);
static void set_static_states(vk::CommandBuffer command_buffer);
void set_common_states(vk::CommandBuffer command_buffer) const;
void set_vertex_states(vk::CommandBuffer command_buffer) const;
void set_fragment_states(vk::CommandBuffer command_buffer) const;
void bind_shaders(vk::CommandBuffer command_buffer) const;

// ...
void ShaderProgram::bind(vk::CommandBuffer const command_buffer,
                         glm::ivec2 const framebuffer_size) const {
  set_viewport_scissor(command_buffer, framebuffer_size);
  set_static_states(command_buffer);
  set_common_states(command_buffer);
  set_vertex_states(command_buffer);
  set_fragment_states(command_buffer);
  bind_shaders(command_buffer);
}

Implementations are long but straightforward:

namespace {
constexpr auto to_vkbool(bool const value) {
  return value ? vk::True : vk::False;
}
} // namespace

// ...
void ShaderProgram::set_viewport_scissor(vk::CommandBuffer const command_buffer,
                                         glm::ivec2 const framebuffer_size) {
  auto const fsize = glm::vec2{framebuffer_size};
  auto viewport = vk::Viewport{};
  // flip the viewport about the X-axis (negative height):
  // https://www.saschawillems.de/blog/2019/03/29/flipping-the-vulkan-viewport/
  viewport.setX(0.0f).setY(fsize.y).setWidth(fsize.x).setHeight(-fsize.y);
  command_buffer.setViewportWithCount(viewport);

  auto const usize = glm::uvec2{framebuffer_size};
  auto const scissor =
    vk::Rect2D{vk::Offset2D{}, vk::Extent2D{usize.x, usize.y}};
  command_buffer.setScissorWithCount(scissor);
}

void ShaderProgram::set_static_states(vk::CommandBuffer const command_buffer) {
  command_buffer.setRasterizerDiscardEnable(vk::False);
  command_buffer.setRasterizationSamplesEXT(vk::SampleCountFlagBits::e1);
  command_buffer.setSampleMaskEXT(vk::SampleCountFlagBits::e1, 0xff);
  command_buffer.setAlphaToCoverageEnableEXT(vk::False);
  command_buffer.setCullMode(vk::CullModeFlagBits::eNone);
  command_buffer.setFrontFace(vk::FrontFace::eCounterClockwise);
  command_buffer.setDepthBiasEnable(vk::False);
  command_buffer.setStencilTestEnable(vk::False);
  command_buffer.setPrimitiveRestartEnable(vk::False);
  command_buffer.setColorWriteMaskEXT(0, ~vk::ColorComponentFlags{});
}

void ShaderProgram::set_common_states(
  vk::CommandBuffer const command_buffer) const {
  auto const depth_test = to_vkbool((flags & DepthTest) == DepthTest);
  command_buffer.setDepthWriteEnable(depth_test);
  command_buffer.setDepthTestEnable(depth_test);
  command_buffer.setDepthCompareOp(depth_compare_op);
  command_buffer.setPolygonModeEXT(polygon_mode);
  command_buffer.setLineWidth(line_width);
}

void ShaderProgram::set_vertex_states(
  vk::CommandBuffer const command_buffer) const {
  command_buffer.setVertexInputEXT(m_vertex_input.bindings,
                                   m_vertex_input.attributes);
  command_buffer.setPrimitiveTopology(topology);
}

void ShaderProgram::set_fragment_states(
  vk::CommandBuffer const command_buffer) const {
  auto const alpha_blend = to_vkbool((flags & AlphaBlend) == AlphaBlend);
  command_buffer.setColorBlendEnableEXT(0, alpha_blend);
  command_buffer.setColorBlendEquationEXT(0, color_blend_equation);
}

void ShaderProgram::bind_shaders(vk::CommandBuffer const command_buffer) const {
  static constexpr auto stages_v = std::array{
    vk::ShaderStageFlagBits::eVertex,
    vk::ShaderStageFlagBits::eFragment,
  };
  auto const shaders = std::array{
    *m_shaders[0],
    *m_shaders[1],
  };
  command_buffer.bindShadersEXT(stages_v, shaders);
}

GLSL to SPIR-V

Shaders work in NDC space: -1 to +1 for X and Y. We output a triangle's coordinates in a new vertex shader and save it to src/glsl/shader.vert:

#version 450 core

void main() {
  const vec2 positions[] = {
    vec2(-0.5, -0.5),
    vec2(0.5, -0.5),
    vec2(0.0, 0.5),
  };

  const vec2 position = positions[gl_VertexIndex];

  gl_Position = vec4(position, 0.0, 1.0);
}

The fragment shader just outputs white for now, in src/glsl/shader.frag:

#version 450 core

layout (location = 0) out vec4 out_color;

void main() {
  out_color = vec4(1.0);
}

Compile both shaders into assets/:

glslc src/glsl/shader.vert -o assets/shader.vert
glslc src/glsl/shader.frag -o assets/shader.frag

glslc is part of the Vulkan SDK.

Loading SPIR-V

SPIR-V shaders are binary files with a stride/alignment of 4 bytes. As we have seen, the Vulkan API accepts a span of std::uint32_ts, so we need to load it into such a buffer (and not std::vector<std::byte> or other 1-byte equivalents). Add a helper function in app.cpp:

[[nodiscard]] auto to_spir_v(fs::path const& path)
  -> std::vector<std::uint32_t> {
  // open the file at the end, to get the total size.
  auto file = std::ifstream{path, std::ios::binary | std::ios::ate};
  if (!file.is_open()) {
    throw std::runtime_error{
      std::format("Failed to open file: '{}'", path.generic_string())};
  }

  auto const size = file.tellg();
  auto const usize = static_cast<std::uint64_t>(size);
  // file data must be uint32 aligned.
  if (usize % sizeof(std::uint32_t) != 0) {
    throw std::runtime_error{std::format("Invalid SPIR-V size: {}", usize)};
  }

  // seek to the beginning before reading.
  file.seekg({}, std::ios::beg);
  auto ret = std::vector<std::uint32_t>{};
  ret.resize(usize / sizeof(std::uint32_t));
  void* data = ret.data();
  file.read(static_cast<char*>(data), size);
  return ret;
}

Drawing a Triangle

Add a ShaderProgram to App and its create function:

[[nodiscard]] auto asset_path(std::string_view uri) const -> fs::path;

// ...
void create_shader();

// ...
std::optional<ShaderProgram> m_shader{};

Implement and call create_shader() (and asset_path()):

void App::create_shader() {
  auto const vertex_spirv = to_spir_v(asset_path("shader.vert"));
  auto const fragment_spirv = to_spir_v(asset_path("shader.frag"));
  auto const shader_ci = ShaderProgram::CreateInfo{
    .device = *m_device,
    .vertex_spirv = vertex_spirv,
    .fragment_spirv = fragment_spirv,
    .vertex_input = {},
    .set_layouts = {},
  };
  m_shader.emplace(shader_ci);
}

auto App::asset_path(std::string_view const uri) const -> fs::path {
  return m_assets_dir / uri;
}

Before render() grows to an unwieldy size, extract the higher level logic into two member functions:

// ImGui code goes here.
void inspect();
// Issue draw calls here.
void draw(vk::CommandBuffer command_buffer) const;

// ...
void App::inspect() {
  ImGui::ShowDemoWindow();
  // TODO
}

// ...
command_buffer.beginRendering(rendering_info);
inspect();
draw(command_buffer);
command_buffer.endRendering();

We can now bind the shader and use it to draw the triangle whose vertices are hard-coded in the vertex shader. Making draw() const forces us to ensure no App state is changed:

void App::draw(vk::CommandBuffer const command_buffer) const {
  m_shader->bind(command_buffer, m_framebuffer_size);
  // current shader has hard-coded logic for 3 vertices.
  command_buffer.draw(3, 1, 0, 0);
}

White Triangle

Update the shaders to use an interpolated RGB color per vertex:

// shader.vert

layout (location = 0) out vec3 out_color;

// ...
const vec3 colors[] = {
  vec3(1.0, 0.0, 0.0),
  vec3(0.0, 1.0, 0.0),
  vec3(0.0, 0.0, 1.0),
};

// ...
out_color = colors[gl_VertexIndex];

// shader.frag

layout (location = 0) in vec3 in_color;

// ...
out_color = vec4(in_color, 1.0);

Make sure to recompile both the SPIR-V shaders in assets/.

Also set a black clear color:

// ...
.setClearValue(vk::ClearColorValue{0.0f, 0.0f, 0.0f, 1.0f});

Gives us the renowned Vulkan sRGB triangle:

sRGB Triangle

Modifying Dynamic State

We can use an ImGui window to inspect / tweak some pipeline state:

ImGui::SetNextWindowSize({200.0f, 100.0f}, ImGuiCond_Once);
if (ImGui::Begin("Inspect")) {
  if (ImGui::Checkbox("wireframe", &m_wireframe)) {
    m_shader->polygon_mode =
      m_wireframe ? vk::PolygonMode::eLine : vk::PolygonMode::eFill;
  }
  if (m_wireframe) {
    auto const& line_width_range =
      m_gpu.properties.limits.lineWidthRange;
    ImGui::SetNextItemWidth(100.0f);
    ImGui::DragFloat("line width", &m_shader->line_width, 0.25f,
                     line_width_range[0], line_width_range[1]);
  }
}
ImGui::End();

sRGB Triangle (wireframe)

Graphics Pipelines

This page describes the usage of Graphics Pipelines instead of Shader Objects. While the guide assumes Shader Object usage, not much should change in the rest of the code if you instead choose to use Graphics Pipelines. A notable exception is the setup of Descriptor Set Layouts: with pipelines it needs to be specified as part of the Pipeline Layout, whereas with Shader Objects it is part of each ShaderEXT's CreateInfo.

Pipeline State

Most dynamic state with Shader Objects is static with pipelines: specified at pipeline creation time. Pipelines also require additional parameters, like attachment formats and sample count: these will be considered constant and stored in the builder later. Expose a subset of dynamic states through a struct:

// bit flags for various binary Pipeline States.
struct PipelineFlag {
  enum : std::uint8_t {
    None = 0,
    AlphaBlend = 1 << 0, // turn on alpha blending.
    DepthTest = 1 << 1,  // turn on depth write and test.
  };
};

// specification of a unique Graphics Pipeline.
struct PipelineState {
  using Flag = PipelineFlag;

  [[nodiscard]] static constexpr auto default_flags() -> std::uint8_t {
    return Flag::AlphaBlend | Flag::DepthTest;
  }

  vk::ShaderModule vertex_shader;   // required.
  vk::ShaderModule fragment_shader; // required.

  std::span<vk::VertexInputAttributeDescription const> vertex_attributes{};
  std::span<vk::VertexInputBindingDescription const> vertex_bindings{};

  vk::PrimitiveTopology topology{vk::PrimitiveTopology::eTriangleList};
  vk::PolygonMode polygon_mode{vk::PolygonMode::eFill};
  vk::CullModeFlags cull_mode{vk::CullModeFlagBits::eNone};
  vk::CompareOp depth_compare{vk::CompareOp::eLess};
  std::uint8_t flags{default_flags()};
};

Encapsulate building pipelines into a class:

struct PipelineBuilderCreateInfo {
  vk::Device device{};
  vk::SampleCountFlagBits samples{};
  vk::Format color_format{};
  vk::Format depth_format{};
};

class PipelineBuilder {
  public:
  using CreateInfo = PipelineBuilderCreateInfo;

  explicit PipelineBuilder(CreateInfo const& create_info)
    : m_info(create_info) {}

  [[nodiscard]] auto build(vk::PipelineLayout layout,
                           PipelineState const& state) const
    -> vk::UniquePipeline;

  private:
  CreateInfo m_info{};
};

The implementation is quite verbose; splitting it into multiple functions helps a bit:

// single viewport and scissor.
constexpr auto viewport_state_v =
  vk::PipelineViewportStateCreateInfo({}, 1, {}, 1);

// these dynamic states are guaranteed to be available.
constexpr auto dynamic_states_v = std::array{
  vk::DynamicState::eViewport,
  vk::DynamicState::eScissor,
  vk::DynamicState::eLineWidth,
};

[[nodiscard]] auto create_shader_stages(vk::ShaderModule const vertex,
                                        vk::ShaderModule const fragment) {
  // set vertex (0) and fragment (1) shader stages.
  auto ret = std::array<vk::PipelineShaderStageCreateInfo, 2>{};
  ret[0]
    .setStage(vk::ShaderStageFlagBits::eVertex)
    .setPName("main")
    .setModule(vertex);
  ret[1]
    .setStage(vk::ShaderStageFlagBits::eFragment)
    .setPName("main")
    .setModule(fragment);
  return ret;
}

[[nodiscard]] constexpr auto
create_depth_stencil_state(std::uint8_t flags,
                           vk::CompareOp const depth_compare) {
  auto ret = vk::PipelineDepthStencilStateCreateInfo{};
  auto const depth_test =
    (flags & PipelineFlag::DepthTest) == PipelineFlag::DepthTest;
  ret.setDepthTestEnable(depth_test ? vk::True : vk::False)
    .setDepthCompareOp(depth_compare);
  return ret;
}

[[nodiscard]] constexpr auto
create_color_blend_attachment(std::uint8_t const flags) {
  auto ret = vk::PipelineColorBlendAttachmentState{};
  auto const alpha_blend =
    (flags & PipelineFlag::AlphaBlend) == PipelineFlag::AlphaBlend;
  using CCF = vk::ColorComponentFlagBits;
  ret.setColorWriteMask(CCF::eR | CCF::eG | CCF::eB | CCF::eA)
    .setBlendEnable(alpha_blend ? vk::True : vk::False)
    // standard alpha blending:
    // (alpha * src) + (1 - alpha) * dst
    .setSrcColorBlendFactor(vk::BlendFactor::eSrcAlpha)
    .setDstColorBlendFactor(vk::BlendFactor::eOneMinusSrcAlpha)
    .setColorBlendOp(vk::BlendOp::eAdd)
    .setSrcAlphaBlendFactor(vk::BlendFactor::eOne)
    .setDstAlphaBlendFactor(vk::BlendFactor::eZero)
    .setAlphaBlendOp(vk::BlendOp::eAdd);
  return ret;
}

// ...
auto PipelineBuilder::build(vk::PipelineLayout const layout,
                            PipelineState const& state) const
  -> vk::UniquePipeline {
  auto const shader_stage_ci =
    create_shader_stages(state.vertex_shader, state.fragment_shader);

  auto vertex_input_ci = vk::PipelineVertexInputStateCreateInfo{};
  vertex_input_ci.setVertexAttributeDescriptions(state.vertex_attributes)
    .setVertexBindingDescriptions(state.vertex_bindings);

  auto multisample_state_ci = vk::PipelineMultisampleStateCreateInfo{};
  multisample_state_ci.setRasterizationSamples(m_info.samples)
    .setSampleShadingEnable(vk::False);

  auto const input_assembly_ci =
    vk::PipelineInputAssemblyStateCreateInfo{{}, state.topology};

  auto rasterization_state_ci = vk::PipelineRasterizationStateCreateInfo{};
  rasterization_state_ci.setPolygonMode(state.polygon_mode)
    .setCullMode(state.cull_mode);

  auto const depth_stencil_state_ci =
    create_depth_stencil_state(state.flags, state.depth_compare);

  auto const color_blend_attachment =
    create_color_blend_attachment(state.flags);
  auto color_blend_state_ci = vk::PipelineColorBlendStateCreateInfo{};
  color_blend_state_ci.setAttachments(color_blend_attachment);

  auto dynamic_state_ci = vk::PipelineDynamicStateCreateInfo{};
  dynamic_state_ci.setDynamicStates(dynamic_states_v);

  // Dynamic Rendering requires passing this in the pNext chain.
  auto rendering_ci = vk::PipelineRenderingCreateInfo{};
  // could be a depth-only pass, argument is span-like (notice the plural
  // `Formats()`), only set if not Undefined.
  if (m_info.color_format != vk::Format::eUndefined) {
    rendering_ci.setColorAttachmentFormats(m_info.color_format);
  }
  // single depth attachment format, ok to set to Undefined.
  rendering_ci.setDepthAttachmentFormat(m_info.depth_format);

  auto pipeline_ci = vk::GraphicsPipelineCreateInfo{};
  pipeline_ci.setLayout(layout)
    .setStages(shader_stage_ci)
    .setPVertexInputState(&vertex_input_ci)
    .setPViewportState(&viewport_state_v)
    .setPMultisampleState(&multisample_state_ci)
    .setPInputAssemblyState(&input_assembly_ci)
    .setPRasterizationState(&rasterization_state_ci)
    .setPDepthStencilState(&depth_stencil_state_ci)
    .setPColorBlendState(&color_blend_state_ci)
    .setPDynamicState(&dynamic_state_ci)
    .setPNext(&rendering_ci);

  auto ret = vk::Pipeline{};
  // use non-throwing API.
  if (m_info.device.createGraphicsPipelines({}, 1, &pipeline_ci, {}, &ret) !=
    vk::Result::eSuccess) {
    std::println(stderr, "[lvk] Failed to create Graphics Pipeline");
    return {};
  }

  return vk::UniquePipeline{ret, m_info.device};
}

App will need to store a builder, a Pipeline Layout, and the Pipeline(s):

std::optional<PipelineBuilder> m_pipeline_builder{};
vk::UniquePipelineLayout m_pipeline_layout{};
vk::UniquePipeline m_pipeline{};

// ...
void create_pipeline() {
  auto const vertex_spirv = to_spir_v(asset_path("shader.vert"));
  auto const fragment_spirv = to_spir_v(asset_path("shader.frag"));
  if (vertex_spirv.empty() || fragment_spirv.empty()) {
    throw std::runtime_error{"Failed to load shaders"};
  }

  auto pipeline_layout_ci = vk::PipelineLayoutCreateInfo{};
  pipeline_layout_ci.setSetLayouts({});
  m_pipeline_layout =
    m_device->createPipelineLayoutUnique(pipeline_layout_ci);

  auto const pipeline_builder_ci = PipelineBuilder::CreateInfo{
    .device = *m_device,
    .samples = vk::SampleCountFlagBits::e1,
    .color_format = m_swapchain->get_format(),
  };
  m_pipeline_builder.emplace(pipeline_builder_ci);

  auto vertex_ci = vk::ShaderModuleCreateInfo{};
  vertex_ci.setCode(vertex_spirv);
  auto fragment_ci = vk::ShaderModuleCreateInfo{};
  fragment_ci.setCode(fragment_spirv);

  auto const vertex_shader =
    m_device->createShaderModuleUnique(vertex_ci);
  auto const fragment_shader =
    m_device->createShaderModuleUnique(fragment_ci);
  auto const pipeline_state = PipelineState{
    .vertex_shader = *vertex_shader,
    .fragment_shader = *fragment_shader,
  };
  m_pipeline =
    m_pipeline_builder->build(*m_pipeline_layout, pipeline_state);
}

Finally, App::draw():

void draw(vk::CommandBuffer const command_buffer) const {
  command_buffer.bindPipeline(vk::PipelineBindPoint::eGraphics,
                              *m_pipeline);
  auto viewport = vk::Viewport{};
  viewport.setX(0.0f)
    .setY(static_cast<float>(m_render_target->extent.height))
    .setWidth(static_cast<float>(m_render_target->extent.width))
    .setHeight(-viewport.y);
  command_buffer.setViewport(0, viewport);
  command_buffer.setScissor(0, vk::Rect2D{{}, m_render_target->extent});
  command_buffer.draw(3, 1, 0, 0);
}

Memory Allocation

Since Vulkan is an explicit API, allocating memory that the device can use is the application's responsibility. The specifics can get quite complicated, but as recommended by the spec, we shall simply defer all that to a library: Vulkan Memory Allocator (VMA).

Vulkan exposes two kinds of objects that use such allocated memory: Buffers and Images. VMA offers transparent support for both: we just have to allocate/free buffers and images through VMA instead of the device directly. Unlike memory allocation / object construction on the CPU, there are many more parameters to provide (than, say, alignment and size) for the creation of buffers and images. As you might have guessed, we shall constrain ourselves to a subset that's relevant for shader resources: vertex buffers, uniform/storage buffers, and texture images.

Vulkan Memory Allocator

VMA has full CMake support, but it is also a single-header library that requires users to "instantiate" it in a single translation unit. To isolate that (and minimize warning pollution etc), we create our own wrapper-library target vma::vma that compiles this source file:

// vk_mem_alloc.cpp
#define VMA_IMPLEMENTATION

#include <vk_mem_alloc.h>

Unlike VulkanHPP, VMA's interface is C only, thus we shall use our Scoped class template to wrap objects in RAII types. The first thing we need is a VmaAllocator, which is similar to a vk::Device or GLFWwindow*:

// vma.hpp
namespace lvk::vma {
struct Deleter {
  void operator()(VmaAllocator allocator) const noexcept;
};

using Allocator = Scoped<VmaAllocator, Deleter>;

[[nodiscard]] auto create_allocator(vk::Instance instance,
                                    vk::PhysicalDevice physical_device,
                                    vk::Device device) -> Allocator;
} // namespace lvk::vma

// vma.cpp
void Deleter::operator()(VmaAllocator allocator) const noexcept {
  vmaDestroyAllocator(allocator);
}

// ...
auto vma::create_allocator(vk::Instance const instance,
                           vk::PhysicalDevice const physical_device,
                           vk::Device const device) -> Allocator {
  auto const& dispatcher = VULKAN_HPP_DEFAULT_DISPATCHER;
  // need to zero initialize C structs, unlike VulkanHPP.
  auto vma_vk_funcs = VmaVulkanFunctions{};
  vma_vk_funcs.vkGetInstanceProcAddr = dispatcher.vkGetInstanceProcAddr;
  vma_vk_funcs.vkGetDeviceProcAddr = dispatcher.vkGetDeviceProcAddr;

  auto allocator_ci = VmaAllocatorCreateInfo{};
  allocator_ci.physicalDevice = physical_device;
  allocator_ci.device = device;
  allocator_ci.pVulkanFunctions = &vma_vk_funcs;
  allocator_ci.instance = instance;
  VmaAllocator ret{};
  auto const result = vmaCreateAllocator(&allocator_ci, &ret);
  if (result == VK_SUCCESS) { return ret; }

  throw std::runtime_error{"Failed to create Vulkan Memory Allocator"};
}

App stores and creates a vma::Allocator object:

// ...
vma::Allocator m_allocator{}; // anywhere between m_device and m_shader.

// ...
void App::create_allocator() {
  m_allocator = vma::create_allocator(*m_instance, m_gpu.device, *m_device);
}

Buffers

First add the RAII wrapper components for VMA buffers:

struct RawBuffer {
  [[nodiscard]] auto mapped_span() const -> std::span<std::byte> {
    return std::span{static_cast<std::byte*>(mapped), size};
  }

  auto operator==(RawBuffer const& rhs) const -> bool = default;

  VmaAllocator allocator{};
  VmaAllocation allocation{};
  vk::Buffer buffer{};
  vk::DeviceSize size{};
  void* mapped{};
};

struct BufferDeleter {
  void operator()(RawBuffer const& raw_buffer) const noexcept;
};

// ...
void BufferDeleter::operator()(RawBuffer const& raw_buffer) const noexcept {
  vmaDestroyBuffer(raw_buffer.allocator, raw_buffer.buffer,
                   raw_buffer.allocation);
}

Buffers can be backed by host (RAM) or device (VRAM) memory: the former is mappable and thus useful for data that changes every frame, while the latter is faster for the GPU to access but requires more complex machinery to copy data into. Add the related types and a create function:

struct BufferCreateInfo {
  VmaAllocator allocator;
  vk::BufferUsageFlags usage;
  std::uint32_t queue_family;
};

enum class BufferMemoryType : std::int8_t { Host, Device };

[[nodiscard]] auto create_buffer(BufferCreateInfo const& create_info,
                                 BufferMemoryType memory_type,
                                 vk::DeviceSize size) -> Buffer;

// ...
auto vma::create_buffer(BufferCreateInfo const& create_info,
                        BufferMemoryType const memory_type,
                        vk::DeviceSize const size) -> Buffer {
  if (size == 0) {
    std::println(stderr, "Buffer cannot be 0-sized");
    return {};
  }

  auto allocation_ci = VmaAllocationCreateInfo{};
  allocation_ci.flags =
    VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT;
  auto usage = create_info.usage;
  if (memory_type == BufferMemoryType::Device) {
    allocation_ci.usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE;
    // device buffers need to support TransferDst.
    usage |= vk::BufferUsageFlagBits::eTransferDst;
  } else {
    allocation_ci.usage = VMA_MEMORY_USAGE_AUTO_PREFER_HOST;
    // host buffers can provide mapped memory.
    allocation_ci.flags |= VMA_ALLOCATION_CREATE_MAPPED_BIT;
  }

  auto buffer_ci = vk::BufferCreateInfo{};
  buffer_ci.setQueueFamilyIndices(create_info.queue_family)
    .setSize(size)
    .setUsage(usage);
  auto vma_buffer_ci = static_cast<VkBufferCreateInfo>(buffer_ci);

  VmaAllocation allocation{};
  VkBuffer buffer{};
  auto allocation_info = VmaAllocationInfo{};
  auto const result =
    vmaCreateBuffer(create_info.allocator, &vma_buffer_ci, &allocation_ci,
                    &buffer, &allocation, &allocation_info);
  if (result != VK_SUCCESS) {
    std::println(stderr, "Failed to create VMA Buffer");
    return {};
  }

  return RawBuffer{
    .allocator = create_info.allocator,
    .allocation = allocation,
    .buffer = buffer,
    .size = size,
    .mapped = allocation_info.pMappedData,
  };
}

Vertex Buffer

The goal here is to move the hard-coded vertices in the shader to application code. For the time being we will use an ad-hoc Host vma::Buffer and focus more on the rest of the infrastructure like vertex attributes.

First add a new header, vertex.hpp:

struct Vertex {
  glm::vec2 position{};
  glm::vec3 color{1.0f};
};

// two vertex attributes: position at 0, color at 1.
constexpr auto vertex_attributes_v = std::array{
  // the format matches the type and layout of data: vec2 => 2x 32-bit floats.
  vk::VertexInputAttributeDescription2EXT{0, 0, vk::Format::eR32G32Sfloat,
                                          offsetof(Vertex, position)},
  // vec3 => 3x 32-bit floats.
  vk::VertexInputAttributeDescription2EXT{1, 0, vk::Format::eR32G32B32Sfloat,
                                          offsetof(Vertex, color)},
};

// one vertex binding at binding 0.
constexpr auto vertex_bindings_v = std::array{
  // we are using interleaved data with a stride of sizeof(Vertex).
  vk::VertexInputBindingDescription2EXT{0, sizeof(Vertex),
                                        vk::VertexInputRate::eVertex, 1},
};

Add the vertex attributes and bindings to the Shader Create Info:

// ...
static constexpr auto vertex_input_v = ShaderVertexInput{
  .attributes = vertex_attributes_v,
  .bindings = vertex_bindings_v,
};
auto const shader_ci = ShaderProgram::CreateInfo{
  .device = *m_device,
  .vertex_spirv = vertex_spirv,
  .fragment_spirv = fragment_spirv,
  .vertex_input = vertex_input_v,
  .set_layouts = {},
};
// ...

With the vertex input defined, we can update the vertex shader and recompile it:

#version 450 core

layout (location = 0) in vec2 a_pos;
layout (location = 1) in vec3 a_color;

layout (location = 0) out vec3 out_color;

void main() {
  const vec2 position = a_pos;

  out_color = a_color;
  gl_Position = vec4(position, 0.0, 1.0);
}

Add a VBO (Vertex Buffer Object) member and create it:

void App::create_vertex_buffer() {
  // vertices moved from the shader.
  static constexpr auto vertices_v = std::array{
    Vertex{.position = {-0.5f, -0.5f}, .color = {1.0f, 0.0f, 0.0f}},
    Vertex{.position = {0.5f, -0.5f}, .color = {0.0f, 1.0f, 0.0f}},
    Vertex{.position = {0.0f, 0.5f}, .color = {0.0f, 0.0f, 1.0f}},
  };

  // we want to write vertices_v to a Host VertexBuffer.
  auto const buffer_ci = vma::BufferCreateInfo{
    .allocator = m_allocator.get(),
    .usage = vk::BufferUsageFlagBits::eVertexBuffer,
    .queue_family = m_gpu.queue_family,
  };
  m_vbo = vma::create_buffer(buffer_ci, vma::BufferMemoryType::Host,
                             sizeof(vertices_v));

  // host buffers have a memory-mapped pointer available to memcpy data to.
  std::memcpy(m_vbo.get().mapped, vertices_v.data(), sizeof(vertices_v));
}

Bind the VBO before recording the draw call:

// single VBO at binding 0 at no offset.
command_buffer.bindVertexBuffers(0, m_vbo.get().buffer, vk::DeviceSize{});
// m_vbo has 3 vertices.
command_buffer.draw(3, 1, 0, 0);

You should see the same triangle as before. But now we can use whatever set of vertices we like! The Primitive Topology is Triangle List by default, so every three vertices in the array are drawn as a triangle, eg for 9 vertices: [[0, 1, 2], [3, 4, 5], [6, 7, 8]], where each inner [] represents a triangle made up of the vertices at those indices. Try playing around with customized vertices and topologies, and use RenderDoc to debug unexpected outputs / bugs.

Host Vertex Buffers are useful for primitives that are temporary and/or frequently changing, such as UI objects. A 2D framework could use such VBOs exclusively: a simple approach is a pool of buffers per virtual frame, where for each draw a buffer is obtained from the current virtual frame's pool and the vertices are copied in.

Command Block

Long-lived vertex buffers perform better when backed by Device memory, especially for 3D meshes. Data is transferred to device buffers in two steps:

  1. Allocate a host buffer and copy the data to its mapped memory
  2. Allocate a device buffer, record a Buffer Copy operation and submit it

The second step requires a command buffer and queue submission (and waiting for the submitted work to complete). Encapsulate this behavior into a class, it will also be used for creating images:

class CommandBlock {
 public:
  explicit CommandBlock(vk::Device device, vk::Queue queue,
                        vk::CommandPool command_pool);

  [[nodiscard]] auto command_buffer() const -> vk::CommandBuffer {
    return *m_command_buffer;
  }

  void submit_and_wait();

 private:
  vk::Device m_device{};
  vk::Queue m_queue{};
  vk::UniqueCommandBuffer m_command_buffer{};
};

The constructor takes an existing command pool created for such ad-hoc allocations, and the queue for submission later. This way it can be passed around after creation and used by other code.

CommandBlock::CommandBlock(vk::Device const device, vk::Queue const queue,
                           vk::CommandPool const command_pool)
  : m_device(device), m_queue(queue) {
  // allocate a UniqueCommandBuffer which will free the underlying command
  // buffer from its owning pool on destruction.
  auto allocate_info = vk::CommandBufferAllocateInfo{};
  allocate_info.setCommandPool(command_pool)
    .setCommandBufferCount(1)
    .setLevel(vk::CommandBufferLevel::ePrimary);
  // all the current VulkanHPP functions for UniqueCommandBuffer allocation
  // return vectors.
  auto command_buffers = m_device.allocateCommandBuffersUnique(allocate_info);
  m_command_buffer = std::move(command_buffers.front());

  // start recording commands before returning.
  auto begin_info = vk::CommandBufferBeginInfo{};
  begin_info.setFlags(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
  m_command_buffer->begin(begin_info);
}

submit_and_wait() resets the unique command buffer at the end, to free it from its command pool:

void CommandBlock::submit_and_wait() {
  if (!m_command_buffer) { return; }

  // end recording and submit.
  m_command_buffer->end();
  auto submit_info = vk::SubmitInfo2KHR{};
  auto const command_buffer_info =
    vk::CommandBufferSubmitInfo{*m_command_buffer};
  submit_info.setCommandBufferInfos(command_buffer_info);
  auto fence = m_device.createFenceUnique({});
  m_queue.submit2(submit_info, *fence);

  // wait for submit fence to be signaled.
  static constexpr auto timeout_v =
    static_cast<std::uint64_t>(std::chrono::nanoseconds(30s).count());
  auto const result = m_device.waitForFences(*fence, vk::True, timeout_v);
  if (result != vk::Result::eSuccess) {
    std::println(stderr, "Failed to submit Command Buffer");
  }
  // free the command buffer.
  m_command_buffer.reset();
}

Multithreading considerations

Instead of blocking the main thread on every Command Block's submit_and_wait(), you might be wondering if command block usage could be multithreaded. The answer is yes, but with some extra work: each thread requires its own command pool - just using one owned (unique) pool per Command Block (with no need to free the buffer) is a good starting point. All queue operations need to be synchronized, ie made critical sections protected by a mutex. This includes Swapchain acquire/present calls and Queue submissions. A Queue class (a value type that stores a copy of the vk::Queue and a pointer/reference to its std::mutex, and wraps the submit call) can be passed to command blocks. Just this much will enable asynchronous asset loading etc: each loading thread uses its own command pool, and queue submissions all around are critical sections. VmaAllocator is internally synchronized (this can be disabled at build time), so performing allocations through the same allocator on multiple threads is safe.

For multi-threaded rendering, use a Secondary command buffer per thread to record rendering commands, accumulate and execute them in the main (Primary) command buffer currently in RenderSync. This is not particularly helpful unless you have thousands of expensive draw calls and dozens of render passes, as recording even a hundred draws will likely be faster on a single thread.

Device Buffers

This guide will only use device buffers for vertex buffers, where both vertex and index data will be strung together in a single VBO. The create function can thus take the data and perform the buffer copy operation before returning. In essence this return value is a "GPU const" buffer. To enable utilizing separate spans for vertices and indices (instead of forcing allocation of a contiguous bytestream and copying the data), the create function takes a slightly awkward span of spans:

// disparate byte spans.
using ByteSpans = std::span<std::span<std::byte const> const>;

// returns a Device Buffer with each byte span sequentially written.
[[nodiscard]] auto create_device_buffer(BufferCreateInfo const& create_info,
                                        CommandBlock command_block,
                                        ByteSpans const& byte_spans) -> Buffer;

Implement create_device_buffer():

auto vma::create_device_buffer(BufferCreateInfo const& create_info,
                               CommandBlock command_block,
                               ByteSpans const& byte_spans) -> Buffer {
  auto const total_size = std::accumulate(
    byte_spans.begin(), byte_spans.end(), 0uz,
    [](std::size_t const n, std::span<std::byte const> bytes) {
      return n + bytes.size();
    });

  auto staging_ci = create_info;
  staging_ci.usage = vk::BufferUsageFlagBits::eTransferSrc;

  // create staging Host Buffer with TransferSrc usage.
  auto staging_buffer =
    create_buffer(staging_ci, BufferMemoryType::Host, total_size);
  // create the Device Buffer.
  auto ret = create_buffer(create_info, BufferMemoryType::Device, total_size);
  // can't do anything if either buffer creation failed.
  if (!staging_buffer.get().buffer || !ret.get().buffer) { return {}; }

  // copy byte spans into staging buffer.
  auto dst = staging_buffer.get().mapped_span();
  for (auto const bytes : byte_spans) {
    std::memcpy(dst.data(), bytes.data(), bytes.size());
    dst = dst.subspan(bytes.size());
  }

  // record buffer copy operation.
  auto buffer_copy = vk::BufferCopy2{};
  buffer_copy.setSize(total_size);
  auto copy_buffer_info = vk::CopyBufferInfo2{};
  copy_buffer_info.setSrcBuffer(staging_buffer.get().buffer)
    .setDstBuffer(ret.get().buffer)
    .setRegions(buffer_copy);
  command_block.command_buffer().copyBuffer2(copy_buffer_info);

  // submit and wait.
  // waiting here is necessary to keep the staging buffer alive while the GPU
  // accesses it through the recorded commands.
  // this is also why the function takes ownership of the passed CommandBlock
  // instead of just referencing it / taking a vk::CommandBuffer.
  command_block.submit_and_wait();

  return ret;
}

Add a command block pool to App, and a helper function to create command blocks:

void App::create_cmd_block_pool() {
  auto command_pool_ci = vk::CommandPoolCreateInfo{};
  command_pool_ci
    .setQueueFamilyIndex(m_gpu.queue_family)
    // this flag indicates that the allocated Command Buffers will be
    // short-lived.
    .setFlags(vk::CommandPoolCreateFlagBits::eTransient);
  m_cmd_block_pool = m_device->createCommandPoolUnique(command_pool_ci);
}

auto App::create_command_block() const -> CommandBlock {
  return CommandBlock{*m_device, m_queue, *m_cmd_block_pool};
}

Update create_vertex_buffer() to create a quad with indices:

template <typename T>
[[nodiscard]] constexpr auto to_byte_array(T const& t) {
  return std::bit_cast<std::array<std::byte, sizeof(T)>>(t);
}

// ...
void App::create_vertex_buffer() {
  // vertices of a quad.
  static constexpr auto vertices_v = std::array{
    Vertex{.position = {-0.5f, -0.5f}, .color = {1.0f, 0.0f, 0.0f}},
    Vertex{.position = {0.5f, -0.5f}, .color = {0.0f, 1.0f, 0.0f}},
    Vertex{.position = {0.5f, 0.5f}, .color = {0.0f, 0.0f, 1.0f}},
    Vertex{.position = {-0.5f, 0.5f}, .color = {1.0f, 1.0f, 0.0f}},
  };
  static constexpr auto indices_v = std::array{
    0u, 1u, 2u, 2u, 3u, 0u,
  };
  static constexpr auto vertices_bytes_v = to_byte_array(vertices_v);
  static constexpr auto indices_bytes_v = to_byte_array(indices_v);
  static constexpr auto total_bytes_v =
    std::array<std::span<std::byte const>, 2>{
      vertices_bytes_v,
      indices_bytes_v,
    };
  // we want to write total_bytes_v to a Device VertexBuffer | IndexBuffer.
  m_vbo = vma::create_device_buffer(m_allocator.get(),
                                    vk::BufferUsageFlagBits::eVertexBuffer |
                                      vk::BufferUsageFlagBits::eIndexBuffer,
                                    create_command_block(), total_bytes_v);
}

Update draw():

void App::draw(vk::CommandBuffer const command_buffer) const {
  m_shader->bind(command_buffer, m_framebuffer_size);
  // single VBO at binding 0 at no offset.
  command_buffer.bindVertexBuffers(0, m_vbo.get().buffer, vk::DeviceSize{});
  // u32 indices after offset of 4 vertices.
  command_buffer.bindIndexBuffer(m_vbo.get().buffer, 4 * sizeof(Vertex),
                                 vk::IndexType::eUint32);
  // m_vbo has 6 indices.
  command_buffer.drawIndexed(6, 1, 0, 0, 0);
}
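The 4 * sizeof(Vertex) offset passed to bindIndexBuffer() mirrors how create_device_buffer() packed the byte spans: index bytes begin right after the vertex payload. A host-side sketch of that packing arithmetic (the Vertex here is a stand-in, with float arrays replacing the glm vectors):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// stand-ins for glm::vec2 / glm::vec3 so this compiles without glm.
struct Vertex {
  std::array<float, 2> position{};
  std::array<float, 3> color{};
};

// the VBO is [vertices..., indices...], so indices begin right after the
// vertex payload: this must match the offset passed to bindIndexBuffer().
constexpr auto index_offset(std::size_t const vertex_count) -> std::size_t {
  return vertex_count * sizeof(Vertex);
}

// total bytes create_device_buffer() would accumulate over both spans.
constexpr auto total_size(std::size_t const vertex_count,
                          std::size_t const index_count) -> std::size_t {
  return index_offset(vertex_count) + index_count * sizeof(std::uint32_t);
}
```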

VBO Quad

Images

Images have a lot more properties and creation parameters than buffers. We shall constrain ourselves to just two kinds: sampled images (textures) for shaders, and depth images for rendering. For now add the foundation types and functions:

struct RawImage {
  auto operator==(RawImage const& rhs) const -> bool = default;

  VmaAllocator allocator{};
  VmaAllocation allocation{};
  vk::Image image{};
  vk::Extent2D extent{};
  vk::Format format{};
  std::uint32_t levels{};
};

struct ImageDeleter {
  void operator()(RawImage const& raw_image) const noexcept;
};

using Image = Scoped<RawImage, ImageDeleter>;

struct ImageCreateInfo {
  VmaAllocator allocator;
  std::uint32_t queue_family;
};

[[nodiscard]] auto create_image(ImageCreateInfo const& create_info,
                                vk::ImageUsageFlags usage, std::uint32_t levels,
                                vk::Format format, vk::Extent2D extent)
  -> Image;

Implementation:

void ImageDeleter::operator()(RawImage const& raw_image) const noexcept {
  vmaDestroyImage(raw_image.allocator, raw_image.image, raw_image.allocation);
}

// ...
auto vma::create_image(ImageCreateInfo const& create_info,
                       vk::ImageUsageFlags const usage,
                       std::uint32_t const levels, vk::Format const format,
                       vk::Extent2D const extent) -> Image {
  if (extent.width == 0 || extent.height == 0) {
    std::println(stderr, "Images cannot have 0 width or height");
    return {};
  }
  auto image_ci = vk::ImageCreateInfo{};
  image_ci.setImageType(vk::ImageType::e2D)
    .setExtent({extent.width, extent.height, 1})
    .setFormat(format)
    .setUsage(usage)
    .setArrayLayers(1)
    .setMipLevels(levels)
    .setSamples(vk::SampleCountFlagBits::e1)
    .setTiling(vk::ImageTiling::eOptimal)
    .setInitialLayout(vk::ImageLayout::eUndefined)
    .setQueueFamilyIndices(create_info.queue_family);
  auto const vk_image_ci = static_cast<VkImageCreateInfo>(image_ci);

  auto allocation_ci = VmaAllocationCreateInfo{};
  allocation_ci.usage = VMA_MEMORY_USAGE_AUTO;
  VkImage image{};
  VmaAllocation allocation{};
  auto const result = vmaCreateImage(create_info.allocator, &vk_image_ci,
                                     &allocation_ci, &image, &allocation, {});
  if (result != VK_SUCCESS) {
    std::println(stderr, "Failed to create VMA Image");
    return {};
  }

  return RawImage{
    .allocator = create_info.allocator,
    .allocation = allocation,
    .image = image,
    .extent = extent,
    .format = format,
    .levels = levels,
  };
}

For creating sampled images, we need both the image bytes and size (extent). Wrap that into a struct:

struct Bitmap {
  std::span<std::byte const> bytes{};
  glm::ivec2 size{};
};

The creation process is similar to that of device buffers in that it requires a staging copy, but it also needs image layout transitions. In short:

  1. Create the image and staging buffer
  2. Transition the layout from Undefined to TransferDst
  3. Record a buffer image copy operation
  4. Transition the layout from TransferDst to ShaderReadOnlyOptimal

Implement create_sampled_image():
auto vma::create_sampled_image(ImageCreateInfo const& create_info,
                               CommandBlock command_block, Bitmap const& bitmap)
  -> Image {
  // create image.
  // no mip-mapping right now: 1 level.
  auto const mip_levels = 1u;
  auto const usize = glm::uvec2{bitmap.size};
  auto const extent = vk::Extent2D{usize.x, usize.y};
  auto const usage =
    vk::ImageUsageFlagBits::eTransferDst | vk::ImageUsageFlagBits::eSampled;
  auto ret = create_image(create_info, usage, mip_levels,
                          vk::Format::eR8G8B8A8Srgb, extent);

  // create staging buffer.
  auto const buffer_ci = BufferCreateInfo{
    .allocator = create_info.allocator,
    .usage = vk::BufferUsageFlagBits::eTransferSrc,
    .queue_family = create_info.queue_family,
  };
  auto const staging_buffer = create_buffer(
    buffer_ci, BufferMemoryType::Host, bitmap.bytes.size_bytes());

  // can't do anything if either creation failed.
  if (!ret.get().image || !staging_buffer.get().buffer) { return {}; }

  // copy bytes into staging buffer.
  std::memcpy(staging_buffer.get().mapped, bitmap.bytes.data(),
              bitmap.bytes.size_bytes());

  // transition image for transfer.
  auto dependency_info = vk::DependencyInfo{};
  auto subresource_range = vk::ImageSubresourceRange{};
  subresource_range.setAspectMask(vk::ImageAspectFlagBits::eColor)
    .setLayerCount(1)
    .setLevelCount(mip_levels);
  auto barrier = vk::ImageMemoryBarrier2{};
  barrier.setImage(ret.get().image)
    .setSrcQueueFamilyIndex(create_info.queue_family)
    .setDstQueueFamilyIndex(create_info.queue_family)
    .setOldLayout(vk::ImageLayout::eUndefined)
    .setNewLayout(vk::ImageLayout::eTransferDstOptimal)
    .setSubresourceRange(subresource_range)
    .setSrcStageMask(vk::PipelineStageFlagBits2::eTopOfPipe)
    .setSrcAccessMask(vk::AccessFlagBits2::eNone)
    .setDstStageMask(vk::PipelineStageFlagBits2::eTransfer)
    .setDstAccessMask(vk::AccessFlagBits2::eMemoryRead |
                      vk::AccessFlagBits2::eMemoryWrite);
  dependency_info.setImageMemoryBarriers(barrier);
  command_block.command_buffer().pipelineBarrier2(dependency_info);

  // record buffer image copy.
  auto buffer_image_copy = vk::BufferImageCopy2{};
  auto subresource_layers = vk::ImageSubresourceLayers{};
  // a buffer image copy targets a single mip level (0 by default).
  subresource_layers.setAspectMask(vk::ImageAspectFlagBits::eColor)
    .setLayerCount(1);
  buffer_image_copy.setImageSubresource(subresource_layers)
    .setImageExtent(vk::Extent3D{extent.width, extent.height, 1});
  auto copy_info = vk::CopyBufferToImageInfo2{};
  copy_info.setDstImage(ret.get().image)
    .setDstImageLayout(vk::ImageLayout::eTransferDstOptimal)
    .setSrcBuffer(staging_buffer.get().buffer)
    .setRegions(buffer_image_copy);
  command_block.command_buffer().copyBufferToImage2(copy_info);

  // transition image for sampling.
  barrier.setOldLayout(barrier.newLayout)
    .setNewLayout(vk::ImageLayout::eShaderReadOnlyOptimal)
    .setSrcStageMask(barrier.dstStageMask)
    .setSrcAccessMask(barrier.dstAccessMask)
    .setDstStageMask(vk::PipelineStageFlagBits2::eAllGraphics)
    .setDstAccessMask(vk::AccessFlagBits2::eMemoryRead |
                      vk::AccessFlagBits2::eMemoryWrite);
  dependency_info.setImageMemoryBarriers(barrier);
  command_block.command_buffer().pipelineBarrier2(dependency_info);

  command_block.submit_and_wait();

  return ret;
}

Before such images can be used as textures, we need to set up Descriptor Set infrastructure.

Descriptor Sets

Vulkan Descriptors are essentially typed pointers to resources that shaders can use, eg uniform/storage buffers or combined image samplers (textures with samplers). A Descriptor Set is a collection of descriptors at various bindings that is bound together as an atomic unit. Shaders declare inputs based on these set and binding numbers, and any sets a shader uses must have been updated and bound before drawing. A Descriptor Set Layout describes the bindings within a single set number; a sequence of such layouts usually describes all the sets in a shader. Descriptor Sets are allocated from a Descriptor Pool using the desired set layout(s).

Structuring set layouts and managing descriptor sets are complex topics with many viable approaches, each with their pros and cons. Some robust ones are described in this page. 2D frameworks - and even simple/basic 3D ones - can simply allocate and update sets every frame, as described in the docs as the "simplest approach". Here's an extremely detailed - albeit a bit dated now - post by Arseny on the subject. A more modern approach, namely "bindless" or Descriptor Indexing, is described in the official docs here.

Pipeline Layout

A Vulkan Pipeline Layout represents a sequence of descriptor sets (and push constants) associated with a shader program. Even when using Shader Objects, a Pipeline Layout is needed to utilize descriptor sets.

Starting with the layout of a single descriptor set containing a uniform buffer to set the view/projection matrices in, store a descriptor pool in App and create it before the shader:

vk::UniqueDescriptorPool m_descriptor_pool{};

// ...
void App::create_descriptor_pool() {
  static constexpr auto pool_sizes_v = std::array{
    // 2 uniform buffers, can be more if desired.
    vk::DescriptorPoolSize{vk::DescriptorType::eUniformBuffer, 2},
  };
  auto pool_ci = vk::DescriptorPoolCreateInfo{};
  // allow 16 sets to be allocated from this pool.
  pool_ci.setPoolSizes(pool_sizes_v).setMaxSets(16);
  m_descriptor_pool = m_device->createDescriptorPoolUnique(pool_ci);
}

Add new members to App to store the set layouts and pipeline layout. m_set_layout_views is just a copy of the descriptor set layout handles in a contiguous vector:

std::vector<vk::UniqueDescriptorSetLayout> m_set_layouts{};
std::vector<vk::DescriptorSetLayout> m_set_layout_views{};
vk::UniquePipelineLayout m_pipeline_layout{};

// ...
constexpr auto layout_binding(std::uint32_t binding,
                              vk::DescriptorType const type) {
  return vk::DescriptorSetLayoutBinding{
    binding, type, 1, vk::ShaderStageFlagBits::eAllGraphics};
}

// ...
void App::create_pipeline_layout() {
  static constexpr auto set_0_bindings_v = std::array{
    layout_binding(0, vk::DescriptorType::eUniformBuffer),
  };
  auto set_layout_cis = std::array<vk::DescriptorSetLayoutCreateInfo, 1>{};
  set_layout_cis[0].setBindings(set_0_bindings_v);

  for (auto const& set_layout_ci : set_layout_cis) {
    m_set_layouts.push_back(
      m_device->createDescriptorSetLayoutUnique(set_layout_ci));
    m_set_layout_views.push_back(*m_set_layouts.back());
  }

  auto pipeline_layout_ci = vk::PipelineLayoutCreateInfo{};
  pipeline_layout_ci.setSetLayouts(m_set_layout_views);
  m_pipeline_layout =
    m_device->createPipelineLayoutUnique(pipeline_layout_ci);
}

Add a helper function that allocates one descriptor set per set layout in the sequence:

auto App::allocate_sets() const -> std::vector<vk::DescriptorSet> {
  auto allocate_info = vk::DescriptorSetAllocateInfo{};
  allocate_info.setDescriptorPool(*m_descriptor_pool)
    .setSetLayouts(m_set_layout_views);
  return m_device->allocateDescriptorSets(allocate_info);
}

Store a Buffered copy of descriptor sets for one drawable object:

Buffered<std::vector<vk::DescriptorSet>> m_descriptor_sets{};

// ...

void App::create_descriptor_sets() {
  for (auto& descriptor_sets : m_descriptor_sets) {
    descriptor_sets = allocate_sets();
  }
}

Shader Buffer

Uniform and Storage buffers need to be N-buffered unless they are "GPU const", ie contents do not change after creation. Encapsulate a vma::Buffer per virtual frame in a ShaderBuffer:

class ShaderBuffer {
 public:
  explicit ShaderBuffer(VmaAllocator allocator, std::uint32_t queue_family,
                        vk::BufferUsageFlags usage);

  void write_at(std::size_t frame_index, std::span<std::byte const> bytes);

  [[nodiscard]] auto descriptor_info_at(std::size_t frame_index) const
    -> vk::DescriptorBufferInfo;

 private:
  struct Buffer {
    vma::Buffer buffer{};
    vk::DeviceSize size{};
  };

  void write_to(Buffer& out, std::span<std::byte const> bytes) const;

  VmaAllocator m_allocator{};
  std::uint32_t m_queue_family{};
  vk::BufferUsageFlags m_usage{};
  Buffered<Buffer> m_buffers{};
};

The implementation is fairly straightforward: it reuses existing buffers when they are large enough, otherwise it recreates them before copying data. It also ensures buffers are always valid to be bound to descriptors.

ShaderBuffer::ShaderBuffer(VmaAllocator allocator,
                           std::uint32_t const queue_family,
                           vk::BufferUsageFlags const usage)
  : m_allocator(allocator), m_queue_family(queue_family), m_usage(usage) {
  // ensure buffers are created and can be bound after returning.
  for (auto& buffer : m_buffers) { write_to(buffer, {}); }
}

void ShaderBuffer::write_at(std::size_t const frame_index,
                            std::span<std::byte const> bytes) {
  write_to(m_buffers.at(frame_index), bytes);
}

auto ShaderBuffer::descriptor_info_at(std::size_t const frame_index) const
  -> vk::DescriptorBufferInfo {
  auto const& buffer = m_buffers.at(frame_index);
  auto ret = vk::DescriptorBufferInfo{};
  ret.setBuffer(buffer.buffer.get().buffer).setRange(buffer.size);
  return ret;
}

void ShaderBuffer::write_to(Buffer& out,
                            std::span<std::byte const> bytes) const {
  static constexpr auto blank_byte_v = std::array{std::byte{}};
  // fallback to an empty byte if bytes is empty.
  if (bytes.empty()) { bytes = blank_byte_v; }
  out.size = bytes.size();
  if (out.buffer.get().size < bytes.size()) {
    // size is too small (or buffer doesn't exist yet), recreate buffer.
    auto const buffer_ci = vma::BufferCreateInfo{
      .allocator = m_allocator,
      .usage = m_usage,
      .queue_family = m_queue_family,
    };
    out.buffer = vma::create_buffer(buffer_ci, vma::BufferMemoryType::Host,
                                    out.size);
  }
  std::memcpy(out.buffer.get().mapped, bytes.data(), bytes.size());
}

Store a ShaderBuffer in App and rename create_vertex_buffer() to create_shader_resources():

std::optional<ShaderBuffer> m_view_ubo{};

// ...
m_vbo = vma::create_device_buffer(buffer_ci, create_command_block(),
                                  total_bytes_v);

m_view_ubo.emplace(m_allocator.get(), m_gpu.queue_family,
                   vk::BufferUsageFlagBits::eUniformBuffer);

Add functions to update the view/projection matrices and bind the frame's descriptor sets:

void App::update_view() {
  auto const half_size = 0.5f * glm::vec2{m_framebuffer_size};
  auto const mat_projection =
    glm::ortho(-half_size.x, half_size.x, -half_size.y, half_size.y);
  auto const bytes =
    std::bit_cast<std::array<std::byte, sizeof(mat_projection)>>(
      mat_projection);
  m_view_ubo->write_at(m_frame_index, bytes);
}

// ...
void App::bind_descriptor_sets(vk::CommandBuffer const command_buffer) const {
  auto writes = std::array<vk::WriteDescriptorSet, 1>{};
  auto const& descriptor_sets = m_descriptor_sets.at(m_frame_index);
  auto const set0 = descriptor_sets[0];
  auto write = vk::WriteDescriptorSet{};
  auto const view_ubo_info = m_view_ubo->descriptor_info_at(m_frame_index);
  write.setBufferInfo(view_ubo_info)
    .setDescriptorType(vk::DescriptorType::eUniformBuffer)
    .setDescriptorCount(1)
    .setDstSet(set0)
    .setDstBinding(0);
  writes[0] = write;
  m_device->updateDescriptorSets(writes, {});

  command_buffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics,
                                    *m_pipeline_layout, 0, descriptor_sets,
                                    {});
}

Add the descriptor set layouts to the Shader, call update_view() before draw(), and bind_descriptor_sets() in draw():

auto const shader_ci = ShaderProgram::CreateInfo{
  .device = *m_device,
  .vertex_spirv = vertex_spirv,
  .fragment_spirv = fragment_spirv,
  .vertex_input = vertex_input_v,
  .set_layouts = m_set_layout_views,
};

// ...
inspect();
update_view();
draw(command_buffer);

// ...
m_shader->bind(command_buffer, m_framebuffer_size);
bind_descriptor_sets(command_buffer);
// ...

Update the vertex shader to use the view UBO:

layout (set = 0, binding = 0) uniform View {
  mat4 mat_vp;
};

// ...
void main() {
  const vec4 world_pos = vec4(a_pos, 0.0, 1.0);

  out_color = a_color;
  gl_Position = mat_vp * world_pos;
}

Since the projected space is now the framebuffer size instead of [-1, 1], update the vertex positions to be larger than 1 pixel:

static constexpr auto vertices_v = std::array{
  Vertex{.position = {-200.0f, -200.0f}, .color = {1.0f, 0.0f, 0.0f}},
  Vertex{.position = {200.0f, -200.0f}, .color = {0.0f, 1.0f, 0.0f}},
  Vertex{.position = {200.0f, 200.0f}, .color = {0.0f, 0.0f, 1.0f}},
  Vertex{.position = {-200.0f, 200.0f}, .color = {1.0f, 1.0f, 0.0f}},
};

View UBO

When such shader buffers are created and (more importantly) destroyed dynamically, they would need to store a ScopedWaiter to ensure all rendering with descriptor sets bound to them completes before destruction. Alternatively, the app can maintain a pool of scratch buffers (similar to small/dynamic vertex buffers) per virtual frame which get destroyed in a batch instead of individually.
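The scratch-pool alternative can be sketched as a small per-virtual-frame batch container (a hypothetical type, not part of this guide's codebase). Resources retired during virtual frame N stay alive until index N comes around again; by then the GPU has finished the frame that used them, so no per-buffer fences or ScopedWaiters are needed:

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

template <typename Resource, std::size_t BufferingN>
class ScratchPool {
 public:
  // call at the start of each virtual frame, before recording: destroys the
  // batch retired BufferingN frames ago.
  void next_frame(std::size_t const frame_index) {
    m_batches.at(frame_index).clear();
  }

  // hand over a resource whose GPU usage was recorded this frame.
  void retire(std::size_t const frame_index, Resource resource) {
    m_batches.at(frame_index).push_back(std::move(resource));
  }

  // number of resources awaiting destruction for a given frame index.
  [[nodiscard]] auto pending(std::size_t const frame_index) const
    -> std::size_t {
    return m_batches.at(frame_index).size();
  }

 private:
  std::array<std::vector<Resource>, BufferingN> m_batches{};
};
```

With Resource as vma::Buffer and BufferingN matching the virtual frame count, App would call next_frame() alongside waiting on the render fence.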

Texture

With a large part of the complexity wrapped away in vma, a Texture is just a combination of three things:

  1. Sampled Image
  2. (Unique) Image View of above
  3. (Unique) Sampler

In texture.hpp, create a default sampler:

[[nodiscard]] constexpr auto
create_sampler_ci(vk::SamplerAddressMode const wrap, vk::Filter const filter) {
  auto ret = vk::SamplerCreateInfo{};
  ret.setAddressModeU(wrap)
    .setAddressModeV(wrap)
    .setAddressModeW(wrap)
    .setMinFilter(filter)
    .setMagFilter(filter)
    .setMaxLod(VK_LOD_CLAMP_NONE)
    .setBorderColor(vk::BorderColor::eFloatTransparentBlack)
    .setMipmapMode(vk::SamplerMipmapMode::eNearest);
  return ret;
}

constexpr auto sampler_ci_v = create_sampler_ci(
  vk::SamplerAddressMode::eClampToEdge, vk::Filter::eLinear);

Define the Create Info and Texture types:

struct TextureCreateInfo {
  vk::Device device;
  VmaAllocator allocator;
  std::uint32_t queue_family;
  CommandBlock command_block;
  Bitmap bitmap;

  vk::SamplerCreateInfo sampler{sampler_ci_v};
};

class Texture {
 public:
  using CreateInfo = TextureCreateInfo;

  explicit Texture(CreateInfo create_info);

  [[nodiscard]] auto descriptor_info() const -> vk::DescriptorImageInfo;

 private:
  vma::Image m_image{};
  vk::UniqueImageView m_view{};
  vk::UniqueSampler m_sampler{};
};

Add a fallback bitmap constant, and the implementation:

// 4-channels.
constexpr auto white_pixel_v = std::array{std::byte{0xff}, std::byte{0xff},
                                          std::byte{0xff}, std::byte{0xff}};
// fallback bitmap.
constexpr auto white_bitmap_v = Bitmap{
  .bytes = white_pixel_v,
  .size = {1, 1},
};

// ...
Texture::Texture(CreateInfo create_info) {
  if (create_info.bitmap.bytes.empty() || create_info.bitmap.size.x <= 0 ||
    create_info.bitmap.size.y <= 0) {
    create_info.bitmap = white_bitmap_v;
  }

  auto const image_ci = vma::ImageCreateInfo{
    .allocator = create_info.allocator,
    .queue_family = create_info.queue_family,
  };
  m_image = vma::create_sampled_image(
    image_ci, std::move(create_info.command_block), create_info.bitmap);

  auto image_view_ci = vk::ImageViewCreateInfo{};
  auto subresource_range = vk::ImageSubresourceRange{};
  subresource_range.setAspectMask(vk::ImageAspectFlagBits::eColor)
    .setLayerCount(1)
    .setLevelCount(m_image.get().levels);

  image_view_ci.setImage(m_image.get().image)
    .setViewType(vk::ImageViewType::e2D)
    .setFormat(m_image.get().format)
    .setSubresourceRange(subresource_range);
  m_view = create_info.device.createImageViewUnique(image_view_ci);

  m_sampler = create_info.device.createSamplerUnique(create_info.sampler);
}

auto Texture::descriptor_info() const -> vk::DescriptorImageInfo {
  auto ret = vk::DescriptorImageInfo{};
  ret.setImageView(*m_view)
    .setImageLayout(vk::ImageLayout::eShaderReadOnlyOptimal)
    .setSampler(*m_sampler);
  return ret;
}

To sample textures, Vertex will need a UV coordinate:

struct Vertex {
  glm::vec2 position{};
  glm::vec3 color{1.0f};
  glm::vec2 uv{};
};

// three vertex attributes: position at 0, color at 1, UV at 2.
constexpr auto vertex_attributes_v = std::array{
  // the format matches the type and layout of data: vec2 => 2x 32-bit floats.
  vk::VertexInputAttributeDescription2EXT{0, 0, vk::Format::eR32G32Sfloat,
                                          offsetof(Vertex, position)},
  // vec3 => 3x 32-bit floats.
  vk::VertexInputAttributeDescription2EXT{1, 0, vk::Format::eR32G32B32Sfloat,
                                          offsetof(Vertex, color)},
  // vec2 => 2x 32-bit floats.
  vk::VertexInputAttributeDescription2EXT{2, 0, vk::Format::eR32G32Sfloat,
                                          offsetof(Vertex, uv)},
};

Store a texture in App and create with the other shader resources:

std::optional<Texture> m_texture{};

// ...
using Pixel = std::array<std::byte, 4>;
static constexpr auto rgby_pixels_v = std::array{
  Pixel{std::byte{0xff}, {}, {}, std::byte{0xff}},
  Pixel{std::byte{}, std::byte{0xff}, {}, std::byte{0xff}},
  Pixel{std::byte{}, {}, std::byte{0xff}, std::byte{0xff}},
  Pixel{std::byte{0xff}, std::byte{0xff}, {}, std::byte{0xff}},
};
static constexpr auto rgby_bytes_v =
  std::bit_cast<std::array<std::byte, sizeof(rgby_pixels_v)>>(
    rgby_pixels_v);
static constexpr auto rgby_bitmap_v = Bitmap{
  .bytes = rgby_bytes_v,
  .size = {2, 2},
};
auto texture_ci = Texture::CreateInfo{
  .device = *m_device,
  .allocator = m_allocator.get(),
  .queue_family = m_gpu.queue_family,
  .command_block = create_command_block(),
  .bitmap = rgby_bitmap_v,
};
// use Nearest filtering instead of Linear (interpolation).
texture_ci.sampler.setMagFilter(vk::Filter::eNearest);
m_texture.emplace(std::move(texture_ci));

Update the descriptor pool sizes to also contain Combined Image Samplers:

/// ...
vk::DescriptorPoolSize{vk::DescriptorType::eUniformBuffer, 2},
vk::DescriptorPoolSize{vk::DescriptorType::eCombinedImageSampler, 2},

Set up a new descriptor set (number 1) with a combined image sampler at binding 0. This could be added to binding 1 of set 0 as well, since we are not optimizing binding calls (eg binding set 0 only once for multiple draws):

static constexpr auto set_1_bindings_v = std::array{
  layout_binding(0, vk::DescriptorType::eCombinedImageSampler),
};
auto set_layout_cis = std::array<vk::DescriptorSetLayoutCreateInfo, 2>{};
set_layout_cis[0].setBindings(set_0_bindings_v);
set_layout_cis[1].setBindings(set_1_bindings_v);

Remove the vertex colors and set the UVs for the quad. In Vulkan UV space is the same as GLFW window space: origin is at the top left, +X moves right, +Y moves down.

static constexpr auto vertices_v = std::array{
  Vertex{.position = {-200.0f, -200.0f}, .uv = {0.0f, 1.0f}},
  Vertex{.position = {200.0f, -200.0f}, .uv = {1.0f, 1.0f}},
  Vertex{.position = {200.0f, 200.0f}, .uv = {1.0f, 0.0f}},
  Vertex{.position = {-200.0f, 200.0f}, .uv = {0.0f, 0.0f}},
};

Finally, update the descriptor writes:

auto writes = std::array<vk::WriteDescriptorSet, 2>{};
// ...
auto const set1 = descriptor_sets[1];
auto const image_info = m_texture->descriptor_info();
write.setImageInfo(image_info)
  .setDescriptorType(vk::DescriptorType::eCombinedImageSampler)
  .setDescriptorCount(1)
  .setDstSet(set1)
  .setDstBinding(0);
writes[1] = write;

Since the Texture is not N-buffered (because it is "GPU const"), in this case the sets could also be updated once after texture creation instead of every frame.

Add the UV vertex attribute to the vertex shader and pass it to the fragment shader:

layout (location = 2) in vec2 a_uv;

// ...
layout (location = 1) out vec2 out_uv;

// ...
out_color = a_color;
out_uv = a_uv;

Add set 1 and the incoming UV coords to the fragment shader, combine the sampled texture color with the vertex color:

layout (set = 1, binding = 0) uniform sampler2D tex;

// ...
layout (location = 1) in vec2 in_uv;

// ...
out_color = vec4(in_color, 1.0) * texture(tex, in_uv);

RGBY Texture

For generating mip-maps, follow the sample in the Vulkan docs. The high-level steps are:

  1. Compute mip levels based on image size
  2. Create an image with the desired mip levels
  3. Copy the source data to the first mip level as usual
  4. Transition the first mip level to TransferSrc
  5. Iterate through all the remaining mip levels:
    1. Transition the current mip level to TransferDst
    2. Record an image blit operation from previous to current mip levels
    3. Transition the current mip level to TransferSrc
  6. Transition all levels (entire image) to ShaderRead

View Matrix

Integrating the view matrix will be quite simple and short. First, transformations for objects and cameras/views can be encapsulated into a single struct:

struct Transform {
  glm::vec2 position{};
  float rotation{};
  glm::vec2 scale{1.0f};

  [[nodiscard]] auto model_matrix() const -> glm::mat4;
  [[nodiscard]] auto view_matrix() const -> glm::mat4;
};

Extracting the common logic into a helper, both member functions can be implemented easily:

namespace {
struct Matrices {
  glm::mat4 translation;
  glm::mat4 orientation;
  glm::mat4 scale;
};

[[nodiscard]] auto to_matrices(glm::vec2 const position, float const rotation,
                               glm::vec2 const scale) -> Matrices {
  static constexpr auto mat_v = glm::identity<glm::mat4>();
  static constexpr auto axis_v = glm::vec3{0.0f, 0.0f, 1.0f};
  return Matrices{
    .translation = glm::translate(mat_v, glm::vec3{position, 0.0f}),
    .orientation = glm::rotate(mat_v, glm::radians(rotation), axis_v),
    .scale = glm::scale(mat_v, glm::vec3{scale, 1.0f}),
  };
}
} // namespace

auto Transform::model_matrix() const -> glm::mat4 {
  auto const [t, r, s] = to_matrices(position, rotation, scale);
  // right to left: scale first, then rotate, then translate.
  return t * r * s;
}

auto Transform::view_matrix() const -> glm::mat4 {
  // the view matrix is the inverse of the model matrix. instead of inverting,
  // perform the translation and rotation in reverse order and with negated
  // values (or use glm::lookAt()). scale is kept unchanged as the first
  // transformation for "intuitive" scaling on cameras.
  auto const [t, r, s] = to_matrices(-position, -rotation, scale);
  return r * t * s;
}
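The claim in the comment - that at unit scale view_matrix() is the inverse of model_matrix() - can be checked on the host without glm, using plain 2D rotation and translation (hypothetical helper names; radians instead of degrees). Applying the model transform and then the view transform should return the original point:

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x{}, y{}; };

// rotate a point about the Z axis by the given angle in radians.
[[nodiscard]] inline auto rotate(Vec2 const p, float const radians) -> Vec2 {
  auto const c = std::cos(radians);
  auto const s = std::sin(radians);
  return {c * p.x - s * p.y, s * p.x + c * p.y};
}

[[nodiscard]] inline auto translate(Vec2 const p, Vec2 const t) -> Vec2 {
  return {p.x + t.x, p.y + t.y};
}

// model = T(position) * R(rotation): rotate first, then translate.
[[nodiscard]] inline auto apply_model(Vec2 const p, Vec2 const position,
                                      float const rotation) -> Vec2 {
  return translate(rotate(p, rotation), position);
}

// view = R(-rotation) * T(-position): reverse order, negated values.
[[nodiscard]] inline auto apply_view(Vec2 const p, Vec2 const position,
                                     float const rotation) -> Vec2 {
  return rotate(translate(p, {-position.x, -position.y}), -rotation);
}
```

Since R(-θ)(R(θ)p + t - t) = p, the view transform exactly undoes the model transform when scale is 1.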

Add a Transform member to App to represent the view/camera, inspect its members, and combine with the existing projection matrix:

Transform m_view_transform{}; // generates view matrix.

// ...
ImGui::Separator();
if (ImGui::TreeNode("View")) {
  ImGui::DragFloat2("position", &m_view_transform.position.x);
  ImGui::DragFloat("rotation", &m_view_transform.rotation);
  ImGui::DragFloat2("scale", &m_view_transform.scale.x);
  ImGui::TreePop();
}

// ...
auto const mat_view = m_view_transform.view_matrix();
auto const mat_vp = mat_projection * mat_view;
auto const bytes =
  std::bit_cast<std::array<std::byte, sizeof(mat_vp)>>(mat_vp);
m_view_ubo->write_at(m_frame_index, bytes);

Naturally, moving the view left moves everything else - currently only a single RGBY quad - to the right.

View Matrix

Instanced Rendering

When multiple copies of a drawable object are desired, one option is to use instanced rendering. The basic idea is to store per-instance data in a uniform/storage buffer and index into it in the vertex shader. We shall store one model matrix per instance; feel free to add more data, like an overall tint (color) that gets multiplied into the existing output color in the fragment shader. The matrices will be bound as a Storage Buffer (SSBO), which can be "unbounded" in the shader (its size is determined during invocation).

Store the SSBO and a buffer for instance matrices:

std::vector<glm::mat4> m_instance_data{}; // model matrices.
std::optional<ShaderBuffer> m_instance_ssbo{};

Add two Transforms as the source of rendering instances, and a function to update the matrices:

void update_instances();

// ...
std::array<Transform, 2> m_instances{}; // generates model matrices.

// ...
void App::update_instances() {
  m_instance_data.clear();
  m_instance_data.reserve(m_instances.size());
  for (auto const& transform : m_instances) {
    m_instance_data.push_back(transform.model_matrix());
  }
  // can't use bit_cast anymore: the data size is only known at runtime.
  // reinterpret the data as a span of bytes instead.
  auto const span = std::span{m_instance_data};
  void* data = span.data();
  auto const bytes =
    std::span{static_cast<std::byte const*>(data), span.size_bytes()};
  m_instance_ssbo->write_at(m_frame_index, bytes);
}

Update the descriptor pool to also provide storage buffers:

// ...
vk::DescriptorPoolSize{vk::DescriptorType::eCombinedImageSampler, 2},
vk::DescriptorPoolSize{vk::DescriptorType::eStorageBuffer, 2},

This time add a new binding to set 1 (instead of a new set):

static constexpr auto set_1_bindings_v = std::array{
  layout_binding(0, vk::DescriptorType::eCombinedImageSampler),
  layout_binding(1, vk::DescriptorType::eStorageBuffer),
};

Create the instance SSBO after the view UBO:

m_instance_ssbo.emplace(m_allocator.get(), m_gpu.queue_family,
                        vk::BufferUsageFlagBits::eStorageBuffer);

Call update_instances() after update_view():

// ...
update_view();
update_instances();

Extract transform inspection into a lambda and inspect each instance transform too:

static auto const inspect_transform = [](Transform& out) {
  ImGui::DragFloat2("position", &out.position.x);
  ImGui::DragFloat("rotation", &out.rotation);
  ImGui::DragFloat2("scale", &out.scale.x, 0.1f);
};

ImGui::Separator();
if (ImGui::TreeNode("View")) {
  inspect_transform(m_view_transform);
  ImGui::TreePop();
}

ImGui::Separator();
if (ImGui::TreeNode("Instances")) {
  for (std::size_t i = 0; i < m_instances.size(); ++i) {
    auto const label = std::to_string(i);
    if (ImGui::TreeNode(label.c_str())) {
      inspect_transform(m_instances.at(i));
      ImGui::TreePop();
    }
  }
  ImGui::TreePop();
}

Add another descriptor write for the SSBO:

auto writes = std::array<vk::WriteDescriptorSet, 3>{};
// ...
auto const instance_ssbo_info =
  m_instance_ssbo->descriptor_info_at(m_frame_index);
write.setBufferInfo(instance_ssbo_info)
  .setDescriptorType(vk::DescriptorType::eStorageBuffer)
  .setDescriptorCount(1)
  .setDstSet(set1)
  .setDstBinding(1);
writes[2] = write;

Finally, change the instance count in the draw call:

auto const instances = static_cast<std::uint32_t>(m_instances.size());
// m_vbo has 6 indices.
command_buffer.drawIndexed(6, instances, 0, 0, 0);

Update the vertex shader to incorporate the instance model matrix:

// ...
layout (set = 1, binding = 1) readonly buffer Instances {
  mat4 mat_ms[];
};

// ...
const mat4 mat_m = mat_ms[gl_InstanceIndex];
const vec4 world_pos = mat_m * vec4(a_pos, 0.0, 1.0);

Instanced Rendering