]> Git Repo - linux.git/commit
mm/mmap: undo ->mmap() when arch_validate_flags() fails
authorCarlos Llamas <[email protected]>
Fri, 30 Sep 2022 00:38:43 +0000 (00:38 +0000)
committerAndrew Morton <[email protected]>
Thu, 13 Oct 2022 01:51:36 +0000 (18:51 -0700)
commitdeb0f6562884b5b4beb883d73e66a7d3a1b96d99
tree163343d7516fc633eadaad71440d9b5b73e3264a
parent515778e2d790652a38a24554fdb7f21420d91efc
mm/mmap: undo ->mmap() when arch_validate_flags() fails

Commit c462ac288f2c ("mm: Introduce arch_validate_flags()") added a late
check in mmap_region() to let architectures validate vm_flags.  The check
needs to happen after calling ->mmap() as the flags can potentially be
modified during this callback.

If arch_validate_flags() check fails we unmap and free the vma.  However,
the error path fails to undo the ->mmap() call that previously succeeded
and depending on the specific ->mmap() implementation this translates to
reference increments, memory allocations and other operations what will
not be cleaned up.

There are several places (mainly device drivers) where this is an issue.
However, one specific example is bpf_map_mmap() which keeps count of the
mappings in map->writecnt.  The count is incremented on ->mmap() and then
decremented on vm_ops->close().  When arch_validate_flags() fails this
count is off since bpf_map_mmap_close() is never called.

One can reproduce this issue in arm64 devices with MTE support.  Here the
vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
previously.  From userspace then is enough to pass the PROT_MTE flag to
mmap() syscall to trigger the arch_validate_flags() failure.

The following program reproduces this issue:

  #include <stdio.h>
  #include <unistd.h>
  #include <linux/unistd.h>
  #include <linux/bpf.h>
  #include <sys/mman.h>

  int main(void)
  {
union bpf_attr attr = {
.map_type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(long long),
.max_entries = 256,
.map_flags = BPF_F_MMAPABLE,
};
int fd;

fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);

return 0;
  }

By manually adding some log statements to the vm_ops callbacks we can
confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon
->release():

With PROT_MTE flag:
  root@debian:~# ./bpf-test
  [  111.263874] bpf_map_write_active_inc: map=9 writecnt=1
  [  111.288763] bpf_map_release: map=9 writecnt=1

Without PROT_MTE flag:
  root@debian:~# ./bpf-test
  [  157.816912] bpf_map_write_active_inc: map=10 writecnt=1
  [  157.830442] bpf_map_write_active_dec: map=10 writecnt=0
  [  157.832396] bpf_map_release: map=10 writecnt=0

This patch fixes the above issue by calling vm_ops->close() when the
arch_validate_flags() check fails, after this we can proceed to unmap and
free the vma on the error path.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Signed-off-by: Carlos Llamas <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Liam Howlett <[email protected]>
Cc: Christian Brauner (Microsoft) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: <[email protected]> [5.10+]
Signed-off-by: Andrew Morton <[email protected]>
mm/mmap.c
This page took 0.050027 seconds and 4 git commands to generate.