From: Linus Torvalds
Date: Sat, 23 Nov 2024 17:58:07 +0000 (-0800)
Subject: Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel...
X-Git-Tag: v6.13-rc1~99
X-Git-Url: https://repo.jachan.dev/linux.git/commitdiff_plain/5c00ff742bf5caf85f60e1c73999f99376fb865d

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

- The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings.

- Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation:
    - "refine mas_mab_cp()"
    - "Reduce the space to be cleared for maple_big_node"
    - "maple_tree: simplify mas_push_node()"
    - "Following cleanup after introduce mas_wr_store_type()"
    - "refine storing null"

- The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390.

- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationalizations and cleanups in the page mapping code.

- The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries.

- The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag.

- The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code.

- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault-time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. This is more consistent and cleaner, and potentially saves a large number of pagefaults.

- The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built-in percpu test code.

- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding work which didn't need to be done.

- The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed.

- The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" from Zheng Yejian fixes DAMON splitting.

- The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcg v1 charge moving feature.

- The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleans up some of the mmap() error handling and addresses some potential performance issues.

- The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text.

- The series "page allocation tag compression" from Suren Baghdasaryan is follow-on maintenance work for the new page allocation profiling feature.

- The series "page->index removals in mm" from Matthew Wilcox removes most references to page->index in mm/. A slow march towards shrinking struct page.

- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self-testing code.

- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation.

- The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework.

- The series "implement lightweight guard pages" from Lorenzo Stoakes permits userspace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected (a usage sketch follows the commit list below).

- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity.

- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance.

- The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configure multisize THP from the kernel boot command line.

- The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests.

- The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
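As a rough illustration of the lightweight guard pages item above (not part of this merge's diff, and not authoritative): a userspace allocator could install and remove a guard page inside a single mapping roughly as sketched below. The MADV_GUARD_INSTALL/MADV_GUARD_REMOVE fallback values are assumptions for building against pre-series uapi headers.

/*
 * Hedged sketch only: exercises the guard-page madvise() operations
 * described in the "implement lightweight guard pages" series.
 * The fallback MADV_GUARD_* values are assumptions for older headers.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102	/* assumed uapi value */
#define MADV_GUARD_REMOVE  103	/* assumed uapi value */
#endif

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	/* One anonymous VMA; installing the guard does not split it. */
	char *buf = mmap(NULL, 16 * psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* Turn the final page into a fault-generating guard region. */
	if (madvise(buf + 15 * psz, psz, MADV_GUARD_INSTALL))
		perror("MADV_GUARD_INSTALL");

	buf[0] = 'x';		/* ordinary access is unaffected */
	/* buf[15 * psz] = 'x';    would fault (SIGSEGV) */

	/* The guard can be dropped again without any VMA surgery. */
	if (madvise(buf + 15 * psz, psz, MADV_GUARD_REMOVE))
		perror("MADV_GUARD_REMOVE");

	return munmap(buf, 16 * psz);
}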
--- 5c00ff742bf5caf85f60e1c73999f99376fb865d
diff --cc arch/arm64/include/asm/set_memory.h
index 37774c793006,98088c043606..90f61b17275e
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@@ -13,9 -13,7 +13,10 @@@ int set_memory_valid(unsigned long addr
  int set_direct_map_invalid_noflush(struct page *page);
  int set_direct_map_default_noflush(struct page *page);
+ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
  bool kernel_page_present(struct page *page);
 +int set_memory_encrypted(unsigned long addr, int numpages);
 +int set_memory_decrypted(unsigned long addr, int numpages);
 +
  #endif /* _ASM_ARM64_SET_MEMORY_H */
diff --cc arch/arm64/mm/pageattr.c
index 6ae6ae806454,01225900293a..39fd1f7ff02a
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@@ -202,87 -192,17 +202,103 @@@ int set_direct_map_default_noflush(stru
  				   PAGE_SIZE, change_page_range, &data);
  }
 +static int __set_memory_enc_dec(unsigned long addr,
 +				int numpages,
 +				bool encrypt)
 +{
 +	unsigned long set_prot = 0, clear_prot = 0;
 +	phys_addr_t start, end;
 +	int ret;
 +
 +	if (!is_realm_world())
 +		return 0;
 +
 +	if (!__is_lm_address(addr))
 +		return -EINVAL;
 +
 +	start = __virt_to_phys(addr);
 +	end = start + numpages * PAGE_SIZE;
 +
 +	if (encrypt)
 +		clear_prot = PROT_NS_SHARED;
 +	else
 +		set_prot = PROT_NS_SHARED;
 +
 +	/*
 +	 * Break the mapping before we make any changes to avoid stale TLB
 +	 * entries or Synchronous External Aborts caused by RIPAS_EMPTY
 +	 */
 +	ret = __change_memory_common(addr, PAGE_SIZE * numpages,
 +				     __pgprot(set_prot),
 +				     __pgprot(clear_prot | PTE_VALID));
 +
 +	if (ret)
 +		return ret;
 +
 +	if (encrypt)
 +		ret = rsi_set_memory_range_protected(start, end);
 +	else
 +		ret = rsi_set_memory_range_shared(start, end);
 +
 +	if (ret)
 +		return ret;
 +
 +	return __change_memory_common(addr, PAGE_SIZE * numpages,
 +				      __pgprot(PTE_VALID),
 +				      __pgprot(0));
 +}
 +
 +static int realm_set_memory_encrypted(unsigned long addr, int numpages)
 +{
 +	int ret = __set_memory_enc_dec(addr, numpages, true);
 +
 +	/*
 +	 * If the request to change state fails, then the only sensible cause
 +	 * of action for the caller is to leak the memory
 +	 */
 +	WARN(ret, "Failed to encrypt memory, %d pages will be leaked",
 +	     numpages);
 +
 +	return ret;
 +}
 +
 +static int realm_set_memory_decrypted(unsigned long addr, int numpages)
 +{
 +	int ret = __set_memory_enc_dec(addr, numpages, false);
 +
 +	WARN(ret, "Failed to decrypt memory, %d pages will be leaked",
 +	     numpages);
 +
 +	return ret;
 +}
 +
 +static const struct arm64_mem_crypt_ops realm_crypt_ops = {
 +	.encrypt = realm_set_memory_encrypted,
 +	.decrypt = realm_set_memory_decrypted,
 +};
 +
 +int realm_register_memory_enc_ops(void)
 +{
 +	return arm64_mem_crypt_ops_register(&realm_crypt_ops);
 +}
 +
+ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+ {
+ 	unsigned long addr = (unsigned long)page_address(page);
+ 
+ 	if (!can_set_direct_map())
+ 		return 0;
+ 
+ 	return set_memory_valid(addr, nr, valid);
+ }
+ 
  #ifdef CONFIG_DEBUG_PAGEALLOC
++/*
++ * This is - apart from the return value - doing the same
++ * thing as the new set_direct_map_valid_noflush() function.
++ *
++ * Unify? Explain the conceptual differences?
++ */
  void __kernel_map_pages(struct page *page, int numpages, int enable)
  {
  	if (!can_set_direct_map())
diff --cc arch/s390/include/asm/set_memory.h
index cb4cc0f59012,240bcfbdcdce..94092f4ae764
--- a/arch/s390/include/asm/set_memory.h
+++ b/arch/s390/include/asm/set_memory.h
@@@ -62,6 -62,6 +62,7 @@@ __SET_MEMORY_FUNC(set_memory_4k, SET_ME
  int set_direct_map_invalid_noflush(struct page *page);
  int set_direct_map_default_noflush(struct page *page);
+ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
 +bool kernel_page_present(struct page *page);
  #endif
diff --cc arch/s390/mm/pageattr.c
index 4a0f422cfeb6,4c7ee74aa130..8f56a21a077f
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@@ -407,21 -406,17 +407,33 @@@ int set_direct_map_default_noflush(stru
  	return __set_memory((unsigned long)page_to_virt(page), 1, SET_MEMORY_DEF);
  }
+ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+ {
+ 	unsigned long flags;
+ 
+ 	if (valid)
+ 		flags = SET_MEMORY_DEF;
+ 	else
+ 		flags = SET_MEMORY_INV;
+ 
+ 	return __set_memory((unsigned long)page_to_virt(page), nr, flags);
+ }
++
 +bool kernel_page_present(struct page *page)
 +{
 +	unsigned long addr;
 +	unsigned int cc;
 +
 +	addr = (unsigned long)page_address(page);
 +	asm volatile(
 +		"	lra	%[addr],0(%[addr])\n"
 +		CC_IPM(cc)
 +		: CC_OUT(cc, cc), [addr] "+a" (addr)
 +		:
 +		: CC_CLOBBER);
 +	return CC_TRANSFORM(cc) == 0;
 +}
 +
  #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
  static void ipte_range(pte_t *pte, unsigned long address, int nr)
diff --cc include/linux/mm.h
index 673771f34674,5d6cd523c7c0..2bbf73eb53e7
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@@ -4176,65 -4161,4 +4174,8 @@@ static inline int do_mseal(unsigned lon
  }
  #endif
- #ifdef CONFIG_MEM_ALLOC_PROFILING
- static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
- {
- 	int i;
- 	struct alloc_tag *tag;
- 	unsigned int nr_pages = 1 << new_order;
- 
- 	if (!mem_alloc_profiling_enabled())
- 		return;
- 
- 	tag = pgalloc_tag_get(&folio->page);
- 	if (!tag)
- 		return;
- 
- 	for (i = nr_pages; i < (1 << old_order); i += nr_pages) {
- 		union codetag_ref *ref = get_page_tag_ref(folio_page(folio, i));
- 
- 		if (ref) {
- 			/* Set new reference to point to the original tag */
- 			alloc_tag_ref_set(ref, tag);
- 			put_page_tag_ref(ref);
- 		}
- 	}
- }
- 
- static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
- {
- 	struct alloc_tag *tag;
- 	union codetag_ref *ref;
- 
- 	tag = pgalloc_tag_get(&old->page);
- 	if (!tag)
- 		return;
- 
- 	ref = get_page_tag_ref(&new->page);
- 	if (!ref)
- 		return;
- 
- 	/* Clear the old ref to the original allocation tag. */
- 	clear_page_tag_ref(&old->page);
- 	/* Decrement the counters of the tag on get_new_folio. */
- 	alloc_tag_sub(ref, folio_nr_pages(new));
- 
- 	__alloc_tag_ref_set(ref, tag);
- 
- 	put_page_tag_ref(ref);
- }
- #else /* !CONFIG_MEM_ALLOC_PROFILING */
- static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
- {
- }
- 
- static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
- {
- }
- #endif /* CONFIG_MEM_ALLOC_PROFILING */
- 
 +int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
 +int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 +int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 +
  #endif /* _LINUX_MM_H */
diff --cc mm/ksm.c
index a2e2a521df0a,7ac59cde626c..31a9bc365437
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@@ -2259,10 -2259,11 +2259,10 @@@ static void cmp_and_merge_page(struct p
  		return;
  	}
- 	/* We first start with searching the page inside the stable tree */
- 	kpage = stable_tree_search(page);
- 	if (kpage == page && rmap_item->head == stable_node) {
- 		put_page(kpage);
+ 	/* Start by searching for the folio in the stable tree */
+ 	kfolio = stable_tree_search(page);
 -	if (!IS_ERR_OR_NULL(kfolio) && &kfolio->page == page &&
 -	    rmap_item->head == stable_node) {
++	if (&kfolio->page == page && rmap_item->head == stable_node) {
+ 		folio_put(kfolio);
  		return;
  	}
diff --cc mm/shmem.c
index c7881e16f4be,579e58cb3262..ccb9629a0f70
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@@ -1171,9 -1169,11 +1174,9 @@@ static int shmem_getattr(struct mnt_idm
  		stat->attributes_mask |= (STATX_ATTR_APPEND |
  				STATX_ATTR_IMMUTABLE |
  				STATX_ATTR_NODUMP);
- 	inode_lock_shared(inode);
  	generic_fillattr(idmap, request_mask, inode, stat);
- 	inode_unlock_shared(inode);
- 	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
+ 	if (shmem_huge_global_enabled(inode, 0, 0, false, 0))
  		stat->blksize = HPAGE_PMD_SIZE;
  	if (request_mask & STATX_BTIME) {
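For context on the arm64 hunks earlier in the diff: set_memory_decrypted() and set_memory_encrypted() (declared in the set_memory.h hunk and backed by realm_crypt_ops in pageattr.c) are the generic interfaces a driver would call to share memory with, or reclaim it from, the untrusted host. A hypothetical, non-authoritative caller sketch, mirroring the "leak on failure" behaviour documented by the WARN()s in pageattr.c:

/*
 * Hypothetical caller sketch (not from this merge): convert a buffer
 * between private (encrypted) and shared (decrypted) state using the
 * generic set_memory_*() interfaces the arm64 hunk wires up.
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>

static void *alloc_shared_pages(unsigned int order)
{
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
	unsigned long addr;

	if (!page)
		return NULL;

	addr = (unsigned long)page_address(page);
	/* Make the range accessible to the untrusted host/device. */
	if (set_memory_decrypted(addr, 1 << order))
		return NULL;	/* state uncertain: leak, as the WARN()s imply */

	return (void *)addr;
}

static void free_shared_pages(void *buf, unsigned int order)
{
	unsigned long addr = (unsigned long)buf;

	/* Flip back to private before the pages can be reused. */
	if (set_memory_encrypted(addr, 1 << order))
		return;		/* conversion failed: leak rather than free */

	free_pages(addr, order);
}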