In set_nr_huge_pages(), local variable "count" is used to record
persistent_huge_pages(), but when it cames to nodes huge page allocation,
the semantics changes to nr_huge_pages. When there exists surplus huge
pages and using the interface under
/sys/devices/system/node/node*/hugepages to change huge page pool size,
this difference can result in the allocation of an unexpected number of
huge pages.
Steps to reproduce the bug:
Starting with:
Node 0 Node 1 Total
HugePages_Total 0.00 0.00 0.00
HugePages_Free 0.00 0.00 0.00
HugePages_Surp 0.00 0.00 0.00
create 100 huge pages in Node 0 and consume it, then set Node 0 's
nr_hugepages to 0.
yields:
Node 0 Node 1 Total
HugePages_Total 200.00 0.00 200.00
HugePages_Free 0.00 0.00 0.00
HugePages_Surp 200.00 0.00 200.00
write 100 to Node 1's nr_hugepages
echo 100 > /sys/devices/system/node/node1/\
hugepages/hugepages-2048kB/nr_hugepages
gets:
Node 0 Node 1 Total
HugePages_Total 200.00 400.00 600.00
HugePages_Free 0.00 400.00 400.00
HugePages_Surp 200.00 0.00 200.00
Kernel is expected to create only 100 huge pages and it gives 200.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Xueshi Hu <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
if (nid != NUMA_NO_NODE) {
unsigned long old_count = count;
- count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
+ count += persistent_huge_pages(h) -
+ (h->nr_huge_pages_node[nid] -
+ h->surplus_huge_pages_node[nid]);
/*
* User may have specified a large count value which caused the
* above calculation to overflow. In this case, they wanted