the GPE dispatcher.
This facility can be used to prevent such uncontrolled
GPE flooding.
- Format: <byte>
+ Format: <byte> or <bitmap-list>
acpi_no_auto_serialize [HW,ACPI]
Disable auto-serialization of AML methods
ccw_timeout_log [S390]
See Documentation/s390/common_io.rst for details.
- cgroup_disable= [KNL] Disable a particular controller
- Format: {name of the controller(s) to disable}
+ cgroup_disable= [KNL] Disable a particular controller or optional feature
+ Format: {name of the controller(s) or feature(s) to disable}
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in
a single hierarchy
- foo isn't visible as an individually mountable
subsystem
+ - if foo is an optional feature then the feature is
+ disabled and corresponding cgroup files are not
+ created
{Currently only the "memory" controller deals with this and
cuts the overhead; others just disable the usage, so
only cgroup_disable=memory is actually worthwhile.}
+ Specifying "pressure" disables the per-cgroup pressure
+ stall information accounting feature.
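For illustration only (this example is not part of the patch), the pressure
accounting feature described above would be turned off from the boot command
line with:

    cgroup_disable=pressure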
cgroup_no_v1= [KNL] Disable cgroup controllers and named hierarchies in v1
Format: { { controller | "all" | "named" }
loops can be debugged more effectively on production
systems.
+ clocksource.max_cswd_read_retries= [KNL]
+ Number of clocksource_watchdog() retries due to
+ external delays before the clock will be marked
+ unstable. Defaults to three retries, that is,
+ four attempts to read the clock under test.
+
+ clocksource.verify_n_cpus= [KNL]
+ Limit the number of CPUs checked for clocksources
+ marked with CLOCK_SOURCE_VERIFY_PERCPU that
+ are marked unstable due to excessive skew.
+ A negative value says to check all CPUs, while
+ zero says not to check any. Values larger than
+ nr_cpu_ids are silently truncated to nr_cpu_ids.
+ The actual CPUs are chosen randomly, with
+ no replacement if the same CPU is chosen twice.
+
+ clocksource-wdtest.holdoff= [KNL]
+ Set the time in seconds that the clocksource
+ watchdog test waits before commencing its tests.
+ Defaults to zero when built as a module and to
+ 10 seconds when built into the kernel.
+
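For illustration only, the clocksource watchdog knobs documented above can be
combined on the boot command line; the values shown here are arbitrary:

    clocksource.max_cswd_read_retries=5 clocksource.verify_n_cpus=8 clocksource-wdtest.holdoff=30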
clearcpuid=BITNUM[,BITNUM...] [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]
+ hugetlb_free_vmemmap=
+ [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+ enabled.
+ Allows heavy hugetlb users to free up some more
+ memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+ Format: { on | off (default) }
+
+ on: enable the feature
+ off: disable the feature
+
+ Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
+ the default is on.
+
+ This is not compatible with memory_hotplug.memmap_on_memory.
+ If both parameters are enabled, hugetlb_free_vmemmap takes
+ precedence over memory_hotplug.memmap_on_memory.
+
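Purely as an illustration (the pool size below is arbitrary and not part of
the patch), the feature combines with an ordinary 2MB hugetlb pool like this:

    hugetlb_free_vmemmap=on hugepagesz=2M hugepages=512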
hung_task_panic=
[KNL] Should the hung task detector generate panics.
Format: 0 | 1
Note that even when enabled, there are a few cases where
the feature is not effective.
+ This is not compatible with hugetlb_free_vmemmap. If
+ both parameters are enabled, hugetlb_free_vmemmap takes
+ precedence over memory_hotplug.memmap_on_memory.
+
memtest= [KNL,X86,ARM,PPC,RISCV] Enable memtest
Format: <integer>
default : 0 <disable>
noclflush [BUGS=X86] Don't use the CLFLUSH instruction
- nodelayacct [KNL] Disable per-task delay accounting
+ delayacct [KNL] Enable per-task delay accounting
nodsp [SH] Disable hardware DSP at boot time.
nr_uarts= [SERIAL] maximum number of UARTs to be registered.
+ numa=off [KNL, ARM64, PPC, RISCV, SPARC, X86] Disable NUMA; only
+ set up a single NUMA node spanning all memory.
+
numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic
NUMA balancing.
Allowed values are enable and disable
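As hypothetical examples, either of the following boot-line fragments uses the
options documented above:

    numa=off
    numa_balancing=disable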
Reserves a hole at the top of the kernel virtual
address space.
- reservelow= [X86]
- Format: nn[K]
- Set the amount of memory to reserve for BIOS at
- the bottom of the address space.
-
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
exception. Default behavior is by #AC if
both features are enabled in hardware.
+ ratelimit:N -
+ Set system wide rate limit to N bus locks
+ per second for bus lock detection.
+ 0 < N <= 1000.
+
+ N/A for split lock detection.
+
If an #AC exception is hit in the kernel or in
firmware (i.e. not while executing in user mode)
the kernel will oops in either "warn" or "fatal"
S: Supported
-T: git git://people.freedesktop.org/~agd5f/linux
+T: git https://gitlab.freedesktop.org/agd5f/linux.git
F: drivers/gpu/drm/amd/display/
AMD FAM15H PROCESSOR POWER MONITORING DRIVER
S: Supported
-T: git git://people.freedesktop.org/~agd5f/linux
+T: git https://gitlab.freedesktop.org/agd5f/linux.git
F: drivers/gpu/drm/amd/pm/powerplay/
AMD SEATTLE DEVICE TREE SUPPORT
AMD SENSOR FUSION HUB DRIVER
-M: Sandeep Singh <sandeep.singh@amd.com>
+M: Basavaraj Natikar <basavaraj.natikar@amd.com>
S: Maintained
F: Documentation/hid/amd-sfh*
T: git https://github.com/AsahiLinux/linux.git
F: Documentation/devicetree/bindings/arm/apple.yaml
F: Documentation/devicetree/bindings/interrupt-controller/apple,aic.yaml
+F: Documentation/devicetree/bindings/pinctrl/apple,pinctrl.yaml
F: arch/arm64/boot/dts/apple/
F: drivers/irqchip/irq-apple-aic.c
F: include/dt-bindings/interrupt-controller/apple-aic.h
+F: include/dt-bindings/pinctrl/apple.h
ARM/ARTPEC MACHINE SUPPORT
F: Documentation/devicetree/bindings/pinctrl/cortina,gemini-pinctrl.txt
F: Documentation/devicetree/bindings/rtc/faraday,ftrtc010.txt
F: arch/arm/mach-gemini/
+F: drivers/crypto/gemini/
F: drivers/net/ethernet/cortina/
F: drivers/pinctrl/pinctrl-gemini.c
F: drivers/rtc/rtc-ftrtc010.c
F: Documentation/devicetree/bindings/timer/intel,ixp4xx-timer.yaml
F: arch/arm/mach-ixp4xx/
F: drivers/clocksource/timer-ixp4xx.c
+F: drivers/crypto/ixp4xx_crypto.c
F: drivers/gpio/gpio-ixp4xx.c
F: drivers/irqchip/irq-ixp4xx.c
F: include/linux/irqchip/irq-ixp4xx.h
BROADCOM NETXTREME-E ROCE DRIVER
S: Supported
F: scripts/clang-tools/
K: \b(?i:clang|llvm)\b
+CLANG CONTROL FLOW INTEGRITY SUPPORT
+S: Supported
+B: https://github.com/ClangBuiltLinux/linux/issues
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/clang/features
+F: include/linux/cfi.h
+F: kernel/cfi.c
+
CLEANCACHE API
F: drivers/video/console/
F: include/linux/console*
+CONTEXT TRACKING
+S: Maintained
+F: kernel/context_tracking.c
+F: include/linux/context_tracking*
+
CONTROL GROUP (CGROUP)
S: Maintained
-F: drivers/platform/x86/dell/dell-wmi.c
+F: drivers/platform/x86/dell/dell-wmi-base.c
+
+DELL WMI HARDWARE PRIVACY SUPPORT
+S: Maintained
+F: drivers/platform/x86/dell/dell-wmi-privacy.c
DELTA ST MEDIA DRIVER
T: git git://linuxtv.org/media_tree.git
F: drivers/media/platform/sti/delta
+DELTA DPS920AB PSU DRIVER
+S: Maintained
+F: Documentation/hwmon/dps920ab.rst
+F: drivers/hwmon/pmbus/dps920ab.c
+
DENALI NAND DRIVER
S: Orphan
F: drivers/gpu/drm/savage/
F: include/uapi/drm/savage_drm.h
+DRM DRIVER FOR SIMPLE FRAMEBUFFERS
+S: Maintained
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: drivers/gpu/drm/tiny/simpledrm.c
+
DRM DRIVER FOR SIS VIDEO CARDS
S: Orphan / Obsolete
F: drivers/gpu/drm/sis/
F: Documentation/devicetree/bindings/display/hisilicon/
F: drivers/gpu/drm/hisilicon/
+DRM DRIVER FOR HYPERV SYNTHETIC VIDEO DEVICE
+S: Maintained
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: drivers/gpu/drm/hyperv
+
DRM DRIVERS FOR LIMA
S: Maintained
-T: git git://people.freedesktop.org/~agd5f/linux
+T: git git://anongit.freedesktop.org/drm/drm-misc
F: drivers/gpu/drm/ttm/
F: include/drm/ttm/
F: fs/ecryptfs/
EDAC-AMD64
-S: Maintained
+S: Supported
F: drivers/edac/amd64_edac*
+F: drivers/edac/mce_amd*
EDAC-ARMADA
EMULEX ONECONNECT ROCE DRIVER
S: Odd Fixes
W: http://www.broadcom.com
EROFS FILE SYSTEM
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
F: Documentation/devicetree/bindings/net/qca,ar803x.yaml
F: Documentation/networking/phy.rst
F: drivers/net/mdio/
+F: drivers/net/mdio/acpi_mdio.c
+F: drivers/net/mdio/fwnode_mdio.c
F: drivers/net/mdio/of_mdio.c
F: drivers/net/pcs/
F: drivers/net/phy/
FREESCALE CAAM (Cryptographic Acceleration and Assurance Module) DRIVER
-M: Aymen Sghaier <aymen.sghaier@nxp.com>
+M: Pankaj Gupta <pankaj.gupta@nxp.com>
S: Maintained
F: Documentation/devicetree/bindings/crypto/fsl-sec4.txt
S: Supported
F: scripts/gdb/
+GEMINI CRYPTO DRIVER
+S: Maintained
+F: drivers/crypto/gemini/
+
GEMTEK FM RADIO RECEIVER DRIVER
S: Maintained
F: drivers/input/touchscreen/resistive-adc-touch.c
+ GENERIC STRING LIBRARY
+ S: Maintained
+ F: lib/string.c
+ F: lib/string_helpers.c
+ F: lib/test_string.c
+ F: lib/test-string_helpers.c
+
GENERIC UIO DRIVER FOR PCI DEVICES
S: Maintained
F: drivers/i2c/busses/i2c-icy.c
-IDE SUBSYSTEM
-S: Maintained
-Q: http://patchwork.ozlabs.org/project/linux-ide/list/
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide.git
-F: Documentation/ide/
-F: drivers/ide/
-F: include/linux/ide.h
-
-IDE/ATAPI DRIVERS
-S: Orphan
-F: Documentation/cdrom/ide-cd.rst
-F: drivers/ide/ide-cd*
-
IDEAPAD LAPTOP EXTRAS DRIVER
F: drivers/net/ethernet/intel/
F: drivers/net/ethernet/intel/*/
F: include/linux/avf/virtchnl.h
+F: include/linux/net/intel/iidc.h
+
+INTEL ETHERNET PROTOCOL DRIVER FOR RDMA
+S: Supported
+F: drivers/infiniband/hw/irdma/
+F: include/uapi/rdma/irdma-abi.h
INTEL FRAMEBUFFER DRIVER (excluding 810 and 815)
F: Documentation/userspace-api/media/v4l/pixfmt-meta-intel-ipu3.rst
F: drivers/staging/media/ipu3/
+INTEL IXP4XX CRYPTO SUPPORT
+S: Maintained
+F: drivers/crypto/ixp4xx_crypto.c
+
INTEL IXP4XX QMGR, NPE, ETHERNET and HSS SUPPORT
S: Maintained
S: Supported
F: drivers/cpufreq/intel_pstate.c
-INTEL RDMA RNIC DRIVER
-S: Supported
-F: drivers/infiniband/hw/i40iw/
-F: include/uapi/rdma/i40iw-abi.h
-
INTEL SCU DRIVERS
S: Maintained
F: arch/x86/include/asm/intel_scu_ipc.h
F: drivers/platform/x86/intel_scu_*
+INTEL SKYLAKE INT3472 ACPI DEVICE DRIVER
+S: Maintained
+F: drivers/platform/x86/intel/int3472/
+
INTEL SPEED SELECT TECHNOLOGY
F: include/linux/firmware/intel/stratix10-svc-client.h
INTEL TELEMETRY DRIVER
+M: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com>
S: Maintained
S: Maintained
F: drivers/platform/x86/intel-wmi-thunderbolt.c
+INTEL WWAN IOSM DRIVER
+S: Maintained
+F: drivers/net/wwan/iosm/
+
INTEL(R) TRACE HUB
S: Supported
T: git git://linuxtv.org/anttip/media_tree.git
F: drivers/media/tuners/it913x*
+ITE IT66121 HDMI BRIDGE DRIVER
+S: Maintained
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: Documentation/devicetree/bindings/display/bridge/ite,it66121.yaml
+F: drivers/gpu/drm/bridge/ite-it66121.c
+
IVTV VIDEO4LINUX DRIVER
F: arch/arm64/include/uapi/asm/kvm*
F: arch/arm64/kvm/
F: include/kvm/arm_*
+F: tools/testing/selftests/kvm/*/aarch64/
+F: tools/testing/selftests/kvm/aarch64/
KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)
F: drivers/mailbox/
F: include/linux/mailbox_client.h
F: include/linux/mailbox_controller.h
+F: include/dt-bindings/mailbox/
F: Documentation/devicetree/bindings/mailbox/
MAILBOX ARM MHUv2
MEDIA DRIVERS FOR FREESCALE IMX7
S: Maintained
T: git git://linuxtv.org/media_tree.git
S: Supported
T: git git://linuxtv.org/media_tree.git
F: Documentation/devicetree/bindings/media/renesas,csi2.yaml
+F: Documentation/devicetree/bindings/media/renesas,isp.yaml
F: Documentation/devicetree/bindings/media/renesas,vin.yaml
F: drivers/media/platform/rcar-vin/
F: include/linux/pagewalk.h
F: include/linux/vmalloc.h
F: mm/
+ F: tools/testing/selftests/vm/
MEMORY TECHNOLOGY DEVICES (MTD)
S: Supported
-F: Documentation/devicetree/bindings/media/atmel-isc.txt
+F: Documentation/devicetree/bindings/media/atmel,isc.yaml
+F: Documentation/devicetree/bindings/media/microchip,xisc.yaml
F: drivers/media/platform/atmel/atmel-isc-base.c
F: drivers/media/platform/atmel/atmel-isc-regs.h
F: drivers/media/platform/atmel/atmel-isc.h
F: drivers/media/platform/atmel/atmel-sama5d2-isc.c
+F: drivers/media/platform/atmel/atmel-sama7g5-isc.c
F: include/linux/atmel-isc-media.h
MICROCHIP ISI DRIVER
S: Maintained
W: https://github.com/linux-surface/surface-aggregator-module
-C: irc://chat.freenode.net/##linux-surface
+C: irc://irc.libera.chat/linux-surface
F: Documentation/driver-api/surface_aggregator/
F: drivers/platform/surface/aggregator/
F: drivers/platform/surface/surface_acpi_notify.c
F: drivers/media/pci/meye/
F: include/uapi/linux/meye.h
+MOTORCOMM PHY DRIVER
+S: Maintained
+F: drivers/net/phy/motorcomm.c
+
MOXA SMARTIO/INDUSTIO/INTELLIO SERIAL CARD
S: Orphan
F: Documentation/driver-api/serial/moxa-smartio.rst
F: drivers/net/ethernet/natsemi/natsemi.c
NCR 5380 SCSI DRIVERS
-M: Finn Thain <fthain@telegraphics.com.au>
+M: Finn Thain <fthain@linux-m68k.org>
S: Maintained
W: http://www.iptables.org/
W: http://www.nftables.org/
Q: http://patchwork.ozlabs.org/project/netfilter-devel/list/
+C: irc://irc.libera.chat/netfilter
T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
F: include/linux/netfilter*
F: fs/ntfs/
NUBUS SUBSYSTEM
-M: Finn Thain <fthain@telegraphics.com.au>
+M: Finn Thain <fthain@linux-m68k.org>
S: Maintained
F: arch/*/include/asm/nubus.h
S: Maintained
F: drivers/net/dsa/sja1105
+F: drivers/net/pcs/pcs-xpcs-nxp.c
NXP TDA998X DRM DRIVER
F: Documentation/devicetree/bindings/opp/qcom-nvmem-cpufreq.txt
F: drivers/cpufreq/qcom-cpufreq-nvmem.c
+QUALCOMM CRYPTO DRIVERS
+S: Maintained
+F: drivers/crypto/qce/
+
QUALCOMM EMAC GIGABIT ETHERNET DRIVER
RADEON and AMDGPU DRM DRIVERS
S: Supported
T: git https://gitlab.freedesktop.org/agd5f/linux.git
F: include/uapi/linux/rpmsg.h
F: samples/rpmsg/
+REMOTE PROCESSOR MESSAGING (RPMSG) WWAN CONTROL DRIVER
+S: Maintained
+F: drivers/net/wwan/rpmsg_wwan_ctrl.c
+
RENESAS CLOCK DRIVERS
N: riscv
K: riscv
+RISC-V/MICROCHIP POLARFIRE SOC SUPPORT
+S: Supported
+F: drivers/mailbox/mailbox-mpfs.c
+F: drivers/soc/microchip/
+F: include/soc/microchip/mpfs.h
+
RNBD BLOCK DRIVERS
F: drivers/ssb/
F: include/linux/ssb/
+SONY IMX208 SENSOR DRIVER
+S: Maintained
+T: git git://linuxtv.org/media_tree.git
+F: drivers/media/i2c/imx208.c
+
SONY IMX214 SENSOR DRIVER
S: Supported
F: drivers/net/pcs/pcs-xpcs.c
+F: drivers/net/pcs/pcs-xpcs.h
F: include/linux/pcs/pcs-xpcs.h
SYNOPSYS DESIGNWARE I2C DRIVER
T: git git://repo.or.cz/linux-2.6/linux-acpi-2.6/ibm-acpi-2.6.git
F: drivers/platform/x86/thinkpad_acpi.c
+THINKPAD LMI DRIVER
+S: Maintained
+F: Documentation/ABI/testing/sysfs-class-firmware-attributes
+F: drivers/platform/x86/think-lmi.?
+
THUNDERBOLT DMA TRAFFIC TEST DRIVER
F: include/linux/regulator/
K: regulator_get_optional
+VOLTAGE AND CURRENT REGULATOR IRQ HELPERS
+F: drivers/regulator/irq_helpers.c
+
VRF
T: git git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git
F: Documentation/core-api/printk-formats.rst
F: lib/test_printf.c
+F: lib/test_scanf.c
F: lib/vsprintf.c
VT1211 HARDWARE MONITOR DRIVER
F: include/linux/workqueue.h
F: kernel/workqueue.c
+WWAN DRIVERS
+S: Maintained
+F: drivers/net/wwan/
+F: include/linux/wwan.h
+F: include/uapi/linux/wwan.h
+
X-POWERS AXP288 PMIC DRIVERS
S: Maintained
S: Maintained
- F: include/linux/zbud.h
F: mm/zbud.c
ZD1211RW WIRELESS DRIVER
config ARCH_MAY_HAVE_PC_FDC
bool
- config ZONE_DMA
- bool
-
config ARCH_SUPPORTS_UPROBES
def_bool y
select ARM_VIC
select GENERIC_IRQ_MULTI_HANDLER
select AUTO_ZRELADDR
- select CLKDEV_LOOKUP
select CLKSRC_MMIO
select CPU_ARM920T
select GPIOLIB
bool "TI OMAP1"
depends on MMU
select ARCH_OMAP
- select CLKDEV_LOOKUP
select CLKSRC_MMIO
select GENERIC_IRQ_CHIP
select GENERIC_IRQ_MULTI_HANDLER
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
+ select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_ELF_PROT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_INLINE_READ_LOCK if !PREEMPTION
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+ select ARCH_WANTS_NO_INSTR
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARM_AMBA
select ARM_ARCH_TIMER
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
- select HAVE_ARCH_PFN_VALID
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_SECCOMP_FILTER
config GENERIC_CALIBRATE_DELAY
def_bool y
- config ZONE_DMA
- bool "Support DMA zone" if EXPERT
- default y
-
- config ZONE_DMA32
- bool "Support DMA32 zone" if EXPERT
- default y
-
config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y
def_bool y
depends on NUMA
- config HOLES_IN_ZONE
- def_bool y
-
source "kernel/Kconfig.hz"
config ARCH_SPARSEMEM_ENABLE
config ARM64_PTR_AUTH
bool "Enable support for pointer authentication"
default y
- depends on (CC_HAS_SIGN_RETURN_ADDRESS || CC_HAS_BRANCH_PROT_PAC_RET) && AS_HAS_PAC
- # Modern compilers insert a .note.gnu.property section note for PAC
- # which is only understood by binutils starting with version 2.33.1.
- depends on LD_IS_LLD || LD_VERSION >= 23301 || (CC_IS_GCC && GCC_VERSION < 90100)
- depends on !CC_IS_CLANG || AS_HAS_CFI_NEGATE_RA_STATE
- depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
help
Pointer authentication (part of the ARMv8.3 Extensions) provides
instructions for signing and authenticating pointers against secret
for each process at exec() time, with these keys being
context-switched along with the process.
- If the compiler supports the -mbranch-protection or
- -msign-return-address flag (e.g. GCC 7 or later), then this option
- will also cause the kernel itself to be compiled with return address
- protection. In this case, and if the target hardware is known to
- support pointer authentication, then CONFIG_STACKPROTECTOR can be
- disabled with minimal loss of protection.
-
The feature is detected at runtime. If the feature is not present in
hardware it will not be advertised to userspace/KVM guest nor will it
be enabled.
but with the feature disabled. On such a system, this option should
not be selected.
+config ARM64_PTR_AUTH_KERNEL
+ bool "Use pointer authentication for kernel"
+ default y
+ depends on ARM64_PTR_AUTH
+ depends on (CC_HAS_SIGN_RETURN_ADDRESS || CC_HAS_BRANCH_PROT_PAC_RET) && AS_HAS_PAC
+ # Modern compilers insert a .note.gnu.property section note for PAC
+ # which is only understood by binutils starting with version 2.33.1.
+ depends on LD_IS_LLD || LD_VERSION >= 23301 || (CC_IS_GCC && GCC_VERSION < 90100)
+ depends on !CC_IS_CLANG || AS_HAS_CFI_NEGATE_RA_STATE
+ depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
+ help
+ If the compiler supports the -mbranch-protection or
+ -msign-return-address flag (e.g. GCC 7 or later), then this option
+ will cause the kernel itself to be compiled with return address
+ protection. In this case, and if the target hardware is known to
+ support pointer authentication, then CONFIG_STACKPROTECTOR can be
+ disabled with minimal loss of protection.
+
This feature works with FUNCTION_GRAPH_TRACER option only if
DYNAMIC_FTRACE_WITH_REGS is enabled.
bool "Use Branch Target Identification for kernel"
default y
depends on ARM64_BTI
- depends on ARM64_PTR_AUTH
+ depends on ARM64_PTR_AUTH_KERNEL
depends on CC_HAS_BRANCH_PROT_PAC_RET_BTI
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697
depends on !CC_IS_GCC || GCC_VERSION >= 100100
#define MT_NORMAL 0
#define MT_NORMAL_TAGGED 1
#define MT_NORMAL_NC 2
-#define MT_NORMAL_WT 3
-#define MT_DEVICE_nGnRnE 4
-#define MT_DEVICE_nGnRE 5
-#define MT_DEVICE_GRE 6
+#define MT_DEVICE_nGnRnE 3
+#define MT_DEVICE_nGnRE 4
/*
* Memory types for Stage-2 translation
#define virt_to_pfn(x) __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
#define sym_to_pfn(x) __phys_to_pfn(__pa_symbol(x))
-#ifdef CONFIG_CFI_CLANG
-/*
- * With CONFIG_CFI_CLANG, the compiler replaces function address
- * references with the address of the function's CFI jump table
- * entry. The function_nocfi macro always returns the address of the
- * actual function instead.
- */
-#define function_nocfi(x) ({ \
- void *addr; \
- asm("adrp %0, " __stringify(x) "\n\t" \
- "add %0, %0, :lo12:" __stringify(x) \
- : "=r" (addr)); \
- addr; \
-})
-#endif
-
/*
* virt_to_page(x) convert a _valid_ virtual address to struct page *
* virt_addr_valid(x) indicates whether a virtual address is valid
#define virt_addr_valid(addr) ({ \
__typeof__(addr) __addr = __tag_reset(addr); \
- __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \
+ __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); \
})
void dump_mem_limit(void);
#ifndef __ASSEMBLY__
#include <linux/personality.h> /* for READ_IMPLIES_EXEC */
+#include <linux/types.h> /* for gfp_t */
#include <asm/pgtable-types.h>
struct page;
void copy_highpage(struct page *to, struct page *from);
#define __HAVE_ARCH_COPY_HIGHPAGE
-#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
- alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
-#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
+struct page *alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
+ unsigned long vaddr);
+#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE_MOVABLE
+
+void tag_clear_highpage(struct page *to);
+#define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
typedef struct page *pgtable_t;
- extern int pfn_valid(unsigned long);
+ int pfn_is_map_memory(unsigned long pfn);
#include <asm/memory.h>
#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
- #define FIRST_USER_ADDRESS 0UL
-
#ifndef __ASSEMBLY__
#include <asm/cmpxchg.h>
if (pte_present(pte) && pte_user_exec(pte) && !pte_special(pte))
__sync_icache_dcache(pte);
- if (system_supports_mte() &&
- pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
- mte_sync_tags(ptep, pte);
+ /*
+ * If the PTE would provide user space access to the tags associated
+ * with it then ensure that the MTE tags are synchronised. Although
+ * pte_access_permitted() returns false for exec only mappings, they
+ * don't expose tags (instruction fetches don't check tags).
+ */
+ if (system_supports_mte() && pte_access_permitted(pte, false) &&
+ !pte_special(pte)) {
+ pte_t old_pte = READ_ONCE(*ptep);
+ /*
+ * We only need to synchronise if the new PTE has tags enabled
+ * or if swapping in (in which case another mapping may have
+ * set tags in the past even if this PTE isn't tagged).
+ * (!pte_none() && !pte_present()) is an open coded version of
+ * is_swap_pte()
+ */
+ if (pte_tagged(pte) || (!pte_none(old_pte) && !pte_present(old_pte)))
+ mte_sync_tags(old_pte, pte);
+ }
__check_racy_pte_update(mm, ptep, pte);
#define pmd_none(pmd) (!pmd_val(pmd))
-#define pmd_bad(pmd) (!(pmd_val(pmd) & PMD_TABLE_BIT))
-
#define pmd_table(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
PMD_TYPE_TABLE)
#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
PMD_TYPE_SECT)
#define pmd_leaf(pmd) pmd_sect(pmd)
+#define pmd_bad(pmd) (!pmd_table(pmd))
#define pmd_leaf_size(pmd) (pmd_cont(pmd) ? CONT_PMD_SIZE : PMD_SIZE)
#define pte_leaf_size(pte) (pte_cont(pte) ? CONT_PTE_SIZE : PAGE_SIZE)
pr_err("%s:%d: bad pmd %016llx.\n", __FILE__, __LINE__, pmd_val(e))
#define pud_none(pud) (!pud_val(pud))
-#define pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
+#define pud_bad(pud) (!pud_table(pud))
#define pud_present(pud) pte_present(pud_pte(pud))
#define pud_leaf(pud) pud_sect(pud)
#define pud_valid(pud) pte_valid(pud_pte(pud))
#include <linux/interrupt.h>
#include <linux/smp.h>
#include <linux/fs.h>
+ #include <linux/panic_notifier.h>
#include <linux/proc_fs.h>
#include <linux/memblock.h>
#include <linux/of_fdt.h>
u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
set_cpu_logical_map(0, mpidr);
- /*
- * clear __my_cpu_offset on boot CPU to avoid hang caused by
- * using percpu variable early, for example, lockdep will
- * access percpu variable inside lock_release
- */
- set_my_cpu_offset(0);
pr_info("Booting Linux on physical CPU 0x%010lx [0x%08x]\n",
(unsigned long)mpidr, read_cpuid_id());
}
* faults in case uaccess_enable() is inadvertently called by the init
* thread.
*/
- init_task.thread_info.ttbr0 = __pa_symbol(reserved_pg_dir);
+ init_task.thread_info.ttbr0 = phys_to_ttbr(__pa_symbol(reserved_pg_dir));
#endif
if (boot_args[1] || boot_args[2] || boot_args[3]) {
static bool kvm_is_device_pfn(unsigned long pfn)
{
- return !pfn_valid(pfn);
+ return !pfn_is_map_memory(pfn);
}
static void *stage2_memcache_zalloc_page(void *arg)
return __va(phys);
}
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+ __clean_dcache_guest_page(va, size);
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+ __invalidate_icache_guest_page(va, size);
+}
+
/*
* Unmapping vs dcache management:
*
.page_count = kvm_host_page_count,
.phys_to_virt = kvm_host_va,
.virt_to_phys = kvm_host_pa,
+ .dcache_clean_inval_poc = clean_dcache_guest_page,
+ .icache_inval_pou = invalidate_icache_guest_page,
};
/**
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}
-static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
- __clean_dcache_guest_page(pfn, size);
-}
-
-static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
- __invalidate_icache_guest_page(pfn, size);
-}
-
static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
{
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
return PAGE_SIZE;
}
+static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
+{
+ unsigned long pa;
+
+ if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
+ return huge_page_shift(hstate_vma(vma));
+
+ if (!(vma->vm_flags & VM_PFNMAP))
+ return PAGE_SHIFT;
+
+ VM_BUG_ON(is_vm_hugetlb_page(vma));
+
+ pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
+
+#ifndef __PAGETABLE_PMD_FOLDED
+ if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
+ ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
+ ALIGN(hva, PUD_SIZE) <= vma->vm_end)
+ return PUD_SHIFT;
+#endif
+
+ if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
+ ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
+ ALIGN(hva, PMD_SIZE) <= vma->vm_end)
+ return PMD_SHIFT;
+
+ return PAGE_SHIFT;
+}
+
+/*
+ * The page will be mapped in stage 2 as Normal Cacheable, so the VM will be
+ * able to see the page's tags and therefore they must be initialised first. If
+ * PG_mte_tagged is set, tags have already been initialised.
+ *
+ * The race in the test/set of the PG_mte_tagged flag is handled by:
+ * - preventing VM_SHARED mappings in a memslot with MTE preventing two VMs
+ * racing to sanitise the same page
+ * - mmap_lock protects between a VM faulting a page in and the VMM performing
+ * an mprotect() to add VM_MTE
+ */
+static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
+ unsigned long size)
+{
+ unsigned long i, nr_pages = size >> PAGE_SHIFT;
+ struct page *page;
+
+ if (!kvm_has_mte(kvm))
+ return 0;
+
+ /*
+ * pfn_to_online_page() is used to reject ZONE_DEVICE pages
+ * that may not support tags.
+ */
+ page = pfn_to_online_page(pfn);
+
+ if (!page)
+ return -EFAULT;
+
+ for (i = 0; i < nr_pages; i++, page++) {
+ if (!test_bit(PG_mte_tagged, &page->flags)) {
+ mte_clear_page_tags(page_address(page));
+ set_bit(PG_mte_tagged, &page->flags);
+ }
+ }
+
+ return 0;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_memory_slot *memslot, unsigned long hva,
unsigned long fault_status)
bool write_fault, writable, force_pte = false;
bool exec_fault;
bool device = false;
+ bool shared;
unsigned long mmu_seq;
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
return -EFAULT;
}
- /* Let's check if we will get back a huge page backed by hugetlbfs */
+ /*
+ * Let's check if we will get back a huge page backed by hugetlbfs, or
+ * get block mapping for device MMIO region.
+ */
mmap_read_lock(current->mm);
vma = vma_lookup(current->mm, hva);
if (unlikely(!vma)) {
return -EFAULT;
}
- if (is_vm_hugetlb_page(vma))
- vma_shift = huge_page_shift(hstate_vma(vma));
- else
- vma_shift = PAGE_SHIFT;
-
- if (logging_active ||
- (vma->vm_flags & VM_PFNMAP)) {
+ /*
+ * logging_active is guaranteed to never be true for VM_PFNMAP
+ * memslots.
+ */
+ if (logging_active) {
force_pte = true;
vma_shift = PAGE_SHIFT;
+ } else {
+ vma_shift = get_vma_page_shift(vma, hva);
}
+ shared = (vma->vm_flags & VM_PFNMAP);
+
switch (vma_shift) {
#ifndef __PAGETABLE_PMD_FOLDED
case PUD_SHIFT:
return -EFAULT;
if (kvm_is_device_pfn(pfn)) {
+ /*
+ * If the page was identified as device early by looking at
+ * the VMA flags, vma_pagesize is already representing the
+ * largest quantity we can map. If instead it was mapped
+ * via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE
+ * and must not be upgraded.
+ *
+ * In both cases, we don't let transparent_hugepage_adjust()
+ * change things at the last minute.
+ */
device = true;
- force_pte = true;
} else if (logging_active && !write_fault) {
/*
* Only actually map the page as writable if this was a write
* If we are not forced to use page mapping, check if we are
* backed by a THP and thus use block mapping if possible.
*/
- if (vma_pagesize == PAGE_SIZE && !force_pte)
+ if (vma_pagesize == PAGE_SIZE && !(force_pte || device))
vma_pagesize = transparent_hugepage_adjust(memslot, hva,
&pfn, &fault_ipa);
+
+ if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
+ /* Check the VMM hasn't introduced a new VM_SHARED VMA */
+ if (!shared)
+ ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
+ else
+ ret = -EFAULT;
+ if (ret)
+ goto out_unlock;
+ }
+
if (writable)
prot |= KVM_PGTABLE_PROT_W;
- if (fault_status != FSC_PERM && !device)
- clean_dcache_guest_page(pfn, vma_pagesize);
-
- if (exec_fault) {
+ if (exec_fault)
prot |= KVM_PGTABLE_PROT_X;
- invalidate_icache_guest_page(pfn, vma_pagesize);
- }
if (device)
prot |= KVM_PGTABLE_PROT_DEVICE;
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
kvm_pfn_t pfn = pte_pfn(range->pte);
+ int ret;
if (!kvm->arch.mmu.pgt)
return false;
WARN_ON(range->end - range->start != 1);
- /*
- * We've moved a page around, probably through CoW, so let's treat it
- * just like a translation fault and clean the cache to the PoC.
- */
- clean_dcache_guest_page(pfn, PAGE_SIZE);
+ ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
+ if (ret)
+ return false;
/*
+ * We've moved a page around, probably through CoW, so let's treat
+ * it just like a translation fault and the map handler will clean
+ * the cache to the PoC.
+ *
* The MMU notifiers will have unmapped a huge PMD before calling
* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
* therefore we never need to clear out a huge PMD through this
{
hva_t hva = mem->userspace_addr;
hva_t reg_end = hva + mem->memory_size;
- bool writable = !(mem->flags & KVM_MEM_READONLY);
int ret = 0;
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
mmap_read_lock(current->mm);
/*
* A memory region could potentially cover multiple VMAs, and any holes
- * between them, so iterate over all of them to find out if we can map
- * any of them right now.
+ * between them, so iterate over all of them.
*
* +--------------------------------------------+
* +---------------+----------------+ +----------------+
*/
do {
struct vm_area_struct *vma;
- hva_t vm_start, vm_end;
vma = find_vma_intersection(current->mm, hva, reg_end);
if (!vma)
break;
/*
- * Take the intersection of this VMA with the memory region
+ * VM_SHARED mappings are not allowed with MTE to avoid races
+ * when updating the PG_mte_tagged page flag, see
+ * sanitise_mte_tags for more details.
*/
- vm_start = max(hva, vma->vm_start);
- vm_end = min(reg_end, vma->vm_end);
+ if (kvm_has_mte(kvm) && vma->vm_flags & VM_SHARED)
+ return -EINVAL;
if (vma->vm_flags & VM_PFNMAP) {
- gpa_t gpa = mem->guest_phys_addr +
- (vm_start - mem->userspace_addr);
- phys_addr_t pa;
-
- pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
- pa += vm_start - vma->vm_start;
-
/* IO region dirty page logging not allowed */
if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
ret = -EINVAL;
- goto out;
- }
-
- ret = kvm_phys_addr_ioremap(kvm, gpa, pa,
- vm_end - vm_start,
- writable);
- if (ret)
break;
+ }
}
- hva = vm_end;
+ hva = min(reg_end, vma->vm_end);
} while (hva < reg_end);
- if (change == KVM_MR_FLAGS_ONLY)
- goto out;
-
- spin_lock(&kvm->mmu_lock);
- if (ret)
- unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
- else if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
- stage2_flush_memslot(kvm, memslot);
- spin_unlock(&kvm->mmu_lock);
-out:
mmap_read_unlock(current->mm);
return ret;
}
free_area_init(max_zone_pfns);
}
- int pfn_valid(unsigned long pfn)
+ int pfn_is_map_memory(unsigned long pfn)
{
phys_addr_t addr = PFN_PHYS(pfn);
- struct mem_section *ms;
-
- /*
- * Ensure the upper PAGE_SHIFT bits are clear in the
- * pfn. Else it might lead to false positives when
- * some of the upper bits are set, but the lower bits
- * match a valid pfn.
- */
- if (PHYS_PFN(addr) != pfn)
- return 0;
-
- if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
- return 0;
- ms = __pfn_to_section(pfn);
- if (!valid_section(ms))
+ /* avoid false positives for bogus PFNs, see comment in pfn_valid() */
+ if (PHYS_PFN(addr) != pfn)
return 0;
- /*
- * ZONE_DEVICE memory does not have the memblock entries.
- * memblock_is_map_memory() check for ZONE_DEVICE based
- * addresses will always fail. Even the normal hotplugged
- * memory will never have MEMBLOCK_NOMAP flag set in their
- * memblock entries. Skip memblock search for all non early
- * memory sections covering all of hotplug memory including
- * both normal and ZONE_DEVICE based.
- */
- if (!early_section(ms))
- return pfn_section_valid(ms, pfn);
-
return memblock_is_map_memory(addr);
}
- EXPORT_SYMBOL(pfn_valid);
+ EXPORT_SYMBOL(pfn_is_map_memory);
static phys_addr_t memory_limit = PHYS_ADDR_MAX;
BUILD_BUG_ON(TASK_SIZE_32 > DEFAULT_MAP_WINDOW_64);
#endif
+ /*
+ * Selected page table levels should match when derived from
+ * scratch using the virtual address range and page size.
+ */
+ BUILD_BUG_ON(ARM64_HW_PGTABLE_LEVELS(CONFIG_ARM64_VA_BITS) !=
+ CONFIG_PGTABLE_LEVELS);
+
if (PAGE_SIZE >= 16384 && get_num_physpages() <= 128) {
extern int sysctl_overcommit_memory;
/*
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
{
- if (!pfn_valid(pfn))
+ if (!pfn_is_map_memory(pfn))
return pgprot_noncached(vma_prot);
else if (file->f_flags & O_SYNC)
return pgprot_writecombine(vma_prot);
next = pmd_addr_end(addr, end);
/* try section mapping first */
- if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
+ if (((addr | next | phys) & ~PMD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmdp, phys, prot);
}
#endif
-#if !ARM64_SWAPPER_USES_SECTION_MAPS
+#if !ARM64_KERNEL_USES_PMD_MAPS
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
return vmemmap_populate_basepages(start, end, node, altmap);
}
-#else /* !ARM64_SWAPPER_USES_SECTION_MAPS */
+#else /* !ARM64_KERNEL_USES_PMD_MAPS */
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
return 0;
}
-#endif /* !ARM64_SWAPPER_USES_SECTION_MAPS */
+#endif /* !ARM64_KERNEL_USES_PMD_MAPS */
+
+#ifdef CONFIG_MEMORY_HOTPLUG
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{
-#ifdef CONFIG_MEMORY_HOTPLUG
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
unmap_hotplug_range(start, end, true, altmap);
free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
-#endif
}
+#endif /* CONFIG_MEMORY_HOTPLUG */
static inline pud_t *fixmap_pud(unsigned long addr)
{
return dt_virt;
}
+ #if CONFIG_PGTABLE_LEVELS > 3
int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
{
pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot));
return 1;
}
+ int pud_clear_huge(pud_t *pudp)
+ {
+ if (!pud_sect(READ_ONCE(*pudp)))
+ return 0;
+ pud_clear(pudp);
+ return 1;
+ }
+ #endif
+
+ #if CONFIG_PGTABLE_LEVELS > 2
int pmd_set_huge(pmd_t *pmdp, phys_addr_t phys, pgprot_t prot)
{
pmd_t new_pmd = pfn_pmd(__phys_to_pfn(phys), mk_pmd_sect_prot(prot));
return 1;
}
- int pud_clear_huge(pud_t *pudp)
- {
- if (!pud_sect(READ_ONCE(*pudp)))
- return 0;
- pud_clear(pudp);
- return 1;
- }
-
int pmd_clear_huge(pmd_t *pmdp)
{
if (!pmd_sect(READ_ONCE(*pmdp)))
pmd_clear(pmdp);
return 1;
}
+ #endif
int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
{
select SYS_SUPPORTS_LITTLE_ENDIAN
select SYS_SUPPORTS_ZBOOT
select DMA_NONCOHERENT
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
select IRQ_MIPS_CPU
select PINCTRL
select GPIOLIB
config AR7
bool "Texas Instruments AR7"
select BOOT_ELF32
+ select COMMON_CLK
select DMA_NONCOHERENT
select CEVT_R4K
select CSRC_R4K
select SYS_SUPPORTS_ZBOOT_UART16550
select GPIOLIB
select VLYNQ
- select HAVE_LEGACY_CLK
help
Support for the Texas Instruments AR7 System-on-a-Chip
family: TNETD7100, 7200 and 7300.
select SWAP_IO_SPACE
select GPIOLIB
select MIPS_L1_CACHE_SHIFT_4
- select CLKDEV_LOOKUP
select HAVE_LEGACY_CLK
help
Support for BCM63XX based boards
select MIPS_GENERIC
select MACH_INGENIC
select SYS_SUPPORTS_ZBOOT_UART16550
+ select CPU_SUPPORTS_CPUFREQ
+ select MIPS_EXTERNAL_TIMER
config LANTIQ
bool "Lantiq based platforms"
select GPIOLIB
select SWAP_IO_SPACE
select BOOT_RAW
- select CLKDEV_LOOKUP
select HAVE_LEGACY_CLK
select USE_OF
select PINCTRL
config RALINK
bool "Ralink based machines"
select CEVT_R4K
+ select COMMON_CLK
select CSRC_R4K
select BOOT_RAW
select DMA_NONCOHERENT
select SYS_SUPPORTS_MIPS16
select SYS_SUPPORTS_ZBOOT
select SYS_HAS_EARLY_PRINTK
- select CLKDEV_LOOKUP
select ARCH_HAS_RESET_CONTROLLER
select RESET_CONTROLLER
select HAVE_PLAT_FW_INIT_CMDLINE
select HAVE_PLAT_MEMCPY
select ZONE_DMA32
- select HOLES_IN_ZONE
select GPIOLIB
select USE_OF
select ARCH_SPARSEMEM_ENABLE
select CLKSRC_I8253
select CLKEVT_I8253
select MIPS_EXTERNAL_TIMER
-
- config ZONE_DMA
- bool
-
- config ZONE_DMA32
- bool
-
endmenu
config TRAD_SIGNALS
{
set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
}
- #define pmd_pgtable(pmd) pmd_page(pmd)
/*
* Initialize a new pmd table with invalid pointers.
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
{
- pmd_t *pmd;
+ pmd_t *pmd = NULL;
+ struct page *pg;
- pmd = (pmd_t *) __get_free_pages(GFP_KERNEL, PMD_ORDER);
- if (pmd)
+ pg = alloc_pages(GFP_KERNEL | __GFP_ACCOUNT, PMD_ORDER);
+ if (pg) {
+ pgtable_pmd_page_ctor(pg);
+ pmd = (pmd_t *)page_address(pg);
pmd_init((unsigned long)pmd, (unsigned long)invalid_pte_table);
+ }
return pmd;
}
config MMU
def_bool y
- config ZONE_DMA
- def_bool y
-
config CPU_BIG_ENDIAN
def_bool y
select ARCH_BINFMT_ELF_STATE
select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM
select ARCH_ENABLE_MEMORY_HOTREMOVE
- select ARCH_ENABLE_SPLIT_PMD_PTLOCK
+ select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
+ select ARCH_WANTS_NO_INSTR
select ARCH_WANT_DEFAULT_BPF_JIT
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_TABLE_SORT
select THREAD_INFO_IN_TASK
select TTY
select VIRT_CPU_ACCOUNTING
+ select ZONE_DMA
# Note: keep the above list sorted alphabetically
config SCHED_OMIT_FRAME_POINTER
return page;
}
- void free_insn_page(void *page)
- {
- module_memfree(page);
- }
-
static void *alloc_s390_insn_page(void)
{
if (xchg(&insn_page_in_use, 1) == 1)
break;
case KPROBE_HIT_ACTIVE:
case KPROBE_HIT_SSDONE:
- /*
- * We increment the nmissed count for accounting,
- * we can also use npre/npostfault count for accounting
- * these specific fault cases.
- */
- kprobes_inc_nmissed_count(p);
-
- /*
- * We come here because instructions in the pre/post
- * handler caused the page_fault, this could happen
- * if handler tries to access user space by
- * copy_from_user(), get_user() etc. Let the
- * user-specified handler try to fix it first.
- */
- if (p->fault_handler && p->fault_handler(p, regs, trapnr))
- return 1;
-
/*
* In case the user-specified fault handler returned
* zero, try to fix up.
select NEED_DMA_MAP_STATE
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
+ select ZONE_DMA32
config FORCE_DYNAMIC_FTRACE
def_bool y
select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM)
select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
- select ARCH_ENABLE_SPLIT_PMD_PTLOCK if X86_64 || X86_PAE
+ select ARCH_ENABLE_SPLIT_PMD_PTLOCK if (PGTABLE_LEVELS > 2) && (X86_64 || X86_PAE)
select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAS_DEBUG_WX
+ select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
- select ARCH_SUPPORTS_LTO_CLANG if X86_64
- select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
+ select ARCH_SUPPORTS_LTO_CLANG
+ select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
select ARCH_WANT_DEFAULT_BPF_JIT if X86_64
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
+ select ARCH_WANTS_NO_INSTR
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_THP_SWAP if X86_64
config ARCH_WANT_GENERAL_HUGETLB
def_bool y
- config ZONE_DMA32
- def_bool y if X86_64
-
config AUDIT_ARCH
def_bool y if X86_64
menu "Processor type and features"
- config ZONE_DMA
- bool "DMA memory allocation support" if EXPERT
- default y
- help
- DMA memory allocation support allows devices with less than 32-bit
- addressing to allocate within the first 16MB of address space.
- Disable if no such devices will be used.
-
- If unsure, say Y.
-
config SMP
bool "Symmetric multi-processing support"
help
Set whether the default state of memory_corruption_check is
on or off.
-config X86_RESERVE_LOW
- int "Amount of low memory, in kilobytes, to reserve for the BIOS"
- default 64
- range 4 640
- help
- Specify the amount of low memory to reserve for the BIOS.
-
- The first page contains BIOS data structures that the kernel
- must not use, so that page must always be reserved.
-
- By default we reserve the first 64K of physical RAM, as a
- number of BIOSes are known to corrupt that memory range
- during events such as suspend/resume or monitor cable
- insertion, so it must not be used by the kernel.
-
- You can set this to 4 if you are absolutely sure that you
- trust the BIOS to get all its memory reservations and usages
- right. If you know your BIOS have problems beyond the
- default 64K area, you can set this to 640 to avoid using the
- entire low memory range.
-
- If you have doubts about the BIOS (e.g. suspend/resume does
- not work or there's kernel crashes after certain hardware
- hotplug events) then you might want to enable
- X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check
- typical corruption patterns.
-
- Leave this to the default value of 64 if you are unsure.
-
config MATH_EMULATION
bool
depends on MODIFY_LDT_SYSCALL
#include <asm/irq_vectors.h>
#include <asm/cpu_entry_area.h>
+ #include <linux/debug_locks.h>
#include <linux/smp.h>
#include <linux/percpu.h>
asm volatile("sidt %0":"=m" (*dtr));
}
+static inline void native_gdt_invalidate(void)
+{
+ const struct desc_ptr invalid_gdt = {
+ .address = 0,
+ .size = 0
+ };
+
+ native_load_gdt(&invalid_gdt);
+}
+
+static inline void native_idt_invalidate(void)
+{
+ const struct desc_ptr invalid_idt = {
+ .address = 0,
+ .size = 0
+ };
+
+ native_load_idt(&invalid_idt);
+}
+
/*
* The LTR instruction marks the TSS GDT entry as busy. On 64-bit, the GDT is
* a read-only remapping. To prevent a page fault, the GDT is switched to the
#ifdef CONFIG_X86_64
extern void idt_setup_early_pf(void);
-extern void idt_setup_ist_traps(void);
#else
static inline void idt_setup_early_pf(void) { }
-static inline void idt_setup_ist_traps(void) { }
#endif
-extern void idt_invalidate(void *addr);
+extern void idt_invalidate(void);
#endif /* _ASM_X86_DESC_H */
#include <linux/irq.h>
#include <linux/kexec.h>
#include <linux/i8253.h>
+ #include <linux/panic_notifier.h>
#include <linux/random.h>
#include <asm/processor.h>
#include <asm/hypervisor.h>
for_each_present_cpu(i) {
if (i == 0)
continue;
- ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
+ ret = hv_call_add_logical_proc(numa_cpu_node(i), i, i);
BUG_ON(ret);
}
static void __init ms_hyperv_init_platform(void)
{
+ int hv_max_functions_eax;
int hv_host_info_eax;
int hv_host_info_ebx;
int hv_host_info_ecx;
ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
+ hv_max_functions_eax = cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS);
+
pr_info("Hyper-V: privilege flags low 0x%x, high 0x%x, hints 0x%x, misc 0x%x\n",
ms_hyperv.features, ms_hyperv.priv_high, ms_hyperv.hints,
ms_hyperv.misc_features);
/*
* Extract host information.
*/
- if (cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS) >=
- HYPERV_CPUID_VERSION) {
+ if (hv_max_functions_eax >= HYPERV_CPUID_VERSION) {
hv_host_info_eax = cpuid_eax(HYPERV_CPUID_VERSION);
hv_host_info_ebx = cpuid_ebx(HYPERV_CPUID_VERSION);
hv_host_info_ecx = cpuid_ecx(HYPERV_CPUID_VERSION);
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
}
- if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED) {
+ if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
ms_hyperv.nested_features =
cpuid_eax(HYPERV_CPUID_NESTED_FEATURES);
+ pr_info("Hyper-V: Nested features: 0x%x\n",
+ ms_hyperv.nested_features);
}
/*
return page;
}
- /* Recover page to RW mode before releasing it */
- void free_insn_page(void *page)
- {
- module_memfree(page);
- }
-
/* Kprobe x86 instruction emulation - only regs->ip or IF flag modifiers */
static void kprobe_emulate_ifmodifiers(struct kprobe *p, struct pt_regs *regs)
break;
if (insn->addr_bytes != sizeof(unsigned long))
- return -EOPNOTSUPP; /* Don't support differnt size */
+ return -EOPNOTSUPP; /* Don't support different size */
if (X86_MODRM_MOD(opcode) != 3)
return -EOPNOTSUPP; /* TODO: support memory addressing */
restore_previous_kprobe(kcb);
else
reset_current_kprobe();
- } else if (kcb->kprobe_status == KPROBE_HIT_ACTIVE ||
- kcb->kprobe_status == KPROBE_HIT_SSDONE) {
- /*
- * We increment the nmissed count for accounting,
- * we can also use npre/npostfault count for accounting
- * these specific fault cases.
- */
- kprobes_inc_nmissed_count(cur);
-
- /*
- * We come here because instructions in the pre/post
- * handler caused the page_fault, this could happen
- * if handler tries to access user space by
- * copy_from_user(), get_user() etc. Let the
- * user-specified handler try to fix it first.
- */
- if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
- return 1;
}
return 0;
#include <linux/initrd.h>
#include <linux/iscsi_ibft.h>
#include <linux/memblock.h>
+ #include <linux/panic_notifier.h>
#include <linux/pci.h>
#include <linux/root_dev.h>
#include <linux/hugetlb.h>
e820__range_add(start, size, E820_TYPE_RAM);
}
-static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
-
-static int __init parse_reservelow(char *p)
-{
- unsigned long long size;
-
- if (!p)
- return -EINVAL;
-
- size = memparse(p, &p);
-
- if (size < 4096)
- size = 4096;
-
- if (size > 640*1024)
- size = 640*1024;
-
- reserve_low = size;
-
- return 0;
-}
-
-early_param("reservelow", parse_reservelow);
-
static void __init early_reserve_memory(void)
{
/*
#endif
/*
- * Find free memory for the real mode trampoline and place it
- * there.
- * If there is not enough free memory under 1M, on EFI-enabled
- * systems there will be additional attempt to reclaim the memory
- * for the real mode trampoline at efi_free_boot_services().
+ * Find free memory for the real mode trampoline and place it there. If
+ * there is not enough free memory under 1M, on EFI-enabled systems
+ * there will be an additional attempt to reclaim the memory for the real
+ * mode trampoline at efi_free_boot_services().
+ *
+ * Unconditionally reserve the entire first 1M of RAM because BIOSes
+ * are known to corrupt low memory and several hundred kilobytes are not
+ * worth complex detection of what memory gets clobbered. Windows does the
+ * same thing for very similar reasons.
*
- * Unconditionally reserve the entire first 1M of RAM because
- * BIOSes are know to corrupt low memory and several
- * hundred kilobytes are not worth complex detection what memory gets
- * clobbered. Moreover, on machines with SandyBridge graphics or in
- * setups that use crashkernel the entire 1M is reserved anyway.
+ * Moreover, on machines with SandyBridge graphics or in setups that use
+ * crashkernel the entire 1M is reserved anyway.
*/
reserve_real_mode();
/*
* zram is claimed so open requests will fail
*/
- bool claim; /* Protected by bdev->bd_mutex */
+ bool claim; /* Protected by disk->open_mutex */
- struct file *backing_dev;
#ifdef CONFIG_ZRAM_WRITEBACK
+ struct file *backing_dev;
spinlock_t wb_limit_lock;
bool wb_limit_enable;
u64 bd_wb_limit;
#include <linux/bug.h>
#include <linux/err.h>
+ #include <linux/limits.h>
#include <linux/log2.h>
#include <linux/math64.h>
+ #include <linux/math.h>
+ #include <linux/minmax.h>
+
#include <linux/clk/analogbits-wrpll-cln28hpc.h>
/* MIN_INPUT_FREQ: minimum input clock frequency, in Hz (Fref_min) */
}
/**
- * wrpll_configure() - compute PLL configuration for a target rate
+ * wrpll_configure_for_rate() - compute PLL configuration for a target rate
* @c: ptr to a struct wrpll_cfg record to write into
* @target_rate: target PLL output clock rate (post-Q-divider)
* @parent_rate: PLL input refclk rate (pre-R-divider)
#include <linux/device.h>
#include <linux/interrupt.h>
#include <linux/notifier.h>
+ #include <linux/panic_notifier.h>
#include <linux/soc/qcom/smem.h>
#include <linux/soc/qcom/smem_state.h>
int ret;
ret = platform_get_irq_byname(smp2p->ipa->pdev, name);
- if (ret <= 0) {
- dev_err(dev, "DT error %d getting \"%s\" IRQ property\n",
- ret, name);
+ if (ret <= 0)
return ret ? : -EINVAL;
- }
irq = ret;
ret = request_threaded_irq(irq, NULL, handler, 0, name, smp2p);
void __register_binfmt(struct linux_binfmt * fmt, int insert)
{
- BUG_ON(!fmt);
- if (WARN_ON(!fmt->load_binary))
- return;
write_lock(&binfmt_lock);
insert ? list_add(&fmt->lh, &formats) :
list_add_tail(&fmt->lh, &formats);
WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
flush_signal_handlers(me, 0);
+ retval = set_cred_ucounts(bprm->cred);
+ if (retval < 0)
+ goto out_unlock;
+
/*
* install the new credentials for this executable
*/
* whether NPROC limit is still exceeded.
*/
if ((current->flags & PF_NPROC_EXCEEDED) &&
- atomic_read(¤t_user()->processes) > rlimit(RLIMIT_NPROC)) {
+ is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
retval = -EAGAIN;
goto out_ret;
}
* XXX: Huge page cache doesn't support writing yet. Drop all page
* cache for this file before processing writes.
*/
- if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
- truncate_pagecache(inode, 0);
+ if (f->f_mode & FMODE_WRITE) {
+ /*
+ * Paired with smp_mb() in collapse_file() to ensure nr_thps
+ * is up to date and the update to i_writecount by
+ * get_write_access() is visible. Ensures subsequent insertion
+ * of THPs into the page cache will fail.
+ */
+ smp_mb();
+ if (filemap_nr_thps(inode->i_mapping))
+ truncate_pagecache(inode, 0);
+ }
return 0;
inline int build_open_flags(const struct open_how *how, struct open_flags *op)
{
- int flags = how->flags;
+ u64 flags = how->flags;
+ u64 strip = FMODE_NONOTIFY | O_CLOEXEC;
int lookup_flags = 0;
int acc_mode = ACC_MODE(flags);
- /* Must never be set by userspace */
- flags &= ~(FMODE_NONOTIFY | O_CLOEXEC);
+ BUILD_BUG_ON_MSG(upper_32_bits(VALID_OPEN_FLAGS),
+ "struct open_flags doesn't yet handle flags > 32 bits");
+
+ /*
+ * Strip flags that either shouldn't be set by userspace like
+ * FMODE_NONOTIFY or that aren't relevant in determining struct
+ * open_flags like O_CLOEXEC.
+ */
+ flags &= ~strip;
/*
* Older syscalls implicitly clear all of the invalid flags or argument
return ret;
}
-static inline long userfaultfd_get_blocking_state(unsigned int flags)
+static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
{
if (flags & FAULT_FLAG_INTERRUPTIBLE)
return TASK_INTERRUPTIBLE;
struct userfaultfd_wait_queue uwq;
vm_fault_t ret = VM_FAULT_SIGBUS;
bool must_wait;
- long blocking_state;
+ unsigned int blocking_state;
/*
* We don't do userfault handling for the final child pid update.
}
if (vm_flags & VM_UFFD_MINOR) {
- /* FIXME: Add minor fault interception for shmem. */
- if (!is_vm_hugetlb_page(vma))
+ if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
return false;
}
vm_flags = 0;
if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
vm_flags |= VM_UFFD_MISSING;
- if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)
+ if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) {
+ #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+ goto out;
+ #endif
vm_flags |= VM_UFFD_WP;
+ }
if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) {
#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
goto out;
/* report all available features and ioctls to userland */
uffdio_api.features = UFFD_API_FEATURES;
#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
- uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
+ uffdio_api.features &=
+ ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
+ #endif
+ #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+ uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
#endif
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
#include <linux/compiler.h>
#include <linux/instrumentation.h>
+#include <linux/once_lite.h>
#define CUT_HERE "------------[ cut here ]------------\n"
#endif
#ifndef __ASSEMBLY__
- #include <linux/kernel.h>
+ #include <linux/panic.h>
+ #include <linux/printk.h>
#ifdef CONFIG_BUG
})
#ifndef WARN_ON_ONCE
-#define WARN_ON_ONCE(condition) ({ \
- static bool __section(".data.once") __warned; \
- int __ret_warn_once = !!(condition); \
- \
- if (unlikely(__ret_warn_once && !__warned)) { \
- __warned = true; \
- WARN_ON(1); \
- } \
- unlikely(__ret_warn_once); \
-})
+#define WARN_ON_ONCE(condition) \
+ DO_ONCE_LITE_IF(condition, WARN_ON, 1)
#endif
-#define WARN_ONCE(condition, format...) ({ \
- static bool __section(".data.once") __warned; \
- int __ret_warn_once = !!(condition); \
- \
- if (unlikely(__ret_warn_once && !__warned)) { \
- __warned = true; \
- WARN(1, format); \
- } \
- unlikely(__ret_warn_once); \
-})
+#define WARN_ONCE(condition, format...) \
+ DO_ONCE_LITE_IF(condition, WARN, 1, format)
-#define WARN_TAINT_ONCE(condition, taint, format...) ({ \
- static bool __section(".data.once") __warned; \
- int __ret_warn_once = !!(condition); \
- \
- if (unlikely(__ret_warn_once && !__warned)) { \
- __warned = true; \
- WARN_TAINT(1, taint, format); \
- } \
- unlikely(__ret_warn_once); \
-})
+#define WARN_TAINT_ONCE(condition, taint, format...) \
+ DO_ONCE_LITE_IF(condition, WARN_TAINT, 1, taint, format)
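To make the refactoring above concrete, here is a small standalone userspace
sketch (GNU C; an assumed illustration, not the kernel's actual once_lite.h) of
the pattern DO_ONCE_LITE_IF() factors out of the *_ONCE() macros: evaluate the
condition every time, latch a static flag and call the handler only on the
first true evaluation, and return whether the condition was true.

	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative stand-in for DO_ONCE_LITE_IF(); each expansion site
	 * gets its own static once-flag. */
	#define do_once_lite_if(cond, func, ...)			\
	({								\
		static bool __already_done;				\
		bool __ret_once = !!(cond);				\
									\
		if (__ret_once && !__already_done) {			\
			__already_done = true;				\
			func(__VA_ARGS__);				\
		}							\
		__ret_once;						\
	})

	static void warn_msg(const char *msg)
	{
		fprintf(stderr, "WARNING: %s\n", msg);
	}

	int main(void)
	{
		for (int i = 0; i < 3; i++) {
			/* The warning fires only on the first iteration,
			 * but the condition is evaluated every time. */
			do_once_lite_if(i >= 0, warn_msg, "condition hit");
		}
		return 0;
	}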
#else /* !CONFIG_BUG */
#ifndef HAVE_ARCH_BUG
/* Section for code which can't be instrumented at all */
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
- __no_kcsan __no_sanitize_address __no_profile
- __no_kcsan __no_sanitize_address __no_sanitize_coverage
++ __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
#endif /* __KERNEL__ */
#include <linux/shm.h>
#include <asm/tlbflush.h>
+ /*
+ * For a HugeTLB page, there is more metadata to save in the struct page. But
+ * the head struct page cannot meet our needs, so we have to abuse other tail
+ * struct pages to store the metadata. In order to avoid conflicts caused by
+ * subsequent use of more tail struct pages, we gather these discrete indexes
+ * of tail struct pages here.
+ */
+ enum {
+ SUBPAGE_INDEX_SUBPOOL = 1, /* reuse page->private */
+ #ifdef CONFIG_CGROUP_HUGETLB
+ SUBPAGE_INDEX_CGROUP, /* reuse page->private */
+ SUBPAGE_INDEX_CGROUP_RSVD, /* reuse page->private */
+ __MAX_CGROUP_SUBPAGE_INDEX = SUBPAGE_INDEX_CGROUP_RSVD,
+ #endif
+ __NR_USED_SUBPAGE,
+ };
+
struct hugepage_subpool {
spinlock_t lock;
long count;
extern const struct file_operations hugetlbfs_file_operations;
extern const struct vm_operations_struct hugetlb_vm_ops;
struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
- struct user_struct **user, int creat_flags,
+ struct ucounts **ucounts, int creat_flags,
int page_size_log);
static inline bool is_file_hugepages(struct file *file)
#define is_file_hugepages(file) false
static inline struct file *
hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag,
- struct user_struct **user, int creat_flags,
+ struct ucounts **ucounts, int creat_flags,
int page_size_log)
{
return ERR_PTR(-ENOSYS);
* modifications require hugetlb_lock.
* HPG_freed - Set when page is on the free lists.
* Synchronization: hugetlb_lock held for examination and modification.
+ * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
*/
enum hugetlb_page_flags {
HPG_restore_reserve = 0,
HPG_migratable,
HPG_temporary,
HPG_freed,
+ HPG_vmemmap_optimized,
__NR_HPAGEFLAGS,
};
HPAGEFLAG(Migratable, migratable)
HPAGEFLAG(Temporary, temporary)
HPAGEFLAG(Freed, freed)
+ HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
#ifdef CONFIG_HUGETLB_PAGE
unsigned int nr_huge_pages_node[MAX_NUMNODES];
unsigned int free_huge_pages_node[MAX_NUMNODES];
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+ #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+ unsigned int nr_free_vmemmap_pages;
+ #endif
#ifdef CONFIG_CGROUP_HUGETLB
/* cgroup control files */
struct cftype cgroup_files_dfl[7];
*/
static inline struct hugepage_subpool *hugetlb_page_subpool(struct page *hpage)
{
- return (struct hugepage_subpool *)(hpage+1)->private;
+ return (void *)page_private(hpage + SUBPAGE_INDEX_SUBPOOL);
}
static inline void hugetlb_set_page_subpool(struct page *hpage,
struct hugepage_subpool *subpool)
{
- set_page_private(hpage+1, (unsigned long)subpool);
+ set_page_private(hpage + SUBPAGE_INDEX_SUBPOOL, (unsigned long)subpool);
}
static inline struct hstate *hstate_file(struct file *f)
#endif
#ifndef arch_make_huge_pte
- static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
- struct page *page, int writable)
+ static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
+ vm_flags_t flags)
{
return entry;
}
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
+ static inline struct hugepage_subpool *hugetlb_page_subpool(struct page *hpage)
+ {
+ return NULL;
+ }
+
static inline int isolate_or_dissolve_huge_page(struct page *page,
struct list_head *list)
{
}
#endif /* CONFIG_HUGETLB_PAGE */
+ #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+ extern bool hugetlb_free_vmemmap_enabled;
+ #else
+ #define hugetlb_free_vmemmap_enabled false
+ #endif
+
static inline spinlock_t *huge_pte_lock(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
#include <linux/types.h>
#include <linux/compiler.h>
#include <linux/bitops.h>
+ #include <linux/kstrtox.h>
#include <linux/log2.h>
#include <linux/math.h>
#include <linux/minmax.h>
#include <linux/typecheck.h>
+ #include <linux/panic.h>
#include <linux/printk.h>
#include <linux/build_bug.h>
#include <linux/static_call_types.h>
*/
#define lower_32_bits(n) ((u32)((n) & 0xffffffff))
+/**
+ * upper_16_bits - return bits 16-31 of a number
+ * @n: the number we're accessing
+ */
+#define upper_16_bits(n) ((u16)((n) >> 16))
+
+/**
+ * lower_16_bits - return bits 0-15 of a number
+ * @n: the number we're accessing
+ */
+#define lower_16_bits(n) ((u16)((n) & 0xffff))
+
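/*
 * Usage sketch (illustrative values only): split a 32-bit register value
 * into its 16-bit halves.
 *
 *	u32 reg = 0xdead0042;
 *	u16 hi  = upper_16_bits(reg);	/* 0xdead *\/
 *	u16 lo  = lower_16_bits(reg);	/* 0x0042 *\/
 */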
struct completion;
- struct pt_regs;
struct user;
#ifdef CONFIG_PREEMPT_VOLUNTARY
static inline void might_fault(void) { }
#endif
- extern struct atomic_notifier_head panic_notifier_list;
- extern long (*panic_blink)(int state);
- __printf(1, 2)
- void panic(const char *fmt, ...) __noreturn __cold;
- void nmi_panic(struct pt_regs *regs, const char *msg);
- extern void oops_enter(void);
- extern void oops_exit(void);
- extern bool oops_may_print(void);
void do_exit(long error_code) __noreturn;
void complete_and_exit(struct completion *, long) __noreturn;
- /* Internal, do not use. */
- int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
- int __must_check _kstrtol(const char *s, unsigned int base, long *res);
-
- int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
- int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
-
- /**
- * kstrtoul - convert a string to an unsigned long
- * @s: The start of the string. The string must be null-terminated, and may also
- * include a single newline before its terminating null. The first character
- * may also be a plus sign, but not a minus sign.
- * @base: The number base to use. The maximum supported base is 16. If base is
- * given as 0, then the base of the string is automatically detected with the
- * conventional semantics - If it begins with 0x the number will be parsed as a
- * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
- * parsed as an octal number. Otherwise it will be parsed as a decimal.
- * @res: Where to write the result of the conversion on success.
- *
- * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
- * Preferred over simple_strtoul(). Return code must be checked.
- */
- static inline int __must_check kstrtoul(const char *s, unsigned int base, unsigned long *res)
- {
- /*
- * We want to shortcut function call, but
- * __builtin_types_compatible_p(unsigned long, unsigned long long) = 0.
- */
- if (sizeof(unsigned long) == sizeof(unsigned long long) &&
- __alignof__(unsigned long) == __alignof__(unsigned long long))
- return kstrtoull(s, base, (unsigned long long *)res);
- else
- return _kstrtoul(s, base, res);
- }
-
- /**
- * kstrtol - convert a string to a long
- * @s: The start of the string. The string must be null-terminated, and may also
- * include a single newline before its terminating null. The first character
- * may also be a plus sign or a minus sign.
- * @base: The number base to use. The maximum supported base is 16. If base is
- * given as 0, then the base of the string is automatically detected with the
- * conventional semantics - If it begins with 0x the number will be parsed as a
- * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
- * parsed as an octal number. Otherwise it will be parsed as a decimal.
- * @res: Where to write the result of the conversion on success.
- *
- * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
- * Preferred over simple_strtol(). Return code must be checked.
- */
- static inline int __must_check kstrtol(const char *s, unsigned int base, long *res)
- {
- /*
- * We want to shortcut function call, but
- * __builtin_types_compatible_p(long, long long) = 0.
- */
- if (sizeof(long) == sizeof(long long) &&
- __alignof__(long) == __alignof__(long long))
- return kstrtoll(s, base, (long long *)res);
- else
- return _kstrtol(s, base, res);
- }
-
- int __must_check kstrtouint(const char *s, unsigned int base, unsigned int *res);
- int __must_check kstrtoint(const char *s, unsigned int base, int *res);
-
- static inline int __must_check kstrtou64(const char *s, unsigned int base, u64 *res)
- {
- return kstrtoull(s, base, res);
- }
-
- static inline int __must_check kstrtos64(const char *s, unsigned int base, s64 *res)
- {
- return kstrtoll(s, base, res);
- }
-
- static inline int __must_check kstrtou32(const char *s, unsigned int base, u32 *res)
- {
- return kstrtouint(s, base, res);
- }
-
- static inline int __must_check kstrtos32(const char *s, unsigned int base, s32 *res)
- {
- return kstrtoint(s, base, res);
- }
-
- int __must_check kstrtou16(const char *s, unsigned int base, u16 *res);
- int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
- int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
- int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
- int __must_check kstrtobool(const char *s, bool *res);
-
- int __must_check kstrtoull_from_user(const char __user *s, size_t count, unsigned int base, unsigned long long *res);
- int __must_check kstrtoll_from_user(const char __user *s, size_t count, unsigned int base, long long *res);
- int __must_check kstrtoul_from_user(const char __user *s, size_t count, unsigned int base, unsigned long *res);
- int __must_check kstrtol_from_user(const char __user *s, size_t count, unsigned int base, long *res);
- int __must_check kstrtouint_from_user(const char __user *s, size_t count, unsigned int base, unsigned int *res);
- int __must_check kstrtoint_from_user(const char __user *s, size_t count, unsigned int base, int *res);
- int __must_check kstrtou16_from_user(const char __user *s, size_t count, unsigned int base, u16 *res);
- int __must_check kstrtos16_from_user(const char __user *s, size_t count, unsigned int base, s16 *res);
- int __must_check kstrtou8_from_user(const char __user *s, size_t count, unsigned int base, u8 *res);
- int __must_check kstrtos8_from_user(const char __user *s, size_t count, unsigned int base, s8 *res);
- int __must_check kstrtobool_from_user(const char __user *s, size_t count, bool *res);
-
- static inline int __must_check kstrtou64_from_user(const char __user *s, size_t count, unsigned int base, u64 *res)
- {
- return kstrtoull_from_user(s, count, base, res);
- }
-
- static inline int __must_check kstrtos64_from_user(const char __user *s, size_t count, unsigned int base, s64 *res)
- {
- return kstrtoll_from_user(s, count, base, res);
- }
-
- static inline int __must_check kstrtou32_from_user(const char __user *s, size_t count, unsigned int base, u32 *res)
- {
- return kstrtouint_from_user(s, count, base, res);
- }
-
- static inline int __must_check kstrtos32_from_user(const char __user *s, size_t count, unsigned int base, s32 *res)
- {
- return kstrtoint_from_user(s, count, base, res);
- }
-
- /*
- * Use kstrto<foo> instead.
- *
- * NOTE: simple_strto<foo> does not check for the range overflow and,
- * depending on the input, may give interesting results.
- *
- * Use these functions if and only if you cannot use kstrto<foo>, because
- * the conversion ends on the first non-digit character, which may be far
- * beyond the supported range. It might be useful to parse the strings like
- * 10x50 or 12:21 without altering original string or temporary buffer in use.
- * Keep in mind above caveat.
- */
-
- extern unsigned long simple_strtoul(const char *,char **,unsigned int);
- extern long simple_strtol(const char *,char **,unsigned int);
- extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
- extern long long simple_strtoll(const char *,char **,unsigned int);
-
extern int num_to_str(char *buf, int size,
unsigned long long num, unsigned int width);
extern int kernel_text_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);
- #ifdef CONFIG_SMP
- extern unsigned int sysctl_oops_all_cpu_backtrace;
- #else
- #define sysctl_oops_all_cpu_backtrace 0
- #endif /* CONFIG_SMP */
-
extern void bust_spinlocks(int yes);
- extern int panic_timeout;
- extern unsigned long panic_print;
- extern int panic_on_oops;
- extern int panic_on_unrecovered_nmi;
- extern int panic_on_io_nmi;
- extern int panic_on_warn;
- extern unsigned long panic_on_taint;
- extern bool panic_on_taint_nousertaint;
- extern int sysctl_panic_on_rcu_stall;
- extern int sysctl_max_rcu_stall_to_panic;
- extern int sysctl_panic_on_stackoverflow;
-
- extern bool crash_kexec_post_notifiers;
- /*
- * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
- * holds a CPU number which is executing panic() currently. A value of
- * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
- */
- extern atomic_t panic_cpu;
- #define PANIC_CPU_INVALID -1
-
- /*
- * Only to be used by arch init code. If the user over-wrote the default
- * CONFIG_PANIC_TIMEOUT, honor it.
- */
- static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
- {
- if (panic_timeout == arch_default_timeout)
- panic_timeout = timeout;
- }
- extern const char *print_tainted(void);
- enum lockdep_ok {
- LOCKDEP_STILL_OK,
- LOCKDEP_NOW_UNRELIABLE
- };
- extern void add_taint(unsigned flag, enum lockdep_ok);
- extern int test_taint(unsigned flag);
- extern unsigned long get_taint(void);
extern int root_mountflags;
extern bool early_boot_irqs_disabled;
SYSTEM_SUSPEND,
} system_state;
- /* This cannot be an enum because some may be used in assembly source. */
- #define TAINT_PROPRIETARY_MODULE 0
- #define TAINT_FORCED_MODULE 1
- #define TAINT_CPU_OUT_OF_SPEC 2
- #define TAINT_FORCED_RMMOD 3
- #define TAINT_MACHINE_CHECK 4
- #define TAINT_BAD_PAGE 5
- #define TAINT_USER 6
- #define TAINT_DIE 7
- #define TAINT_OVERRIDDEN_ACPI_TABLE 8
- #define TAINT_WARN 9
- #define TAINT_CRAP 10
- #define TAINT_FIRMWARE_WORKAROUND 11
- #define TAINT_OOT_MODULE 12
- #define TAINT_UNSIGNED_MODULE 13
- #define TAINT_SOFTLOCKUP 14
- #define TAINT_LIVEPATCH 15
- #define TAINT_AUX 16
- #define TAINT_RANDSTRUCT 17
- #define TAINT_FLAGS_COUNT 18
- #define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
-
- struct taint_flag {
- char c_true; /* character printed when tainted */
- char c_false; /* character printed when not tainted */
- bool module; /* also show as a per-module taint flag */
- };
-
- extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];
-
extern const char hex_asc[];
#define hex_asc_lo(x) hex_asc[((x) & 0x0f)]
#define hex_asc_hi(x) hex_asc[((x) & 0xf0) >> 4]
typedef int (*kprobe_pre_handler_t) (struct kprobe *, struct pt_regs *);
typedef void (*kprobe_post_handler_t) (struct kprobe *, struct pt_regs *,
unsigned long flags);
-typedef int (*kprobe_fault_handler_t) (struct kprobe *, struct pt_regs *,
- int trapnr);
typedef int (*kretprobe_handler_t) (struct kretprobe_instance *,
struct pt_regs *);
/* Called after addr is executed, unless... */
kprobe_post_handler_t post_handler;
- /*
- * ... called if executing addr causes a fault (eg. page fault).
- * Return 1 if it handled fault, otherwise kernel will see it.
- */
- kprobe_fault_handler_t fault_handler;
-
/* Saved opcode (which has been replaced with breakpoint) */
kprobe_opcode_t opcode;
void dump_kprobe(struct kprobe *kp);
void *alloc_insn_page(void);
- void free_insn_page(void *page);
int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
char *sym);
#define lm_alias(x) __va(__pa_symbol(x))
#endif
-/*
- * With CONFIG_CFI_CLANG, the compiler replaces function addresses in
- * instrumented C code with jump table addresses. Architectures that
- * support CFI can define this macro to return the actual function address
- * when needed.
- */
-#ifndef function_nocfi
-#define function_nocfi(x) (x)
-#endif
-
/*
* To prevent common memory management code establishing
* a zero page mapping on a read fault.
/* This function must be updated when the size of struct page grows above 80
* or reduces below 56. The idea that compiler optimizes out switch()
* statement, and only leaves move/store instructions. Also the compiler can
- * combine write statments if they are both assignments and can be reordered,
+ * combine write statements if they are both assignments and can be reordered,
* this can result in several of the writes here being dropped.
*/
#define mm_zero_struct_page(pp) __mm_zero_struct_page(pp)
pud_t *pud; /* Pointer to pud entry matching
* the 'address'
*/
- pte_t orig_pte; /* Value of PTE at the time of fault */
+ union {
+ pte_t orig_pte; /* Value of PTE at the time of fault */
+ pmd_t orig_pmd; /* Value of PMD at the time of fault,
+ * used by PMD fault only.
+ */
+ };
struct page *cow_page; /* Page handler may use for COW fault */
struct page *page; /* ->fault handlers should return a
static inline bool page_is_pfmemalloc(const struct page *page)
{
/*
- * Page index cannot be this large so this must be
- * a pfmemalloc page.
+ * lru.next has bit 1 set if the page is allocated from the
+ * pfmemalloc reserves. Callers may simply overwrite it if
+ * they do not need to preserve that information.
*/
- return page->index == -1UL;
+ return (uintptr_t)page->lru.next & BIT(1);
}
/*
*/
static inline void set_page_pfmemalloc(struct page *page)
{
- page->index = -1UL;
+ page->lru.next = (void *)BIT(1);
}
static inline void clear_page_pfmemalloc(struct page *page)
{
- page->index = 0;
+ page->lru.next = NULL;
}
/*
#else
static inline bool can_do_mlock(void) { return false; }
#endif
-extern int user_shm_lock(size_t, struct user_struct *);
-extern void user_shm_unlock(size_t, struct user_struct *);
+extern int user_shm_lock(size_t, struct ucounts *);
+extern void user_shm_unlock(size_t, struct ucounts *);
/*
* Parameter block passed down to zap_pte_range in exceptional cases.
}
#endif
+ int vmemmap_remap_free(unsigned long start, unsigned long end,
+ unsigned long reuse);
+ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
+ unsigned long reuse, gfp_t gfp_mask);
+
void *sparse_buffer_alloc(unsigned long size);
struct page * __populate_section_memmap(unsigned long pfn,
unsigned long nr_pages, int nid, struct vmem_altmap *altmap);
unsigned long private;
};
struct { /* page_pool used by netstack */
+ /**
+ * @pp_magic: magic value to avoid recycling non
+ * page_pool allocated pages.
+ */
+ unsigned long pp_magic;
+ struct page_pool *pp;
+ unsigned long _pp_mapping_pad;
/**
* @dma_addr: might require a 64-bit value on
* 32-bit architectures.
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
- /* Base adresses for compatible mmap() */
+ /* Base addresses for compatible mmap() */
unsigned long mmap_compat_base;
unsigned long mmap_compat_legacy_base;
#endif
#endif
#ifdef CONFIG_64BIT
PG_arch_2,
+#endif
+#ifdef CONFIG_KASAN_HW_TAGS
+ PG_skip_kasan_poison,
#endif
__NR_PAGEFLAGS,
PAGEFLAG(Idle, idle, PF_ANY)
#endif
+#ifdef CONFIG_KASAN_HW_TAGS
+PAGEFLAG(SkipKASanPoison, skip_kasan_poison, PF_HEAD)
+#else
+PAGEFLAG_FALSE(SkipKASanPoison)
+#endif
+
/*
* PageReported() is used to track reported free pages within the Buddy
* allocator. We can use the non-atomic version of the test and set
TESTSCFLAG_FALSE(DoubleMap)
#endif
+ /*
+ * Check if a page is currently marked HWPoisoned. Note that this check is
+ * best effort only and inherently racy: there is no way to synchronize with
+ * failing hardware.
+ */
+ static inline bool is_page_hwpoison(struct page *page)
+ {
+ if (PageHWPoison(page))
+ return true;
+ return PageHuge(page) && PageHWPoison(compound_head(page));
+ }
+
/*
* For pages that are never mapped to userspace (and aren't PageSlab),
* page_type may be used. Because it is initialised to -1, we invert the
* relies on this feature is aware that re-onlining the memory block will
* require to re-set the pages PageOffline() and not giving them to the
* buddy via online_page_callback_t.
+ *
+ * There are drivers that mark a page PageOffline() and expect there won't be
+ * any further access to page content. PFN walkers that read content of random
+ * pages should check PageOffline() and synchronize with such drivers using
+ * page_offline_freeze()/page_offline_thaw().
*/
PAGE_TYPE_OPS(Offline, offline)
+ extern void page_offline_freeze(void);
+ extern void page_offline_thaw(void);
+ extern void page_offline_begin(void);
+ extern void page_offline_end(void);
+
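/*
 * Assumed usage sketch (hypothetical walker, not part of this series):
 * per the comment above, a PFN walker holds page_offline_freeze() across
 * its walk so drivers cannot transition pages to or from PageOffline()
 * while their contents are being inspected.
 */
static inline void example_walk_pfn_range(unsigned long start_pfn,
					  unsigned long end_pfn)
{
	unsigned long pfn;

	page_offline_freeze();
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);

		/* PageOffline() content must not be read or written. */
		if (!page || PageOffline(page))
			continue;
		/* ... safe to access the page contents here ... */
	}
	page_offline_thaw();
}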
/*
* Marks pages in use as page tables.
*/
extern int shmem_zero_setup(struct vm_area_struct *);
extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
-extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
#ifdef CONFIG_SHMEM
extern const struct address_space_operations shmem_aops;
static inline bool shmem_mapping(struct address_space *mapping)
extern bool shmem_charge(struct inode *inode, long pages);
extern void shmem_uncharge(struct inode *inode, long pages);
+ #ifdef CONFIG_USERFAULTFD
#ifdef CONFIG_SHMEM
- extern int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
+ bool zeropage,
struct page **pagep);
- extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr);
- #else
- #define shmem_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
- src_addr, pagep) ({ BUG(); 0; })
- #define shmem_mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, \
- dst_addr) ({ BUG(); 0; })
- #endif
+ #else /* !CONFIG_SHMEM */
+ #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
+ src_addr, zeropage, pagep) ({ BUG(); 0; })
+ #endif /* CONFIG_SHMEM */
+ #endif /* CONFIG_USERFAULTFD */
#endif
*/
rcu_read_lock();
tsk = find_task_by_pid_ns(pid, &init_pid_ns);
+ tsk->flags |= PF_NO_SETAFFINITY;
set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
rcu_read_unlock();
rest_init();
}
+ static void __init print_unknown_bootoptions(void)
+ {
+ char *unknown_options;
+ char *end;
+ const char *const *p;
+ size_t len;
+
+ if (panic_later || (!argv_init[1] && !envp_init[2]))
+ return;
+
+ /*
+ * Determine how many options we have to print out, plus a space
+ * before each
+ */
+ len = 1; /* null terminator */
+ for (p = &argv_init[1]; *p; p++) {
+ len++;
+ len += strlen(*p);
+ }
+ for (p = &envp_init[2]; *p; p++) {
+ len++;
+ len += strlen(*p);
+ }
+
+ unknown_options = memblock_alloc(len, SMP_CACHE_BYTES);
+ if (!unknown_options) {
+ pr_err("%s: Failed to allocate %zu bytes\n",
+ __func__, len);
+ return;
+ }
+ end = unknown_options;
+
+ for (p = &argv_init[1]; *p; p++)
+ end += sprintf(end, " %s", *p);
+ for (p = &envp_init[2]; *p; p++)
+ end += sprintf(end, " %s", *p);
+
+ pr_notice("Unknown command line parameters:%s\n", unknown_options);
+ memblock_free(__pa(unknown_options), len);
+ }
+
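/*
 * Illustrative output (assumed example values): if "foo=bar" and "baz" were
 * passed on the command line and no built-in handler claimed them, the
 * pr_notice() above would produce a boot-log line such as:
 *
 *	Unknown command line parameters: foo=bar baz
 */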
asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
{
char *command_line;
static_command_line, __start___param,
__stop___param - __start___param,
-1, -1, NULL, &unknown_bootoption);
+ print_unknown_bootoptions();
if (!IS_ERR_OR_NULL(after_dashes))
parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
NULL, set_init_arg);
* time - but meanwhile we still have a functioning scheduler.
*/
sched_init();
- /*
- * Disable preemption - early bootup scheduling is extremely
- * fragile until we cpu_idle() for the first time.
- */
- preempt_disable();
+
if (WARN(!irqs_disabled(),
"Interrupts were enabled *very* early, fixing it\n"))
local_irq_disable();
{
int ret;
+ /*
+ * Wait until kthreadd is all set-up.
+ */
+ wait_for_completion(&kthreadd_done);
+
kernel_init_freeable();
/* need to finish all async __init code before freeing the memory */
async_synchronize_full();
static noinline void __init kernel_init_freeable(void)
{
- /*
- * Wait until kthreadd is all set-up.
- */
- wait_for_completion(&kthreadd_done);
-
/* Now the scheduler is fully set up and can do blocking allocations */
gfp_allowed_mask = __GFP_BITS_MASK;
time64_t shm_ctim;
struct pid *shm_cprid;
struct pid *shm_lprid;
- struct user_struct *mlock_user;
+ struct ucounts *mlock_ucounts;
/* The task created the shm object. NULL if the task is dead. */
struct task_struct *shm_creator;
struct shmid_kernel *shp = container_of(ptr, struct shmid_kernel,
shm_perm);
security_shm_free(&shp->shm_perm);
- kvfree(shp);
+ kfree(shp);
}
static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
shm_rmid(ns, shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
- shmem_lock(shm_file, 0, shp->mlock_user);
- else if (shp->mlock_user)
+ shmem_lock(shm_file, 0, shp->mlock_ucounts);
+ else if (shp->mlock_ucounts)
user_shm_unlock(i_size_read(file_inode(shm_file)),
- shp->mlock_user);
+ shp->mlock_ucounts);
fput(shm_file);
ipc_update_pid(&shp->shm_cprid, NULL);
ipc_update_pid(&shp->shm_lprid, NULL);
ns->shm_tot + numpages > ns->shm_ctlall)
return -ENOSPC;
- shp = kvmalloc(sizeof(*shp), GFP_KERNEL);
+ shp = kmalloc(sizeof(*shp), GFP_KERNEL);
if (unlikely(!shp))
return -ENOMEM;
shp->shm_perm.key = key;
shp->shm_perm.mode = (shmflg & S_IRWXUGO);
- shp->mlock_user = NULL;
+ shp->mlock_ucounts = NULL;
shp->shm_perm.security = NULL;
error = security_shm_alloc(&shp->shm_perm);
if (error) {
- kvfree(shp);
+ kfree(shp);
return error;
}
if (shmflg & SHM_NORESERVE)
acctflag = VM_NORESERVE;
file = hugetlb_file_setup(name, hugesize, acctflag,
- &shp->mlock_user, HUGETLB_SHMFS_INODE,
+ &shp->mlock_ucounts, HUGETLB_SHMFS_INODE,
(shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
} else {
/*
no_id:
ipc_update_pid(&shp->shm_cprid, NULL);
ipc_update_pid(&shp->shm_lprid, NULL);
- if (is_file_hugepages(file) && shp->mlock_user)
- user_shm_unlock(size, shp->mlock_user);
+ if (is_file_hugepages(file) && shp->mlock_ucounts)
+ user_shm_unlock(size, shp->mlock_ucounts);
fput(file);
ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
return error;
goto out_unlock0;
if (cmd == SHM_LOCK) {
- struct user_struct *user = current_user();
+ struct ucounts *ucounts = current_ucounts();
- err = shmem_lock(shm_file, 1, user);
+ err = shmem_lock(shm_file, 1, ucounts);
if (!err && !(shp->shm_perm.mode & SHM_LOCKED)) {
shp->shm_perm.mode |= SHM_LOCKED;
- shp->mlock_user = user;
+ shp->mlock_ucounts = ucounts;
}
goto out_unlock0;
}
/* SHM_UNLOCK */
if (!(shp->shm_perm.mode & SHM_LOCKED))
goto out_unlock0;
- shmem_lock(shm_file, 0, shp->mlock_user);
+ shmem_lock(shm_file, 0, shp->mlock_ucounts);
shp->shm_perm.mode &= ~SHM_LOCKED;
- shp->mlock_user = NULL;
+ shp->mlock_ucounts = NULL;
get_file(shm_file);
ipc_unlock_object(&shp->shm_perm);
rcu_read_unlock();
#include <linux/kthread.h>
#include <linux/lockdep.h>
#include <linux/export.h>
+ #include <linux/panic_notifier.h>
#include <linux/sysctl.h>
#include <linux/suspend.h>
#include <linux/utsname.h>
last_break = jiffies;
}
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
- if (t->state == TASK_UNINTERRUPTIBLE)
+ if (READ_ONCE(t->__state) == TASK_UNINTERRUPTIBLE)
check_hung_task(t, timeout);
}
unlock:
return module_alloc(PAGE_SIZE);
}
- void __weak free_insn_page(void *page)
+ static void free_insn_page(void *page)
{
module_memfree(page);
}
}
NOKPROBE_SYMBOL(aggr_post_handler);
-static int aggr_fault_handler(struct kprobe *p, struct pt_regs *regs,
- int trapnr)
-{
- struct kprobe *cur = __this_cpu_read(kprobe_instance);
-
- /*
- * if we faulted "during" the execution of a user specified
- * probe handler, invoke just that probe's fault handler
- */
- if (cur && cur->fault_handler) {
- if (cur->fault_handler(cur, regs, trapnr))
- return 1;
- }
- return 0;
-}
-NOKPROBE_SYMBOL(aggr_fault_handler);
-
/* Walks the list and increments nmissed count for multiprobe case */
void kprobes_inc_nmissed_count(struct kprobe *p)
{
ap->addr = p->addr;
ap->flags = p->flags & ~KPROBE_FLAG_OPTIMIZED;
ap->pre_handler = aggr_pre_handler;
- ap->fault_handler = aggr_fault_handler;
/* We don't care the kprobe which has gone. */
if (p->post_handler && !kprobe_gone(p))
ap->post_handler = aggr_post_handler;
rp->kp.pre_handler = pre_handler_kretprobe;
rp->kp.post_handler = NULL;
- rp->kp.fault_handler = NULL;
/* Pre-allocate memory for max kretprobe instances */
if (rp->maxactive <= 0) {
int override_rlimit, const unsigned int sigqueue_flags)
{
struct sigqueue *q = NULL;
- struct user_struct *user;
- int sigpending;
+ struct ucounts *ucounts = NULL;
+ long sigpending;
/*
* Protect access to @t credentials. This can go away when all
* changes from/to zero.
*/
rcu_read_lock();
- user = __task_cred(t)->user;
- sigpending = atomic_inc_return(&user->sigpending);
+ ucounts = task_ucounts(t);
+ sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
if (sigpending == 1)
- get_uid(user);
+ ucounts = get_ucounts(ucounts);
rcu_read_unlock();
- if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
+ if (override_rlimit || (sigpending < LONG_MAX && sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
} else {
print_dropped_signal(sig);
}
if (unlikely(q == NULL)) {
- if (atomic_dec_and_test(&user->sigpending))
- free_uid(user);
+ if (ucounts && dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
+ put_ucounts(ucounts);
} else {
INIT_LIST_HEAD(&q->list);
q->flags = sigqueue_flags;
- q->user = user;
+ q->ucounts = ucounts;
}
-
return q;
}
{
if (q->flags & SIGQUEUE_PREALLOC)
return;
- if (atomic_dec_and_test(&q->user->sigpending))
- free_uid(q->user);
+ if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
+ put_ucounts(q->ucounts);
+ q->ucounts = NULL;
+ }
kmem_cache_free(sigqueue_cachep, q);
}
if (!(ksig->ka.sa.sa_flags & SA_NODEFER))
sigaddset(&blocked, ksig->sig);
set_current_blocked(&blocked);
+ if (current->sas_ss_flags & SS_AUTODISARM)
+ sas_ss_reset(current);
tracehook_signal_handler(stepping);
}
int err = __put_user((void __user *)t->sas_ss_sp, &uss->ss_sp) |
__put_user(t->sas_ss_flags, &uss->ss_flags) |
__put_user(t->sas_ss_size, &uss->ss_size);
- if (err)
- return err;
- if (t->sas_ss_flags & SS_AUTODISARM)
- sas_ss_reset(t);
- return 0;
+ return err;
}
#ifdef CONFIG_COMPAT
&uss->ss_sp) |
__put_user(t->sas_ss_flags, &uss->ss_flags) |
__put_user(t->sas_ss_size, &uss->ss_size);
- if (err)
- return err;
- if (t->sas_ss_flags & SS_AUTODISARM)
- sas_ss_reset(t);
- return 0;
+ return err;
}
#endif
}
new_t = kdb_prev_t != t;
kdb_prev_t = t;
- if (t->state != TASK_RUNNING && new_t) {
+ if (!task_is_running(t) && new_t) {
spin_unlock(&t->sighand->siglock);
kdb_printf("Process is not RUNNING, sending a signal from "
"kdb risks deadlock\n"
#include <linux/sysctl.h>
#include <linux/bitmap.h>
#include <linux/signal.h>
+ #include <linux/panic.h>
#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/security.h>
#include <linux/coredump.h>
#include <linux/latencytop.h>
#include <linux/pid.h>
+#include <linux/delayacct.h>
#include "../lib/kstrtox.h"
void *buffer, size_t *lenp, loff_t *ppos)
{
int err = 0;
- bool first = 1;
size_t left = *lenp;
unsigned long bitmap_len = table->maxlen;
unsigned long *bitmap = *(unsigned long **) table->data;
}
bitmap_set(tmp_bitmap, val_a, val_b - val_a + 1);
- first = 0;
proc_skip_char(&p, &left, '\n');
}
left += skipped;
} else {
unsigned long bit_a, bit_b = 0;
+ bool first = 1;
while (left) {
bit_a = find_next_bit(bitmap, bitmap_len, bit_b);
.extra2 = SYSCTL_ONE,
},
#endif /* CONFIG_SCHEDSTATS */
+#ifdef CONFIG_TASK_DELAY_ACCT
+ {
+ .procname = "task_delayacct",
+ .data = NULL,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = sysctl_delayacct,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#endif /* CONFIG_TASK_DELAY_ACCT */
#ifdef CONFIG_NUMA_BALANCING
{
.procname = "numa_balancing",
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
},
- {
- .procname = "block_dump",
- .data = &block_dump,
- .maxlen = sizeof(block_dump),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- },
{
.procname = "vfs_cache_pressure",
.data = &sysctl_vfs_cache_pressure,
bool
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
select STACKTRACE
- depends on FRAME_POINTER || MIPS || PPC || S390 || MICROBLAZE || ARM || ARC || X86
select KALLSYMS
select KALLSYMS_ALL
config TEST_PRINTF
tristate "Test printf() family of functions at runtime"
+config TEST_SCANF
+ tristate "Test scanf() family of functions at runtime"
+
config TEST_BITMAP
tristate "Test bitmap_*() family of functions at runtime"
help
If unsure, say N.
+ config RATIONAL_KUNIT_TEST
+ tristate "KUnit test for rational.c" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ select RATIONAL
+ default KUNIT_ALL_TESTS
+ help
+ This builds the rational math unit test.
+ For more information on KUnit and unit tests in general please refer
+ to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+ If unsure, say N.
+
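# Illustrative .kunitconfig fragment (assumed workflow, not part of this
# hunk) for building the rational.c test with the KUnit tooling:
#
#	CONFIG_KUNIT=y
#	CONFIG_RATIONAL_KUNIT_TEST=y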
config TEST_UDELAY
tristate "udelay test driver"
help
If unsure, say N.
+config TEST_CLOCKSOURCE_WATCHDOG
+ tristate "Test clocksource watchdog in kernel space"
+ depends on CLOCKSOURCE_WATCHDOG
+ help
+ Enable this option to create a kernel module that will trigger
+ a test of the clocksource watchdog. This module may be loaded
+ via modprobe or insmod, in which case it will run upon being
+ loaded, or it may be built in, in which case it will run
+ shortly after boot.
+
+ If unsure, say N.
+
endif # RUNTIME_TESTING_MENU
config ARCH_USE_MEMTEST
*/
#include <linux/ctype.h>
#include <linux/errno.h>
- #include <linux/kernel.h>
- #include <linux/math64.h>
#include <linux/export.h>
+ #include <linux/kstrtox.h>
+ #include <linux/math64.h>
#include <linux/types.h>
#include <linux/uaccess.h>
+
#include "kstrtox.h"
const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
/*
* Convert non-negative integer string representation in explicitly given radix
- * to an integer.
+ * to an integer. A maximum of max_chars characters will be converted.
+ *
* Return number of characters consumed maybe or-ed with overflow bit.
* If overflow occurs, result integer (incorrect) is still returned.
*
* Don't you dare use this function.
*/
-unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned long long *p,
+ size_t max_chars)
{
unsigned long long res;
unsigned int rv;
res = 0;
rv = 0;
- while (1) {
+ while (max_chars--) {
unsigned int c = *s;
unsigned int lc = c | 0x20; /* don't tolower() this line */
unsigned int val;
return rv;
}
+unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+{
+ return _parse_integer_limit(s, base, p, INT_MAX);
+}
+
static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
{
unsigned long long _res;
#include <linux/string_helpers.h>
#include "kstrtox.h"
+static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
+ char **endp, unsigned int base)
+{
+ const char *cp;
+ unsigned long long result = 0ULL;
+ size_t prefix_chars;
+ unsigned int rv;
+
+ cp = _parse_integer_fixup_radix(startp, &base);
+ prefix_chars = cp - startp;
+ if (prefix_chars < max_chars) {
+ rv = _parse_integer_limit(cp, base, &result, max_chars - prefix_chars);
+ /* FIXME */
+ cp += (rv & ~KSTRTOX_OVERFLOW);
+ } else {
+ /* Field too short for prefix + digit, skip over without converting */
+ cp = startp + max_chars;
+ }
+
+ if (endp)
+ *endp = (char *)cp;
+
+ return result;
+}
+
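/*
 * Worked example (illustrative): simple_strntoull("0x12345", 4, &end, 0)
 * consumes the "0x" prefix (2 chars) plus at most 4 - 2 = 2 digits, so it
 * returns 0x12 and leaves *end pointing at the trailing "345".
 */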
/**
* simple_strtoull - convert a string to an unsigned long long
* @cp: The start of the string
*
* This function has caveats. Please use kstrtoull instead.
*/
+ noinline
unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
{
- unsigned long long result;
- unsigned int rv;
-
- cp = _parse_integer_fixup_radix(cp, &base);
- rv = _parse_integer(cp, base, &result);
- /* FIXME */
- cp += (rv & ~KSTRTOX_OVERFLOW);
-
- if (endp)
- *endp = (char *)cp;
-
- return result;
+ return simple_strntoull(cp, INT_MAX, endp, base);
}
EXPORT_SYMBOL(simple_strtoull);
}
EXPORT_SYMBOL(simple_strtol);
+static long long simple_strntoll(const char *cp, size_t max_chars, char **endp,
+ unsigned int base)
+{
+ /*
+ * simple_strntoull() safely handles receiving max_chars == 0. That covers
+ * both the case cp[0] == '-' && max_chars == 1, where the call below gets
+ * max_chars - 1 == 0, and a caller passing max_chars == 0 directly, in
+ * which case we drop through and the content of *cp is irrelevant.
+ */
+ if (*cp == '-' && max_chars > 0)
+ return -simple_strntoull(cp + 1, max_chars - 1, endp, base);
+
+ return simple_strntoull(cp, max_chars, endp, base);
+}
+
/**
* simple_strtoll - convert a string to a signed long long
* @cp: The start of the string
*/
long long simple_strtoll(const char *cp, char **endp, unsigned int base)
{
- if (*cp == '-')
- return -simple_strtoull(cp + 1, endp, base);
-
- return simple_strtoull(cp, endp, base);
+ return simple_strntoll(cp, INT_MAX, endp, base);
}
EXPORT_SYMBOL(simple_strtoll);
struct printf_spec spec, const char *fmt)
{
bool have_t = true, have_d = true;
- bool raw = false;
+ bool raw = false, iso8601_separator = true;
+ bool found = true;
int count = 2;
if (check_pointer(&buf, end, tm, spec))
break;
}
- raw = fmt[count] == 'r';
+ do {
+ switch (fmt[count++]) {
+ case 'r':
+ raw = true;
+ break;
+ case 's':
+ iso8601_separator = false;
+ break;
+ default:
+ found = false;
+ break;
+ }
+ } while (found);
if (have_d)
buf = date_str(buf, end, tm, raw);
if (have_d && have_t) {
- /* Respect ISO 8601 */
if (buf < end)
- *buf = 'T';
+ *buf = iso8601_separator ? 'T' : ' ';
buf++;
}
if (have_t)
* - 'd[234]' For a dentry name (optionally 2-4 last components)
* - 'D[234]' Same as 'd' but for a struct file
* - 'g' For block_device name (gendisk + partition number)
- * - 't[RT][dt][r]' For time and date as represented by:
+ * - 't[RT][dt][r][s]' For time and date as represented by:
* R struct rtc_time
* T time64_t
* - 'C' For a clock, it prints the name (Common Clock Framework) or address
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
break;
if (is_sign)
- val.s = qualifier != 'L' ?
- simple_strtol(str, &next, base) :
- simple_strtoll(str, &next, base);
+ val.s = simple_strntoll(str,
+ field_width >= 0 ? field_width : INT_MAX,
+ &next, base);
else
- val.u = qualifier != 'L' ?
- simple_strtoul(str, &next, base) :
- simple_strtoull(str, &next, base);
-
- if (field_width > 0 && next - str > field_width) {
- if (base == 0)
- _parse_integer_fixup_radix(str, &base);
- while (next - str > field_width) {
- if (is_sign)
- val.s = div_s64(val.s, base);
- else
- val.u = div_u64(val.u, base);
- --next;
- }
- }
+ val.u = simple_strntoull(str,
+ field_width >= 0 ? field_width : INT_MAX,
+ &next, base);
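/*
 * Illustrative effect of the field-width handling above (assumed example):
 * sscanf("123456", "%3d", &v) now converts at most three characters, so
 * v == 123 and "456" is left for the next conversion, instead of parsing
 * the whole number and dividing it back down as the old code did.
 */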
switch (qualifier) {
case 'H': /* that's 'hh' in format */
if (!list_is_last(freelist, &freepage->lru)) {
list_cut_before(&sublist, freelist, &freepage->lru);
- if (!list_empty(&sublist))
- list_splice_tail(&sublist, freelist);
+ list_splice_tail(&sublist, freelist);
}
}
if (!list_is_first(freelist, &freepage->lru)) {
list_cut_position(&sublist, freelist, &freepage->lru);
- if (!list_empty(&sublist))
- list_splice_tail(&sublist, freelist);
+ list_splice_tail(&sublist, freelist);
}
}
static unsigned long
fast_isolate_freepages(struct compact_control *cc)
{
- unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
+ unsigned int limit = max(1U, freelist_scan_limit(cc) >> 1);
unsigned int nr_scanned = 0;
unsigned long low_pfn, min_pfn, highest = 0;
unsigned long nr_isolated = 0;
spin_unlock_irqrestore(&cc->zone->lock, flags);
/*
- * Smaller scan on next order so the total scan ig related
+ * Smaller scan on next order so the total scan is related
* to freelist_scan_limit.
*/
if (order_scanned >= limit)
- limit = min(1U, limit >> 1);
+ limit = max(1U, limit >> 1);
}
if (!page) {
static bool kswapd_is_running(pg_data_t *pgdat)
{
- return pgdat->kswapd && (pgdat->kswapd->state == TASK_RUNNING);
+ return pgdat->kswapd && task_is_running(pgdat->kswapd);
}
/*
}
#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
- static ssize_t sysfs_compact_node(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
+ static ssize_t compact_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
{
int nid = dev->id;
return count;
}
- static DEVICE_ATTR(compact, 0200, NULL, sysfs_compact_node);
+ static DEVICE_ATTR_WO(compact);
int compaction_register_node(struct node *node)
{
EXPORT_PER_CPU_SYMBOL_GPL(int_active_memcg);
/* Socket memory accounting disabled? */
-static bool cgroup_memory_nosocket;
+static bool cgroup_memory_nosocket __ro_after_init;
/* Kernel memory accounting disabled? */
-bool cgroup_memory_nokmem;
+bool cgroup_memory_nokmem __ro_after_init;
/* Whether the swap controller is active */
#ifdef CONFIG_MEMCG_SWAP
-bool cgroup_memory_noswap __read_mostly;
+bool cgroup_memory_noswap __ro_after_init;
#else
#define cgroup_memory_noswap 1
#endif
#ifdef CONFIG_MEMCG_KMEM
extern spinlock_t css_set_lock;
+bool mem_cgroup_kmem_disabled(void)
+{
+ return cgroup_memory_nokmem;
+}
+
static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
unsigned int nr_pages);
* as special swap entry in the CPU page table.
*/
if (is_device_private_entry(ent)) {
- page = device_private_entry_to_page(ent);
+ page = pfn_swap_entry_to_page(ent);
/*
* MEMORY_DEVICE_PRIVATE means ZONE_DEVICE page and which have
* a refcount of 1 when free (unlike normal page)
}
/**
- * mem_cgroup_protected - check if memory consumption is in the normal range
+ * mem_cgroup_calculate_protection - check if memory consumption is in the normal range
* @root: the top ancestor of the sub-tree being checked
* @memcg: the memory cgroup to check
*
/*
* Finish munlock after successful page isolation
*
- * Page must be locked. This is a wrapper for try_to_munlock()
+ * Page must be locked. This is a wrapper for page_mlock()
* and putback_lru_page() with munlock accounting.
*/
static void __munlock_isolated_page(struct page *page)
* and we don't need to check all the other vmas.
*/
if (page_mapcount(page) > 1)
- try_to_munlock(page);
+ page_mlock(page);
/* Did try_to_unlock() succeed or punt? */
if (!PageMlocked(page))
* munlock()ed or munmap()ed, we want to check whether other vmas hold the
* page locked so that we can leave it on the unevictable lru list and not
* bother vmscan with it. However, to walk the page's rmap list in
- * try_to_munlock() we must isolate the page from the LRU. If some other
+ * page_mlock() we must isolate the page from the LRU. If some other
* task has removed the page from the LRU, we won't be able to do that.
* So we clear the PageMlocked as we might not get another chance. If we
* can't isolate the page, we leave it for putback_lru_page() and vmscan
{
int nr_pages;
- /* For try_to_munlock() and to serialize with page migration */
+ /* For page_mlock() and to serialize with page migration */
BUG_ON(!PageLocked(page));
VM_BUG_ON_PAGE(PageTail(page), page);
*
* The fast path is available only for evictable pages with single mapping.
* Then we can bypass the per-cpu pvec and get better performance.
- * when mapcount > 1 we need try_to_munlock() which can fail.
+ * when mapcount > 1 we need page_mlock() which can fail.
* when !page_evictable(), we need the full redo logic of putback_lru_page to
* avoid leaving evictable page in unevictable list.
*
*
* We don't save and restore VM_LOCKED here because pages are
* still on lru. In unmap path, pages might be scanned by reclaim
- * and re-mlocked by try_to_{munlock|unmap} before we unmap and
+ * and re-mlocked by page_mlock/try_to_unmap before we unmap and
* free them. This will result in freeing mlocked pages.
*/
void munlock_vma_pages_range(struct vm_area_struct *vma,
*/
static DEFINE_SPINLOCK(shmlock_user_lock);
-int user_shm_lock(size_t size, struct user_struct *user)
+int user_shm_lock(size_t size, struct ucounts *ucounts)
{
unsigned long lock_limit, locked;
+ long memlock;
int allowed = 0;
locked = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
allowed = 1;
lock_limit >>= PAGE_SHIFT;
spin_lock(&shmlock_user_lock);
- if (!allowed &&
- locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
+ memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
+
+ if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
+ dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
+ goto out;
+ }
+ if (!get_ucounts(ucounts)) {
+ dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
goto out;
- get_uid(user);
- user->locked_shm += locked;
+ }
allowed = 1;
out:
spin_unlock(&shmlock_user_lock);
return allowed;
}
-void user_shm_unlock(size_t size, struct user_struct *user)
+void user_shm_unlock(size_t size, struct ucounts *ucounts)
{
spin_lock(&shmlock_user_lock);
- user->locked_shm -= (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
spin_unlock(&shmlock_user_lock);
- free_uid(user);
+ put_ucounts(ucounts);
}
static DEFINE_STATIC_KEY_TRUE(deferred_pages);
/*
- * Calling kasan_free_pages() only after deferred memory initialization
+ * Calling kasan_poison_pages() only after deferred memory initialization
* has completed. Poisoning pages during deferred memory init will greatly
* lengthen the process and cause problem in large memory systems as the
* deferred pages initialization is done with interrupt disabled.
* on-demand allocation and then freed again before the deferred pages
* initialization is done, but this is not likely to happen.
*/
-static inline void kasan_free_nondeferred_pages(struct page *page, int order,
- bool init, fpi_t fpi_flags)
+static inline bool should_skip_kasan_poison(struct page *page, fpi_t fpi_flags)
{
- if (static_branch_unlikely(&deferred_pages))
- return;
- if (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
- (fpi_flags & FPI_SKIP_KASAN_POISON))
- return;
- kasan_free_pages(page, order, init);
+ return static_branch_unlikely(&deferred_pages) ||
+ (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
+ (fpi_flags & FPI_SKIP_KASAN_POISON)) ||
+ PageSkipKASanPoison(page);
}
/* Returns true if the struct page for the pfn is uninitialised */
return false;
}
#else
-static inline void kasan_free_nondeferred_pages(struct page *page, int order,
- bool init, fpi_t fpi_flags)
+static inline bool should_skip_kasan_poison(struct page *page, fpi_t fpi_flags)
{
- if (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
- (fpi_flags & FPI_SKIP_KASAN_POISON))
- return;
- kasan_free_pages(page, order, init);
+ return (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
+ (fpi_flags & FPI_SKIP_KASAN_POISON)) ||
+ PageSkipKASanPoison(page);
}
static inline bool early_page_uninitialised(unsigned long pfn)
__SetPageHead(page);
for (i = 1; i < nr_pages; i++) {
struct page *p = page + i;
- set_page_count(p, 0);
p->mapping = TAIL_MAPPING;
set_compound_head(p, page);
}
return ret;
}
-static void kernel_init_free_pages(struct page *page, int numpages)
+static void kernel_init_free_pages(struct page *page, int numpages, bool zero_tags)
{
int i;
+ if (zero_tags) {
+ for (i = 0; i < numpages; i++)
+ tag_clear_highpage(page + i);
+ return;
+ }
+
/* s390's use of memset() could override KASAN redzones. */
kasan_disable_current();
for (i = 0; i < numpages; i++) {
unsigned int order, bool check_free, fpi_t fpi_flags)
{
int bad = 0;
- bool init;
+ bool skip_kasan_poison = should_skip_kasan_poison(page, fpi_flags);
VM_BUG_ON_PAGE(PageTail(page), page);
* With hardware tag-based KASAN, memory tags must be set before the
* page becomes unavailable via debug_pagealloc or arch_free_page.
*/
- init = want_init_on_free();
- if (init && !kasan_has_integrated_init())
- kernel_init_free_pages(page, 1 << order);
- kasan_free_nondeferred_pages(page, order, init, fpi_flags);
+ if (kasan_has_integrated_init()) {
+ if (!skip_kasan_poison)
+ kasan_free_pages(page, order);
+ } else {
+ bool init = want_init_on_free();
+
+ if (init)
+ kernel_init_free_pages(page, 1 << order, false);
+ if (!skip_kasan_poison)
+ kasan_poison_pages(page, order, init);
+ }
/*
* arch_free_page() can make the page's contents inaccessible. s390
inline void post_alloc_hook(struct page *page, unsigned int order,
gfp_t gfp_flags)
{
- bool init;
-
set_page_private(page, 0);
set_page_refcounted(page);
* kasan_alloc_pages and kernel_init_free_pages must be
* kept together to avoid discrepancies in behavior.
*/
- init = !want_init_on_free() && want_init_on_alloc(gfp_flags);
- kasan_alloc_pages(page, order, init);
- if (init && !kasan_has_integrated_init())
- kernel_init_free_pages(page, 1 << order);
+ if (kasan_has_integrated_init()) {
+ kasan_alloc_pages(page, order, gfp_flags);
+ } else {
+ bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags);
+
+ kasan_unpoison_pages(page, order, init);
+ if (init)
+ kernel_init_free_pages(page, 1 << order,
+ gfp_flags & __GFP_ZEROTAGS);
+ }
set_page_owner(page, order, gfp_flags);
}
int cpu;
/*
- * Allocate in the BSS so we wont require allocation in
+ * Allocate in the BSS so we won't require allocation in
* direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
*/
static cpumask_t cpus_with_pcps;
#endif /* CONFIG_FAIL_PAGE_ALLOC */
- noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+ static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
{
return __should_fail_alloc_page(gfp_mask, order);
}
* vm. If we swap it in we mark it dirty since we also free the swap
* entry since a page cannot live in both the swap and page cache.
*
- * vmf and fault_type are only supplied by shmem_fault:
+ * vma, vmf, and fault_type are only supplied by shmem_fault:
* otherwise they are NULL.
*/
static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
page = pagecache_get_page(mapping, index,
FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
+
+ if (page && vma && userfaultfd_minor(vma)) {
+ if (!xa_is_value(page)) {
+ unlock_page(page);
+ put_page(page);
+ }
+ *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+ return 0;
+ }
+
if (xa_is_value(page)) {
error = shmem_swapin_page(inode, index, &page,
sgp, gfp, vma, fault_type);
}
#endif
-int shmem_lock(struct file *file, int lock, struct user_struct *user)
+int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
{
struct inode *inode = file_inode(file);
struct shmem_inode_info *info = SHMEM_I(inode);
* no serialization needed when called from shm_destroy().
*/
if (lock && !(info->flags & VM_LOCKED)) {
- if (!user_shm_lock(inode->i_size, user))
+ if (!user_shm_lock(inode->i_size, ucounts))
goto out_nomem;
info->flags |= VM_LOCKED;
mapping_set_unevictable(file->f_mapping);
}
- if (!lock && (info->flags & VM_LOCKED) && user) {
- user_shm_unlock(inode->i_size, user);
+ if (!lock && (info->flags & VM_LOCKED) && ucounts) {
+ user_shm_unlock(inode->i_size, ucounts);
info->flags &= ~VM_LOCKED;
mapping_clear_unevictable(file->f_mapping);
}
return inode;
}
- static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- bool zeropage,
- struct page **pagep)
+ #ifdef CONFIG_USERFAULTFD
+ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
+ pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ unsigned long src_addr,
+ bool zeropage,
+ struct page **pagep)
{
struct inode *inode = file_inode(dst_vma->vm_file);
struct shmem_inode_info *info = SHMEM_I(inode);
struct address_space *mapping = inode->i_mapping;
gfp_t gfp = mapping_gfp_mask(mapping);
pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
- spinlock_t *ptl;
void *page_kaddr;
struct page *page;
- pte_t _dst_pte, *dst_pte;
int ret;
- pgoff_t offset, max_off;
+ pgoff_t max_off;
- ret = -ENOMEM;
if (!shmem_inode_acct_block(inode, 1)) {
/*
* We may have got a page, returned -ENOENT triggering a retry,
put_page(*pagep);
*pagep = NULL;
}
- goto out;
+ return -ENOMEM;
}
if (!*pagep) {
+ ret = -ENOMEM;
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
- if (!zeropage) { /* mcopy_atomic */
+ if (!zeropage) { /* COPY */
page_kaddr = kmap_atomic(page);
ret = copy_from_user(page_kaddr,
(const void __user *)src_addr,
/* fallback to copy_from_user outside mmap_lock */
if (unlikely(ret)) {
*pagep = page;
- shmem_inode_unacct_blocks(inode, 1);
+ ret = -ENOENT;
/* don't free the page */
- return -ENOENT;
+ goto out_unacct_blocks;
}
- } else { /* mfill_zeropage_atomic */
+ } else { /* ZEROPAGE */
clear_highpage(page);
}
} else {
*pagep = NULL;
}
- VM_BUG_ON(PageLocked(page) || PageSwapBacked(page));
+ VM_BUG_ON(PageLocked(page));
+ VM_BUG_ON(PageSwapBacked(page));
__SetPageLocked(page);
__SetPageSwapBacked(page);
__SetPageUptodate(page);
ret = -EFAULT;
- offset = linear_page_index(dst_vma, dst_addr);
max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
+ if (unlikely(pgoff >= max_off))
goto out_release;
ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
if (ret)
goto out_release;
- _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
- if (dst_vma->vm_flags & VM_WRITE)
- _dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
- else {
- /*
- * We don't set the pte dirty if the vma has no
- * VM_WRITE permission, so mark the page dirty or it
- * could be freed from under us. We could do it
- * unconditionally before unlock_page(), but doing it
- * only if VM_WRITE is not set is faster.
- */
- set_page_dirty(page);
- }
-
- dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-
- ret = -EFAULT;
- max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
- goto out_release_unlock;
-
- ret = -EEXIST;
- if (!pte_none(*dst_pte))
- goto out_release_unlock;
-
- lru_cache_add(page);
+ ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ page, true, false);
+ if (ret)
+ goto out_delete_from_cache;
spin_lock_irq(&info->lock);
info->alloced++;
shmem_recalc_inode(inode);
spin_unlock_irq(&info->lock);
- inc_mm_counter(dst_mm, mm_counter_file(page));
- page_add_file_rmap(page, false);
- set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
-
- /* No need to invalidate - it was non-present before */
- update_mmu_cache(dst_vma, dst_addr, dst_pte);
- pte_unmap_unlock(dst_pte, ptl);
+ SetPageDirty(page);
unlock_page(page);
- ret = 0;
- out:
- return ret;
- out_release_unlock:
- pte_unmap_unlock(dst_pte, ptl);
- ClearPageDirty(page);
+ return 0;
+ out_delete_from_cache:
delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
shmem_inode_unacct_blocks(inode, 1);
- goto out;
- }
-
- int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- struct page **pagep)
- {
- return shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
- dst_addr, src_addr, false, pagep);
- }
-
- int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr)
- {
- struct page *page = NULL;
-
- return shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
- dst_addr, 0, true, &page);
+ return ret;
}
+ #endif /* CONFIG_USERFAULTFD */
#ifdef CONFIG_TMPFS
static const struct inode_operations shmem_symlink_inode_operations;
loff_t i_size;
pgoff_t off;
- if ((vma->vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+ if (!transhuge_vma_enabled(vma, vma->vm_flags))
return false;
if (shmem_huge == SHMEM_HUGE_FORCE)
return true;
return 0;
}
-int shmem_lock(struct file *file, int lock, struct user_struct *user)
+int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
{
return 0;
}