h8300: move definition of __kernel_size_t etc. to posix_types.h
These types should be defined in posix_types.h, not in bitsperlong.h .
With these defines moved, h8300-specific bitsperlong.h is no longer
needed since Kbuild will automatically create a wrapper of
include/uapi/asm-generic/bitsperlong.h
Merge branch 'nvme-5.7' of git://git.infradead.org/nvme into block-5.7
Pull NVMe fixes from Christoph.
* 'nvme-5.7' of git://git.infradead.org/nvme:
nvmet-rdma: fix double free of rdma queue
nvme-fc: Revert "add module to ops template to allow module references"
nvme: fix deadlock caused by ANA update wrong locking
nvmet-rdma: fix bonding failover possible NULL deref
nvmet: fix NULL dereference when removing a referral
nvme: inherit stable pages constraint in the mpath stack device
nvme-tcp: fix possible crash in recv error flow
nvme-tcp: don't poll a non-live queue
nvme-tcp: fix possible crash in write_zeroes processing
nvmet-fc: fix typo in comment
nvme-rdma: Replace comma with a semicolon
nvme-fcloop: fix deallocation of working context
nvme: fix compat address handling in several ioctls
The recent AMD platform exposes an HD-audio bus but without any actual
codecs, which is internally tied with a USB-audio device, supposedly.
It results in "no codecs" error of HD-audio bus driver, and it's
nothing but a waste of resources.
This patch introduces a static blacklist table for skipping such a
known bogus PCI SSID entry. As of writing this patch, the known SSIDs
are:
* 1043:874f - ASUS ROG Zenith II / Strix
* 1462:cb59 - MSI TRX40 Creator
* 1462:cb60 - MSI TRX40
ALSA: usb-audio: Add mixer workaround for TRX40 and co
Some recent boards (supposedly with a new AMD platform) contain the
USB audio class 2 device that is often tied with HD-audio. The device
exposes an Input Gain Pad control (id=19, control=12) but this node
doesn't behave correctly, returning an error for each inquiry of
GET_MIN and GET_MAX that should have been mandatory.
As a workaround, simply ignore this node by adding a usbmix_name_map
table entry. The currently known devices are:
* 0414:a002 - Gigabyte TRX40 Aorus Pro WiFi
* 0b05:1916 - ASUS ROG Zenith II
* 0b05:1917 - ASUS ROG Strix
* 0db0:0d64 - MSI TRX40 Creator
* 0db0:543d - MSI TRX40
MSI GL63 laptop requires the similar quirk like other MSI models,
ALC1220_FIXUP_CLEVO_P950. The board BIOS doesn't provide a PCI SSID
for the device, hence we need to take the codec SSID (1462:1275)
instead.
Mike Marshall [Wed, 8 Apr 2020 13:05:45 +0000 (09:05 -0400)]
orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush
Christoph Hellwig noticed that we were doing some unnecessary
work in orangefs_flush:
orangefs_flush just writes out data on every close(2) call. There is
no need to change anything about the dirty state, especially as
orangefs doesn't treat I_DIRTY_TIMES special in any way. The code
seems to come from partially open coding vfs_fsync.
He sent in a patch with the above commit message and also a
patch that was a reversion of another Orangefs patch I had
sent upstream a while ago. I had to fix his reversion patch
so that it would compile which caused his "don't mess with
I_DIRTY_TIMES" patch to fail to apply. So here I have just
remade his patch and applied it after the fixed reversion patch.
Mike Marshall [Wed, 8 Apr 2020 12:52:40 +0000 (08:52 -0400)]
orangefs: get rid of knob code...
Christoph Hellwig sent in a reversion of "orangefs: remember count
when reading." because:
->read_iter calls can race with each other and one or
more ->flush calls. Remove the the scheme to store the read
count in the file private data as is is completely racy and
can cause use after free or double free conditions
Christoph's reversion caused Orangefs not to work or to compile. I
added a patch that fixed that, but intel's kbuild test robot pointed
out that sending Christoph's patch followed by my patch upstream, it
would break bisection because of the failure to compile. So I have
combined the reversion plus my patch... here's the commit message
that was in my patch:
Logically, optimal Orangefs "pages" are 4 megabytes. Reading
large Orangefs files 4096 bytes at a time is like trying to
kick a dead whale down the beach. Before Christoph's "Revert
orangefs: remember count when reading." I tried to give users
a knob whereby they could, for example, use "count" in
read(2) or bs with dd(1) to get whatever they considered an
appropriate amount of bytes at a time from Orangefs and fill
as many page cache pages as they could at once.
Without the racy code that Christoph reverted Orangefs won't
even compile, much less work. So this replaces the logic that
used the private file data that Christoph reverted with
a static number of bytes to read from Orangefs.
I ran tests like the following to determine what a
reasonable static number of bytes might be:
The 'invalid wait context' splat doesn't print all the information
required to reconstruct / validate the error, specifically the
irq-context state is missing.
Qian Cai [Mon, 30 Mar 2020 21:30:02 +0000 (17:30 -0400)]
locking/percpu-rwsem: Fix a task_struct refcount
The following commit:
7f26482a872c ("locking/percpu-rwsem: Remove the embedded rwsem")
introduced task_struct memory leaks due to messing up the task_struct
refcount.
At the beginning of percpu_rwsem_wake_function(), it calls get_task_struct(),
but if the trylock failed, it will remain in the waitqueue. However, it
will run percpu_rwsem_wake_function() again with get_task_struct() to
increase the refcount but then only call put_task_struct() once the trylock
succeeded.
Fix it by adjusting percpu_rwsem_wake_function() a bit to guard against
when percpu_rwsem_wait() observing !private, terminating the wait and
doing a quick exit() while percpu_rwsem_wake_function() then doing
wake_up_process(p) as a use-after-free.
sched/debug: Add task uclamp values to SCHED_DEBUG procfs
Requested and effective uclamp values can be a bit tricky to decipher when
playing with cgroup hierarchies. Add them to a task's procfs when
SCHED_DEBUG is enabled.
sched/debug: Factor out printing formats into common macros
The printing macros in debug.c keep redefining the same output
format. Collect each output format in a single definition, and reuse that
definition in the other macros. While at it, add a layer of parentheses and
replace printf's with the newly introduced macros.
workqueue: Remove the warning in wq_worker_sleeping()
The kernel test robot triggered a warning with the following race:
task-ctx A interrupt-ctx B
worker
-> process_one_work()
-> work_item()
-> schedule();
-> sched_submit_work()
-> wq_worker_sleeping()
-> ->sleeping = 1
atomic_dec_and_test(nr_running)
__schedule(); *interrupt*
async_page_fault()
-> local_irq_enable();
-> schedule();
-> sched_submit_work()
-> wq_worker_sleeping()
-> if (WARN_ON(->sleeping)) return
-> __schedule()
-> sched_update_worker()
-> wq_worker_running()
-> atomic_inc(nr_running);
-> ->sleeping = 0;
-> sched_update_worker()
-> wq_worker_running()
if (!->sleeping) return
In this context the warning is pointless everything is fine.
An interrupt before wq_worker_sleeping() will perform the ->sleeping
assignment (0 -> 1 > 0) twice.
An interrupt after wq_worker_sleeping() will trigger the warning and
nr_running will be decremented (by A) and incremented once (only by B, A
will skip it). This is the case until the ->sleeping is zeroed again in
wq_worker_running().
Remove the WARN statement because this condition may happen. Document
that preemption around wq_worker_sleeping() needs to be disabled to
protect ->sleeping and not just as an optimisation.
Aubrey Li [Thu, 26 Mar 2020 05:42:29 +0000 (13:42 +0800)]
sched/fair: Fix negative imbalance in imbalance calculation
A negative imbalance value was observed after imbalance calculation,
this happens when the local sched group type is group_fully_busy,
and the average load of local group is greater than the selected
busiest group. Fix this problem by comparing the average load of the
local and busiest group before imbalance calculation formula.
Huaixin Chang [Fri, 27 Mar 2020 03:26:25 +0000 (11:26 +0800)]
sched/fair: Fix race between runtime distribution and assignment
Currently, there is a potential race between distribute_cfs_runtime()
and assign_cfs_rq_runtime(). Race happens when cfs_b->runtime is read,
distributes without holding lock and finds out there is not enough
runtime to charge against after distribution. Because
assign_cfs_rq_runtime() might be called during distribution, and use
cfs_b->runtime at the same time.
Fibtest is the tool to test this race. Assume all gcfs_rq is throttled
and cfs period timer runs, slow threads might run and sleep, returning
unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local
pool. If all this happens sufficiently quickly, cfs_b->runtime will drop
a lot. If runtime distributed is large too, over-use of runtime happens.
A runtime over-using by about 70 percent of quota is seen when we
test fibtest on a 96-core machine. We run fibtest with 1 fast thread and
95 slow threads in test group, configure 10ms quota for this group and
see the CPU usage of fibtest is 17.0%, which is far more than the
expected 10%.
On a smaller machine with 32 cores, we also run fibtest with 96
threads. CPU usage is more than 12%, which is also more than expected
10%. This shows that on similar workloads, this race do affect CPU
bandwidth control.
Solve this by holding lock inside distribute_cfs_runtime().
sched/fair: Align rq->avg_idle and rq->avg_scan_cost
sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.
The only difference between the two is that rq->avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:
"""
The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
has a signed type and a negative value, the resulting value is
implementation-defined.
"""
Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:
-4 -> 0xF..FC
-4 >> 3 -> 0xF..FF == -1 != -4 / 8
-8 -> 0xF..F8
-8 >> 3 -> 0xF..FF == -1 == -8 / 8
-9 -> 0xF..F7
-9 >> 3 -> 0xF..FE == -2 != -9 / 8
Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:
Kan Liang [Thu, 2 Apr 2020 15:46:51 +0000 (08:46 -0700)]
perf/x86/intel/uncore: Add Ice Lake server uncore support
The uncore subsystem in Ice Lake server is similar to previous server.
There are some differences in config register encoding and pci device
IDs. The uncore PMON units in Ice Lake server include Ubox, Chabox, IIO,
IRP, M2PCIE, PCU, M2M, PCIE3 and IMC.
- For CHA, filter 1 register has been removed. The filter 0 register can
be used by and of CHA events to be filterd by Thread/Core-ID. To do
so, the control register's tid_en bit must be set to 1.
- For IIO, there are some changes on event constraints. The MSR address
and MSR offsets among counters are also changed.
- For IRP, the MSR address and MSR offsets among counters are changed.
- For M2PCIE, the counters are accessed by MSR now. Add new MSR address
and MSR offsets. Change event constraints.
- To determine the number of CHAs, have to read CAPID6(Low) and CAPID7
(High) now.
- For M2M, update the PCICFG address and Device ID.
- For UPI, update the PCICFG address, Device ID and counter address.
- For M3UPI, update the PCICFG address, Device ID, counter address and
event constraints.
- For IMC, update the formular to calculate MMIO BAR address, which is
MMIO_BASE + specific MEM_BAR offset.
The problem being that cgroup events try to track cpuctx->cgrp even
for disabled events, which is pointless and actively harmful since the
above commit. Rework the code to have explicit enable/disable hooks
for cgroup events, such that we can limit cgroup tracking to active
events.
More specifically, since the above commit disabled events are no
longer added to their context from the 'right' CPU, and we can't
access things like the current cgroup for a remote CPU.
Merge tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This is a set of fixes that have queued up, I think I might have
another pull with some more before rc1 but I'd like to dequeue what I
have now just in case Easter is more eggciting that expected.
The main thing in here is a fix for a longstanding nouveau power
management issues on certain laptops, it should help runtime
suspend/resume for a lot of people.
There is also a reverted patch for some drm_mm behaviour in atomic
contexts.
* tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm: (41 commits)
drm/nouveau/kms/nv50-: wait for FIFO space on PIO channels
drm/nouveau/nvif: protect waits against GPU falling off the bus
drm/nouveau/nvif: access PTIMER through usermode class, if available
drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
drm/nouveau: workaround runpm fail by disabling PCI power management on certain intel bridges
drm/nouveau/svm: remove useless SVM range check
drm/nouveau/svm: check for SVM initialized before migrating
drm/nouveau/svm: fix vma range check for migration
drm/nouveau: remove checks for return value of debugfs functions
drm/nouveau/ttm: evict other IO mappings when running out of BAR1 space
drm/amdkfd: kfree the wrong pointer
drm/amd/display: increase HDCP authentication delay
drm/amd/display: Correctly cancel future watchdog and callback events
drm/amd/display: Don't try hdcp1.4 when content_type is set to type1
drm/amd/powerplay: move the ASIC specific nbio operation out of smu_v11_0.c
drm/amd/powerplay: drop redundant BIF doorbell interrupt operations
drm/amd/display: Fix dcn21 num_states
drm/amd/display: Enable BT2020 in COLOR_ENCODING property
drm/amd/display: LFC not working on 2.0x range monitors (v2)
drm/amd/display: Support plane level CTM
...
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"An update to the Goodix touchscreen driver to enable it work properly
on various Bay Trail and Cherry Trail devices, and a few other
assorted changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (26 commits)
Input: update SPDX tag for input-event-codes.h
Input: i8042 - add Acer Aspire 5738z to nomux list
Input: goodix - fix compilation when ACPI support is disabled
dt-bindings: touchscreen: Convert edt-ft5x06 to json-schema
Input: of_touchscreen - explicitly choose axis
Input: goodix - support gt9147 touchpanel
dt-bindings: touchscreen: goodix: support of gt9147
Input: goodix - add support for Goodix GT917S
Input: goodix - use string-based chip ID
dt-bindings: input: touchscreen: add compatible string for Goodix GT917S
Input: goodix - add support for more then one touch-key
Input: goodix - fix spurious key release events
Input: goodix - try to reset the controller if the i2c-test fails
Input: goodix - restore config on resume if necessary
Input: goodix - make goodix_send_cfg() take a raw buffer as argument
Input: goodix - add minimum firmware size check
Input: goodix - save a copy of the config from goodix_read_config()
Input: goodix - move defines to above struct goodix_ts_data declaration
Input: goodix - add support for controlling the IRQ pin through ACPI methods
Input: goodix - add support for getting IRQ + reset GPIOs on Bay Trail devices
...
Merge tag 'thermal-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Convert tsens configuration DT binding to yaml (Rajeshwari)
- Add interrupt support on the rcar sensor (Niklas Söderlund)
- Add a new Spreadtrum thermal driver (Baolin Wang)
- Add thermal binding for the fsl scu board, a new API to retrieve the
sensor id bound to the thermal zone and i.MX system controller sensor
(Anson Huang))
- Remove warning log when a deferred probe is requested on Exynos
(Marek Szyprowski)
- Add the thermal monitoring unit support for imx8mm with its DT
bindings (Anson Huang)
- Rephrase the Kconfig text for clarity (Linus Walleij)
- Use the gpio descriptor for the ti-soc-thermal (Linus Walleij)
- Align msg structure to 4 bytes for i.MX SC, fix the Kconfig
dependency, add the __may_be unused annotation for PM functions and
the COMPILE_TEST option for imx8mm (Anson Huang)
- Fix a dependency on regmap in Kconfig for qoriq (Yuantian Tang)
- Add DT binding and support for the rcar gen3 r8a77961 and improve the
error path on the rcar init function (Niklas Söderlund)
- Cleanup and improvements for the tsens Qcom sensor (Amit Kucheria)
- Improve code by removing lock and caching values in the rcar thermal
sensor (Niklas Söderlund)
- Cleanup in the qoriq drivers and add a call to
imx_thermal_unregister_legacy_cooling in the removal function (Anson
Huang)
- Remove redundant 'maxItems' in tsens and sprd DT bindings (Rob
Herring)
- Change the thermal DT bindings by making the cooling-maps optional
(Yuantian Tang)
- Add Tiger Lake support (Sumeet Pawnikar)
- Use scnprintf() for avoiding potential buffer overflow (Takashi Iwai)
- Make pkg_temp_lock a raw_spinlock_t(Clark Williams)
- Fix incorrect data types by changing them to signed on i.MX SC (Anson
Huang)
- Replace zero-length array with flexible-array member (Gustavo A. R.
Silva)
- Add support for i.MX8MP in the driver and in the DT bindings (Anson
Huang)
- Fix return value of the cpufreq_set_cur_state() function (Willy
Wolff)
- Remove abusing and scary WARN_ON in the cpufreq cooling device
(Daniel Lezcano)
- Fix build warning of incorrect argument type reported by sparse on
imx8mm (Anson Huang)
- Fix stub for the devfreq cooling device (Martin Blumenstingl)
- Fix cpu idle cooling documentation (Sergey Vidishev)
* tag 'thermal-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (52 commits)
Documentation: cpu-idle-cooling: Fix diagram for 33% duty cycle
thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
thermal: imx8mm: Fix build warning of incorrect argument type
thermal/drivers/cpufreq_cooling: Remove abusing WARN_ON
thermal/drivers/cpufreq_cooling: Fix return of cpufreq_set_cur_state
thermal: imx8mm: Add i.MX8MP support
dt-bindings: thermal: imx8mm-thermal: Add support for i.MX8MP
thermal: qcom: tsens.h: Replace zero-length array with flexible-array member
thermal: imx_sc_thermal: Fix incorrect data type
thermal: int340x_thermal: Use scnprintf() for avoiding potential buffer overflow
thermal: int340x: processor_thermal: Add Tiger Lake support
thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
dt-bindings: thermal: make cooling-maps property optional
dt-bindings: thermal: qcom-tsens: Remove redundant 'maxItems'
dt-bindings: thermal: sprd: Remove redundant 'maxItems'
thermal: imx: Calling imx_thermal_unregister_legacy_cooling() in .remove
thermal: qoriq: Sort includes alphabetically
thermal: qoriq: Use devm_add_action_or_reset() to handle all cleanups
thermal: rcar_thermal: Remove lock in rcar_thermal_get_current_temp()
thermal: rcar_thermal: Do not store ctemp in rcar_thermal_priv
...
Merge tag 'mfd-next-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull mfd updates from Lee Jones:
"New Drivers:
- Add support for IQS620A/621/622/624/625 Azoteq IQS62X Sensors
New Device Support:
- Add support for ADC, IRQ, Regulator, RTC and WDT to Ricoh RN5T618 PMIC
- Add support for Comet Lake to Intel LPSS
New Functionality:
- Add support for Charger Detection to Spreadtrum SC27xx PMICs
- Add support for Interrupt Polarity to Dialog Semi DA9062/61 PMIC
- Add ACPI enumeration support to Diolan DLN2 USB Adaptor
* tag 'mfd-next-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (33 commits)
mfd: intel-lpss: Fix Intel Elkhart Lake LPSS I2C input clock
mfd: aat2870: Use scnprintf() for avoiding potential buffer overflow
mfd: dln2: Allow to be enumerated via ACPI
mfd: da9062: Add support for interrupt polarity defined in device tree
dt-bindings: bd718x7: Yamlify and add BD71850
mfd: dln2: Fix sanity checking for endpoints
mfd: intel-lpss: Add Intel Comet Lake PCH-V PCI IDs
mfd: sc27xx: Add USB charger type detection support
dt-bindings: mfd: Document STM32 low power timer bindings
mfd: rk808: Convert RK805 to shutdown/suspend hooks
mfd: rk808: Reduce shutdown duplication
mfd: rk808: Stop using syscore ops
mfd: rk808: Ensure suspend/resume hooks always work
mfd: rk808: Always use poweroff when requested
mfd: omap: Remove useless cast for driver.name
mfd: Kconfig: Fix some misspelling of the word functionality
mfd: pm8xxx: Replace zero-length array with flexible-array member
mfd: omap-usb-tll: Replace zero-length array with flexible-array member
mfd: cpcap: Fix compile if MFD_CORE is not selected
mfd: cros_ec: Check DT node for usbpd-notify add
...
Merge tag 'backlight-next-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Switch pwm_bl and corgi_lcd drivers to use GPIO descriptors"
* tag 'backlight-next-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: corgi: Convert to use GPIO descriptors
backlight: pwm_bl: Switch to full GPIO descriptor
Merge tag 'leds-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds
Pull LED updates from Pavel Machek:
"One new driver, some driver changes, and some late minute cleanups --
but those are just whitespace so should be okay.
There are some major changes being prepared (multicolor, triggers) so
the next release likely will be more interesting"
* tag 'leds-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds:
leds: core: Fix warning message when init_data
leds: make functions easier to understand
leds: sort Makefile entries
leds: old enums are not really applicable to new code
leds: ip30: label power LED as such
leds: lm3532: make bitfield 'enabled' unsigned
leds: leds-pwm: Replace zero-length array with flexible-array member
leds: leds-is31fl32xx: Replace zero-length array with flexible-array member
leds: pwm: remove useless pwm_period_ns
leds: pwm: remove header
leds: pwm: convert to atomic PWM API
leds: pwm: simplify if condition
leds: add SGI IP30 led support
leds: lm3697: fix spelling mistake "To" -> "Too"
leds: leds-bd2802: remove set but not used variable 'pdata'
leds: ns2: Convert to GPIO descriptors
leds: ns2: Absorb platform data
Peter Xu [Wed, 8 Apr 2020 01:40:09 +0000 (21:40 -0400)]
mm/mempolicy: Allow lookup_node() to handle fatal signal
lookup_node() uses gup to pin the page and get node information. It
checks against ret>=0 assuming the page will be filled in. However it's
also possible that gup will return zero, for example, when the thread is
quickly killed with a fatal signal. Teach lookup_node() to gracefully
return an error -EFAULT if it happens.
Meanwhile, initialize "page" to NULL to avoid potential risk of
exploiting the pointer.
Dave Airlie [Tue, 7 Apr 2020 23:14:21 +0000 (09:14 +1000)]
Merge tag 'drm-misc-next-fixes-2020-04-04' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
A bunch of fixes to avoid null pointer dereference in fbcon, fix a return
in xen, some DT bindings fixes, a vc4 issue with 1920x1200 mode validation,
and a conflicting framebuffer in vboxvideo.
platform/chrome: cros_ec_spi: Wait for USECS, not NSECS
The use of `delay_usecs` in terminate_request() was replaced with the new
`delay` struct used by the SPI subsystem, however the unit was
set to SPI_DELAY_UNIT_NSECS instead of SPI_DELAY_UNIT_USECS. This fixes that.
Fixes: 7d3ca507fda9 ("platform/chrome: cros_ec_spi: Use new structure for SPI transfer delays") Signed-off-by: Benson Leung <[email protected]>
- a lot more of MM, quite a bit more yet to come: (memcg, pagemap,
vmalloc, pagealloc, migration, thp, ksm, madvise, virtio,
userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups)
- various other subsystems (procfs, misc, MAINTAINERS, bitops, lib,
checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig,
ubsan, fault-injection, ipc)
* emailed patches from Andrew Morton <[email protected]>: (158 commits)
ipc/shm.c: make compat_ksys_shmctl() static
ipc/mqueue.c: fix a brace coding style issue
lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
ubsan: include bug type in report header
kasan: unset panic_on_warn before calling panic()
ubsan: check panic_on_warn
drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
ubsan: split "bounds" checker from other options
ubsan: add trap instrumentation option
init/Kconfig: clean up ANON_INODES and old IO schedulers options
kernel/gcov/fs.c: replace zero-length array with flexible-array member
gcov: gcc_3_4: replace zero-length array with flexible-array member
gcov: gcc_4_7: replace zero-length array with flexible-array member
kernel/kmod.c: fix a typo "assuems" -> "assumes"
reiserfs: clean up several indentation issues
kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
samples/hw_breakpoint: drop use of kallsyms_lookup_name()
samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
fs/binfmt_elf.c: allocate less for static executable
...
Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a page leak in nfs_destroy_unlinked_subrequests()
- Fix use-after-free issues in nfs_pageio_add_request()
- Fix new mount code constant_table array definitions
- finish_automount() requires us to hold 2 refs to the mount record
Features:
- Improve the accuracy of telldir/seekdir by using 64-bit cookies
when possible.
- Allow one RDMA active connection and several zombie connections to
prevent blocking if the remote server is unresponsive.
- Limit the size of the NFS access cache by default
- Reduce the number of references to credentials that are taken by
NFS
- pNFS files and flexfiles drivers now support per-layout segment
COMMIT lists.
- Enable partial-file layout segments in the pNFS/flexfiles driver.
- Add support for CB_RECALL_ANY to the pNFS flexfiles layout type
- pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
the DS using the layouterror mechanism.
Bugfixes and cleanups:
- SUNRPC: Fix krb5p regressions
- Don't specify NFS version in "UDP not supported" error
- nfsroot: set tcp as the default transport protocol
- pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
- alloc_nfs_open_context() must use the file cred when available
- Fix locking when dereferencing the delegation cred
- Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails
- Various clean ups of the NFS O_DIRECT commit code
- Clean up RDMA connect/disconnect
- Replace zero-length arrays with C99-style flexible arrays"
* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
NFS: Clean up process of marking inode stale.
SUNRPC: Don't start a timer on an already queued rpc task
NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
NFS: Beware when dereferencing the delegation cred
NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
NFS: finish_automount() requires us to hold 2 refs to the mount record
NFS: Fix a few constant_table array definitions
NFS: Try to join page groups before an O_DIRECT retransmission
NFS: Refactor nfs_lock_and_join_requests()
NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
NFS: Clean up nfs_lock_and_join_requests()
NFS: Remove the redundant function nfs_pgio_has_mirroring()
NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
NFS: Fix use-after-free issues in nfs_pageio_add_request()
NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
...
Colin Ian King [Sun, 5 Apr 2020 11:51:20 +0000 (12:51 +0100)]
ata: ahci-imx: remove redundant assignment to ret
The variable ret is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Merge tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mainly focused on fixing bugs and addressing
issues in recently introduced compression support.
Enhancement:
- add zstd support, and set LZ4 by default
- add ioctl() to show # of compressed blocks
- show mount time in debugfs
- replace rwsem with spinlock
- avoid lock contention in DIO reads
Some major bug fixes wrt compression:
- compressed block count
- memory access and leak
- remove obsolete fields
- flag controls
Other bug fixes and clean ups:
- fix overflow when handling .flags in inode_info
- fix SPO issue during resize FS flow
- fix compression with fsverity enabled
- potential deadlock when writing compressed pages
- show missing mount options"
* tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
f2fs: keep inline_data when compression conversion
f2fs: fix to disable compression on directory
f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
f2fs: switch discard_policy.timeout to bool type
f2fs: fix to verify tpage before releasing in f2fs_free_dic()
f2fs: show compression in statx
f2fs: clean up dic->tpages assignment
f2fs: compress: support zstd compress algorithm
f2fs: compress: add .{init,destroy}_decompress_ctx callback
f2fs: compress: fix to call missing destroy_compress_ctx()
f2fs: change default compression algorithm
f2fs: clean up {cic,dic}.ref handling
f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
f2fs: xattr.h: Make stub helpers inline
f2fs: fix to avoid double unlock
f2fs: fix potential .flags overflow on 32bit architecture
f2fs: fix NULL pointer dereference in f2fs_verity_work()
f2fs: fix to clear PG_error if fsverity failed
f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
f2fs: don't trigger data flush in foreground operation
...
Kai-Heng Feng [Wed, 27 Mar 2019 09:02:54 +0000 (17:02 +0800)]
libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
During system resume from suspend, this can be observed on ASM1062 PMP
controller:
ata10.01: SATA link down (SStatus 0 SControl 330)
ata10.02: hard resetting link
ata10.02: SATA link down (SStatus 0 SControl 330)
ata10.00: configured for UDMA/133
Kernel panic - not syncing: stack-protector: Kernel
in: sata_pmp_eh_recover+0xa2b/0xa40
Since sata_pmp_eh_recover_pmp() doens't set rc when ATA_DFLAG_DETACH is
set, sata_pmp_eh_recover() continues to run. During retry it triggers
the stack protector.
Set correct rc in sata_pmp_eh_recover_pmp() to let sata_pmp_eh_recover()
jump to pmp_fail directly.
block: fix busy device checking in blk_drop_partitions
bd_super is only set by get_tree_bdev and mount_bdev, and thus not by
other openers like btrfs or the XFS realtime and log devices, as well as
block devices directly opened from user space. Check bd_openers
instead.
Fixes: 77032ca66f86 ("Return EBUSY from BLKRRPART for mounted whole-dev fs") Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Jan Kara [Tue, 7 Apr 2020 15:46:43 +0000 (17:46 +0200)]
ucount: Make sure ucounts in /proc/sys/user don't regress again
Commit 769071ac9f20 "ns: Introduce Time Namespace" broke reporting of
inotify ucounts (max_inotify_instances, max_inotify_watches) in
/proc/sys/user because it has added UCOUNT_TIME_NAMESPACES into enum
ucount_type but didn't properly update reporting in
kernel/ucount.c:setup_userns_sysctls(). This problem got fixed in commit eeec26d5da82 "time/namespace: Add max_time_namespaces ucount".
Add BUILD_BUG_ON to catch a similar problem in the future.
In writing_usb_driver.rst:
Remove link to https://www.qbik.ch/usb/devices/ since it seems to be inactive since 2013
Update link to linux-usb mailing list archive
Lukas Bulwahn [Mon, 30 Mar 2020 06:01:32 +0000 (08:01 +0200)]
docs: driver-api: address duplicate label warning
Delete identically named subsection to fix Documentation warning:
Documentation/driver-api/w1.rst:11: \
WARNING: duplicate label driver-api/w1:w1 api internal to the kernel, \
other instance in Documentation/driver-api/w1.rst
Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- Fix for memory leaks around UBIFS orphan handling
- Fix for memory leaks around UBI fastmap
- Remove zero-length array from ubi-media.h
- Fix for TNC lookup in UBIFS orphan code
* tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: ubi-media.h: Replace zero-length array with flexible-array member
ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len
ubi: fastmap: Only produce the initial anchor PEB when fastmap is used
ubi: fastmap: Free unused fastmap anchor peb during detach
ubifs: ubifs_add_orphan: Fix a memory leak bug
ubifs: ubifs_jnl_write_inode: Fix a memory leak bug
ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()
Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- New mode for time travel, external via virtio
- Fixes for ubd to make sure no requests can get lost
- Fixes for vector networking
- Allow CONFIG_STATIC_LINK only when possible
- Minor cleanups and fixes
* tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Remove some unnecessary NULL checks in vector_user.c
um: vector: Avoid NULL ptr deference if transport is unset
um: Make CONFIG_STATIC_LINK actually static
um: Implement cpu_relax() as ndelay(1) for time-travel
um: Implement ndelay/udelay in time-travel mode
um: Implement time-travel=ext
um: virtio: Implement VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS
um: time-travel: Rewrite as an event scheduler
um: Move timer-internal.h to non-shared
hostfs: Use kasprintf() instead of fixed buffer formatting
um: falloc.h needs to be directly included for older libc
um: ubd: Retry buffer read on any kind of error
um: ubd: Prevent buffer overrun on command completion
um: Fix overlapping ELF segments when statically linked
um: Delete never executed timer
um: Don't overwrite ethtool driver version
um: Fix len of file in create_pid_file
um: Don't use console_drivers directly
um: Cleanup CONFIG_IOSCHED_CFQ
Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
"A few cleanups all over the place, things of note:
- Enable the clone3 syscall
- Remove CONFIG_CROSS_COMPILE from Krzysztof Kozlowski
- Update to use mmgrab from Julia Lawall"
* tag 'for-linus' of git://github.com/openrisc/linux:
openrisc: Remove obsolete show_trace_task function
openrisc: Cleanup copy_thread_tls docs and comments
openrisc: Enable the clone3 syscall
openrisc: Convert copy_thread to copy_thread_tls
openrisc: use mmgrab
openrisc: configs: Cleanup CONFIG_CROSS_COMPILE
Alyssa Ross [Fri, 3 Apr 2020 17:07:01 +0000 (17:07 +0000)]
Documentation: sysrq: fix RST formatting
"On x86" and "On SPARC" are now definition list terms, like
"On PowerPC", "On other", and "On all".
The Credits list is now a bulleted list, like lots of Credits lists in
other files. This prevents the list from becoming a single long,
unpunctuated sentence in the generated documentation.
I also did a couple of other tiny readability improvements to the
"How do I use the magic SysRq key?" section while I was there.
1) Slave bond and team devices should not be assigned ipv6 link local
addresses, from Jarod Wilson.
2) Fix clock sink config on some at803x PHY devices, from Oleksij
Rempel.
3) Uninitialized stack space transmitted in slcan frames, fix from
Richard Palethorpe.
4) Guard HW VLAN ops properly in stmmac driver, from Jose Abreu.
5) "=" --> "|=" fix in aquantia driver, from Colin Ian King.
6) Fix TCP fallback in mptcp, from Florian Westphal. (accessing a plain
tcp_sk as if it were an mptcp socket).
7) Fix cavium driver in some configurations wrt. PTP, from Yue Haibing.
8) Make ipv6 and ipv4 consistent in the lower bound allowed for
neighbour entry retrans_time, from Hangbin Liu.
9) Don't use private workqueue in pegasus usb driver, from Petko
Manolov.
10) Fix integer overflow in mlxsw, from Colin Ian King.
11) Missing refcnt init in cls_tcindex, from Cong Wang.
12) One too many loop iterations when processing cmpri entries in ipv6
rpl code, from Alexander Aring.
13) Disable SG and TSO by default in r8169, from Heiner Kallweit.
14) NULL deref in macsec, from Davide Caratti.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
macsec: fix NULL dereference in macsec_upd_offload()
skbuff.h: Improve the checksum related comments
net: dsa: bcm_sf2: Ensure correct sub-node is parsed
qed: remove redundant assignment to variable 'rc'
wimax: remove some redundant assignments to variable result
mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE
mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_PRIORITY
r8169: change back SG and TSO to be disabled by default
net: dsa: bcm_sf2: Do not register slave MDIO bus with OF
ipv6: rpl: fix loop iteration
tun: Don't put_page() for all negative return values from XDP program
net: dsa: mt7530: fix null pointer dereferencing in port5 setup
mptcp: add some missing pr_fmt defines
net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers
net_sched: fix a missing refcnt in tcindex_init()
net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting
mlxsw: spectrum_trap: fix unintention integer overflow on left shift
pegasus: Remove pegasus' own workqueue
neigh: support smaller retrans_time settting
net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry
...
Steve French [Tue, 7 Apr 2020 15:23:27 +0000 (10:23 -0500)]
smb3: smbdirect support can be configured by default
smbdirect support (SMB3 over RDMA) should be enabled by
default in many configurations.
It is not experimental and is stable enough and has enough
performance benefits to recommend that it be configured by
default. Change the "If unsure N" to "If unsure Y" in
the description of the configuration parameter.
Michael Strauss [Sun, 5 Apr 2020 20:41:12 +0000 (16:41 -0400)]
drm/amd/display: Check for null fclk voltage when parsing clock table
[WHY]
In cases where a clock table is malformed such that fclk entries have
frequencies but not voltages listed, we don't catch the error and set
clocks to 0 instead of using hardcoded values as we should.
[HOW]
Add check for clock tables fclk entry's voltage as well
[Why]
If dc->clk_mgr->funcs->are_clock_states_equal is set, then
wm_optimized_required is never checked. In that case, when going from a
higher mode to a lower mode, wm_optimized_required remains true until
the next mode change.
drm/amd/display: Make cursor source translation adjustment optional
[Why]
In some usecases, like tiled display, the stream and plane configuration
can be setup in a way where the caller expects DAL to perform the
clipping, eg:
P0:
src_rect(0, 0, w, h)
dst_rect(0, 0, w, h)
P1:
src_rect(w, 0, w, h)
dst_rect(0, 0, w, h)
Cursor is enabled on both streams with the same position.
This can result in double cursor on tiled display, even though this
behavior is technically correct from the DC interface point of view.
We need a mechanism to control this dynamically.
[How]
This is something that should live in the DM layer based on detection
of the specified configuration but it's not something that we really
have enough information to deal with today.
Add a flag to the cursor position state that specifies whether we
want DC to do the translation or not and make it opt-in and let
the DM decide when to do it.
drm/amd/display: Calculate scaling ratios on every medium/full update
[Why]
If a plane isn't being actively enabled or disabled then DC won't
always recalculate scaling rects and ratios for the primary plane.
This results in only a partial or corrupted rect being displayed on
the screen instead of scaling to fit the screen.
[How]
Add back the logic to recalculate the scaling rects into
dc_commit_updates_for_stream since this is the expected place to
do it in DC.
This was previously removed a few years ago to fix an underscan issue
but underscan is still functional now with this change - and it should
be, since this is only updating to the latest plane state getting passed
in.
drm/amd/display: Program viewport when source pos changes for DCN20 hw seq
[Why]
For medium updates that change nothing but the source rect position
the viewport doesn't change on DCN20.
We're missing the check for the position update bit that was there in
the DCN10 hardware sequencer.
[How]
Check the position bit along with the scaling bit like we were doing
with DCN20.
We shouldn't actually hit a case where context != current_state in
our programming/commit model but guard against it anyway since it was
guarded for the other bits.
drm/amd/display: Fix incorrect cursor pos on scaled primary plane
[Why]
Cursor pos is correctly adjusted from DC side for source rect offset
on DCN ASIC, but only on the overlay.
This is because DM places offsets the cursor for primary planes only
to workaround missing code in DCE for the adjustment we're now correctly
doing in DC for DCN ASIC.
[How]
Drop the adjustment for source rect from the DM side of things and put
the code where it actually belongs - in DC on the pipe level.
drm/amd/display: Translate cursor position by source rect
[Why]
Cursor is drawn as part of the framebuffer for a plane on AMD hardware.
The cursor position on the framebuffer does not change even if the
source rect viewport for the cursor does. This causes the cursor to be
clipped.
The following IGT tests fail as a result of this issue:
- kms_plane_cursor@pipe-*-viewport-size-*
[How]
Offset cursor position by plane source rect viewport. If the viewport
is unscaled then the cursor is now correctly positioned on any
plane - primary or overlay.
There is still a hardware limitation for dealing with the cursor size
being incorrectly scaled but that's not something we can address.
Add some documentation explaining some of this in the code while we're
at it.
drm/amd/display: Update stream adjust in dc_stream_adjust_vmin_vmax
[Why]
After v_total_min and max are updated in vrr structure, the changes are
not reflected in stream adjust. When these values are read from stream
adjust it does not reflect the actual state of the system.
[How]
Set stream adjust values equal to vrr adjust values after vrr adjust
values are updated.
Merge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull pcmcia updates from Dominik Brodowski:
"A few PCMCIA odd fixes: removing a few spaces and useless casts,
replacing snprintf() with scnprintf(), and replacing zero-length
arrays with a flexible-array member"
* 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
pcmcia: remove some unused space characters
pcmcia: soc_common.h: Replace zero-length array with flexible-array member
pcmcia: cs_internal.h: Replace zero-length array with flexible-array member
pcmcia: Use scnprintf() for avoiding potential buffer overflow
pcmcia: omap: remove useless cast for driver.name
Incorrect CG sequence will cause gfx timedout,
if we keep switching power profile mode
(enter profile mod such as PEAK will disable CG,
exit profile mode EXIT will enable CG)
when run Vulkan test case(case used for test: vkexample).
When syzbot tries to figure out how to deduplicate bug reports, it prefers
seeing a hint about a specific bug type (we can do better than just
"UBSAN"). This lifts the handler reason into the UBSAN report line that
includes the file path that tripped a check. Unfortunately, UBSAN does
not provide function names.
Syzkaller expects kernel warnings to panic when the panic_on_warn sysctl
is set. More work is needed here to have UBSan reuse the WARN
infrastructure, but for now, just check the flag manually.
In order to do kernel builds with the bounds checker individually
available, introduce CONFIG_UBSAN_BOUNDS, with the remaining options under
CONFIG_UBSAN_MISC.
For example, using this, we can start to expand the coverage syzkaller is
providing. Right now, all of UBSan is disabled for syzbot builds because
taken as a whole, it is too noisy. This will let us focus on one feature
at a time.
For the bounds checker specifically, this provides a mechanism to
eliminate an entire class of array overflows with close to zero
performance overhead (I cannot measure a difference). In my (mostly)
defconfig, enabling bounds checking adds ~4200 checks to the kernel.
Performance changes are in the noise, likely due to the branch predictors
optimizing for the non-fail path.
Some notes on the bounds checker:
- it does not instrument {mem,str}*()-family functions, it only
instruments direct indexed accesses (e.g. "foo[i]"). Dealing with
the {mem,str}*()-family functions is a work-in-progress around
CONFIG_FORTIFY_SOURCE[1].
- it ignores flexible array members, including the very old single
byte (e.g. "int foo[1];") declarations. (Note that GCC's
implementation appears to ignore _all_ trailing arrays, but Clang only
ignores empty, 0, and 1 byte arrays[2].)
Patch series "ubsan: Split out bounds checker", v5.
This splits out the bounds checker so it can be individually used. This
is enabled in Android and hopefully for syzbot. Includes LKDTM tests for
behavioral corner-cases (beyond just the bounds checker), and adjusts
ubsan and kasan slightly for correct panic handling.
This patch (of 6):
The Undefined Behavior Sanitizer can operate in two modes: warning
reporting mode via lib/ubsan.c handler calls, or trap mode, which uses
__builtin_trap() as the handler. Using lib/ubsan.c means the kernel image
is about 5% larger (due to all the debugging text and reporting structures
to capture details about the warning conditions). Using the trap mode,
the image size changes are much smaller, though at the loss of the
"warning only" mode.
In order to give greater flexibility to system builders that want minimal
changes to image size and are prepared to deal with kernel code being
aborted and potentially destabilizing the system, this introduces
CONFIG_UBSAN_TRAP. The resulting image sizes comparison:
kernel/gcov/fs.c: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
gcov: gcc_3_4: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
gcov: gcc_4_7: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Will Deacon [Tue, 7 Apr 2020 03:11:43 +0000 (20:11 -0700)]
kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
kallsyms_lookup_name() and kallsyms_on_each_symbol() are exported to
modules despite having no in-tree users and being wide open to abuse by
out-of-tree modules that can use them as a method to invoke arbitrary
non-exported kernel functions.
Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol().
Will Deacon [Tue, 7 Apr 2020 03:11:39 +0000 (20:11 -0700)]
samples/hw_breakpoint: drop use of kallsyms_lookup_name()
The 'data_breakpoint' test code is the only modular user of
kallsyms_lookup_name(), which was exported as part of fixing the test in f60d24d2ad04 ("hw-breakpoints: Fix broken hw-breakpoint sample module").
In preparation for un-exporting this symbol, switch the test over to using
__symbol_get(), which can be used to place breakpoints on exported
symbols.
Will Deacon [Tue, 7 Apr 2020 03:11:36 +0000 (20:11 -0700)]
samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
Patch series "Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()".
Despite having just a single modular in-tree user that I could spot,
kallsyms_lookup_name() is exported to modules and provides a mechanism
for out-of-tree modules to access and invoke arbitrary, non-exported
kernel symbols when kallsyms is enabled.
This patch series fixes up that one user and unexports the symbol along
with kallsyms_on_each_symbol(), since that could also be abused in a
similar manner.
I would like to avoid out-of-tree modules being easily able to call
functions that are not exported. kallsyms_lookup_name() makes this
trivial to the point that there is very little incentive to rework these
modules to either use upstream interfaces correctly or propose
functionality which may be otherwise missing upstream. Both of these
latter solutions would be pre-requisites to upstreaming these modules, and
the current state of things actively discourages that approach.
The background here is that we are aiming for Android devices to be able
to use a generic binary kernel image closely following upstream, with any
vendor extensions coming in as kernel modules. In this case, we (Google)
end up maintaining the binary module ABI within the scope of a single LTS
kernel. Monitoring and managing the ABI surface is not feasible if it
effectively includes all data and functions via kallsyms_lookup_name().
Of course, we could just carry this patch in the Android kernel tree, but
we're aiming to carry as little as possible (ideally nothing) and I think
it's a sensible change in its own right. I'm surprised you object to it,
in all honesty.
Now, you could turn around and say "that's not upstream's problem", but it
still seems highly undesirable to me to have an upstream bypass for
exported symbols that isn't even used by upstream modules. It's ripe for
abuse and encourages people to work outside of the upstream tree. The
usual rule is that we don't export symbols without a user in the tree and
that seems especially relevant in this case.
Joe Lawrence said:
: FWIW, kallsyms was historically used by the out-of-tree kpatch support
: module to resolve external symbols as well as call set_memory_r{w,o}()
: API. All of that support code has been merged upstream, so modern kpatch
: modules* no longer leverage kallsyms by default.
:
: That said, there are still some users who still use the deprecated support
: module with newer kernels, but that is not officially supported by the
: project.
This patch (of 3):
Given the name of a kernel symbol, the 'data_breakpoint' test claims to
"report any write operations on the kernel symbol". However, it creates
the breakpoint using both HW_BREAKPOINT_W and HW_BREAKPOINT_R, which menas
it also fires for read access.
Drop HW_BREAKPOINT_R from the breakpoint attributes.
Jason Baron [Tue, 7 Apr 2020 03:11:23 +0000 (20:11 -0700)]
fs/epoll: make nesting accounting safe for -rt kernel
Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
ep_poll_safewake() can take several non-raw spinlocks after disabling
interrupts. Since a spinlock can block in the -rt kernel, we can't take a
spinlock after disabling interrupts. So let's re-work how we determine
the nesting level such that it plays nicely with the -rt kernel.
Let's introduce a 'nests' field in struct eventpoll that records the
current nesting level during ep_poll_callback(). Then, if we nest again
we can find the previous struct eventpoll that we were called from and
increase our count by 1. The 'nests' field is protected by
ep->poll_wait.lock.
I've also moved the visited field to reduce the size of struct eventpoll
from 184 bytes to 176 bytes on x86_64 for !CONFIG_DEBUG_LOCK_ALLOC, which
is typical for a production config.