Daniel Borkmann [Sat, 23 Mar 2019 00:49:10 +0000 (01:49 +0100)]
bpf, libbpf: fix version info and add it to shared object
Even though libbpf's versioning script for the linker (libbpf.map)
is pointing to 0.0.2, the BPF_EXTRAVERSION in the Makefile has
not been updated along with it and is therefore still on 0.0.1.
While fixing up, I also noticed that the generated shared object
versioning information is missing, typical convention is to have
a linker name (libbpf.so), soname (libbpf.so.0) and real name
(libbpf.so.0.0.2) for library management. This is based upon the
LIBBPF_VERSION as well.
The build will then produce the following bpf libraries:
# rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld install
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
CC /tmp/bld/libbpf.o
CC /tmp/bld/bpf.o
CC /tmp/bld/nlattr.o
CC /tmp/bld/btf.o
CC /tmp/bld/libbpf_errno.o
CC /tmp/bld/str_error.o
CC /tmp/bld/netlink.o
CC /tmp/bld/bpf_prog_linfo.o
CC /tmp/bld/libbpf_probes.o
CC /tmp/bld/xsk.o
LD /tmp/bld/libbpf-in.o
LINK /tmp/bld/libbpf.a
LINK /tmp/bld/libbpf.so.0.0.2
LINK /tmp/bld/test_libbpf
INSTALL /tmp/bld/libbpf.a
INSTALL /tmp/bld/libbpf.so.0.0.2
Linus Torvalds [Sun, 24 Mar 2019 20:41:37 +0000 (13:41 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for 5.1"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: report real fs size after failed resize
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: remove useless ext4_pin_inode()
ext4: avoid panic during forced reboot
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted
Linus Torvalds [Sun, 24 Mar 2019 18:42:10 +0000 (11:42 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
"Third more careful attempt for this set of fixes:
- Prevent a 32bit math overflow in the cpufreq code
- Fix a buffer overflow when scanning the cgroup2 cpu.max property
- A set of fixes for the NOHZ scheduler logic to prevent waking up
CPUs even if the capacity of the busy CPUs is sufficient along with
other tweaks optimizing the behaviour for asymmetric systems
(big/little)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Skip LLC NOHZ logic for asymmetric systems
sched/fair: Tune down misfit NOHZ kicks
sched/fair: Comment some nohz_balancer_kick() kick conditions
sched/core: Fix buffer overflow in cgroup2 property cpu.max
sched/cpufreq: Fix 32-bit math overflow
Linus Torvalds [Sun, 24 Mar 2019 18:16:27 +0000 (11:16 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"A larger set of perf updates.
Not all of them are strictly fixes, but that's solely the tip
maintainers fault as they let the timely -rc1 pull request fall
through the cracks for various reasons including travel. So I'm
sending this nevertheless because rebasing and distangling fixes and
updates would be a mess and risky as well. As of tomorrow, a strict
fixes separation is happening again. Sorry for the slip-up.
Kernel:
- Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different
consumers of the mmap event get what they requested.
Tools:
- A larger set of updates to perf record/report/scripts vs. time
stamp handling
- More Python3 fixups
- A pile of memory leak plumbing
- perf BPF improvements and fixes
- Finalize the perf.data directory storage"
[ Note: the kernel part is strictly a fix, the updates are purely to
tooling - Linus ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf bpf: Show more BPF program info in print_bpf_prog_info()
perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog()
perf tools: Save bpf_prog_info and BTF of new BPF programs
perf evlist: Introduce side band thread
perf annotate: Enable annotation of BPF programs
perf build: Check what binutils's 'disassembler()' signature to use
perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation
perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO
perf feature detection: Add -lopcodes to feature-libbfd
perf top: Add option --no-bpf-event
perf bpf: Save BTF information as headers to perf.data
perf bpf: Save BTF in a rbtree in perf_env
perf bpf: Save bpf_prog_info information as headers to perf.data
perf bpf: Save bpf_prog_info in a rbtree in perf_env
perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool
perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()
bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()
tools lib bpf: Introduce bpf_program__get_prog_info_linear()
perf record: Replace option --bpf-event with --no-bpf-event
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
...
Linus Torvalds [Sun, 24 Mar 2019 18:12:27 +0000 (11:12 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Prevent potential NULL pointer dereferences in the HPET and HyperV
code
- Exclude the GART aperture from /proc/kcore to prevent kernel
crashes on access
- Use the correct macros for Cyrix I/O on Geode processors
- Remove yet another kernel address printk leak
- Announce microcode reload completion as requested by quite some
people. Microcode loading has become popular recently.
- Some 'Make Clang' happy fixlets
- A few cleanups for recently added code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/gart: Exclude GART aperture from kcore
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
x86/mm/pti: Make local symbols static
x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
x86/microcode: Announce reload operation's completion
x86/hyperv: Prevent potential NULL pointer dereference
x86/hpet: Prevent potential NULL pointer dereference
x86/lib: Fix indentation issue, remove extra tab
x86/boot: Restrict header scope to make Clang happy
x86/mm: Don't leak kernel addresses
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Linus Torvalds [Sun, 24 Mar 2019 18:09:47 +0000 (11:09 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A set of small fixes plus the removal of stale board support code:
- Remove the board support code from the clpx711x clocksource driver.
This change had fallen through the cracks and I'm sending it now
rather than dealing with people who want to improve that stale code
for 3 month.
- Use the proper clocksource mask on RICSV
- Make local scope functions and variables static"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/clps711x: Remove board support
clocksource/drivers/riscv: Fix clocksource mask
clocksource/drivers/mips-gic-timer: Make gic_compare_irqaction static
clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() static
clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() static
clocksource/drivers/clps711x: Make clps711x_clksrc_init() static
time/jiffies: Make refined_jiffies static
Linus Torvalds [Sun, 24 Mar 2019 17:58:01 +0000 (10:58 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two small fixes:
- Cure a recently introduces error path hickup which tries to
unregister a not registered lockdep key in te workqueue code
- Prevent unaligned cmpxchg() crashes in the robust list handling
code by sanity checking the user space supplied futex pointer"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Ensure that futex address is aligned in handle_futex_death()
workqueue: Only unregister a registered lockdep key
Linus Torvalds [Sun, 24 Mar 2019 17:51:23 +0000 (10:51 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem:
- Remove secondary GIC support on systems w/o device-tree support
- A set of small fixlets in various irqchip drivers
- static and fall-through annotations
- Kernel doc and typo fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Mark expected switch case fall-through
genirq/devres: Remove excess parameter from kernel doc
irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
irqchip/mbigen: Don't clear eventid when freeing an MSI
irqchip/stm32: Don't set rising configuration registers at init
irqchip/stm32: Don't clear rising/falling config registers at init
dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
irqchip/mmp: Make mmp_irq_domain_ops static
irqchip/brcmstb-l2: Make two init functions static
genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
irqchip/gic: Drop support for secondary GIC in non-DT systems
irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
Linus Torvalds [Sun, 24 Mar 2019 17:17:33 +0000 (10:17 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
"Two small fixes:
- Move the large objtool_file struct off the stack so objtool works
in setups with a tight stack limit.
- Make a few variables static in the watchdog core code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/core: Make variables static
objtool: Move objtool_file struct off the stack
Linus Torvalds [Sun, 24 Mar 2019 16:58:08 +0000 (09:58 -0700)]
Merge tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb3 fixes from Steve French:
- two fixes for stable for guest mount problems with smb3.1.1
- two fixes for crediting (SMB3 flow control) on resent requests
- a byte range lock leak fix
- two fixes for incorrect rc mappings
* tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
SMB3: Fix SMB3.1.1 guest mounts to Samba
cifs: Fix slab-out-of-bounds when tracing SMB tcon
cifs: allow guest mounts to work for smb3.11
fix incorrect error code mapping for OBJECTID_NOT_FOUND
cifs: fix that return -EINVAL when do dedupe operation
CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN
CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN
Linus Torvalds [Sun, 24 Mar 2019 16:43:35 +0000 (09:43 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fixes to four drivers and two core fixes.
One core fix simply corrects a missed destroy_rcu_head() but the other
is hopefully the end of an ongoing effort to make suspend/resume play
nicely with scsi quiesce"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvscsi: Fix empty event pool access during host removal
scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton
scsi: hisi_sas: Add softreset in hisi_sas_I_T_nexus_reset()
scsi: qla2xxx: Fix NULL pointer crash due to stale CPUID
scsi: qla2xxx: Fix FC-AL connection target discovery
scsi: core: Avoid that a kernel warning appears during system resume
scsi: core: Also call destroy_rcu_head() for passthrough requests
scsi: iscsi: flush running unbind operations when removing a session
Arnd Bergmann [Fri, 22 Mar 2019 14:18:43 +0000 (15:18 +0100)]
rxrpc: avoid clang -Wuninitialized warning
clang produces a false-positive warning as it fails to notice
that "lost = true" implies that "ret" is initialized:
net/rxrpc/output.c:402:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
if (lost)
^~~~
net/rxrpc/output.c:437:6: note: uninitialized use occurs here
if (ret >= 0) {
^~~
net/rxrpc/output.c:402:2: note: remove the 'if' if its condition is always false
if (lost)
^~~~~~~~~
net/rxrpc/output.c:339:9: note: initialize the variable 'ret' to silence this warning
int ret, opt;
^
= 0
Rearrange the code to make that more obvious and avoid the warning.
Jon Maloy [Fri, 22 Mar 2019 14:03:51 +0000 (15:03 +0100)]
tipc: tipc clang warning
When checking the code with clang -Wsometimes-uninitialized we get the
following warning:
if (!tipc_link_is_establishing(l)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/node.c:847:46: note: uninitialized use occurs here
tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr);
net/tipc/node.c:831:2: note: remove the 'if' if its condition is always
true
if (!tipc_link_is_establishing(l)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/node.c:821:31: note: initialize the variable 'maddr' to silence
this warning
struct tipc_media_addr *maddr;
We fix this by initializing 'maddr' to NULL. For the matter of clarity,
we also test if 'xmitq' is non-empty before we use it and 'maddr'
further down in the function. It will never happen that 'xmitq' is non-
empty at the same time as 'maddr' is NULL, so this is a sufficient test.
Fixes: 598411d70f85 ("tipc: make resetting of links non-atomic") Reported-by: Nathan Chancellor <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
John Hurley [Fri, 22 Mar 2019 12:37:35 +0000 (12:37 +0000)]
net: sched: fix cleanup NULL pointer exception in act_mirr
A new mirred action is created by the tcf_mirred_init function. This
contains a list head struct which is inserted into a global list on
successful creation of a new action. However, after a creation, it is
still possible to error out and call the tcf_idr_release function. This,
in turn, calls the act_mirr cleanup function via __tcf_idr_release and
__tcf_action_put. This cleanup function tries to delete the list entry
which is as yet uninitialised, leading to a NULL pointer exception.
Fix this by initialising the list entry on creation of a new action.
Heiner Kallweit [Fri, 22 Mar 2019 06:39:35 +0000 (07:39 +0100)]
r8169: fix cable re-plugging issue
Bartek reported that after few cable unplug/replug cycles suddenly
replug isn't detected any longer. His system uses a RTL8106, I wasn't
able to reproduce the issue with RTL8168g. According to his bisect
the referenced commit caused the regression. As Realtek doesn't
release datasheets or errata it's hard to say what's the actual root
cause, but this change was reported to fix the issue.
Wen Yang [Fri, 22 Mar 2019 03:04:09 +0000 (11:04 +0800)]
net: ethernet: ti: fix possible object reference leak
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/ti/netcp_ethss.c:3661:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
./drivers/net/ethernet/ti/netcp_ethss.c:3665:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3654, but without a corresponding object release within this function.
Wen Yang [Fri, 22 Mar 2019 03:04:08 +0000 (11:04 +0800)]
net: ibm: fix possible object reference leak
The call to ehea_get_eth_dn returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/ibm/ehea/ehea_main.c:3163:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 3154, but without a corresponding object release within this function.
Wen Yang [Fri, 22 Mar 2019 03:04:07 +0000 (11:04 +0800)]
net: xilinx: fix possible object reference leak
The call to of_parse_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1624:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1569, but without a corresponding object release within this function.
Florian Fainelli [Thu, 21 Mar 2019 23:34:44 +0000 (16:34 -0700)]
net: phy: Re-parent menus for MDIO bus drivers correctly
After 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support
from PHYs") the various MDIO bus drivers were no longer parented with
config PHYLIB but with config MDIO_BUS which is not a menuconfig, fix
this by depending on MDIO_DEVICE which is a menuconfig.
This is visually nicer and less confusing for users.
Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Sat, 23 Mar 2019 17:25:12 +0000 (10:25 -0700)]
Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-block
Pull io_uring fixes and improvements from Jens Axboe:
"The first five in this series are heavily inspired by the work Al did
on the aio side to fix the races there.
The last two re-introduce a feature that was in io_uring before it got
merged, but which I pulled since we didn't have a good way to have
BVEC iters that already have a stable reference. These aren't
necessarily related to block, it's just how io_uring pins fixed
buffers"
* tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
block: add BIO_NO_PAGE_REF flag
iov_iter: add ITER_BVEC_FLAG_NO_REF flag
io_uring: mark me as the maintainer
io_uring: retry bulk slab allocs as single allocs
io_uring: fix poll races
io_uring: fix fget/fput handling
io_uring: add prepped flag
io_uring: make io_read/write return an integer
io_uring: use regular request ref counts
Linus Torvalds [Sat, 23 Mar 2019 17:14:42 +0000 (10:14 -0700)]
Merge tag 'for-linus-20190323' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes/changes that should go into this series. This contains:
- Kernel doc / comment updates (Bart, Shenghui)
- Un-export of core-only used function (Bart)
- Fix race on loop file access (Dongli)
- pf/pcd queue cleanup fixes (me)
- Use appropriate helper for RESTART bit set (Yufen)
- Use named identifier for classic poll (Yufen)"
* tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
blkcg: Fix kernel-doc warnings
blk-iolatency: #include "blk.h"
block: Unexport blk_mq_add_to_requeue_list()
block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
loop: access lo_backing_file only when the loop device is Lo_bound
blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
paride/pcd: cleanup queues when detection fails
paride/pf: cleanup queues when detection fails
Linus Torvalds [Sat, 23 Mar 2019 17:04:47 +0000 (10:04 -0700)]
Merge tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A follow up for the new alloc_size logic and a blacklisting fix,
marked for stable"
* tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client:
rbd: drop wait_for_latest_osdmap()
libceph: wait for latest osdmap in ceph_monc_blacklist_add()
rbd: set io_min, io_opt and discard_granularity to alloc_size
Darrick J. Wong [Sat, 23 Mar 2019 16:10:29 +0000 (12:10 -0400)]
ext4: prohibit fstrim in norecovery mode
The ext4 fstrim implementation uses the block bitmaps to find free space
that can be discarded. If we haven't replayed the journal, the bitmaps
will be stale and we absolutely *cannot* use stale metadata to zap the
underlying storage.
Trond Myklebust [Thu, 21 Mar 2019 21:57:56 +0000 (17:57 -0400)]
NFS: Fix a typo in nfs_init_timeout_values()
Specifying a retrans=0 mount parameter to a NFS/TCP mount, is
inadvertently causing the NFS client to rewrite any specified
timeout parameter to the default of 60 seconds.
Fixes: a956beda19a6 ("NFS: Allow the mount option retrans=0") Signed-off-by: Trond Myklebust <[email protected]>
zhangyi (F) [Sat, 23 Mar 2019 15:56:01 +0000 (11:56 -0400)]
ext4: cleanup bh release code in ext4_ind_remove_space()
Currently, we are releasing the indirect buffer where we are done with
it in ext4_ind_remove_space(), so we can see the brelse() and
BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we
may probably forget to release the buffer some day. This patch cleans
up the code by putting of the code which releases the buffers to the
end of the function.
Trond Myklebust [Tue, 19 Mar 2019 15:24:54 +0000 (11:24 -0400)]
SUNRPC: Don't let RPC_SOFTCONN tasks time out if the transport is connected
If the transport is still connected, then we do want to allow
RPC_SOFTCONN tasks to retry. They should time out if and only if
the connection is broken.
zhangyi (F) [Sat, 23 Mar 2019 15:43:05 +0000 (11:43 -0400)]
ext4: brelse all indirect buffer in ext4_ind_remove_space()
All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.
- Create and mount an empty ext4 filesystem without extent and quota
features,
- quotacheck and enable the user & group quota,
- Create some files and write some data to them, and then punch hole
to some files of them, it may trigger the buffer leak problem
mentioned above.
- Disable quota and run quotacheck again, it will create two new
aquota files and write the checked quota information to them, which
probably may reuse the freed indirect block(the buffer and page
cache was not freed) as data block.
- Enable quota again, it will invoke
vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
buffers and pagecache. Unfortunately, because of the buffer of quota
data block is still referenced, quota code cannot read the up to date
quota info from the device and lead to quota information corruption.
This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.
This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
With -Wimplicit-fallthrough added to CFLAGS:
kernel/irq/manage.c: In function ‘irq_do_set_affinity’:
kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
cpumask_copy(desc->irq_common_data.affinity, mask);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/irq/manage.c:199:2: note: here
case IRQ_SET_MASK_OK_NOCOPY:
^~~~
Kairui Song [Fri, 8 Mar 2019 03:05:08 +0000 (11:05 +0800)]
x86/gart: Exclude GART aperture from kcore
On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range. Accessing the GART range via
/proc/kcore results in a kernel crash.
vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging
existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
when attempting to read the aperture region, and so it won't read from the
actual memory.
Apply the same workaround for kcore. First implement the same hook
infrastructure for kcore, then reuse the hook functions introduced in the
previous vmcore fix. Just with some minor adjustment, rename some functions
for more general usage, and simplify the hook infrastructure a bit as there
is no module usage yet.
Steve French [Sat, 23 Mar 2019 03:31:17 +0000 (22:31 -0500)]
SMB3: Fix SMB3.1.1 guest mounts to Samba
Workaround problem with Samba responses to SMB3.1.1
null user (guest) mounts. The server doesn't set the
expected flag in the session setup response so we have
to do a similar check to what is done in smb3_validate_negotiate
where we also check if the user is a null user (but not sec=krb5
since username might not be passed in on mount for Kerberos case).
Note that the commit below tightened the conditions and forced signing
for the SMB2-TreeConnect commands as per MS-SMB2.
However, this should only apply to normal user sessions and not for
cases where there is no user (even if server forgets to set the flag
in the response) since we don't have anything useful to sign with.
This is especially important now that the more secure SMB3.1.1 protocol
is in the default dialect list.
An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed
the guest mounts to Windows.
[ 779.045085] Memory state around the buggy address:
[ 779.045089] ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 779.045093] ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 779.045099] ^
[ 779.045103] ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 779.045107] ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 779.045109] ==================================================================
[ 779.045110] Disabling lock debugging due to kernel taint
Correctly assign tree name str for smb3_tcon event.
Ronnie Sahlberg [Thu, 21 Mar 2019 04:59:02 +0000 (14:59 +1000)]
cifs: allow guest mounts to work for smb3.11
Fix Guest/Anonymous sessions so that they work with SMB 3.11.
The commit noted below tightened the conditions and forced signing for
the SMB2-TreeConnect commands as per MS-SMB2.
However, this should only apply to normal user sessions and not for
Guest/Anonumous sessions.
Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares") Signed-off-by: Ronnie Sahlberg <[email protected]> CC: Stable <[email protected]> Signed-off-by: Steve French <[email protected]>
Steve French [Sun, 17 Mar 2019 20:58:38 +0000 (15:58 -0500)]
fix incorrect error code mapping for OBJECTID_NOT_FOUND
It was mapped to EIO which can be confusing when user space
queries for an object GUID for an object for which the server
file system doesn't support (or hasn't saved one).
As Amir Goldstein suggested this is similar to ENOATTR
(equivalently ENODATA in Linux errno definitions) so
changing NT STATUS code mapping for OBJECTID_NOT_FOUND
to ENODATA.
Xiaoli Feng [Sat, 16 Mar 2019 04:11:54 +0000 (12:11 +0800)]
cifs: fix that return -EINVAL when do dedupe operation
dedupe_file_range operations is combiled into remap_file_range.
But it's always skipped for dedupe operations in function
cifs_remap_file_range.
Example to test:
Before this patch:
# dd if=/dev/zero of=cifs/file bs=1M count=1
# xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
XFS_IOC_FILE_EXTENT_SAME: Invalid argument
After this patch:
# dd if=/dev/zero of=cifs/file bs=1M count=1
# xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
XFS_IOC_FILE_EXTENT_SAME: Operation not supported
Influence for xfstests:
generic/091
generic/112
generic/127
generic/263
These tests report this error "do_copy_range:: Invalid
argument" instead of "FIDEDUPERANGE: Invalid argument".
Because there are still two bugs cause these test failed.
https://bugzilla.kernel.org/show_bug.cgi?id=202935
https://bugzilla.kernel.org/show_bug.cgi?id=202785
YueHaibing [Fri, 22 Mar 2019 14:39:40 +0000 (22:39 +0800)]
clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() static
Fix sparse warnings:
drivers/clocksource/tcb_clksrc.c:74:6: warning:
symbol 'tc_clksrc_suspend' was not declared. Should it be static?
drivers/clocksource/tcb_clksrc.c:89:6: warning:
symbol 'tc_clksrc_resume' was not declared. Should it be static?
Thomas Gleixner [Fri, 22 Mar 2019 21:51:21 +0000 (22:51 +0100)]
Merge tag 'perf-core-for-mingo-5.1-20190321' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo:
BPF:
Song Liu:
- Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT
and PERF_RECORD_KSYMBOL recently added to the kernel and plugging
binutils's libopcodes disassembly of BPF programs with the existing
annotation interfaces in 'perf annotate', 'perf report' and 'perf top'
various output formats (--stdio, --stdio2, --tui).
perf list:
Andi Kleen:
- Filter metrics when using substring search.
perf record:
Andi Kleen:
- Allow to limit number of reported perf.data files
- Clarify help for --switch-output.
perf report:
Andi Kleen
- Indicate JITed code better.
- Show all sort keys in help output.
perf script:
Andi Kleen:
- Support relative time.
perf stat:
Andi Kleen:
- Improve scaling.
General:
Changbin Du:
- Fix some mostly error path memory and reference count leaks found
using gcc's ASan and UBSan.
Thomas Gleixner [Fri, 22 Mar 2019 21:50:41 +0000 (22:50 +0100)]
Merge tag 'perf-core-for-mingo-5.1-20190311' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo:
kernel:
Stephane Eranian :
- Restore mmap record type correctly when handling PERF_RECORD_MMAP2
events, as the same template is used for all the threads interested
in mmap events, some may want just PERF_RECORD_MMAP, while some
may want the extra info in MMAP2 records.
perf probe:
Adrian Hunter:
- Fix getting the kernel map, because since changes related to x86 PTI
entry trampolines handling, there are more than one kernel map.
perf script:
Andi Kleen:
- Support insn output for normal samples, i.e.:
perf script -F ip,sym,insn --xed
Will fetch the sample IP from the thread address space and feed it
to Intel's XED disassembler, producing lines such as:
- Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
addition to PERF_RECORD_SAMPLE.
perf report:
- Add a new --samples option to save a small random number of samples
per hist entry, using a reservoir technique to select a representative
number of samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or CPU of the sample is displayed. Then we use less' search
functionality to directly jump to the time stamp of the selected sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
- Fix the UI browser scripts pop up menu when there are many scripts
available.
- Update x86's syscall_64.tbl, no change in tools/perf behaviour.
- Sync copies asm-generic/unistd.h and linux/in with the kernel sources.
perf data:
Jiri Olsa:
- Prep work to support having perf.data stored as a directory, with one
file per CPU, that ultimately will allow having one ring buffer reading
thread per CPU.
Vendor events:
Martin Liška:
- perf PMU events for AMD Family 17h.
perf script python:
Tony Jones:
- Add python3 support for the remaining Intel PT related scripts, with
these we should have a clean build of perf with python3 while still
supporting the build with python2.
libbpf:
Arnaldo Carvalho de Melo:
- Fix the build on uCLibc, adding the missing stdarg.h since we use
va_list in one typedef.
Linus Torvalds [Fri, 22 Mar 2019 21:15:11 +0000 (14:15 -0700)]
Merge tag 'powerpc-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for a boot failure on 32-bit, introduced during the merge
window.
A fix for our handling of CLOCK_MONOTONIC in the 64-bit VDSO. Changing
the wall clock across the Y2038 boundary could cause CLOCK_MONOTONIC
to jump forward and backward.
Our spectre_v2 reporting was a bit confusing due to a bug I
introduced. On some systems it was reporting that the count cache was
disabled and also that we were flushing the count cache on context
switch. Only the former is true, and given that the count cache is
disabled it doesn't make any sense to flush it. No one reported it, so
presumably the presence of any mitigation is all people check for.
Finally a small build fix for zsmalloc on 32-bit.
Thanks to: Ben Hutchings, Christophe Leroy, Diana Craciun, Guenter
Roeck, Michael Neuling"
* tag 'powerpc-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/security: Fix spectre_v2 reporting
powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations
powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32
powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038
Linus Torvalds [Fri, 22 Mar 2019 21:10:27 +0000 (14:10 -0700)]
Merge tag 'iommu-fixes-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- AMD IOMMU fix for sg-mapping with sg->offset > PAGE_SIZE
- Fix for IOVA code to trigger the slow-path less often
- Two fixes for Intel VT-d to avoid writing to read-only registers and
to flush the right domain id for the default domains in scalable mode
* tag 'iommu-fixes-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Save the right domain ID used by hardware
iommu/vt-d: Check capability before disabling protected memory
iommu/iova: Fix tracking of recently failed iova address
iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
Linus Torvalds [Fri, 22 Mar 2019 21:04:38 +0000 (14:04 -0700)]
Merge tag 'sound-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only significant change is the regression fixes for the jack
detection at resume on HD-audio, while others are all small or trivial
fixes like the coverage of missing error code or usual HD-audio quirk"
* tag 'sound-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable headset MIC of Acer AIO with ALC286
ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec
ALSA: hda - Don't trigger jackpoll_work in azx_resume
ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
ALSA: hda - add Lenovo IdeaCentre B550 to the power_save_blacklist
ALSA: firewire-motu: use 'version' field of unit directory to identify model
ALSA: sb8: add a check for request_region
ALSA: echoaudio: add a check for ioremap_nocache
Linus Torvalds [Fri, 22 Mar 2019 19:03:19 +0000 (12:03 -0700)]
Merge tag 'pm-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These rearrange some code in the generic power domains (genpd)
framework to avoid a potential deadlock and make the turbostat utility
behave more as expected.
Specifics:
- Rearrange the generic power domains (genpd) code to avoid a
potential deadlock possible due to its interactions with the clock
framework (Jiada Wang)
- Make turbostat return the exit status of the command run under it
if that command fails (David Arcari)"
* tag 'pm-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Avoid a potential deadlock
tools/power turbostat: return the exit status of a command
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
When building with -Wsometimes-uninitialized, Clang warns:
arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used
uninitialized whenever switch default is taken
[-Wsometimes-uninitialized]
The default cannot be reached because arch_build_bp_info() initializes
hw->len to one of the specified cases. Nevertheless the warning is valid
and returning -EINVAL makes sure that this cannot be broken by future
modifications.
Heiner Kallweit [Thu, 21 Mar 2019 20:23:14 +0000 (21:23 +0100)]
r8169: don't read interrupt mask register in interrupt handler
After the original patch network starts to crash on heavy load.
It's not fully clear why this additional register read has such side
effects, but removing it fixes the issue.
Thanks also to Alex for his contribution and hints.
Valdis Kletnieks [Tue, 12 Mar 2019 09:33:48 +0000 (05:33 -0400)]
watchdog/core: Make variables static
sparse complains:
CHECK kernel/watchdog.c
kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available'
was not declared. Should it be static?
kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask'
was not declared. Should it be static?
They're not referenced by name from anyplace else, make them static.
Valdis Kletnieks [Tue, 12 Mar 2019 07:47:53 +0000 (03:47 -0400)]
x86/mm/pti: Make local symbols static
With 'make C=2 W=1', sparse and gcc both complain:
CHECK arch/x86/mm/pti.c
arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
CC arch/x86/mm/pti.o
arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
605 | void pti_set_kernel_image_nonglobal(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
local (static) symbol.
Chen Jie [Fri, 15 Mar 2019 03:44:38 +0000 (03:44 +0000)]
futex: Ensure that futex address is aligned in handle_futex_death()
The futex code requires that the user space addresses of futexes are 32bit
aligned. sys_futex() checks this in futex_get_keys() but the robust list
code has no alignment check in place.
As a consequence the kernel crashes on architectures with strict alignment
requirements in handle_futex_death() when trying to cmpxchg() on an
unaligned futex address which was retrieved from the robust list.
[ tglx: Rewrote changelog, proper sizeof() based alignement check and add
comment ]
Lu Baolu [Wed, 20 Mar 2019 01:58:34 +0000 (09:58 +0800)]
iommu/vt-d: Save the right domain ID used by hardware
The driver sets a default domain id (FLPT_DEFAULT_DID) in the
first level only pasid entry, but saves a different domain id
in @sdev->did. The value saved in @sdev->did will be used to
invalidate the translation caches. Hence, the driver might
result in invalidating the caches with a wrong domain id.
Lu Baolu [Wed, 20 Mar 2019 01:58:33 +0000 (09:58 +0800)]
iommu/vt-d: Check capability before disabling protected memory
The spec states in 10.4.16 that the Protected Memory Enable
Register should be treated as read-only for implementations
not supporting protected memory regions (PLMR and PHMR fields
reported as Clear in the Capability register).
Robert Richter [Wed, 20 Mar 2019 18:57:23 +0000 (18:57 +0000)]
iommu/iova: Fix tracking of recently failed iova address
If a 32 bit allocation request is too big to possibly succeed, it
early exits with a failure and then should never update max32_alloc_
size. This patch fixes current code, now the size is only updated if
the slow path failed while walking the tree. Without the fix the
allocation may enter the slow path again even if there was a failure
before of a request with the same or a smaller size.
Linus Torvalds [Fri, 22 Mar 2019 03:40:05 +0000 (20:40 -0700)]
Merge tag 'drm-fixes-2019-03-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"i915, amdgpu, vmwgfx, exynos, nouveau and udl fixes.
Seems to be lots of little minor ones for regressions in rc1, and some
cleanups. The exynos one is the largest one, and is for a hw
difference between exynos versions"
* tag 'drm-fixes-2019-03-22' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/dmem: empty chunk do not have a buffer object associated with them.
drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
drm/nouveau/dmem: Fix a NULL vs IS_ERR() check
drm/nouveau/dmem: remove set but not used variable 'drm'
drm/exynos/mixer: fix MIXER shadow registry synchronisation code
drm/vmwgfx: Don't double-free the mode stored in par->set_mode
drm/vmwgfx: Return 0 when gmrid::get_node runs out of ID's
drm/amdgpu: fix invalid use of change_bit
drm/amdgpu: revert "cleanup setting bulk_movable"
drm/i915: Sanity check mmap length against object size
drm/i915: Fix off-by-one in reporting hanging process
drm/i915/bios: assume eDP is present on port A when there is no VBT
drm/udl: use drm_gem_object_put_unlocked.
Jakub Kicinski [Thu, 21 Mar 2019 21:34:36 +0000 (14:34 -0700)]
bpf: verifier: propagate liveness on all frames
Commit 7640ead93924 ("bpf: verifier: make sure callees don't prune
with caller differences") connected up parentage chains of all
frames of the stack. It didn't, however, ensure propagate_liveness()
propagates all liveness information along those chains.
This means pruning happening in the callee may generate explored
states with incomplete liveness for the chains in lower frames
of the stack.
The included selftest is similar to the prior one from commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with
caller differences"), where callee would prune regardless of the
difference in r8 state.
Now we also initialize r9 to 0 or 1 based on a result from get_random().
r9 is never read so the walk with r9 = 0 gets pruned (correctly) after
the walk with r9 = 1 completes.
The selftest is so arranged that the pruning will happen in the
callee. Since callee does not propagate read marks of r8, the
explored state at the pruning point prior to the callee will
now ignore r8.
Propagate liveness on all frames of the stack when pruning.
Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
Uwe Kleine-König [Thu, 10 Jan 2019 20:19:33 +0000 (21:19 +0100)]
ARM: imx_v6_v7_defconfig: continue compiling the pwm driver
After the pwm-imx driver was split into two drivers and the Kconfig symbol
changed accordingly, use the new name to continue being able to use the
PWM hardware.
Dave Airlie [Fri, 22 Mar 2019 01:52:40 +0000 (11:52 +1000)]
Merge tag 'exynos-drm-fixes-for-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
- Fix page fault issue at Mixer device
. This patch fixes the page fault issue by correcting sychronization
method for updating shadow registers for Mixer device.
Dave Airlie [Fri, 22 Mar 2019 00:41:51 +0000 (10:41 +1000)]
Merge tag 'drm-intel-fixes-2019-03-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
A protection on our mmap against attempts to map past the end of the object;
plus a fix off-by-one in our hang report and a protection;
and a fix for eDP panels on Gen9 platforms on VBT absence.
YueHaibing [Thu, 28 Feb 2019 12:24:59 +0000 (20:24 +0800)]
drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
pm_runtime_get_sync returns negative on failure.
Fixes: eaeb9010bb4b ("drm/nouveau/debugfs: Wake up GPU before doing any reclocking") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
YueHaibing [Thu, 21 Feb 2019 03:38:51 +0000 (03:38 +0000)]
drm/nouveau/dmem: remove set but not used variable 'drm'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/nouveau/nouveau_dmem.c: In function 'nouveau_dmem_free':
drivers/gpu/drm/nouveau/nouveau_dmem.c:103:22: warning:
variable 'drm' set but not used [-Wunused-but-set-variable]
struct nouveau_drm *drm;
^
Yunsheng Lin [Thu, 21 Mar 2019 03:28:43 +0000 (11:28 +0800)]
net: hns3: fix for not calculating tx bd num correctly
When there is only one byte in a frag, the current calculation
using "(size + HNS3_MAX_BD_SIZE - 1) >> HNS3_MAX_BD_SIZE_OFFSET"
will return zero, because HNS3_MAX_BD_SIZE is 65535 and
HNS3_MAX_BD_SIZE_OFFSET is 16. So it will cause tx error when
a frag's size is one byte.
This patch fixes it by using DIV_ROUND_UP.
Fixes: 3fe13ed95dd3 ("net: hns3: avoid mult + div op in critical data path") Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Herbert Xu [Thu, 21 Mar 2019 01:39:52 +0000 (09:39 +0800)]
rhashtable: Still do rehash when we get EEXIST
As it stands if a shrink is delayed because of an outstanding
rehash, we will go into a rescheduling loop without ever doing
the rehash.
This patch fixes this by still carrying out the rehash and then
rescheduling so that we can shrink after the completion of the
rehash should it still be necessary.
The return value of EEXIST captures this case and other cases
(e.g., another thread expanded/rehashed the table at the same
time) where we should still proceed with the rehash.
Wang Hai [Wed, 20 Mar 2019 18:25:05 +0000 (14:25 -0400)]
net-sysfs: Fix memory leak in netdev_register_kobject
When registering struct net_device, it will call
register_netdevice ->
netdev_register_kobject ->
device_initialize(dev);
dev_set_name(dev, "%s", ndev->name)
device_add(dev)
register_queue_kobjects(ndev)
In netdev_register_kobject(), if device_add(dev) or
register_queue_kobjects(ndev) failed. Register_netdevice()
will return error, causing netdev_freemem(ndev) to be
called to free net_device, however put_device(&dev->dev)->..->
kobject_cleanup() won't be called, resulting in a memory leak.
====================
net/sched: validate the control action with all the other parameters
currently, the kernel checks for bad values of the control action in
tcf_action_init_1(), after a successful call to the action's init()
function. When the control action is 'goto chain', this causes two
undesired behaviors:
1. "misconfigured action after replace that causes kernel crash":
if users replace a valid TC action with another one having invalid
control action, all the new configuration data (including the bad
control action) are applied successfully, even if the kernel returned
an error. As a consequence, it's possible to trigger a NULL pointer
dereference in the traffic path of every TC action (1), replacing the
control action with 'goto chain x', when chain <x> doesn't exist.
2. "refcount leak that makes kmemleak complain"
when a valid 'goto chain' action is overwritten with another action,
the kernel forgets to decrease refcounts in the chain.
The above problems can be fixed if we validate the control action in each
action's init() function, the same way as we are already doing for all the
other configuration parameters.
Now that chains can be released after an action is replaced, we need to
care about concurrent access of 'goto_chain' pointer: ensure we access it
through RCU, like we did with most action-specific configuration parameters.
- Patch 1 removes the wrong checks and provides functions that can be
used to properly validate control actions in individual actions
- Patch 2 to 16 fix individual actions, and add TDC selftest code to
verify the correct behavior (2)
- Patch 17 and 18 fix concurrent access issues on 'goto_chain', that can be
observed after the chain refcount leak is fixed.
Changes since v1:
- reword the cover letter
- condense the extack message in case tc_action_check_ctrlact() is called
with invalid parameters.
- add tcf_action_set_ctrlact() to avoid code duplication an make the
RCU-ification of 'goto_chain' easier.
- fix errors in act_ife, act_simple, act_skbedit, and avoid useless 'goto
end' in act_connmark, thanks a lot to Vlad Buslov.
- avoid dereferencing 'goto_chain' in tcf_gact_goto_chain_index(), so
we don't have to care about the grace period there.
- let actions respect the grace period when they release chains, thanks
to Cong Wang and Vlad Buslov.
Changes since RFC:
- include a fix for all TC actions
- add a selftest for each TC action
- squash fix for refcount leaks into a single patch, the first in the
series, thanks to Cong Wang
- ensure that chain refcount is released without tcfa_lock held, thanks
to Vlad Buslov
Notes:
(1) act_ipt didn't need any fix, as the control action is constantly equal
to TC_ACT_OK.
(2) the selftest for act_simple fails because userspace tc backend for
'simple' does not parse the control action correctly (and hardcodes it
to TC_ACT_PIPE).
====================
Davide Caratti [Wed, 20 Mar 2019 14:00:16 +0000 (15:00 +0100)]
net/sched: let actions use RCU to access 'goto_chain'
use RCU when accessing the action chain, to avoid use after free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:
# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto udp action pass index 4
# tc filter add dev dd0 ingress protocol ip flower \
> ip_proto udp action csum udp goto chain 42 index 66
# tc chain del dev dd0 chain 42 ingress
(start UDP traffic towards dd0)
# tc action replace action csum udp pass index 66
Davide Caratti [Wed, 20 Mar 2019 14:00:15 +0000 (15:00 +0100)]
net/sched: don't dereference a->goto_chain to read the chain index
callers of tcf_gact_goto_chain_index() can potentially read an old value
of the chain index, or even dereference a NULL 'goto_chain' pointer,
because 'goto_chain' and 'tcfa_action' are read in the traffic path
without caring of concurrent write in the control path. The most recent
value of chain index can be read also from a->tcfa_action (it's encoded
there together with TC_ACT_GOTO_CHAIN bits), so we don't really need to
dereference 'goto_chain': just read the chain id from the control action.
Fixes: e457d86ada27 ("net: sched: add couple of goto_chain helpers") Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Davide Caratti [Wed, 20 Mar 2019 14:00:12 +0000 (15:00 +0100)]
net/sched: act_skbmod: validate the control action inside init()
the following script:
# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action skbmod set smac 00:c1:a0:c1:a0:00 pass index 90
# tc actions replace action skbmod \
> set smac 00:c1:a0:c1:a0:00 goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action skbmod
had the following output:
src MAC address <00:c1:a0:c1:a0:00>
src MAC address <00:c1:a0:c1:a0:00>
Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1
action order 0: skbmod goto chain 42 set smac 00:c1:a0:c1:a0:00
index 90 ref 2 bind 1
cookie c1a0c1a0
Then, the first packet transmitted by crash0 made the kernel crash:
Davide Caratti [Wed, 20 Mar 2019 14:00:07 +0000 (15:00 +0100)]
net/sched: act_pedit: validate the control action inside init()
the following script:
# tc filter add dev crash0 egress matchall \
> action pedit ex munge ip ttl set 10 pass index 90
# tc actions replace action pedit \
> ex munge ip ttl set 10 goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action pedit
had the following output:
Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1
action order 0: pedit action goto chain 42 keys 1
index 90 ref 2 bind 1
key #0 at ipv4+8: val 0a000000 mask 00ffffff
cookie c1a0c1a0
Then, the first packet transmitted by crash0 made the kernel crash: