* pm-tools:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Mike Snitzer [Wed, 3 Apr 2019 16:23:11 +0000 (12:23 -0400)]
dm: disable DISCARD if the underlying storage no longer supports it
Storage devices which report supporting discard commands like
WRITE_SAME_16 with unmap, but reject discard commands sent to the
storage device. This is a clear storage firmware bug but it doesn't
change the fact that should a program cause discards to be sent to a
multipath device layered on this buggy storage, all paths can end up
failed at the same time from the discards, causing possible I/O loss.
The first discard to a path will fail with Illegal Request, Invalid
field in cdb, e.g.:
kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
device disables its support for discard on this path, and because of the
BLK_STS_TARGET error multipath fails the discard without failing any
path or retrying down a different path. But subsequent discards can
cause path failures. Any discards sent to the path which already failed
a discard ends up failing with EIO from blk_cloned_rq_check_limits with
an "over max size limit" error since the discard limit was set to 0 by
the sd driver for the path. As the error is EIO, this now fails the
path and multipath tries to send the discard down the next path. This
cycle continues as discards are sent until all paths fail.
Fix this by training DM core to disable DISCARD if the underlying
storage already did so.
Also, fix branching in dm_done() and clone_endio() to reflect the
mutually exclussive nature of the IO operations in question.
David S. Miller [Thu, 4 Apr 2019 17:55:59 +0000 (10:55 -0700)]
Merge branch 'sch_cake-fixes'
Toke Høiland-Jørgensen says:
====================
sched: A few small fixes for sch_cake
Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
diffserv fields. This series fixes those two issues, and should probably go to
in 4.19 as well. However, the previous refactoring patch means they don't apply
as-is; I can send a follow-up directly to stable if that's OK with you?
====================
sch_cake: Make sure we can write the IP header before changing DSCP bits
There is not actually any guarantee that the IP headers are valid before we
access the DSCP bits of the packets. Fix this using the same approach taken
in sch_dsmark.
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
We shouldn't be using skb->protocol directly as that will miss cases with
hardware-accelerated VLAN tags. Use the helper instead to get the right
protocol number.
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".
Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
alpha to its maximal value if a RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes unnecessary large
cwnd reduction in the future if the loss was isolated as it resets
the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
denoting a total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.
This patch replace the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.
The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.
We see that, without halving its cwnd for all source of losses,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all source of losses, it can then be
controlled by the AQM using similar drop levels than cubic or reno.
net/sched: act_sample: fix divide by zero in the traffic path
the control path of 'sample' action does not validate the value of 'rate'
provided by the user, but then it uses it as divisor in the traffic path.
Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
message in case that value is zero, to fix a splat with the script below:
# tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
# tc -s a s action sample
total acts 1
action order 0: sample rate 1/0 group 1 pipe
index 2 ref 1 bind 1 installed 19 sec used 19 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
# ping 192.0.2.1 -I test0 -c1 -q
net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
When a bpf program is uploaded, the driver computes the number of
xdp tx queues resulting in the allocation of additional qsets.
Starting from commit '2ecbe4f4a027 ("net: thunderx: replace global
nicvf_rx_mode_wq work queue for all VFs to private for each of them")'
the driver runs link state polling for each VF resulting in the
following NULL pointer dereference:
Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop
Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them") Fixes: 2c632ad8bc74 ("net: thunderx: move link state polling function to VF") Reported-by: Matteo Croce <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Tested-by: Matteo Croce <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Yonglong Liu [Thu, 4 Apr 2019 08:46:47 +0000 (16:46 +0800)]
net: hns: Fix sparse: some warnings in HNS drivers
There are some sparse warnings in the HNS drivers:
warning: incorrect type in assignment (different address spaces)
expected void [noderef] <asn:2> *io_base
got void *vaddr
warning: cast removes address space '<asn:2>' of expression
[...]
Add __iomem and change all the u8 __iomem to void __iomem to
fix these kind of warnings.
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:2> *base
got unsigned char [usertype] *base_addr
warning: cast to restricted __le16
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype] tbl_tcam_data_high
got restricted __le32 [usertype]
warning: cast to restricted __le32
[...]
These variables used u32/u16 as their type, and finally as a
parameter of writel(), writel() will do the cpu_to_le32 coversion
so remove the little endian covert code to fix these kind of warnings.
The tx ring buffer map when xmit and unmap when xmit done. So in
hnae_init_ring() did not map tx ring buffer, but in hnae_fini_ring()
have a unmap operation for tx ring buffer, which is already unmapped
when xmit done, than cause this WARNING.
The hnae_alloc_buffers() is called in hnae_init_ring(),
so the hnae_free_buffers() should be in hnae_fini_ring(), not in
hnae_free_desc().
In hnae_fini_ring(), adds a check is_rx_ring() as in hnae_init_ring().
When the ring buffer is tx ring, adds a piece of code to ensure that
the tx ring is unmap.
Yonglong Liu [Thu, 4 Apr 2019 08:46:45 +0000 (16:46 +0800)]
net: hns: fix ICMP6 neighbor solicitation messages discard problem
ICMP6 neighbor solicitation messages will be discard by the Hip06
chips, because of not setting forwarding pool. Enable promisc mode
has the same problem.
This patch fix the wrong forwarding table configs for the multicast
vague matching when enable promisc mode, and add forwarding pool
for the forwarding table.
Every time the call trace is not the same, but the overwrite address
is always the same:
Unable to handle kernel paging request at virtual address 0000000200000040
The root cause is, when write the reg XGMAC_MAC_TX_LF_RF_CONTROL_REG,
didn't use the io_base offset.
Yonglong Liu [Thu, 4 Apr 2019 08:46:43 +0000 (16:46 +0800)]
net: hns: Use NAPI_POLL_WEIGHT for hns driver
When the HNS driver loaded, always have an error print:
"netif_napi_add() called with weight 256"
This is because the kernel checks the NAPI polling weights
requested by drivers and it prints an error message if a driver
requests a weight bigger than 64.
Liubin Shu [Thu, 4 Apr 2019 08:46:42 +0000 (16:46 +0800)]
net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
This patch is trying to fix the issue due to:
[27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv]
After hnae_queue_xmit() in hns_nic_net_xmit_hw(), can be
interrupted by interruptions, and than call hns_nic_tx_poll_one()
to handle the new packets, and free the skb. So, when turn back to
hns_nic_net_xmit_hw(), calling skb->len will cause use-after-free.
This patch update tx ring statistics in hns_nic_tx_poll_one() to
fix the bug.
Wei Li [Mon, 1 Apr 2019 03:55:57 +0000 (11:55 +0800)]
arm64: fix wrong check of on_sdei_stack in nmi context
When doing unwind_frame() in the context of pseudo nmi (need enable
CONFIG_ARM64_PSEUDO_NMI), reaching the bottom of the stack (fp == 0,
pc != 0), function on_sdei_stack() will return true while the sdei acpi
table is not inited in fact. This will cause a "NULL pointer dereference"
oops when going on.
blk-mq: do not reset plug->rq_count before the list is sorted
We would never be able to sort the list if we first reset plug->rq_count
which is used in conditional check later.
Fixes: ce5b009cff19 ("block: improve logic around when to sort a plug list") Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Dmitry V. Levin [Fri, 29 Mar 2019 17:12:30 +0000 (20:12 +0300)]
csky: Fix syscall_get_arguments() and syscall_set_arguments()
C-SKY syscall arguments are located in orig_a0,a1,a2,a3,regs[0],regs[1]
fields of struct pt_regs.
Due to an off-by-one bug and a bug in pointer arithmetic
syscall_get_arguments() was reading orig_a0,regs[1..5] fields instead.
Likewise, syscall_set_arguments() was writing orig_a0,regs[1..5] fields
instead.
Dmitry V. Levin [Fri, 29 Mar 2019 17:12:21 +0000 (20:12 +0300)]
riscv: Fix syscall_get_arguments() and syscall_set_arguments()
RISC-V syscall arguments are located in orig_a0,a1..a5 fields
of struct pt_regs.
Due to an off-by-one bug and a bug in pointer arithmetic
syscall_get_arguments() was reading s3..s7 fields instead of a1..a5.
Likewise, syscall_set_arguments() was writing s3..s7 fields
instead of a1..a5.
tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
The only users that calls syscall_get_arguments() with a variable and not a
hard coded '6' is ftrace_syscall_enter(). syscall_get_arguments() can be
optimized by removing a variable input, and always grabbing 6 arguments
regardless of what the system call actually uses.
Change ftrace_syscall_enter() to pass the 6 args into a local stack array
and copy the necessary arguments into the trace event as needed.
This is needed to remove two parameters from syscall_get_arguments().
ptrace: Remove maxargs from task_current_syscall()
task_current_syscall() has a single user that passes in 6 for maxargs, which
is the maximum arguments that can be used to get system calls from
syscall_get_arguments(). Instead of passing in a number of arguments to
grab, just get 6 arguments. The args argument even specifies that it's an
array of 6 items.
This will also allow changing syscall_get_arguments() to not get a variable
number of arguments, but always grab 6.
Linus also suggested not passing in a bunch of arguments to
task_current_syscall() but to instead pass in a pointer to a structure, and
just fill the structure. struct seccomp_data has almost all the parameters
that is needed except for the stack pointer (sp). As seccomp_data is part of
uapi, and I'm afraid to change it, a new structure was created
"syscall_info", which includes seccomp_data and adds the "sp" field.
Qian Cai [Thu, 4 Apr 2019 10:54:41 +0000 (11:54 +0100)]
mm/compaction.c: abort search if isolation fails
Running LTP oom01 in a tight loop or memory stress testing put the system
in a low-memory situation could triggers random memory corruption like
page flag corruption below due to in fast_isolate_freepages(), if
isolation fails, next_search_order() does not abort the search immediately
could lead to improper accesses.
He bisected it down to e332f741a8dd ("mm, compaction: be selective about
what pageblocks to clear skip hints"). The problem is that the patch in
question was sloppy with respect to the handling of zone boundaries. In
some instances, it was possible for PFNs outside of a zone to be examined
and if those were not properly initialised or poisoned then it would
trigger the VM_BUG_ON. This patch corrects the zone boundary issues when
resetting pageblock skip hints and Mikhail reported that the bug did not
trigger after 30 hours of testing.
Merge tag '5.1-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb3 fixes from Steve French:
"Four smb3 fixes for stable:
- fix an open path where we had an unitialized structure
- fix for snapshot (previous version) enumeration
- allow reconnect timeout on handles to be configurable to better
handle network or server crash
- correctly handle lack of file_all_info structure"
* tag '5.1-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: a smb2_validate_and_copy_iov failure does not mean the handle is invalid.
SMB3: Allow persistent handle timeout to be configurable on mount
smb3: Fix enumerating snapshots to Azure
cifs: fix kref underflow in close_shroot()
Junwei Hu [Tue, 2 Apr 2019 11:38:04 +0000 (19:38 +0800)]
ipv6: Fix dangling pointer when ipv6 fragment
At the beginning of ip6_fragment func, the prevhdr pointer is
obtained in the ip6_find_1stfragopt func.
However, all the pointers pointing into skb header may change
when calling skb_checksum_help func with
skb->ip_summed = CHECKSUM_PARTIAL condition.
The prevhdr pointe will be dangling if it is not reloaded after
calling __skb_linearize func in skb_checksum_help func.
Here, I add a variable, nexthdr_offset, to evaluate the offset,
which does not changes even after calling __skb_linearize func.
net-gro: Fix GRO flush when receiving a GSO packet.
Currently we may merge incorrectly a received GSO packet
or a packet with frag_list into a packet sitting in the
gro_hash list. skb_segment() may crash case because
the assumptions on the skb layout are not met.
The correct behaviour would be to flush the packet in the
gro_hash list and send the received GSO packet directly
afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders")
sets NAPI_GRO_CB(skb)->flush in this case, but this is not
checked before merging. This patch makes sure to check this
flag and to not merge in that case.
Fixes: d61d072e87c8e ("net-gro: avoid reorders") Signed-off-by: Steffen Klassert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
James Smart [Wed, 3 Apr 2019 18:10:34 +0000 (11:10 -0700)]
scsi: lpfc: Fix missing wakeups on abort threads
Abort thread wakeups, on some wqe types, are not happening. The thread
wakeup logic is dependent upon the LPFC_DRIVER_ABORTED flag. However, on
these wqes, the completion handler running prior to the io completion
routine ends up clearing the flag.
Rework the wakeup logic to look at a non-null waitq element which must be
set if the abort thread is waiting. This is reverting the change in the
indicated patch.
Fixes: c2017260eea2d ("scsi: lpfc: Rework locking on SCSI io completion") Signed-off-by: Dick Kennedy <[email protected]> Signed-off-by: James Smart <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
Michael Kelley [Mon, 1 Apr 2019 21:42:06 +0000 (21:42 +0000)]
scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
Reduce the default VMbus channel ring buffer size for storvsc SCSI devices
from 1 Mbyte to 128 Kbytes. Measurements show that ring buffer sizes above
128 Kbytes do not increase performance even at very high IOPS rates, so
don't waste the memory. Also remove the dependence on PAGE_SIZE, since the
ring buffer size should not change on architectures where PAGE_SIZE is not
4 Kbytes.
Michael Kelley [Mon, 1 Apr 2019 16:10:52 +0000 (16:10 +0000)]
scsi: storvsc: Fix calculation of sub-channel count
When the number of sub-channels offered by Hyper-V is >= the number of CPUs
in the VM, calculate the correct number of sub-channels. The current code
produces one too many.
This scenario arises only when the number of CPUs is artificially
restricted (for example, with maxcpus=<n> on the kernel boot line), because
Hyper-V normally offers a sub-channel count < number of CPUs. While the
current code doesn't break, the extra sub-channel is unbalanced across the
CPUs (for example, a total of 5 channels on a VM with 4 CPUs).
Chris Wilson [Tue, 5 Feb 2019 20:30:33 +0000 (20:30 +0000)]
drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
drivers/gpu/drm/i915/gvt/display.c:457: warning: Function parameter or member 'connected' not described in 'intel_vgpu_emulate_hotplug'
drivers/gpu/drm/i915/gvt/display.c:457: warning: Excess function parameter 'conncted' description in 'intel_vgpu_emulate_hotplug'
Alex Williamson [Wed, 3 Apr 2019 18:36:21 +0000 (12:36 -0600)]
vfio/type1: Limit DMA mappings per container
Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory. This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.
To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).
Louis Taylor [Wed, 3 Apr 2019 18:36:20 +0000 (12:36 -0600)]
vfio/pci: use correct format characters
When compiling with -Wformat, clang emits the following warnings:
drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~
drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
vendor, device, subvendor, subdevice,
^~~~~~~~~
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for unsigned ints.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- build dependency fix for hid-asus from Arnd Bergmann
- addition of omitted mapping of _ASSISTANT key from Dmitry Torokhov
- race condition fix in hid-debug inftastructure from He, Bo
- fixed support for devices with big maximum report size from Kai-Heng
Feng
- deadlock fix in hid-steam from Rodrigo Rivas Costa
- quite a few device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: input: add mapping for Assistant key
HID: i2c-hid: Disable runtime PM on Synaptics touchpad
HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
HID: logitech: Handle 0 scroll events for the m560
HID: debug: fix race condition with between rdesc_show() and device removal
HID: logitech: check the return value of create_singlethread_workqueue
HID: Increase maximum report size allowed by hid_field_extract()
HID: steam: fix deadlock with input devices.
HID: uclogic: remove redudant duplicated null check on ver_ptr
HID: quirks: Drop misused kernel-doc annotation
HID: hid-asus: select CONFIG_POWER_SUPPLY
HID: quirks: use correct format chars in dbg_hid
If alloc_disk fails in pf_init_units, pf->disk will be
NULL, however in pf_detect and pf_exit, it's not check
this before free.It may result a NULL pointer dereference.
Also when register_blkdev failed, blk_cleanup_queue() and
blk_mq_free_tag_set() should be called to free resources.
io_uring: fix double free in case of fileset regitration failure
Will Deacon reported the following KASAN complaint:
[ 149.890370] ==================================================================
[ 149.891266] BUG: KASAN: double-free or invalid-free in io_sqe_files_unregister+0xa8/0x140
[ 149.892218]
[ 149.892411] CPU: 113 PID: 3974 Comm: io_uring_regist Tainted: G B 5.1.0-rc3-00012-g40b114779944 #3
[ 149.893623] Hardware name: linux,dummy-virt (DT)
[ 149.894169] Call trace:
[ 149.894539] dump_backtrace+0x0/0x228
[ 149.895172] show_stack+0x14/0x20
[ 149.895747] dump_stack+0xe8/0x124
[ 149.896335] print_address_description+0x60/0x258
[ 149.897148] kasan_report_invalid_free+0x78/0xb8
[ 149.897936] __kasan_slab_free+0x1fc/0x228
[ 149.898641] kasan_slab_free+0x10/0x18
[ 149.899283] kfree+0x70/0x1f8
[ 149.899798] io_sqe_files_unregister+0xa8/0x140
[ 149.900574] io_ring_ctx_wait_and_kill+0x190/0x3c0
[ 149.901402] io_uring_release+0x2c/0x48
[ 149.902068] __fput+0x18c/0x510
[ 149.902612] ____fput+0xc/0x18
[ 149.903146] task_work_run+0xf0/0x148
[ 149.903778] do_notify_resume+0x554/0x748
[ 149.904467] work_pending+0x8/0x10
[ 149.905060]
[ 149.905331] Allocated by task 3974:
[ 149.905934] __kasan_kmalloc.isra.0.part.1+0x48/0xf8
[ 149.906786] __kasan_kmalloc.isra.0+0xb8/0xd8
[ 149.907531] kasan_kmalloc+0xc/0x18
[ 149.908134] __kmalloc+0x168/0x248
[ 149.908724] __arm64_sys_io_uring_register+0x2b8/0x15a8
[ 149.909622] el0_svc_common+0x100/0x258
[ 149.910281] el0_svc_handler+0x48/0xc0
[ 149.910928] el0_svc+0x8/0xc
[ 149.911425]
[ 149.911696] Freed by task 3974:
[ 149.912242] __kasan_slab_free+0x114/0x228
[ 149.912955] kasan_slab_free+0x10/0x18
[ 149.913602] kfree+0x70/0x1f8
[ 149.914118] __arm64_sys_io_uring_register+0xc2c/0x15a8
[ 149.915009] el0_svc_common+0x100/0x258
[ 149.915670] el0_svc_handler+0x48/0xc0
[ 149.916317] el0_svc+0x8/0xc
[ 149.916817]
[ 149.917101] The buggy address belongs to the object at ffff8004ce07ed00
[ 149.917101] which belongs to the cache kmalloc-128 of size 128
[ 149.919197] The buggy address is located 0 bytes inside of
[ 149.919197] 128-byte region [ffff8004ce07ed00, ffff8004ce07ed80)
[ 149.921142] The buggy address belongs to the page:
[ 149.921953] page:ffff7e0013381f00 count:1 mapcount:0 mapping:ffff800503417c00 index:0x0 compound_mapcount: 0
[ 149.923595] flags: 0x1ffff00000010200(slab|head)
[ 149.924388] raw: 1ffff00000010200dead000000000100dead000000000200ffff800503417c00
[ 149.925706] raw: 0000000000000000000000008040004000000001ffffffff0000000000000000
[ 149.927011] page dumped because: kasan: bad access detected
[ 149.927956]
[ 149.928224] Memory state around the buggy address:
[ 149.929054] ffff8004ce07ec00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 149.930274] ffff8004ce07ec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.931494] >ffff8004ce07ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 149.932712] ^
[ 149.933281] ffff8004ce07ed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.934508] ffff8004ce07ee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.935725] ==================================================================
which is due to a failure in registrering a fileset. This frees the
ctx->user_files pointer, but doesn't clear it. When the io_uring
instance is later freed through the normal channels, we free this
pointer again. At this point it's invalid.
Ensure we clear the pointer when we free it for the error case.
Daniel Borkmann [Wed, 3 Apr 2019 14:49:49 +0000 (16:49 +0200)]
Merge branch 'bpf-flow-dissector-fixes'
Stanislav Fomichev says:
====================
This patch series fixes the existing BPF flow dissector API to
support calling BPF progs from the eth_get_headlen context (the
support itself will be added in bpf-next tree).
The summary of the changes:
* fix VLAN handling in bpf_flow.c, we don't need to peek back and look
at skb->vlan_present; add selftests
* pass and use flow_keys->n_proto instead of skb->protocol
* fix clamping of flow_keys->nhoff for packets with nhoff > 0
* prohibit access to most of the __sk_buff fields from BPF flow
dissector progs; only data/data_end/flow_keys are allowed (all input
is now passed via flow_keys)
* finally, document BPF flow dissector program environment
====================
flow_dissector: allow access only to a subset of __sk_buff fields
Use whitelist instead of a blacklist and allow only a small set of
fields that might be relevant in the context of flow dissector:
* data
* data_end
* flow_keys
This is required for the eth_get_headlen case where we have only a
chunk of data to dissect (i.e. trying to read the other skb fields
doesn't make sense).
Note, that it is a breaking API change! However, we've provided
flow_keys->n_proto as a substitute for skb->protocol; and there is
no need to manually handle skb->vlan_present. So even if we
break somebody, the migration is trivial. Unfortunately, we can't
support eth_get_headlen use-case without those breaking changes.
flow_dissector: fix clamping of BPF flow_keys for non-zero nhoff
Don't allow BPF program to set flow_keys->nhoff to less than initial
value. We currently don't read the value afterwards in anything but
the tests, but it's still a good practice to return consistent
values to the test programs.
selftests/bpf: fix vlan handling in flow dissector program
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.
Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.
Add simple test cases with VLAN tagged frames:
* single vlan for ipv4
* double vlan for ipv6
Maxime Ripard [Fri, 22 Mar 2019 09:16:50 +0000 (10:16 +0100)]
mfd: sun6i-prcm: Allow to compile with COMPILE_TEST
Since this driver only has a dependency on ARCH_SUNXI just because it
doesn't make any sense to run it on something else, we can definitely
enable it through COMPILE_TEST as well to get some build coverage.
Merge tag 'pidfd-fixes-v5.1-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux
Pull pidfd fix from Christian Brauner:
"This should be an uncontroversial fix for pidfd_send_signal() by Jann
to better align it's behavior with other signal sending functions:
In one of the early versions of the patchset it was suggested to not
unconditionally error out when a signal with SI_USER is sent to a
non-current task (cf. [1]).
Instead, pidfd_send_signal() currently silently changes this to a
regular kill signal. While this is technically fine, the semantics are
weird since the kernel just silently converts a user's request behind
their back and also no other signal sending function allows to do
this. It gets more hairy when we introduce sending signals to a
specific thread soon.
So let's align pidfd_send_signal() with all the other signal sending
functions and error out when SI_USER signals are sent to a non-current
task"
* tag 'pidfd-fixes-v5.1-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
signal: don't silently convert SI_USER signals to non-current pidfd
Le Ma [Mon, 1 Apr 2019 10:08:30 +0000 (18:08 +0800)]
drm/amdgpu: remove unnecessary rlc reset function on gfx9
The rlc reset function is not necessary during gfx9 initialization/resume phase.
And this function would even cause rlc fw loading failed on some gfx9 ASIC.
Remove this function safely with verification well on Vega/Raven platform.
David S. Miller [Tue, 2 Apr 2019 20:27:11 +0000 (13:27 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-04-01
This series contains two fixes for XDP in the i40e driver.
Björn provides both fixes, first moving a function out of the header and
into the main.c file. Second fixes a regression introduced in an
earlier patch that removed umem from the VSI. This caused an issue
because the setup code would try to enable AF_XDP zero copy
unconditionally, as long as there was a umem placed in the netdev
receive structure.
====================
The device type for ip6 tunnels is set to
ARPHRD_TUNNEL6. However, the ip4ip6_err function
is expecting the device type of the tunnel to be
ARPHRD_TUNNEL. Since the device types do not
match, the function exits and the ICMP error
packet is not sent to the originating host. Note
that the device type for IPv4 tunnels is set to
ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of
ARPHRD_TUNNEL6 instead. Now the tunnel device
type matches and the ICMP error packet is sent
to the originating host.
Yufen Yu [Tue, 26 Mar 2019 13:19:25 +0000 (21:19 +0800)]
blk-mq: add trace block plug and unplug for multiple queues
For now, we just trace plug for single queue device or drivers
provide .commit_rqs, and have not trace plug for multiple queues
device. But, unplug events will be recorded when call
blk_mq_flush_plug_list(). Then, trace events will be asymmetrical,
just have unplug and without plug.
This patch add trace plug and unplug for multiple queues device in
blk_mq_make_request(). After that, we can accurately trace plug and
unplug for multiple queues.
Andreas Kemnade [Sat, 23 Feb 2019 11:47:54 +0000 (12:47 +0100)]
mfd: twl-core: Disable IRQ while suspended
Since commit 6e2bd956936 ("i2c: omap: Use noirq system sleep pm ops to idle device for suspend")
on gta04 we have handle_twl4030_pih() called in situations where pm_runtime_get()
in i2c-omap.c returns -EACCES.
This happens when we wakeup via something behing twl4030 (powerbutton or rtc
alarm). This goes on for minutes until the system is finally resumed.
Disable the irq on suspend and enable it on resume to avoid
having i2c access problems when the irq registers are checked.
Fixes: 6e2bd956936 ("i2c: omap: Use noirq system sleep pm ops to idle device for suspend") Signed-off-by: Andreas Kemnade <[email protected]> Tested-by: Tony Lindgren <[email protected]> Signed-off-by: Lee Jones <[email protected]>
Xin Long [Sun, 31 Mar 2019 08:58:15 +0000 (16:58 +0800)]
sctp: initialize _pad of sockaddr_in before copying to user memory
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
Call Trace:
_copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
copy_to_user include/linux/uaccess.h:174 [inline]
sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
...
Uninit was stored to memory at:
sctp_transport_init net/sctp/transport.c:61 [inline]
sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
...
Bytes 8-15 of 16 are uninitialized
It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
struct sockaddr_in) wasn't initialized, but directly copied to user memory
in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
sctp_v6_addr_to_user() does.
====================
nfp: flower: fix matching and pushing vlan CFI bit
This patch clears up some confusion around the meaning of bit 12
for FW messages related to VLAN and flower offload.
Pieter says:
It fixes issues with matching, pushing and popping vlan tags.
We replace the vlan CFI bit with a vlan present bit that
indicates the presence of a vlan tag. We also no longer set
the CFI when pushing vlan tags.
====================
nfp: flower: remove vlan CFI bit from push vlan action
We no longer set CFI when pushing vlan tags, therefore we remove
the CFI bit from push vlan.
Fixes: 1a1e586f54bf ("nfp: add basic action capabilities to flower offloads") Signed-off-by: Pieter Jansen van Vuuren <[email protected]> Signed-off-by: Louis Peens <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Replace vlan CFI bit with a vlan present bit that indicates the
presence of a vlan tag. Previously the driver incorrectly assumed
that an vlan id of 0 is not matchable, therefore we indicate vlan
presence with a vlan present bit.
Jiri Slaby [Fri, 29 Mar 2019 11:19:46 +0000 (12:19 +0100)]
kcm: switch order of device registration to fix a crash
When kcm is loaded while many processes try to create a KCM socket, a
crash occurs:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
Oops: 0002 [#1] SMP KASAN PTI
CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
...
CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
Call Trace:
kcm_create+0x600/0xbf0 [kcm]
__sock_create+0x324/0x750 net/socket.c:1272
...
This is due to race between sock_create and unfinished
register_pernet_device. kcm_create tries to do "net_generic(net,
kcm_net_id)". but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
and one process doing module removal.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek <[email protected]> Signed-off-by: Jiri Slaby <[email protected]> Signed-off-by: David S. Miller <[email protected]>
====================
net: sched: fix stats accounting for child NOLOCK qdiscs
Currently, stats accounting for NOLOCK qdisc enslaved to classful (lock)
qdiscs is buggy. Per CPU values are ignored in most places, as a result,
stats dump in the above scenario always report 0 length backlog and parent
backlog len is not updated correctly on NOLOCK qdisc removal.
The first patch address stats dumping, and the second one child qdisc removal.
I'm targeting the net tree as this is a bugfix, but it could be moved to
net-next due to the relatively large diffstat.
====================
Paolo Abeni [Thu, 28 Mar 2019 15:53:13 +0000 (16:53 +0100)]
net: sched: introduce and use qdisc tree flush/purge helpers
The same code to flush qdisc tree and purge the qdisc queue
is duplicated in many places and in most cases it does not
respect NOLOCK qdisc: the global backlog len is used and the
per CPU values are ignored.
This change addresses the above, factoring-out the relevant
code and using the helpers introduced by the previous patch
to fetch the correct backlog len.
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Paolo Abeni [Thu, 28 Mar 2019 15:53:12 +0000 (16:53 +0100)]
net: sched: introduce and use qstats read helpers
Classful qdiscs can't access directly the child qdiscs backlog
length: if such qdisc is NOLOCK, per CPU values should be
accounted instead.
Most qdiscs no not respect the above. As a result, qstats fetching
for most classful qdisc is currently incorrect: if the child qdisc is
NOLOCK, it always reports 0 len backlog.
This change introduces a pair of helpers to safely fetch
both backlog and qlen and use them in stats class dumping
functions, fixing the above issue and cleaning a bit the code.
DRR needs also to access the child qdisc queue length, so it
needs custom handling.
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Nicolas Dichtel [Thu, 28 Mar 2019 09:35:06 +0000 (10:35 +0100)]
net/sched: fix ->get helper of the matchall cls
It returned always NULL, thus it was never possible to get the filter.
Example:
$ ip link add foo type dummy
$ ip link add bar type dummy
$ tc qdisc add dev foo clsact
$ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
matchall action mirred ingress mirror dev bar
Before the patch:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
Error: Specified filter handle not found.
We have an error talking to the kernel
After:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
not_in_hw
action order 1: mirred (Ingress Mirror to device bar) pipe
index 1 ref 1 bind 1
Jann Horn [Sat, 30 Mar 2019 02:12:32 +0000 (03:12 +0100)]
signal: don't silently convert SI_USER signals to non-current pidfd
The current sys_pidfd_send_signal() silently turns signals with explicit
SI_USER context that are sent to non-current tasks into signals with
kernel-generated siginfo.
This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case.
If a user actually wants to send a signal with kernel-provided siginfo,
they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing
this case is unnecessary.
Instead of silently replacing the siginfo, just bail out with an error;
this is consistent with other interfaces and avoids special-casing behavior
based on security checks.
Ilya Dryomov [Tue, 26 Mar 2019 19:20:58 +0000 (20:20 +0100)]
dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
Some devices don't use blk_integrity but still want stable pages
because they do their own checksumming. Examples include rbd and iSCSI
when data digests are negotiated. Stacking DM (and thus LVM) on top of
these devices results in sporadic checksum errors.
Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
Mikulas Patocka [Thu, 21 Mar 2019 20:46:12 +0000 (16:46 -0400)]
dm: revert 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
The limit was already incorporated to dm-crypt with commit 4e870e948fba
("dm crypt: fix error with too large bios"), so we don't need to apply
it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
wrong anyway because the variable ti->max_io_len it is supposed to be in
the units of 512-byte sectors not in bytes.
Reduction of the limit to 1048576 sectors could even cause data
corruption in rare cases - suppose that we have a dm-striped device with
stripe size 768MiB. The target will call dm_set_target_max_io_len with
the value 1572864. The buggy code would reduce it to 1048576. Now, the
dm-core will errorneously split the bios on 1048576-sector boundary
insetad of 1572864-sector boundary and pass these stripe-crossing bios
to the striped target.
Andi Kleen [Thu, 21 Mar 2019 22:00:09 +0000 (15:00 -0700)]
dm init: fix const confusion for dm_allowed_targets array
A non const pointer to const cannot be marked initconst.
Mark the array actually const.
Fixes: 6bbc923dfcf5 dm: add support to directly boot to a mapped device Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
YueHaibing [Fri, 22 Mar 2019 14:16:34 +0000 (22:16 +0800)]
dm integrity: make dm_integrity_init and dm_integrity_exit static
Fix sparse warnings:
drivers/md/dm-integrity.c:3619:12: warning:
symbol 'dm_integrity_init' was not declared. Should it be static?
drivers/md/dm-integrity.c:3638:6: warning:
symbol 'dm_integrity_exit' was not declared. Should it be static?
Mikulas Patocka [Wed, 13 Mar 2019 11:56:02 +0000 (07:56 -0400)]
dm integrity: change memcmp to strncmp in dm_integrity_ctr
If the string opt_string is small, the function memcmp can access bytes
that are beyond the terminating nul character. In theory, it could cause
segfault, if opt_string were located just below some unmapped memory.
Change from memcmp to strncmp so that we don't read bytes beyond the end
of the string.
Steve French [Fri, 29 Mar 2019 21:31:07 +0000 (16:31 -0500)]
SMB3: Allow persistent handle timeout to be configurable on mount
Reconnecting after server or network failure can be improved
(to maintain availability and protect data integrity) by allowing
the client to choose the default persistent (or resilient)
handle timeout in some use cases. Today we default to 0 which lets
the server pick the default timeout (usually 120 seconds) but this
can be problematic for some workloads. Add the new mount parameter
to cifs.ko for SMB3 mounts "handletimeout" which enables the user
to override the default handle timeout for persistent (mount
option "persistenthandles") or resilient handles (mount option
"resilienthandles"). Maximum allowed is 16 minutes (960000 ms).
Units for the timeout are expressed in milliseconds. See
section 2.2.14.2.12 and 2.2.31.3 of the MS-SMB2 protocol
specification for more information.
Steve French [Fri, 29 Mar 2019 03:32:49 +0000 (22:32 -0500)]
smb3: Fix enumerating snapshots to Azure
Some servers (see MS-SMB2 protocol specification
section 3.3.5.15.1) expect that the FSCTL enumerate snapshots
is done twice, with the first query having EXACTLY the minimum
size response buffer requested (16 bytes) which refreshes
the snapshot list (otherwise that and subsequent queries get
an empty list returned). So had to add code to set
the maximum response size differently for the first snapshot
query (which gets the size needed for the second query which
contains the actual list of snapshots).
Ronnie Sahlberg [Thu, 28 Mar 2019 01:20:02 +0000 (11:20 +1000)]
cifs: fix kref underflow in close_shroot()
Fix a bug where we used to not initialize the cached fid structure at all
in open_shroot() if the open was successful but we did not get a lease.
This would leave the structure uninitialized and later when we close the handle
we would in close_shroot() try to kref_put() an uninitialized refcount.
Fix this by always initializing this structure if the open was successful
but only do the extra get() if we got a lease.
This extra get() is only used to hold the structure until we get a lease
break from the server at which point we will kref_put() it during lease
processing.
Björn Töpel [Tue, 12 Feb 2019 08:52:05 +0000 (09:52 +0100)]
i40e: add tracking of AF_XDP ZC state for each queue pair
In commit f3fef2b6e1cc ("i40e: Remove umem from VSI") a regression was
introduced; When the VSI was reset, the setup code would try to enable
AF_XDP ZC unconditionally (as long as there was a umem placed in the
netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a
certain queue pair has been "zero-copy enabled" via the ndo_bpf. The
bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if
XDP is enabled, the corresponding qid in the bitmap is set and the
umem is non-NULL.
vrf: check accept_source_route on the original netdevice
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.
v2->v3:
- Simplify by passing the original netdevice down the stack (per David
Ahern).
Chris Wilson [Fri, 29 Mar 2019 16:51:52 +0000 (16:51 +0000)]
drm/i915: Always backoff after a drm_modeset_lock() deadlock
If drm_modeset_lock() reports a deadlock it sets the ctx->contexted
field and insists that the caller calls drm_modeset_backoff() or else it
generates a WARN on cleanup.
Dust Li [Mon, 1 Apr 2019 08:04:53 +0000 (16:04 +0800)]
tcp: fix a potential NULL pointer dereference in tcp_sk_exit
When tcp_sk_init() failed in inet_ctl_sock_create(),
'net->ipv4.tcp_congestion_control' will be left
uninitialized, but tcp_sk_exit() hasn't check for
that.
This patch add checking on 'net->ipv4.tcp_congestion_control'
in tcp_sk_exit() to prevent NULL-ptr dereference.
Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control") Signed-off-by: Dust Li <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio race fixes and cleanups from Al Viro.
The aio code had more issues with error handling and races with the aio
completing at just the right (wrong) time along with freeing the file
descriptor when another thread closes the file.
Just a couple of these commits are the actual fixes: the others are
cleanups to either make the fixes simpler, or to make the code legible
and understandable enough that we hope there's no more fundamental races
hiding.
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: move sanity checks and request allocation to io_submit_one()
deal with get_reqs_available() in aio_get_req() itself
aio: move dropping ->ki_eventfd into iocb_destroy()
make aio_read()/aio_write() return int
Fix aio_poll() races
aio: store event at final iocb_put()
aio: keep io_event in aio_kiocb
aio: fold lookup_kiocb() into its sole caller
pin iocb through aio.
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull symlink fixes from Al Viro:
"The ceph fix is already in mainline, Daniel's bpf fix is in bpf tree
(1da6c4d9140c "bpf: fix use after free in bpf_evict_inode"), the rest
is in here"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
debugfs: fix use-after-free on symlink traversal
ubifs: fix use-after-free on symlink traversal
jffs2: fix use-after-free on symlink traversal
Hui Wang [Fri, 29 Mar 2019 06:13:23 +0000 (14:13 +0800)]
HID: i2c-hid: Disable runtime PM on Synaptics touchpad
We have a new Dell laptop which has the synaptics I2C touchpad
(06cb:7e7e) on it. After booting up the Linux, the touchpad doesn't
work, there is no interrupt when touching the touchpad, after
disable the runtime PM, everything works well.
I also tried the quirk of I2C_HID_QUIRK_DELAY_AFTER_SLEEP, it is
better after applied this quirk, there are interrupts but data it
reports is invalid.
Al Viro [Tue, 26 Mar 2019 01:43:37 +0000 (01:43 +0000)]
debugfs: fix use-after-free on symlink traversal
symlink body shouldn't be freed without an RCU delay. Switch debugfs to
->destroy_inode() and use of call_rcu(); free both the inode and symlink
body in the callback. Similar to solution for bpf, only here it's even
more obvious that ->evict_inode() can be dropped.
Al Viro [Tue, 26 Mar 2019 01:40:38 +0000 (01:40 +0000)]
ubifs: fix use-after-free on symlink traversal
free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.
Al Viro [Tue, 26 Mar 2019 01:39:50 +0000 (01:39 +0000)]
jffs2: fix use-after-free on symlink traversal
free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was supposed to be fixed on commit 974cb0e3e7c9 ("tipc: fix uninit-value
in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req)
in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called
ahead of tipc_nl_compat_name_table_dump().
However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd
header function. It means even when the check added in that fix fails, it
won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be
triggered again.
So this patch is to add the process for the err returned from cmd header
function in tipc_nl_compat_dumpit().
Xin Long [Sun, 31 Mar 2019 14:50:09 +0000 (22:50 +0800)]
tipc: check link name with right length in tipc_nl_compat_link_set
A similar issue as fixed by Patch "tipc: check bearer name with right
length in tipc_nl_compat_bearer_enable" was also found by syzbot in
tipc_nl_compat_link_set().
The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
offsetof(struct tipc_link_config, name)'.
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
also includes priority and disc_domain length.
This patch is to fix it by checking it with a right length:
'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.