Git Repo - linux.git/log

IB/cma: Fix reference count leak when no ipv4 addresses are set

Once in_dev_get is called to receive in_device pointer, the
in_device reference counter is increased, but if there are
no ipv4 addresses configured on the net-device the ifa_list
will be null, resulting in a flow that doesn't call in_dev_put
to decrease the ref_cnt.
This was exposed when running RoCE over ipv6 without any ipv4
addresses configured

Fixes: commit 8e3867310c90 ("IB/cma: Fix a race condition in iboe_addr_get_sgid()")
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/iser: don't send an rkey if all data is written as immadiate-data

We might get some bogus error completions in case the target will
remotely invalidate the rkey and the HCA will need to retransmit
from this buffer.

Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

rxe: fix broken receive queue draining

If we modified the qp to ERROR state, and
drained the recieve queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <[email protected]>
Signed-off-by: Vijay Immanuel <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>--
Signed-off-by: Doug Ledford <[email protected]>

RDMA/qedr: Prevent memory overrun in verbs' user responses

Wrap ib_copy_to_udata with a function that ensures that the data
being copied over to user space isn't longer than the allowed.

Fixes: cecbcddf6461 ("qedr: Add support for QP verbs")
Fixes: a7efd7773e31 ("qedr: Add support for PD,PKEY and CQ verbs")
Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

iw_cxgb4: don't use WR keys/addrs for 0 byte reads

Only use the read sge lkey/addr and the remote rkey/addr if the
length of the read is not zero. Otherwise the read response might
be treated as the RTR read response and not delivered to the
application. Or worse Terminator hardware will fail a 0B read
if the STAG is 0 even if the read length is 0.

Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx4: Fix CM REQ retries in paravirt mode

CM REQs cannot be successfully retried, because a new pv_cm_id is
created for each request, without checking if one already exists.

By checking if an id exists before creating one, the bug is fixed.

This bug can be provoked by running an RDMA CM user-land application,
but inserting a five seconds delay before the rdma_accept() call on
the passive side. This delay is larger than the default CMA timeout,
and triggers a retry from the active side. The retried REQ will use
another pv_cm_id (the cm_id on the wire). This confuses the CM
protocol and two REJs are sent from the passive side.

Here is an excerpt from ibdump running without the patch:

3.285092       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.382711       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.382861       LID: 4 -> LID: 4       InfiniBand 290 CM: ConnectReject
7.387644       LID: 4 -> LID: 4       InfiniBand 290 CM: ConnectReject

and here is the same with bug fix applied:

3.251010       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.349387       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
8.258443       LID: 4 -> LID: 4       SDP 290 CM: ConnectReply(SDP Hello)
8.259890       LID: 4 -> LID: 4       InfiniBand 290 CM: ReadyToUse

Suggested-by: Venkat Venkatsubra <[email protected]>
Signed-off-by: Håkon Bugge <[email protected]>
Reported-by: Wei Lin Guay <[email protected]>
Tested-by: Wei Lin Guay <[email protected]>
Reviewed-by: Yuval Shaia <[email protected]>
Acked-by: Jack Morgenstein <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/rdmavt: Setting of QP timeout can overflow jiffies computation

Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause
overflow due to the fact that the input to the function usecs_to_jiffies
is only 32-bit ( unsigned int). Overflow will occur when attr->timeout is
equal to or greater than 30. The consequence is unnecessarily excessive
retry and thus degradation of the system performance.

This patch fixes the problem by limiting the input to 5-bit and calling
usecs_to_jiffies() before multiplying the scaling factor.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/core: Fix sparse warnings

Delete unused variables to prevent sparse warnings.

Fixes: db1b5ddd5336 ("IB/core: Rename uverbs event file structure")
Fixes: fd3c7904db6e ("IB/core: Change idr objects to use the new schema")
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Fix the value reported for local ack delay

Local ack delay exposed by the driver is 0 which means infinite QP
timeout. Reporting the default value to 16 (approx 260ms)

Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq

While invoking the req_notify_cq hook, ULPs can request
whether the CQs have any CQEs pending. If CQEs are pending,
drivers can indicate it by returning 1 for req_notify_cq.
The stack will poll CQ again till CQ is empty.

This patch peeks the CQ for any valid entries and return accordingly.

Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Fix return value of poll routine

Fix the incorrect reporting of number of polled
entries by taking into account the max CQ depth
in the driver.

Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Enable atomics only if host bios supports

Driver shall check if the host system bios has enabled
Atomic operations capability in PCI Device Control 2
register of the pci-device. Expose the ATOMIC_HCA
flag only if the Atomic operations capability is set.

Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Specify RDMA component when allocating stats context

Starting FW version 20.6.47, firmware is keeping separate statistics
for L2 and RDMA. However, driver needs to specify RDMA or not when
allocating stat_ctx.

Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP

There's a couple of bugs in the support of max_rd_atomic and
max_dest_rd_atomic. In the modify_qp, if the requested max_rd_atomic,
which is the ORRQ size, is greater than what the chip can support,
then we have to cap the request to chip max as we can't have the HW
overflow the ORRQ. Capping the max_rd_atomic support internally is okay
to do as the remaining read/atomic WRs will still be sitting in the SQ.
However, for the max_dest_rd_atomic, the driver has to error out as
this dictates the IRRQ size and we can't control what the remote
side sends.

Signed-off-by: Eddie Wai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Report supported value to IB stack in query_device

- Report supported value for max_mr_size to IB stack in query_device.
   Also, check and log if MR size requested by application in
   reg_user_mr() is greater than value currently supported by driver.
- Report only 4K page size support for now
- Fix Max_QP value returned by ibv_devinfo -vv.
   In case of PF, FW reserves 129 QPs for creating QP1s of VFs
   and PF. So the max_qp value reported by FW for PF doesn'tt include
   the QP1. Fixing this issue by adding 1 with the value reported
   by FW.

Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails

This fix is added only to avoid system crash in some a
specific scenario. When bnxt_re driver is loaded and if
user tries to change interface mac address, delete GID
fails because QP1 is still associated with existing MAC
(default GID). If the above command fails GID tables are
not modified in the h/w or driver, but the GID context memory
is freed. Now, if the user changes the mac back to the original
value, another add_gid comes to the driver where the driver
reports that the GID is already present in its table
and tries to access the context which was already freed.

So, in this case, in order to avoid NULL pointer de-reference,
this patch removes the context memory free if delete_gid fails
and the same context memory is re-used in new add_gid.
Memory cleanup will be taken care during driver unload, while
deleting the GID table.

Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error

Posting WQE size of 2 results in a WQE_FORMAT_ERROR
thrown by the HW as it requires host to supply WQE Size with room
for atleast one SGE so that the resulting WQE size be atleast 3.

Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext

The driver must free the DPI during the dealloc_ucontext
instead of freeing it during dealloc_pd. However, the DPI
allocation scheme remains unchanged.

Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix a warning message

"umem" is a valid pointer. We intended to print "*umem" or even just
"err" instead.

Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/ocrdma: Fix error codes in ocrdma_create_srq()

If either of these allocations fail then we return ERR_PTR(0). That's
equivalent to NULL and results in a NULL pointer dereference in the
caller.

Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/ocrdma: Fix an error code in ocrdma_alloc_pd()

We should preserve the original "status" error code instead of resetting
it to zero. Returning ERR_PTR(0) is the same as NULL and results in a
NULL dereference in the callers. I added a printk() on error instead.

Fixes: 45e86b33ec8b ("RDMA/ocrdma: Cache recv DB until QP moved to RTR")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/cxgb3: Fix error codes in iwch_alloc_mr()

We accidentally don't set the error code on some error paths. It means
return ERR_PTR(0) which is NULL and results in a NULL dereference in the
caller.

Fixes: 13a239330abd ("RDMA/cxgb3: Don't ignore insert_handle() failures")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

cxgb4: Fix error codes in c4iw_create_cq()

If one of these kmalloc() calls fails then we return ERR_PTR(0) which is
NULL. It results in a NULL dereference in the callers.

Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/i40iw: Fix error code in i40iw_create_cq()

We accidentally forgot to set the error code if ib_copy_from_udata()
fails. It means we return ERR_PTR(0) which is NULL and results in a
NULL dereference in the callers.

Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/IPoIB: Fix error code in ipoib_add_port()

We accidentally don't see the error code on some of these error paths.
It means we return ERR_PTR(0) which is NULL and it results in a NULL
dereference in the caller.

This bug dates to pre-git days.

Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

RDMA/bnxt_re: checking for NULL instead of IS_ERR()

bnxt_re_alloc_mw() doesn't return NULL, it returns error pointers.

Fixes: 9152e0b722b2 ("RDMA/bnxt_re: HW workarounds for handling specific conditions")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Selvin Xavier <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Free QP PBLEs when the QP is destroyed

If the physical buffer list entries (PBLEs) of a QP are freed
up at i40iw_dereg_mr, they can be assigned to a newly
created QP before the previous QP is destroyed. Fix this
by freeing PBLEs only when the QP is destroyed.

Signed-off-by: Tatyana Nikolova <[email protected]>
Signed-off-by: Faisal Latif <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Avoid memory leak of CQP request objects

Control Queue Pair (CQP) request objects, which have
not received a completion upon interface close, remain
in memory.

To fix this, identify and free all pending CQP request
objects during destroy CQP OP.

Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Update list correctly

To avoid infinite loop, in i40iw_ieq_handle_exception, update
plist inside while loop.

Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Add missing memory barrier

Add missing write memory barrier before writing the
header containing valid bit to the WQE in i40iw_puda_send.

Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Free QP resources on CQP destroy QP failure

Current flow leaves software QP structures in memory if
Control Queue Pair (CQP) destroy QP OP fails. To fix this,
free QP resources on fail of CQP destroy QP OP.

Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Release cm_id ref on PCI function reset

On PCI function reset, cm_id reference is not released
which causes an application hang, as it waits on the
cm_id to be released on rdma_destroy.

To fix this, call i40iw_cm_disconn during a PCI function
reset to clean-up resources and release cm_id reference.

Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Utilize iwdev->reset during PCI function reset

Utilize iwdev->reset on a PCI function reset notification
instead of passing in reset flag for resource clean-up.

Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Do not poll CCQ after it is destroyed

Control Queue Pair (CQP) OPs, in this case - Update SDs,
cannot poll the Control Completion Queue (CCQ) after CCQ is
destroyed. Instead, poll via registers.

Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

i40iw: Fix order of cleanup in close

The order for calling i40iw_destroy_pble_pool is incorrect.
Also, add PBLE_CHUNK_MEM init state to track pble pool
creation and destruction.

Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Henry Orosco <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

kvm: x86: hyperv: avoid livelock in oneshot SynIC timers

If the SynIC timer message delivery fails due to SINT message slot being
busy, there's no point to attempt starting the timer again until we're
notified of the slot being released by the guest (via EOM or EOI).

Even worse, when a oneshot timer fails to deliver its message, its
re-arming with an expiration time in the past leads to immediate retry
of the delivery, and so on, without ever letting the guest vcpu to run
and release the slot, which results in a livelock.

To avoid that, only start the timer when there's no timer message
pending delivery. When there is, meaning the slot is busy, the
processing will be restarted upon notification from the guest that the
slot is released.

Signed-off-by: Roman Kagan <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>

KVM: VMX: Fix invalid guest state detection after task-switch emulation

This can be reproduced by EPT=1, unrestricted_guest=N, emulate_invalid_state=Y
or EPT=0, the trace of kvm-unit-tests/taskswitch2.flat is like below, it tries
to emulate invalid guest state task-switch:

kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
kvm_emulate_insn: 42000:0:0f 0b (0x2)
kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
kvm_inj_exception: #UD (0x0)
kvm_entry: vcpu 0
kvm_exit: reason TASK_SWITCH rip 0x0 info 40000058 0
kvm_emulate_insn: 42000:0:0f 0b (0x2)
kvm_emulate_insn: 42000:0:0f 0b (0x2) failed
kvm_inj_exception: #UD (0x0)
......................

It appears that the task-switch emulation updates rflags (and vm86
flag) only after the segments are loaded, causing vmx->emulation_required
to be set, when in fact invalid guest state emulation is not needed.

This patch fixes it by updating vmx->emulation_required after the
rflags (and vm86 flag) is updated in task-switch emulation.

Thanks Radim for moving the update to vmx__set_flags and adding Paolo's
suggestion for the check.

Suggested-by: Nadav Amit <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>

nvmet: don't report 0-bytes in serial number

The NVME standard mandates that the SN, MN, and FR fields of the Identify
Controller Data Structure be "ASCII strings". That means that they may
not contain 0-bytes, not even string terminators.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
[hch: fixed for the move of the serial field, updated description]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvmet: preserve controller serial number between reboots

The NVMe target has no way to preserve controller serial
IDs across reboots which breaks udev scripts doing
SYMLINK+="dev/disk/by-id/nvme-$env{ID_SERIAL}-part%n.

Export the randomly generated serial number via configfs and allow
setting of a serial via configfs to mitigate this breakage.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvmet: Move serial number from controller to subsystem

The NVMe specification defines the serial number as:

"Serial Number (SN): Contains the serial number for the NVM subsystem
that is assigned by the vendor as an ASCII string. Refer to section
7.10 for unique identifier requirements. Refer to section 1.5 for ASCII
string requirements"

So move it from the controller to the subsystem, where it belongs.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvmet: prefix version configfs file with attr

The NVMe target's attribute files need an attr prefix in order to have
nvmetcli recognize them. Add this attribute.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvme-pci: Fix an error handling path in 'nvme_probe()'

Release resources in the correct order in order not to miss a
'put_device()' if 'nvme_dev_map()' fails.

Fixes: b00a726a9fd8 ("NVMe: Don't unmap controller registers on reset")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvme-pci: Remove nvme_setup_prps BUG_ON

This patch replaces the invalid nvme SGL kernel panic with a warning,
and returns an appropriate error. The warning will occur only on the
first occurance, and sgl details will be printed to help debug how the
request was allowed to form.

Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvme-pci: add another device ID with stripe quirk

Adds a fourth Intel controller which has the "stripe" quirk.

Signed-off-by: David Wayne Fugate <[email protected]>
Acked-by: Keith Busch <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvmet-fc: fix byte swapping in nvmet_fc_ls_create_association

We always need to do non-equal comparisms on the native endian versions
to get the correct result.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: James Smart <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvme: fix byte swapping in the streams code

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value

Check return value from call to devm_kmemdup() in order to prevent a NULL
pointer dereference.

Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

trace: fix the errors caused by incompatible type of RCU variables

The variables which are processed by RCU functions should be annotated
as RCU, otherwise sparse will report the errors like below:

"error: incompatible types in comparison expression (different
address spaces)"

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chunyan Zhang <[email protected]>
[ Updated to not be 100% 80 column strict ]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Fix kmemleak in instance_rmdir

Hit the kmemleak when executing instance_rmdir, it forgot releasing
mem of tracing_cpumask. With this fix, the warn does not appear any
more.

unreferenced object 0xffff93a8dfaa7c18 (size 8):
  comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s)
  hex dump (first 8 bytes):
    ff ff ff ff ff ff ff ff                          ........
  backtrace:
    [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280
    [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30
    [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10
    [<ffffffff88571ab0>] instance_mkdir+0x90/0x240
    [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70
    [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0
    [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100
    [<ffffffff88403857>] do_syscall_64+0x67/0x150
    [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a
    [<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: ccfe9e42e451 ("tracing: Make tracing_cpumask available for all instances")
Signed-off-by: Chunyu Hu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

xhci: fix memleak in xhci_run()

Found this issue by kmemleak.
xhci_run() did not check return val and free command for
xhci_queue_vendor_command()

unreferenced object 0xffff88011c0be500 (size 64):
  comm "kworker/0:1", pid 58, jiffies 4294670908 (age 50.420s)
  hex dump (first 32 bytes):
  backtrace:
    [<ffffffff8176166a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff8121801a>] kmem_cache_alloc_trace+0xca/0x1d0
    [<ffffffff81576bf4>] xhci_alloc_command+0x44/0x130
    [<ffffffff8156f1cc>] xhci_run+0x4cc/0x630
    [<ffffffff8153b84b>] usb_add_hcd+0x3bb/0x950
    [<ffffffff8154eac8>] usb_hcd_pci_probe+0x188/0x500
    [<ffffffff815851ac>] xhci_pci_probe+0x2c/0x220
    [<ffffffff813d2ca5>] local_pci_probe+0x45/0xa0
    [<ffffffff810a54e4>] work_for_cpu_fn+0x14/0x20
    [<ffffffff810a8409>] process_one_work+0x149/0x360
    [<ffffffff810a8d08>] worker_thread+0x1d8/0x3c0
    [<ffffffff810ae7d9>] kthread+0x109/0x140
    [<ffffffff8176d585>] ret_from_fork+0x25/0x30
    [<ffffffffffffffff>] 0xffffffffffffffff

Cc: <[email protected]>
Signed-off-by: Shu Wang <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: xhci: fix spinlock recursion for USB2 test mode

Both xhci_hub_control and xhci_disable_slot tries to hold spinlock, the
spinlock recursion occurs when enters USB2 test mode. Fix it by unlock
spinlock before calling xhci_disable_slot.

Cc: <[email protected]>
Fixes: 0f1d832ed1fb ("usb: xhci: Add port test modes support for usb2")
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

xhci: fix 20000ms port resume timeout

A uncleared PLC (port link change) bit will prevent furuther port event
interrupts for that port. Leaving it uncleared caused get_port_status()
to timeout after 20000ms while waiting to get the final port event
interrupt for resume -> U0 state change.

This is a targeted fix for a specific case where we get a port resume event
racing with xhci resume. The port event interrupt handler notices xHC is
not yet running and bails out early, leaving PLC uncleared.

The whole xhci port resuming needs more attention, but while working on it
it anyways makes sense to always ensure PLC is cleared in get_port_status
before setting a new link state and waiting for its completion.

Cc: <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: xhci: Issue stop EP command only when the EP state is running

on AMD platforms with SNPS 3.1 USB controller if stop endpoint command is
issued the controller does not respond, when the EP is not in running
state. HW completes the command execution and reports
"Context State Error" completion code. This is as per the spec. However
HW on receiving the second command additionally marks EP to Flow control
state in HW which is RTL bug. This bug causes the HW not to respond
to any further doorbells that are rung by the driver. This makes the EP
to not functional anymore and causes gross functional failures.

As a workaround, not to hit this problem, it's better to check the EP state
and issue a stop EP command only when the EP is in running state.

As a sidenote, even with this patch there is still a possibility of
triggering the RTL bug if the context state races with the stop endpoint
command as described in xHCI spec 4.6.9

[code simplification and reworded sidenote in commit message -Mathias]
Signed-off-by: Shyam Sundar S K <[email protected]>
Signed-off-by: Nehal Shah <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

xhci: Bad Ethernet performance plugged in ASM1042A host

When USB Ethernet is plugged in ASMEDIA ASM1042A xHCI host, bad
performance was manifesting in Web browser use (like download
large file such as ISO image). It is known limitation of
ASM1042A that is not compatible with driver scheduling,
As a workaround we can modify flow control handling of ASM1042A.
The register we modify is changes the behavior

[use quirk bit 28, usleep_range 40-60us, empty non-pci function -Mathias]
Cc: <[email protected]>
Signed-off-by: Jiahau Chang <[email protected]>
Signed-off-by: Ian Pilcher <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

xhci: Fix NULL pointer dereference when cleaning up streams for removed host

This off by one in stream_id indexing caused NULL pointer dereference and
soft lockup on machines with USB attached SCSI devices connected to a
hotpluggable xhci controller.

The code that cleans up pending URBs for dead hosts tried to dereference
a stream ring at the invalid stream_id 0.
ep->stream_info->stream_rings[0] doesn't point to a ring.

Start looping stream_id from 1 like in all the other places in the driver,
and check that the ring exists before trying to kill URBs on it.

Reported-by: rocko r <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

debug: Fix WARN_ON_ONCE() for modules

Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
code turned into an oops.  As it turns out, WARN_ON_ONCE() seems to be
completely broken when called from a module.

The bug was introduced with the following commit:

  19d436268dde ("debug: Add _ONCE() logic to report_bug()")

That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
trap handler.  It requires a writable bug table so that the BUGFLAG_DONE
bit can be written to the flags to indicate the first warning has
occurred.

The bug table was made writable for vmlinux, which relies on
vmlinux.lds.S and vmlinux.lds.h for laying out the sections.  However,
it wasn't made writable for modules, which rely on the ELF section
header flags.

Reported-by: Mike Galbraith <[email protected]>
Tested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 19d436268dde ("debug: Add _ONCE() logic to report_bug()")
Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

ovl: check for bad and whiteout index on lookup

Index should always be of the same file type as origin, except for
the case of a whiteout index. A whiteout index should only exist
if all lower aliases have been unlinked, which means that finding
a lower origin on lookup whose index is a whiteout should be treated
as a lookup error.

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

ovl: do not cleanup directory and whiteout index entries

Directory index entries are going to be used for looking up
redirected upper dirs by lower dir fh when decoding an overlay
file handle of a merge dir.

Whiteout index entries are going to be used as an indication that
an exported overlay file handle should be treated as stale (i.e.
after unlink of the overlay inode).

We don't know the verification rules for directory and whiteout
index entries, because they have not been implemented yet, so fail
to mount overlay rw if those entries are found to avoid corrupting
an index that was created by a newer kernel.

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

ovl: fix xattr get and set with selinux

inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr
to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which
depends on dentry->d_inode, but d_inode is null and not initialized yet at
this point resulting in an Oops.

Fix by getting the upperdentry info from the inode directly in this case.

Reported-by: Eryu Guan <[email protected]>
Fixes: 09d8b586731b ("ovl: move __upperdentry to ovl_inode")
Signed-off-by: Miklos Szeredi <[email protected]>

x86/platform/intel-mid: Fix a format string overflow warning

We have space for exactly three characters for the index in "max7315_%d_base",
but as GCC points out having more would cause an string overflow:

  arch/x86/platform/intel-mid/device_libs/platform_max7315.c: In function 'max7315_platform_data':
  arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26: error: '%d' directive writing between 1 and 11 bytes into a region of size 9 [-Werror=format-overflow=]
     sprintf(base_pin_name, "max7315_%d_base", nr);
                          ^~~~~~~~~~~~~~~~~
  arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26: note: directive argument in the range [-2147483647, 2147483647]
  arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:3: note: 'sprintf' output between 15 and 25 bytes into a destination of size 17
     sprintf(base_pin_name, "max7315_%d_base", nr);

This makes it use an snprintf() to truncate the string if that happened
rather than overflowing the stack. In practice, this is safe, because
there won't be a large number of max7315 devices in the systems, and
both the format and the length are defined by the firmware interface.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG

The IOSF_MBI option requires PCI support, without it we get a harmless
Kconfig warning when it gets selected by PUNIT_ATOM_DEBUG:

warning: (X86_INTEL_LPSS && SND_SST_IPC_ACPI && MMC_SDHCI_ACPI && PUNIT_ATOM_DEBUG) selects IOSF_MBI which has unmet direct dependencies (PCI)

This adds another dependency to avoid the warning.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/build: Silence the build with "make -s"

Every kernel build on x86 will result in some output:

  Setup is 13084 bytes (padded to 13312 bytes).
  System is 4833 kB
  CRC 6d35fa35
  Kernel: arch/x86/boot/bzImage is ready  (#2)

This shuts it up, so that 'make -s' is truely silent as long as
everything works. Building without '-s' should produce unchanged
output.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl

The x86 version of insb/insw/insl uses an inline assembly that does
not have the target buffer listed as an output. This can confuse
the compiler, leading it to think that a subsequent access of the
buffer is uninitialized:

  drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
  drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
  drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  drivers/net/sb1000.c: In function 'sb1000_rx':
  drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
  drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]

I tried to mark the exact input buffer as an output here, but couldn't
figure it out. As suggested by Linus, marking all memory as clobbered
however is good enough too. For the outs operations, I also add the
memory clobber, to force the input to be written to local variables.
This is probably already guaranteed by the "asm volatile", but it can't
hurt to do this for symmetry.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.org/lkml/2017/7/12/605
Signed-off-by: Ingo Molnar <[email protected]>

x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning

gcc-7.1.1 produces this warning:

arch/x86/math-emu/reg_add_sub.c: In function 'FPU_add':
arch/x86/math-emu/reg_add_sub.c:80:48: error: ?: using integer constants in boolean context [-Werror=int-in-bool-context]

This appears to be a bug in gcc-7.1.1, and I have reported it as
PR81484. The compiler suggests that code written as

if (a & b ? c : d)

is usually incorrect and should have been

if (a & (b ? c : d))

However, in this case, we correctly write

if ((a & b) ? c : d)

and should not get a warning for it.

This adds a dirty workaround for the problem, adding a comparison with
zero inside of the macro. The warning is currently disabled in the kernel,
so we may decide not to apply the patch, and instead wait for future gcc
releases to fix the problem. On the other hand, it seems to be the
only instance of this particular problem.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Bill Metzenthen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484
Signed-off-by: Ingo Molnar <[email protected]>

x86/fpu/math-emu: Fix possible uninitialized variable use

When building the kernel with "make EXTRA_CFLAGS=...", this overrides
the "PARANOID" preprocessor macro defined in arch/x86/math-emu/Makefile,
and we run into a build warning:

arch/x86/math-emu/reg_compare.c: In function ‘compare_i_st_st’:
arch/x86/math-emu/reg_compare.c:254:6: error: ‘f’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

This fixes the implementation to work correctly even without the PARANOID
flag, and also fixes the Makefile to not use the EXTRA_CFLAGS variable
but instead use the ccflags-y variable in the Makefile that is meant
for this purpose.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Bill Metzenthen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

perf/x86: Shut up false-positive -Wmaybe-uninitialized warning

The intialization function checks for various failure scenarios, but
unfortunately the compiler gets a little confused about the possible
combinations, leading to a false-positive build warning when
-Wmaybe-uninitialized is set:

  arch/x86/events/core.c: In function ‘init_hw_perf_events’:
  arch/x86/events/core.c:264:3: warning: ‘reg_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  arch/x86/events/core.c:264:3: warning: ‘val_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n",

We can't actually run into this case, so this shuts up the warning
by initializing the variables to a known-invalid state.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://patchwork.kernel.org/patch/9392595/
Signed-off-by: Ingo Molnar <[email protected]>

x86/defconfig: Remove stale, old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):

- EXPERIMENTAL is gone since v3.9;
- IP_NF_TARGET_ULOG: commit d4da843e6fad ("netfilter: kill remnants of ulog targets");
- USB_LIBUSUAL: commit f61870ee6f8c ("usb: remove libusual");

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/ioapic: Pass the correct data to unmask_ioapic_irq()

One of the rarely executed code pathes in check_timer() calls
unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.

That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of
interrupt 0. irq_get_chip_data(0) returns NULL, so the following
dereference in unmask_ioapic_irq() causes a kernel panic.

The issue went unnoticed in the first place because irq_get_chip_data()
returns a void pointer so the compiler cannot do a type check on the
argument. The code path was added for machines with broken configuration,
but it seems that those machines are either not running current kernels or
simply do not longer exist.

Hand in irq_get_irq_data(0) as argument which provides the correct data.

[ tglx: Rewrote changelog ]

Fixes: 4467715a44cc ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
Signed-off-by: Seunghun Han <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/acpi: Prevent out of bound access caused by broken ACPI tables

The bus_irq argument of mp_override_legacy_irq() is used as the index into
the isa_irq_to_gsi[] array. The bus_irq argument originates from
ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
tables, but is nowhere sanity checked.

That allows broken or malicious ACPI tables to overwrite memory, which
might cause malfunction, panic or arbitrary code execution.

Add a sanity check and emit a warning when that triggers.

[ tglx: Added warning and rewrote changelog ]

Signed-off-by: Seunghun Han <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

drm/mst: Avoid processing partially received up/down message transactions

Currently we may process up/down message transactions containing
uninitialized data. This can happen if there was an error during the
reception of any message in the transaction, but we happened to receive
the last message correctly with the end-of-message flag set.

To avoid this abort the reception of the transaction when the first
error is detected, rejecting any messages until a message with the
start-of-message flag is received (which will start a new transaction).
This is also what the DP 1.4 spec 2.11.8.2 calls for in this case.

In addtion this also prevents receiving bogus transactions without the
first message with the the start-of-message flag set.

v2:
- unchanged
v3:
- git add the part that actually skips messages after an error in
drm_dp_sideband_msg_build()

Cc: Dave Airlie <[email protected]>
Cc: Lyude <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: Lyude <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()

In case of an unknown broadcast message is sent mstb will remain unset,
so check for this.

Cc: Dave Airlie <[email protected]>
Cc: Lyude <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: Lyude <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/mst: Fix error handling during MST sideband message reception

Handle any error due to partial reads, timeouts etc. to avoid parsing
uninitialized data subsequently. Also bail out if the parsing itself
fails.

Cc: Dave Airlie <[email protected]>
Cc: Lyude <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: Lyude <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

perf/core: Fix scheduling regression of pinned groups

Vince Weaver reported:

> I was tracking down some regressions in my perf_event_test testsuite.
> Some of the tests broke in the 4.11-rc1 timeframe.
>
> I've bisected one of them, this report is about
> tests/overflow/simul_oneshot_group_overflow
> This test creates an event group containing two sampling events, set
> to overflow to a signal handler (which disables and then refreshes the
> event).
>
> On a good kernel you get the following:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 946 (perf::instructions/1000000)
> fd 4 overflows: 473 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946379875
> Count 1: 946365218
>
> With the broken kernels you get:
> Event perf::instructions with period 1000000
> Event perf::instructions with period 2000000
> fd 3 overflows: 938 (perf::instructions/1000000)
> fd 4 overflows: 318 (perf::instructions/2000000)
> Ending counts:
> Count 0: 946373080
> Count 1: 653373058

The root cause of the bug is that the following commit:

487f05e18a ("perf/core: Optimize event rescheduling on active contexts")

erronously assumed that event's 'pinned' setting determines whether the
event belongs to a pinned group or not, but in fact, it's the group
leader's pinned state that matters.

This was discovered by Vince in the test case described above, where two instruction
counters are grouped, the group leader is pinned, but the other event is not;
in the regressed case the counters were off by 33% (the difference between events'
periods), but should be the same within the error margin.

Fix the problem by looking at the group leader's pinning.

Reported-by: Vince Weaver <[email protected]>
Tested-by: Vince Weaver <[email protected]>
Signed-off-by: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

ipv6: avoid overflow of offset in ip6_find_1stfragopt

In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.

This problem has been here since before the beginning of git history.

Signed-off-by: Sabrina Dubroca <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: tehuti: don't process data if it has not been copied from userspace

The array data is only populated with valid information from userspace
if cmd != SIOCDEVPRIVATE, other cases the array contains garbage on
the stack. The subsequent switch statement acts on a subcommand in
data[0] which could be any garbage value if cmd is SIOCDEVPRIVATE which
seems incorrect to me. Instead, just return EOPNOTSUPP for the case
where cmd == SIOCDEVPRIVATE to avoid this issue.

As a side note, I suspect that the original intention of the code
was for this ioctl to work just for cmd == SIOCDEVPRIVATE (and the
current logic is reversed). However, I don't wont to change the current
semantics in case any userspace code relies on this existing behaviour.

Detected by CoverityScan, CID#139647 ("Uninitialized scalar variable")

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"

This reverts commit cd8966e75ed3c6b41a37047a904617bc44fa481f.

The duplicate CHANGEADDR event message is sent regardless of link
status whereas the setlink changes only generate a notification when
the link is up. Not sending a notification when the link is down breaks
dhcpcd which only processes hwaddr changes when the link is down.

Fixes reported regression:
https://bugzilla.kernel.org/show_bug.cgi?id=196355

Reported-by: Yaroslav Isakov <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Enable CMODE config support for 6390X

Commit f39908d3b1c45 ('net: dsa: mv88e6xxx: Set the CMODE for mv88e6390
ports 9 & 10') added support for setting the CMODE for the 6390X family,
but only enabled it for 9290 and 6390 - and left out 6390X.

Fix support for setting the CMODE on 6390X also by assigning
mv88e6390x_port_set_cmode() to the .port_set_cmode function pointer in
mv88e6390x_ops too.

Fixes: f39908d3b1c4 ("net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10")
Signed-off-by: Martin Hundebøll <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-binding: ptp: Add SoC compatibility strings for dte ptp clock

Add SoC specific compatibility strings to the Broadcom DTE
based PTP clock binding document.

Fixed the document heading and node name.

Fixes: 80d6076140b2 ("dt-binding: ptp: add bindings document for dte based ptp clock")
Signed-off-by: Arun Parameswaran <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

llist: clang: introduce member_address_is_nonnull()

Currently llist_for_each_entry() and llist_for_each_entry_safe() iterate
until &pos->member != NULL. But when building the kernel with Clang,
the compiler assumes &pos->member cannot be NULL if the member's offset
is greater than 0 (which would be equivalent to the object being
non-contiguous in memory). Therefore the loop condition is always true,
and the loops become infinite.

To work around this, introduce the member_address_is_nonnull() macro,
which casts object pointer to uintptr_t, thus letting the member pointer
to be NULL.

Signed-off-by: Alexander Potapenko <[email protected]>
Tested-by: Sodagudi Prasad <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

NET: dwmac: Make dwmac reset unconditional

Unconditional reset dwmac before HW init if reset controller is present.

In existing implementation we reset dwmac only after second module
probing:
(module load -> unload -> load again [reset happens])

Now we reset dwmac at every module load:
(module load [reset happens] -> unload -> load again [reset happens])

Also some reset controllers have only reset callback instead of
assert + deassert callbacks pair, so handle this case.

Signed-off-by: Eugeniy Paltsev <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Zero terminate ifr_name in dev_ifname().

The ifr.ifr_name is passed around and assumed to be NULL terminated.

Signed-off-by: David S. Miller <[email protected]>

wireless: wext: terminate ifr name coming from userspace

ifr name is assumed to be a valid string by the kernel, but nothing
was forcing username to pass a valid string.

In turn, this would cause panics as we tried to access the string
past it's valid memory.

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert commit 722f0b891198 ("pNFS: Don't send COMMITs to the DSes if...")

Doing the test without taking any locks is racy, and so really it makes
more sense to do it in the flexfiles code (which is the only case that
cares).

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()

If the layout has expired due to a fencing event, then we should not
attempt to commit to the DS.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

NFS: Fix another COMMIT race in pNFS

We must make sure that cinfo->ds->ncommitting is in sync with the
commit list, since it is checked as part of pnfs_commit_list().

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

NFS: Fix a COMMIT race in pNFS

We must make sure that cinfo->ds->nwritten is in sync with the
commit list, since it is checked as part of pnfs_scan_commit_lists().

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

mount: copy the port field into the cloned nfs_server structure.

Doing this copy eliminates the "port=0" entry in
the /proc/mounts entries

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=69241
Signed-off-by: Steve Dickson <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull structure randomization updates from Kees Cook:
"Now that IPC and other changes have landed, enable manual markings for
  randstruct plugin, including the task_struct.

  This is the rest of what was staged in -next for the gcc-plugins, and
  comes in three patches, largest first:

   - mark "easy" structs with __randomize_layout

   - mark task_struct with an optional anonymous struct to isolate the
     __randomize_layout section

   - mark structs to opt _out_ of automated marking (which will come
     later)

  And, FWIW, this continues to pass allmodconfig (normal and patched to
  enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
  s390 for me"

* tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  randstruct: opt-out externally exposed function pointer structs
  task_struct: Allow randomized layout
  randstruct: Mark various structs for randomization

Merge tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A number of small fixes for -rc1 Luminous changes plus a readdir race
  fix, marked for stable"

* tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client:
  libceph: potential NULL dereference in ceph_msg_data_create()
  ceph: fix race in concurrent readdir
  libceph: don't call encode_request_finish() on MOSDBackoff messages
  libceph: use alloc_pg_mapping() in __decode_pg_upmap_items()
  libceph: set -EINVAL in one place in crush_decode()
  libceph: NULL deref on osdmap_apply_incremental() error path
  libceph: fix old style declaration warnings

audit: fix memleak in auditd_send_unicast_skb.

Found this issue by kmemleak report, auditd_send_unicast_skb
did not free skb if rcu_dereference(auditd_conn) returns null.

unreferenced object 0xffff88082568ce00 (size 256):
comm "auditd", pid 1119, jiffies 4294708499
backtrace:
[<ffffffff8176166a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8121820c>] kmem_cache_alloc_node+0xcc/0x210
[<ffffffff8161b99d>] __alloc_skb+0x5d/0x290
[<ffffffff8113c614>] audit_make_reply+0x54/0xd0
[<ffffffff8113dfa7>] audit_receive_msg+0x967/0xd70
----------------
(gdb) list *audit_receive_msg+0x967
0xffffffff8113dff7 is in audit_receive_msg (kernel/audit.c:1133).
1132 skb = audit_make_reply(0, AUDIT_REPLACE, 0,
0, &pvnr, sizeof(pvnr));
---------------
[<ffffffff8113e402>] audit_receive+0x52/0xa0
[<ffffffff8166c561>] netlink_unicast+0x181/0x240
[<ffffffff8166c8e2>] netlink_sendmsg+0x2c2/0x3b0
[<ffffffff816112e8>] sock_sendmsg+0x38/0x50
[<ffffffff816117a2>] SYSC_sendto+0x102/0x190
[<ffffffff81612f4e>] SyS_sendto+0xe/0x10
[<ffffffff8176d337>] entry_SYSCALL_64_fastpath+0x1a/0xa5
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Shu Wang <[email protected]>
Signed-off-by: Paul Moore <[email protected]>

x86: add MULTIUSER dependency for KVM

KVM tries to select 'TASKSTATS', which had additional dependencies:

warning: (KVM) selects TASKSTATS which has unmet direct dependencies (NET && MULTIUSER)

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>

KVM: nVMX: Disallow VM-entry in MOV-SS shadow

Immediately following MOV-to-SS/POP-to-SS, VM-entry is
disallowed. This check comes after the check for a valid VMCS. When
this check fails, the instruction pointer should fall through to the
next instruction, the ALU flags should be set to indicate VMfailValid,
and the VM-instruction error should be set to 26 ("VM entry with
events blocked by MOV SS").

Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>

KVM: nVMX: track NMI blocking state separately for each VMCS

vmx_recover_nmi_blocking is using a cached value of the guest
interruptibility info, which is stored in vmx->nmi_known_unmasked.
vmx_recover_nmi_blocking is run for both normal and nested guests,
so the cached value must be per-VMCS.

This fixes eventinj.flat in a nested non-EPT environment. With EPT it
works, because the EPT violation handler doesn't have the
vmx->nmi_known_unmasked optimization (it is unnecessary because, unlike
vmx_recover_nmi_blocking, it can just look at the exit qualification).

Thanks to Wanpeng Li for debugging the testcase and providing an initial
patch.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>

PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present

If the genpd->attach_dev or genpd->power_on fails, genpd_dev_pm_attach
may return -EPROBE_DEFER initially. However genpd_alloc_dev_data sets
the PM domain for the device unconditionally.

When subsequent attempts are made to call genpd_dev_pm_attach, it may
return -EEXISTS checking dev->pm_domain without re-attempting to call
attach_dev or power_on.

platform_drv_probe then attempts to call drv->probe as the return value
-EEXIST != -EPROBE_DEFER, which may end up in a situation where the
device is accessed without it's power domain switched on.

Fixes: f104e1e5ef57 (PM / Domains: Re-order initialization of generic_pm_domain_data)
Cc: 4.4+ <[email protected]> # v4.4+
Signed-off-by: Sudeep Holla <[email protected]>
Acked-by: Ulf Hansson <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

tracing/ring_buffer: Try harder to allocate

ftrace can fail to allocate per-CPU ring buffer on systems with a large
number of CPUs coupled while large amounts of cache happening in the
page cache. Currently the ring buffer allocation doesn't retry in the VM
implementation even if direct-reclaim made some progress but still
wasn't able to find a free page. On retrying I see that the allocations
almost always succeed. The retry doesn't happen because __GFP_NORETRY is
used in the tracer to prevent the case where we might OOM, however if we
drop __GFP_NORETRY, we risk destabilizing the system if OOM killer is
triggered. To prevent this situation, use the __GFP_RETRY_MAYFAIL flag
introduced recently [1].

Tested the following still succeeds without destabilizing a system with
1GB memory.
echo 300000 > /sys/kernel/debug/tracing/buffer_size_kb

[1] https://marc.info/?l=linux-mm&m=149820805124906&w=2

Link: http://lkml.kernel.org/r/[email protected]
Cc: Tim Murray <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

KVM: x86: masking out upper bits

kvm_read_cr3() returns an unsigned long and gfn is a u64. We intended
to mask out the bottom 5 bits but because of the type issue we mask the
top 32 bits as well. I don't know if this is a real problem, but it
causes static checker warnings.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>

Merge tag 'fixes-for-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

usb: fixes for v4.13-rc2

First set of fixes for the current -rc cycle. Only three fixes on dwc3
this time around (proper order for getting a PHY reference, fix for
unmapping DMA and a fix for requesting IRQ on the OMAP glue layer).

Most fixes are on the renesas USB controller, fixing several old bugs
with most going to stable.

dwc2 also learned that it *must* reset USB Address to zero on Reset
interrupts.

Apart from these, some drivers needed HAS_DMA dependency and there's a
sparse warning fix for bdc udc.

usb: renesas_usbhs: gadget: disable all eps when the driver stops

A gadget driver will not disable eps immediately when ->disconnect()
is called. But, since this driver assumes all eps stop after
the ->disconnect(), unexpected behavior happens (especially in system
suspend).
So, this patch disables all eps in usbhsg_try_stop(). After disabling
eps by renesas_usbhs driver, since some functions will be called by
both a gadget and renesas_usbhs driver, renesas_usbhs driver should
protect uep->pipe. To protect uep->pipe easily, this patch adds a new
lock in struct usbhsg_uep.

Fixes: 2f98382dc ("usb: renesas_usbhs: Add Renesas USBHS Gadget")
Cc: <[email protected]> # v3.0+
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL

This patch fixes an issue that some registers may be not initialized
after resume if the USBHSF_RUNTIME_PWCTRL is not set. Otherwise,
if a cable is not connected, the driver will not enable INTENB0.VBSE
after resume. And then, the driver cannot detect the VBUS.

Fixes: ca8a282a5373 ("usb: gadget: renesas_usbhs: add suspend/resume support")
Cc: <[email protected]> # v3.2+
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

device-dax: fix sysfs duplicate warnings

Fix warnings of the form...

     WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
     sysfs: cannot create duplicate filename '/class/dax/dax12.0'
     Call Trace:
      dump_stack+0x63/0x86
      __warn+0xcb/0xf0
      warn_slowpath_fmt+0x5a/0x80
      ? kernfs_path_from_node+0x4f/0x60
      sysfs_warn_dup+0x62/0x80
      sysfs_do_create_link_sd.isra.2+0x97/0xb0
      sysfs_create_link+0x25/0x40
      device_add+0x266/0x630
      devm_create_dax_dev+0x2cf/0x340 [dax]
      dax_pmem_probe+0x1f5/0x26e [dax_pmem]
      nvdimm_bus_probe+0x71/0x120

...by reusing the namespace id for the device-dax instance name.

Now that we have decided that there will never by more than one
device-dax instance per libnvdimm-namespace parent device [1], we can
directly reuse the namepace ids. There are some possible follow-on
cleanups, but those are saved for a later patch to simplify the -stable
backport.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html

Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem...")
Cc: Jeff Moyer <[email protected]>
Cc: <[email protected]>
Reported-by: Dariusz Dokupil <[email protected]>
Signed-off-by: Dan Williams <[email protected]>