Git Repo - linux.git/log

drm/i915: enable vGPU detection for all

vGPU capability is handled by GVT-g host driver, not needed to
put extra HW check for vGPU detection. And we'll actually support
vGPU from BDW.

Signed-off-by: Ping Gao <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
Reviewed-by: Joonas Lahtinen <[email protected]>
Acked-by: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Jani Nikula <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 8ef89995c735f978d5dfcb3ca6bce70d41728c91)
Signed-off-by: Jani Nikula <[email protected]>

Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns

We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out. It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433debe added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list. This
patch makes sure we remove things before exiting.

Signed-off-by: Chris Mason <[email protected]>
Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4
cc: [email protected] # 3.15+

drm/atmel-hlcdc: Make ->reset() implementation static

The atmel_hlcdc_crtc_reset() function is never used outside the file and
can be static. This avoids a warning from sparse.

Signed-off-by: Thierry Reding <[email protected]>

drm: atmel-hlcdc: Fix vertical scaling

The code is applying the same scaling for the X and Y components,
thus making the scaling feature only functional when both components
have the same scaling factor.

Do the s/_w/_h/ replacement where appropriate to fix vertical scaling.

Signed-off-by: Jan Leupold <[email protected]>
Fixes: 1a396789f65a2 ("drm: add Atmel HLCDC Display Controller support")
Cc: <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>

thermal: rcar_thermal: Fix priv->zone error handling

In case thermal_zone_xxx_register() returns an error, priv->zone
isn't NULL any more, but contains the error code.

This is passed to thermal_zone_device_unregister(), then. This checks
for priv->zone being NULL, but the error code is != NULL. So it works
with the error code as a pointer. Crashing immediately.

To fix this, reset priv->zone to NULL before entering
rcar_gen3_thermal_remove().

Signed-off-by: Dirk Behme <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>

Merge remote-tracking branches 'spi/fix/lock', 'spi/fix/maintainers', 'spi/fix/put', 'spi/fix/pxa2xx', 'spi/fix/sh-msiof' and 'spi/fix/timeout' into spi-linus

Merge remote-tracking branches 'regulator/fix/email' and 'regulator/fix/qcom-smd' into regulator-linus

arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed

We're trying hard to detect when the HYP idmap overlaps with the
HYP va, as it makes the teardown of a cpu dangerous. But there is
one case where an overlap is completely safe, which is when the
whole of the kernel is idmap'ed, which is likely to happen on 32bit
when RAM is at 0x8000000 and we're using a 2G/2G VA split.

In that case, we can proceed safely.

Reported-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>

perf/x86/intel/cqm: Check cqm/mbm enabled state in event init

Yanqiu Zhang reported kernel panic when using mbm event
on system where CQM is detected but without mbm event
support, like with perf:

  # perf stat -e 'intel_cqm/event=3/' -a

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: [<ffffffff8100d64c>] update_sample+0xbc/0xe0
  ...
   <IRQ>
   [<ffffffff8100d688>] __intel_mbm_event_init+0x18/0x20
   [<ffffffff81113d6b>] flush_smp_call_function_queue+0x7b/0x160
   [<ffffffff81114853>] generic_smp_call_function_single_interrupt+0x13/0x60
   [<ffffffff81052017>] smp_call_function_interrupt+0x27/0x40
   [<ffffffff816fb06c>] call_function_interrupt+0x8c/0xa0
  ...

The reason is that we currently allow to init mbm event
even if mbm support is not detected.  Adding checks for
both cqm and mbm events and support into cqm's event_init.

Fixes: 33c3cc7acfd9 ("perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init")
Reported-by: Yanqiu Zhang <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Vikas Shivappa <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

usb: gadget: prevent potenial null pointer dereference on skb->len

An earlier fix partially fixed the null pointer dereference on skb->len
by moving the assignment of len after the check on skb being non-null,
however it failed to remove the erroneous dereference when assigning len.
Correctly fix this by removing the initialisation of len as was
originally intended.

Fixes: 70237dc8efd092 ("usb: gadget: function: f_eem: socket buffer may be NULL")
Acked-by: Peter Chen <[email protected]>
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

powerpc/powernv: Fix crash on releasing compound PE

The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().

  # echo 0 > /sys/bus/pci/slots/C7/power
  iommu: Removing device 0000:01:00.1 from group 0
  iommu: Removing device 0000:01:00.0 from group 0
  Unable to handle kernel paging request for data at address 0x00000010
  Faulting instruction address: 0xc00000000005d898
  cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
      pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
      lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
      sp: c000000fe82178a0
     msr: 9000000000009033
     dar: 10
   dsisr: 40000000
    current = 0xc000000fe815ab80
    paca    = 0xc00000000ff00400 softe: 0 irq_happened: 0x01
      pid   = 2709, comm = sh
  Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
  (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
  Tue Sep 6 13:37:29 AEST 2016
  enter ? for help
  [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
  [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
  [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
  [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
  [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
  [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
  [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
  [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
  [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
  [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
  [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
  [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
  [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
  [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
  [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
  [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
  [c000000fe8217e30] c000000000009524 system_call+0x38/0x108

It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.

Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/xics/opal: Fix processor numbers in OPAL ICP

When using the OPAL ICP backend we incorrectly pass Linux CPU numbers
rather than HW CPU numbers to OPAL.

Fixes: d74361881f0d ("powerpc/xics: Add ICP OPAL backend")
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n

On ppc64le, builds with CONFIG_KEXEC=n fail with:

arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’:
arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’
if (rc && !kdump_in_progress())

This is because pseries/setup.c includes <linux/kexec.h>, but
kdump_in_progress() is defined in <asm/kexec.h>. This is a problem
because the former only includes the latter if CONFIG_KEXEC_CORE=y.

Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c.

Fixes: d3cbff1b5a90 ("powerpc: Put exception configuration in a common place")
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

iio:core: fix IIO_VAL_FRACTIONAL sign handling

7985e7c100 ("iio: Introduce a new fractional value type") introduced a
new IIO_VAL_FRACTIONAL value type meant to represent rational type numbers
expressed by a numerator and denominator combination.

Formating of IIO_VAL_FRACTIONAL values relies upon do_div() usage. This
fails handling negative values properly since parameters are reevaluated
as unsigned values.
Fix this by using div_s64_rem() instead. Computed integer part will carry
properly signed value. Formatted fractional part will always be positive.

Fixes: 7985e7c100 ("iio: Introduce a new fractional value type")
Signed-off-by: Gregor Boirie <[email protected]>
Reviewed-by: Lars-Peter Clausen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>

iio: ensure ret is initialized to zero before entering do loop

A recent fix to iio_buffer_read_first_n_outer removed ret from being set by
a return from wait_event_interruptible and also added a continue in a loop
which causes the variable ret to not be set when it reaches the end of the
loop. Fix this by initializing ret to zero.

Also remove extraneous white space at the end of the loop.

Fixes: fcf68f3c0bb2a5 ("fix sched WARNING "do not call blocking ops when !TASK_RUNNING")
Signed-off-by: Colin Ian King <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"This fixes a regression in the cryptd code that breaks certain
  accelerated AED algorithms as well as an older regression in the
  caam driver that breaks IPsec"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: caam - fix IV loading for authenc (giv)decryption
  crypto: cryptd - Use correct tfm object for AEAD tracking

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild fix from Michal Marek:
"Fix for 'make deb-pkg'. The bug got introduced in v4.8-rc1"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
builddeb: Skip gcc-plugins when not configured

btrfs: do not decrease bytes_may_use when replaying extents

When replaying extents, there is no need to update bytes_may_use
in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a
WARN_ON about bytes_may_use.

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/urgent

* Make for_each_efi_memory_desc_in_map() safe on Xen and prevent an
   infinte loop - Jan Beulich

* Fix boot error on arm64 Qualcomm platforms by refactoring and
   improving the ExitBootServices() hack we already for x86 and moving
   it to the libstub - Jeffrey Hugo

* Use correct return data type for of_get_flat_dt_subnode_by_name()
   so that we correctly handle errors - Andrzej Hajda

Merge tag 'kvm-s390-master-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

A bugfix for the vsie code (setting the wrong field).

KVM: lapic: adjust preemption timer correctly when goes TSC backward

TSC_OFFSET will be adjusted if discovers TSC backward during vCPU load.
The preemption timer, which relies on the guest tsc to reprogram its
preemption timer value, is also reprogrammed if vCPU is scheded in to
a different pCPU. However, the current implementation reprogram preemption
timer before TSC_OFFSET is adjusted to the right value, resulting in the
preemption timer firing prematurely.

This patch fix it by adjusting TSC_OFFSET before reprogramming preemption
timer if TSC backward.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krċmář <[email protected]>
Cc: Yunhong Jiang <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

PM / QoS: avoid calling cancel_delayed_work_sync() during early boot

of_clk_init() ends up calling into pm_qos_update_request() very early
during boot where irq is expected to stay disabled.
pm_qos_update_request() uses cancel_delayed_work_sync() which
correctly assumes that irq is enabled on invocation and
unconditionally disables and re-enables it.

Gate cancel_delayed_work_sync() invocation with kevented_up() to avoid
enabling irq unexpectedly during early boot.

Signed-off-by: Tejun Heo <[email protected]>
Reported-and-tested-by: Qiao Zhou <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>

ceph: do not modify fi->frag in need_reset_readdir()

Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
            } else if (fi->frag |= fpos_frag(new_pos)) {
                       ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^
                       (                             )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^~
                                !=

Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ovl: fix workdir creation

Workdir creation fails in latest kernel.

Fix by allowing EOPNOTSUPP as a valid return value from
vfs_removexattr(XATTR_NAME_POSIX_ACL_*). Upper filesystem may not support
ACL and still be perfectly able to support overlayfs.

Reported-by: Martin Ziegler <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Fixes: c11b9fdd6a61 ("ovl: remove posix_acl_default from workdir")
Cc: <[email protected]>

KVM: s390: vsie: fix riccbd

We store the address of riccbd at the wrong location, overwriting
gvrd. This means that our nested guest will not be able to use runtime
instrumentation. Also, a memory leak, if our KVM guest actually sets gvrd.

Not noticed until now, as KVM guests never make use of gvrd and runtime
instrumentation wasn't completely tested yet.

Reported-by: Fan Zhang <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

x86/efi: Use efi_exit_boot_services()

The eboot code directly calls ExitBootServices.  This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct.  The
eboot code attempts allocations after calling ExitBootSerives which is
not permitted per the spec.  Call the efi_exit_boot_services() helper
intead, which handles the allocation scenario properly.

Signed-off-by: Jeffrey Hugo <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Leif Lindholm <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>

efi/libstub: Use efi_exit_boot_services() in FDT

The FDT code directly calls ExitBootServices.  This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct.  The
FDT code does not handle EFI_INVALID_PARAMETER as required by the spec,
which causes intermittent boot failures on the Qualcomm Technologies
QDF2432.  Call the efi_exit_boot_services() helper intead, which handles
the EFI_INVALID_PARAMETER scenario properly.

Signed-off-by: Jeffrey Hugo <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Leif Lindholm <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>

efi/libstub: Introduce ExitBootServices helper

The spec allows ExitBootServices to fail with EFI_INVALID_PARAMETER if a
race condition has occurred where the EFI has updated the memory map after
the stub grabbed a reference to the map. The spec defines a retry
proceedure with specific requirements to handle this scenario.

This scenario was previously observed on x86 - commit d3768d885c6c ("x86,
efi: retry ExitBootServices() on failure") but the current fix is not spec
compliant and the scenario is now observed on the Qualcomm Technologies
QDF2432 via the FDT stub which does not handle the error and thus causes
boot failures. The user will notice the boot failure as the kernel is not
executed and the system may drop back to a UEFI shell, but will be
unresponsive to input and the system will require a power cycle to recover.

Add a helper to the stub library that correctly adheres to the spec in the
case of EFI_INVALID_PARAMETER from ExitBootServices and can be universally
used across all stub implementations.

Signed-off-by: Jeffrey Hugo <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Leif Lindholm <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>

efi/libstub: Allocate headspace in efi_get_memory_map()

efi_get_memory_map() allocates a buffer to store the memory map that it
retrieves. This buffer may need to be reused by the client after
ExitBootServices() is called, at which point allocations are not longer
permitted. To support this usecase, provide the allocated buffer size back
to the client, and allocate some additional headroom to account for any
reasonable growth in the map that is likely to happen between the call to
efi_get_memory_map() and the client reusing the buffer.

Signed-off-by: Jeffrey Hugo <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Leif Lindholm <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>

usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition

The previous driver is possible to stop the transfer wrongly.
For example:
1) An interrupt happens, but not BRDY interruption.
2) Read INTSTS0. And than state->intsts0 is not set to BRDY.
3) BRDY is set to 1 here.
4) Read BRDYSTS.
5) Clear the BRDYSTS. And then. the BRDY is cleared wrongly.

Remarks:
- The INTSTS0.BRDY is read only.
- If any bits of BRDYSTS are set to 1, the BRDY is set to 1.
- If BRDYSTS is 0, the BRDY is set to 0.

So, this patch adds condition to avoid such situation. (And about
NRDYSTS, this is not used for now. But, avoiding any side effects,
this patch doesn't touch it.)

Fixes: d5c6a1e024dd ("usb: renesas_usbhs: fixup interrupt status clear method")
Cc: <[email protected]> # v3.8+
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

usb: phy: phy-generic: Check clk_prepare_enable() error

clk_prepare_enable() may fail, so we should better check its return
value and propagate it in the case of failure.

Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON

This driver should clear the bit. Otherwise, the VBUS will output
wrongly if the usb port on a board has VBUS output capability.

Fixes: 746bfe63bba3 ("usb: gadget: renesas_usb3: add support for
Renesas USB3.0 peripheral controller")
Cc: <[email protected]> # v4.5+
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

Revert "usb: dwc3: gadget: always decrement by 1"

This reverts commit 6f8245b4e37c ("usb: dwc3: gadget: always decrement
by 1").

We can't always decrement this value.

We should decrement only if the calculation of free slots results in a
LINK TRB being among one of the free slots (dequeue < enqueue).

Otherwise, if the LINK TRB is not among the free slots then it should
not be decremented.

Signed-off-by: John Youn <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>

efi: Fix handling error value in fdt_find_uefi_params

of_get_flat_dt_subnode_by_name can return negative value in case of error.
Assigning the result to unsigned variable and checking if the variable
is lesser than zero is incorrect and always false.
The patch fixes it by using signed variable to check the result.

The problem has been detected using semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci

Signed-off-by: Andrzej Hajda <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Shawn Lin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>

efi: Make for_each_efi_memory_desc_in_map() cope with running on Xen

While commit 55f1ea15216 ("efi: Fix for_each_efi_memory_desc_in_map()
for empty memmaps") made an attempt to deal with empty memory maps, it
didn't address the case where the map field never gets set, as is
apparently the case when running under Xen.

Reported-by: <[email protected]>
Tested-by: <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: <[email protected]> # v4.7+
Signed-off-by: Jan Beulich <[email protected]>
[ Guard the loop with a NULL check instead of pointer underflow ]
Signed-off-by: Matt Fleming <[email protected]>

sched/core: Fix a race between try_to_wake_up() and a woken up task

The origin of the issue I've seen is related to
a missing memory barrier between check for task->state and
the check for task->on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

do {
schedule()
set_current_state(TASK_(UN)INTERRUPTIBLE);
} while (!cond);

The waker, actually gets stuck doing the following in
try_to_wake_up():

while (p->on_cpu)
cpu_relax();

Analysis:

The instance I've seen involves the following race:

CPU1 CPU2

while () {
   if (cond)
     break;
   do {
     schedule();
     set_current_state(TASK_UN..)
   } while (!cond);
wakeup_routine()
  spin_lock_irqsave(wait_lock)
   raw_spin_lock_irqsave(wait_lock)   wake_up_process()
}   try_to_wake_up()
set_current_state(TASK_RUNNING);   ..
list_del(&waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:

CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
   wake_up_process()
   try_to_wake_up()
   raw_spin_lock_irqsave(p->pi_lock)
   ..
   if (p->on_rq && ttwu_wakeup())
   ..
   while (p->on_cpu)
     cpu_relax()
   ..

CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.

CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p->on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p->on_rq to be 0. This was the most confusing bit of the analysis,
but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote. In which case the p->on_rq change is not
done uder the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_irqlock_save(wait_lock) and the holder spins infintely

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads and having to tweak available memory to very low and running
memory stress-ng mmapfork test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but thats work in progress. I am still testing the
changes on my still in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Signed-off-by: Balbir Singh <[email protected]>
[ Updated comment to clarify matching barriers. Many
  architectures do not have a full barrier in switch_to()
  so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Cc: Alexey Kardashevskiy <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

perf/core: Remove WARN from perf_event_read()

This effectively reverts commit:

71e7bc2bab77 ("perf/core: Check return value of the perf_event_read() IPI")

... and puts in a comment explaining why we ignore the return value.

Reported-by: Vegard Nossum <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: David Carrillo-Cisneros <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 71e7bc2bab77 ("perf/core: Check return value of the perf_event_read() IPI")
Signed-off-by: Ingo Molnar <[email protected]>

locking/barriers: Don't use sizeof(void) in lockless_dereference()

My previous commit:

112dc0c8069e ("locking/barriers: Suppress sparse warnings in lockless_dereference()")

caused sparse to complain that (in radix-tree.h) we use sizeof(void)
since that rcu_dereference()s a void *.

Really, all we need is to have the expression *p in here somewhere
to make sure p is a pointer type, and sizeof(*p) was the thing that
came to my mind first to make sure that's done without really doing
anything at runtime.

Another thing I had considered was using typeof(*p), but obviously
we can't just declare a typeof(*p) variable either, since that may
end up being void. Declaring a variable as typeof(*p)* gets around
that, and still checks that typeof(*p) is valid, so do that. This
type construction can't be done for _________p1 because that will
actually be used and causes sparse address space warnings, so keep
a separate unused variable for it.

Reported-by: Fengguang Wu <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 112dc0c8069e ("locking/barriers: Suppress sparse warnings in lockless_dereference()")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/microcode/AMD: Fix load of builtin microcode with randomized memory

We do not need to add the randomization offset when the microcode is
built in.

Reported-and-tested-by: Emanuel Czirai <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

ARM: dts: imx6qdl: Fix SPDIF regression

Commit 833f2cbf7091 ("ARM: dts: imx6: change the core clock of spdif")
changed many more clocks than only the SPDIF core clock as stated in
the commit message.

The MLB clock has been added and this causes SPDIF regression as
reported by Xavi Drudis Ferran and also in this forum post:
https://forum.digikey.com/thread/34240

The MX6Q Reference Manual does not mention that MLB is a clock related
to SPDIF, so change it back to a dummy clock to restore SPDIF
functionality.

Thanks to Ambika for providing the fix at:
https://community.nxp.com/thread/387131

Fixes: 833f2cbf7091 ("ARM: dts: imx6: change the core clock of spdif")
Cc: <[email protected]> # 4.4.x
Reported-by: Xavi Drudis Ferran <[email protected]>
Signed-off-by: Fabio Estevam <[email protected]>
Tested-by: Xavi Drudis Ferran <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>

Linux 4.8-rc5

af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

Right now we use the 'readlock' both for protecting some of the af_unix
IO path and for making the bind be single-threaded.

The two are independent, but using the same lock makes for a nasty
deadlock due to ordering with regards to filesystem locking. The bind
locking would want to nest outside the VSF pathname locking, but the IO
locking wants to nest inside some of those same locks.

We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
splice-bind deadlock") which moved the readlock inside the vfs locks,
but that caused problems with overlayfs that will then call back into
filesystem routines that take the lock in the wrong order anyway.

Splitting the locks means that we can go back to having the bind lock be
the outermost lock, and we don't have any deadlocks with lock ordering.

Acked-by: Rainer Weikusat <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "af_unix: Fix splice-bind deadlock"

This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1.

It turns out that it just replaces one deadlock with another one: we can
still get the wrong lock ordering with the readlock due to overlayfs
calling back into the filesystem layer and still taking the vfs locks
after the readlock.

The proper solution ends up being to just split the readlock into two
pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
(taken *inside* the filesystem locks). The two locks are independent
anyway.

Signed-off-by: Linus Torvalds <[email protected]>
Reviewed-by: Shmulik Ladkani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'vxlan-fixes'

Jiri Benc says:

====================
vxlan: fix error reporting

This patchset improves checking for invalid configuration in VXLAN and
fixes problems with duplicated and inappropriate error messages.
====================

Signed-off-by: David S. Miller <[email protected]>

vxlan: fix duplicated and wrong error messages

vxlan_dev_configure outputs error messages before returning, no need to
print again the same mesages in vxlan_newlink. Also, vxlan_dev_configure may
return a particular error code for a different reason than vxlan_newlink
thinks.

Move the remaining error messages into vxlan_dev_configure and let
vxlan_newlink just pass on the error code.

Signed-off-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vxlan: reject multicast destination without an interface

Currently, kernel accepts configurations such as:

ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
ip l a type vxlan dstport 4789 id 1 group ff0e::110

However, neither of those really works. In the IPv4 case, the interface
cannot be brought up ("RTNETLINK answers: No such device"). This is because
multicast join will be rejected without the interface being specified.

In the IPv6 case, multicast wil be joined on the first interface found. This
is not what the user wants as it depends on random factors (order of
interfaces).

Note that it's possible to add a local address but it doesn't solve
anything. For IPv4, it's not considered in the multicast join (thus the same
error as above is returned on ifup). This could be added but it wouldn't
help for IPv6 anyway. For IPv6, we do need the interface.

Just reject a configuration that sets multicast address and does not provide
an interface. Nobody can depend on the previous behavior as it never worked.

Signed-off-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: Fix bonding crash

Following few steps will crash kernel -

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does lots of things before checking if the device enslaved is
busy or not.

In this case when the notifier call-chain sends notifications, the
bond_netdev_event() assumes that the rx_handler /rx_handler_data is
registered while the bond_enslave() hasn't progressed far enough to
register rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs

If there are outstanding LAYOUTGET rpc calls, then we want to ensure that
we keep the layout stateid around so we that don't inadvertently pick up
an old/misordered sequence id.
The race is as follows:

Client Server
====== ======
LAYOUTGET(seqid)
LAYOUTGET(seqid)
return LAYOUTGET(seqid+1)
return LAYOUTGET(seqid+2)
process LAYOUTGET(seqid+2)
forget layout
process LAYOUTGET(seqid+1)

If it forgets the layout stateid before processing seqid+1, then
the client will not check the layout->plh_barrier, and so will set
the stateid with seqid+1.

Signed-off-by: Trond Myklebust <[email protected]>

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
"A single fix for an AMD erratum so machines without a BIOS fix work"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/AMD: Apply erratum 665 on machines without a BIOS fix

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"Two fixlet from the timers departement:

   - A fix for scheduler stalls in the tick idle code affecting
     NOHZ_FULL kernels

   - A trivial compile fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Fix softlockup on scheduler stalls in kvm guest
  clocksource/drivers/atmel-pit: Fix compilation error

nvme-rdma: destroy nvme queue rdma resources on connect failure

After address resolution, the nvme_rdma_queue rdma resources are
allocated. If rdma route resolution or the connect fails, or the
controller reconnect times out and gives up, then the rdma resources
need to be freed. Otherwise, rdma resources are leaked.

Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

nvme_rdma: keep a ref on the ctrl during delete/flush

Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

iw_cxgb4: block module unload until all ep resources are released

Otherwise an endpoint can be still closing down causing a touch
after free crash. Also WARN_ON if ulps have failed to destroy
various resources during device removal.

Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

iw_cxgb4: call dev_put() on l2t allocation failure

Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

Merge tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- a stable fix in both DM crypt and DM log-writes for too large bios
   (as generated by bcache)

- two other stable fixes for DM log-writes

- a stable fix for a DM crypt bug that could result in freeing pointers
   from uninitialized memory in the tfm allocation error path

- a DM bufio cleanup to discontinue using create_singlethread_workqueue()

* tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm bufio: remove use of deprecated create_singlethread_workqueue()
  dm crypt: fix free of bad values after tfm allocation failure
  dm crypt: fix error with too large bios
  dm log writes: fix check of kthread_run() return value
  dm log writes: fix bug with too large bios
  dm log writes: move IO accounting earlier to fix error path

Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"I'm still prepping a set of fixes for btrfs fsync, just nailing down a
  hard to trigger memory corruption.  For now, these are tested and ready."

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
  Btrfs: fix endless loop in balancing block groups
  Btrfs: kill invalid ASSERT() in process_all_refs()

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
"arm64 and arm/perf fixes:

   - arm64 fix: debug exception unmasking on the CPU resume path

   - ARM PMU fixes: memory leak on error path and NULL pointer
     dereference"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1
  drivers/perf: arm_pmu: Fix NULL pointer dereference during probe
  drivers/perf: arm_pmu: Fix leak in error path

Merge tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a number of small driver fixes for 4.8-rc5.

  The largest thing here is deleting an obsolete driver,
  drivers/misc/bh1780gli.c, as the functionality of it was replaced by
  an iio driver a while ago.

  The other fixes are things that have been reported, or reverts of
  broken stuff (the binder change).  All of these changes have been in
  linux-next for a while with no reported issues"

* tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  thunderbolt: Don't declare Falcon Ridge unsupported
  thunderbolt: Add support for INTEL_FALCON_RIDGE_2C controller.
  thunderbolt: Fix resume quirk for Falcon Ridge 4C.
  lkdtm: Mark lkdtm_rodata_do_nothing() notrace
  mei: me: disable driver on SPT SPS firmware
  Revert "android: binder: fix dangling pointer comparison"
  drivers/iio/light/Kconfig: SENSORS_BH1780 cleanup
  android: binder: fix dangling pointer comparison
  misc: delete bh1780 driver

Merge tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are three small fixes for 4.8-rc5.

  One for sysfs, one for kernfs, and one documentation fix, all for
  reported issues.  All of these have been in linux-next for a while"

* tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: correctly handle read offset on PREALLOC attrs
  documentation: drivers/core/of: fix name of of_node symlink
  kernfs: don't depend on d_find_any_alias() when generating notifications

Merge tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO driver fixes from Greg KH:
"Here are a number of small fixes for staging and IIO drivers that
  resolve reported problems.

  Full details are in the shortlog.  All of these have been in
  linux-next with no reported issues"

* tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (35 commits)
  arm: dts: rockchip: add reset node for the exist saradc SoCs
  arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
  iio: adc: rockchip_saradc: reset saradc controller before programming it
  iio: accel: kxsd9: Fix raw read return
  iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
  iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
  include/linux: fix excess fence.h kernel-doc notation
  staging: wilc1000: correctly check if associatedsta has not been found
  staging: wilc1000: NULL dereference on error
  staging: wilc1000: txq_event: Fix coding error
  MAINTAINERS: Add file patterns for ion device tree bindings
  MAINTAINERS: Update maintainer entry for wilc1000
  iio: chemical: atlas-ph-sensor: fix typo in val assignment
  iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
  staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
  staging: comedi: dt2811: fix a precedence bug
  staging: comedi: adv_pci1760: Do not return EINVAL for CMDF_ROUND_DOWN.
  staging: comedi: ni_mio_common: fix wrong insn_write handler
  staging: comedi: comedi_test: fix timer race conditions
  staging: comedi: daqboard2000: bug fix board type matching code
  ...

Merge tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for 4.8-rc5.  One fixes an
  oft-reported build issue with the fintek driver, another reverts a
  patch that was causing problems, one fixes a crash, and some new
  device ids were added.

  All of these have been in linux-next for a while"

* tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250: added acces i/o products quad and octal serial cards
  serial: 8250_mid: fix divide error bug if baud rate is 0
  Revert "tty/serial/8250: use mctrl_gpio helpers"
  8250/fintek: rename IRQ_MODE macro

Merge tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/PHY fixes from Greg KH:
"Here are some USB and PHY driver fixes for 4.8-rc5

  Nothing major, lots of little fixes for reported bugs, and a build fix
  for a missing .h file that the phy drivers needed.  All of these have
  been in linux-next for a while with no reported issues"

* tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
  usb: musb: Fix locking errors for host only mode
  usb: dwc3: gadget: always decrement by 1
  usb: dwc3: debug: fix ep name on trace output
  usb: gadget: udc: core: don't starve DMA resources
  USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
  USB: avoid left shift by -1
  USB: fix typo in wMaxPacketSize validation
  usb: gadget: Add the gserial port checking in gs_start_tx()
  usb: dwc3: gadget: don't rely on jiffies while holding spinlock
  usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
  usb: gadget: function: f_rndis: socket buffer may be NULL
  usb: gadget: function: f_eem: socket buffer may be NULL
  usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe()
  usb: dwc2: Add reset control to dwc2
  usb: dwc3: core: allow device to runtime_suspend several times
  usb: dwc3: pci: runtime_resume child device
  USB: serial: option: add WeTelecom WM-D200
  usb: chipidea: udc: don't touch DP when controller is in host mode
  USB: serial: mos7840: fix non-atomic allocation in write path
  USB: serial: mos7720: fix non-atomic allocation in write path
  ...

devpts: return NULL pts 'priv' entry for non-devpts nodes

In commit 8ead9dd54716 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly. And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.

And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.

So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.

Reported-by: Dmitry Vyukov <[email protected]>
Cc: [email protected] # 4.6+
Cc: Eric W Biederman" <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: fix mapping size check

pgoff_to_phys() validates that both the starting address and the length
of the mapping against the resource list. We need to check for a
mapping size of PMD_SIZE not PAGE_SIZE in the pmd fault path.

Signed-off-by: Dan Williams <[email protected]>

iio: accel: kxsd9: Fix scaling bug

All the scaling of the KXSD9 involves multiplication with a
fraction number < 1.

However the scaling value returned from IIO_INFO_SCALE was
unpredictable as only the micros of the value was assigned, and
not the integer part, resulting in scaling like this:

$cat in_accel_scale
-1057462640.011978

Fix this by assigning zero to the integer part.

Cc: [email protected]
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>

iio: accel: bmc150: reset chip at init time

In at least one known setup, the chip comes up in a state where reading
the chip ID returns garbage unless it's been reset, due to noise on the
wires during system boot.

All supported chips have the same reset method, and based on the
datasheets they all need 1.3 or 1.8ms to recover after reset. So, do
the conservative thing here and always reset the chip.

Signed-off-by: Olof Johansson <[email protected]>
Reviewed-by: Srinivas Pandruvada <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>

pNFS: Clear out all layout segments if the server unsets lrp->res.lrs_present

If the server fails to set lrp->res.lrs_present in the LAYOUTRETURN reply,
then that means it believes the client holds no more layout state for that
file, and that the layout stateid is now invalid.

Signed-off-by: Trond Myklebust <[email protected]>

pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID

If the layout was marked as invalid, we want to ensure to initialise
the layout header fields correctly.

Signed-off-by: Trond Myklebust <[email protected]>

pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised

According to RFC5661, the client is responsible for serialising
LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case
where we send both in parallel.

Client Server
====== ======
LAYOUTGET(seqid=X)
LAYOUTRETURN(seqid=X)
LAYOUTGET return seqid=X+1
LAYOUTRETURN return seqid=X+2
Process LAYOUTRETURN
Forget layout stateid
Process LAYOUTGET
Set seqid=X+1

The client processes the layoutget/layoutreturn in the wrong order,
and since the result of the layoutreturn was to clear the only
existing layout segment, the client forgets the layout stateid.

When the LAYOUTGET comes in, it is treated as having a completely
new stateid, and so the client sets the wrong sequence id...

Fix is to check if there are outstanding LAYOUTGET requests
before we send the LAYOUTRETURN (note that LAYOUGET will already
wait if it sees an outstanding LAYOUTRETURN).

Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected] # v4.5+
Signed-off-by: Trond Myklebust <[email protected]>

NFS: Fix error reporting in nfs_file_write()

When doing O_DSYNC writes, the actual write errors are reported through
generic_write_sync(), so we must test the result.

Reported-by: J. R. Okajima <[email protected]>
Fixes: 18290650b1c8 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Trond Myklebust <[email protected]>

iio: fix pressure data output unit in hid-sensor-attributes

According to IIO ABI definition, IIO_PRESSURE data output unit is
kilopascal:
http://lxr.free-electrons.com/source/Documentation/ABI/testing/sysfs-bus-iio

This patch fix output unit of HID pressure sensor IIO driver from pascal to
kilopascal to follow IIO ABI definition.

Signed-off-by: Kweh, Hock Leong <[email protected]>
Reviewed-by: Srinivas Pandruvada <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>

sunrpc: fix UDP memory accounting

The commit f9b2ee714c5c ("SUNRPC: Move UDP receive data path
into a workqueue context"), as a side effect, moved the
skb_free_datagram() call outside the scope of the related socket
lock, but UDP sockets require such lock to be held for proper
memory accounting.
Fix it by replacing skb_free_datagram() with
skb_free_datagram_locked().

Fixes: f9b2ee714c5c ("SUNRPC: Move UDP receive data path into a workqueue context")
Reported-and-tested-by: Jan Stancek <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Cc: [email protected] # 4.4+
Signed-off-by: Trond Myklebust <[email protected]>

Merge remote-tracking branch 'regmap/fix/rbtree' into regmap-linus

Merge remote-tracking branch 'regmap/fix/cache' into regmap-linus

spi: Prevent unexpected SPI time out due to arithmetic overflow

When reading SPI flash as MTD device, the transfer length is
directly passed to the spi driver. If the requested data size
exceeds 512KB, it will cause the time out calculation to
overflow since transfer length is 32-bit unsigned integer.
This issue is resolved by using 64-bit unsigned integer
to perform the arithmetic.

Signed-off-by: Sien Wu <[email protected]>
Acked-by: Brad Keryan <[email protected]>
Acked-by: Gratian Crisan <[email protected]>
Acked-by: Brad Mouring <[email protected]>
Natinst-ReviewBoard-ID 150232
Signed-off-by: Mark Brown <[email protected]>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A collection of fixes for the nvme over fabrics code"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme-rdma: Get rid of redundant defines
  nvme-rdma: Get rid of duplicate variable
  nvme: fabrics drivers don't need the nvme-pci driver
  nvme-fabrics: get a reference when reusing a nvme_host structure
  nvme-fabrics: change NQN UUID to big-endian format
  nvme-loop: set sqsize to 0-based value, per spec
  nvme-rdma: fix sqsize/hsqsize per spec
  fabrics: define admin sqsize min default, per spec
  nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
  nvmet-rdma: Fix use after free
  nvme-rdma: initialize ret to zero to avoid returning garbage

Merge branch 'smsc911x-fixes'

Jeremy Linton says:

====================
net: smsc911x: Move phy and interrupt config

v2-v3: Move error handing into separate patch, replace a couple cases
of fixed errors with the errors being returned from the failing functions.
Hoist irq handler.

The smsc911x driver is doing a number of things in its probe routine that
should be delayed until the interface is started. Because of this, the module
cannot be unloaded, the phy states are incorrect/stale if the interface isn't
running, open's unnecessarily fail causing network configuration problems, and
the /proc/irq nodes are incorrectly named.

Clean up a number of these problems by moving the mdio and interrupt
configuration into the smsc911x_open routine.
====================

Signed-off-by: David S. Miller <[email protected]>

net: smsc911x: Move interrupt allocation to open/stop

The /proc/irq/xx information is incorrect for smsc911x because
the request_irq is happening before the register_netdev has the
proper device name. Moving it to the open also fixes the case
of when the device is renamed.

Reported-by: Will Deacon <[email protected]>
Signed-off-by: Jeremy Linton <[email protected]>
Tested-by: Will Deacon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: smsc911x: Move interrupt handler before open

In preparation for the allocating/enabling interrupts
in the ndo_open routine move the irq handler before it.

Signed-off-by: Jeremy Linton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: smsc911x: Fix register_netdev, phy startup, driver unload ordering

Move phy startup/shutdown into the smsc911x_open/stop routines. This
allows the module to be unloaded because phy_connect_direct is no longer
always holding the module use count. This one change also resolves a
number of other problems.

The link status of a downed interface no longer reflects a stale state.
Errors caused by the net device being opened before the mdio/phy was
configured. There is also a potential power savings as the phy's don't
remain powered when the interface isn't running.

Signed-off-by: Jeremy Linton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: smsc911x: Remove multiple exit points from smsc911x_open

Rework the error handling in smsc911x open in preparation
for the mdio startup being moved here.

Signed-off-by: Jeremy Linton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull TPM bugfix from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm: invalid self test error message

tpm: invalid self test error message

The driver emits invalid self test error message even though the init
succeeds.

Signed-off-by: Jarkko Sakkinen <[email protected]>
Fixes: cae8b441fc20 ("tpm: Factor out common startup code")
Reviewed-by: James Morris <[email protected]>
Signed-off-by: James Morris <[email protected]>

Merge tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes ffrom Rafael Wysocki:
"Two stable-candidate fixes for the ACPI early device probing code
  added during the 4.4 cycle, one fixing a typo in a stub macro used
  when CONFIG_ACPI is unset and one that prevents sleeping functions
  from being called under a spinlock (Lorenzo Pieralisi)"

* tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / drivers: replace acpi_probe_lock spinlock with mutex
  ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro

Merge tag 'pm-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"This includes a stable-candidate cpufreq-dt driver problem fix and
  annotations of tracepoints in the runtime PM framework.

  Specifics:

   - Fix the definition of the cpufreq-dt driver's machines table
     introduced during the 4.7 cycle that should be NULL-terminated, but
     the termination entry is missing from it (Wei Yongjun).

   - Annotate tracepoints in the runtime PM framework's core so as to
     allow the functions containing them to be called from the idle code
     path without causing RCU to complain about illegal usage (Paul
     McKenney)"

* tag 'pm-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle
  PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle
  cpufreq: dt: Add terminate entry for of_device_id tables

Merge branches 'pm-cpufreq-fixes' and 'pm-core-fixes'

* pm-cpufreq-fixes:
  cpufreq: dt: Add terminate entry for of_device_id tables

* pm-core-fixes:
  PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle
  PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle

ACPI / drivers: replace acpi_probe_lock spinlock with mutex

Commit e647b532275b ("ACPI: Add early device probing infrastructure")
introduced code that allows inserting driver specific
struct acpi_probe_entry probe entries into ACPI linker sections
(one per-subsystem, eg irqchip, clocksource) that are then walked
to retrieve the data and function hooks required to probe the
respective kernel components.

Probing for all entries in a section is triggered through
the __acpi_probe_device_table() function, that in turn, according
to the table ID a given probe entry reports parses the table
with the function retrieved from the respective section structures
(ie struct acpi_probe_entry). Owing to the current ACPI table
parsing implementation, the __acpi_probe_device_table() function
has to share global variables with the acpi_match_madt() function, so
in order to guarantee mutual exclusion locking is required
between the two functions.

Current kernel code implements the locking through the acpi_probe_lock
spinlock; this has the side effect of requiring all code called
within the lock (ie struct acpi_probe_entry.probe_{table/subtbl} hooks)
not to sleep.

However, kernel subsystems that make use of the early probing
infrastructure are relying on kernel APIs that may sleep (eg
irq_domain_alloc_fwnode(), among others) in the function calls
pointed at by struct acpi_probe_entry.{probe_table/subtbl} entries
(eg gic_v2_acpi_init()), which is a bug.

Since __acpi_probe_device_table() is called from context
that is allowed to sleep the acpi_probe_lock spinlock can be replaced
with a mutex; this fixes the issue whilst still guaranteeing
mutual exclusion.

Signed-off-by: Lorenzo Pieralisi <[email protected]>
Fixes: e647b532275b (ACPI: Add early device probing infrastructure)
Cc: 4.4+ <[email protected]> # 4.4+
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro

When the ACPI_DECLARE_PROBE_ENTRY macro was added in
commit e647b532275b ("ACPI: Add early device probing infrastructure"),
a stub macro adding an unused entry was added for the !CONFIG_ACPI
Kconfig option case to make sure kernel code making use of the
macro did not require to be guarded within CONFIG_ACPI in order to
be compiled.

The stub macro was never used since all kernel code that defines
ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
CONFIG_ACPI; it contains a typo that should be nonetheless fixed.

Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
macro so that it can actually be used if needed.

Signed-off-by: Lorenzo Pieralisi <[email protected]>
Fixes: e647b532275b (ACPI: Add early device probing infrastructure)
Cc: 4.4+ <[email protected]> # 4.4+
Signed-off-by: Rafael J. Wysocki <[email protected]>

l2tp: fix use-after-free during module unload

Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
-> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
l2tp_tunnel_destruct).

By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
destroying the socket, the private data reserved via the net_generic
mechanism has already been freed, but l2tp_tunnel_destruct() actually
uses this data.

Make sure tunnel deletion for the netns has completed before returning
from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
then waiting for RCU callbacks to complete.

Fixes: 167eb17e0b17 ("l2tp: create tunnel sockets in the right namespace")
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86/AMD: Apply erratum 665 on machines without a BIOS fix

AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.

[ Borislav: Wrote commit message. ]

Signed-off-by: Emanuel Czirai <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Yaowu Xu <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

IB/hfi1: Rework debugfs to use SRCU

The debugfs RCU trips many debug kernel warnings because of potential
sleeps with an RCU read lock held. This includes both user copy calls
and slab allocations throughout the file.

This patch switches the RCU to use SRCU for file remove/access
race protection.

In one case, the SRCU is implicit in the use of the raw debugfs file
object and just works.

In the seq_file case, a wrapper around seq_read() and seq_lseek() is
used to enforce the SRCU using the debugfs supplied functions
debugfs_use_file_start() and debugfs_use_file_stop().

The sychronize_rcu() is deleted since the SRCU prevents the remove
access race.

The RCU locking is kept for qp_stats since the QP hash list is
protected using the non-sleepable RCU.

Reviewed-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/hfi1: Make n_krcvqs be an unsigned long integer

The global variable n_krcvqs stores the sum of the number of kernel
receive queues of VLs 0-7 which the user can pass to the driver through
the module parameter array krcvqs which is of type unsigned integer. If
the user passes large value(s) into krcvqs parameter array, it can cause
an arithmetic overflow while calculating n_krcvqs which is also of type
unsigned int. The overflow results in an incorrect value of n_krcvqs
which can lead to kernel crash while loading the driver.

Fix by changing the data type of n_krcvqs to unsigned long. This patch
also changes the data type of other variables that get their values from
n_krcvqs.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Harish Chegondi <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/hfi1: Add QSFP sanity pre-check

Sometimes a QSFP device does not respond in the expected time
after a power-on. Add a read pre-check/retry when starting
the link on driver load.

Reviewed-by: Easwar Hariharan <[email protected]>
Signed-off-by: Dean Luick <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/hfi1: Fix AHG KDETH Intr shift

In the set_txreq_header_ahg(), The KDETH Intr bit is obtained from the
header in the user sdma request using a KDETH_GET shift and mask macro.
This value is then futher right shifted by 16 causing us to lose the
value i.e it is shifted to zero, leading to the following
smatch warning:
drivers/infiniband/hw/hfi1/user_sdma.c:1482 set_txreq_header_ahg()
warn: mask and shift to zero

The Intr bit should be left shifted into its correct position in the
KDETH header before the AHG update.

Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Mitko Haralanov <[email protected]>
Reviewed-by: Harish Chegondi <[email protected]>
Signed-off-by: Jubin John <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/hfi1: Fix SGE length for misaligned PIO copy

When trying to align the source pointer and there's a byte carry
in an SGE copy, bytes are borrowed from the next quad-word X to
complete the required quad-word copy. Then, the SGE length is
reduced by the number of borrowed bytes. After this, if the
remaining number of bytes from quad-word X (extra bytes) is
greater than the new SGE length, the number of extra bytes needs
to be updated to the new SGE length. Otherwise, when the
SGE length gets updated again after the extra bytes are read to
create the new byte carry, it goes negative, which then becomes
a very large number as the SGE length is an unsigned integer.
This causes SGE buffer to be over-read.

Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Don't return errors from poll_cq

Remove returning errors from mlx5 poll_cq function. Polling CQ
operation in kernel never fails by Mellanox HCA architecture and
respective driver design.

Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Use TIR number based on selector

Use TIR number based on selector, it should be done to differentiate
between RSS QP to RAW one.

Reported-by: Sagi Grimberg <[email protected]>
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Tested-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Simplify code by removing return variable

Return variable was set in a line before the
actual return was called in begin_wqe function.

This patch removes such variable and simplifies the code.

Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Return EINVAL when caller specifies too many SGEs

The returned value should be EINVAL, because it is caused by wrong
caller and not by internal overflow event.

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx4: Don't return errors from poll_cq

Remove returning errors from mlx4 poll_cq function. Polling CQ
operation in kernel never fails by Mellanox HCA architecture and
respective driver design.

Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>