Linus Torvalds [Sat, 17 Feb 2018 18:20:47 +0000 (10:20 -0800)]
Merge tag 'for-linus-20180217' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith, with fixes all over the map for nvme.
From various folks.
- Classic polling fix, that avoids a latency issue where we still end
up waiting for an interrupt in some cases. From Nitesh Shetty.
- Comment typo fix from Minwoo Im.
* tag 'for-linus-20180217' of git://git.kernel.dk/linux-block:
block: fix a typo in comment of BLK_MQ_POLL_STATS_BKTS
nvme-rdma: fix sysfs invoked reset_ctrl error flow
nvmet: Change return code of discard command if not supported
nvme-pci: Fix timeouts in connecting state
nvme-pci: Remap CMB SQ entries on every controller reset
nvme: fix the deadlock in nvme_update_formats
blk: optimization for classic polling
nvme: Don't use a stack buffer for keep-alive command
nvme_fc: cleanup io completion
nvme_fc: correct abort race condition on resets
nvme: Fix discard buffer overrun
nvme: delete NVME_CTRL_LIVE --> NVME_CTRL_CONNECTING transition
nvme-rdma: use NVME_CTRL_CONNECTING state to mark init process
nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING
Linus Torvalds [Sat, 17 Feb 2018 18:08:28 +0000 (10:08 -0800)]
Merge tag 'mmc-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- meson-gx: Revert to earlier tuning process
- bcm2835: Don't overwrite max frequency unconditionally
* tag 'mmc-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: bcm2835: Don't overwrite max frequency unconditionally
Revert "mmc: meson-gx: include tx phase in the tuning process"
Linus Torvalds [Sat, 17 Feb 2018 18:06:13 +0000 (10:06 -0800)]
Merge tag 'mtd/fixes-for-4.16-rc2' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
- add missing dependency to NAND_MARVELL Kconfig entry
- use the appropriate OOB layout in the VF610 driver
* tag 'mtd/fixes-for-4.16-rc2' of git://git.infradead.org/linux-mtd:
mtd: nand: MTD_NAND_MARVELL should depend on HAS_DMA
mtd: nand: vf610: set correct ooblayout
Linus Torvalds [Sat, 17 Feb 2018 17:48:26 +0000 (09:48 -0800)]
Merge tag 'powerpc-4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The main attraction is a fix for a bug in the new drmem code, which
was causing an oops on boot on some versions of Qemu.
There's also a fix for XIVE (Power9 interrupt controller) on KVM, as
well as a few other minor fixes.
Thanks to: Corentin Labbe, Cyril Bur, Cédric Le Goater, Daniel Black,
Nathan Fontenot, Nicholas Piggin"
* tag 'powerpc-4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Check for zero filled ibm,dynamic-memory property
powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n
powerpc/powernv: IMC fix out of bounds memory access at shutdown
powerpc/xive: Use hw CPU ids when configuring the CPU queues
powerpc: Expose TSCR via sysfs only on powernv
Linus Torvalds [Sat, 17 Feb 2018 17:46:18 +0000 (09:46 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"The bulk of this is the pte accessors annotation to READ/WRITE_ONCE
(we tried to avoid pushing this during the merge window to avoid
conflicts)
- Updated the page table accessors to use READ/WRITE_ONCE and prevent
compiler transformation that could lead to an apparent loss of
coherency
- Enabled branch predictor hardening for the Falkor CPU
- Fix interaction between kpti enabling and KASan causing the
recursive page table walking to take a significant time
- Fix some sparse warnings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cputype: Silence Sparse warnings
arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
arm64: proc: Set PTE_NG for table entries to avoid traversing them twice
arm64: Add missing Falkor part number for branch predictor hardening
Linus Torvalds [Sat, 17 Feb 2018 17:16:09 +0000 (09:16 -0800)]
Merge tag 'for-linus-4.16a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- fixes for the Xen pvcalls frontend driver
- fix for booting Xen pv domains
- fix for the xenbus driver user interface
* tag 'for-linus-4.16a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
pvcalls-front: wait for other operations to return when release passive sockets
pvcalls-front: introduce a per sock_mapping refcount
x86/xen: Calculate __max_logical_packages on PV domains
xenbus: track caller request id
pvcalls-front: introduce a per sock_mapping refcount
Introduce a per sock_mapping refcount, in addition to the existing
global refcount. Thanks to the sock_mapping refcount, we can safely wait
for it to be 1 in pvcalls_front_release before freeing an active socket,
instead of waiting for the global refcount to be 1.
Joao Martins [Fri, 2 Feb 2018 17:42:33 +0000 (17:42 +0000)]
xenbus: track caller request id
Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent
xenstore accesses") optimized xenbus concurrent accesses but in doing so
broke UABI of /dev/xen/xenbus. Through /dev/xen/xenbus applications are in
charge of xenbus message exchange with the correct header and body. Now,
after the mentioned commit the replies received by application will no
longer have the header req_id echoed back as it was on request (see
specification below for reference), because that particular field is being
overwritten by kernel.
struct xsd_sockmsg
{
uint32_t type; /* XS_??? */
uint32_t req_id;/* Request identifier, echoed in daemon's response. */
uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
uint32_t len; /* Length of data following this. */
/* Generally followed by nul-terminated string(s). */
};
Before there was only one request at a time so req_id could simply be
forwarded back and forth. To allow simultaneous requests we need a
different req_id for each message thus kernel keeps a monotonic increasing
counter for this field and is written on every request irrespective of
userspace value.
Forwarding again the req_id on userspace requests is not a solution because
we would open the possibility of userspace-generated req_id colliding with
kernel ones. So this patch instead takes another route which is to
artificially keep user req_id while keeping the xenbus logic as is. We do
that by saving the original req_id before xs_send(), use the private kernel
counter as req_id and then once reply comes and was validated, we restore
back the original req_id.
Robin Murphy [Fri, 16 Feb 2018 17:04:23 +0000 (17:04 +0000)]
arm64: cputype: Silence Sparse warnings
Sparse makes a fair bit of noise about our MPIDR mask being implicitly
long - let's explicitly describe it as such rather than just relying on
the value forcing automatic promotion.
Linus Torvalds [Fri, 16 Feb 2018 20:22:33 +0000 (12:22 -0800)]
Merge tag 'dma-mapping-4.16-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"A few dma-mapping fixes for the fallout from the changes in rc1"
* tag 'dma-mapping-4.16-2' of git://git.infradead.org/users/hch/dma-mapping:
powerpc/macio: set a proper dma_coherent_mask
dma-mapping: fix a comment typo
dma-direct: comment the dma_direct_free calling convention
dma-direct: mark as is_phys
ia64: fix build failure with CONFIG_SWIOTLB
Will Deacon [Thu, 15 Feb 2018 11:14:56 +0000 (11:14 +0000)]
arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.
Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).
Linus Torvalds [Fri, 16 Feb 2018 17:26:18 +0000 (09:26 -0800)]
Merge tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We have a few assorted fixes, some of them show up during fstests so I
gave them more testing"
* tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
Btrfs: fix null pointer dereference when replacing missing device
btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
Btrfs: fix unexpected -EEXIST when creating new inode
Btrfs: fix use-after-free on root->orphan_block_rsv
Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
Btrfs: fix extent state leak from tree log
Btrfs: fix crash due to not cleaning up tree log block's dirty bits
Btrfs: fix deadlock in run_delalloc_nocow
Linus Torvalds [Fri, 16 Feb 2018 17:23:36 +0000 (09:23 -0800)]
Merge tag 'for-4.16/dm-chained-bios-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mike Snitzer:
"Fix for DM core to properly propagate errors (avoids overriding
non-zero error with 0). This is particularly important given DM core's
increased use of chained bios"
* tag 'for-4.16/dm-chained-bios-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: correctly handle chained bios in dec_pending()
Linus Torvalds [Fri, 16 Feb 2018 17:20:00 +0000 (09:20 -0800)]
Merge tag 'platform-drivers-x86-v4.16-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
- regression fix in keyboard support for Dell laptops
- prevent out-of-boundary write in WMI bus driver
- increase timeout to read functional key status on Lenovo laptops
* tag 'platform-drivers-x86-v4.16-4' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-laptop: Removed duplicates in DMI whitelist
platform/x86: dell-laptop: fix kbd_get_state's request value
platform/x86: ideapad-laptop: Increase timeout to wait for EC answer
platform/x86: wmi: fix off-by-one write in wmi_dev_probe()
Linus Torvalds [Fri, 16 Feb 2018 17:11:30 +0000 (09:11 -0800)]
Merge tag 'sound-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of usual suspects:
- a handful USB-audio and HD-audio device-specific quirks
- some trivial fixes for the new AC97 bus stuff
- another race fix in ALSA sequencer core"
* tag 'sound-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: PCI quirk for Fujitsu U7x7
ALSA: seq: Fix racy pool initializations
ALSA: usb: add more device quirks for USB DSD devices
ALSA: usb-audio: Fix UAC2 get_ctl request with a RANGE attribute
ALSA: ac97: Fix copy and paste typo in documentation
ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204
ALSA: ac97: kconfig: Remove select of undefined symbol AC97
ALSA: hda/realtek - Enable Thinkpad Dock device for ALC298 platform
ALSA: hda/realtek - Add headset mode support for Dell laptop
ALSA: hda - Fix headset mic detection problem for two Dell machines
Linus Torvalds [Fri, 16 Feb 2018 17:08:59 +0000 (09:08 -0800)]
Merge tag 'drm-fixes-for-v4.16-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One nouveau regression fix, one AMD quirk and a full set of i915
fixes.
The i915 fixes are mostly for things caught by their CI system, main
ones being DSI panel fixes and GEM fixes"
* tag 'drm-fixes-for-v4.16-rc2' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: Make clock gate support conditional
drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3
drm/i915: Free memdup-ed DSI VBT data structures on driver_unload
drm/i915: Add intel_bios_cleanup() function
drm/i915/vlv: Add cdclk workaround for DSI
drm/i915/gvt: fix one typo of render_mmio trace
drm/i915/gvt: Support BAR0 8-byte reads/writes
drm/i915/gvt: add 0xe4f0 into gen9 render list
drm/i915/pmu: Fix building without CONFIG_PM
drm/i915/pmu: Fix sleep under atomic in RC6 readout
drm/i915/pmu: Fix PMU enable vs execlists tasklet race
drm/i915: Lock out execlist tasklet while peeking inside for busy-stats
drm/i915/breadcrumbs: Ignore unsubmitted signalers
drm/i915: Don't wake the device up to check if the engine is asleep
drm/i915: Avoid truncation before clamping userspace's priority value
drm/i915/perf: Fix compiler warning for string truncation
drm/i915/perf: Fix compiler warning for string truncation
drm/amdgpu: add new device to use atpx quirk
NeilBrown [Thu, 15 Feb 2018 09:00:15 +0000 (20:00 +1100)]
dm: correctly handle chained bios in dec_pending()
dec_pending() is given an error status (possibly 0) to be recorded
against a bio. It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io->status. However when it then assigned io->status to bio->bi_status,
it is not careful and could overwrite a genuine error status with 0.
This can happen when chained bios are in use. If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio->bi_status before the dm_io completes.
This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.
A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
->bi_status, and the other fragment will normally complete a little
later, and will clear ->bi_status.
The fix is simply to only assign io_error to bio->bi_status when
io_error is not zero.
Nathan Fontenot [Fri, 16 Feb 2018 03:27:41 +0000 (21:27 -0600)]
powerpc/pseries: Check for zero filled ibm,dynamic-memory property
Some versions of QEMU will produce an ibm,dynamic-reconfiguration-memory
node with a ibm,dynamic-memory property that is zero-filled. This
causes the drmem code to oops trying to parse this property.
The fix for this is to validate that the property does contain LMB
entries before trying to parse it and bail if the count is zero.
Oops: Kernel access of bad area, sig: 11 [#1]
DAR: 0000000000000010
NIP read_drconf_v1_cell+0x54/0x9c
LR read_drconf_v1_cell+0x48/0x9c
Call Trace:
__param_initcall_debug+0x0/0x28 (unreliable)
drmem_init+0x144/0x2f8
do_one_initcall+0x64/0x1d0
kernel_init_freeable+0x298/0x38c
kernel_init+0x24/0x160
ret_from_kernel_thread+0x5c/0xb4
The ibm,dynamic-reconfiguration-memory device tree property generated
that causes this:
Michael Kelley [Wed, 14 Feb 2018 02:54:03 +0000 (02:54 +0000)]
cpumask: Make for_each_cpu_wrap() available on UP as well
for_each_cpu_wrap() was originally added in the #else half of a
large "#if NR_CPUS == 1" statement, but was omitted in the #if
half. This patch adds the missing #if half to prevent compile
errors when NR_CPUS is 1.
Thierry Reding [Wed, 7 Feb 2018 17:40:27 +0000 (18:40 +0100)]
drm/nouveau: Make clock gate support conditional
The recently introduced clock gate support breaks on Tegra chips because
no thermal support is enabled for those devices. Conditionalize the code
on the existence of thermal support to fix this.
Dave Airlie [Fri, 16 Feb 2018 02:33:03 +0000 (12:33 +1000)]
Merge tag 'drm-intel-fixes-2018-02-14-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
There are important fixes for VLV with MIPI/DSI panels,
2 clean-up patches needed for this MIPI/DSI fix,
and many fixes for GEM including fixes for Perf OA and PMU,
and fixes on scheduler and preemption.
This also includes GVT fixes: "This has one to fix GTT mmio 8b
access from guest and two simple ones for mmio switch and typo fix"
* tag 'drm-intel-fixes-2018-02-14-1' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3
drm/i915: Free memdup-ed DSI VBT data structures on driver_unload
drm/i915: Add intel_bios_cleanup() function
drm/i915/vlv: Add cdclk workaround for DSI
drm/i915/gvt: fix one typo of render_mmio trace
drm/i915/gvt: Support BAR0 8-byte reads/writes
drm/i915/gvt: add 0xe4f0 into gen9 render list
drm/i915/pmu: Fix building without CONFIG_PM
drm/i915/pmu: Fix sleep under atomic in RC6 readout
drm/i915/pmu: Fix PMU enable vs execlists tasklet race
drm/i915: Lock out execlist tasklet while peeking inside for busy-stats
drm/i915/breadcrumbs: Ignore unsubmitted signalers
drm/i915: Don't wake the device up to check if the engine is asleep
drm/i915: Avoid truncation before clamping userspace's priority value
drm/i915/perf: Fix compiler warning for string truncation
drm/i915/perf: Fix compiler warning for string truncation
Linus Torvalds [Thu, 15 Feb 2018 22:50:32 +0000 (14:50 -0800)]
Merge tag 'acpi-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a system resume regression from the 4.13 cycle, clean up
device table handling in the ACPI core, update sysfs ABI documentation
of a couple of drivers and add an expected switch fall-through marker
to the SPCR table parsing code.
Specifics:
- Revert a problematic EC driver change from the 4.13 cycle that
introduced a system resume regression on Thinkpad X240 (Rafael
Wysocki).
- Clean up device tables handling in the ACPI core and the related
part of the device properties framework (Andy Shevchenko).
- Update the sysfs ABI documentatio of the dock and the INT3407
special device drivers (Aishwarya Pant).
- Add an expected switch fall-through marker to the SPCR table
parsing code (Gustavo Silva)"
* tag 'acpi-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: dock: document sysfs interface
ACPI / DPTF: Document dptf_power sysfs atttributes
device property: Constify device_get_match_data()
ACPI / bus: Rename acpi_get_match_data() to acpi_device_get_match_data()
ACPI / bus: Remove checks in acpi_get_match_data()
ACPI / bus: Do not traverse through non-existed device table
ACPI: SPCR: Mark expected switch fall-through in acpi_parse_spcr
ACPI / EC: Restore polling during noirq suspend/resume phases
Linus Torvalds [Thu, 15 Feb 2018 22:40:01 +0000 (14:40 -0800)]
Merge tag 'pm-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a recently introduced build issue related to cpuidle and two
bugs in the PM core, update cpuidle documentation and clean up memory
allocations in the operating performance points (OPP) framework.
Specifics:
- Fix a recently introduced build issue related to cpuidle by
covering all of the relevant combinations of Kconfig options
in its header (Rafael Wysocki).
- Add missing invocation of pm_runtime_drop_link() to the
!CONFIG_SRCU variant of __device_link_del() (Lukas Wunner).
- Fix unbalanced IRQ enable in the wakeup interrupts framework
(Tony Lindgren).
- Update cpuidle sysfs ABI documentation (Aishwarya Pant).
- Use GFP_KERNEL instead of GFP_ATOMIC for allocating memory
in dev_pm_opp_init_cpufreq_table() (Jia-Ju Bai)"
* tag 'pm-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: cpuidle: Fix cpuidle_poll_state_init() prototype
PM / runtime: Update links_count also if !CONFIG_SRCU
PM / wakeirq: Fix unbalanced IRQ enable for wakeirq
Documentation/ABI: update cpuidle sysfs documentation
opp: cpu: Replace GFP_ATOMIC with GFP_KERNEL in dev_pm_opp_init_cpufreq_table
Linus Torvalds [Thu, 15 Feb 2018 22:31:28 +0000 (14:31 -0800)]
Merge tag 'hwmon-for-linus-v4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix bad temperature display on Ryzen/Threadripper"
* tag 'hwmon-for-linus-v4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (k10temp) Only apply temperature offset if result is positive
Linus Torvalds [Thu, 15 Feb 2018 22:29:27 +0000 (14:29 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"This includes a bugfix for virtio 9p fs. It also fixes hybernation for
s390 guests with virtio devices"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: implement PM operations for virtio_ccw
9p/trans_virtio: discard zero-length reply
Now that USB_UHCI_BIG_ENDIAN_MMIO and USB_UHCI_BIG_ENDIAN_DESC are moved
outside of the USB_SUPPORT conditional, simply select them from
SPARC_LEON rather than by the symbol's defaults in drivers/usb/Kconfig,
similar to how it is done for USB_EHCI_BIG_ENDIAN_MMIO and
USB_EHCI_BIG_ENDIAN_DESC.
James Hogan [Wed, 31 Jan 2018 22:24:45 +0000 (22:24 +0000)]
usb: Move USB_UHCI_BIG_ENDIAN_* out of USB_SUPPORT
Move the Kconfig symbols USB_UHCI_BIG_ENDIAN_MMIO and
USB_UHCI_BIG_ENDIAN_DESC out of drivers/usb/host/Kconfig, which is
conditional upon USB && USB_SUPPORT, so that it can be freely selected
by platform Kconfig symbols in architecture code.
For example once the MIPS_GENERIC platform selects are fixed in commit 2e6522c56552 ("MIPS: Fix typo BIG_ENDIAN to CPU_BIG_ENDIAN"), the MIPS
32r6_defconfig warns like so:
warning: (MIPS_GENERIC) selects USB_UHCI_BIG_ENDIAN_MMIO which has unmet direct dependencies (USB_SUPPORT && USB)
warning: (MIPS_GENERIC) selects USB_UHCI_BIG_ENDIAN_DESC which has unmet direct dependencies (USB_SUPPORT && USB)
Linus Torvalds [Thu, 15 Feb 2018 17:28:47 +0000 (09:28 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes:
- fix rq->lock lockdep annotation bug
- fix/improve update_curr_rt() and update_curr_dl() accounting
- update documentation
- remove unused macro"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cpufreq: Remove unused SUGOV_KTHREAD_PRIORITY macro
sched/core: Fix DEBUG_SPINLOCK annotation for rq->lock
sched/rt: Make update_curr_rt() more accurate
sched/deadline: Make update_curr_dl() more accurate
membarrier-sync-core: Document architecture support
Linus Torvalds [Thu, 15 Feb 2018 17:05:26 +0000 (09:05 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"This contains two qspinlock fixes and three documentation and comment
fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/semaphore: Update the file path in documentation
locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()
locking/qspinlock: Ensure node->count is updated before initialising node
locking/qspinlock: Ensure node is initialised before updating prev->next
Documentation/locking/mutex-design: Update to reflect latest changes
platform/x86: dell-laptop: Removed duplicates in DMI whitelist
Fixed a mistake in which several entries were duplicated in the DMI list
from the below commit fe486138 platform/x86: dell-laptop: Add 2-in-1 devices to the DMI whitelist
Laszlo Toth [Tue, 13 Feb 2018 20:43:43 +0000 (21:43 +0100)]
platform/x86: dell-laptop: fix kbd_get_state's request value
Commit 9862b43624a5 ("platform/x86: dell-laptop: Allocate buffer on heap
rather than globally")
broke one request, changed it back to the original value.
Tested on a Dell E6540, backlight came back.
Fixes: 9862b43624a5 ("platform/x86: dell-laptop: Allocate buffer on heap rather than globally") Signed-off-by: Laszlo Toth <[email protected]> Reviewed-by: Pali Rohár <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
Aaron Ma [Sun, 11 Feb 2018 09:18:49 +0000 (17:18 +0800)]
platform/x86: ideapad-laptop: Increase timeout to wait for EC answer
Lenovo E41-20 needs more time than 100ms to read VPC,
the funtion keys always failed responding.
Increase timeout to get the value from VPC, then
the funtion keys like mic mute key work well.
Jens Axboe [Thu, 15 Feb 2018 02:01:53 +0000 (19:01 -0700)]
Merge branch 'nvme-4.16-rc' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Keith:
"After syncing with Christoph and Sagi, we feel this is a good time to
send our latest fixes across most of the nvme components for 4.16"
* 'nvme-4.16-rc' of git://git.infradead.org/nvme:
nvme-rdma: fix sysfs invoked reset_ctrl error flow
nvmet: Change return code of discard command if not supported
nvme-pci: Fix timeouts in connecting state
nvme-pci: Remap CMB SQ entries on every controller reset
nvme: fix the deadlock in nvme_update_formats
nvme: Don't use a stack buffer for keep-alive command
nvme_fc: cleanup io completion
nvme_fc: correct abort race condition on resets
nvme: Fix discard buffer overrun
nvme: delete NVME_CTRL_LIVE --> NVME_CTRL_CONNECTING transition
nvme-rdma: use NVME_CTRL_CONNECTING state to mark init process
nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
Ingo Molnar [Wed, 14 Feb 2018 07:39:11 +0000 (08:39 +0100)]
x86/entry/64: Fix CR3 restore in paranoid_exit()
Josh Poimboeuf noticed the following bug:
"The paranoid exit code only restores the saved CR3 when it switches back
to the user GS. However, even in the kernel GS case, it's possible that
it needs to restore a user CR3, if for example, the paranoid exception
occurred in the syscall exit path between SWITCH_TO_USER_CR3_STACK and
SWAPGS."
Josh also confirmed via targeted testing that it's possible to hit this bug.
Fix the bug by also restoring CR3 in the paranoid_exit_no_swapgs branch.
The reason we haven't seen this bug reported by users yet is probably because
"paranoid" entry points are limited to the following cases:
Amongst those entry points only machine_check is one that will interrupt an
IRQS-off critical section asynchronously - and machine check events are rare.
The other main asynchronous entries are NMI entries, which can be very high-freq
with perf profiling, but they are special: they don't use the 'idtentry' macro but
are open coded and restore user CR3 unconditionally so don't have this bug.
x86/cpu: Change type of x86_cache_size variable to unsigned int
Currently, x86_cache_size is of type int, which makes no sense as we
will never have a valid cache size equal or less than 0. So instead of
initializing this variable to -1, it can perfectly be initialized to 0
and use it as an unsigned variable instead.
Andy Lutomirski [Wed, 31 Jan 2018 16:03:10 +0000 (08:03 -0800)]
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
flush_tlb_single() and flush_tlb_one() sound almost identical, but
they really mean "flush one user translation" and "flush one kernel
translation". Rename them to flush_tlb_one_user() and
flush_tlb_one_kernel() to make the semantics more obvious.
[ I was looking at some PTI-related code, and the flush-one-address code
is unnecessarily hard to understand because the names of the helpers are
uninformative. This came up during PTI review, but no one got around to
doing it. ]
Peter Zijlstra [Tue, 13 Feb 2018 13:28:19 +0000 (14:28 +0100)]
x86/speculation: Add <asm/msr-index.h> dependency
Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.
Will Deacon [Mon, 5 Feb 2018 14:16:06 +0000 (14:16 +0000)]
nospec: Move array_index_nospec() parameter checking into separate macro
For architectures providing their own implementation of
array_index_mask_nospec() in asm/barrier.h, attempting to use WARN_ONCE() to
complain about out-of-range parameters using WARN_ON() results in a mess
of mutually-dependent include files.
Rather than unpick the dependencies, simply have the core code in nospec.h
perform the checking for us.
Peter Zijlstra [Fri, 9 Feb 2018 12:16:59 +0000 (13:16 +0100)]
x86/debug: Use UD2 for WARN()
Since the Intel SDM added an ModR/M byte to UD0 and binutils followed
that specification, we now cannot disassemble our kernel anymore.
This now means Intel and AMD disagree on the encoding of UD0. And instead
of playing games with additional bytes that are valid ModR/M and single
byte instructions (0xd6 for instance), simply use UD2 for both WARN() and
BUG().
Josh Poimboeuf [Thu, 8 Feb 2018 23:09:26 +0000 (17:09 -0600)]
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
By default, objtool assumes that a UD2 is a dead end. This is mainly
because GCC 7+ sometimes inserts a UD2 when it detects a divide-by-zero
condition.
Now that WARN() is moving back to UD2, annotate the code after it as
reachable so objtool can follow the code flow.
Josh Poimboeuf [Thu, 8 Feb 2018 23:09:25 +0000 (17:09 -0600)]
objtool: Fix segfault in ignore_unreachable_insn()
Peter Zijlstra's patch for converting WARN() to use UD2 triggered a
bunch of false "unreachable instruction" warnings, which then triggered
a seg fault in ignore_unreachable_insn().
The seg fault happened when it tried to dereference a NULL 'insn->func'
pointer. Thanks to static_cpu_has(), some functions can jump to a
non-function area in the .altinstr_aux section. That breaks
ignore_unreachable_insn()'s assumption that it's always inside the
original function.
Make sure ignore_unreachable_insn() only follows jumps within the
current function.
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
The ldt_gdt and ptrace_syscall selftests, even in their 64-bit variant, use
hard-coded 32-bit syscall numbers and call "int $0x80".
This will fail on 64-bit systems with CONFIG_IA32_EMULATION=y disabled.
Therefore, do not build these tests if we cannot build 32-bit binaries
(which should be a good approximation for CONFIG_IA32_EMULATION=y being enabled).
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
On 64-bit builds, we should not rely on "int $0x80" working (it only does if
CONFIG_IA32_EMULATION=y is enabled). To keep the "Set TF and check int80"
test running on 64-bit installs with CONFIG_IA32_EMULATION=y enabled, build
this test only if we can also build 32-bit binaries (which should be a
good approximation for that).
Nicholas Piggin [Tue, 13 Feb 2018 07:45:11 +0000 (17:45 +1000)]
powerpc/powernv: IMC fix out of bounds memory access at shutdown
The OPAL IMC driver's shutdown handler disables nest PMU counters by
walking nodes and taking the first CPU out of their cpumask, which is
used to index into the paca (get_hard_smp_processor_id()). This does
not always do the right thing, and in particular for CPU-less nodes it
returns NR_CPUS and that overruns the paca and dereferences random
memory.
Fix it by being more careful about checking returned CPU, and only
using online CPUs. It's not clear this shutdown code makes sense after
commit 885dcd709b ("powerpc/perf: Add nest IMC PMU support"), but this
should not make things worse
Currently the bug causes us to call OPAL with a junk CPU number. A
separate patch in development to change the way pacas are allocated
escalates this bug into a crash:
Unable to handle kernel paging request for data at address 0x2a21af1eeb000076
Faulting instruction address: 0xc0000000000a5468
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP opal_imc_counters_shutdown+0x148/0x1d0
LR opal_imc_counters_shutdown+0x134/0x1d0
Call Trace:
opal_imc_counters_shutdown+0x134/0x1d0 (unreliable)
platform_drv_shutdown+0x44/0x60
device_shutdown+0x1f8/0x350
kernel_restart_prepare+0x54/0x70
kernel_restart+0x28/0xc0
SyS_reboot+0x1d0/0x2c0
system_call+0x58/0x6c
Cédric Le Goater [Tue, 13 Feb 2018 08:47:12 +0000 (09:47 +0100)]
powerpc/xive: Use hw CPU ids when configuring the CPU queues
The CPU event notification queues on sPAPR should be configured using
a hardware CPU identifier.
The problem did not show up on the Power Hypervisor because pHyp
supports 8 threads per core which keeps CPU number contiguous. This is
not the case on all sPAPR virtual machines, some use SMT=1.
Also improve error logging by adding the CPU number.
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Cc: [email protected] # v4.14+ Signed-off-by: Cédric Le Goater <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Hans de Goede [Wed, 14 Feb 2018 08:21:51 +0000 (09:21 +0100)]
drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3
So far models of the Dell Venue 8 Pro, with a panel with MIPI panel
index = 3, one of which has been kindly provided to me by Jan Brummer,
where not working with the i915 driver, giving a black screen on the
first modeset.
The problem with at least these Dells is that their VBT defines a MIPI
ASSERT sequence, but not a DEASSERT sequence. Instead they DEASSERT the
reset in their INIT_OTP sequence, but the deassert must be done before
calling intel_dsi_device_ready(), so that is too late.
Simply doing the INIT_OTP sequence earlier is not enough to fix this,
because the INIT_OTP sequence also sends various MIPI packets to the
panel, which can only happen after calling intel_dsi_device_ready().
This commit fixes this by splitting the INIT_OTP sequence into everything
before the first DSI packet and everything else, including the first DSI
packet. The first part (everything before the first DSI packet) is then
used as deassert sequence.
Changed in v2:
-Split the init OTP sequence into a deassert reset and the actual init
OTP sequence, instead of calling it earlier and then having the first
mipi_exec_send_packet() call call intel_dsi_device_ready().
Changes in v3:
-Move the whole shebang to intel_bios.c
Hans de Goede [Wed, 14 Feb 2018 08:21:49 +0000 (09:21 +0100)]
drm/i915: Add intel_bios_cleanup() function
Add an intel_bios_cleanup() function to act as counterpart of
intel_bios_init() and move the cleanup of vbt related resources there,
putting it in the same file as the allocation.
Changed in v2:
-While touching the code anyways, remove the unnecessary:
if (dev_priv->vbt.child_dev) done before kfree(dev_priv->vbt.child_dev)
Hans de Goede [Wed, 20 Dec 2017 10:50:17 +0000 (11:50 +0100)]
drm/i915/vlv: Add cdclk workaround for DSI
At least on the Chuwi Vi8 (non pro/plus) the LCD panel will show an image
shifted aprox. 20% to the left (with wraparound) and sometimes also wrong
colors, showing that the panel controller is starting with sampling the
datastream somewhere mid-line. This happens after the first blanking and
re-init of the panel.
After looking at drm.debug output I noticed that initially we inherit the
cdclk of 333333 KHz set by the GOP, but after the re-init we picked 266667
KHz, which turns out to be the cause of this problem, a quick hack to hard
code the cdclk to 333333 KHz makes the problem go away.
I've tested this on various Bay Trail devices, to make sure this not does
cause regressions on other devices and the higher cdclk does not cause
any problems on the following devices:
-GP-electronic T701 1024x600 333333 KHz cdclk after this patch
-PEAQ C1010 1920x1200 333333 KHz cdclk after this patch
-PoV mobii-wintab-800w 800x1280 333333 KHz cdclk after this patch
-Asus Transformer-T100TA 1368x768 320000 KHz cdclk after this patch
Also interesting wrt this is the comment in vlv_calc_cdclk about the
existing workaround to avoid 200 Mhz as clock because that causes issues
in some cases.
This commit extends the "do not use 200 Mhz" workaround with an extra
check to require atleast 320000 KHz (avoiding 266667 KHz) when a DSI
panel is active.
Changes in v2:
-Change the commit message and the code comment to not treat the GOP as
a reference, the GOP should not be treated as a reference
Will Deacon [Tue, 13 Feb 2018 13:14:09 +0000 (13:14 +0000)]
arm64: proc: Set PTE_NG for table entries to avoid traversing them twice
When KASAN is enabled, the swapper page table contains many identical
mappings of the zero page, which can lead to a stall during boot whilst
the G -> nG code continually walks the same page table entries looking
for global mappings.
This patch sets the nG bit (bit 11, which is IGNORED) in table entries
after processing the subtree so we can easily skip them if we see them
a second time.
Linus Torvalds [Wed, 14 Feb 2018 18:06:41 +0000 (10:06 -0800)]
Merge tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A larger batch of fixes than we'd like. Roughly 1/3 fixes for new
code, 1/3 fixes for stable and 1/3 minor things.
There's four commits fixing bugs when using 16GB huge pages on hash,
caused by some of the preparatory changes for pkeys.
Two fixes for bugs in the enhanced IRQ soft masking for local_t, one
of which broke KVM in some circumstances.
Four fixes for Power9. The most bizarre being a bug where futexes
stopped working because a NULL pointer dereference didn't trap during
early boot (it aliased the kernel mapping). A fix for memory hotplug
when using the Radix MMU, and a fix for live migration of guests using
the Radix MMU.
Two fixes for hotplug on pseries machines. One where we weren't
correctly updating NUMA info when CPUs are added and removed. And the
other fixes crashes/hangs seen when doing memory hot remove during
boot, which is apparently a thing people do.
Finally a handful of build fixes for obscure configs and other minor
fixes.
Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin
Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck,
Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de
Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff"
* tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix to use ucontext_t instead of struct ucontext
powerpc/kdump: Fix powernv build break when KEXEC_CORE=n
powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug
powerpc/mm/hash64: Zero PGD pages on allocation
powerpc/mm/hash64: Store the slot information at the right offset for hugetlb
powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled
powerpc/mm: Fix crashes with 16G huge pages
powerpc/mm: Flush radix process translations when setting MMU type
powerpc/vas: Don't set uses_vas for kernel windows
powerpc/pseries: Enable RAS hotplug events later
powerpc/mm/radix: Split linear mapping on hot-unplug
powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID
ocxl: fix signed comparison with less than zero
powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking
powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro
powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove
When reset_controller that is invoked by sysfs fails,
it enters an error flow which practically removes the
nvme ctrl entirely (similar to delete_ctrl flow). It
causes the system to hang, since a sysfs attribute cannot
be unregistered by one of its own methods.
This can be fixed by calling delete_ctrl as a work rather
than sequential code. In addition, it should give the ctrl
a chance to recover using reconnection mechanism (consistant
with FC reset_ctrl error flow). Also, while we're here, return
suitable errno in case the reset ended with non live ctrl.
virtio/s390: implement PM operations for virtio_ccw
Suspend/Resume to/from disk currently fails. Let us wire
up the necessary callbacks. This is mostly just forwarding
the requests to the virtio drivers. The only thing that
has to be done in virtio_ccw itself is to re-set the
virtio revision.
These laptops have a combined jack to attach headsets, the U727 on
the left, the U757 on the right, but a headsets microphone doesn't
work. Using hdajacksensetest I found that pin 0x19 changed the
present state when plugging the headset, in addition to 0x21, but
didn't have the correct configuration (shown as "Not connected").
So this sets the configuration to the same values as the headphone
pin 0x21 except for the device type microphone, which makes it
work correctly. With the patch the configured pins for U727 are
Pin 0x12 (Internal Mic, Mobile-In): present = No
Pin 0x14 (Internal Speaker): present = No
Pin 0x19 (Black Mic, Left side): present = No
Pin 0x1d (Internal Aux): present = No
Pin 0x21 (Black Headphone, Left side): present = No
This commit was initially intended to fix problems with hs200 and hs400
on some boards, mainly the odroid-c2. The OC2 (Rev 0.2) I have performs
well in this modes, so I could not confirm these issues.
We've had several reports about the issues being still present on (some)
OC2, so apparently, this change does not do what it was supposed to do.
Maybe the eMMC signal quality is on the edge on the board. This may
explain the variability we see in term of stability, but this is just a
guess. Lowering the max_frequency to 100Mhz seems to do trick for those
affected by the issue
Worse, the commit created new issues (CRC errors and hangs) on other
boards, such as the kvim 1 and 2, the p200 or the libretech-cc.
According to amlogic, the Tx phase should not be tuned and left in its
default configuration, so it is best to just revert the commit.
Fixes: 0a44697627d1 ("mmc: meson-gx: include tx phase in the tuning process") Cc: <[email protected]> # 4.14+ Signed-off-by: Jerome Brunet <[email protected]> Signed-off-by: Ulf Hansson <[email protected]>
Takashi Iwai [Mon, 12 Feb 2018 14:20:51 +0000 (15:20 +0100)]
ALSA: seq: Fix racy pool initializations
ALSA sequencer core initializes the event pool on demand by invoking
snd_seq_pool_init() when the first write happens and the pool is
empty. Meanwhile user can reset the pool size manually via ioctl
concurrently, and this may lead to UAF or out-of-bound accesses since
the function tries to vmalloc / vfree the buffer.
A simple fix is to just wrap the snd_seq_pool_init() call with the
recently introduced client->ioctl_mutex; as the calls for
snd_seq_pool_init() from other side are always protected with this
mutex, we can avoid the race.
Weinan Li [Fri, 9 Feb 2018 08:01:34 +0000 (16:01 +0800)]
drm/i915/gvt: add 0xe4f0 into gen9 render list
Guest may set this register on KBL platform, it can impact hardware
behavior, so add it into the gen9 render list. Otherwise gpu hang issue may
happen during different vgpu switch.
Tvrtko Ursulin [Tue, 13 Feb 2018 09:57:46 +0000 (09:57 +0000)]
drm/i915/pmu: Fix sleep under atomic in RC6 readout
We are not allowed to call intel_runtime_pm_get from the PMU counter read
callback since the former can sleep, and the latter is running under IRQ
context.
To workaround this, we record the last known RC6 and while runtime
suspended estimate its increase by querying the runtime PM core
timestamps.
Downside of this approach is that we can temporarily lose a chunk of RC6
time, from the last PMU read-out to runtime suspend entry, but that will
eventually catch up, once device comes back online and in the presence of
PMU queries.
Also, we have to be careful not to overshoot the RC6 estimate, so once
resumed after a period of approximation, we only update the counter once
it catches up. With the observation that RC6 is increasing while the
device is suspended, this should not pose a problem and can only cause
slight inaccuracies due clock base differences.
v2: Simplify by estimating on top of PM core counters. (Imre)
Tvrtko Ursulin [Tue, 13 Feb 2018 09:57:45 +0000 (09:57 +0000)]
drm/i915/pmu: Fix PMU enable vs execlists tasklet race
Commit 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking
inside for busy-stats") added a tasklet_disable call in busy stats
enabling, but we failed to understand that the PMU enable callback runs
as an hard IRQ (IPI).
Consequence of this is that the PMU enable callback can interrupt the
execlists tasklet, and will then deadlock when it calls
intel_engine_stats_enable->tasklet_disable.
To fix this, I realized it is possible to move the engine stats enablement
and disablement to PMU event init and destroy hooks. This allows for much
simpler implementation since those hooks run in normal context (can
sleep).
Chris Wilson [Tue, 13 Feb 2018 09:57:44 +0000 (09:57 +0000)]
drm/i915: Lock out execlist tasklet while peeking inside for busy-stats
In order to prevent a race condition where we may end up overaccounting
the active state and leaving the busy-stats believing the GPU is 100%
busy, lock out the tasklet while we reconstruct the busy state. There is
no direct spinlock guard for the execlists->port[], so we need to
utilise tasklet_disable() as a synchronous barrier to prevent it, the
only writer to execlists->port[], from running at the same time as the
enable.
When a request is preempted, it is unsubmitted from the HW queue and
removed from the active list of breadcrumbs. In the process, this
however triggers the signaler and it may see the clear rbtree with the
old, and still valid, seqno, or it may match the cleared seqno with the
now zero rq->global_seqno. This confuses the signaler into action and
signaling the fence.
Keith Busch [Thu, 8 Feb 2018 15:55:34 +0000 (08:55 -0700)]
nvme-pci: Fix timeouts in connecting state
We need to halt the controller immediately if we haven't completed
initialization as indicated by the new "connecting" state.
Fixes: ad70062cdb ("nvme-pci: introduce RECONNECTING state to mark initializing procedure") Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
Keith Busch [Tue, 13 Feb 2018 12:44:44 +0000 (05:44 -0700)]
nvme-pci: Remap CMB SQ entries on every controller reset
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
Jianchao Wang [Mon, 12 Feb 2018 12:54:45 +0000 (20:54 +0800)]
nvme: fix the deadlock in nvme_update_formats
nvme_update_formats will invoke nvme_ns_remove under namespaces_mutext.
The will cause deadlock because nvme_ns_remove will also require
the namespaces_mutext. Fix it by getting the ns entries which should
be removed under namespaces_mutext and invoke nvme_ns_remove out of
namespaces_mutext.
It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:
In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense. After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.
In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately. Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.
Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.
Linus Torvalds [Tue, 13 Feb 2018 17:35:17 +0000 (09:35 -0800)]
Merge tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fix from James Hogan:
"A single change (and associated DT binding update) to allow the
address of the MIPS Cluster Power Controller (CPC) to be chosen by DT,
which allows SMP to work on generic MIPS kernels where the bootloader
hasn't configured the CPC address (i.e. the new Ranchu platform)"
* tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: CPC: Map registers using DT in mips_cpc_default_phys_base()
dt-bindings: Document mti,mips-cpc binding
We have expected busses to set up a coherent mask to properly use the
common dma mapping code for a long time, and now that I've added a warning
macio turned out to not set one up yet. This sets it to the same value
as the dma_mask, which seems to be what the drivers expect.
Nitesh Shetty [Tue, 13 Feb 2018 15:48:12 +0000 (21:18 +0530)]
blk: optimization for classic polling
This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.
ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.
But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.
Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(
There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.
Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
1) there is a real error
2) memory_failure() succeeds.
All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.
Will Deacon [Tue, 13 Feb 2018 13:30:19 +0000 (13:30 +0000)]
locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()
A test_and_{}_bit() operation fails if the value of the bit is such that
the modification does not take place. For example, if test_and_set_bit()
returns 1. In these cases, follow the behaviour of cmpxchg and allow the
operation to be unordered. This also applies to test_and_set_bit_lock()
if the lock is found to be be taken already.
Will Deacon [Tue, 13 Feb 2018 13:22:57 +0000 (13:22 +0000)]
locking/qspinlock: Ensure node->count is updated before initialising node
When queuing on the qspinlock, the count field for the current CPU's head
node is incremented. This needn't be atomic because locking in e.g. IRQ
context is balanced and so an IRQ will return with node->count as it
found it.
However, the compiler could in theory reorder the initialisation of
node[idx] before the increment of the head node->count, causing an
IRQ to overwrite the initialised node and potentially corrupt the lock
state.
Avoid the potential for this harmful compiler reordering by placing a
barrier() between the increment of the head node->count and the subsequent
node initialisation.
Will Deacon [Tue, 13 Feb 2018 13:22:56 +0000 (13:22 +0000)]
locking/qspinlock: Ensure node is initialised before updating prev->next
When a locker ends up queuing on the qspinlock locking slowpath, we
initialise the relevant mcs node and publish it indirectly by updating
the tail portion of the lock word using xchg_tail. If we find that there
was a pre-existing locker in the queue, we subsequently update their
->next field to point at our node so that we are notified when it's our
turn to take the lock.
This can be roughly illustrated as follows:
/* Initialise the fields in node and encode a pointer to node in tail */
tail = initialise_node(node);
/*
* Exchange tail into the lockword using an atomic read-modify-write
* operation with release semantics
*/
old = xchg_tail(lock, tail);
/* If there was a pre-existing waiter ... */
if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
smp_read_barrier_depends();
/* ... then update their ->next field to point to node.
WRITE_ONCE(prev->next, node);
}
The conditional update of prev->next therefore relies on the address
dependency from the result of xchg_tail ensuring order against the
prior initialisation of node. However, since the release semantics of
the xchg_tail operation apply only to the write portion of the RmW,
then this ordering is not guaranteed and it is possible for the CPU
to return old before the writes to node have been published, consequently
allowing us to point prev->next to an uninitialised node.
This patch fixes the problem by making the update of prev->next a RELEASE
operation, which also removes the reliance on dependency ordering.