David S. Miller [Mon, 27 Feb 2017 14:22:10 +0000 (09:22 -0500)]
Merge branch 'qed-fixes'
Yuval Mintz says:
====================
qed: Bug fixes
Patch #1 addresses a day-one race which is dependent on the number of Vfs
[I.e., more child VFs from a single PF make it more probable].
Patch #2 corrects a race that got introduced in the last set of fixes for
qed, one that would happen each time PF transitions to UP state.
I've built & tested those against current net-next.
Please consider applying the series there.
====================
Mintz, Yuval [Mon, 27 Feb 2017 09:06:33 +0000 (11:06 +0200)]
qed: Don't use attention PTT for configuring BW
Commit 653d2ffd6405 ("qed*: Fix link indication race") introduced another
race - one of the inner functions called from the link-change flow is
explicitly using the slowpath context dedicated PTT instead of gaining
that PTT from the caller. Since this flow can now be called from
a different context as well, we're in risk of the PTT breaking.
Fixes: 653d2ffd6405 ("qed*: Fix link indication race") Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Mintz, Yuval [Mon, 27 Feb 2017 09:06:32 +0000 (11:06 +0200)]
qed: Fix race with multiple VFs
A PF syncronizes all IOV activity relating to its VFs
by using a single workqueue handling the work.
The workqueue would reach a bitmask of pending VF events
and act upon each in turn.
Problem is that the indication of a VF message [which sets
the 'vf event' bit for that VF] arrives and is set in
the slowpath attention context, which isn't syncronized with
the processing of the events.
When multiple VFs are present, it's possible that PF would
lose the indication of one of the VF's pending evens, leading
that VF to later timeout.
Instead of adding locks/barriers, simply move from a bitmask
into a per-VF indication inside that VF entry in the PF database.
The following patchset contains netfilter fixes for you net tree,
they are:
1) Missing ct zone size in the nft_ct initialization path, patch
from Florian Westphal.
2) Two patches for netfilter uapi headers, one to remove unnecessary
sysctl.h inclusion and another to fix compilation of xt_hashlimit.h
in userspace, from Dmitry V. Levin.
3) Patch to fix a sloppy change in nf_ct_expect that incorrectly
simplified nf_ct_expect_related_report() in the previous nf-next
batch. This also includes another patch for __nf_ct_expect_check()
to report success by returning 0 to keep it consistent with other
existing functions. From Jarno Rajahalme.
4) The ->walk() iterator of the new bitmap set type goes over the real
bitmap size, this results in incorrect dumps when NFTA_SET_USERDATA
is used.
====================
Marek Szyprowski [Tue, 21 Feb 2017 13:21:01 +0000 (14:21 +0100)]
dma-buf: add support for compat ioctl
Add compat ioctl support to dma-buf. This lets one to use DMA_BUF_IOCTL_SYNC
ioctl from 32bit application on 64bit kernel. Data structures for both 32
and 64bit modes are same, so there is no need for additional translation
layer.
Paul Hüber [Sun, 26 Feb 2017 16:58:19 +0000 (17:58 +0100)]
l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
The return value is passed up to ip_local_deliver_finish, which treats
negative values as an IP protocol number for resubmission.
Julian Anastasov [Sat, 25 Feb 2017 15:57:43 +0000 (17:57 +0200)]
xfrm: provide correct dst in xfrm_neigh_lookup
Fix xfrm_neigh_lookup to provide dst->path to the
neigh_lookup dst_ops method.
When skb is provided, the IP address in packet should already
match the dst->path address family. But for the non-skb case,
we should consider the last tunnel address as nexthop address.
Fixes: f894cbf847c9 ("net: Add optional SKB arg to dst_ops->neigh_lookup().") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Herbert Xu [Sat, 25 Feb 2017 14:39:50 +0000 (22:39 +0800)]
rhashtable: Fix RCU dereference annotation in rht_bucket_nested
The current annotation is wrong as it says that we're only called
under spinlock. In fact it should be marked as under either
spinlock or RCU read lock.
Herbert Xu [Sat, 25 Feb 2017 14:38:11 +0000 (22:38 +0800)]
rhashtable: Fix use before NULL check in bucket_table_free
Dan Carpenter reported a use before NULL check bug in the function
bucket_table_free. In fact we don't need the NULL check at all as
no caller can provide a NULL argument. So this patch fixes this by
simply removing it.
Roman Mashak [Fri, 24 Feb 2017 22:36:58 +0000 (17:36 -0500)]
net sched actions: do not overwrite status of action creation.
nla_memdup_cookie was overwriting err value, declared at function
scope and earlier initialized with result of ->init(). At success
nla_memdup_cookie() returns 0, and thus module refcnt decremented,
although the action was installed.
$ sudo tc actions add action pass index 1 cookie 1234
$ sudo tc actions ls action gact
$ sudo modprobe act_gact
$ lsmod
Module Size Used by
act_gact 16384 0
...
$ sudo tc actions add action pass index 1 cookie 1234
$ sudo tc actions ls action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0
$
$ lsmod
Module Size Used by
act_gact 16384 1
...
$ sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
$
$ sudo /home/mrv/bin/tc actions del action gact index 1
$ sudo rmmod act_gact
$ lsmod
Module Size Used by
$
Fixes: 1045ba77a ("net sched actions: Add support for user cookies") Signed-off-by: Roman Mashak <[email protected]> Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Howells [Fri, 24 Feb 2017 21:57:13 +0000 (21:57 +0000)]
rxrpc: Kernel calls get stuck in recvmsg
Calls made through the in-kernel interface can end up getting stuck because
of a missed variable update in a loop in rxrpc_recvmsg_data(). The problem
is like this:
(1) A new packet comes in and doesn't cause a notification to be given to
the client as there's still another packet in the ring - the
assumption being that if the client will keep drawing off data until
the ring is empty.
(2) The client is in rxrpc_recvmsg_data(), inside the big while loop that
iterates through the packets. This copies the window pointers into
variables rather than using the information in the call struct
because:
(a) MSG_PEEK might be in effect;
(b) we need a barrier after reading call->rx_top to pair with the
barrier in the softirq routine that loads the buffer.
(3) The reading of call->rx_top is done outside of the loop, and top is
never updated whilst we're in the loop. This means that even through
there's a new packet available, we don't see it and may return -EFAULT
to the caller - who will happily return to the scheduler and await the
next notification.
(4) No further notifications are forthcoming until there's an abort as the
ring isn't empty.
The fix is to move the read of call->rx_top inside the loop - but it needs
to be done before the condition is checked.
Roman Mashak [Fri, 24 Feb 2017 16:00:32 +0000 (11:00 -0500)]
net sched actions: decrement module reference count after table flush.
When tc actions are loaded as a module and no actions have been installed,
flushing them would result in actions removed from the memory, but modules
reference count not being decremented, so that the modules would not be
unloaded.
Following is example with GACT action:
% sudo modprobe act_gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 1
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....
After the fix:
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module Size Used by
act_gact 16384 3
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
% sudo rmmod act_gact
% lsmod
Module Size Used by
%
Xin Long [Fri, 24 Feb 2017 08:29:06 +0000 (16:29 +0800)]
ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt
Commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups") fixed
the issue for ipv4 ipmr:
ip_mroute_setsockopt() & ip_mroute_getsockopt() should not
access/set raw_sk(sk)->ipmr_table before making sure the socket
is a raw socket, and protocol is IGMP
The same fix should be done for ipv6 ipmr as well.
This patch can fix the panic caused by overwriting the same offset
as ipmr_table as in raw_sk(sk) when accessing other type's socket
by ip_mroute_setsockopt().
Xin Long [Fri, 24 Feb 2017 07:18:46 +0000 (15:18 +0800)]
sctp: set sin_port for addr param when checking duplicate address
Commit b8607805dd15 ("sctp: not copying duplicate addrs to the assoc's
bind address list") tried to check for duplicate address before copying
to asoc's bind_addr list from global addr list.
But all the addrs' sin_ports in global addr list are 0 while the addrs'
sin_ports are bp->port in asoc's bind_addr list. It means even if it's
a duplicate address, af->cmp_addr will still return 0 as the their
sin_ports are different.
This patch is to fix it by setting the sin_port for addr param with
bp->port before comparing the addrs.
Fixes: b8607805dd15 ("sctp: not copying duplicate addrs to the assoc's bind address list") Reported-by: Wei Chen <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Thomas Hellstrom [Tue, 21 Feb 2017 10:42:27 +0000 (17:42 +0700)]
drm/vmwgfx: Work around drm removal of control nodes
vmware tools has a daemon that gets layout information from the GUI and
forwards it to DRM so that the modesetting code can set preferred connector
locations and modes. This daemon was using control nodes but since control
nodes were just removed, make it possible for the daemon to use render- or
primary nodes instead. This is a bit ugly but will allow drm to proceed with
removal of the mostly unused control-node code and allow vmware to proceed
with fixing up automatic layout settings for gnome-shell/wayland.
We bump minor to inform user-space about the api change.
- Use watchdog core to install restart handler: tangox, dw_wdt,
bcm2835_wdt, asm9260_wdt, bcm47xx_wdt
- Convert ts72xx_wdt driver to watchdog core
- Let core handle heartbeat in ep93xx_wdt driver
- Enable COMPILE_TEST where possible
- Various other improvements"
* tag 'watchdog-for-linus-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (54 commits)
watchdog: s3c2410: Add prefix to local function
watchdog: s3c2410: Select MFD_SYSCON on all Exynos platforms
watchdog: s3c2410: Use dev_dbg instead of pr_info
watchdog: s3c2410: Fix infinite interrupt in soft mode
watchdog: s3c2410: Remove confusing CONFIG prefix from local defines
watchdog: softdog: make pretimeout support a compile option
watchdog: zx2967: add watchdog controller driver for ZTE's zx2967 family
dt: bindings: add documentation for zx2967 family watchdog controller
watchdog: sama5d4: Implement resume hook
watchdog: sama5d4: Cache MR instead of a partial config
watchdog: ts72xx_wdt: convert driver to watchdog core
watchdog: ep93xx_wdt: cleanup and let the core handle the heartbeat
watchdog: RDC321X_WDT always depends on PCI
watchdog: add driver for Cortina Gemini watchdog
watchdog: add DT bindings for Cortina Gemini
watchdog: constify watchdog_ops structures
watchdog: Introduce watchdog_stop_on_unregister helper
watchdog: ebc-c384_wdt: Utilize devm_ functions in driver probe callback
watchdog: tegra_wdt: Convert to use device managed functions
watchdog: da9063_wdt: Convert to use device managed functions
...
Eric Dumazet [Thu, 23 Feb 2017 23:22:43 +0000 (15:22 -0800)]
net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
The cited commit makes a great job of finding optimal shift/multiplier
values assuming a 10 seconds wrap around, but forgot to change the
overflow_period computation.
It overflows in cyclecounter_cyc2ns(), and the final result is 804 ms,
which is silly.
Lets simply use 5 seconds, no need to recompute this, given how it is
supposed to work.
Later, we will use a timer instead of a work queue, since the new RX
allocation schem will no longer need mlx4_en_recover_from_oom() and the
service_task firing every 250 ms.
Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
returns void") inadvertently changed the successful return value of
nf_ct_expect_related_report() from 0 to 1 due to
__nf_ct_expect_check() returning 1 on success. Prevent this
regression in the future by changing the return value of
__nf_ct_expect_check() to 0 on success.
Julian Anastasov [Sun, 26 Feb 2017 13:50:52 +0000 (15:50 +0200)]
ipv4: add missing initialization for flowi4_uid
Avoid matching of random stack value for uid when rules
are looked up on input route or when RP filter is used.
Problem should affect only setups that use ip rules with
uid range.
Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Sat, 25 Feb 2017 23:32:53 +0000 (15:32 -0800)]
Merge tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest update from Shuah Khan:
"This update consists of:
- fixes to several existing tests from Stafford Horne
- cpufreq tests from Viresh Kumar
- Selftest build and install fixes from Bamvor Jian Zhang and Michael
Ellerman
- Fixes to protection-keys tests from Dave Hansen
- Warning fixes from Shuah Khan"
* tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (28 commits)
selftests/powerpc: Fix remaining fallout from recent changes
selftests/powerpc: Fix the clean rule since recent changes
selftests: Fix the .S and .S -> .o rules
selftests: Fix the .c linking rule
selftests: Fix selftests build to just build, not run tests
selftests, x86, protection_keys: fix wrong offset in siginfo
selftests, x86, protection_keys: fix uninitialized variable warning
selftest: cpufreq: Update MAINTAINERS file
selftest: cpufreq: Add special tests
selftest: cpufreq: Add support to test cpufreq modules
selftest: cpufreq: Add suspend/resume/hibernate support
selftest: cpufreq: Add support for cpufreq tests
selftests: Add intel_pstate to TARGETS
selftests/intel_pstate: Update makefile to match new style
selftests/intel_pstate: Fix warning on loop index overflow
cpupower: Restore format of frequency-info limit
selftests/futex: Add headers to makefile dependencies
selftests/futex: Add stdio used for logging
selftests: x86 protection_keys remove dead code
selftests: x86 protection_keys fix unused variable compile warnings
...
- fix buffer size mis-match between kernel space and user space
New configuration button:
- support readahead_readcnt parameter"
* tag 'for-linus-4.11-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: fix buffer size mis-match between kernel space and user space.
orangefs: Dan Carpenter influenced cleanups...
orangefs: Remove orangefs_backing_dev_info
orangefs: Support readahead_readcnt parameter.
orangefs: silence harmless integer overflow warning
Linus Torvalds [Sat, 25 Feb 2017 22:53:58 +0000 (14:53 -0800)]
Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"This has a series of fixes and cleanups that Dave Sterba has been
collecting.
There is a pretty big variety here, cleaning up internal APIs and
fixing corner cases"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (124 commits)
Btrfs: use the correct type when creating cow dio extent
Btrfs: fix deadlock between dedup on same file and starting writeback
btrfs: use btrfs_debug instead of pr_debug in transaction abort
btrfs: btrfs_truncate_free_space_cache always allocates path
btrfs: free-space-cache, clean up unnecessary root arguments
btrfs: convert btrfs_inc_block_group_ro to accept fs_info
btrfs: flush_space always takes fs_info->fs_root
btrfs: pass fs_info to (more) routines that are only called with extent_root
btrfs: qgroup: Move half of the qgroup accounting time out of commit trans
btrfs: remove unused parameter from adjust_slots_upwards
btrfs: remove unused parameters from __btrfs_write_out_cache
btrfs: remove unused parameter from cleanup_write_cache_enospc
btrfs: remove unused parameter from __add_inode_ref
btrfs: remove unused parameter from clone_copy_inline_extent
btrfs: remove unused parameters from btrfs_cmp_data
btrfs: remove unused parameter from __add_inline_refs
btrfs: remove unused parameters from scrub_setup_wr_ctx
btrfs: remove unused parameter from create_snapshot
btrfs: remove unused parameter from init_first_rw_device
btrfs: remove unused parameter from __btrfs_alloc_chunk
...
Linus Torvalds [Sat, 25 Feb 2017 22:35:37 +0000 (14:35 -0800)]
Merge tag 'platform-drivers-x86-v4.11-1' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver updates from Darren Hart:
"Big picture:
- New intel_turbo_max_3 driver, providing max core frequency
information to the scheduler. Intel PMC APL support, s0ix read API,
and fixes.
- New Silead touchscreen platform touchscreen descriptions.
Additional hotkey support for the intel-hid driver.
- New model support for dell-laptop and hp_accel.
- Several cleanups, especially to the fujitsu-laptop and
intel_mid_powerbtn drivers.
Detail summary:
platorm/x86:
- silead depends on I2C being built-in
- add support for devices with Silead touchscreens
- Support Turbo Boost Max 3.0 for non HWP systems
intel_turbo_max_3:
- make it explicitly non-modular
dell-laptop:
- Add Latitude 7480 and others to the DMI whitelist
intel-hid:
- Support 5 button array
thinkpad_acpi:
- Call led_classdev_notify_brightness_hw_changed on kbd brightness change
- Use brightness_set_blocking callback for LEDs
- Stop setting led_classdev brightness directly
acer-wmi:
- add another KEY_WLAN keycode
- Inform firmware that RF Button Driver is active
- setup accelerometer when machine has appropriate notify event
asus-wireless:
- Fix indentation
- Use per-HID HSWC parameters
intel_pmc_ipc:
- Add APL PMC PCI Id
- read s0ix residency API
- Remove unused iTCO_version variable
alienware-wmi:
- Remove header duplicate
intel_pmc_core:
- fix out-of-bounds accesses on stack
intel_mid_powerbtn:
- Use SCU IPC directly
- Unify IRQ acknowledgment
- Move comment to where it belongs
- Unify PBSTATUS access
- Remove snail address
- Sort headers alphabetically
- Join string literals
- Enable driver for Merrifield
- Acknowledge interrupts
- Factor out mfld_ack()
- Introduce driver data
- Substitute mfld by mid
- Convert to use devm_*()
fujitsu-laptop:
- make hotkey handling functions more similar
- break up complex loop condition
- move keycode processing to separate functions
- decrease indentation in acpi_fujitsu_hotkey_notify()
- simplify logolamp_get()
- rework logolamp_set() to properly handle errors
- set default trigger for radio LED to rfkill-any
Linus Torvalds [Sat, 25 Feb 2017 22:28:06 +0000 (14:28 -0800)]
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The usual collection of new drivers, non-critical fixes, and updates
to existing clk drivers. The bulk of the work is on Allwinner and
Rockchip SoCs, but there's also an Intel Atom driver in here too.
Updates:
- Migrate ABx500 to OF
- Qualcomm IPQ4019 CPU clks and general PLL support
- Qualcomm MSM8974 RPM
- Rockchip non-critical fixes and clk id additions
- Samsung Exynos4412 CPUs
- Socionext UniPhier NAND and eMMC support
- ZTE zx296718 i2s and other audio clks
- Renesas CAN and MSIOF clks for R-Car M3-W
- Renesas resets for R-Car Gen2 and Gen3 and RZ/G1
- TI CDCE913, CDCE937, and CDCE949 clk generators
- Marvell Armada ap806 CPU frequencies
- STM32F4* I2S/SAI support
- Broadcom BCM2835 DSI support
- Allwinner sun5i and A80 conversion to new style clk bindings"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (130 commits)
clk: renesas: mstp: ensure register writes complete
clk: qcom: Do not drop device node twice
clk: mvebu: adjust clock handling for the CP110 system controller
clk: mvebu: Expand mv98dx3236-core-clock support
clk: zte: add i2s clocks for zx296718
clk: sunxi-ng: sun9i-a80: Fix wrong pointer passed to PTR_ERR()
clk: sunxi-ng: select SUNXI_CCU_MULT for sun5i
clk: sunxi-ng: Check kzalloc() for errors and cleanup error path
clk: tegra: Add BPMP clock driver
clk: uniphier: add eMMC clock for LD11 and LD20 SoCs
clk: uniphier: add NAND clock for all UniPhier SoCs
ARM: dts: sun9i: Switch to new clock bindings
clk: sunxi-ng: Add A80 Display Engine CCU
clk: sunxi-ng: Add A80 USB CCU
clk: sunxi-ng: Add A80 CCU
clk: sunxi-ng: Support separately grouped PLL lock status register
clk: sunxi-ng: mux: Get closest parent rate possible with CLK_SET_RATE_PARENT
clk: sunxi-ng: mux: honor CLK_SET_RATE_NO_REPARENT flag
clk: sunxi-ng: mux: Fix determine_rate for mux clocks with pre-dividers
clk: qcom: SDHCI enablement on Nexus 5X / 6P
...
Linus Torvalds [Sat, 25 Feb 2017 22:21:18 +0000 (14:21 -0800)]
Merge branch 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"I2C has for you two new drivers (Tegra BPMP and STM32F4), interrupt
support for pca954x muxes, and a bunch of driver bugfixes and
improvements. Nothing really special this cycle.
A few commits have been added to my tree just recently. Those are the
Tegra BPMP driver and a few straightforward bugfixes or cleanups which
I prefer to have upstream rather soonish. The rest had proper
linux-next exposure"
* 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (25 commits)
i2c: thunderx: Replace pci_enable_msix()
i2c: exynos5: fix arbitration lost handling
i2c: exynos5: disable fifo-almost-empty irq signal when necessary
i2c: at91: ensure state is restored after suspending
i2c: bcm2835: Avoid possible NULL ptr dereference
i2c: Add Tegra BPMP I2C proxy driver
dt-bindings: Add Tegra186 BPMP I2C binding
misc: eeprom: at24: use device_property_*() functions instead of of_get_property()
i2c: mux: pca954x: Add interrupt controller support
dt: bindings: i2c-mux-pca954x: Add documentation for interrupt controller
i2c: mux: pca954x: Add missing pca9542 definition to chip_desc
i2c: riic: correctly finish transfers
i2c: i801: Add support for Intel Gemini Lake
i2c: mux: pca9541: Export OF device ID table as module aliases
i2c: mux: pca954x: Export OF device ID table as module aliases
i2c: mux: mlxcpld: remove unused including <linux/version.h>
i2c: busses: constify i2c_algorithm structures
i2c: i2c-mux-gpio: rename i2c-gpio-mux to i2c-mux-gpio
i2c: sh_mobile: document support for r8a7796 (R-Car M3-W)
i2c: i2c-cros-ec-tunnel: Reduce logging noise
...
Linus Torvalds [Sat, 25 Feb 2017 21:45:43 +0000 (13:45 -0800)]
Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes it
was possible to remove the IB DMA mapping code entirely and switch the
RDMA stack to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
Linus Torvalds [Sat, 25 Feb 2017 18:29:09 +0000 (10:29 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
- almost all of the rest of MM
- misc bits
- KASAN updates
- procfs
- lib/ updates
- checkpatch updates
* emailed patches from Andrew Morton <[email protected]>: (124 commits)
checkpatch: remove false unbalanced braces warning
checkpatch: notice unbalanced else braces in a patch
checkpatch: add another old address for the FSF
checkpatch: update $logFunctions
checkpatch: warn on logging continuations
checkpatch: warn on embedded function names
lib/lz4: remove back-compat wrappers
fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version
crypto: change LZ4 modules to work with new LZ4 module version
lib/decompress_unlz4: change module to work with new LZ4 module version
lib: update LZ4 compressor module
lib/test_sort.c: make it explicitly non-modular
lib: add CONFIG_TEST_SORT to enable self-test of sort()
rbtree: use designated initializers
linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors
lib/find_bit.c: micro-optimise find_next_*_bit
lib: add module support to atomic64 tests
lib: add module support to glob tests
lib: add module support to crc32 tests
kernel/ksysfs.c: add __ro_after_init to bin_attribute structure
...
Jarno Rajahalme [Fri, 24 Feb 2017 01:08:53 +0000 (17:08 -0800)]
netfilter: nf_ct_expect: nf_ct_expect_related_report(): Return zero on success.
Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
returns void") inadvertently changed the successful return value of
nf_ct_expect_related_report() from 0 to 1, which caused openvswitch
conntrack integration fail in FTP test cases.
Fix this by always returning zero on the success code path.
Include <linux/limits.h> like some of uapi/linux/netfilter/xt_*.h
headers do to fix the following linux/netfilter/xt_hashlimit.h
userspace compilation error:
/usr/include/linux/netfilter/xt_hashlimit.h:90:12: error: 'NAME_MAX' undeclared here (not in a function)
char name[NAME_MAX];
Josh Poimboeuf [Sat, 25 Feb 2017 04:31:02 +0000 (22:31 -0600)]
objtool: Prevent GCC from merging annotate_unreachable()
0-day bot reported some new objtool warnings which were caused by the
new annotate_unreachable() macro:
fs/afs/flock.o: warning: objtool: afs_do_unlk()+0x0: duplicate frame pointer save
fs/afs/flock.o: warning: objtool: afs_do_unlk()+0x0: frame pointer state mismatch
fs/btrfs/delayed-inode.o: warning: objtool: btrfs_delete_delayed_dir_index()+0x0: duplicate frame pointer save
fs/btrfs/delayed-inode.o: warning: objtool: btrfs_delete_delayed_dir_index()+0x0: frame pointer state mismatch
fs/dlm/lock.o: warning: objtool: _grant_lock()+0x0: duplicate frame pointer save
fs/dlm/lock.o: warning: objtool: _grant_lock()+0x0: frame pointer state mismatch
fs/ocfs2/alloc.o: warning: objtool: ocfs2_mv_path()+0x0: duplicate frame pointer save
fs/ocfs2/alloc.o: warning: objtool: ocfs2_mv_path()+0x0: frame pointer state mismatch
It turns out that, for older versions of GCC, if a function has multiple
BUG() incantations, GCC will sometimes merge the corresponding
annotate_unreachable() inline asm statements into a single block. That
has the undesirable effect of removing one of the entries in the
__unreachable section, confusing objtool greatly.
A workaround for this issue is to ensure that each instance of the
inline asm statement uses a different label, so that GCC sees the
statements are unique and leaves them alone. The inline asm ‘%=’ token
could be used for that, but unfortunately older versions of GCC don't
support it. So I implemented a poor man's version of it with the
__LINE__ macro.
Alex Hung [Tue, 14 Feb 2017 07:20:34 +0000 (15:20 +0800)]
platform/x86: intel-hid: Support 5 button array
New firmwares include a feature called 5 button array that supports
super key, volume up/down, rotation lock and power button. Support
for this feature is required to fix power button on some recent
systems.
Hans de Goede [Thu, 9 Feb 2017 15:44:13 +0000 (16:44 +0100)]
platform/x86: thinkpad_acpi: Call led_classdev_notify_brightness_hw_changed on kbd brightness change
Make thinkpad_acpi call led_classdev_notify_brightness_hw_changed on the
kbd_led led_classdev registered by thinkpad_acpi when the kbd backlight
brightness is changed through the hotkey.
This will allow userspace to monitor (poll) for brightness changes on
these LEDs caused by the hotkey.
Hans de Goede [Thu, 9 Feb 2017 15:44:12 +0000 (16:44 +0100)]
platform/x86: thinkpad_acpi: Use brightness_set_blocking callback for LEDs
Now a days the LED core can take care of executing brightness_set from
a workqueue if it needs to sleep, make use of this and remove a bunch
of DIY code for this.
Since this commit removes the workqueue usage for LEDs, the
led_sysfs_blink_set callback may now also sleep, this is fine.
There is no need to set the led_classdev's brightness value from
its set_brightness callback, this is taken care of by the led-core and
thinkpad_acpi really should not be mucking with it.
Note that kbdlight_set_level_and_update() is still used by the old
thinpad_acpi specific sysfs interface for the led, so we cannot
remove it.
Hans de Goede [Sun, 29 Jan 2017 13:42:52 +0000 (14:42 +0100)]
leds: class: Add new optional brightness_hw_changed attribute
Some LEDs may have their brightness level changed autonomously
(outside of kernel control) by hardware / firmware. This commit
adds support for an optional brightness_hw_changed attribute to
signal such changes to userspace (if a driver can detect them):
What: /sys/class/leds/<led>/brightness_hw_changed
Date: January 2017
KernelVersion: 4.11
Description:
Last hardware set brightness level for this LED. Some LEDs
may be changed autonomously by hardware/firmware. Only LEDs
where this happens and the driver can detect this, will
have this file.
This file supports poll() to detect when the hardware
changes the brightness.
Reading this file will return the last brightness level set
by the hardware, this may be different from the current
brightness.
Drivers which want to support this, simply add LED_BRIGHT_HW_CHANGED to
their flags field and call led_classdev_notify_brightness_hw_changed()
with the hardware set brightness when they detect a hardware / firmware
triggered brightness change.
Chris Chiu [Wed, 8 Feb 2017 13:51:41 +0000 (07:51 -0600)]
platform/x86: acer-wmi: add another KEY_WLAN keycode
Now that we have informed the firmware that the RF Button driver is
active, laptops such as the Acer TravelMate P238-M will generate
a WMI key event with code 0x86 when the Fn+F3 airplane mode key is
pressed.
Add this keycode to the table so that it is converted to an appropriate
input event.
Chris Chiu [Wed, 8 Feb 2017 13:51:40 +0000 (07:51 -0600)]
platform/x86: acer-wmi: Inform firmware that RF Button Driver is active
The same method to activate LM(Launch Manager) can also be used to
activate the RF Button driver with different bit toggled in the same
lm_status. To express that many functions this byte field can achieve,
rename the lm_status to app_status. And also the app_mask is the bit
mask which specifically indicate which bits are going to be changed.
This solves a problem where the AR9565 wifi included in the
Acer Aspire ES1-421 is permanently hard blocked according to the rfkill
GPIO read by ath9k.
platform/x86: asus-wireless: Use per-HID HSWC parameters
Some Asus machines use 0x4/0x5 as their LED on/off values, while others
use 0x0/0x1, as shown in the DSDT excerpts below. Luckily it seems this
behavior is tied to different HIDs, after looking at 44 DSDTs from
different Asus models.
Another small difference is that a few of them call GWBL instead of
OWGS, and SWBL instead of OWGD. That does not seem to make a difference
for asus-wireless, and is additional reasoning to not try to call these
methods directly.
This patch adds the PCI Device id for Power Management Controller on Intel
Apollo Lake platforms.
Intel PMC IPC Driver loads as a platform driver on Apollo Lake platforms
since Intel BIOS hides the PCI Configuration space for 0:13:1 and
re-enumerates it as ACPI device (INT34D2). The correct PCI Device ID should
be added if some platform firmware choses to enumerate the device via PCI
space.
Shanth Murthy [Mon, 13 Feb 2017 12:02:52 +0000 (04:02 -0800)]
platform/x86: intel_pmc_ipc: read s0ix residency API
This patch adds a new API to indicate S0ix residency in usec. It utilizes
the PMC Global Control Registers (GCR) to read deep and shallow
S0ix residency.
PMC MMIO resources:
o Lower 4kB: IPC1 (PMC inter-processor communication) interface
o Upper 4kB: GCR (Global Control Registers)
This enables the power management framework to take corrective actions when
the platform fails to enter S0ix after kernel freeze as part of the suspend
to idle flow. (echo freeze > /sys/power/state).
This is expected to be used with a S0ix failsafe framework such as:
<https://lwn.net/Articles/689505/>
[rajneesh: folded in "fix division in 32-bit case" from Andy Shevchenko] Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Shanth Murthy <[email protected]>
[andy: fixed kbuild error, removed "total" from variables, fixed macro] Signed-off-by: Andy Shevchenko <[email protected]>
Lines containing "} else {" should not be detected as unbalanced braces.
But the second check can be reduced to ".+else\s*{" and it therefore
never checked if the beginning of a line contains any other character
(like the relevant "}"). This check would also return true for "} else
{" and create warnings like
Miles Chen [Fri, 24 Feb 2017 23:01:34 +0000 (15:01 -0800)]
checkpatch: update $logFunctions
Currently checkpatch.pl does not recognize printk_deferred* functions as
log functions and complains about the line length of printk_deferred*
functions. Add printk_deferred* to logFunctions to fix it.
Joe Perches [Fri, 24 Feb 2017 23:01:28 +0000 (15:01 -0800)]
checkpatch: warn on embedded function names
Embedded function names are less appropriate to use when refactoring can
cause function renaming. Prefer the use of "%s", __func__ to embedded
function names.
Sven Schmidt [Fri, 24 Feb 2017 23:01:25 +0000 (15:01 -0800)]
lib/lz4: remove back-compat wrappers
Remove the functions introduced as wrappers for providing backwards
compatibility to the prior LZ4 version. They're not needed anymore
since there's no callers left.
Sven Schmidt [Fri, 24 Feb 2017 23:01:12 +0000 (15:01 -0800)]
lib: update LZ4 compressor module
Patch series "Update LZ4 compressor module", v7.
This patchset updates the LZ4 compression module to a version based on
LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
which provides an "acceleration" parameter as a tradeoff between high
compression ratio and high compression speed.
We want to use LZ4 fast in order to support compression in lustre and
(mostly, based on that) investigate data reduction techniques in behalf
of storage systems.
Also, it will be useful for other users of LZ4 compression, as with LZ4
fast it is possible to enable applications to use fast and/or high
compression depending on the usecase. For instance, ZRAM is offering a
LZ4 backend and could benefit from an updated LZ4 in the kernel.
[PATCH 1/5] lib: Update LZ4 compressor module
[PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version
[PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
[PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version
[PATCH 5/5] lib/lz4: Remove back-compat wrappers
This patch (of 5):
Update the LZ4 kernel module to LZ4 v1.7.3 by Yann Collet. The kernel
module is inspired by the previous work by Chanho Min. The updated LZ4
module will not break existing code since the patchset contains
appropriate changes.
API changes:
New method LZ4_compress_fast which differs from the variant available in
kernel by the new acceleration parameter, allowing to trade compression
ratio for more compression speed and vice versa.
LZ4_decompress_fast is the respective decompression method, featuring a
very fast decoder (multiple GB/s per core), able to reach RAM speed in
multi-core systems. The decompressor allows to decompress data
compressed with LZ4 fast as well as the LZ4 HC (high compression)
algorithm.
Also the useful functions LZ4_decompress_safe_partial and
LZ4_compress_destsize were added. The latter reverses the logic by
trying to compress as much data as possible from source to dest while
the former aims to decompress partial blocks of data.
A bunch of streaming functions were also added which allow
compressig/decompressing data in multiple steps (so called "streaming
mode").
The methods lz4_compress and lz4_decompress_unknownoutputsize are now
known as LZ4_compress_default respectivley LZ4_decompress_safe. The old
methods will be removed since there's no callers left in the code.
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modular infrastructure use, so that
when reading the code there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular case,
the init ordering becomes slightly earlier when we change it to use
subsys_initcall as done here. However, since it is a self contained
test, this shouldn't be an issue and subsys_initcall seems like a better
fit for this particular case.
We also delete the MODULE_LICENSE tag since that information is now
contained at the top of the file in the comments.
Kees Cook [Fri, 24 Feb 2017 23:01:04 +0000 (15:01 -0800)]
rbtree: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified
during allyesconfig builds of x86, arm, and arm64, with most initializer
fixes extracted from grsecurity.
Niklas Söderlund [Fri, 24 Feb 2017 23:01:01 +0000 (15:01 -0800)]
linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors
While working on a thermal driver I encounter a scenario where the
divisor could be negative, instead of adding local code to handle this I
though I first try to add support for this in DIV_ROUND_CLOSEST.
Add support to DIV_ROUND_CLOSEST for negative divisors if both dividend
and divisor variable types are signed. This should not alter current
behavior for users of the macro as previously negative divisors where
not supported.
Matthew Wilcox [Fri, 24 Feb 2017 23:00:58 +0000 (15:00 -0800)]
lib/find_bit.c: micro-optimise find_next_*_bit
This saves 32 bytes on my x86-64 build, mostly due to alignment
considerations and sharing more code between find_next_bit and
find_next_zero_bit, but it does save a couple of instructions.
There's really two parts to this commit:
- First, the first half of the test: (!nbits || start >= nbits) is
trivially a subset of the second half, since nbits and start are both
unsigned
- Second, while looking at the disassembly, I noticed that GCC was
predicting the branch taken. Since this is a failure case, it's
clearly the less likely of the two branches, so add an unlikely() to
override GCC's heuristics.
Bhumika Goyal [Fri, 24 Feb 2017 23:00:46 +0000 (15:00 -0800)]
kernel/ksysfs.c: add __ro_after_init to bin_attribute structure
The object notes_attr of type bin_attribute is not modified after
getting initailized by ksysfs_init. Apart from initialization in
ksysfs_init it is also passed as an argument to the function
sysfs_create_bin_file but this argument is of type const. Therefore,
add __ro_after_init to its declaration.
Viresh Kumar [Fri, 24 Feb 2017 23:00:44 +0000 (15:00 -0800)]
kernel/notifier.c: simplify expression
NOTIFY_STOP_MASK (0x8000) has only one bit set and there is no need to
compare output of "ret & NOTIFY_STOP_MASK" to NOTIFY_STOP_MASK. We just
need to make sure the output is non-zero, that's it.
Yisheng Xie [Fri, 24 Feb 2017 23:00:40 +0000 (15:00 -0800)]
mm balloon: umount balloon_mnt when removing vb device
With CONFIG_BALLOON_COMPACTION=y the kernel will mount balloon_mnt for
balloon page migration when we probe a virtio_balloon device. However
we do not unmount it when removing the device. Fix this.
Kees Cook [Fri, 24 Feb 2017 23:00:38 +0000 (15:00 -0800)]
bug: switch data corruption check to __must_check
The CHECK_DATA_CORRUPTION() macro was designed to have callers do
something meaningful/protective on failure. However, using "return
false" in the macro too strictly limits the design patterns of callers.
Instead, let callers handle the logic test directly, but make sure that
the result IS checked by forcing __must_check (which appears to not be
able to be used directly on macro expressions).
m68k: replace gcc specific macros with ones from compiler.h
There is <linux/compiler.h> which provides macros for various gcc
specific constructs. Eg: __weak for __attribute__((weak)). I've
cleaned all instances of gcc specific attributes with the right macros
for all files under /arch/m68k
Masahiro Yamada [Fri, 24 Feb 2017 23:00:29 +0000 (15:00 -0800)]
include/linux/iopoll.h: include <linux/ktime.h> instead of <linux/hrtimer.h>
The timer APIs this header needs are ktime_get(), ktime_add_us(), and
ktime_compare(). So, including <linux/ktime.h> seems enough. This
commit will cut unnecessary header file parsing.
Mike Frysinger [Fri, 24 Feb 2017 23:00:26 +0000 (15:00 -0800)]
uapi: mqueue.h: add missing linux/types.h include
Commit 63159f5dcccb ("uapi: Use __kernel_long_t in struct mq_attr")
changed the types from long to __kernel_long_t, but didn't add a
linux/types.h include. Code that tries to include this header directly
breaks:
/usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t'
__kernel_long_t mq_flags; /* message queue flags */
This also upsets configure tests for this header:
checking linux/mqueue.h usability... no
checking linux/mqueue.h presence... yes
configure: WARNING: linux/mqueue.h: present but cannot be compiled
configure: WARNING: linux/mqueue.h: check for missing prerequisite headers?
configure: WARNING: linux/mqueue.h: see the Autoconf documentation
configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled"
configure: WARNING: linux/mqueue.h: proceeding with the compiler's result
checking for linux/mqueue.h... no
Lafcadio Wluiki [Fri, 24 Feb 2017 23:00:23 +0000 (15:00 -0800)]
procfs: use an enum for possible hidepid values
Previously, the hidepid parameter was checked by comparing literal
integers 0, 1, 2. Let's add a proper enum for this, to make the
checking more expressive:
Alexey Dobriyan [Fri, 24 Feb 2017 23:00:20 +0000 (15:00 -0800)]
proc: less code duplication in /proc/*/cmdline
After staring at this code for a while I've figured using small 2-entry
array describing ARGV and ENVP is the way to address code duplication
critique.
Greg Thelen [Fri, 24 Feb 2017 23:00:08 +0000 (15:00 -0800)]
kasan: add memcg kmem_cache test
Make a kasan test which uses a SLAB_ACCOUNT slab cache. If the test is
run within a non default memcg, then it uncovers the bug fixed by
"kasan: drain quarantine of memcg slab objects"[1].
If run without fix [1] it shows "Slab cache still has objects", and the
kmem_cache structure is leaked.
Here's an unpatched kernel test:
Greg Thelen [Fri, 24 Feb 2017 23:00:05 +0000 (15:00 -0800)]
kasan: drain quarantine of memcg slab objects
Per memcg slab accounting and kasan have a problem with kmem_cache
destruction.
- kmem_cache_create() allocates a kmem_cache, which is used for
allocations from processes running in root (top) memcg.
- Processes running in non root memcg and allocating with either
__GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
kmem_cache.
- Kasan catches use-after-free by having kfree() and kmem_cache_free()
defer freeing of objects. Objects are placed in a quarantine.
- kmem_cache_destroy() destroys root and non root kmem_caches. It takes
care to drain the quarantine of objects from the root memcg's
kmem_cache, but ignores objects associated with non root memcg. This
causes leaks because quarantined per memcg objects refer to per memcg
kmem cache being destroyed.
To see the problem:
1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
2) from non root memcg, allocate and free a few objects from cache
3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
will trigger a "Slab cache still has objects" warning indicating
that the per memcg kmem_cache structure was leaked.
Fix the leak by draining kasan quarantined objects allocated from non
root memcg.
Racing memcg deletion is tricky, but handled. kmem_cache_destroy() =>
shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
flushes per memcg quarantined objects, even if that memcg has been
rmdir'd and gone through memcg_deactivate_kmem_caches().
This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
enabled. So I don't think it's worth patching stable kernels.
Nathan Fontenot [Fri, 24 Feb 2017 23:00:02 +0000 (15:00 -0800)]
memory-hotplug: use dev_online for memhp_auto_online
Commit 31bc3858ea3e ("add automatic onlining policy for the newly added
memory") provides the capability to have added memory automatically
onlined during add, but this appears to be slightly broken.
The current implementation uses walk_memory_range() to call
online_memory_block, which uses memory_block_change_state() to online
the memory. Instead, we should be calling device_online() for the
memory block in online_memory_block(). This would online the memory
(the memory bus online routine memory_subsys_online() called from
device_online calls memory_block_change_state()) and properly update the
device struct offline flag.
As a result of the current implementation, attempting to remove a memory
block after adding it using auto online fails. This is because doing a
remove, for instance
Minchan Kim [Fri, 24 Feb 2017 22:59:59 +0000 (14:59 -0800)]
mm: do not access page->mapping directly on page_endio
With rw_page, page_endio is used for completing IO on a page and it
propagates write error to the address space if the IO fails. The
problem is it accesses page->mapping directly which might be okay for
file-backed pages but it shouldn't for anonymous page. Otherwise, it
can corrupt one of field from anon_vma under us and system goes panic
randomly.
swap_writepage
bdev_writepage
ops->rw_page
I encountered the BUG during developing new zram feature and it was
really hard to figure it out because it made random crash, somtime
mmap_sem lockdep, sometime other places where places never related to
zram/zsmalloc, and not reproducible with some configuration.
When I consider how that bug is subtle and people do fast-swap test with
brd, it's worth to add stable mark, I think.
We are using the wrong flag value in task_numa_falt function. This can
result in us doing wrong numa fault statistics update, because we update
num_pages_migrate and numa_fault_locality etc based on the flag argument
passed.