Linus Torvalds [Wed, 23 Nov 2016 16:09:21 +0000 (08:09 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Six fixes for bugs that were found via fuzzing, and a trivial
hw-enablement patch for AMD Family-17h CPU PMUs"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Allow only a single PMU/box within an events group
perf/x86/intel: Cure bogus unwind from PEBS entries
perf/x86: Restore TASK_SIZE check on frame pointer
perf/core: Fix address filter parser
perf/x86: Add perf support for AMD family-17h processors
perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC
perf/core: Do not set cpuctx->cgrp for unscheduled cgroups
rt2800: do not overwrite WPDMA_GLO_CFG_WP_DMA_BURST_SIZE
We already initlized WPDMA_GLO_CFG_WP_DMA_BURST_SIZE to 3 on
rt2800_init_registers() for USB devices. For PCI devices we will use
HW default setting, which is 2, so patch does not change behaviour
on PCI devices.
We should not reset USB_DMA_CFG on rt2800usb_init_registers() as this
function is called indirectly from rt2800_enable_radio(). If we
do so, we wipe out USB_DMA_CFG settings from rt2800usb_enable_radio().
We should only set IEEE80211_HT_MCS_TX_RX_DIF when TX and RX MCS sets
are not equal, i.e. when number of tx streams is different than
number of RX streams.
Wright Feng [Fri, 18 Nov 2016 01:59:52 +0000 (09:59 +0800)]
brcmfmac: update beacon IE after bss up and clear when AP stopped
Firmware doesn't update beacon/Probe Response vendor IEs correctly when
bss is down, so we move brcmf_config_ap_mgmt_ie after BSS up. And host
driver should clear IEs when AP stopped so that the IEs in host side will
be synced with in firmware side.
Vishal Thanki [Wed, 16 Nov 2016 16:01:54 +0000 (17:01 +0100)]
rt2x00: Fix incorrect usage of CONFIG_RT2X00_LIB_USB
In device removal routine, usage of "#ifdef CONFIG_RT2X00_LIB_USB"
will not cover the case when it is configured as module. This will
omit the entire if-block which does cleanup of URBs and cancellation
of pending work. Changing the #ifdef to #if IS_ENABLED() to fix it.
Oliver Hartkopp [Wed, 23 Nov 2016 13:33:25 +0000 (14:33 +0100)]
can: bcm: fix support for CAN FD frames
Since commit 6f3b911d5f29b98 ("can: bcm: add support for CAN FD frames") the
CAN broadcast manager supports CAN and CAN FD data frames.
As these data frames are embedded in struct can[fd]_frames which have a
different length the access to the provided array of CAN frames became
dependend of op->cfsiz. By using a struct canfd_frame pointer for the array of
CAN frames the new offset calculation based on op->cfsiz was accidently applied
to CAN FD frame element lengths.
This fix makes the pointer to the arrays of the different CAN frame types a
void pointer so that the offset calculation in bytes accesses the correct CAN
frame elements.
Moving the exports for assembly code into the assembly files breaks
KSYM trimming, but also breaks modversions.
While fixing the KSYM trimming is trivial, fixing modversions brings
us to a technically worse position that we had prior to the above
change:
- We end up with the prototype definitions divorsed from everything
else, which means that adding or removing assembly level ksyms
become more fragile:
* if adding a new assembly ksyms export, a missed prototype in
asm-prototypes.h results in a successful build if no module in
the selected configuration makes use of the symbol.
* when removing a ksyms export, asm-prototypes.h will get forgotten,
with armksyms.c, you'll get a build error if you forget to touch
the file.
- We end up with the same amount of include files and prototypes,
they're just in a header file instead of a .c file with their
exports.
As for lines of code, we don't get much of a size reduction:
(original commit)
47 files changed, 131 insertions(+), 208 deletions(-)
(fix for ksyms trimming)
7 files changed, 18 insertions(+), 5 deletions(-)
(two fixes for modversions)
1 file changed, 34 insertions(+)
3 files changed, 7 insertions(+), 2 deletions(-)
which results in a net total of only 25 lines deleted.
As there does not seem to be much benefit from this change of approach,
revert the change.
Linus Torvalds [Tue, 22 Nov 2016 21:53:01 +0000 (13:53 -0800)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fix from Zhang Rui:
"We only have one urgent fix this time.
Commit 3105f234e0ab ("thermal/powerclamp: correct cpu support check"),
which is shipped in 4.9-rc3, fixed a problem introduced by commit b721ca0d1927 ("thermal/powerclamp: remove cpu whitelist").
But unfortunately, it broke intel_powerclamp driver module auto-
loading at the same time. Thus we need this change to add back module
auto-loading for 4.9"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/powerclamp: add back module device table
Johan Hedberg [Sat, 12 Nov 2016 15:03:07 +0000 (17:03 +0200)]
Bluetooth: Fix using the correct source address type
The hci_get_route() API is used to look up local HCI devices, however
so far it has been incapable of dealing with anything else than the
public address of HCI devices. This completely breaks with LE-only HCI
devices that do not come with a public address, but use a static
random address instead.
This patch exteds the hci_get_route() API with a src_type parameter
that's used for comparing with the right address of each HCI device.
Linus Torvalds [Tue, 22 Nov 2016 21:48:05 +0000 (13:48 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes.
One prevents timeouts on mpt3sas when trying to use the secure erase
protocol which causes the erase protocol to be aborted. The second is
a regression in a prior fix which causes all commands to abort during
PCI extended error recovery, which is incorrect because PCI EEH is
independent from what's happening on the FC transport"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery
scsi: mpt3sas: Fix secure erase premature termination
Linus Torvalds [Tue, 22 Nov 2016 21:20:34 +0000 (13:20 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of driver fixes.
The sunxi fixes are for an incorrect clk tree configuration and a bad
frequency calculation. The other two are fixes for passing the wrong
pointer in drivers recently converted to clk_hw style registration"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: efm32gg: Pass correct type to hw provider registration
clk: berlin: Pass correct type to hw provider registration
clk: sunxi: Fix M factor computation for APB1
clk: sunxi-ng: sun6i-a31: Force AHB1 clock to use PLL6 as parent
Arnd Bergmann [Tue, 22 Nov 2016 20:50:52 +0000 (21:50 +0100)]
NFSv4.x: hide array-bounds warning
A correct bugfix introduced a harmless warning that shows up with gcc-7:
fs/nfs/callback.c: In function 'nfs_callback_up':
fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
What happens here is that the 'minorversion == 0' check tells the
compiler that we assume minorversion can be something other than 0,
but when CONFIG_NFS_V4_1 is disabled that would be invalid and
result in an out-of-bounds access.
The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
really can't happen, which makes the code slightly smaller and also
avoids the warning.
The bugfix that introduced the warning is marked for stable backports,
we want this one backported to the same releases.
Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net") Cc: [email protected] # v3.7+ Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
Linus Torvalds [Tue, 22 Nov 2016 20:51:35 +0000 (12:51 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes for autogroup scheduling, for races when turning the feature
on/off via /proc/sys/kernel/sched_autogroup_enabled"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/autogroup: Do not use autogroup->tg in zombie threads
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
Linus Torvalds [Tue, 22 Nov 2016 20:17:49 +0000 (12:17 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- two fixes to make (very) old Intel CPUs boot reliably
- fix the intel-mid driver and rename it
- two KASAN false positive fixes
- an FPU fix
- two sysfb fixes
- two build fixes related to new toolchain versions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
x86/platform/intel-mid: Register watchdog device after SCU
x86/fpu: Fix invalid FPU ptrace state after execve()
x86/boot: Fail the boot if !M486 and CPUID is missing
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
x86/dumpstack: Prevent KASAN false positive warnings
x86/unwind: Prevent KASAN false positive warnings in guess unwinder
x86/boot: Avoid warning for zero-filling .bss
x86/sysfb: Fix lfb_size calculation
x86/sysfb: Add support for 64bit EFI lfb_base
Andre Noll reported panics after my recent fix (commit 34fad54c2537
"net: __skb_flow_dissect() must cap its return value")
After some more headaches, Alexander root caused the problem to
init_default_flow_dissectors() being called too late, in case
a network driver like IGB is not a module and receives DHCP message
very early.
Fix is to call init_default_flow_dissectors() much earlier,
as it is a core infrastructure and does not depend on another
kernel service.
Fixes: 06635a35d13d4 ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Andre Noll <[email protected]> Diagnosed-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.
That driver has a change_mtu method explicitly for sending
a message to the hardware. If that fails it returns an
error.
Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.
However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.
Arnd Bergmann [Tue, 22 Nov 2016 14:21:22 +0000 (15:21 +0100)]
marvell: mark mvneta and mvpp2 32-bit only
Both of these drivers won't work on 64-bit architectures unless they
are redesigned, since they store a virtual address pointer in a 32-bit
field of the descriptors:
drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct':
drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init':
drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
This limits the COMPILE_TEST option for the two drivers again to
only build them on 32-bit. This seems nicer than shutting up the
warnings, in case we ever actually want to use them on 64-bit,
as the warnings indicate which parts of the driver are currently
broken there.
Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ivan Vecera [Tue, 22 Nov 2016 10:24:13 +0000 (11:24 +0100)]
mlxsw: core: Implement thermal zone
Implement thermal zone for mlxsw based HW. It uses temperature sensor
provided by ASIC (the same as mlxsw hwmon interface) to report current
temp to thermal core. The ASIC's PWM is then used to control speed
of system fans registered as cooling devices.
Jiri Pirko [Tue, 22 Nov 2016 10:24:12 +0000 (11:24 +0100)]
mlxsw: reg: Add Management Fan Speed Limit register
The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed threshold are defined for both
under-speed and over-speed.
David S. Miller [Tue, 22 Nov 2016 14:55:32 +0000 (09:55 -0500)]
Merge branch 'mv88e6390-initial-support'
Andrew Lunn says:
====================
Start adding support for mv88e6390
This is the first patchset implementing support for the mv88e6390
family. This is a new generation of switch devices and has numerous
incompatible changes to the registers. These patches allow the switch
to the detected during probe, and makes the statistics unit work.
These patches are insufficient to make the mv88e6390 functional. More
patches will follow.
v2:
Move stats code into global1
Change DT compatible string to mv88e6190
Fixed mv88e6351 stats which v1 had broken
====================
The mv88e6390 uses a different bit to select between bank0 and bank1
of the statistics. So implement an ops function for this, and pass the
selector bit to the generic stats read function. Also, the histogram
selection has moved for the mv88e6390, so abstract its selection as
well.
Andrew Lunn [Mon, 21 Nov 2016 22:27:03 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add stats_get_stats to ops structure
Different families have different sets of statistics. Abstract this
using a stats_get_stats op. The mv88e6390 needs a different
implementation, which will be added later.
Andrew Lunn [Mon, 21 Nov 2016 22:27:02 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add stats_get_sset_count|string to ops structure
Different families have different sets of statistics. Abstract this
using a stats_get_sset_count and stats_get_strings op. Each stat has a
bitmap, and the ops implementer uses a bit map mask to count the
statistics which apply for the family, or return the list of strings.
Signed-off-by: Andrew Lunn <[email protected]>
v2:
Rename functions to avoid _ prefix. Signed-off-by: David S. Miller <[email protected]>
Andrew Lunn [Mon, 21 Nov 2016 22:27:01 +0000 (23:27 +0100)]
net: dsa: mv88e6xxx: Add mv88e6390 statistics unit init
The statistics unit on the mv88e6390 needs the histogram mode to be
configured in a different register compared to other devices. Add an
ops to do this.
Signed-off-by: Andrew Lunn <[email protected]>
v2:
Rename to mv88e6390_g1_stats_set_histogram
Move into global1.c Signed-off-by: David S. Miller <[email protected]>
The MV88E6390 has a control register for what the histogram statistics
actually contain. This means the stat_snapshot method should not set
this information. So implement the 6390 stats_snapshot function without
these bits.
Andrew Lunn [Mon, 21 Nov 2016 22:26:59 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Add comment about family a device belongs to
Knowing the family of device belongs to helps with picking the ops
implementation which is appropriate to the device. So add a comment to
each structure of ops.
Andrew Lunn [Mon, 21 Nov 2016 22:26:58 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Abstract stats_snapshot into ops structure
Taking a stats snapshot differs between same families. Abstract this
into an ops member. At the same time, move the code into global1.[ch],
since the registers are in the global1 range.
Andrew Lunn [Mon, 21 Nov 2016 22:26:57 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Add the mv88e6390 family
With the devices added to the tables, the probe will recognize the
switch. This however is not sufficient to make it work properly, other
changes are needed because of incompatibilities.
Russell King [Tue, 22 Nov 2016 13:56:54 +0000 (13:56 +0000)]
drm/arm: hdlcd: fix plane base address update
While testing HDMI with Xorg on the Juno board, I find that when Xorg
starts up or shuts down, the display is shifted significantly to the
right and wrapped in the active region. (No sync bars are visible.)
The timings are correct, it behaves as if the start address has been
shifted many pixels _into_ the framebuffer.
This occurs whenever the display mode size is changed - using xrandr
in Xorg shows that changing the resolution triggers the problem
almost every time, but changing the refresh rate does not.
Using devmem2 to disable and re-enable the HDLCD resolves the issue,
and repeated disable/enable cycles do not make the issue re-appear.
Further debugging shows that we try to update the controller
configuration while enabled.
Alwys ensure that the HDLCD is disabled prior to updating the
controller timings, and use drm_crtc_vblank_off()/drm_crtc_vblank_on()
so that DRM knows whether it can expect vblank interrupts.
Peter Zijlstra [Thu, 17 Nov 2016 17:17:31 +0000 (18:17 +0100)]
perf/x86/intel: Cure bogus unwind from PEBS entries
Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
unwinds sometimes do 'weird' things. In particular, we seemed to be
ending up unwinding from random places on the NMI stack.
While it was somewhat expected that the event record BP,SP would not
match the interrupt BP,SP in that the interrupt is strictly later than
the record event, it was overlooked that it could be on an already
overwritten stack.
Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
when we need stack unwinds.
Note that its still possible the unwind doesn't full match the actual
event, as its entirely possible to have done an (I)RET between record
and interrupt, but on average it should still point in the general
direction of where the event came from. Also, it's the best we can do,
considering.
The particular scenario that triggered the bogus NMI stack unwind was
a PEBS event with very short period, upon enabling the event at the
tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
triggers a record (while still on the NMI stack) which in turn
triggers the next PMI. This then causes back-to-back NMIs and we'll
try and unwind the stack-frame from the last NMI, which obviously is
now overwritten by our own.
Johannes Weiner [Tue, 22 Nov 2016 09:57:42 +0000 (10:57 +0100)]
perf/x86: Restore TASK_SIZE check on frame pointer
The following commit:
75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")
... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
access_ok() check.
Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
whereas the access_ok() uses whatever the current address limit of the task is.
We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
then see vmalloc faults when we access what looks like pointers into vmalloc
space:
Oleg Nesterov [Mon, 14 Nov 2016 18:46:12 +0000 (19:46 +0100)]
sched/autogroup: Do not use autogroup->tg in zombie threads
Exactly because for_each_thread() in autogroup_move_group() can't see it
and update its ->sched_task_group before _put() and possibly free().
So the exiting task needs another sched_move_task() before exit_notify()
and we need to re-introduce the PF_EXITING (or similar) check removed by
the previous change for another reason.
Oleg Nesterov [Mon, 14 Nov 2016 18:46:09 +0000 (19:46 +0100)]
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.
However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:
int main(void)
{
int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);
assert(sctl > 0);
if (fork()) {
wait(NULL); // destroy the child's ag/tg
pause();
}
Nicholas Piggin [Tue, 22 Nov 2016 03:52:22 +0000 (14:52 +1100)]
powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
After patch 4efca4ed0 ("kbuild: modversions for EXPORT_SYMBOL() for asm"),
asm exports can get modversions CRCs generated if they have C definitions
in asm-prototypes.h. This patch adds missing definitions for 32 and 64 bit
allmodconfig builds.
Fixes: 9445aa1a3062 ("ppc: move exports to definitions") Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Herbert Xu [Mon, 21 Nov 2016 08:26:19 +0000 (16:26 +0800)]
crypto: scatterwalk - Remove unnecessary aliasing check in map_and_copy
The aliasing check in map_and_copy is no longer necessary because
the IPsec ESP code no longer provides an IV that points into the
actual request data. As this check is now triggering BUG checks
due to the vmalloced stack code, I'm removing it.
Herbert Xu [Mon, 21 Nov 2016 07:34:00 +0000 (15:34 +0800)]
crypto: algif_hash - Fix result clobbering in recvmsg
Recently an init call was added to hash_recvmsg so as to reset
the hash state in case a sendmsg call was never made.
Unfortunately this ended up clobbering the result if the previous
sendmsg was done with a MSG_MORE flag. This patch fixes it by
excluding that case when we make the init call.
Fixes: a8348bca2944 ("algif_hash - Fix NULL hash crash with shash") Reported-by: Patrick Steinhardt <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
There is a new bit, LPCR_PECE_HVEE (Hypervisor Virtualization Exit
Enable), which controls wakeup from STOP states on Hypervisor
Virtualization Interrupts (which happen to also be all external
interrupts in host or bare metal mode).
It needs to be set or we will miss wakeups.
Fixes: 9baaef0a22c8 ("powerpc/irq: Add support for HV virtualization interrupts") Cc: [email protected] # v4.8+ Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[mpe: Rename it to HVEE to match the name in the ISA] Signed-off-by: Michael Ellerman <[email protected]>
Linus Torvalds [Mon, 21 Nov 2016 23:27:41 +0000 (15:27 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull apparmor bugfix from James Morris:
"This has a fix for a policy replacement bug that is fairly serious for
apache mod_apparmor users, as it results in the wrong policy being
applied on an network facing service"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix change_hat not finding hat after policy replacement
1) With modern networking cards we can run out of 32-bit DMA space, so
support 64-bit DMA addressing when possible on sparc64. From Dave
Tushar.
2) Some signal frame validation checks are inverted on sparc32, fix
from Andreas Larsson.
3) Lockdep tables can get too large in some circumstances on sparc64,
add a way to adjust the size a bit. From Babu Moger.
4) Fix NUMA node probing on some sun4v systems, from Thomas Tai.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: drop duplicate header scatterlist.h
lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined
config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
sunbmac: Fix compiler warning
sunqe: Fix compiler warnings
sparc64: Enable 64-bit DMA
sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
sparc64: Bind PCIe devices to use IOMMU v2 service
sparc64: Initialize iommu_map_table and iommu_pool
sparc64: Add ATU (new IOMMU) support
sparc64: Add FORCE_MAX_ZONEORDER and default to 13
sparc64: fix compile warning section mismatch in find_node()
sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
sparc64: Fix find_node warning if numa node cannot be found
Mika Westerberg [Mon, 21 Nov 2016 13:33:07 +0000 (15:33 +0200)]
watchdog: wdat_wdt: Select WATCHDOG_CORE
The WDAT watchdog driver uses functionality provided by the watchdog timer
core but it did not select it explicitly. This results following linker
error when only WDAT_WDT is enabled in Kconfig:
drivers/built-in.o: In function `wdat_wdt_probe':
drivers/watchdog/wdat_wdt.c:444: undefined reference to `devm_watchdog_register_device'
Fix this by explicitly selecting WATCHDOG_CORE when WDAT watchdog driver is
enabled.
Fixes: 058dfc767008 (ACPI / watchdog: Add support for WDAT hardware watchdog) Reported-by: Vegard Nossum <[email protected]> Signed-off-by: Mika Westerberg <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
1) Clear congestion control state when changing algorithms on an
existing socket, from Florian Westphal.
2) Fix register bit values in altr_tse_pcs portion of stmmac driver,
from Jia Jie Ho.
3) Fix PTP handling in stammc driver for GMAC4, from Giuseppe
CAVALLARO.
4) Fix udplite multicast delivery handling, it ignores the udp_table
parameter passed into the lookups, from Pablo Neira Ayuso.
5) Synchronize the space estimated by rtnl_vfinfo_size and the space
actually used by rtnl_fill_vfinfo. From Sabrina Dubroca.
6) Fix memory leak in fib_info when splitting nodes, from Alexander
Duyck.
7) If a driver does a napi_hash_del() explicitily and not via
netif_napi_del(), it must perform RCU synchronization as needed. Fix
this in virtio-net and bnxt drivers, from Eric Dumazet.
8) Likewise, it is not necessary to invoke napi_hash_del() is we are
also doing neif_napi_del() in the same code path. Remove such calls
from be2net and cxgb4 drivers, also from Eric Dumazet.
9) Don't allocate an ID in peernet2id_alloc() if the netns is dead,
from WANG Cong.
10) Fix OF node and device struct leaks in of_mdio, from Johan Hovold.
11) We cannot cache routes in ip6_tunnel when using inherited traffic
classes, from Paolo Abeni.
12) Fix several crashes and leaks in cpsw driver, from Johan Hovold.
13) Splice operations cannot use freezable blocking calls in AF_UNIX,
from WANG Cong.
14) Link dump filtering by master device and kind support added an error
in loop index updates during the dump if we actually do filter, fix
from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
tcp: zero ca_priv area when switching cc algorithms
net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit
ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC
tipc: eliminate obsolete socket locking policy description
rtnl: fix the loop index update error in rtnl_dump_ifinfo()
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
net: macb: add check for dma mapping error in start_xmit()
rtnetlink: fix FDB size computation
netns: fix get_net_ns_by_fd(int pid) typo
af_unix: conditionally use freezable blocking calls in read
net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
net: ethernet: ti: cpsw: add missing sanity check
net: ethernet: ti: cpsw: fix secondary-emac probe error path
net: ethernet: ti: cpsw: fix of_node and phydev leaks
net: ethernet: ti: cpsw: fix deferred probe
net: ethernet: ti: cpsw: fix mdio device reference leak
net: ethernet: ti: cpsw: fix bad register access in probe error path
net: sky2: Fix shutdown crash
cfg80211: limit scan results cache size
net sched filters: pass netlink message flags in event notification
...
Declare the structure ieee802154_ops as const as it is only passed as an
argument to the function ieee802154_alloc_hw. This argument is of type
const struct ieee802154_ops *, so ieee80254_ops structures having this
property can be declared as const.
Done using Coccinelle:
David S. Miller [Mon, 21 Nov 2016 19:05:50 +0000 (14:05 -0500)]
Merge branch 'geneve-lwt-efficiency'
Pravin B Shelar says:
====================
geneve: Use LWT more effectively.
Following patch series make use of geneve LWT code path for
geneve netdev type of device.
This allows us to simplify geneve module without changing any
functionality.
v2-v3:
Rebase against latest net-next.
v1-v2:
Fix warning reported by kbuild test robot.
====================
pravin shelar [Mon, 21 Nov 2016 19:03:01 +0000 (11:03 -0800)]
geneve: Optimize geneve device lookup.
Rather than comparing 64-bit tunnel-id, compare tunnel vni
which is 24-bit id. This also save conversion from vni
to tunnel id on each tunnel packet receive.
pravin shelar [Mon, 21 Nov 2016 19:02:58 +0000 (11:02 -0800)]
geneve: Unify LWT and netdev handling.
Current geneve implementation has two separate cases to handle.
1. netdev xmit
2. LWT xmit.
In case of netdev, geneve configuration is stored in various
struct geneve_dev members. For example geneve_addr, ttl, tos,
label, flags, dst_cache, etc. For LWT ip_tunnel_info is passed
to the device in ip_tunnel_info.
Following patch uses ip_tunnel_info struct to store almost all
of configuration of a geneve netdevice. This allows us to unify
most of geneve driver code around ip_tunnel_info struct.
This dramatically simplify geneve code, since it does not
need to handle two different configuration cases. Removes
duplicate code, single code path can handle either type
of geneve devices.
David S. Miller [Mon, 21 Nov 2016 18:20:17 +0000 (13:20 -0500)]
Merge branch 'tcp-cong-undo_cwnd-mandatory'
Florian Westphal says:
====================
tcp: make undo_cwnd mandatory for congestion modules
highspeed, illinois, scalable, veno and yeah congestion control algorithms
don't provide a 'cwnd_undo' function. This makes the stack default to a
'reno undo' which doubles cwnd. However, the ssthresh implementation of
these algorithms do not halve the slowstart threshold. This causes similar
issue as the one fixed for dctcp in ce6dd23329b1e ("dctcp: avoid bogus
doubling of cwnd after loss").
In light of this it seems better to remove the fallback and make undo_cwnd
mandatory.
First patch fixes those spots where reno undo seems incorrect by providing
.cwnd_undo functions, second patch removes the fallback.
====================
Florian Westphal [Mon, 21 Nov 2016 13:18:38 +0000 (14:18 +0100)]
tcp: make undo_cwnd mandatory for congestion modules
The undo_cwnd fallback in the stack doubles cwnd based on ssthresh,
which un-does reno halving behaviour.
It seems more appropriate to let congctl algorithms pair .ssthresh
and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it
up for all congestion algorithms that used to rely on the fallback.
Florian Westphal [Mon, 21 Nov 2016 13:18:37 +0000 (14:18 +0100)]
tcp: add cwnd_undo functions to various tcp cc algorithms
congestion control algorithms that do not halve cwnd in their .ssthresh
should provide a .cwnd_undo rather than rely on current fallback which
assumes reno halving (and thus doubles the cwnd).
All of these do 'something else' in their .ssthresh implementation, thus
store the cwnd on loss and provide .undo_cwnd to restore it again.
A followup patch will remove the fallback and all algorithms will
need to provide a .cwnd_undo function.
David S. Miller [Mon, 21 Nov 2016 18:16:59 +0000 (13:16 -0500)]
Merge branch 'bridge-igmpv3-mldv2-support'
Nikolay Aleksandrov says:
====================
bridge: add support for IGMPv3 and MLDv2 querier
This patch-set adds support for IGMPv3 and MLDv2 querier in the bridge.
Two new options which can be toggled via netlink and sysfs are added that
control the version per-bridge:
multicast_igmp_version - default 2, can be set to 3
multicast_mld_version - default 1, can be set to 2 (this option is
disabled if CONFIG_IPV6=n)
Note that the names do not include "querier", I think that these options
can be re-used later as more IGMPv3 support is added to the bridge so we
can avoid adding more options to switch between v2 and v3 behaviour.
The set uses the already existing br_ip{4,6}_multicast_alloc_query
functions and adds the appropriate header based on the chosen version.
For the initial support I have removed the compatibility implementation
(RFC3376 sec 7.3.1, 7.3.2; RFC3810 sec 8.3.1, 8.3.2), because there are
some details that we need to sort out.
====================
This patch adds basic support for MLDv2 queries, the default is MLDv1
as before. A new multicast option - multicast_mld_version, adds the
ability to change it between 1 and 2 via netlink and sysfs.
The MLD option is disabled if CONFIG_IPV6 is disabled.
This patch adds basic support for IGMPv3 queries, the default is IGMPv2
as before. A new multicast option - multicast_igmp_version, adds the
ability to change it between 2 and 3 via netlink and sysfs. The option
struct member is in a 4 byte hole in net_bridge.
There also a few minor style adjustments in br_multicast_new_group and
br_multicast_add_group.
Florian Westphal [Mon, 21 Nov 2016 09:08:37 +0000 (10:08 +0100)]
tcp: zero ca_priv area when switching cc algorithms
We need to zero out the private data area when application switches
connection to different algorithm (TCP_CONGESTION setsockopt).
When congestion ops get assigned at connect time everything is already
zeroed because sk_alloc uses GFP_ZERO flag. But in the setsockopt case
this contains whatever previous cc placed there.
Gao Feng [Mon, 21 Nov 2016 00:56:21 +0000 (08:56 +0800)]
net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit
The tc could return NET_XMIT_CN as one congestion notification, but
it does not mean the packe is lost. Other modules like ipvlan,
macvlan, and others treat NET_XMIT_CN as success too.
So l2tp_eth_dev_xmit should add the NET_XMIT_CN check.
NFSv4.1: Keep a reference on lock states while checking
While walking the list of lock_states, keep a reference on each
nfs4_lock_state to be checked, otherwise the lock state could be removed
while the check performs TEST_STATEID and possible FREE_STATEID.
Eric Dumazet [Sun, 20 Nov 2016 17:24:36 +0000 (09:24 -0800)]
mlx4: avoid unnecessary dirtying of critical fields
While stressing a 40Gbit mlx4 NIC with busy polling, I found false
sharing in mlx4 driver that can be easily avoided.
This patch brings an additional 7 % performance improvement in UDP_RR
workload.
1) If we received no frame during one mlx4_en_process_rx_cq()
invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons
2) Do not refill rx buffers if we have plenty of them.
This avoids false sharing and allows some bulk/batch optimizations.
Page allocator and its locks will thank us.
Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
cpu handling NIC IRQ should be changed. We should return budget-1
instead, to not fool net_rx_action() and its netdev_budget.
v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0
David S. Miller [Mon, 21 Nov 2016 16:25:59 +0000 (11:25 -0500)]
Merge branch 'mlx5-bpf-refcnt-fixes'
Daniel Borkmann says:
====================
Couple of BPF refcount fixes for mlx5
Various mlx5 bugs on eBPF refcount handling found during review.
Last patch in series adds a __must_check to BPF helpers to make
sure we won't run into it again w/o compiler complaining first.
v2 -> v3:
- Just reworked patch 2/4 so we don't need bpf_prog_sub().
- Rebased, rest as is.
v1 -> v2:
- After discussion with Alexei, we agreed upon rebasing the
patches against net-next.
- Since net-next, I've also added the __must_check to enforce
future users to check for errors.
- Fixed up commit message #2.
- Simplify assignment from patch #1 based on Saeed's feedback
on previous set.
====================
Daniel Borkmann [Sat, 19 Nov 2016 00:45:03 +0000 (01:45 +0100)]
bpf: add __must_check attributes to refcount manipulating helpers
Helpers like bpf_prog_add(), bpf_prog_inc(), bpf_map_inc() can fail
with an error, so make sure the caller properly checks their return
value and not just ignores it, which could worst-case lead to use
after free.
Daniel Borkmann [Sat, 19 Nov 2016 00:45:02 +0000 (01:45 +0100)]
bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled, otherwise we leak the program.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Daniel Borkmann [Sat, 19 Nov 2016 00:45:01 +0000 (01:45 +0100)]
bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
There are multiple issues in mlx5e_xdp_set():
1) The batched bpf_prog_add() is currently not checked for errors. When
doing so, it should be done at an earlier point in time to makes sure
that we cannot fail anymore at the time we want to set the program for
each channel. The batched refs short-cut can only be performed when we
don't need to perform a reset for changing the rq type and the device
was in opened state. In case the device was not in opened state, then
the next mlx5e_open_locked() will aquire the refs from the control prog
via mlx5e_create_rq(), same when we need to perform a reset.
2) When swapping the priv->xdp_prog, then no extra reference count must be
taken since we got that from call path via dev_change_xdp_fd() already.
Otherwise, we'd never be able to release the program. Also, bpf_prog_add()
without checking the return code could fail.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Daniel Borkmann [Sat, 19 Nov 2016 00:45:00 +0000 (01:45 +0100)]
bpf, mlx5: fix mlx5e_create_rq taking reference on prog
In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail since 92117d8443bc
("bpf: fix refcnt overflow"), so we really must check it. Take the reference
right when we assign it to the rq from priv->xdp_prog, and just drop the
reference on error path. Destruction in mlx5e_destroy_rq() looks good, though.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jacob Pan [Mon, 14 Nov 2016 19:08:45 +0000 (11:08 -0800)]
thermal/powerclamp: add back module device table
Commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 replaced module
cpu id table with a cpu feature check, which is logically correct.
But we need the module device table to allow module auto loading.
The token table passed into match_token() must be null-terminated, which
it currently is not in the perf's address filter string parser, as caught
by Vince's perf_fuzzer and KASAN.
It doesn't blow up otherwise because of the alignment padding of the table
to the next element in the .rodata, which is luck.
Fixing by adding a null-terminator to the token table.
Jaehoon Chung [Mon, 21 Nov 2016 01:51:48 +0000 (10:51 +0900)]
mmc: dw_mmc: fix the error handling for dma operation
When dma->start is failed,then it has to fall back to PIO mode
for current transfer.
But Host controller was already set to bits relevant to DMA operation.
If needs to use the PIO mode, Host controller has to stop the DMA
operation. (It's more stable than now.)
When it occurred error, it's not running any request.
Andy Shevchenko [Fri, 18 Nov 2016 16:52:24 +0000 (18:52 +0200)]
x86/platform/intel-mid: Register watchdog device after SCU
Watchdog device in Intel Tangier relies on SCU to be present. It uses the SCU
IPC channel to send commands and receive responses. If watchdog driver is
initialized quite before SCU and a command has been sent the result is always
an error like the following:
Yu-cheng Yu [Thu, 17 Nov 2016 17:11:35 +0000 (09:11 -0800)]
x86/fpu: Fix invalid FPU ptrace state after execve()
Robert O'Callahan reported that after an execve PTRACE_GETREGSET
NT_X86_XSTATE continues to return the pre-exec register values
until the exec'ed task modifies FPU state.
Andy Lutomirski [Sat, 19 Nov 2016 23:37:30 +0000 (15:37 -0800)]
x86/boot: Fail the boot if !M486 and CPUID is missing
Linux will have all kinds of sporadic problems on systems that don't
have the CPUID instruction unless CONFIG_M486=y. In particular,
sync_core() will explode.
I believe that these kernels had a better chance of working before
commit 05fb3c199bb0 ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS
even if we don't have CPUID"). That commit inadvertently fixed a
serious bug: we used to fail to detect the FPU if CPUID wasn't
present. Because we also used to forget to set X86_FEATURE_ALWAYS, we
end up with no cpu feature bits set at all. This meant that
alternative patching didn't do anything and, if paravirt was disabled,
we could plausibly finish the entire boot process without calling
sync_core().
Rather than trying to work around these issues, just have the kernel
fail loudly if it's running on a CPUID-less 486, doesn't have CPUID,
and doesn't have CONFIG_M486 set.
Andy Lutomirski [Sun, 20 Nov 2016 02:42:40 +0000 (18:42 -0800)]
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
On the 80486 DX, it seems that some exceptions may leave garbage in
the high bits of CS. This causes sporadic failures in which
early_fixup_exception() refuses to fix up an exception.
As far as I can tell, this has been buggy for a long time, but the
problem seems to have been exacerbated by commits:
1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines")
This appears to have broken for as long as we've had early
exception handling.
[ Note to stable maintainers: This patch is needed all the way back to 3.4,
but it will only apply to 4.6 and up, as it depends on commit:
0e861fbb5bda ("x86/head: Move early exception panic code into early_fixup_exception()")
If you want to backport to kernels before 4.6, please don't backport the
prerequisites (there was a big chain of them that rewrote a lot of the
early exception machinery); instead, ask me and I can send you a one-liner
that will apply. ]
John Johansen [Thu, 1 Sep 2016 04:10:06 +0000 (21:10 -0700)]
apparmor: fix change_hat not finding hat after policy replacement
After a policy replacement, the task cred may be out of date and need
to be updated. However change_hat is using the stale profiles from
the out of date cred resulting in either: a stale profile being applied
or, incorrect failure when searching for a hat profile as it has been
migrated to the new parent profile.
Fixes: 01e2b670aa898a39259bc85c78e3d74820f4d3b6 (failure to find hat) Fixes: 898127c34ec03291c86f4ff3856d79e9e18952bc (stale policy being applied)
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000287 Cc: [email protected] Signed-off-by: John Johansen <[email protected]> Signed-off-by: James Morris <[email protected]>
David S. Miller [Mon, 21 Nov 2016 02:16:14 +0000 (21:16 -0500)]
Merge branch 'mV88e6xxx-interrupt-fixes'
Andrew Lunn says:
====================
Fixes for the MV88e6xxx interrupt code
The interrupt code was never tested with a board who's probing
resulted in an -EPROBE_DEFFERED. So the clean up paths never got
tested. I now do have -EPROBE_DEFFERED, and things break badly during
cleanup. These are the fixes.
This is fixing code in net-next.
v2:
Fix typo pointed out by David Miller
====================
Andrew Lunn [Sun, 20 Nov 2016 19:14:18 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Fix releasing for the global2 interrupts
It is not possible to use devm_request_threaded_irq() because we have
two stacked interrupt controllers in one device. The lower interrupt
controller cannot be removed until the upper is fully removed. This
happens too late with the devm API, resulting in error messages about
removing a domain while there is still an active interrupt. Swap to
using request_threaded_irq() and manage the release of the interrupt
manually.
Andrew Lunn [Sun, 20 Nov 2016 19:14:17 +0000 (20:14 +0100)]
net: dsa: mv88e6xxx: Fix cleanup on error for g1 interrupt setup
On error, remask the interrupts, release all maps, and remove the
domain. This cannot be done using the mv88e6xxx_g1_irq_free() because
some of these actions are not idempotent.