Git Repo - linux.git/log

gue: Call remcsum_adjust

Change remote checksum offload to call remcsum_adjust. This also
eliminates the optimization to skip an IP header as part of the
adjustment (really does not seem to be much of a win).

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Add remcsum_adjust as common function for remote checksum offload

This function does the work to update a checksum field as part of
remote checksum offload.

remcsum_adjust does the following:

1) Subtract out the calculated checksum from the beginning of the
   packet (ptr arg) to the start offset.
2) Adjust the checksum field indicated by offset based on the modified
   checksum value from above step.
3) Return the difference in the old checksum field value and the
   new one. The caller will use this to update skb->csum and NAPI csum.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

pkt_sched: fq: increase max delay from 125 ms to one second

FQ/pacing has a clamp of delay of 125 ms, to avoid some possible harm.

It turns out this delay is too small to allow pacing low rates :
Some ISP setup very aggressive policers as low as 16kbit.

Now TCP stack has spurious rtx prevention, it seems safe to increase
this fixed parameter, without adding a qdisc attribute.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Yang Yingliang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Limit count field to 24 bits in qp_alloc_res

Some VF drivers use the upper byte of "param1" (the qp count field)
in mlx4_qp_reserve_range() to pass flags which are used to optimize
the range allocation.

Under the current code, if any of these flags are set, the 32-bit
count field yields a count greater than 2^24, which is out of range,
and this VF fails.

As these flags represent a "best-effort" allocation hint anyway, they may
safely be ignored. Therefore, the PF driver may simply mask out the bits.

Fixes: c82e9aa0a8 "mlx4_core: resource tracking for HCA resources used by guests"
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bcm_sf2'

Florian Fainelli says:

====================
net: dsa: bcm_sf2: misc bugfixes

This patch series contains two bug fixes:

- first patch fixes an issue on the error path of the driver where we could
have left some of our registers mapped

- second patch enforces the use of a software reset of the switch to guarantee
the HW is in a consistent state prior to software initialization
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: reset switch prior to initialization

Our boot agent may have left the switch in an certain configuration
state, make sure we issue a software reset prior to configuring the
switch in order to ensure the HW is in a consistent state, in particular
transmit queues and internal buffers.

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: fix unmapping registers in case of errors

In case we fail to ioremap() one of our registers, we would be leaking
existing mappings, unwind those accordingly on errors.

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()

This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.

However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()

Instead of using kvm_is_mmio_pfn() to decide whether a host region
should be stage 2 mapped with device attributes, add a new static
function kvm_is_device_pfn() that disregards RAM pages with the
reserved bit set, as those should usually not be mapped as device
memory.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create()

If we detect another vCPU is running we just exit and return 0 as if we
succesfully created the VGIC, but the VGIC wouldn't actual be created.

This shouldn't break in-kernel behavior because the kernel will not
observe the failed the attempt to create the VGIC, but userspace could
be rightfully confused.

Cc: Andre Przywara <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

arm64: KVM: Handle traps of ICC_SRE_EL1 as RAZ/WI

When running on a system with a GICv3, we currenly don't allow the guest
to access the system register interface of the GICv3. We do this by
clearing the ICC_SRE_EL2.Enable, which causes all guest accesses to
ICC_SRE_EL1 to trap to EL2 and causes all guest accesses to other ICC_
registers to cause an undefined exception in the guest.

However, we currently don't handle the trap of guest accesses to
ICC_SRE_EL1 and will spill out a warning. The trap just needs to handle
the access as RAZ/WI, and a guest that tries to prod this register and
set ICC_SRE_EL1.SRE=1, must read back the value (which Linux already
does) to see if it succeeded, and will thus observe that ICC_SRE_EL1.SRE
was not set.

Add the simple trap handler in the sorted table of the system registers.

Signed-off-by: Christoffer Dall <[email protected]>
[ardb: added cp15 handling]
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

ufs: fix NULL dereference when no regulators are defined

If no voltage supply regulators are defined for the UFS devices (assumed
they are always-on), ufshcd_config_vreg_load() can be called on
suspend/resume paths with vreg == NULL as hba->vreg_info.vcc* equal to
NULL, and it causes NULL pointer dereference.

This fixes it by making ufshcd_config_vreg_{h,l}pm noop when no regulators
are defined.

Signed-off-by: Akinobu Mita <[email protected]>
Reviewed-by: Subhash Jadavani <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

ufs: ensure clk gating work is finished before module unloading

When dynamic clk gating feature is enabled, delayed workqueue machanism
is used in order to detect certain period of inactivity. But there is no
guarantee that scheduled gating work is completed before module unloading.
So it can cause kernel crash by accessing memory after it was freed.

Fix it by cancelling clk gating and ungating works and ensure that its
execution is finished before module unloading.

Signed-off-by: Akinobu Mita <[email protected]>
Reviewed-by: Subhash Jadavani <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

irqchip: brcmstb-l2: Fix error handling of irq_of_parse_and_map

Return value of irq_of_parse_and_map() is unsigned int, with 0
indicating failure, so testing for negative result never works.

Signed-off-by: Dmitry Torokhov <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Tested-by: Kevin Cernekee <[email protected]>
Link: https://lkml.kernel.org/r/20141114221642.GA37468@dtor-ws
Signed-off-by: Jason Cooper <[email protected]>

irqchip: bcm7120-l2: Fix error handling of irq_of_parse_and_map

Return value of irq_of_parse_and_map() is unsigned int, with 0
indicating failure, so testing for negative result never works.

Signed-off-by: Dmitry Torokhov <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Tested-by: Kevin Cernekee <[email protected]>
Link: https://lkml.kernel.org/r/20141114221614.GA37395@dtor-ws
Signed-off-by: Jason Cooper <[email protected]>

Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
"These fix one mishandling of the case when security labels are
  configured out, and two races in the 4.1 backchannel code"

* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  SUNRPC: Fix locking around callback channel reply receive
  nfsd: correctly define v4.2 support attributes

Merge git://git.kvack.org/~bcrl/aio-fixes

Pull aio fix from Ben LaHaise:
"Dirty page accounting fix for aio"

* git://git.kvack.org/~bcrl/aio-fixes:
aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
"This series fix a nasty issue with radeon adapters on powerpc servers,
  it's all CC'ed stable and has the relevant maintainers ack's/reviews.

  Basically, some (radeon) adapters have issues with MSI addresses above
  1T (only support 40-bits).  We had powerpc specific quirk but it only
  listed a specific revision of an adapter that we shipped with our
  machines and didn't properly handle the audio function which some
  distros enable nowadays.

  So we made the quirk generic and fixed both the graphic and audio
  drivers properly to use it.

  Without that, ppc64 server machines will crash at boot with a radeon
  adapter.

  Note: This has been brewing for a while, it just needed a last respin
  which got delayed due to us moving ozlabs to a new location in town
  and other such things taking priority"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pci: Remove unused force_32bit_msi quirk
  powerpc/pseries: Honor the generic "no_64bit_msi" flag
  powerpc/powernv: Honor the generic "no_64bit_msi" flag
  sound/radeon: Move 64-bit MSI quirk from arch to driver
  gpu/radeon: Set flag to indicate broken 64-bit MSI
  PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
  ALSA: hda - Limit 40bit DMA for AMD HDMI controllers

Merge tag 'hwmon-for-linus-v3.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull a hwmon fix from Guenter Roeck:
"Fix hwmon registration problem in g762 driver"

* tag 'hwmon-for-linus-v3.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (g762) fix call to devm_hwmon_device_register_with_groups()

Merge tag 'clk-fixes-for-linus' of https://git.linaro.org/people/mike.turquette/linux

Pull clock fixes from Mike Turquette:
"The fixes for the clock framework are all regressions in drivers, plus
  a single fix in one of the basic clock templates.  No fixes to the
  core this time around.

  As with most clock driver fixes these run the gamut from fixing a
  build warning to fixing wrecked memory timings, with a little USB
  tossed in for fun"

* tag 'clk-fixes-for-linus' of https://git.linaro.org/people/mike.turquette/linux:
  clk: pxa: fix pxa27x CCCR bit usage
  clk-divider: Fix READ_ONLY when divider > 1
  clk: qcom: Fix duplicate rbcpr clock name
  clk: at91: usb: fix at91sam9x5 recalc, round and set rate
  clk: at91: usb: fix at91rm9200 round and set rate

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

More work from Al Viro to move away from modifying iovecs
by using iov_iter instead.

Signed-off-by: David S. Miller <[email protected]>

net: Hyper-V: Deletion of an unnecessary check before the function call "vfree"

The vfree() function performs also input parameter validation.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "serial: of-serial: add PM suspend/resume support"

This reverts commit 2dea53bf57783f243c892e99c10c6921e956aa7e.

Turns out to be broken :(

Cc: Jingchang Lu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tg3: fix ring init when there are more TX than RX channels

If TX channels are set to 4 and RX channels are set to less than 4,
using ethtool -L, the driver will try to initialize more RX channels
than it has allocated, causing an oops.

This fix only initializes the RX ring if it has been allocated.

Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix possible NULL dereference in tcp_vX_send_reset()

After commit ca777eff51f7 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.

If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Daniel Borkmann <[email protected]>
Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <[email protected]>

rtlwifi: Change order in device startup

The existing order of steps when starting the PCI devices works for
2.4G devices, but fails to initialize the 5G section of the RTL8821AE
hardware.

This patch is needed to fix the regression reported in Bug #88811
(https://bugzilla.kernel.org/show_bug.cgi?id=88811).

Reported-by: Valerio Passini <[email protected]>
Tested-by: Valerio Passini <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
Cc: Valerio Passini <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

rtlwifi: rtl8821ae: Fix 5G detection problem

The changes associated with moving this driver from staging to the regular
tree missed one section setting the allowable rates for the 5GHz band.

This patch is needed to fix the regression reported in Bug #88811
(https://bugzilla.kernel.org/show_bug.cgi?id=88811).

Reported-by: Valerio Passini <[email protected]>
Tested-by: Valerio Passini <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
Cc: Valerio Passini <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"

This reverts commit 5195c14c8b27cc0b18220ddbf0e5ad3328a04187.

If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus, crashing when dropping the packet and
releasing the conntrack since golden rule is that conntracks are
always placed in any of the existing lists for traceability reasons.

Reported-by: Daniel Borkmann <[email protected]>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ipv6_vxlan_outer_udp_csum'

Alexander Duyck says:

====================
Fix outer UDP checksums for IPv6 VXLAN tunnels

In testing against an older kernel I found a couple issues in the IPv6
VXLAN tunnel checksum logic for the outer UDP checksum.

First the default transitioned from using an outer checksum to not using
one. Second, sometime after that the checksum inputs were changed
resulting the checksum not being correct if it were computed.

These two issues prevented a ping from the newer kernel to the older one.
With these two changes applied I verified I was able to send traffic over
the VXLAN tunnel to a link partner on an older kernel.

The boolean flip fix can be submitted for 3.17 stable as well since the
patch that introduced the issue was included in that kernel.
====================

Signed-off-by: David S. Miller <[email protected]>

vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]

In "vxlan: Call udp_sock_create" there was a logic error that resulted in
the default for IPv6 VXLAN tunnels going from using checksums to not using
checksums. Since there is currently no support in iproute2 for setting
these values it means that a kernel after the change cannot talk over a IPv6
VXLAN tunnel to a kernel prior the change.

Fixes: 3ee64f3 ("vxlan: Call udp_sock_create")
Cc: Tom Herbert <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ip6_udp_tunnel: Fix checksum calculation

The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses. As a result the checksum calculated is incorrect in some
cases.

Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum. This change removes the redundant assignment
to 0.

Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")
Cc: Andy Zhou <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4/cxgb4vf/csiostor: Add T4/T5 PCI ID Table

Add a new file t4_pci_id_tbl.h that contains T4/T5 PCI ID Table so that for all
drivers that uses T4/T5 PCI functions changes can be done in one place.

checkpatch.pl script reports following error, which if tried to fix ends up in
compilation error.

ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END \
+ { 0, } \
+ }

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
new file mode 100644

ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_ID_TABLE_FENTRY(devid) \
+ CH_PCI_ID_TABLE_ENTRY((devid) | \
+ ((CH_PCI_DEVICE_ID_FUNCTION) << 8)), \
+ CH_PCI_ID_TABLE_ENTRY((devid) | \
+ ((CH_PCI_DEVICE_ID_FUNCTION2) << 8))

ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END { 0, } }

ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END { 0, } }

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-timestamp: Fix a documentation typo

SOF_TIMESTAMPING_OPT_ID puts the id in ee_data, not ee_info.

Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Input: xpad - use proper endpoint type

The xpad wireless endpoint is not a bulk endpoint on my devices, but
rather an interrupt one, so the USB core complains when it is submitted.
I'm guessing that the author really did mean that this should be an
interrupt urb, but as there are a zillion different xpad devices out
there, let's cover out bases and handle both bulk and interrupt
endpoints just as easily.

Signed-off-by: "Pierre-Loup A. Griffais" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>

Input: elantech - trust firmware about trackpoint presence

Only try to parse data as coming from trackpoint if firmware told us that
trackpoint is present.

Fixes commit caeb0d37fa3e387eb0dd22e5d497523c002033d1

Reported-and-tested-by: Marcus Overhagen <[email protected]>
Reported-and-tested-by: Anders Kaseorg <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>

usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000

This wireless mouse receiver needs a reset-resume quirk to properly come
out of reset.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1165206
Signed-off-by: Hans de Goede <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/ping: handle protocol mismatching scenario

ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's
sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
the fix is to "continue" the searching, if no matching, return NULL.

Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: James Morris <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Jane Zhou <[email protected]>
Signed-off-by: Yiwei Zhao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smsc911x: Add minimal runtime PM support

Add minimal runtime PM support (enable on probe, disable on remove), to
ensure proper operation with a parent device that uses runtime PM.

This is needed on systems where the external bus controller module of
the SoC is contained in a PM domain and/or has a gateable functional
clock. In such cases, before accessing any device connected to the
external bus, the PM domain must be powered up, and/or the functional
clock must be enabled, which is typically handled through runtime PM by
the bus controller driver.

An example of this is the kzm9g development board, where an smsc9220
Ethernet controller is connected to the Bus State Controller (BSC) of a
Renesas sh73a0 SoC.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: add tipc_netlink.h to uapi Kbuild

tipc_netlink.h is the user-space header for the new netlink api. It
was accidentally left out of the uapi Kbuild list when the api was
added.

Signed-off-by: Richard Alpe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Check for count mismatch while iterating in selftest

Verify whether both the lock and RCU protected iterators see all
test entries before and after expanding and shrinking has been
performed. Also verify whether the number of entries in the hashtable
remains stable during expansion and shrinking.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

af_packet: fix sparse warning

af_packet produces lots of these:
net/packet/af_packet.c:384:39: warning: incorrect type in return expression (different modifiers)
net/packet/af_packet.c:384:39: expected struct page [pure] *
net/packet/af_packet.c:384:39: got struct page *

this seems to be because sparse does not realize that _pure
refers to function, not the returned pointer.

Tweak code slightly to avoid the warning.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

xen-netback: do not report success if backend_create_xenvif() fails

If xenvif_alloc() or xenbus_scanf() fail in backend_create_xenvif(),
xenbus is left in offline mode but netback_probe() reports success.

The patch implements propagation of error code for backend_create_xenvif().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <[email protected]>
Acked-by: Wei Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tc_vlan: fix type of tcfv_push_vid

Should be u16. So fix it to kill the sparse warning.

Fixes: c7e2b9689ef8136 "sched: introduce vlan action"
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: gre: fix wrong skb->protocol in WCCP

When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov <[email protected]>
Signed-off-by: Yuri Chislov <[email protected]>
Tested-by: Yuri Chislov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: fix sparse warnings in new nl api

Fix sparse warnings about non-static declaration of static functions
in the new tipc netlink API.

Signed-off-by: Richard Alpe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
netfilter/ipvs updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:

1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
   patches from Arturo Borrero.

2) Introduce the nf_tables redir expression, from Arturo Borrero.

3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
   Patch from Alex Gartrell.

4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
   from Marcelo Leitner.

5) Add some extra validation when registering logger families, also
   from Marcelo.

6) Some spelling cleanups from stephen hemminger.

7) Fix sparse warning in nf_logger_find_get().

8) Add cgroup support to nf_tables meta, patch from Ana Rey.

9) A Kconfig fix for the new redir expression and fix sparse warnings in
   the new redir expression.

10) Fix several sparse warnings in the netfilter tree, from
    Florian Westphal.

11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
    nothing when this situation occurs.

12) Add conntrack zone support to xt_connlimit, again from Florian.

13) Add netnamespace support to the h323 conntrack helper, contributed
    by Vasily Averin.

14) Remove unnecessary nul-pointer checks before free_percpu() and
    module_put(), from Markus Elfring.

15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================

Signed-off-by: David S. Miller <[email protected]>

ipvlan: Initial check-in of the IPVLAN driver.

This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.

This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.

The device operates in two different modes and the difference
in these two modes in primarily in the TX side.

(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.

RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.

(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.

RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.

The devices can be added using the "ip" command from the iproute2
package -

ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]

Signed-off-by: Mahesh Bandewar <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Laurent Chavey <[email protected]>
Cc: Tim Hockin <[email protected]>
Cc: Brandon Philips <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

8139too: The maximum MTU should allow for VLAN headers

As pointed out by Ben Hutchings drivers that allow using VLAN have to
provide enough headroom for the VLAN tags.

Signed-off-by: Alban Bedel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fec: init maximum receive buffer size for ring1 and ring2

i.MX6SX fec support three rx ring1, the current driver lost to init
ring1 and ring2 maximum receive buffer size, that cause receving
frame date length error. The driver reports "rcv is not +last" error
log in user case.

Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'iwlwifi-for-john-2014-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

Emmanuel Grumbach <[email protected]> says:

"Not all the firmware know how to handle the HOT_SPOT_CMD.
Make sure that the firmware will know this command before
sending it. This avoids a firmware crash."

Signed-off-by: John W. Linville <[email protected]>

rds: switch rds_message_copy_from_user() to iov_iter

Signed-off-by: Al Viro <[email protected]>

rds: switch ->inc_copy_to_user() to passing iov_iter

instances get considerably simpler from that...

Signed-off-by: Al Viro <[email protected]>

[atm] switch vcc_sendmsg() to copy_from_iter()

... and make it handle multi-segment iovecs - deals with that
"fix this later" issue for free. A bit of shame, really - it
had been there since 2.3.15pre3 when the whole thing went into the
tree, practically a historical artefact by now...

Signed-off-by: Al Viro <[email protected]>

vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr

Signed-off-by: Al Viro <[email protected]>

tipc_msg_build(): pass msghdr instead of its ->msg_iov

Signed-off-by: Al Viro <[email protected]>

tipc_sendmsg(): pass msghdr instead of its ->msg_iov

Signed-off-by: Al Viro <[email protected]>

switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter

Signed-off-by: Al Viro <[email protected]>

switch AF_PACKET and AF_UNIX to skb_copy_datagram_from_iter()

... and kill skb_copy_datagram_iovec()

Signed-off-by: Al Viro <[email protected]>

kill zerocopy_sg_from_iovec()

no users left

Signed-off-by: Al Viro <[email protected]>

{macvtap,tun}_get_user(): switch to iov_iter

allows to switch macvtap and tun from ->aio_write() to ->write_iter()

Signed-off-by: Al Viro <[email protected]>

new helpers: skb_copy_datagram_from_iter() and zerocopy_sg_from_iter()

Signed-off-by: Al Viro <[email protected]>

switch macvtap to ->read_iter()

Signed-off-by: Al Viro <[email protected]>

switch drivers/net/tun.c to ->read_iter()

Signed-off-by: Al Viro <[email protected]>

new helper: memcpy_to_msg()

Signed-off-by: Al Viro <[email protected]>

switch ipxrtr_route_packet() from iovec to msghdr

Signed-off-by: Al Viro <[email protected]>

new helper: memcpy_from_msg()

Signed-off-by: Al Viro <[email protected]>

new helper: skb_copy_and_csum_datagram_msg()

Signed-off-by: Al Viro <[email protected]>

MIPS: Fix address type used for early memory detection.

In 'early_parse_mem' the data type used for the start
and size of a memory region specified on the command line
is incorrect. If 64-bit addressing is used, the value
gets truncated.

Signed-off-by: Steven J. Hill <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8456/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: Kconfig: Don't allow both microMIPS and SmartMIPS to be selected.

microMIPS and SmartMIPS can't be used together. This fixes the
following build problem:

Warning: the 32-bit microMIPS architecture does not support the `smartmips' extension
arch/mips/kernel/entry.S:90: Error: unrecognized opcode `mtlhx $24'
[...]
arch/mips/kernel/entry.S:109: Error: unrecognized opcode `mtlhx $24'

Signed-off-by: Markos Chandras <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7421/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: kernel: cps-vec: Set ISA level to mips32r2 for the MIPS MT ASE

Fixes the following build warnings:
arch/mips/kernel/cps-vec.S: Assembler messages:
arch/mips/kernel/cps-vec.S:228: Warning: the `mt' extension requires
MIPS32 revision 2 or greater
[...]
arch/mips/kernel/cps-vec.S: Assembler messages:
arch/mips/kernel/cps-vec.S:345: Warning: the `mt' extension requires
MIPS32 revision 2 or greater

Signed-off-by: Markos Chandras <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7355/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: Netlogic: handle modular AHCI builds

Commits a951440971d0 ("MIPS: Netlogic: Support for XLP3XX on-chip SATA")
and fedfcb1137d2 ("MIPS: Netlogic: XLP9XX on-chip SATA support") added
ahci-init and ahci-init-xlp2 as objects to build when CONFIG_SATA_AHCI
is enabled.

If CONFIG_SATA_AHCI is made modular, these two files will also get built
as modules (obj-m), which will result in the following linking failure:

ERROR: "nlm_set_pic_extra_ack" [arch/mips/netlogic/xlp/ahci-init.ko]
undefined!
ERROR: "nlm_io_base" [arch/mips/netlogic/xlp/ahci-init.ko] undefined!
ERROR: "nlm_nodes" [arch/mips/netlogic/xlp/ahci-init-xlp2.ko] undefined!
ERROR: "nlm_set_pic_extra_ack"
[arch/mips/netlogic/xlp/ahci-init-xlp2.ko] undefined!
ERROR: "xlp_socdev_to_node" [arch/mips/netlogic/xlp/ahci-init-xlp2.ko]
undefined!
ERROR: "nlm_io_base" [arch/mips/netlogic/xlp/ahci-init-xlp2.ko]
undefined!

Just check whether CONFIG_SATA_AHCI is defined for this build, and if
that is the case, add these objects to the list of built-in object
files.

Signed-off-by: Florian Fainelli <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7855/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: Netlogic: handle modular USB case

Commit 1004165f346a ("MIPS: Netlogic: USB support for XLP") and then
commit 9eac3591e78b ("MIPS: Netlogic: Add support for USB on XLP2xx")
added usb-init and usb-init-xlp2 as objects to build when CONFIG_USB is
enabled.

If CONFIG_USB is made modular, these two files will also get built as
modules (obj-m), which will result in the following linking failure:

ERROR: "nlm_io_base" [arch/mips/netlogic/xlp/usb-init.ko] undefined!
ERROR: "nlm_nodes" [arch/mips/netlogic/xlp/usb-init-xlp2.ko] undefined!
ERROR: "nlm_set_pic_extra_ack" [arch/mips/netlogic/xlp/usb-init-xlp2.ko]
undefined!
ERROR: "xlp_socdev_to_node" [arch/mips/netlogic/xlp/usb-init-xlp2.ko]
undefined!
ERROR: "nlm_io_base" [arch/mips/netlogic/xlp/usb-init-xlp2.ko]
undefined!

Just check whether CONFIG_USB is defined for this build, and if that is
the case, add these objects to the list of built-in object files.

Signed-off-by: Florian Fainelli <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7854/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: Loongson: Make platform serial setup always built-in.

If SERIAL_8250 is compiled as a module, the platform specific setup
for Loongson will be a module too, and it will not work very well.
At least on Loongson 3 it will trigger a build failure,
since loongson_sysconf is not exported to modules.

Fix by making the platform specific serial code always built-in.

Signed-off-by: Aaro Koskinen <[email protected]>
Reported-by: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Markos Chandras <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/8533/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: fix EVA & non-SMP non-FPU FP context signal handling

The save_fp_context & restore_fp_context pointers were being assigned
to the wrong variables if either:

  - The kernel is configured for UP & runs on a system without an FPU,
    since b2ead5282885 "MIPS: Move & rename
    fpu_emulator_{save,restore}_context".

  - The kernel is configured for EVA, since ca750649e08c "MIPS: kernel:
    signal: Prevent save/restore FPU context in user memory".

This would lead to FP context being clobbered incorrectly when setting
up a sigcontext, then the garbage values being saved uselessly when
returning from the signal.

Fix by swapping the pointer assignments appropriately.

Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected] # v3.15+
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8230/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: cpu-probe: Set the FTLB probability bit on supported cores

Make use of the Config6/FLTBP bit to set the probability of a TLBWR
instruction to hit the FTLB or the VTLB. A value of 0 (which may be
the default value on certain cores, such as proAptiv or P5600)
means that a TLBWR instruction will never hit the VTLB which
leads to performance limitations since it effectively decreases
the number of available TLB slots.

Signed-off-by: Markos Chandras <[email protected]>
Reviewed-by: James Hogan <[email protected]>
Cc: <[email protected]> # v3.15+
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8368/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: BMIPS: Fix ".previous without corresponding .section" warnings

Commit 078a55fc824c1 ("Delete __cpuinit/__CPUINIT usage from MIPS code")
removed our __CPUINIT directives, so now the ".previous" directives
are superfluous. Remove them.

Signed-off-by: Kevin Cernekee <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8156/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: uaccess.h: Fix strnlen_user comment.

Signed-off-by: Ralf Baechle <[email protected]>

MIPS: r4kcache: Add EVA case for protected_writeback_dcache_line

Commit de8974e3f76c0 ("MIPS: asm: r4kcache: Add EVA cache flushing
functions") added cache function for EVA using the cachee instruction.
However, it didn't add a case for the protected_writeback_dcache_line.
mips_dsemul() calls r4k_flush_cache_sigtramp() which in turn uses
the protected_writeback_dcache_line() to flush the trampoline code
back to memory. This used the wrong "cache" instruction leading to
random userland crashes on non-FPU cores.

Signed-off-by: Markos Chandras <[email protected]>
Cc: <[email protected]> # v3.15+
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8331/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: Fix info about plat_setup in arch_mem_init comment

Signed-off-by: Rafał Miłecki <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7607/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: rtlx: Remove KERN_DEBUG from pr_debug() arguments in rtlx.c

Signed-off-by: Masanari Iida <[email protected]>
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/7938/
Signed-off-by: Ralf Baechle <[email protected]>

MIPS: SEAD3: Fix LED device registration.

This isn't a module and shouldn't be one.

Signed-off-by: Ralf Baechle <[email protected]>
Cc: Markos Chandras <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/8202/

MIPS: Fix a copy & paste error in unistd.h

Commit 5df4c8dbbc (MIPS: Wire up bpf syscall.) break the N32 build
because of a copy & paste error.

Signed-off-by: Huacai Chen <[email protected]>
Cc: John Crispin <[email protected]>
Cc: Steven J. Hill <[email protected]>
Cc: [email protected]
Cc: Fuxin Zhang <[email protected]>
Cc: Zhangjin Wu <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/8390/
Signed-off-by: Ralf Baechle <[email protected]>

powerpc/pci: Remove unused force_32bit_msi quirk

This is now fully replaced with the generic "no_64bit_msi" one
that is set by the respective drivers directly.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc/pseries: Honor the generic "no_64bit_msi" flag

Instead of the arch specific quirk which we are deprecating

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
CC: <[email protected]>

powerpc/powernv: Honor the generic "no_64bit_msi" flag

Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
CC: <[email protected]>

sound/radeon: Move 64-bit MSI quirk from arch to driver

A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

We now have a generic quirk in the PCI code. We should set it
appropriately for all radeon's from the audio driver.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: Takashi Iwai <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
CC: <[email protected]>

gpu/radeon: Set flag to indicate broken 64-bit MSI

Some radeon ASICs don't support all 64 address bits of MSIs despite
advertising support for 64-bit MSIs in their configuration space.

This breaks on systems such as IBM POWER7/8, where 64-bit MSIs can
be assigned with some of the high address bits set.

This makes use of the newly introduced "no_64bit_msi" flag in structure
pci_dev to allow the MSI allocation code to fallback to 32-bit MSIs
on those adapters.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
CC: <[email protected]>
---

Adding Alex's review tag. Patch to the driver is identical to the
reviewed one, I dropped the arch/powerpc hunk rewrote the subject
and cset comment.

PCI/MSI: Add device flag indicating that 64-bit MSIs don't work

This can be set by quirks/drivers to be used by the architecture code
that assigns the MSI addresses.

We additionally add verification in the core MSI code that the values
assigned by the architecture do satisfy the limitation in order to fail
gracefully if they don't (ie. the arch hasn't been updated to deal with
that quirk yet).

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
CC: <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>

ALSA: hda - Limit 40bit DMA for AMD HDMI controllers

AMD/ATI HDMI controller chip models, we already have a filter to lower
to 32bit DMA, but the rest are supposed to be working with 64bit
although the hardware doesn't really work with 63bit but only with 40
or 48bit DMA. In this patch, we take 40bit DMA for safety for the
AMD/ATI controllers as the graphics drivers does.

Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
CC: <[email protected]>

ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic

Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():

        hlist_for_each_entry_rcu(t, head, hash_node) {
                if (local == t->parms.iph.saddr &&
                    remote == t->parms.iph.daddr &&
                    link == t->parms.link &&
==>                 type == t->dev->type &&
                    ip_tunnel_key_match(&t->parms, flags, key))
                        break;
        }

the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
[ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
[ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
....

modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti

do that one or more times, kernel will panic.

fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

enic: use netdev_rss_key_fill() helper

Use of well known RSS key might increase attack surface.

Switch to a random one, using generic helper so that all
ports share a common key.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Christian Benvenuti <[email protected]>
Cc: Govindarajulu Varadarajan <[email protected]>
Cc: Sujith Sankar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: coding style improvements (remove assignment in if statements)

This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.

No changes to the resultant object files result as determined by objdiff.

Signed-off-by: Ian Morris <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Linux 3.18-rc6

uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME

x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns. I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

sched: Provide update_curr callbacks for stop/idle scheduling classes

Chris bisected a NULL pointer deference in task_sched_runtime() to
commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.

Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine.  He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.

What's interesting is that, the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.

While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit in
question, the author obviously forgot that there are other code paths
which invoke task_sched_runtime()

do_task_stat(()
thread_group_cputime_adjusted()
   thread_group_cputime()
     task_cputime()
       task_sched_runtime()
        if (task_current(rq, p) && task_on_rq_queued(p)) {
          update_rq_clock(rq);
          up->sched_class->update_curr(rq);
        }

If the stats are read for a stomp machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL pointer
of stop_task->update_curr.  Ooops.

Chris observation that this happens faster when he runs the fork bomb
makes sense as the fork bomb will kick migration threads more often so
the probability to hit the issue will increase.

Add the missing update_curr callbacks to the scheduler classes stop_task
and idle_task.  While idle tasks cannot be monitored via /proc we have
other means to hit the idle case.

Fixes: 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason <[email protected]>
Reported-and-tested-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'x86-traps' (trap handling from Andy Lutomirski)

Merge x86-64 iret fixes from Andy Lutomirski:
"This addresses the following issues:

   - an unrecoverable double-fault triggerable with modify_ldt.
   - invalid stack usage in espfix64 failed IRET recovery from IST
     context.
   - invalid stack usage in non-espfix64 failed IRET recovery from IST
     context.

  It also makes a good but IMO scary change: non-espfix64 failed IRET
  will now report the correct error.  Hopefully nothing depended on the
  old incorrect behavior, but maybe Wine will get confused in some
  obscure corner case"

* emailed patches from Andy Lutomirski <[email protected]>:
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C

x86_64, traps: Rework bad_iret

It's possible for iretq to userspace to fail.  This can happen because
of a bad CS, SS, or RIP.

Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace.  To make this work, there's
an extra fixup to fudge the gs base into a usable state.

This is suboptimal because it loses the original exception.  It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with.  For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.

This patch throws out bad_iret entirely.  As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C.  It's should be clearer and more correct.

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

x86_64, traps: Stop using IST for #SS

On a 32-bit kernel, this has no effect, since there are no IST stacks.

On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code. The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.

This fixes a bug in which the espfix64 code mishandles a stack segment
violation.

This saves 4k of memory per CPU and a tiny bit of code.

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C

There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly. Move it to C.

This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.

Fixes: 3891a04aafd668686239349ea58f3314ea2af86b
Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>