Tom Herbert [Tue, 25 Nov 2014 19:21:20 +0000 (11:21 -0800)]
gue: Call remcsum_adjust
Change remote checksum offload to call remcsum_adjust. This also
eliminates the optimization to skip an IP header as part of the
adjustment (really does not seem to be much of a win).
Tom Herbert [Tue, 25 Nov 2014 19:21:19 +0000 (11:21 -0800)]
net: Add remcsum_adjust as common function for remote checksum offload
This function does the work to update a checksum field as part of
remote checksum offload.
remcsum_adjust does the following:
1) Subtract out the calculated checksum from the beginning of the
packet (ptr arg) to the start offset.
2) Adjust the checksum field indicated by offset based on the modified
checksum value from above step.
3) Return the difference in the old checksum field value and the
new one. The caller will use this to update skb->csum and NAPI csum.
Jack Morgenstein [Tue, 25 Nov 2014 09:54:31 +0000 (11:54 +0200)]
net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
Some VF drivers use the upper byte of "param1" (the qp count field)
in mlx4_qp_reserve_range() to pass flags which are used to optimize
the range allocation.
Under the current code, if any of these flags are set, the 32-bit
count field yields a count greater than 2^24, which is out of range,
and this VF fails.
As these flags represent a "best-effort" allocation hint anyway, they may
safely be ignored. Therefore, the PF driver may simply mask out the bits.
Fixes: c82e9aa0a8 "mlx4_core: resource tracking for HCA resources used by guests" Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
- first patch fixes an issue on the error path of the driver where we could
have left some of our registers mapped
- second patch enforces the use of a software reset of the switch to guarantee
the HW is in a consistent state prior to software initialization
====================
Florian Fainelli [Wed, 26 Nov 2014 02:08:49 +0000 (18:08 -0800)]
net: dsa: bcm_sf2: reset switch prior to initialization
Our boot agent may have left the switch in an certain configuration
state, make sure we issue a software reset prior to configuring the
switch in order to ensure the HW is in a consistent state, in particular
transmit queues and internal buffers.
Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ard Biesheuvel [Mon, 10 Nov 2014 08:33:56 +0000 (09:33 +0100)]
kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()
This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.
The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.
However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.
Ard Biesheuvel [Mon, 10 Nov 2014 08:33:55 +0000 (09:33 +0100)]
arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()
Instead of using kvm_is_mmio_pfn() to decide whether a host region
should be stage 2 mapped with device attributes, add a new static
function kvm_is_device_pfn() that disregards RAM pages with the
reserved bit set, as those should usually not be mapped as device
memory.
arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create()
If we detect another vCPU is running we just exit and return 0 as if we
succesfully created the VGIC, but the VGIC wouldn't actual be created.
This shouldn't break in-kernel behavior because the kernel will not
observe the failed the attempt to create the VGIC, but userspace could
be rightfully confused.
Christoffer Dall [Wed, 19 Nov 2014 11:23:54 +0000 (11:23 +0000)]
arm64: KVM: Handle traps of ICC_SRE_EL1 as RAZ/WI
When running on a system with a GICv3, we currenly don't allow the guest
to access the system register interface of the GICv3. We do this by
clearing the ICC_SRE_EL2.Enable, which causes all guest accesses to
ICC_SRE_EL1 to trap to EL2 and causes all guest accesses to other ICC_
registers to cause an undefined exception in the guest.
However, we currently don't handle the trap of guest accesses to
ICC_SRE_EL1 and will spill out a warning. The trap just needs to handle
the access as RAZ/WI, and a guest that tries to prod this register and
set ICC_SRE_EL1.SRE=1, must read back the value (which Linux already
does) to see if it succeeded, and will thus observe that ICC_SRE_EL1.SRE
was not set.
Add the simple trap handler in the sorted table of the system registers.
Mark Rutland [Tue, 28 Oct 2014 19:36:45 +0000 (19:36 +0000)]
arm64: KVM: fix unmapping with 48-bit VAs
Currently if using a 48-bit VA, tearing down the hyp page tables (which
can happen in the absence of a GICH or GICV resource) results in the
rather nasty splat below, evidently becasue we access a table that
doesn't actually exist.
Commit 38f791a4e499792e (arm64: KVM: Implement 48 VA support for KVM EL2
and Stage-2) added a pgd_none check to __create_hyp_mappings to account
for the additional level of tables, but didn't add a corresponding check
to unmap_range, and this seems to be the source of the problem.
This patch adds the missing pgd_none check, ensuring we don't try to
access tables that don't exist.
Akinobu Mita [Tue, 18 Nov 2014 14:02:46 +0000 (23:02 +0900)]
ufs: fix NULL dereference when no regulators are defined
If no voltage supply regulators are defined for the UFS devices (assumed
they are always-on), ufshcd_config_vreg_load() can be called on
suspend/resume paths with vreg == NULL as hba->vreg_info.vcc* equal to
NULL, and it causes NULL pointer dereference.
This fixes it by making ufshcd_config_vreg_{h,l}pm noop when no regulators
are defined.
Akinobu Mita [Mon, 24 Nov 2014 05:24:18 +0000 (14:24 +0900)]
ufs: ensure clk gating work is finished before module unloading
When dynamic clk gating feature is enabled, delayed workqueue machanism
is used in order to detect certain period of inactivity. But there is no
guarantee that scheduled gating work is completed before module unloading.
So it can cause kernel crash by accessing memory after it was freed.
Fix it by cancelling clk gating and ungating works and ensure that its
execution is finished before module unloading.
Linus Torvalds [Wed, 26 Nov 2014 03:05:41 +0000 (19:05 -0800)]
Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
"These fix one mishandling of the case when security labels are
configured out, and two races in the 4.1 backchannel code"
* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix slot wake up race in the nfsv4.1 callback code
SUNRPC: Fix locking around callback channel reply receive
nfsd: correctly define v4.2 support attributes
Linus Torvalds [Wed, 26 Nov 2014 02:43:21 +0000 (18:43 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"This series fix a nasty issue with radeon adapters on powerpc servers,
it's all CC'ed stable and has the relevant maintainers ack's/reviews.
Basically, some (radeon) adapters have issues with MSI addresses above
1T (only support 40-bits). We had powerpc specific quirk but it only
listed a specific revision of an adapter that we shipped with our
machines and didn't properly handle the audio function which some
distros enable nowadays.
So we made the quirk generic and fixed both the graphic and audio
drivers properly to use it.
Without that, ppc64 server machines will crash at boot with a radeon
adapter.
Note: This has been brewing for a while, it just needed a last respin
which got delayed due to us moving ozlabs to a new location in town
and other such things taking priority"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pci: Remove unused force_32bit_msi quirk
powerpc/pseries: Honor the generic "no_64bit_msi" flag
powerpc/powernv: Honor the generic "no_64bit_msi" flag
sound/radeon: Move 64-bit MSI quirk from arch to driver
gpu/radeon: Set flag to indicate broken 64-bit MSI
PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
ALSA: hda - Limit 40bit DMA for AMD HDMI controllers
Linus Torvalds [Wed, 26 Nov 2014 02:11:15 +0000 (18:11 -0800)]
Merge tag 'hwmon-for-linus-v3.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull a hwmon fix from Guenter Roeck:
"Fix hwmon registration problem in g762 driver"
* tag 'hwmon-for-linus-v3.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (g762) fix call to devm_hwmon_device_register_with_groups()
Linus Torvalds [Wed, 26 Nov 2014 01:52:56 +0000 (17:52 -0800)]
Merge tag 'clk-fixes-for-linus' of https://git.linaro.org/people/mike.turquette/linux
Pull clock fixes from Mike Turquette:
"The fixes for the clock framework are all regressions in drivers, plus
a single fix in one of the basic clock templates. No fixes to the
core this time around.
As with most clock driver fixes these run the gamut from fixing a
build warning to fixing wrecked memory timings, with a little USB
tossed in for fun"
* tag 'clk-fixes-for-linus' of https://git.linaro.org/people/mike.turquette/linux:
clk: pxa: fix pxa27x CCCR bit usage
clk-divider: Fix READ_ONLY when divider > 1
clk: qcom: Fix duplicate rbcpr clock name
clk: at91: usb: fix at91sam9x5 recalc, round and set rate
clk: at91: usb: fix at91rm9200 round and set rate
tg3: fix ring init when there are more TX than RX channels
If TX channels are set to 4 and RX channels are set to less than 4,
using ethtool -L, the driver will try to initialize more RX channels
than it has allocated, causing an oops.
This fix only initializes the RX ring if it has been allocated.
Eric Dumazet [Tue, 25 Nov 2014 15:40:04 +0000 (07:40 -0800)]
tcp: fix possible NULL dereference in tcp_vX_send_reset()
After commit ca777eff51f7 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.
If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191 Reported-by: Jaša Bartelj <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Daniel Borkmann <[email protected]> Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode") Signed-off-by: David S. Miller <[email protected]>
If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus, crashing when dropping the packet and
releasing the conntrack since golden rule is that conntracks are
always placed in any of the existing lists for traceability reasons.
Reported-by: Daniel Borkmann <[email protected]> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841 Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Tue, 25 Nov 2014 19:12:36 +0000 (14:12 -0500)]
Merge branch 'ipv6_vxlan_outer_udp_csum'
Alexander Duyck says:
====================
Fix outer UDP checksums for IPv6 VXLAN tunnels
In testing against an older kernel I found a couple issues in the IPv6
VXLAN tunnel checksum logic for the outer UDP checksum.
First the default transitioned from using an outer checksum to not using
one. Second, sometime after that the checksum inputs were changed
resulting the checksum not being correct if it were computed.
These two issues prevented a ping from the newer kernel to the older one.
With these two changes applied I verified I was able to send traffic over
the VXLAN tunnel to a link partner on an older kernel.
The boolean flip fix can be submitted for 3.17 stable as well since the
patch that introduced the issue was included in that kernel.
====================
Alexander Duyck [Tue, 25 Nov 2014 04:08:38 +0000 (20:08 -0800)]
vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]
In "vxlan: Call udp_sock_create" there was a logic error that resulted in
the default for IPv6 VXLAN tunnels going from using checksums to not using
checksums. Since there is currently no support in iproute2 for setting
these values it means that a kernel after the change cannot talk over a IPv6
VXLAN tunnel to a kernel prior the change.
Alexander Duyck [Tue, 25 Nov 2014 04:08:32 +0000 (20:08 -0800)]
ip6_udp_tunnel: Fix checksum calculation
The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses. As a result the checksum calculated is incorrect in some
cases.
Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum. This change removes the redundant assignment
to 0.
Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.") Cc: Andy Zhou <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Add a new file t4_pci_id_tbl.h that contains T4/T5 PCI ID Table so that for all
drivers that uses T4/T5 PCI functions changes can be done in one place.
checkpatch.pl script reports following error, which if tried to fix ends up in
compilation error.
ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END \
+ { 0, } \
+ }
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
new file mode 100644
ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_ID_TABLE_FENTRY(devid) \
+ CH_PCI_ID_TABLE_ENTRY((devid) | \
+ ((CH_PCI_DEVICE_ID_FUNCTION) << 8)), \
+ CH_PCI_ID_TABLE_ENTRY((devid) | \
+ ((CH_PCI_DEVICE_ID_FUNCTION2) << 8))
ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END { 0, } }
ERROR: Macros with complex values should be enclosed in parentheses
+#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END { 0, } }
The xpad wireless endpoint is not a bulk endpoint on my devices, but
rather an interrupt one, so the USB core complains when it is submitted.
I'm guessing that the author really did mean that this should be an
interrupt urb, but as there are a zillion different xpad devices out
there, let's cover out bases and handle both bulk and interrupt
endpoints just as easily.
Jane Zhou [Mon, 24 Nov 2014 19:44:08 +0000 (11:44 -0800)]
net/ping: handle protocol mismatching scenario
ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's
sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
the fix is to "continue" the searching, if no matching, return NULL.
Add minimal runtime PM support (enable on probe, disable on remove), to
ensure proper operation with a parent device that uses runtime PM.
This is needed on systems where the external bus controller module of
the SoC is contained in a PM domain and/or has a gateable functional
clock. In such cases, before accessing any device connected to the
external bus, the PM domain must be powered up, and/or the functional
clock must be enabled, which is typically handled through runtime PM by
the bus controller driver.
An example of this is the kzm9g development board, where an smsc9220
Ethernet controller is connected to the Bus State Controller (BSC) of a
Renesas sh73a0 SoC.
Thomas Graf [Mon, 24 Nov 2014 11:37:58 +0000 (12:37 +0100)]
rhashtable: Check for count mismatch while iterating in selftest
Verify whether both the lock and RCU protected iterators see all
test entries before and after expanding and shrinking has been
performed. Also verify whether the number of entries in the hashtable
remains stable during expansion and shrinking.
====================
netfilter/ipvs updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:
1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
patches from Arturo Borrero.
2) Introduce the nf_tables redir expression, from Arturo Borrero.
3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
Patch from Alex Gartrell.
4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
from Marcelo Leitner.
5) Add some extra validation when registering logger families, also
from Marcelo.
6) Some spelling cleanups from stephen hemminger.
7) Fix sparse warning in nf_logger_find_get().
8) Add cgroup support to nf_tables meta, patch from Ana Rey.
9) A Kconfig fix for the new redir expression and fix sparse warnings in
the new redir expression.
10) Fix several sparse warnings in the netfilter tree, from
Florian Westphal.
11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
nothing when this situation occurs.
12) Add conntrack zone support to xt_connlimit, again from Florian.
13) Add netnamespace support to the h323 conntrack helper, contributed
by Vasily Averin.
14) Remove unnecessary nul-pointer checks before free_percpu() and
module_put(), from Markus Elfring.
15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================
Mahesh Bandewar [Mon, 24 Nov 2014 07:07:46 +0000 (23:07 -0800)]
ipvlan: Initial check-in of the IPVLAN driver.
This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.
This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.
The device operates in two different modes and the difference
in these two modes in primarily in the TX side.
(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.
RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.
(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.
RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.
The devices can be added using the "ip" command from the iproute2
package -
ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
Nimrod Andy [Sun, 23 Nov 2014 09:23:06 +0000 (17:23 +0800)]
net: fec: init maximum receive buffer size for ring1 and ring2
i.MX6SX fec support three rx ring1, the current driver lost to init
ring1 and ring2 maximum receive buffer size, that cause receving
frame date length error. The driver reports "rcv is not +last" error
log in user case.
"Not all the firmware know how to handle the HOT_SPOT_CMD.
Make sure that the firmware will know this command before
sending it. This avoids a firmware crash."
Al Viro [Thu, 20 Nov 2014 12:01:29 +0000 (07:01 -0500)]
[atm] switch vcc_sendmsg() to copy_from_iter()
... and make it handle multi-segment iovecs - deals with that
"fix this later" issue for free. A bit of shame, really - it
had been there since 2.3.15pre3 when the whole thing went into the
tree, practically a historical artefact by now...
Steven J. Hill [Thu, 13 Nov 2014 15:52:00 +0000 (09:52 -0600)]
MIPS: Fix address type used for early memory detection.
In 'early_parse_mem' the data type used for the start
and size of a memory region specified on the command line
is incorrect. If 64-bit addressing is used, the value
gets truncated.
MIPS: Kconfig: Don't allow both microMIPS and SmartMIPS to be selected.
microMIPS and SmartMIPS can't be used together. This fixes the
following build problem:
Warning: the 32-bit microMIPS architecture does not support the `smartmips' extension
arch/mips/kernel/entry.S:90: Error: unrecognized opcode `mtlhx $24'
[...]
arch/mips/kernel/entry.S:109: Error: unrecognized opcode `mtlhx $24'
Commits a951440971d0 ("MIPS: Netlogic: Support for XLP3XX on-chip SATA")
and fedfcb1137d2 ("MIPS: Netlogic: XLP9XX on-chip SATA support") added
ahci-init and ahci-init-xlp2 as objects to build when CONFIG_SATA_AHCI
is enabled.
If CONFIG_SATA_AHCI is made modular, these two files will also get built
as modules (obj-m), which will result in the following linking failure:
Commit 1004165f346a ("MIPS: Netlogic: USB support for XLP") and then
commit 9eac3591e78b ("MIPS: Netlogic: Add support for USB on XLP2xx")
added usb-init and usb-init-xlp2 as objects to build when CONFIG_USB is
enabled.
If CONFIG_USB is made modular, these two files will also get built as
modules (obj-m), which will result in the following linking failure:
Aaro Koskinen [Wed, 19 Nov 2014 23:05:38 +0000 (01:05 +0200)]
MIPS: Loongson: Make platform serial setup always built-in.
If SERIAL_8250 is compiled as a module, the platform specific setup
for Loongson will be a module too, and it will not work very well.
At least on Loongson 3 it will trigger a build failure,
since loongson_sysconf is not exported to modules.
Fix by making the platform specific serial code always built-in.
Paul Burton [Tue, 28 Oct 2014 11:25:51 +0000 (11:25 +0000)]
MIPS: fix EVA & non-SMP non-FPU FP context signal handling
The save_fp_context & restore_fp_context pointers were being assigned
to the wrong variables if either:
- The kernel is configured for UP & runs on a system without an FPU,
since b2ead5282885 "MIPS: Move & rename
fpu_emulator_{save,restore}_context".
- The kernel is configured for EVA, since ca750649e08c "MIPS: kernel:
signal: Prevent save/restore FPU context in user memory".
This would lead to FP context being clobbered incorrectly when setting
up a sigcontext, then the garbage values being saved uselessly when
returning from the signal.
Fix by swapping the pointer assignments appropriately.
Markos Chandras [Mon, 10 Nov 2014 12:25:34 +0000 (12:25 +0000)]
MIPS: cpu-probe: Set the FTLB probability bit on supported cores
Make use of the Config6/FLTBP bit to set the probability of a TLBWR
instruction to hit the FTLB or the VTLB. A value of 0 (which may be
the default value on certain cores, such as proAptiv or P5600)
means that a TLBWR instruction will never hit the VTLB which
leads to performance limitations since it effectively decreases
the number of available TLB slots.
Kevin Cernekee [Tue, 21 Oct 2014 04:27:51 +0000 (21:27 -0700)]
MIPS: BMIPS: Fix ".previous without corresponding .section" warnings
Commit 078a55fc824c1 ("Delete __cpuinit/__CPUINIT usage from MIPS code")
removed our __CPUINIT directives, so now the ".previous" directives
are superfluous. Remove them.
Markos Chandras [Wed, 5 Nov 2014 08:25:37 +0000 (08:25 +0000)]
MIPS: r4kcache: Add EVA case for protected_writeback_dcache_line
Commit de8974e3f76c0 ("MIPS: asm: r4kcache: Add EVA cache flushing
functions") added cache function for EVA using the cachee instruction.
However, it didn't add a case for the protected_writeback_dcache_line.
mips_dsemul() calls r4k_flush_cache_sigtramp() which in turn uses
the protected_writeback_dcache_line() to flush the trampoline code
back to memory. This used the wrong "cache" instruction leading to
random userland crashes on non-FPU cores.
sound/radeon: Move 64-bit MSI quirk from arch to driver
A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.
We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.
We now have a generic quirk in the PCI code. We should set it
appropriately for all radeon's from the audio driver.
gpu/radeon: Set flag to indicate broken 64-bit MSI
Some radeon ASICs don't support all 64 address bits of MSIs despite
advertising support for 64-bit MSIs in their configuration space.
This breaks on systems such as IBM POWER7/8, where 64-bit MSIs can
be assigned with some of the high address bits set.
This makes use of the newly introduced "no_64bit_msi" flag in structure
pci_dev to allow the MSI allocation code to fallback to 32-bit MSIs
on those adapters.
PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
This can be set by quirks/drivers to be used by the architecture code
that assigns the MSI addresses.
We additionally add verification in the core MSI code that the values
assigned by the architecture do satisfy the limitation in order to fail
gracefully if they don't (ie. the arch hasn't been updated to deal with
that quirk yet).
Takashi Iwai [Wed, 1 Oct 2014 08:30:53 +0000 (10:30 +0200)]
ALSA: hda - Limit 40bit DMA for AMD HDMI controllers
AMD/ATI HDMI controller chip models, we already have a filter to lower
to 32bit DMA, but the rest are supposed to be working with 64bit
although the hardware doesn't really work with 63bit but only with 40
or 48bit DMA. In this patch, we take 40bit DMA for safety for the
AMD/ATI controllers as the graphics drivers does.
lucien [Sun, 23 Nov 2014 07:04:11 +0000 (15:04 +0800)]
ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic
Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():
hlist_for_each_entry_rcu(t, head, hash_node) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
link == t->parms.link &&
==> type == t->dev->type &&
ip_tunnel_key_match(&t->parms, flags, key))
break;
}
Ian Morris [Sun, 23 Nov 2014 21:28:43 +0000 (21:28 +0000)]
ipv6: coding style improvements (remove assignment in if statements)
This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.
No changes to the resultant object files result as determined by objdiff.
Andy Lutomirski [Fri, 21 Nov 2014 21:26:07 +0000 (13:26 -0800)]
uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns. I suspect that this is a mistake and that
the code only works because int3 is paranoid.
Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.
Thomas Gleixner [Sun, 23 Nov 2014 22:04:52 +0000 (23:04 +0100)]
sched: Provide update_curr callbacks for stop/idle scheduling classes
Chris bisected a NULL pointer deference in task_sched_runtime() to
commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.
Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine. He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.
What's interesting is that, the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.
While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit in
question, the author obviously forgot that there are other code paths
which invoke task_sched_runtime()
If the stats are read for a stomp machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL pointer
of stop_task->update_curr. Ooops.
Chris observation that this happens faster when he runs the fork bomb
makes sense as the fork bomb will kick migration threads more often so
the probability to hit the issue will increase.
Add the missing update_curr callbacks to the scheduler classes stop_task
and idle_task. While idle tasks cannot be monitored via /proc we have
other means to hit the idle case.
Linus Torvalds [Sun, 23 Nov 2014 21:56:55 +0000 (13:56 -0800)]
Merge branch 'x86-traps' (trap handling from Andy Lutomirski)
Merge x86-64 iret fixes from Andy Lutomirski:
"This addresses the following issues:
- an unrecoverable double-fault triggerable with modify_ldt.
- invalid stack usage in espfix64 failed IRET recovery from IST
context.
- invalid stack usage in non-espfix64 failed IRET recovery from IST
context.
It also makes a good but IMO scary change: non-espfix64 failed IRET
will now report the correct error. Hopefully nothing depended on the
old incorrect behavior, but maybe Wine will get confused in some
obscure corner case"
* emailed patches from Andy Lutomirski <[email protected]>:
x86_64, traps: Rework bad_iret
x86_64, traps: Stop using IST for #SS
x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
Andy Lutomirski [Sun, 23 Nov 2014 02:00:33 +0000 (18:00 -0800)]
x86_64, traps: Rework bad_iret
It's possible for iretq to userspace to fail. This can happen because
of a bad CS, SS, or RIP.
Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace. To make this work, there's
an extra fixup to fudge the gs base into a usable state.
This is suboptimal because it loses the original exception. It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with. For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.
This patch throws out bad_iret entirely. As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C. It's should be clearer and more correct.
Andy Lutomirski [Sun, 23 Nov 2014 02:00:32 +0000 (18:00 -0800)]
x86_64, traps: Stop using IST for #SS
On a 32-bit kernel, this has no effect, since there are no IST stacks.
On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code. The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.
This fixes a bug in which the espfix64 code mishandles a stack segment
violation.
This saves 4k of memory per CPU and a tiny bit of code.