Helge Deller [Mon, 11 May 2015 20:01:27 +0000 (22:01 +0200)]
parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures
On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
currently parisc and metag only) stack randomization sometimes leads to crashes
when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
by default if not defined in arch-specific headers).
The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
additional space needed for the stack randomization (as defined by the value of
STACK_RND_MASK) was not taken into account yet and as such, when the stack
randomization code added a random offset to the stack start, the stack
effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
which then sometimes leads to out-of-stack situations and crashes.
This patch fixes it by adding the maximum possible amount of memory (based on
STACK_RND_MASK) which theoretically could be added by the stack randomization
code to the initial stack size. That way, the user-defined stack size is always
guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
This bug is currently not visible on the metag architecture, because on metag
STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
section, so it does not affect other platformws beside those where the
stack grows upwards (parisc and metag).
Naidu Tellapati [Thu, 7 May 2015 21:22:20 +0000 (18:22 -0300)]
iio: adc: cc10001: Add delay before setting START bit
According to hardware team there should be some delay after
setting channel number, start mode and before setting START.
Add a one microsecond delay for this purpose.
Naidu Tellapati [Thu, 7 May 2015 21:22:18 +0000 (18:22 -0300)]
iio: adc: cc10001: Fix incorrect use of power-up/power-down register
At present we are incorrectly setting the register to 0x1 to power up
the ADC. Since it is an active high power down register, we need to set
the register to 0x0 to actually power up. Conversely, writing 0x1 to the
register powers it down.
This commit adds a couple of helpers to make the code clearer and then
use them to do the power-up/power-down properly.
ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for Peach Boards
The Marvell mwifiex driver prevents the system to enter into a suspend
state if the card power is not preserved during a suspend/resume cycle.
So Suspend-to-RAM and Suspend-to-idle is failing on Exynos5800 Peach Pi
and Exynos5420 Peach Pit Chromebooks.
Add the keep-power-in-suspend Power Management property to the SDIO/MMC
node so the mwifiex suspend handler doesn't fail and the system is able
to enter into a suspend state.
Mike Marciniszyn [Tue, 12 May 2015 17:42:42 +0000 (13:42 -0400)]
IB/qib: fix test of unsigned variable
Commit d4988623cc60 ("IB/qib: use arch_phys_wc_add()")
adjusted mtrr inititialization to use the new interface.
Unfortunately, the new interface returns a signed
value and the patch tested the unsigned wc_cookie.
Fix the issue by changing the type of wc_cookie to int. For
the success case the ret left at zero to avoid
a warning from the caller. For failure wc_cookie
is used as the ret.
RDMA/core: Fix for parsing netlink string attribute
The string iwpm_ulib_name is recorded in a nlmsg as a netlink attribute.
Without this fix parsing of the nlmsg by the userspace port mapper service fails
because of unknown attribute length, causing the port mapper service not to
register the client, which has sent the nlmsg.
Paul Burton [Wed, 6 May 2015 10:52:32 +0000 (11:52 +0100)]
MIPS: fix FP mode selection in lieu of .MIPS.abiflags data
Commit 46490b572544 ("MIPS: kernel: elf: Improve the overall ABI and FPU
mode checks") reworked the ELF FP ABI mode selection logic, but when
CONFIG_MIPS_O32_FP64_SUPPORT is enabled it breaks the use of binaries
which have no PT_MIPS_ABIFLAGS program header & associated
.MIPS.abiflags section.
A default mode is selected based upon whether the ELF contains MIPS32 or
MIPS64 code, but that selection is made in arch_elf_pt_proc.
arch_elf_pt_proc only executes when a PT_MIPS_ABIFLAGS program header is
found. If one is not found then arch_elf_pt_proc is never called, and no
default overall_fp_mode value is selected. When arch_check_elf is
called, both abi0 & abi1 are MIPS_ABI_FP_UNKNOWN which leads to both
prog_req & interp_req being set to none_req. none_req matches none of
the conditions for mode selection at the end of arch_check_elf, so
overall_fp_mode is left untouched. Finally once mips_set_personality_fp
is called the BUG() in the default case is then hit & the kernel likely
panics.
Fix this by moving the selection of a default overall mode to the start
of arch_check_elf, which runs once per ELF executed regardless of
whether it has a PT_MIPS_ABIFLAGS program header.
Will Deacon [Fri, 1 May 2015 16:15:23 +0000 (17:15 +0100)]
arm64: perf: fix memory leak when probing PMU PPIs
Commit d795ef9aa831 ("arm64: perf: don't warn about missing
interrupt-affinity property for PPIs") added a check for PPIs so that
we avoid parsing the interrupt-affinity property for these naturally
affine interrupts.
Unfortunately, this check can trigger an early (successful) return and
we will leak the irqs array. This patch fixes the issue by reordering
the code so that the check is performed before any independent
allocation.
Hans Ulli Kroll [Mon, 11 May 2015 16:13:11 +0000 (18:13 +0200)]
ARM: gemini: fix compiler warning due wrong data type
This patch fixes a compiler warning in gemini_restart()
issued by commit 7b6d864b48d9 ("reboot:arm: reboot_mode
changes from char to enum reboot_mode").
arch/arm/mach-gemini/board-rut1xx.c:93:2: warning: initialization from incompatible pointer type
The warning is harmless, and the patch does not need to
be backported to stable kernels.
Fixes: 7b6d864b48d ("reboot:arm: reboot_mode changes from char to enum reboot_mode.") Signed-off-by: Hans Ulli Kroll <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
Arnd Bergmann [Tue, 12 May 2015 14:41:28 +0000 (16:41 +0200)]
Merge tag 'omap-for-v4.1/fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Merge "Few fixes for omap clocks, PRM and hwmod code" from Tony Lindgren:
- Fix bogus comparison of struct clk pointers, turns out
we can fix it by just removing the comparison
- Fix am437 hardreset implementation and remove boottime
warnings by adding the VPFE hwmod data
- Regression fix for am43xx PRM code, simplify things
by reusing the omap4 PRM implementation
* tag 'omap-for-v4.1/fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: Remove bogus struct clk comparison for timer clock
ARM: AM33xx+: hwmod: re-use omap4 implementations for reset functionality
ARM: OMAP4+: PRM: add support for passing status register/bit info to reset
ARM: AM43xx: hwmod: add VPFE hwmod entries
Sathya Perla [Tue, 12 May 2015 06:13:50 +0000 (02:13 -0400)]
Update be2net maintainers' email addresses
Emulex developers' email addresses are now "@avagotech" instead of
"@emulex". I'm also replacing Subbu with Padmanabh and Sriharsha in the
maintainers list. The driver's heading was outdated and did not include
some of the chip types (BE3, Lancer and Skyhawk) that the driver has
been supporting for a longtime. I've updated this too.
Sudeep Holla [Thu, 7 May 2015 14:45:05 +0000 (15:45 +0100)]
ARM: vexpress/tc2: Add interrupt-affinity to the PMU node
Commit 9fd85eb502a7 ("ARM: pmu: add support for interrupt-affinity
property") added an optional "interrupt-affinity" property, to specify
the CPU affinity for each SPI listed in the interrupts property.
Without this property, we get this boot warning:
CPU PMU: Failed to parse <no-node>/interrupt-affinity[0]
This patch adds interrupt-affinity to the PMU node in the
vexpress-ca15_a7(a.k.a TC2) device tree.
Robert Schwebel [Thu, 7 May 2015 14:45:04 +0000 (15:45 +0100)]
ARM: vexpress/ca9: Add interrupt-affinity to the PMU node
Commit 9fd85eb502a7 ("ARM: pmu: add support for interrupt-affinity
property") added an optional "interrupt-affinity" property, to specify
the CPU affinity for each SPI listed in the interrupts property.
Without this property, we get this boot warning:
CPU PMU: Failed to parse <no-node>/interrupt-affinity[0]
This patch adds interrupt-affinity to the PMU node in the
vexpress-v2p-ca9 device tree.
Robert Schwebel [Thu, 7 May 2015 14:45:03 +0000 (15:45 +0100)]
ARM: vexpress/ca9: Add unified-cache property to l2 cache node
Commit d9d1f3e2d711 ("ARM: l2c: check that DT files specify the required
"cache-unified" property") mandates to specify this required property.
Without this property, we get this boot warning:
"L2C: device tree omits to specify unified cache"
This patch adds "cache-unified" property to L2 cache node in vexpress
CA9 device tree.
Sudeep Holla [Thu, 7 May 2015 14:45:02 +0000 (15:45 +0100)]
ARM64: juno: add sp810 support and fix sp804 clock frequency
The clock generator in IOFPGA generates the two source clocks: 32kHz and
1MHz for the SP810 System Controller.
The SP810 System Controller selects 32kHz or 1MHz as the sources for
TIM_CLK[3:0], the SP804 timer clocks. The powerup default is 32kHz but
the maximum of "refclk" and "timclk" is chosen by the SP810 driver.
This patch adds support for SP810 system controller and also fixes the
SP804 timer clock frequency.
However the SP804 driver needs to be enabled on ARM64 to test this,
which requires SP804 driver to be moved out of arch/arm.
Arnd Bergmann [Tue, 12 May 2015 13:37:01 +0000 (15:37 +0200)]
Merge tag 'tegra-for-4.1-fixes-for-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into fixes
Merge "ARM: tegra: Device tree fixes for v4.1-rc3" from Thierry Reding:
This contains a fix for a bug that was introduced a couple of months ago
by a patch that git misapplied because of a lack of context.
* tag 'tegra-for-4.1-fixes-for-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: Correct which USB controller has the UTMI pad registers
Ralf Baechle [Tue, 12 May 2015 04:43:04 +0000 (06:43 +0200)]
MIPS: SMP: Fix build error.
CC arch/mips/kernel/smp.o
arch/mips/kernel/smp.c: In function ‘start_secondary’:
arch/mips/kernel/smp.c:149:2: error: passing argument 2 of ‘cpumask_set_cpu’ discards ‘volatile’ qualifier from pointer target type [-Werror]
cpumask_set_cpu(cpu, &cpu_callin_map);
^
In file included from ./arch/mips/include/asm/processor.h:14:0,
from ./arch/mips/include/asm/thread_info.h:15,
from include/linux/thread_info.h:54,
from include/asm-generic/preempt.h:4,
from arch/mips/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:18,
from include/linux/interrupt.h:8,
from arch/mips/kernel/smp.c:24:
include/linux/cpumask.h:272:91: note: expected ‘struct cpumask *’ but argument is of type ‘volatile struct cpumask_t *’
static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
^
arch/mips/kernel/smp.c: In function ‘smp_prepare_boot_cpu’:
arch/mips/kernel/smp.c:211:2: error: passing argument 2 of ‘cpumask_set_cpu’ discards ‘volatile’ qualifier from pointer target type [-Werror]
cpumask_set_cpu(0, &cpu_callin_map);
^
In file included from ./arch/mips/include/asm/processor.h:14:0,
from ./arch/mips/include/asm/thread_info.h:15,
from include/linux/thread_info.h:54,
from include/asm-generic/preempt.h:4,
from arch/mips/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:18,
from include/linux/interrupt.h:8,
from arch/mips/kernel/smp.c:24:
include/linux/cpumask.h:272:91: note: expected ‘struct cpumask *’ but argument is of type ‘volatile struct cpumask_t *’
static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
^
arch/mips/kernel/smp.c: In function ‘__cpu_up’:
arch/mips/kernel/smp.c:221:10: error: passing argument 2 of ‘cpumask_test_cpu’ discards ‘volatile’ qualifier from pointer target type [-Werror]
while (!cpumask_test_cpu(cpu, &cpu_callin_map))
^
In file included from ./arch/mips/include/asm/processor.h:14:0,
from ./arch/mips/include/asm/thread_info.h:15,
from include/linux/thread_info.h:54,
from include/asm-generic/preempt.h:4,
from arch/mips/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:18,
from include/linux/interrupt.h:8,
from arch/mips/kernel/smp.c:24:
include/linux/cpumask.h:294:90: note: expected ‘const struct cpumask *’ but argument is of type ‘volatile struct cpumask_t *’
static inline int cpumask_test_cpu(int cpu, const struct cpumask *cpumask)
^
cc1: all warnings being treated as errors
make[2]: *** [arch/mips/kernel/smp.o] Error 1
make[1]: *** [arch/mips/kernel] Error 2
make: *** [arch/mips] Error 2
Linus Torvalds [Mon, 11 May 2015 21:42:52 +0000 (14:42 -0700)]
Merge branch 'for-4.1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
"Mainly pnfs fixes (and for problems with generic callback code made
more obvious by pnfs)"
* 'for-4.1' of git://linux-nfs.org/~bfields/linux:
nfsd: skip CB_NULL probes for 4.1 or later
nfsd: fix callback restarts
nfsd: split transport vs operation errors for callbacks
svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures
nfsd: fix pNFS return on close semantics
nfsd: fix the check for confirmed openowner in nfs4_preprocess_stateid_op
nfsd/blocklayout: pretend we can send deviceid notifications
Steve Wise [Thu, 7 May 2015 21:34:23 +0000 (16:34 -0500)]
iw_cxgb4: use wildcard mapping for getting remote addr info
For listening endpoints bound to the wildcard address, we need to pass
the wildcard address mapping to iwpm_get_remote_info() instead of the
mapped address of the new child connection.
Without this fix, and with iwarp port mapping enabled, each iw_cxgb4
connection that is spawned from a listening endpoint bound to the wildcard
address, will generate an annoying dmesg entry about failing to find
the remote address mapping info, and the connection state displayed in
debugfs under /sys/kernel/debug/iw_cxgb4/<pci-slot-no>/eps will not have
the peer's address/port mapping info. The connection still works though.
Fixes: 5b6b8fe ("RDMA/cxgb4: Report the actual address of the remote connecting peer") Signed-off-by: Steve Wise <[email protected]> Reviewed-by: Tatyana Nikolova <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
Using an element of a struct as the address for the memcpy of the whole
struct may introduce a buffer overflow and does not help readability either
simply pass the real thing as first argument to memcpy.
Linus Torvalds [Mon, 11 May 2015 20:57:47 +0000 (13:57 -0700)]
Merge tag 'spi-fix-v4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A number of driver specific fixes (including several missing
dependencies for randconfig type cases) plus two core fixes.
One makes the setup_transfer() callback optional which unbreaks some
drivers which had been merged with it omitted due to local versions of
this patch and another ensures that we don't corrupt data by leaking
internal dummy buffers to callers, causing the callers to think they
allocated those buffers"
* tag 'spi-fix-v4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl-espi: fix behaviour for full-duplex xfers
spi: fsl-spi: fix devm_ioremap_resource() error case
spi: Kconfig: Add SOC_LS1021A to SPI_FSL_DSPI dependence
spi/omap2-mcpsi: Always call spi_finalize_current_message()
spi: fsl-spi: use devm_ioremap_resource() to map parameter ram on CPM1
spi: bitbang: Make setup_transfer() callback optional
spi: check tx_buf and rx_buf in spi_unmap_msg
spi: bcm2835: change timeout of polling driver to 1s
spi: bcm2835: Add GPIOLIB dependency
Tony Lindgren [Mon, 11 May 2015 20:18:19 +0000 (13:18 -0700)]
ARM: OMAP2+: Remove bogus struct clk comparison for timer clock
With recent changes to use determine_rate, the comparison of two
clocks won't work without clk_is_match that does __clk_get_hw
on the clocks first.
As we've been unconditionally already calling clk_set_parent
already because of the bogus comparison, let's just remove the
check as suggested by Stephen Boyd <[email protected]>.
Linus Torvalds [Mon, 11 May 2015 18:09:54 +0000 (11:09 -0700)]
Merge tag 'iommu-fixes-v4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Three fixes have queued up:
- reference count fix in the AMD IOMMUv2 driver
- sign extension fix in the ARM-SMMU driver
- build fix for rockchip driver with device tree"
* tag 'iommu-fixes-v4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/arm-smmu: Fix sign-extension of upstream bus addresses at stage 1
iommu/rockchip: Fix build without CONFIG_OF
iommu/amd: Fix bug in put_pasid_state_wait
Linus Torvalds [Mon, 11 May 2015 17:54:20 +0000 (10:54 -0700)]
Merge branch 'for-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Rather big for fixes pull.
- SCC controllers never lived to see the light of the day. Both
libata and ide drivers removed.
- In some configurations, link power management policy changes
sometimes cause delayed spurious PHY events which can develop into
noticeable failures. This has been reported several times over the
years. Gabriele's patches suppress PHY events for a while after
LPM policy changes which should help most of these failures without
causing too much problem for hotplug use cases.
- A few controller specific fixes"
[ Hmm. I don't think removing SSC support is really a "fix", but hey, it
removes a lot of lines of code. Which I like. So ... good riddance ]
* 'for-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: avoton port-disable reset-quirk
ata: select DW_DMAC in case of SATA_DWC
libata: Blacklist queued TRIM on all Samsung 800-series
libata: Ignore spurious PHY event on LPM policy change
libata: Add helper to determine when PHY events should be ignored
ata: ahci_st: fixup layering violations / drvdata errors
Remove celleb-only SCC PATA drivers
Linus Torvalds [Mon, 11 May 2015 17:33:31 +0000 (10:33 -0700)]
Merge tag 'md/4.1-rc3-fixes' of git://neil.brown.name/md
Pull md bugfixes from Neil Brown:
"A few fixes for md.
Most of these are related to the new "batched stripe writeout", but
there are a few others"
* tag 'md/4.1-rc3-fixes' of git://neil.brown.name/md:
md/raid5: fix handling of degraded stripes in batches.
md/raid5: fix allocation of 'scribble' array.
md/raid5: don't record new size if resize_stripes fails.
md/raid5: avoid reading parity blocks for full-stripe write to degraded array
md/raid5: more incorrect BUG_ON in handle_stripe_fill.
md/raid5: new alloc_stripe() to allocate an initialize a stripe.
md-raid0: conditional mddev->queue access to suit dm-raid
David Ward [Sun, 10 May 2015 02:01:47 +0000 (22:01 -0400)]
net_sched: gred: use correct backlog value in WRED mode
In WRED mode, the backlog for a single virtual queue (VQ) should not be
used to determine queue behavior; instead the backlog is summed across
all VQs. This sum is currently used when calculating the average queue
lengths. It also needs to be used when determining if the queue's hard
limit has been reached, or when reporting each VQ's backlog via netlink.
q->backlog will only be used if the queue switches out of WRED mode.
Felix Fietkau [Sat, 9 May 2015 21:08:38 +0000 (23:08 +0200)]
pppoe: drop pppoe device in pppoe_unbind_sock_work
After receiving a PADT and the socket is closed, user space will no
longer drop the reference to the pppoe device.
This leads to errors like this:
[ 488.570000] unregister_netdevice: waiting for eth0.2 to become free. Usage count = 2
Fixes: 287f3a943fe ("pppoe: Use workqueue to die properly when a PADT is received") Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Will Deacon [Fri, 8 May 2015 16:44:22 +0000 (17:44 +0100)]
iommu/arm-smmu: Fix sign-extension of upstream bus addresses at stage 1
Stage 1 translation is controlled by two sets of page tables (TTBR0 and
TTBR1) which grow up and down from zero respectively in the ARMv8
translation regime. For the SMMU, we only care about TTBR0 and, in the
case of a 48-bit virtual space, we expect to map virtual addresses 0x0
through to 0xffff_ffff_ffff.
Given that some masters may be incapable of emitting virtual addresses
targetting TTBR1 (e.g. because they sit on a 48-bit bus), the SMMU
architecture allows bit 47 to be sign-extended, halving the virtual
range of TTBR0 but allowing TTBR1 to be used. This is controlled by the
SEP field in TTBCR2.
The SMMU driver incorrectly enables this sign-extension feature, which
causes problems when userspace addresses are programmed into a master
device with the SMMU expecting to map the incoming transactions via
TTBR0; if the top bit of address is set, we will instead get a
translation fault since TTBR1 walks are disabled in the TTBCR.
This patch fixes the issue by disabling sign-extension of a fixed
virtual address bit and instead basing the behaviour on the upstream bus
size: the incoming address is zero extended unless the upstream bus is
only 49 bits wide, in which case bit 48 is used as the sign bit and is
replicated to the upper bits.
Mark Brown [Mon, 11 May 2015 16:29:46 +0000 (17:29 +0100)]
Merge tag 'spi-v4.1-rc1' into spi-linus
spi: Fixes for v4.1
A few driver fixes plus two changes for the core, one to make the
setup_transfer() callback optional which fixes crashes in some drivers
which were updated to use new interfaces without apparent testing and
one to ensure we don't expose the data buffers we use for dummy
transfers to drivers which avoids potential issues with multiple
accesses to them or reuse.
# gpg: Signature made Sat 25 Apr 2015 10:59:47 BST using RSA key ID 5D5487D0
# gpg: key CD7BEEBC: no public key for trusted key - skipped
# gpg: key CD7BEEBC marked as ultimately trusted
# gpg: key AF88CD16: no public key for trusted key - skipped
# gpg: key AF88CD16 marked as ultimately trusted
# gpg: key 16005C11: no public key for trusted key - skipped
# gpg: key 16005C11 marked as ultimately trusted
# gpg: key 5621E907: no public key for trusted key - skipped
# gpg: key 5621E907 marked as ultimately trusted
# gpg: key 5C6153AD: no public key for trusted key - skipped
# gpg: key 5C6153AD marked as ultimately trusted
# gpg: Good signature from "Mark Brown <[email protected]>"
# gpg: aka "Mark Brown <[email protected]>"
# gpg: aka "Mark Brown <[email protected]>"
# gpg: aka "Mark Brown <[email protected]>"
# gpg: aka "Mark Brown <[email protected]>"
# gpg: aka "Mark Brown <[email protected]>"
Stefan Wahren [Sat, 9 May 2015 07:58:09 +0000 (07:58 +0000)]
net: qca_spi: Fix possible race during probe
Registering the netdev before setting the priv data is unsafe.
So fix this possible race by setting the priv data first.
Signed-off-by: Stefan Wahren <[email protected]> Cc: <[email protected]> # v3.18+ Fixes: 291ab06e (net: qualcomm: new Ethernet over SPI driver for QCA7000) Signed-off-by: David S. Miller <[email protected]>
Filipe Manana [Thu, 23 Apr 2015 10:28:48 +0000 (11:28 +0100)]
Btrfs: fix race when reusing stale extent buffers that leads to BUG_ON
There's a race between releasing extent buffers that are flagged as stale
and recycling them that makes us it the following BUG_ON at
btrfs_release_extent_buffer_page:
BUG_ON(extent_buffer_under_io(eb))
The BUG_ON is triggered because the extent buffer has the flag
EXTENT_BUFFER_DIRTY set as a consequence of having been reused and made
dirty by another concurrent task.
Here follows a sequence of steps that leads to the BUG_ON.
CPU 0 CPU 1 CPU 2
path->nodes[0] == eb X
X->refs == 2 (1 for the tree, 1 for the path)
btrfs_header_generation(X) == current trans id
flag EXTENT_BUFFER_DIRTY set on X
btrfs_release_path(path)
unlocks X
reads eb X
X->refs incremented to 3
locks eb X
btrfs_del_items(X)
X becomes empty
clean_tree_block(X)
clear EXTENT_BUFFER_DIRTY from X
btrfs_del_leaf(X)
unlocks X
extent_buffer_get(X)
X->refs incremented to 4
btrfs_free_tree_block(X)
X's range is not pinned
X's range added to free
space cache
free_extent_buffer_stale(X)
lock X->refs_lock
set EXTENT_BUFFER_STALE on X
release_extent_buffer(X)
X->refs decremented to 3
unlocks X->refs_lock
btrfs_release_path()
unlocks X
free_extent_buffer(X)
X->refs becomes 2
__btrfs_cow_block(Y)
btrfs_alloc_tree_block()
btrfs_reserve_extent()
find_free_extent()
gets offset == X->start
btrfs_init_new_buffer(X->start)
btrfs_find_create_tree_block(X->start)
alloc_extent_buffer(X->start)
find_extent_buffer(X->start)
finds eb X in radix tree
free_extent_buffer(X)
lock X->refs_lock
test X->refs == 2
test bit EXTENT_BUFFER_STALE is set
test !extent_buffer_under_io(eb)
increments X->refs to 3
mark_extent_buffer_accessed(X)
check_buffer_tree_ref(X)
--> does nothing,
X->refs >= 2 and
EXTENT_BUFFER_TREE_REF
is set in X
clear EXTENT_BUFFER_STALE from X
locks X
btrfs_mark_buffer_dirty()
set_extent_buffer_dirty(X)
check_buffer_tree_ref(X)
--> does nothing, X->refs >= 2 and
EXTENT_BUFFER_TREE_REF is set
sets EXTENT_BUFFER_DIRTY on X
test and clear EXTENT_BUFFER_TREE_REF
decrements X->refs to 2
release_extent_buffer(X)
decrements X->refs to 1
unlock X->refs_lock
unlock X
free_extent_buffer(X)
lock X->refs_lock
release_extent_buffer(X)
decrements X->refs to 0
btrfs_release_extent_buffer_page(X)
BUG_ON(extent_buffer_under_io(X))
--> EXTENT_BUFFER_DIRTY set on X
Fix this by making find_extent buffer wait for any ongoing task currently
executing free_extent_buffer()/free_extent_buffer_stale() if the extent
buffer has the stale flag set.
A more clean alternative would be to always increment the extent buffer's
reference count while holding its refs_lock spinlock but find_extent_buffer
is a performance critical area and that would cause lock contention whenever
multiple tasks search for the same extent buffer concurrently.
A build server running a SLES 12 kernel (3.12 kernel + over 450 upstream
btrfs patches backported from newer kernels) was hitting this often:
I was also able to reproduce this on a 3.19 kernel, corresponding to Chris'
integration branch from about a month ago, running the following stress
test on a qemu/kvm guest (with 4 virtual cpus and 16Gb of ram):
Filipe Manana [Wed, 6 May 2015 15:15:09 +0000 (16:15 +0100)]
Btrfs: fix race between block group creation and their cache writeout
So creating a block group has 2 distinct phases:
Phase 1 - creates the btrfs_block_group_cache item and adds it to the
rbtree fs_info->block_group_cache_tree and to the corresponding list
space_info->block_groups[];
Phase 2 - adds the block group item to the extent tree and corresponding
items to the chunk tree.
The first phase adds the block_group_cache_item to a list of pending block
groups in the transaction handle, and phase 2 happens when
btrfs_end_transaction() is called against the transaction handle.
It happens that once phase 1 completes, other concurrent tasks that use
their own transaction handle, but points to the same running transaction
(struct btrfs_trans_handle->transaction), can use this block group for
space allocations and therefore mark it dirty. Dirty block groups are
tracked in a list belonging to the currently running transaction (struct
btrfs_transaction) and not in the transaction handle (btrfs_trans_handle).
This is a problem because once a task calls btrfs_commit_transaction(),
it calls btrfs_start_dirty_block_groups() which will see all dirty block
groups and attempt to start their writeout, including those that are
still attached to the transaction handle of some concurrent task that
hasn't called btrfs_end_transaction() yet - which means those block
groups haven't gone through phase 2 yet and therefore when
write_one_cache_group() is called, it won't find the block group items
in the extent tree and abort the current transaction with -ENOENT,
turning the fs into readonly mode and require a remount.
Fix this by ignoring -ENOENT when looking for block group items in the
extent tree when we attempt to start the writeout of the block group
caches outside the critical section of the transaction commit. We will
try again later during the critical section and if there we still don't
find the block group item in the extent tree, we then abort the current
transaction.
This issue happened twice, once while running fstests btrfs/067 and once
for btrfs/078, which produced the following trace:
Filipe Manana [Tue, 5 May 2015 18:03:10 +0000 (19:03 +0100)]
Btrfs: fix panic when starting bg cache writeout after IO error
When waiting for the writeback of block group cache we returned
immediately if there was an error during writeback without waiting
for the ordered extent to complete. This left a short time window
where if some other task attempts to start the writeout for the same
block group cache it can attempt to add a new ordered extent, starting
at the same offset (0) before the previous one is removed from the
ordered tree, causing an ordered tree panic (calls BUG()).
This normally doesn't happen in other write paths, such as buffered
writes or direct IO writes for regular files, since before marking
page ranges dirty we lock the ranges and wait for any ordered extents
within the range to complete first.
Fix this by making btrfs_wait_ordered_range() not return immediately
if it gets an error from the writeback, waiting for all ordered extents
to complete first.
This issue happened often when running the fstest btrfs/088 and it's
easy to trigger it by running in a loop until the panic happens:
for ((i = 1; i <= 10000; i++)) do ./check btrfs/088 ; done
Filipe Manana [Tue, 5 May 2015 14:21:27 +0000 (15:21 +0100)]
Btrfs: fix crash after inode cache writeback failure
If the writeback of an inode cache failed we were unnecessarilly
attempting to release again the delalloc metadata that we previously
reserved. However attempting to do this a second time triggers an
assertion at drop_outstanding_extent() because we have no more
outstanding extents for our inode cache's inode. If we were able
to start writeback of the cache the reserved metadata space is
released at btrfs_finished_ordered_io(), even if an error happens
during writeback.
So make sure we don't repeat the metadata space release if writeback
started for our inode cache.
This issue was trivial to reproduce by running the fstest btrfs/088
with "-o inode_cache", which triggered the assertion leading to a
BUG() call and requiring a reboot in order to run the remaining
fstests. Trace produced by btrfs/088:
Peter Antoine [Mon, 11 May 2015 07:50:45 +0000 (08:50 +0100)]
drm/i915: Avoid GPU hang when coming out of s3 or s4
This patch fixes a timing issue that causes a GPU hang when the system
comes out of power saving.
During pm_resume, We are submitting batchbuffers before enabling
Interrupts this is causing us to miss the context switch interrupt,
and in consequence intel_execlists_handle_ctx_events is not triggered.
This patch is based on a patch from Deepak S <[email protected]>
from another platform.
The above patch added a call to init_context() to fix an issue introduced
by a previous patch. But, it then opened up a small timing window for the
batches being added by the init_context (basically setting up the context)
to complete before the interrupts have been turned on, thus hanging the
GPU.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89600 Cc: [email protected] # 4.0+ Signed-off-by: Peter Antoine <[email protected]> Reviewed-by: Daniel Vetter <[email protected]>
[Jani: fixed typo in subject, massaged the comments a bit] Signed-off-by: Jani Nikula <[email protected]>
Bert Vermeulen [Fri, 8 May 2015 14:18:49 +0000 (16:18 +0200)]
net: mdio-gpio: Allow for unspecified bus id
When the bus id was supplied via a struct platform_device, the driver wasn't
handling -1 to mean an unspecified id of the only instance of this driver,
as the platform spec requires.
af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).
This patch fixes an issue where the send(MSG_DONTWAIT) call
on a TX_RING is not fully non-blocking in cases where the device's sndBuf is
full. We pass nonblock=true to sock_alloc_send_skb() and return any possibly
occuring error code (most likely EGAIN) to the caller. As the fast-path stays
as it is, we keep the unlikely() around skb == NULL.
Michal Schmidt [Thu, 7 May 2015 18:37:10 +0000 (20:37 +0200)]
bnx2x: limit fw delay in kdump to 5s after boot
Commit 12a8541d5c82 "bnx2x: Delay during kdump load" added a 5 seconds
delay to bnx2x's probe function in the kdump case to let the firmware
realize the old driver is gone.
The problem with the delay is that it is per-device, so if you have
several bnx2x NICs in NPAR mode, the delays can accumulate to minutes.
Fix it by adjusting the delay so that we do not wait more than
necessary, i.e. no more delaying after 5 seconds of kernel boot time.
ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits.
The ARM JIT code emits "ldr rX, [pc, #offset]" to access the literal
pool. #offset maximum value is 4095 and if the generated code is too
large, the #offset value can overflow and not point to the expected
slot in the literal pool. Additionally, when overflow occurs, bits of
the overflow can end up changing the destination register of the ldr
instruction.
Fix that by detecting the overflow in imm_offset() and setting a flag
that is checked for each BPF instructions converted in
build_body(). As of now it can only be detected in the second pass. As
a result the second build_body() call can now fail, so add the
corresponding cleanup code in that case.
Using multiple literal pools in the JITed code is going to require
lots of intrusive changes to the JIT code (which would better be done
as a feature instead of fix), just delegating to the kernel BPF
interpreter in that case is a more straight forward, minimal fix and
easy to backport.
Fixes: ddecdfcea0ae ("ARM: 7259/3: net: JIT compiler for packet filters") Signed-off-by: Nicolas Schichan <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction.
In that case, emit_udiv() will be called with rn == ARM_R0 (r_scratch)
and loading rm first into ARM_R0 will result in jit_udiv() function
being called the same dividend and divisor. Fix that by loading rn
first into ARM_R1 and then rm into ARM_R0.
Linus Torvalds [Sun, 10 May 2015 21:58:53 +0000 (14:58 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"I really need to get back to sending these on my Friday, instead of my
Monday morning, but nothing too amazing in here: a few amdkfd fixes, a
few radeon fixes, i915 fixes, one tegra fix and one core fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: Zero out invalid vblank timestamp in drm_update_vblank_count.
drm/tegra: Don't use vblank_disable_immediate on incapable driver.
drm/radeon: stop trying to suspend UVD sessions
drm/radeon: more strictly validate the UVD codec
drm/radeon: make UVD handle checking more strict
drm/radeon: make VCE handle check more strict
drm/radeon: fix userptr lockup
drm/radeon: fix userptr BO unpin bug v3
drm/amdkfd: Initialize sdma vm when creating sdma queue
drm/amdkfd: Don't report local memory size
drm/amdkfd: allow unregister process with queues
drm/i915: Drop PIPE-A quirk for 945GSE HP Mini
drm/i915: Sink rate read should be saved in deca-kHz
drm/i915/dp: there is no audio on port A
drm/i915: Add missing MacBook Pro models with dual channel LVDS
drm/i915: Assume dual channel LVDS if pixel clock necessitates it
drm/radeon: don't setup audio on asics that don't support it
drm/radeon: disable semaphores for UVD V1 (v2)
Dave Airlie [Sun, 10 May 2015 20:06:22 +0000 (06:06 +1000)]
Merge tag 'drm-intel-fixes-2015-05-08' of git://anongit.freedesktop.org/drm-intel into drm-fixes
misc i915 fixes.
* tag 'drm-intel-fixes-2015-05-08' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Drop PIPE-A quirk for 945GSE HP Mini
drm/i915: Sink rate read should be saved in deca-kHz
drm/i915/dp: there is no audio on port A
drm/i915: Add missing MacBook Pro models with dual channel LVDS
drm/i915: Assume dual channel LVDS if pixel clock necessitates it
Mario Kleiner [Tue, 7 Apr 2015 04:31:09 +0000 (06:31 +0200)]
drm: Zero out invalid vblank timestamp in drm_update_vblank_count.
Since commit 844b03f27739135fe1fed2fef06da0ffc4c7a081 we make
sure that after vblank irq off, we return the last valid
(vblank count, vblank timestamp) pair to clients, e.g., during
modesets, which is good.
An overlooked side effect of that commit for kms drivers without
support for precise vblank timestamping is that at vblank irq
enable, when we update the vblank counter from the hw counter, we
can't update the corresponding vblank timestamp, so now we have a
totally mismatched timestamp for the new count to confuse clients.
Restore old client visible behaviour from before Linux 3.17, but
zero out the timestamp at vblank counter update (instead of disable
as in original implementation) if we can't generate a meaningful
timestamp immediately for the new vblank counter. This will fix
this regression, so callers know they need to retry again later
if they need a valid timestamp, but at the same time preserves
the improvements made in the commit mentioned above.
Linus Torvalds [Sun, 10 May 2015 18:13:19 +0000 (11:13 -0700)]
Merge tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung
Pull samsung fixes from Kukjin Kim:
"Here is Samsung fixes for v4.1. Since I've missed to send this via
arm-soc tree before v4.1-rc3, so I'm sending this to you directly
- fix commit ea08de16eb1b ("ARM: dts: Add DISP1 power domain for
exynos5420") which causes 'unhandled fault: imprecise external
abort' error when PD turned off. ("make DP a consumer of DISP1
power domain")
- fix 's3c-rtc' probe failure on Odriod-X2/U2/U3 boards ("add
'rtc_src' clock to rtc node for source clock of rtc")
- fix typo for 'cpu-crit-0' trip point on exynos5420/5440
- fix S2R failure on exynos5250-snow due to card power of Marvell
WiFi driver (suspend/resume) ("add keep-power-in-susped to WiFi
SDIO node")"
* tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for exynos5250-snow
ARM: dts: Fix typo in trip point temperature for exynos5420/5440
ARM: dts: add 'rtc_src' clock to rtc node for exynos4412-odroid boards
ARM: dts: Make DP a consumer of DISP1 power domain on Exynos5420
Peter Hurley [Mon, 13 Apr 2015 17:24:34 +0000 (13:24 -0400)]
pty: Fix input race when closing
A read() from a pty master may mistakenly indicate EOF (errno == -EIO)
after the pty slave has closed, even though input data remains to be read.
For example,
Several conditions are required to trigger this race:
1. the ldisc read buffer must become full so the input worker exits
2. the read() count parameter must be >= 4096 so the ldisc read buffer
is empty
3. the subsequent read() occurs before the kicked worker has processed
more input
However, the underlying cause of the race is that data is pipelined, while
tty state is not; ie., data already written by the pty slave end is not
yet visible to the pty master end, but state changes by the pty slave end
are visible to the pty master end immediately.
Pipeline the TTY_OTHER_CLOSED state through input worker to the reader.
1. Introduce TTY_OTHER_DONE which is set by the input worker when
TTY_OTHER_CLOSED is set and either the input buffers are flushed or
input processing has completed. Readers/polls are woken when
TTY_OTHER_DONE is set.
2. Reader/poll checks TTY_OTHER_DONE instead of TTY_OTHER_CLOSED.
3. A new input worker is started from pty_close() after setting
TTY_OTHER_CLOSED, which ensures the TTY_OTHER_DONE state will be
set if the last input worker is already finished (or just about to
exit).
Pan Xinhui [Sat, 28 Mar 2015 02:42:56 +0000 (10:42 +0800)]
tty/n_gsm.c: fix a memory leak when gsmtty is removed
when gsmtty_remove put dlci, it will cause memory leak if dlci->port's refcount is zero.
So we do the cleanup work in .cleanup callback instead.
dlci will be last put in two call chains.
1) gsmld_close -> gsm_cleanup_mux -> gsm_dlci_release -> dlci_put
2) gsmld_remove -> dlci_put
so there is a race. the memory leak depends on the race.
In call chain 2. we hit the memory leak. below comment tells.
Dan Williams [Fri, 8 May 2015 19:23:55 +0000 (15:23 -0400)]
ahci: avoton port-disable reset-quirk
Avoton AHCI occasionally sees drive probe timeouts at driver load time.
When this happens SCR_STATUS indicates device detected, but no D2H FIS
reception. Reset the internal link state machines by bouncing
port-enable in the PCS register when this occurs.
staging: gdm724x: Correction of variable usage after applying ALIGN()
Fix regression introduced by commit <29ef8a53542a>. After it writing
AT commands to /dev/GCT-ATM0 is unsuccessful (no echo, no response)
and dmesg show "gdmtty: invalid payload : 1 16 f011".
Before that commit value of dummy_cnt was only a padding size. After using
ALIGN() this value is increased by its first argument. So the following
usage of this variable needs correction.
David S. Miller [Sun, 10 May 2015 02:23:59 +0000 (22:23 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-05-07
This series contains updates to igb only.
Toshiaki provides two fixes for igb, first fixes an issue when changing
the number of rings by ethtool which causes oops because of uninitialized
pointers. The second fix resolves a typo where tx_ring was used instead
of the desired rx_ring.
====================
Linus Torvalds [Sat, 9 May 2015 23:13:38 +0000 (16:13 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A few patches have come up since the merge window. The largest one is
a rewrite of the PXA lubbock/mainstone IRQ handling. This was already
broken in 2011 by a change to the GPIO code and only noticed now.
The other changes contained here are:
MAINTAINERS file updates:
- Ray Jui and Scott Branden are now co-maintainers for some of the
mach-bcm chips, while Christian Daudt and Marc Carino have stepped
down.
- Andrew Victor is no longer maintaining at91. Instead, Alexandre
Belloni now becomes an official maintainer, after having done a
bulk of the work for a while.
- Baruch Siach, who added the mach-digicolor platform in 4.1 is now
listed as maintainer
- The git URL for mach-socfpga has changed
Bug fixes:
- Three bug fixes for new rockchip rk3288 code
- A regression fix to make SD card support work on certain ux500
boards
- multiple smaller dts fixes for imx, omap, mvebu, and shmobile
- a regression fiix for omap3 power consumption
- a fix for regression in the ARM CCI bus driver
Configuration changes:
- more imx platforms are now enabled in multi_v7_defconfig"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (39 commits)
MAINTAINERS: add Conexant Digicolor machines entry
MAINTAINERS: socfpga: update the git repo for SoCFPGA
ARM: multi_v7_defconfig: Select more FSL SoCs
MAINTAINERS: replace an AT91 maintainer
drivers: CCI: fix used_mask init in validate_group()
bus: omap_l3_noc: Fix master id address decoding for OMAP5
bus: omap_l3_noc: Fix offset for DRA7 CLK1_HOST_CLK1_2 instance
ARM: dts: dra7: Fix efuse register size for ABB
ARM: dts: am57xx-beagle-x15: Switch GPIO fan number
ARM: dts: am57xx-beagle-x15: Switch UART mux pins
ARM: dts: am437x-sk: reduce col-scan-delay-us
ARM: dts: am437x-sk: fix for new newhaven display module revision
ARM: dts: am57xx-beagle-x15: Fix RTC aliases
ARM: dts: am57xx-beagle-x15: Fix IRQ type for mcp7941x
ARM: dts: omap3: Add #iommu-cells to isp and iva iommu
ARM: omap2plus_defconfig: Enable EXTCON_USB_GPIO
ARM: dts: OMAP3-N900: Add microphone bias voltages
ARM: OMAP2+: Fix omap off idle power consumption creeping up
MAINTAINERS: Update brcmstb entry
MAINTAINERS: Remove Christian Daudt for mach-bcm
...
Linus Torvalds [Sat, 9 May 2015 23:07:14 +0000 (16:07 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user-namespace fix from Eric Biederman:
"Eric Windish recently reported a really bug that allows mounting fresh
copies of proc and sysfs when it really should not be allowed. The
code attempted to verify that proc and sysfs were fully visible but
there is a test missing to ensure that the root of the filesystem is
visible. Doh!
The following patch fixes that.
This fixes a containment issue that the docker folks are seeing"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mnt: Fix fs_fully_visible to verify the root directory is visible
Linus Torvalds [Sat, 9 May 2015 21:59:05 +0000 (14:59 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Two patches from the irq departement:
- a simple fix to make dummy_irq_chip usable for wakeup scenarios
- removal of the gic arch_extn hackery. Now that all users are
converted we really want to get rid of the interface so people wont
come up with new use cases"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: gic: Drop support for gic_arch_extn
genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
transfer_buffer_length is of type u32. It's therefore wrong to assign it
to a signed integer. This patch avoids the overflow.
It's worth noting that entry->length here is a long; perhaps it would be
beneficial at somepoint to change this to be unsigned as well, if
nothing else relies on its signedness for error conditions or the like.
Tony Camuso [Wed, 6 May 2015 13:09:18 +0000 (09:09 -0400)]
netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2)
This patch should have been part of the previous patch having the
same summary. See http://marc.info/?l=linux-kernel&m=143039470103795&w=2
Unfortunately, I didn't check to see where else this lock was used before
submitting that patch. This should take care of it for netxen_nic, as I
did a thorough search this time.
To recap from the original patch; although testing this driver with
DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled did not produce any traces,
it would be more prudent in the case of tx_clean_lock to use _bh
versions of spin_[un]lock, since this lock is manipulated in both
the process and softirq contexts.
This patch was tested for functionality and regressions with netperf
and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled.
Jean Delvare [Wed, 6 May 2015 07:04:40 +0000 (09:04 +0200)]
net: amd-xgbe: Add hardware dependency
The amd-xgbe driver currently only works with the Seattle SoC, which
is ARM64 architecture, so there is no point in building this driver on
other architectures except for build testing purpose. The dependency
list can be updated later if the driver ever supports other
architectures.
Nathan Sullivan [Tue, 5 May 2015 20:00:25 +0000 (15:00 -0500)]
net: macb: Handle the RXUBR interrupt on all devices
The same hardware issue the at91 must work around applies to at least the
Zynq ethernet, and possibly more devices. The driver also needs to handle
the RXUBR interrupt since it turns it on with MACB_RX_INT_FLAGS anyway.
This patch-set contains bug fixes for state-recovery at the RDS
layer when the underlying transport is TCP and the TCP state at one
of the endpoints is reset
V2 changes: DaveM comments to reduce memory footprint, follow
NFS/RPC model where possible. Added test-case #3
Without the changes in this set, when one of the endpoints is reset,
the existing code does not correctly clean up RDS socket state for stale
connections, resulting in some unstable, timing-dependant behavior on
the wire, including an infinite exchange of 3WHs back-and-forth, and a
resulting potential to never converge RDS state.
Test cases used to verify the changes in this set are:
1. Start rds client/server applications on two participating nodes,
node1 and node2. After at least one packet has been sent (to establish
the TCP connection), restart the rds_tcp module on the client, and
now resend packets. Tcpdump should show server sending a FIN for the
"old" client port, and clean connection establishment/exchange for
the new client port.
2. At the end of step 1, restart rds srever on node2, and start client on
node1, make sure using tcpdump, 'netstat -an|grep 16385' that
packets flow correctly.
3. start RDS client/server application on two participating nodes, and
repeat steps 1 and 2, but this time, simulate node failure by doing
"ifconfig <intf> down", so no FIN is sent.
====================
net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket.
When the peer of an RDS-TCP connection restarts, a reconnect
attempt should only be made from the active side of the TCP
connection, i.e. the side that has a transient TCP port
number. Do not add the passive side of the TCP connection
to the c_hash_node and thus avoid triggering rds_queue_reconnect()
for passive rds connections.
net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.
When running RDS over TCP, the active (client) side connects to the
listening ("passive") side at the RDS_TCP_PORT. After the connection
is established, if the client side reboots (potentially without even
sending a FIN) the server still has a TCP socket in the esablished
state. If the server-side now gets a new SYN comes from the client
with a different client port, TCP will create a new socket-pair, but
the RDS layer will incorrectly pull up the old rds_connection (which
is still associated with the stale t_sock and RDS socket state).
This patch corrects this behavior by having rds_tcp_accept_one()
always create a new connection for an incoming TCP SYN.
The rds and tcp state associated with the old socket-pair is cleaned
up via the rds_tcp_state_change() callback which would typically be
invoked in most cases when the client-TCP sends a FIN on TCP restart,
triggering a transition to CLOSE_WAIT state. In the rarer event of client
death without a FIN, TCP_KEEPALIVE probes on the socket will detect
the stale socket, and the TCP transition to CLOSE state will trigger
the RDS state cleanup.
Markus Stenberg [Tue, 5 May 2015 10:36:59 +0000 (13:36 +0300)]
ipv6: Fixed source specific default route handling.
If there are only IPv6 source specific default routes present, the
host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail
calls ip6_route_output first, and given source address any, it fails,
and ip6_route_get_saddr is never called.
The change is to use the ip6_route_get_saddr, even if the initial
ip6_route_output fails, and then doing ip6_route_output _again_ after
we have appropriate source address available.
Note that this is '99% fix' to the problem; a correct fix would be to
do route lookups only within addrconf.c when picking a source address,
and never call ip6_route_output before source address has been
populated.
David S. Miller [Sat, 9 May 2015 19:51:00 +0000 (15:51 -0400)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
Here are a couple of important Bluetooth & mac802154 fixes for 4.1:
- mac802154 fix for crypto algorithm allocation failure checking
- mac802154 wpan phy leak fix for error code path
- Fix for not calling Bluetooth shutdown() if interface is not up
Let me know if there are any issues pulling. Thanks.
====================
Jakub Kiciński observed that this patch can cause the pl011
driver to hang if if the only process with a pl011 port open is
killed by a signal, pl011_shutdown() can get called with an
arbitrary amount of data still in the FIFO.
Calling _shutdown() with the TX FIFO non-empty is questionable
behaviour and my itself be a bug.
Since the affected patch was speculative anyway, and brings limited
benefit, the simplest course is to remove the assumption that TXIS
will always be left asserted after the port is shut down.
Joe Lawrence [Thu, 30 Apr 2015 14:16:04 +0000 (17:16 +0300)]
xhci: gracefully handle xhci_irq dead device
If the xHCI host controller has died (ie, device removed) or suffered
other serious fatal error (STS_FATAL), then xhci_irq should handle this
condition with IRQ_HANDLED instead of -ESHUTDOWN.
xhci: Solve full event ring by increasing TRBS_PER_SEGMENT to 256
Our event ring consists of only one segment, and we risk filling
the event ring in case we get isoc transfers with short intervals
such as webcams that fill a TD every microframe (125us)
With 64 TRB segment size one usb camera could fill the event ring in 8ms.
A setup with several cameras and other devices can fill up the
event ring as it is shared between all devices.
This has occurred when uvcvideo queues 5 * 32TD URBs which then
get cancelled when the video mode changes. The cancelled URBs are returned
in the xhci interrupt context and blocks the interrupt handler from
handling the new events.
A full event ring will block xhci from scheduling traffic and affect all
devices conneted to the xhci, will see errors such as Missed Service
Intervals for isoc devices, and and Split transaction errors for LS/FS
interrupt devices.
Increasing the TRB_PER_SEGMENT will also increase the default endpoint ring
size, which is welcome as for most isoc transfer we had to dynamically
expand the endpoint ring anyway to be able to queue the 5 * 32TDs uvcvideo
queues.
xhci: fix isoc endpoint dequeue from advancing too far on transaction error
Isoc TDs usually consist of one TRB, sometimes two. When all goes well we
receive only one success event for a TD, and move the dequeue pointer to
the next TD.
This fails if the TD consists of two TRBs and we get a transfer error
on the first TRB, we will then see two events for that TD.
Fix this by making sure the event we get is for the last TRB in that TD
before moving the dequeue pointer to the next TD. This will resolve some
of the uvc and dvb issues with the
"ERROR Transfer event TRB DMA ptr not part of current TD" error message
Merge tag 'fixes-for-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.1-rc2
Here's the first pull request for v4.1-rc cycle,
it contains a few interesting fixes including a
fix to correct register offsets on dwc3, a fix
for Kconfig dependencies on isp1301 phy driver,
and a bug fix for our configfs gadget creation
interface.
Package C8 to C10 was introduced in newer Intel CPUs, we need to
include them in the package c-state residency calculation.
Otherwise, idle injection target is not accurately maintained by
the closed control loop.
Also cleaned up the code to make it scale better with large number
of c-states.
Mark the module init / exit functions with __init / __exit accodingly.
This allows making the intel_powerclamp_ids[] array __initconst, too, as
it only gets referenced from powerclamp_probe(). This is safe as
file2alias doesn't care about the section, but the symbol name for the
MODULE_DEVICE_TABLE alias.