David Ertman [Tue, 17 Dec 2013 04:42:42 +0000 (04:42 +0000)]
e1000e: Fix a compile flag mis-match for suspend/resume
This patch addresses a mis-match between the declaration and usage of
the e1000_suspend and e1000_resume functions. Previously, these
functions were declared in a CONFIG_PM_SLEEP wrapper, and then utilized
within a CONFIG_PM wrapper. Both the declaration and usage will now be
contained within CONFIG_PM wrappers.
This patch is to fix a compiler warning of maybe-uininitialized-variable
that is generated from gcc when the -O3 flag is used. In the function
e1000_reset_hw_80003es2lan(), the variable krmn_reg_data is first given
a value by being passed to a register read function as a
pass-by-reference parameter. But, the return value of that read
function was never checked to see if the read failed and the variable
not given an initial value. The compiler was smart enough to spot
this. This patch is to check the return value for that read function
and return it, if an error occurs, without trying to utilize the value
in kmrn_reg_data.
David Ertman [Sat, 14 Dec 2013 07:18:18 +0000 (07:18 +0000)]
e1000e: fix compiler warnings
This patch is to fix a compiler warning of __bad_udelay due to a value
of >999 being passed as a parameter to udelay() in the function
e1000e_phy_has_link_generic(). This affects the gcc compiler when
it is given a flag of -O3 and the icc compiler.
This patch is also making the change from mdelay() to msleep() in the
same function, since it was determined though code inspection that this
function is never called in atomic context.
Jan Kara [Wed, 18 Dec 2013 05:44:44 +0000 (00:44 -0500)]
ext4: fix deadlock when writing in ENOSPC conditions
Akira-san has been reporting rare deadlocks of his machine when running
xfstests test 269 on ext4 filesystem. The problem turned out to be in
ext4_da_reserve_metadata() and ext4_da_reserve_space() which called
ext4_should_retry_alloc() while holding i_data_sem. Since
ext4_should_retry_alloc() can force a transaction commit, this is a
lock ordering violation and leads to deadlocks.
Fix the problem by just removing the retry loops. These functions should
just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that
function must take care of retrying after dropping all necessary locks.
John Fastabend [Tue, 26 Nov 2013 06:33:52 +0000 (06:33 +0000)]
net: allow netdev_all_upper_get_next_dev_rcu with rtnl lock held
It is useful to be able to walk all upper devices when bringing
a device online where the RTNL lock is held. In this case it
is safe to walk the all_adj_list because the RTNL lock is used
to protect the write side as well.
This patch adds a check to see if the rtnl lock is held before
throwing a warning in netdev_all_upper_get_next_dev_rcu().
Also because we now have a call site for lockdep_rtnl_is_held()
outside COFIG_LOCK_PROVING an inline definition returning 1 is
needed. Similar to the rcu_read_lock_is_held().
Russell King [Mon, 16 Dec 2013 12:39:31 +0000 (12:39 +0000)]
imx-drm: imx-drm-core: improve safety of imx_drm_add_crtc()
We must not add more CRTCs than we have declared to the vblank
helpers, otherwise we overflow their arrays. Force failure if we
exceed the number of CRTCs.
Russell King [Mon, 16 Dec 2013 12:39:11 +0000 (12:39 +0000)]
imx-drm: imx-drm-core: make imx_drm_crtc_register() safer
imx_drm_crtc_register() doesn't clean up the CRTC upon failure, which
leaves the CRTC attached to the DRM device. Also, it does setup after
attaching the CRTC to the DRM device.
Fix this by reordering the function such that we do the setup before
drm_crtc_init(): this fixes both issues.
Enable lock claims that it is serializing tve_enable/disable calls.
However, DRM already serialises mode sets with a mutex, which prevents
encoder/connector functions being called concurrently. Secondly,
holding a spinlock while calling clk_prepare_enable() is wrong; it
will cause a might_sleep() warning should that debugging be enabled.
So, let's just get rid of the enable_lock.
Clean up the IPUv3 CRTC device registration; we don't need a separate
function just to call platform_device_register_data(), and we don't
need the return value converted at all.
Update the IPU client id under a mutex, so that parallel probing
doesn't race.
Russell King [Mon, 16 Dec 2013 11:33:44 +0000 (11:33 +0000)]
imx-drm: imx-drm-core: fix DRM cleanup paths
We must call drm_vblank_cleanup() on the error cleanup and unload paths
after we've had a successful call to drm_vblank_init(). Ensure that
the calls are in the reverse order to the initialisation order.
Tomi Valkeinen [Mon, 16 Dec 2013 07:14:48 +0000 (09:14 +0200)]
Revert "ARM: OMAP2+: Remove legacy mux code for display.c"
Commit e30b06f4d5f000c31a7747a7e7ada78a5fd419a1 (ARM: OMAP2+: Remove
legacy mux code for display.c) removed non-DT DSI and HDMI pinmuxing.
However, DSI pinmuxing is still needed, and removing that caused DSI
displays not to work.
Soren Brinkmann [Mon, 2 Dec 2013 19:38:38 +0000 (11:38 -0800)]
tty: xuartps: Properly guard sysrq specific code
Commit 'tty: xuartps: Implement BREAK detection, add SYSRQ support'
(0c0c47bc40a2e358d593b2d7fb93b50027fbfc0c) introduced sysrq support
without properly guarding sysrq specific code which results in build
errors when sysrq is disabled:
DNAME=KBUILD_STR(xilinx_uartps)" -c -o
drivers/tty/serial/.tmp_xilinx_uartps.o
drivers/tty/serial/xilinx_uartps.c
drivers/tty/serial/xilinx_uartps.c: In function 'xuartps_isr':
drivers/tty/serial/xilinx_uartps.c:247:5: error: 'struct uart_port'
has no member named 'sysrq'
drivers/tty/serial/xilinx_uartps.c:247:5: error: 'struct uart_port'
has no member named 'sysrq'
drivers/tty/serial/xilinx_uartps.c:247:5: error: 'struct uart_port'
has no member named 'sysrq'
make[3]: *** [drivers/tty/serial/xilinx_uartps.o] Error 1
Pull networking fixes from David Miller:
"A quick batch of fixes, including the annoying bad lock stack problem
introduced by udp_sk_rx_dst_set() locking change:
1) Use xchg() instead of sk_dst_lock() in udp_sk_rx_dst_set(), from
Eric Dumazet.
2) qlcnic bug fixes from Himanshu Madhani and Manish Chopra.
3) Update IPSEC MAINTAINERS entry, from Steffen Klassert.
4) Administrative neigh entry changes should generate netlink
notifications the same as event generated ones. From Bob
Gilligan.
5) Netfilter SYNPROXY fixes from Patrick McHardy.
6) Netfilter nft_reject endianness fixes from Eric Leblond"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
qlcnic: Dump mailbox registers when mailbox command times out.
qlcnic: Fix mailbox processing during diagnostic test
qlcnic: Allow firmware dump collection when auto firmware recovery is disabled
qlcnic: Fix memory allocation
qlcnic: Fix TSS/RSS validation for 83xx/84xx series adapter.
qlcnic: Fix TSS/RSS ring validation logic.
qlcnic: Fix diagnostic test for all adapters.
qlcnic: Fix usage of netif_tx_{wake, stop} api during link change.
xen-netback: fix fragments error handling in checksum_setup_ip()
neigh: Netlink notification for administrative NUD state change
ipv4: improve documentation of ip_no_pmtu_disc
net: unix: allow bind to fail on mutex lock
MAINTAINERS: Update the IPsec maintainer entry
udp: ipv4: do not use sk_dst_lock from softirq context
netvsc: don't flush peers notifying work during setting mtu
can: peak_usb: fix mem leak in pcan_usb_pro_init()
can: ems_usb: fix urb leaks on failure paths
sctp: loading sctp when load sctp_probe
netfilter: nft_reject: fix endianness in dump function
netfilter: SYNPROXY target: restrict to INPUT/FORWARD
David S. Miller [Tue, 17 Dec 2013 22:21:30 +0000 (17:21 -0500)]
Merge branch 'fixes-for-3.13' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:
====================
this is a pull request with two fixes for net/master, the current release
cycle.
It consists of a patch by Alexey Khoroshilov from the Linux Driver Verification
project, which fixes a memory leak in ems_usb's failure patch. And a patch by
me which fixes a memory leak in the peak usb driver.
====================
David S. Miller [Tue, 17 Dec 2013 21:25:24 +0000 (16:25 -0500)]
Merge branch 'qlcnic'
Himanshu Madhani says:
====================
qlcnic: Bug fixes.
This series contains bug fixes for mailbox handling and multi Tx queue support
for all supported adapters.
changes from v1 -> v2
o updated patch to fix usage of netif_tx_{wake,stop} api during link change
as per David Miller's suggestion.
o Dropped patch to use spinklock per tx queue for more work.
o Added reworked patch for memory allocation failures.
o Added patch to allow capturing of dump, when auto recovery is disabled in firmware.
o Added patches for mailbox interrupt handling and debugging data for mailbox failure.
Manish Chopra [Mon, 16 Dec 2013 20:37:01 +0000 (15:37 -0500)]
qlcnic: Allow firmware dump collection when auto firmware recovery is disabled
o Allow driver to collect firmware dump, during a forced firmware dump
operation, when auto firmware recovery is disabled. Also, during this
operation, driver should not allow reset recovery to be performed.
Himanshu Madhani [Mon, 16 Dec 2013 20:36:59 +0000 (15:36 -0500)]
qlcnic: Fix TSS/RSS validation for 83xx/84xx series adapter.
o Current code was not allowing the user to configure more
than one Tx ring using ethtool for 83xx/84xx adapter.
This regression was introduced by commit id 18afc102fdcb95d6c7d57f2967a06f2f8fe3ba4c ("qlcnic: Enable
multiple Tx queue support for 83xx/84xx Series adapter.")
Himanshu Madhani [Mon, 16 Dec 2013 20:36:58 +0000 (15:36 -0500)]
qlcnic: Fix TSS/RSS ring validation logic.
o TSS/RSS ring validation does not take into account that either
of these ring values can be 0. This patch fixes this validation
and would fail set_channel operation if any of these ring value
is 0. This regression was added as part of commit id 34e8c406fda5b5a9d2e126a92bab84cd28e3b5fa ("qlcnic: refactor Tx/SDS
ring calculation and validation in driver.")
Himanshu Madhani [Mon, 16 Dec 2013 20:36:57 +0000 (15:36 -0500)]
qlcnic: Fix diagnostic test for all adapters.
o Driver should re-allocate all Tx queues after completing
diagnostic tests. This regression was added by commit id c2c5e3a0681bb1945c0cb211a5f4baa22cb2cbb3 ("qlcnic: Enable
diagnostic test for multiple Tx queues.")
Himanshu Madhani [Mon, 16 Dec 2013 20:36:56 +0000 (15:36 -0500)]
qlcnic: Fix usage of netif_tx_{wake, stop} api during link change.
o Driver was using netif_tx_{stop,wake}_all_queues() api
during link change event. Remove these api calls to
manage queue start/stop event, as core networking stack
will manage this based on netif_carrier_{on,off} call.
These API's were modified as part of commit id 012ec81223aa45d2b80aeafb77392fd1a19c7b10 ("qlcnic: Multi Tx
queue support for 82xx Series adapter.")
Boris BREZILLON [Sun, 8 Dec 2013 14:59:59 +0000 (15:59 +0100)]
usb: ohci-at91: fix irq and iomem resource retrieval
When using dt resources retrieval (interrupts and reg properties) there is
no predefined order for these resources in the platform dev resources
table.
Retrieve resources using platform_get_resource and platform_get_irq
functions instead of direct resource table entries to avoid resource type
mismatch.
Bob Gilligan [Sun, 15 Dec 2013 21:39:56 +0000 (13:39 -0800)]
neigh: Netlink notification for administrative NUD state change
The neighbour code sends up an RTM_NEWNEIGH netlink notification if
the NUD state of a neighbour cache entry is changed by a timer (e.g.
from REACHABLE to STALE), even if the lladdr of the entry has not
changed.
But an administrative change to the the NUD state of a neighbour cache
entry that does not change the lladdr (e.g. via "ip -4 neigh change
... nud ...") does not trigger a netlink notification. This means
that netlink listeners will not hear about administrative NUD state
changes such as from a resolved state to PERMANENT.
This patch changes the neighbor code to generate an RTM_NEWNEIGH
message when the NUD state of an entry is changed administratively.
staging: comedi: drivers: fix return value of comedi_load_firmware()
Some of the callback functions that upload the firmware in the comedi
drivers return a positive value indicating the number of bytes sent
to the device. Detect this condition and just return '0' to indicate
a successful upload.
Ian Abbott [Fri, 13 Dec 2013 12:00:30 +0000 (12:00 +0000)]
staging: comedi: 8255_pci: fix for newer PCI-DIO48H
At some point, Measurement Computing / ComputerBoards redesigned the
PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip.
This meant they had to put their hardware registers in the PCI BAR 2
region instead of PCI BAR 1. Unfortunately, they kept the same PCI
device ID for the new design. This means the driver recognizes the
newer cards, but doesn't work (and is likely to screw up the local
configuration registers of the PLX chip) because it's using the wrong
region.
Since the PCI subvendor and subdevice IDs were both zero on the old
design, but are the same as the vendor and device on the new design, we
can tell the old design and new design apart easily enough. Split the
existing entry for the PCI-DIO48H in `pci_8255_boards[]` into two new
entries, referenced by different entries in the PCI device ID table
`pci_8255_pci_table[]`. Use the same board name for both entries.
Merge tag 'iio-fixes-for-3.13c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Third set of fixes for IIO in the 3.13 cycle.
* Fix for a bug in the new cm36651 driver where it told the IIO driver it
was providing a decimal part, but then didn't. Now it correctly tells the
IIO core that it is only providing an integer value. This prevents random
incorrect values being output on a sysfs read.
* 3 fixes where drivers were miss specifying the endianness of their channels
as output through the buffer interface. These were discovered whilst
removing the terrible IIO_ST macro once and for all. The result is that
userspace may be informed that the buffer elements are being output as
little endian (on little endian platforms) when infact they are big endian.
Thus userspace will handle them incorrectly. This incorrect buffer
element specification is provided as sysfs attributes under
iio:deviceN/scan_elements.
Linus Torvalds [Tue, 17 Dec 2013 20:57:36 +0000 (12:57 -0800)]
Merge tag 's2mps11-build' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator/clk fix from Mark Brown:
"Fix s2mps11 build
This patch fixes a build failure that appeared in v3.13-rc4 due to an
RTC/MFD update merged via -mm"
* tag 's2mps11-build' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
mfd: s2mps11: Fix build after regmap field rename in sec-core.c
Jonathan Cameron [Wed, 11 Dec 2013 18:45:00 +0000 (18:45 +0000)]
iio:adc:ad7887 Fix channel reported endianness from cpu to big endian
Note this also sets the endianness to big endian whereas it would
previously have defaulted to the cpu endian. Hence technically
this is a bug fix on LE platforms.
Linus Torvalds [Tue, 17 Dec 2013 20:36:26 +0000 (12:36 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Five self-contained fixlets:
- fix clocksource driver build bug
- fix two sched_clock() bugs triggering on specific hardware
- fix devicetree enumeration bug affecting specific hardware
- fix irq handler registration race resulting in boot crash
- fix device node refcount bug"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: dw_apb_timer_of: Fix support for dts binding "snps,dw-apb-timer"
clocksource: dw_apb_timer_of: Fix read_sched_clock
clocksource: sunxi: Stop timer from ticking before enabling interrupts
clocksource: clksrc-of: Do not drop unheld reference on device node
clocksource: armada-370-xp: Register sched_clock after the counter reset
clocksource: time-efm32: Select CLKSRC_MMIO
Linus Torvalds [Tue, 17 Dec 2013 20:35:54 +0000 (12:35 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Three fixes for scheduler crashes, each triggers in relatively rare,
hardware environment dependent situations"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Rework sched_fair time accounting
math64: Add mul_u64_u32_shr()
sched: Remove PREEMPT_NEED_RESCHED from generic code
sched: Initialize power_orig for overlapping groups
Jonathan Cameron [Wed, 11 Dec 2013 18:45:00 +0000 (18:45 +0000)]
iio:imu:adis16400 fix pressure channel scan type
A single channel in this driver was using the IIO_ST macro.
This does not provide a parameter for setting the endianness of
the channel. Thus this channel will have been reported as whatever
is the native endianness of the cpu rather than big endian. This
means it would be incorrect on little endian platforms.
Jonathan Cameron [Wed, 11 Dec 2013 18:45:00 +0000 (18:45 +0000)]
staging:iio:mag:hmc5843 fix incorrect endianness of channel as a result of missuse of the IIO_ST macro.
This driver sets the shift value equal to IIO_BE (or 1) rather than setting
that to 0 and specificying the endianness. This means the channel type is
missreported as
[be|le]:u16/16>>1 where the be|le is dependent on the cpu native endianness,
rather than
be:u16/16>>0 resulting in any userspace code using this information, miss
converting the channel and generating thoroughly trashed data.
David S. Miller [Tue, 17 Dec 2013 20:06:20 +0000 (15:06 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
The following patchset contains two Netfilter fixes for your net
tree, they are:
* Fix endianness in nft_reject, the NFTA_REJECT_TYPE netlink attributes
was not converted to network byte order as needed by all nfnetlink
subsystems, from Eric Leblond.
* Restrict SYNPROXY target to INPUT and FORWARD chains, this avoid a
possible crash due to misconfigurations, from Patrick McHardy.
====================
Sasha Levin [Fri, 13 Dec 2013 15:54:22 +0000 (10:54 -0500)]
net: unix: allow bind to fail on mutex lock
This is similar to the set_peek_off patch where calling bind while the
socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
spew after a while.
This is also the last place that did a straightforward mutex_lock(), so
there shouldn't be any more of these patches.
Eric Dumazet [Sun, 15 Dec 2013 18:53:46 +0000 (10:53 -0800)]
udp: ipv4: do not use sk_dst_lock from softirq context
Using sk_dst_lock from softirq context is not supported right now.
Instead of adding BH protection everywhere,
udp_sk_rx_dst_set() can instead use xchg(), as suggested
by David.
Reported-by: Fengguang Wu <[email protected]> Fixes: 975022310233 ("udp: ipv4: must add synchronization in udp_sk_rx_dst_set()") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Tue, 17 Dec 2013 19:46:51 +0000 (11:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
"One of these is fixing a regression from the d_flags file type patch
that went into -rc1 that broke instantiation of inodes and dentries
(we were doing dentries first). The other is just an off-by-one
corner case"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: Avoid data inconsistency due to d-cache aliasing in readpage()
ceph: initialize inode before instantiating dentry
This is because we hold the rtnl_lock() before ndo_change_mtu() and try to flush
the work in netvsc_change_mtu(), in the mean time, netdev_notify_peers() may be
called from worker and also trying to hold the rtnl_lock. This will lead the
flush won't succeed forever. Solve this by not canceling and flushing the work,
this is safe because the transmission done by NETDEV_NOTIFY_PEERS was
synchronized with the netif_tx_disable() called by netvsc_change_mtu().
Linus Torvalds [Tue, 17 Dec 2013 19:43:46 +0000 (11:43 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Uli's patch fixes a regression in ptrace caused by a mis-merge of a
previous LE patch. The rest are all more endian fixes, all fairly
trivial, found during testing of 3.13-rc's"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/powernv: Fix OPAL LPC access in Little Endian
powerpc/powernv: Fix endian issue in opal_xscom_read
powerpc: Fix endian issues in crash dump code
powerpc/pseries: Fix endian issues in MSI code
powerpc/pseries: Fix PCIE link speed endian issue
powerpc/pseries: Fix endian issues in nvram code
powerpc/pseries: Fix endian issues in /proc/ppc64/lparcfg
powerpc: Fix topology core_id endian issue on LE builds
powerpc: Fix endian issue in setup-common.c
powerpc: PTRACE_PEEKUSR always returns FPR0
Josh Boyer [Fri, 11 Oct 2013 12:45:51 +0000 (08:45 -0400)]
cpupower: Fix segfault due to incorrect getopt_long arugments
If a user calls 'cpupower set --perf-bias 15', the process will end with
a SIGSEGV in libc because cpupower-set passes a NULL optarg to the atoi
call. This is because the getopt_long structure currently has all of
the options as having an optional_argument when they really have a
required argument. We change the structure to use required_argument to
match the short options and it resolves the issue.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1000439
Sujith Manoharan [Mon, 16 Dec 2013 01:34:59 +0000 (07:04 +0530)]
ath9k: Fix interrupt handling for the AR9002 family
This patch adds a driver workaround for a HW issue.
A race condition in the HW results in missing interrupts,
which can be avoided by a read/write with the ISR register.
All chips in the AR9002 series are affected by this bug - AR9003
and above do not have this problem.
Mathy Vanhoef [Thu, 28 Nov 2013 11:21:45 +0000 (12:21 +0100)]
ath9k_htc: properly set MAC address and BSSID mask
Pick the MAC address of the first virtual interface as the new hardware MAC
address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579.
Chris Wilson [Tue, 17 Dec 2013 14:34:50 +0000 (14:34 +0000)]
drm/i915: Use the correct GMCH_CTRL register for Sandybridge+
The GMCH_CTRL register (or MGCC in the spec) is at a different address
on Sandybridge, and the address to which we currently write to is
undefined. These stray writes appear to upset (hard hang) my Ivybridge
machine whilst it is in UEFI mode.
Note that the register is still marked as locked RO on Sandybridge, so
vgaarb is still dysfunctional.
Peter Hurley [Mon, 9 Dec 2013 23:06:07 +0000 (18:06 -0500)]
n_tty: Fix apparent order of echoed output
With block processing of echoed output, observed output order is still
required. Push completed echoes and echo commands prior to output.
Introduce echo_mark echo buffer index, which tracks completed echo
commands; ie., those submitted via commit_echoes but which may not
have been committed. Ensure that completed echoes are output prior
to subsequent terminal writes in process_echoes().
Fixes newline/prompt output order in cooked mode shell.
JongHo Kim [Tue, 17 Dec 2013 14:02:24 +0000 (23:02 +0900)]
ALSA: Add SNDRV_PCM_STATE_PAUSED case in wait_for_avail function
When the process is sleeping at the SNDRV_PCM_STATE_PAUSED
state from the wait_for_avail function, the sleep process will be woken by
timeout(10 seconds). Even if the sleep process wake up by timeout, by this
patch, the process will continue with sleep and wait for the other state.
Dave Chinner [Thu, 12 Dec 2013 05:34:38 +0000 (16:34 +1100)]
xfs: abort metadata writeback on permanent errors
If we are doing aysnc writeback of metadata, we can get write errors
but have nobody to report them to. At the moment, we simply attempt
to reissue the write from io completion in the hope that it's a
transient error.
When it's not a transient error, the buffer is stuck forever in
this loop, and we cannot break out of it. Eventually, unmount will
hang because the AIL cannot be emptied and everything goes downhill
from them.
To solve this problem, only retry the write IO once before aborting
it. We don't throw the buffer away because some transient errors can
last minutes (e.g. FC path failover) or even hours (thin
provisioned devices that have run out of backing space) before they
go away. Hence we really want to keep trying until we can't try any
more.
Because the buffer was not cleaned, however, it does not get removed
from the AIL and hence the next pass across the AIL will start IO on
it again. As such, we still get the "retry forever" semantics that
we currently have, but we allow other access to the buffer in the
mean time. Meanwhile the filesystem can continue to modify the
buffer and relog it, so the IO errors won't hang the log or the
filesystem.
Now when we are pushing the AIL, we can see all these "permanent IO
error" buffers and we can issue a warning about failures before we
retry the IO. We can also catch these buffers when unmounting an
issue a corruption warning, too.
Dave Chinner [Thu, 12 Dec 2013 05:34:36 +0000 (16:34 +1100)]
xfs: swalloc doesn't align allocations properly
When swalloc is specified as a mount option, allocations are
supposed to be aligned to the stripe width rather than the stripe
unit of the underlying filesystem. However, it does not do this.
What the implementation does is round up the allocation size to a
stripe width, hence ensuring that all allocations span a full stripe
width. It does not, however, ensure that that allocation is aligned
to a stripe width, and hence the allocations can span multiple
underlying stripes and so still see RMW cycles for things like
direct IO on MD RAID.
So, if the swalloc mount option is set, change the allocation
alignment in xfs_bmap_btalloc() to use the stripe width rather than
the stripe unit.
The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem. Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.
Remove xfsbdstrat and opencode the error handling in the callers. All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.
Dave Chinner [Thu, 21 Nov 2013 23:41:16 +0000 (10:41 +1100)]
xfs: align initial file allocations correctly
The function xfs_bmap_isaeof() is used to indicate that an
allocation is occurring at or past the end of file, and as such
should be aligned to the underlying storage geometry if possible.
Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the
behaviour of this function for empty files - it turned off
allocation alignment for this case accidentally. Hence large initial
allocations from direct IO are not getting correctly aligned to the
underlying geometry, and that is cause write performance to drop in
alignment sensitive configurations.
Fix it by considering allocation into empty files as requiring
aligned allocation again.
Namjae Jeon [Sun, 8 Dec 2013 14:33:50 +0000 (23:33 +0900)]
MAINTAINERS: fix incorrect mail address of XFS maintainer
When I tried to send the patches to XFS Maintainers,
I got returned mail included delivery fail message for Dave's mail.
Maybe, Dave Chinner mail address is incorrect.
I try to fix it correctly.
Jie Liu [Tue, 26 Nov 2013 13:38:49 +0000 (21:38 +0800)]
xfs: fix infinite loop by detaching the group/project hints from user dquot
xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
# mount -ouquota,gquota /dev/sda7 /xfs
# mkdir /xfs/test
# xfs_quota -xc 'off -g' /xfs <-- hangs up
# echo w > /proc/sysrq-trigger
# dmesg
It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s). Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.
This problem has been around since Linux 3.4, it was introduced by:
[ b84a3a9675 xfs: remove the per-filesystem list of dquots ]
Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.
In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.
Jie Liu [Tue, 26 Nov 2013 13:38:34 +0000 (21:38 +0800)]
xfs: fix assertion failure at xfs_setattr_nonsize
For CRC enabled v5 super block, change a file's ownership can simply
trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
project quota are enabled, i.e,
We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
however the assertion itself is not right IMHO. While performing quota off, we
firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee
that the desired quota is still active.
Alex Deucher [Mon, 9 Dec 2013 22:46:59 +0000 (17:46 -0500)]
drm/radeon/dpm: disable ss on Cayman
Spread spectrum seems to cause hangs when dynamic clock
switching is enabled. Disable it for now. This does not
affect performance or the amount of power saved. Tracked
down by Martin Andersson.
This patch fixes the rates declared in the CPU DAI parameters:
- SNDRV_PCM_RATE_KNOT and the discrete rates SNDRV_PCM_RATE_xxx should
not be used with SNDRV_PCM_RATE_CONTINUOUS,
- SNDRV_PCM_RATE_CONTINUOUS asks for rate_min and rate_max,
- the device may do streaming down to 5512Hz.
Kirill Tkhai [Wed, 27 Nov 2013 15:59:13 +0000 (19:59 +0400)]
sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities
This patch touches the RT group scheduling case.
Functions inc_rt_prio_smp() and dec_rt_prio_smp() change (global) rq's
priority, while rt_rq passed to them may be not the top-level rt_rq.
This is wrong, because changing of priority on a child level does not
guarantee that the priority is the highest all over the rq. So, this
leak makes RT balancing unusable.
The short example: the task having the highest priority among all rq's
RT tasks (no one other task has the same priority) are waking on a
throttle rt_rq. The rq's cpupri is set to the task's priority
equivalent, but real rq->rt.highest_prio.curr is less.
Mel Gorman [Tue, 17 Dec 2013 09:21:25 +0000 (09:21 +0000)]
sched: Assign correct scheduling domain to 'sd_llc'
Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
dereference on sd_busy but the fix also altered what scheduling domain it
used for the 'sd_llc' percpu variable.
One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks are then running on CPUs that are not cache hot.
This was found through bisection where ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates but does not completely fix the problem on all machines tested
implying there may be an additional bug or a common root cause. Here are
the average range of performance seen by individual ebizzy threads. It
was tested on top of candidate patches related to x86 TLB range flushing.
4-core machine
3.13.0-rc3 3.13.0-rc3
vanilla fixsd-v3r3
Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%)
Mean 2 0.34 ( 0.00%) 0.10 ( 70.59%)
Mean 3 1.29 ( 0.00%) 0.93 ( 27.91%)
Mean 4 7.08 ( 0.00%) 0.77 ( 89.12%)
Mean 5 193.54 ( 0.00%) 2.14 ( 98.89%)
Mean 6 151.12 ( 0.00%) 2.06 ( 98.64%)
Mean 7 115.38 ( 0.00%) 2.04 ( 98.23%)
Mean 8 108.65 ( 0.00%) 1.92 ( 98.23%)
8-core machine
Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%)
Mean 2 0.40 ( 0.00%) 0.21 ( 47.50%)
Mean 3 23.73 ( 0.00%) 0.89 ( 96.25%)
Mean 4 12.79 ( 0.00%) 1.04 ( 91.87%)
Mean 5 13.08 ( 0.00%) 2.42 ( 81.50%)
Mean 6 23.21 ( 0.00%) 69.46 (-199.27%)
Mean 7 15.85 ( 0.00%) 101.72 (-541.77%)
Mean 8 109.37 ( 0.00%) 19.13 ( 82.51%)
Mean 12 124.84 ( 0.00%) 28.62 ( 77.07%)
Mean 16 113.50 ( 0.00%) 24.16 ( 78.71%)
It's eliminated for one machine and reduced for another.
Vince Weaver [Fri, 13 Dec 2013 20:52:25 +0000 (15:52 -0500)]
perf: Document the new transaction sample type
Commit fdfbbd07e91f8fe3871 ("perf: Add generic transaction flags")
added support for PERF_SAMPLE_TRANSACTION but forgot to add documentation
for the sample type to include/uapi/linux/perf_event.h
perf: Disable all pmus on unthrottling and rescheduling
Currently, only one PMU in a context gets disabled during unthrottling
and event_sched_{out,in}(), however, events in one context may belong to
different pmus, which results in PMUs being reprogrammed while they are
still enabled.
This means that mixed PMU use [which is rare in itself] resulted in
potentially completely unreliable results: corrupted events, bogus
results, etc.
This patch temporarily disables PMUs that correspond to
each event in the context while these events are being modified.
Li Zefan [Tue, 17 Dec 2013 03:13:39 +0000 (11:13 +0800)]
cgroup: don't recycle cgroup id until all csses' have been destroyed
Hugh reported this bug:
> CONFIG_MEMCG_SWAP is broken in 3.13-rc. Try something like this:
>
> mkdir -p /tmp/tmpfs /tmp/memcg
> mount -t tmpfs -o size=1G tmpfs /tmp/tmpfs
> mount -t cgroup -o memory memcg /tmp/memcg
> mkdir /tmp/memcg/old
> echo 512M >/tmp/memcg/old/memory.limit_in_bytes
> echo $$ >/tmp/memcg/old/tasks
> cp /dev/zero /tmp/tmpfs/zero 2>/dev/null
> echo $$ >/tmp/memcg/tasks
> rmdir /tmp/memcg/old
> sleep 1 # let rmdir work complete
> mkdir /tmp/memcg/new
> umount /tmp/tmpfs
> dmesg | grep WARNING
> rmdir /tmp/memcg/new
> umount /tmp/memcg
>
> Shows lots of WARNING: CPU: 1 PID: 1006 at kernel/res_counter.c:91
> res_counter_uncharge_locked+0x1f/0x2f()
>
> Breakage comes from 34c00c319ce7 ("memcg: convert to use cgroup id").
>
> The lifetime of a cgroup id is different from the lifetime of the
> css id it replaced: memsw's css_get()s do nothing to hold on to the
> old cgroup id, it soon gets recycled to a new cgroup, which then
> mysteriously inherits the old's swap, without any charge for it.
Instead of removing cgroup id right after all the csses have been
offlined, we should do that after csses have been destroyed.
To make sure an invalid css pointer won't be returned after the css
is destroyed, make sure css_from_id() returns NULL in this case.
tj: Updated comment to note planned changes for cgrp->id.
Marc Carino [Tue, 17 Dec 2013 02:15:53 +0000 (18:15 -0800)]
libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs
Certain drives cannot handle queued TRIM commands properly, even
though support is indicated in the IDENTIFY DEVICE buffer. This patch
allows for disabling the commands for the affected drives and apply it
to the Micron/Crucial M500 SSDs which exhibit incorrect protocol
behavior when issued queued TRIM commands, which could lead to silent
data corruption.
tj: Merged two unnecessarily split patches and made minor edits
including shortening horkage name.
Marcel Holtmann [Tue, 17 Dec 2013 11:21:25 +0000 (03:21 -0800)]
Bluetooth: Fix HCI User Channel permission check in hci_sock_sendmsg
The HCI User Channel is an admin operation which enforces CAP_NET_ADMIN
when binding the socket. Problem now is that it then requires also
CAP_NET_RAW when calling into hci_sock_sendmsg. This is not intended
and just an oversight since general HCI sockets (which do not require
special permission to bind) and HCI User Channel share the same code
path here.
Remove the extra CAP_NET_RAW check for HCI User Channel write operation
since the permission check has already been enforced when binding the
socket. This also makes it possible to open HCI User Channel from a
privileged process and then hand the file descriptor to an unprivilged
process.
can: peak_usb: fix mem leak in pcan_usb_pro_init()
This patch fixes a memory leak in pcan_usb_pro_init(). In patch
f14e224 net: can: peak_usb: Do not do dma on the stack
the struct pcan_usb_pro_fwinfo *fi and struct pcan_usb_pro_blinfo *bi were
converted from stack to dynamic allocation va kmalloc(). However the
corresponding kfree() was not introduced.
There are a couple failure paths where urb leaks.
Is spare code within ems_usb_start_xmit(),
usb_free_urb() should be used to deallocate urb instead of usb_unanchor_urb().
In ems_usb_start() there is no usb_free_urb() if usb_submit_urb() fails.
Found by Linux Driver Verification project (linuxtesting.org).
Felipe Balbi [Tue, 10 Dec 2013 23:00:06 +0000 (17:00 -0600)]
usb: phy: fix driver dependencies
both isp1301-omap and fsl_usb2_otg drivers
depend on usb_bus_start_enum() which is only
defined if CONFIG_USB != n. There is a problem,
however, where both those drivers could be
statically linked, while CONFIG_USB=m.
James Hogan [Tue, 10 Dec 2013 22:28:04 +0000 (22:28 +0000)]
serial: 8250_dw: Fix LCR workaround regression
Commit c49436b657d0 (serial: 8250_dw: Improve unwritable LCR workaround)
caused a regression. It added a check that the LCR was written properly
to detect and workaround the busy quirk, but the behaviour of bit 5
(UART_LCR_SPAR) differs between IP versions 3.00a and 3.14c per the
docs. On older versions this caused the check to fail and it would
repeatedly force idle and rewrite the LCR register, causing delays and
preventing any input from serial being received.
This is fixed by masking out UART_LCR_SPAR before making the comparison.
wangweidong [Fri, 13 Dec 2013 03:00:10 +0000 (11:00 +0800)]
sctp: loading sctp when load sctp_probe
when I modprobe sctp_probe, it failed with "FATAL: ". I found that
sctp should load before sctp_probe register jprobe. So I add a
sctp_setup_jprobe for loading 'sctp' when first failed to register
jprobe, just do this similar to dccp_probe.
v2: add MODULE_SOFTDEP and check of request_module, as suggested by Neil
Peter Hurley [Thu, 12 Dec 2013 02:11:58 +0000 (21:11 -0500)]
tty: Fix hang at ldsem_down_read()
When a controlling tty is being hung up and the hang up is
waiting for a just-signalled tty reader or writer to exit, and a new tty
reader/writer tries to acquire an ldisc reference concurrently with the
ldisc reference release from the signalled reader/writer, the hangup
can hang. The new reader/writer is sleeping in ldsem_down_read() and the
hangup is sleeping in ldsem_down_write() [1].
The new reader/writer fails to wakeup the waiting hangup because the
wrong lock count value is checked (the old lock count rather than the new
lock count) to see if the lock is unowned.
Change helper function to return the new lock count if the cmpxchg was
successful; document this behavior.
Kent Overstreet [Mon, 11 Nov 2013 21:58:34 +0000 (13:58 -0800)]
bcache: New writeback PD controller
The old writeback PD controller could get into states where it had throttled all
the way down and take way too long to recover - it was too complicated to really
understand what it was doing.
This rewrites a good chunk of it to hopefully be simpler and make more sense,
and it also pays more attention to units which should make the behaviour a bit
easier to understand.
Kent Overstreet [Mon, 16 Dec 2013 22:12:09 +0000 (14:12 -0800)]
bcache: bugfix for race between moving_gc and bucket_invalidate
There is a possibility for a bucket to be invalidated by the allocator
while moving_gc was copying it's contents to another bucket, if the
bucket only held cached data. To prevent this moving checks for
a stale ptr (to an invalidated bucket), before and after reads.
It it finds one, it simply ignores moving that data. This only
affects bcache if the moving_gc was turned on, note that it's
off by default.
bcache: bugfix - moving_gc now moves only correct buckets
Removed gc_move_threshold because picking buckets only by
threshold could lead moving extra buckets (ei. if there are
buckets at the threshold that aren't supposed to be moved
do to space considerations).
This is replaced by a GC_MOVE bit in the gc_mark bitmask.
Now only marked buckets get moved.
Kent Overstreet [Mon, 11 Nov 2013 05:55:27 +0000 (21:55 -0800)]
bcache: Fix dirty_data accounting
Dirty data accounting wasn't quite right - firstly, we were adding the key we're
inserting after it could have merged with another dirty key already in the
btree, and secondly we could sometimes pass the wrong offset to
bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
important when tracking dirty data by stripe.
NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes
necessary that got squashed in with other patches; the full patch against 3.10
is 408cc2f47eeac93a, available at:
git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes