Git Repo - linux.git/log

[netdrvr] smc911x: fix for driver resume (and compilation warning)

I am trying out suspend, resume on an OMAP3 based board. What I see
during resume is that the SMC911x driver resume routine gets stuck
after trying to transmit the packet out of the controller. Some debug
messages below:

--> smc911x_drv_resume
eth0: --> smc911x_reset
eth0: smc911x_reset timeout waiting for PM restore
eth0: --> smc911x_enable
eth0: --> smc911x_phy_configure()
eth0: --> smc911x_phy_reset()
eth0: phy caps=0x782d
eth0: phy advertised caps=0x0de1
eth0: --> smc911x_phy_check_media
smc911x_phy_read: phyaddr=0x1, phyreg=0x01, phydata=0x7809
smc911x_phy_read: phyaddr=0x1, phyreg=0x01, phydata=0x7809
eth0: link down
Restarting tasks ... eth0: --> smc911x_hard_start_xmit
eth0: --> smc911x_hardware_send_pkt
eth0: --> smc911x_hard_start_xmit
eth0: --> smc911x_hardware_send_pkt
eth0: --> smc911x_hard_start_xmit
eth0: --> smc911x_hardware_send_pkt
nfs: server 172.24.190.217 not responding, still trying
nfs: server 172.24.190.217 not responding, still trying

The following change makes it work fine: (The change within
smc911x_drv_probe function was to get rid of a compilation warning).

Signed-off-by: Romit Dasgupta <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

RDMA/cxgb3: deadlock in iw_cxgb3 can cause hang when configuring interface.

When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
cxgb3 module, the iwarp driver ends up calling the ethtool ops get_drvinfo
function in cxgb3 to get the fw version and other info.  Currently the
iwarp driver grabs the rtnl lock around this down call to serialize.
As of 2.6.27 or so, things changed such that the rtnl lock is held around
the call to the netdev driver open function.  Also the cxgb3_client "add"
function doesn't get called if the device is down.

So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the device,
the iw_cxgb3 add func gets called with the rtnl_lock held.   If you
load cxgb3, ifconfig up the device, then load iw_cxgb3, the add func
gets called without the rtnl_lock held.  The former causes the deadlock,
the latter does not.

In addition, there are iw_cxgb3 sysfs handlers that also can call
down into cxgb3 to gather the fw and hw versions.  These can be called
concurrently on different processors and at any time.  Thus we need to
push this serialization down in the cxgb3 driver get_drvinfo func.

The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.

Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

cxgb3 - Limit multiqueue setting to msi-x

Allow multiqueue setting in MSI-X mode only

Signed-off-by: Divy Le Ray <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

cxgb3 - eeprom read fixes

Protect against invalid phy entries in the eeprom.
Extend eeprom access timeout.

Signed-off-by: Divy Le Ray <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

myri10ge: fix stop/go ordering even more

The doorbell writes may be seen out of order by the firmware if they
are in WC memory since the tx spin(un)lock does not flush WC writes.
Hence if the "stop" is written on a different CPU than the "go", it
is possible that the stop will arrive after the go unless we add an
explicit memory barrier (and mmiowb() is not enough).

It fixes transmit hangs in multi tx queue mode.

Signed-off-by: Brice Goglin <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

powerpc: Update desktop/server defconfigs

Turned off CONFIG_PCI_LEGACY and turned on EXT4, and otherwise mostly
took the defaults. This also updates ppc6xx_defconfig, which covers
the 6xx/7xx/7xxx-based embedded boards.

Signed-off-by: Paul Mackerras <[email protected]>

powerpc: Fix msr check in compat_sys_swapcontext

The new context may not be 16-byte aligned, so the real address of the
mcontext structure should be read from the uc_regs pointer instead of
directly using the (unaligned) uc_mcontext field.

Signed-off-by: Andreas Schwab <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent

drm/i915: Move legacy breadcrumb out of the reserved status page area

Addresses in the hardware status page below index 0x20 are reserved for use
by the hardware. The legacy breadcrumb was sitting at index 5. Move it to
index 0x21, and make sure everyone uses the defined value instead of
hard-coded constants.

Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm/i915: Filter pci devices based on PCI_CLASS_DISPLAY_VGA

This fixes hangs on 855-class hardware by avoiding double attachment of the
driver due to the stub second head device having the same pci id as the real
device.

Other DRM drivers probably want this treatment as well, but I'm applying it
just to this one for safety. But we should clean up the drm_pciids.h mess
now so that each driver has its own pci id list header in its own directory.
Let's do that in the next release.

Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

libata: fix last_reset timestamp handling

ehc->last_reset is used to ensure that resets are not issued too
close to each other.  It's initialized to jiffies minus one minute
on EH entry.  However, when new links are initialized after PMP is
probed, new links have zero for this timestamp, resulting in a long wait
depending on the current jiffies.

This patch makes last_reset considered iff ATA_EHI_DID_RESET is set, in
which case last_reset is always initialized.  As an added precaution,
WARN_ON() is added so that a warning is printed if last_reset is
in the future.
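
A minimal userspace sketch of the gating described above. The flag
FAKE_DID_RESET, the COOL_DOWN_TICKS value and the helper cool_down_wait()
are hypothetical stand-ins, not the libata code; the sketch only shows
that the timestamp is consulted solely when the flag says it was
initialized.

```c
/* Hypothetical stand-ins for the EH flag and the cool-down interval. */
#define FAKE_DID_RESET   0x1u
#define COOL_DOWN_TICKS  500UL

/*
 * Return how many ticks to wait before the next reset.
 * last_reset is only trusted when FAKE_DID_RESET is set; a freshly
 * initialized link (flag clear, timestamp zero) never waits.
 */
static unsigned long cool_down_wait(unsigned int flags,
                                    unsigned long last_reset,
                                    unsigned long now)
{
    unsigned long deadline;

    if (!(flags & FAKE_DID_RESET))
        return 0;

    deadline = last_reset + COOL_DOWN_TICKS;
    /* Wrap-safe "now is before deadline" comparison. */
    if ((long)(now - deadline) < 0)
        return deadline - now;
    return 0;
}
```

With the flag clear, a zero timestamp no longer produces a wait that
depends on the current tick count.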

This problem was spotted and debugged by Shane Huang.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Shane Huang <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

libata: Avoid overflow in ata_tf_read_block() when tf->hba_lbal > 127

Phillip O'Donnell <[email protected]> pointed out that the same
sign extension bug that was fixed in commit ba14a9c2 ("libata: Avoid
overflow in ata_tf_to_lba48() when tf->hba_lbal > 127") also appears to
exist in ata_tf_read_block(). Fix this by adding a cast to u64.
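
To illustrate this class of bug, here is a small userspace sketch. The
helper names are hypothetical, and the buggy variant spells out the int
promotion explicitly so that the demonstration itself avoids undefined
behaviour; it is a model of the problem, not the libata code.

```c
#include <stdint.h>

/*
 * Models `block |= byte << 24` where byte is an 8-bit field.
 * The byte is promoted to a 32-bit int before the shift, so a value
 * above 127 ends up in the sign bit and is sign-extended when the
 * result is widened to 64 bits.
 */
static uint64_t place_byte3_buggy(uint8_t byte)
{
    int32_t promoted = (int32_t)((uint32_t)byte << 24);

    return (uint64_t)promoted;
}

/* Casting to a 64-bit unsigned type before shifting avoids the problem. */
static uint64_t place_byte3_fixed(uint8_t byte)
{
    return (uint64_t)byte << 24;
}
```

For a byte of 0x80 the buggy variant sets all of the upper 32 bits,
while the fixed variant yields 0x80000000 as intended.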

Signed-off-by: Roland Dreier <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

[libata] pata_pcmcia: another memory card support

Support for Apacer photo steno pro card.

Signed-off-by: Marc Pignat <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

[libata] pata_sch: notice attached slave devices

I posted this last month, but was prompted to do so again in bz#467457

Add capability flag to support slave devices with pata_sch driver.

Signed-off-by: Mark Salter <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

[libata] pata_cs553*.c: cleanup kernel-doc

No arguments named @deadline in cs5535_cable_detect() and
cs5536_cable_detect(). Remove them.

Signed-off-by: Qinghuang Feng <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>

drm/radeon: map registers at load time

Now that the radeon driver has suspend/resume functions, it needs to map its
registers at load time or it will likely crash if a suspend operation occurs
before the driver has been initialized.

This patch moves the register mapping code from firstopen to load and makes
the mapping into a _DRM_DRIVER one so that the core won't remove it at
lastclose time.

Fixes (at least partially) kernel bz #11891.

Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm: Remove infrastructure for supporting i915's vblank swapping.

It's not used in any other drivers, and doesn't look like it will be from
drm.git master.

Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

i915: Remove racy delayed vblank swap ioctl.

When userland detected that this ioctl was supported (by version number check),
it used it in a racy way -- dispatch delayed swap, wait for vblank, continue
rendering. As there was no mechanism for it to wait for the swap to finish,
sometimes it would render before the swap and garbage would be displayed on
the screen.

By removing the ioctl and returning -EINVAL, userland returns to its previous,
correct rendering path of waiting for a vblank then dispatching a swap. The
only path that could have used this ioctl correctly was page flipping, which
relied on only one client running and emitting wait-for-vblank-before-rendering
in the command stream. That path also falls back correctly, at the performance
cost of not being able to queue up rendering before the flip occurs.

Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

i915: Don't whine when pci_enable_msi() fails.

This probably just means the chipset doesn't support MSI, which is fine.

Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

i915: Don't attempt to short-circuit object_wait_rendering by checking domains.

This could return early when reading after writing a buffer, if somebody
had already put it on the flushing list (write domains are 0, but still
active), leading to glReadPixels failure.

Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

i915: Clean up sarea pointers on leavevt

This corresponds to the setup of the sarea pointers in DMA initialization,
though neither is exactly the point at which the sarea is set up or torn down.

Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

i915: Save/restore MCHBAR_RENDER_STANDBY on GM965/GM45

This register is set by the 2D driver to prevent lockups, and so it needs to
be preserved across suspend/resume too. This makes my X200s work.

Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock

Impact: fix hang/crash on ia64 under high load

This is ugly, but the simplest patch by far.

Unlike other similar routines, account_group_exec_runtime() could be
called "implicitly" from within scheduler after exit_notify(). This
means we can race with the parent doing release_task(), we can't just
check ->signal != NULL.

Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
before __cleanup_signal() to make sure ->signal can't be freed under
task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
about the case when tsk changes cpu/rq under us, this should be OK.

Thanks to Ingo who nacked my previous buggy patch.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reported-by: Doug Chapman <[email protected]>

dsa: fix master interface allmulti/promisc handling

Before commit b6c40d68ff6498b7f63ddf97cf0aa818d748dee7 ("net: only
invoke dev->change_rx_flags when device is UP"), the dsa driver could
sort-of get away with only fiddling with the master interface's
allmulti/promisc counts in ->change_rx_flags() and not touching them
in ->open() or ->stop(). After this commit (note that it was merged
almost simultaneously with the dsa patches, which is why this wasn't
caught initially), the breakage that was already there became more
apparent.

Since it makes no sense to keep the master interface's allmulti or
promisc count pinned for a slave interface that is down, copy the vlan
driver's sync logic (which does exactly what we want) over to dsa to
fix this.

Bug report from Dirk Teurlings <[email protected]> and Peter van Valderen
<[email protected]>.

Signed-off-by: Lennert Buytenhek <[email protected]>
Tested-by: Dirk Teurlings <[email protected]>
Tested-by: Peter van Valderen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dsa: fix skb->pkt_type when mac address of slave interface differs

When a dsa slave interface has a mac address that differs from that
of the master interface, eth_type_trans() won't explicitly set
skb->pkt_type back to PACKET_HOST -- we need to do this ourselves
before calling eth_type_trans().

Signed-off-by: Lennert Buytenhek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fix setting of skb->tail in skb_recycle_check()

Since skb_reset_tail_pointer() reads skb->data, we need to set
skb->data before calling skb_reset_tail_pointer(). This was causing
spurious skb_over_panic()s from skb_put() being called on a recycled
skb that had its skb->tail set to beyond where it should have been.
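
The ordering problem can be sketched with a simplified buffer model. The
struct fake_skb and its helpers are hypothetical and keep the tail as an
offset from head; the real sk_buff is more involved.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct fake_skb {
    unsigned char *head;   /* start of the buffer */
    unsigned char *data;   /* start of the payload */
    size_t tail;           /* end of the payload, as an offset from head */
};

/* Like the real helper, this derives tail from the current data pointer. */
static void fake_reset_tail(struct fake_skb *skb)
{
    skb->tail = (size_t)(skb->data - skb->head);
}

/* Buggy order: tail is computed from the stale data pointer. */
static void recycle_buggy(struct fake_skb *skb, size_t headroom)
{
    fake_reset_tail(skb);
    skb->data = skb->head + headroom;
}

/* Fixed order: set data first, then derive tail from it. */
static void recycle_fixed(struct fake_skb *skb, size_t headroom)
{
    skb->data = skb->head + headroom;
    fake_reset_tail(skb);
}
```

In the buggy order the recycled buffer keeps a tail that reflects the
old payload position, so a later append can run past the end.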

Bug report from Peter van Valderen <[email protected]>.

Signed-off-by: Lennert Buytenhek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fix /proc/net/snmp as memory corruptor

icmpmsg_put() can happily corrupt kernel memory, using a static
table and forgetting to reset an array index in a loop.

Remove the static array since it's not safe without proper locking.

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mac80211: fix a buffer overrun in station debug code

net/mac80211/debugfs_sta.c
The trailing zero was written to state[4], it's out of bounds.

Signed-off-by: Jianjun Kong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ring-buffer: prevent infinite looping on time stamping

Impact: removal of unnecessary looping

The lockless part of the ring buffer allows for reentry into the code
from interrupts. A timestamp is taken, a test is performed and if it
detects that an interrupt occurred that did tracing, it tries again.

The problem arises if the timestamp code itself causes a trace.
The detection will detect this and loop again. The difference between
this and an interrupt doing tracing, is that this will fail every time,
and cause an infinite loop.

Currently, we test if the loop happens 1000 times, and if so, it will
produce a warning and disable the ring buffer.

The problem with this approach is that it makes it difficult to perform
some types of tracing (tracing the timestamp code itself).

Each trace entry has a delta timestamp from the previous entry.
If a trace entry is reserved but an interrupt occurs and traces before
the previous entry is committed, the delta timestamp for that entry will
be zero. This actually makes sense in terms of tracing, because the
interrupt entry happened before the preempted entry was committed, so
one may consider the two happening at the same time. The order is
still preserved in the buffer.

With this idea, instead of trying to get a new timestamp if an interrupt
made it in between the timestamp and the test, the entry could simply
make the delta zero and continue. This will prevent interrupts or
tracers in the timer code from causing the above loop.

Signed-off-by: Steven Rostedt <[email protected]>

ftrace: disable tracing on resize

Impact: fix for bug on resize

This patch addresses the bug found here:

http://bugzilla.kernel.org/show_bug.cgi?id=11996

When ftrace converted to the new unified trace buffer, the resizing of
the buffer was not protected as much as it was originally. If tracing
is performed while the resize occurs, then the buffer can be corrupted.

This patch disables all ftrace buffer modifications before a resize
takes place.

Signed-off-by: Steven Rostedt <[email protected]>

netfilter: payload_len is be16, add size of struct rather than size of pointer

payload_len is a be16 value, not cpu_endian; also, the size of a pointer
to a struct ipv6hdr was being added, not the size of the struct itself.
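
Both mistakes can be shown in a short userspace sketch. The struct
fake_ipv6hdr is a hypothetical 40-byte stand-in for struct ipv6hdr, and
total_len() is an illustrative helper, not the netfilter code.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 40-byte stand-in for struct ipv6hdr. */
struct fake_ipv6hdr {
    uint8_t  ver_tc_flow[4];
    uint16_t payload_len;    /* stored in network byte order (be16) */
    uint8_t  nexthdr;
    uint8_t  hop_limit;
    uint8_t  saddr[16];
    uint8_t  daddr[16];
};

/*
 * Total packet length: convert payload_len to host order first, then
 * add the size of the header struct itself. Writing sizeof(hdr) here
 * would add the size of the pointer (4 or 8 bytes) instead of 40.
 */
static size_t total_len(const struct fake_ipv6hdr *hdr)
{
    return (size_t)ntohs(hdr->payload_len) + sizeof(*hdr);
}
```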

Signed-off-by: Harvey Harrison <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: fix ip6_mr_init error path

The order of cleanup operations in the error/exit section of ip6_mr_init()
is completely inverted. It should be the other way around.
Also a del_timer() is missing in the error path.
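
The usual pattern for such an error path is to undo the setup steps in
reverse order, jumping to the label that matches the last step that
succeeded. A minimal sketch, with hypothetical steps A, B and C standing
in for the real registrations:

```c
#include <string.h>

/* Records the order of setup ("X+") and teardown ("X-") steps. */
static char trace[64];

static void step(const char *name)
{
    strcat(trace, name);
}

/* fail_at: 0 means no failure, n means the n-th setup step fails. */
static int init_sketch(int fail_at)
{
    trace[0] = '\0';

    if (fail_at == 1)
        goto out;
    step("A+");
    if (fail_at == 2)
        goto undo_a;
    step("B+");
    if (fail_at == 3)
        goto undo_b;
    step("C+");
    return 0;

    /* Teardown runs in the reverse order of setup. */
undo_b:
    step("B-");
undo_a:
    step("A-");
out:
    return -1;
}
```

If the third step fails, the trace reads A+, B+, B-, A-: the most
recently set up resource is released first.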

Signed-off-by: Benjamin Thery <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

[4/4] dca: fixup initialization dependency

Mark dca_init as a subsys_initcall since it needs to be ready to go
before dependent drivers start registering themselves.

Cc: <[email protected]>
Reported-and-tested-by: Mark Rustad <[email protected]>
Acked-by: Maciej Sosnowski <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

[3/4] I/OAT: fix async_tx.callback checking

async_tx.callback should be checked for the first
not the last descriptor in the chain.

Cc: <[email protected]>
Signed-off-by: Maciej Sosnowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

[2/4] I/OAT: fix dma_pin_iovec_pages() error handling

Error handling needs to be modified in dma_pin_iovec_pages().
It should return NULL instead of ERR_PTR
(pinned_list is checked for NULL in tcp_recvmsg() to determine
if iovec pages have been successfully pinned down).
In case of error for the first iovec,
local_list->nr_iovecs needs to be initialized.

Cc: <[email protected]>
Signed-off-by: Maciej Sosnowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

[1/4] I/OAT: fix channel resources free for not allocated channels

If the ioatdma driver is loaded but not used it does not allocate descriptors.
Before it frees channel resources it should first be sure
that they have been previously allocated.

Cc: <[email protected]>
Signed-off-by: Maciej Sosnowski <[email protected]>
Tested-by: Tom Picard <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ssb: Fix DMA-API compilation for non-PCI systems

This fixes compilation of the SSB DMA-API code on non-PCI platforms.

Signed-off-by: Michael Buesch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

SSB: hide empty sub menu

If the target system cannot support SSB, then don't show the menu option as
it'll simply be an empty submenu.

Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nohz: disable tick_nohz_kick_tick() for now

Impact: nohz powersavings and wakeup regression

commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick
device from irq_enter()) causes a serious wakeup regression.

While the patch is correct it does not take into account that spurious
wakeups happen on x86. A fix for this issue is available, but we just
revert to the .27 behaviour and let long running softirqs screw
themselves.

Disable it for now.

Signed-off-by: Thomas Gleixner <[email protected]>

vlan: Fix typos in proc output string

Signed-off-by: Ferenc Wagner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

irq: call __irq_enter() before calling the tick_idle_check

Impact: avoid spurious ksoftirqd wakeups

The tick idle check which is called from irq_enter() was run before
the call to __irq_enter() which did not set the in_interrupt() bits in
preempt_count. That way the raise of a softirq woke up ksoftirqd for
nothing as the softirq was handled on return from interrupt.

Call __irq_enter() before calling into the tick idle check code.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

x86: Make NUMA on 32-bit depend on BROKEN

While investigating the failure of hibernation on 32-bit x86 with
CONFIG_NUMA set, as described in this message
http://marc.info/?l=linux-kernel&m=122634118116226&w=4
I asked some people for help and I was told that it wasn't really
worth the effort, because CONFIG_NUMA was generally broken on 32-bit
x86 systems and it shouldn't be used in such configs. For this
reason, make CONFIG_NUMA depend on BROKEN instead of EXPERIMENTAL on
x86-32.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

KEYS: Make request key instantiate the per-user keyrings

Make request_key() instantiate the per-user keyrings so that it doesn't oops
if it needs to get hold of the user session keyring because there isn't a
session keyring in place.

Signed-off-by: David Howells <[email protected]>
Tested-by: Steve French <[email protected]>
Tested-by: Rutger Nijlunsing <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

powerpc: Repair device bindings documentation

Commit d0fc2eaaf4c56a95f5ed29b6bfb609e19714fc16 "powerpc/fsl: Refactor
device bindings" split out a number of device bindings from
booting-without-of.txt into separate files. Having them all in one file
was a frequent source of merge conflicts.

However, in the next merge, 49997d75152b3d23c53b0fa730599f2f74c92c65, there
was another conflict. Some of the bindings removed from
booting-without-of.txt were mistakenly added back in and the copies in
dts-bindings were kept as well.

This patch re-removes "Freescale Display Interface" and "Freescale on board
FPGA" and fixes the table of contents.

Signed-off-by: Trent Piepho <[email protected]>
Signed-off-by: Kumar Gala <[email protected]>

Btrfs: empty_size allocation fixes again

The allocator wasn't catching all of the cases where it needed to do
extra loops because the check to enforce them wasn't happening early
enough.

When the allocator decided to increase the size of the allocation
for metadata clustering, it wasn't always setting the empty_size to
include the extra (optional) bytes. This also fixes the empty_size field
to be correct.

Signed-off-by: Chris Mason <[email protected]>

sparc64: Update defconfig.

Signed-off-by: David S. Miller <[email protected]>

Revert "sparc: correct section of current_pc()"

This reverts commit 8dd9453737822469837d48d5da3785ce70fb2118.

This fixes a boot failure reported by Robert Reif.

The code above the section change expects to fallthrough, so
we can't make such a section change here.

Btrfs: tune btrfs unplug functions for a small number of devices

When btrfs unplugs, it tries to find the correct device to unplug
via search through the extent_map tree. This avoids unplugging
a device that doesn't need it, but is a waste of time for filesystems
with a small number of devices.

This patch checks the total number of devices before doing the
search.

Signed-off-by: Chris Mason <[email protected]>

ocfs2: Check search result in ocfs2_xattr_block_get()

ocfs2_xattr_block_get() calls ocfs2_xattr_search() to find an external
xattr, but doesn't check the search result that is passed back via struct
ocfs2_xattr_search. Add a check for search result, and pass back -ENODATA if
the xattr search failed. This avoids a later NULL pointer error.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: fix printk related build warnings in xattr.c

Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: truncate outstanding block after direct io failure

Signed-off-by: Dmitri Monakhov <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2/xattr: Proper hash collision handle in bucket division

In ocfs2/xattr, we must make sure that xattrs with the same hash value
live in the same bucket so that the search scheme can work. But in the old
implementation, when we wanted to extend a bucket, we just moved half of the
xattrs to the new bucket. This works in most cases, but if we are unlucky
we will split 2 xattrs with the same hash value across 2 different buckets.
This means that an xattr from the previous bucket cannot be found anymore.
This patch fixes the problem by finding the right position when extending
the bucket, and extending an empty bucket if needed.

Signed-off-by: Tao Ma <[email protected]>
Cc: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: return 0 in page_mkwrite to let VFS retry.

In ocfs2_page_mkwrite, we return -EINVAL when we find that the page mapping
isn't updated, which causes the user space program to get SIGBUS and
exit. The reason is that during a racing writeable mmap, we will do
unmap_mapping_range in ocfs2_data_downconvert_worker. The good thing is
that if we return 0 in page_mkwrite, VFS will retry the fault and then
call page_mkwrite again, so it is safe to return 0 here.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Set journal descriptor to NULL after journal shutdown

This patch sets the journal descriptor to NULL after the journal is shut
down. This ensures that jbd2_journal_release_jbd_inode(), which removes the
jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
even after the journal has been shut down.

Signed-off-by: Sunil Mushran <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Fix check of return value of ocfs2_start_trans() in xattr.c.

On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM),
so checking whether handle is NULL is wrong. Fix the checks to use IS_ERR().
Jan has made the patch for the other parts of ocfs2 (thanks, Jan), so
this is just the fix for fs/ocfs2/xattr.c.

Signed-off-by: Tao Ma <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails

We forgot to set i_nlink to 0 when returning due to error from ocfs2_mknod_locked()
and thus inode was not properly released via ocfs2_delete_inode() (e.g. claimed
space was not released). Fix it.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Fix checking of return value of new_inode()

new_inode() does not return ERR_PTR() but NULL in case of failure. Correct
checking of the return value.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Fix check of return value of ocfs2_start_trans()

On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM).
Thus checks for !handle are wrong. Fix them to use IS_ERR().
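
A userspace sketch of why the NULL check never fires. ERR_PTR(),
PTR_ERR() and IS_ERR() are re-implemented here along the lines of the
kernel helpers; fake_start_trans() and the two callers are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace models of the kernel's error-pointer helpers. */
static void *ERR_PTR(long error)
{
    return (void *)error;
}

static long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_handle;

/* Hypothetical stand-in: returns an error pointer on failure, never NULL. */
static void *fake_start_trans(int fail)
{
    return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_handle;
}

/* Buggy caller: an error pointer is non-NULL, so the failure is missed. */
static int caller_buggy(int fail)
{
    void *handle = fake_start_trans(fail);

    if (!handle)
        return -ENOMEM;
    return 0;   /* real code would go on to use the bogus handle */
}

/* Fixed caller: IS_ERR() detects the encoded errno. */
static int caller_fixed(int fail)
{
    void *handle = fake_start_trans(fail);

    if (IS_ERR(handle))
        return (int)PTR_ERR(handle);
    return 0;
}
```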

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Fix some typos in xattr annotations.

Fix some typos in the xattr annotations.

Signed-off-by: Tao Ma <[email protected]>
Reported-by: Coly Li <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Remove unused ocfs2_restore_xattr_block().

Since now ocfs2 supports empty xattr buckets, we will never remove
the xattr index tree even if all the xattrs are removed, so this
function will never be called. So remove it.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Don't repeat ocfs2_xattr_block_find()

ocfs2_xattr_block_get() looks up the xattr in a startlingly familiar
way; it's identical to the function ocfs2_xattr_block_find(). Let's just
use the latter in the former.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Specify appropriate journal access for new xattr buckets.

There are a couple of places that get an xattr bucket that may be reading
an existing one or may be allocating a new one. They should specify the
correct journal access mode for each case.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Check errors from ocfs2_xattr_update_xattr_search()

The ocfs2_xattr_update_xattr_search() function can return an error when
trying to read blocks off of disk. The caller needs to check this error
before using those (possibly invalid) blocks.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Don't return -EFAULT from a corrupt xattr entry.

If the xattr disk structures are corrupt, return -EIO, not -EFAULT.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: Check xattr block signatures properly.

The xattr.c code is currently memcmp()ing naked buffer pointers.
Create the OCFS2_IS_VALID_XATTR_BLOCK() macro to match its peers and use
that.

In addition, failed signature checks were returning -EFAULT, which is
completely wrong. Return -EIO.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: add handler_map array bounds checking

Make the handler_map array as large as the possible value range to avoid
a fencepost error.
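
A sketch of the fencepost issue, assuming a table indexed directly by a
name index whose largest valid value is IDX_MAX. All names here are
hypothetical and this is not the approach that was finally merged.

```c
#include <stddef.h>

/* Hypothetical name indexes; IDX_MAX is the largest valid value. */
enum { IDX_USER = 1, IDX_TRUSTED = 4, IDX_MAX = 4 };

/*
 * The table needs IDX_MAX + 1 slots so that index IDX_MAX is in range.
 * Sizing it as [IDX_MAX] would make the last valid index out of bounds.
 */
static const char *const handler_map[IDX_MAX + 1] = {
    [IDX_USER]    = "user.",
    [IDX_TRUSTED] = "trusted.",
};

/* Bounds-checked lookup; returns NULL for unknown or unset indexes. */
static const char *lookup_prefix(int idx)
{
    if (idx <= 0 || idx > IDX_MAX)
        return NULL;
    return handler_map[idx];
}
```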

[ Utilize alternate method -- Joel ]

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: remove duplicate definition in xattr

include/linux/xattr.h already has the definitions of the xattr prefixes,
so remove the duplicate definitions in xattr.c.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: fix function declaration and definition in xattr

Because we merged the xattr sources into one file, some functions
no longer belong in the header file.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

ocfs2: fix license in xattr

This patch fixes the license in xattr.c and xattr.h.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>

Btrfs: Turn off extent state leak debugging

The extent_io.c code has a #define to find and cleanup extent state leaks
on module unmount. This adds a very highly contended spinlock to a
hot path for most FS operations.

Turn it off by default. A later changeset will add a .config option
for it.

Signed-off-by: Chris Mason <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Make the HP EliteBook 8530p use AD1884A model laptop
  ALSA: gusextreme: Fix build errors
  ALSA: hdsp: check for iobox and upload firmware during ioctl
  ALSA: HDSP: check for io box before uploading firmware
  ALSA: hda - Add another HP model (6730s) for AD1884A
  alsa: fix snd_BUG_on() and friends
  ALSA: hda - Add a quirk for MEDION MD96630
  ALSA: hda - Limit the number of GPIOs show in proc

Merge branches 'topic/fix/misc' and 'topic/fix/hda' into for-linus

ALSA: hda - Make the HP EliteBook 8530p use AD1884A model laptop

Added a QUIRK to patch_analog.c for the HP Elitebook 8530p
(IDs 0x103c:0x30e7) to use AD1884A model 'laptop' by default.
Playback and Capture confirmed working.

Signed-off-by: Travis Place <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Btrfs: Fix usage of struct extent_map->orig_start

This makes sure the orig_start field in struct extent_map gets set
everywhere the extent_map structs are created or modified.

Signed-off-by: Chris Mason <[email protected]>

Btrfs: Use invalidatepage when writepage finds a page outside of i_size

With all the recent fixes to the delalloc locking, it is now safe
again to use invalidatepage inside the writepage code for
pages outside of i_size. This used to deadlock against some of the
code to write locked ranges of pages, but all of that has been fixed.

Signed-off-by: Chris Mason <[email protected]>

Btrfs: Try harder while searching for free space

The loop searching for free space would exit out too soon when
metadata clustering was trying to allocate a large extent. This makes
sure a full scan of the free space is done searching for only the
minimum extent size requested by the higher layers.

Signed-off-by: Chris Mason <[email protected]>

Btrfs: Fix use after free during compressed reads

Yan's fix to use the correct file offset during compressed reads used the
extent_map struct pointer after it had been freed. This saves the
fields we want for later use instead.

Signed-off-by: Chris Mason <[email protected]>
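
The pattern described above (copy the needed fields out before the struct
is freed) can be sketched in userspace C. This is only an illustration:
struct demo_map and end_after_release() are hypothetical names, not the
Btrfs code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an extent map. */
struct demo_map {
    uint64_t start;
    uint64_t len;
};

/* Copy the fields needed later, then release the map. */
static uint64_t end_after_release(struct demo_map *map)
{
    uint64_t start, len;

    assert(map != NULL);
    start = map->start;
    len = map->len;
    free(map);          /* map must not be dereferenced past this point */
    return start + len; /* uses only the saved copies */
}
```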

x86: HPET: enter hpet_interrupt_handler with interrupts disabled

Some functions that may be called from this handler require that
interrupts are disabled. Also, combining IRQF_DISABLED and
IRQF_SHARED does not reliably disable interrupts in a handler, so
remove IRQF_SHARED from the irq flags (this irq is not shared anyway).

Signed-off-by: Matt Fleming <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: "Will Newton" <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP

In hpet_next_event() we check that the value we just wrote to
HPET_Tn_CMP(timer) has reached the chip. Currently, we're checking that
the value we wrote to HPET_Tn_CMP(timer) is in HPET_T0_CMP, which, if
timer is anything other than timer 0, is likely to fail.

Signed-off-by: Matt Fleming <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
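
A minimal userspace sketch of the read-back bug, assuming a hypothetical
array fake_cmp[] that stands in for the per-timer comparator registers.
It is not the kernel code; it only shows why checking comparator 0 fails
for any other timer.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TIMERS 3

/* Hypothetical stand-in for the per-timer comparator registers. */
static uint32_t fake_cmp[NUM_TIMERS];

/* Buggy check: always reads back comparator 0. */
static int write_cmp_buggy(unsigned int timer, uint32_t value)
{
    assert(timer < NUM_TIMERS);
    fake_cmp[timer] = value;
    return fake_cmp[0] == value ? 0 : -1;
}

/* Fixed check: reads back the comparator that was just written. */
static int write_cmp_fixed(unsigned int timer, uint32_t value)
{
    assert(timer < NUM_TIMERS);
    fake_cmp[timer] = value;
    return fake_cmp[timer] == value ? 0 : -1;
}
```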

x86: HPET: convert WARN_ON to WARN_ON_ONCE

It is possible to flood the console with call traces if the WARN_ON
condition is true because of the frequency with which this function is
called.

Signed-off-by: Matt Fleming <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

libata: revert convert-to-block-tagging patches

This patch reverts the following three commits which convert libata to
use block layer tagging.

43a49cbdf31e812c0d8f553d433b09b421f5d52c
e013e13bf605b9e6b702adffbe2853cfc60e7806
2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e

Although using block layer tagging is the right direction, due to the
tight coupling among tag number, data structure allocation and
hardware command slot allocation, libata doesn't work correctly with
the current conversion.

The biggest problem is guaranteeing that tag 0 is always used for
non-NCQ commands. Due to the way blk-tag is implemented and how SCSI
starts and finishes requests, such guarantee can't be made. I'm not
sure whether this would actually break any low level driver but it
doesn't look like a good idea to break such assumption given the
frailty of ATA controllers.

So, for the time being, keep using the old dumb in-libata qc
allocation.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jeff Garzik <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Btrfs: Fix csum error for compressed data

The decompress code doesn't take the logical offset in extent
pointer into account. If the logical offset isn't zero, data
will be decompressed into wrong pages.

The solution used here is to record the starting offset of the extent
in the file separately from the logical start of the extent_map struct.
This allows us to avoid problems inserting overlapping extents.

Signed-off-by: Yan Zheng <[email protected]>

Btrfs: Make sure pages are dirty before doing delalloc for them

This adds a PageDirty check to the writeback path that locks pages
for delalloc. If a page wasn't dirty at this point, it is in the
process of being truncated away.

Signed-off-by: Chris Mason <[email protected]>

Btrfs: Don't subtract too much from the allocation target (avoid wrapping)

When metadata allocation clustering has to fall back to unclustered
allocs because large free areas could not be found, it was sometimes
subtracting too much from the total bytes to allocate. This would
make it wrap below zero.

Signed-off-by: Chris Mason <[email protected]>
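
The wrap can be illustrated with a short sketch, assuming the byte count
is held in an unsigned 64-bit value. sub_naive() and sub_clamped() are
hypothetical helpers, not functions from the Btrfs allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: with unsigned arithmetic, chunk > total wraps around. */
static uint64_t sub_naive(uint64_t total, uint64_t chunk)
{
    return total - chunk;
}

/* Clamped version: never goes below zero. */
static uint64_t sub_clamped(uint64_t total, uint64_t chunk)
{
    uint64_t result = chunk > total ? 0 : total - chunk;

    assert(result <= total);
    return result;
}
```

With total = 4096 and chunk = 8192, the naive version yields a value just
under 2^64 rather than a small or zero target, while the clamped version
stops at zero.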

sh: Handle fixmap TLB eviction more coherently.

There was a race in the kmap_coherent() implementation. While we
guarded against preemption, there was nothing preventing eviction of
the pre-faulted fixmap entry from the UTLB. Under certain workloads
this would result in the fixmap entries used for cache colouring being
evicted from the UTLB in the midst of a copy_page().

In addition to pre-faulting, we also make sure to preserve the PTEs
in the kernel page table and introduce a cached PTE for kmap_coherent()
usage. This follows a similar change on MIPS ("[MIPS] Fix aliasing bug
in copy_to_user_page / copy_from_user_page").

Reported-by: Hideo Saito <[email protected]>
Reported-by: CHIKAMA Masaki <[email protected]>
Tested-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: Paul Mundt <[email protected]>

sched: clean up debug info

Impact: clean up and fix debug info printout

While looking over the sched_debug code I noticed that we printed the rq
schedstats for every cfs_rq; amend this.

Also change nr_spread_over into an int, and fix a little buglet in
min_vruntime printing.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

[XFS] XFS: Check for valid transaction headers in recovery

When we are about to add a new item to a transaction in recovery, we need
to check that it is valid first. Currently we just assert that the header
magic number matches, but in production systems that assert is not present and we
add a corrupted transaction to the list to be processed. This results in a
kernel oops later when processing the corrupted transaction.

Instead, if we detect a corrupted transaction, abort recovery and leave
the user to clean up the mess that has occurred.

SGI-PV: 988145

SGI-Modid: xfs-linux-melb:xfs-kern:32356a

Signed-off-by: David Chinner <[email protected]>
Signed-off-by: Tim Shimmin <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: Lachlan McIlroy <[email protected]>
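
A rough sketch of the difference between a debug-only assert and a real
runtime check. DEMO_MAGIC, struct demo_hdr and demo_add_item() are
hypothetical, and the error code is chosen only for illustration; the
actual XFS recovery code differs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAGIC 0x5452414eu /* hypothetical magic number */

/* Hypothetical, simplified transaction header. */
struct demo_hdr {
    uint32_t magic;
};

/* Validate the header before accepting the item. */
static int demo_add_item(const struct demo_hdr *hdr)
{
    /* Debug-only: assert() compiles to nothing when NDEBUG is defined. */
    assert(hdr != NULL);

    /* Runtime check: still present in production builds. */
    if (hdr->magic != DEMO_MAGIC)
        return -EINVAL; /* caller aborts instead of queueing bad data */

    return 0;
}
```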

[XFS] handle memory allocation failures during log initialisation

When there is no memory left in the system, xfs_buf_get_noaddr()
can fail. If this happens at mount time during xlog_alloc_log()
we fail to catch the error and oops.

Catch the error from xfs_buf_get_noaddr(), and allow other memory
allocations to fail and catch those errors too. Report the error
to the console and fail the mount with ENOMEM.

Tested by manually injecting errors into xfs_buf_get_noaddr() and
xlog_alloc_log().

Version 2:
o remove unnecessary casts of the returned pointer from kmem_zalloc()

SGI-PV: 987246

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Lachlan McIlroy <[email protected]>

ALSA: gusextreme: Fix build errors

gusextreme depends on opl3 support. Add the appropriate select to Kconfig.
Also remove the unnecessary hwdep select.

Relevant build errors:
ERROR: "snd_opl3_hwdep_new" [sound/isa/gus/snd-gusextreme.ko] undefined!
ERROR: "snd_opl3_create" [sound/isa/gus/snd-gusextreme.ko] undefined!

Signed-off-by: Ville Syrjala <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

[XFS] Account for allocated blocks when expanding directories

When we create a directory, we reserve a number of blocks for the maximum
possible expansion of the directory due to various btree splits,
freespace allocation, etc. Unfortunately, each allocation is not reflected
in the total number of blocks still available to the transaction, so the
maximal reservation is used over and over again.

This leads to problems where an allocation group has only enough blocks
for *some* of the allocations required for the directory modification.
After the first N allocations, the remaining blocks in the allocation
group drop below the total reservation, and subsequent allocations fail
because the allocator will not allow the allocation to proceed if the AG
does not have enough blocks available for the entire allocation total.

This results in an ENOSPC occurring after an allocation has already
occurred. This results in aborting the directory operation (leaving the
directory in an inconsistent state) and cancelling a dirty transaction,
which results in a filesystem shutdown.

Avoid the problem by reflecting the number of blocks allocated in any
directory expansion in the total number of blocks available to the
modification in progress. This prevents a directory modification from
being aborted part way through with an ENOSPC.

SGI-PV: 988144

SGI-Modid: xfs-linux-melb:xfs-kern:32340a

Signed-off-by: David Chinner <[email protected]>
Signed-off-by: Lachlan McIlroy <[email protected]>
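
The accounting problem can be modelled with a toy allocator. Everything
here (struct demo_ag, struct demo_tx, demo_alloc()) is a hypothetical
simplification and not the XFS allocator; it only shows how deducting
each allocation from the total lets later allocations proceed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one allocation group and one transaction. */
struct demo_ag {
    unsigned int free_blocks;
};

struct demo_tx {
    unsigned int total; /* blocks the allocator must still be able to cover */
};

/*
 * Allocate 'count' blocks, refusing if the AG cannot cover tx->total.
 * When 'account' is true, the allocation is deducted from tx->total.
 */
static bool demo_alloc(struct demo_ag *ag, struct demo_tx *tx,
                       unsigned int count, bool account)
{
    assert(count <= tx->total);
    if (ag->free_blocks < tx->total)
        return false; /* models the ENOSPC case */
    ag->free_blocks -= count;
    if (account)
        tx->total -= count;
    return true;
}
```

With 10 free blocks and a total of 8, two allocations of 4 blocks succeed
only when accounting is enabled; without it the second one is refused
because 6 free blocks are compared against the stale total of 8.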

[XFS] Wait for all I/O on truncate to zero file size

It's possible to have outstanding xfs_ioend_t's queued when the file size
is zero. This can happen in the direct I/O path when a direct I/O write
fails due to ENOSPC. In this case the xfs_ioend_t will still be queued (ie
xfs_end_io_direct() does not know that the I/O failed so can't force the
xfs_ioend_t to be flushed synchronously).

When we truncate a file on unlink we don't know to wait for these
xfs_ioend_ts and we can have a use-after-free situation if the inode is
reclaimed before the xfs_ioend_t is finally processed.

As suggested by Dave Chinner, let's wait for all I/Os to complete when
truncating the file size to zero.

SGI-PV: 981668

SGI-Modid: xfs-linux-melb:xfs-kern:32216a

Signed-off-by: Lachlan McIlroy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

[XFS] Fix use-after-free with log and quotas

Destroying the quota stuff on unmount can access the log - ie
XFS_QM_DONE() ends up in xfs_dqunlock() which calls
xfs_trans_unlocked_item() and then xfs_log_move_tail(). By this time the
log has already been destroyed. Just move the cleanup of the quota code
earlier in xfs_unmountfs() before the call to xfs_log_unmount(). Moving
XFS_QM_DONE() up near XFS_QM_DQPURGEALL() seems like a good spot.

SGI-PV: 987086

SGI-Modid: xfs-linux-melb:xfs-kern:32148a

Signed-off-by: Lachlan McIlroy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Peter Leckie <[email protected]>

Linux 2.6.28-rc4

regression: disable timer peek-ahead for 2.6.28

It's showing up as regressions; disabling it very likely just papers
over an underlying issue, but time is running out for 2.6.28, so let's get
back to this for 2.6.29.

Fixes: #11826 and #11893
Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: Fixup deb-pkg target to generate separate firmware deb

kbuild: Fixup deb-pkg target to generate separate firmware deb

The below is a simplistic fix for "make deb-pkg"; it splits the
firmware out to a linux-firmware-image package and adds an
(unversioned) Suggests to the linux package for this firmware.

Signed-off-by: Jonathan McDowell <[email protected]>
Acked-by: Frans Pop <[email protected]>
Signed-off-by: Sam Ravnborg <[email protected]>

Don't ask twice about not including staging drivers

The "Exclude staging drivers" question is there so that we don't build
staging drivers for allyesconfig or allnoconfig settings, but it's very
irritating when you've already said "no" to staging drivers earlier.

There is absolutely no point in declining twice - once you've declined
the staging drivers, you're done.

So make the second question depend on the first question having been
answered in the affirmative.

Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  Fix nfsd truncation of readdir results

Merge branch 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpumask: introduce new API, without changing anything, v3
  cpumask: new API, v2
  cpumask: introduce new API, without changing anything

Fix nfsd truncation of readdir results

Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
situations" introduced a bug: on a directory in an exported ext3
filesystem with dir_index unset, a READDIR will only return about 250
entries, even if the directory was larger.

Bisected it back to this commit; reverting it fixes the problem.

It turns out that in this case ext3 reads a block at a time, then
returns from readdir, which means we can end up with buf.full==0 but
with more entries in the directory still to be read.  Before 8d7c4203
(but after c002a6c797 "Optimise NFS readdir hack slightly"), this would
cause us to return the READDIR result immediately, but with the eof bit
unset.  That could cause a performance regression (because the client
would need more roundtrips to the server to read the whole directory),
but no loss in correctness, since the cleared eof bit caused the client
to send another readdir.  After 8d7c4203, the setting of the eof bit
made this a correctness problem.

So, move nfserr_eof into the loop and remove the buf.full check so that
we loop until buf.used==0.  The following seems to do the right thing
and reduces the network traffic since we don't return a READDIR result
until the buffer is full.

Tested on an empty directory & large directory; eof is properly sent and
there are no more short buffers.

Signed-off-by: Doug Nazar <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
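
A simplified model of the loop change, assuming a filesystem that hands
back one block's worth of entries per call (fewer than the buffer can
hold). struct demo_dir, demo_fs_read() and read_all() are hypothetical
and only mimic the control flow, not the nfsd code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_BUF_CAP 8 /* entries the buffer can hold */

/* Hypothetical directory: entries left, and entries returned per call. */
struct demo_dir {
    int remaining;
    int per_call;
};

/* Return up to per_call entries, limited by buffer room and what is left. */
static int demo_fs_read(struct demo_dir *d, int room)
{
    int n = d->per_call;

    assert(room > 0);
    if (n > room)
        n = room;
    if (n > d->remaining)
        n = d->remaining;
    d->remaining -= n;
    return n;
}

/*
 * Read the directory and return the number of entries seen before
 * end-of-directory is declared. With stop_when_not_full set, a
 * partially filled buffer is treated as the end (the truncating
 * behaviour); otherwise the loop runs until a call returns nothing.
 */
static int read_all(struct demo_dir *d, bool stop_when_not_full)
{
    int total = 0;

    for (;;) {
        int used = demo_fs_read(d, DEMO_BUF_CAP);

        if (used == 0)
            break;
        total += used;
        if (stop_when_not_full && used < DEMO_BUF_CAP)
            break;
    }
    return total;
}
```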