]> Git Repo - linux.git/log
linux.git
9 years agoMerge tag 'hwmon-for-linus-v4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 18 Dec 2015 20:51:52 +0000 (12:51 -0800)]
Merge tag 'hwmon-for-linus-v4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Select CONFIG_BITREVERSE for sht15 driver to avoid build failure if
   it is not configured.

 - Force wait for conversion time for the first valid data in tmp102
   driver to avoid reporting erroneous data to the thermal subsystem.

* tag 'hwmon-for-linus-v4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (sht15) Select CONFIG_BITREVERSE
  hwmon: (tmp102) Force wait for conversion time for the first valid data

9 years agoMerge tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Dec 2015 20:38:35 +0000 (12:38 -0800)]
Merge tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "Two similar fixes for the Intel and AMD IOMMU drivers to add proper
  access checks before calling handle_mm_fault"

* tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Do access checks before calling handle_mm_fault()
  iommu/amd: Do proper access checking before calling handle_mm_fault()

9 years agoMerge tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Dec 2015 20:24:52 +0000 (12:24 -0800)]
Merge tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:
 - XSA-155 security fixes to backend drivers.
 - XSA-157 security fixes to pciback.

* tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-pciback: fix up cleanup path when alloc fails
  xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
  xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
  xen/pciback: Do not install an IRQ handler for MSI interrupts.
  xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
  xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
  xen/pciback: Save xen_pci_op commands before processing it
  xen-scsiback: safely copy requests
  xen-blkback: read from indirect descriptors only once
  xen-blkback: only read request operation from shared ring once
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-netback: don't use last request to determine minimum Tx credit
  xen: Add RING_COPY_REQUEST()
  xen/x86/pvh: Use HVM's flush_tlb_others op
  xen: Resume PMU from non-atomic context
  xen/events/fifo: Consume unprocessed events when a CPU dies

9 years agoMerge tag 'arc-fixes-for-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Dec 2015 20:19:01 +0000 (12:19 -0800)]
Merge tag 'arc-fixes-for-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC architecture fixes from Vineet Gupta:
 "Fixes for:

 - perf interrupts on SMP: Not enabled (at boot) and disabled (at runtime)
 - stack unwinder regression (for modules, ignoring dwarf3)
 - nsim hosed for non default kernel link base builds"

* tag 'arc-fixes-for-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: smp: Rename platform hook @init_cpu_smp -> @init_per_cpu
  ARC: rename smp operation init_irq_cpu() to init_per_cpu()
  ARC: dw2 unwind: Ignore CIE version !=1 gracefully instead of bailing
  ARC: dw2 unwind: Reinstante unwinding out of modules
  ARC: [plat-sim] unbork non default CONFIG_LINUX_LINK_BASE
  ARC: intc: Document arc_request_percpu_irq() better
  ARCv2: perf: Ensure perf intr gets enabled on all cores
  ARC: intc: No need to clear IRQ_NOAUTOEN
  ARCv2: intc: Fix random perf irq disabling in SMP setup
  ARC: [axs10x] cap ethernet phy to 100 Mbit/sec

9 years agoMerge tag 'sound-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 18 Dec 2015 19:47:06 +0000 (11:47 -0800)]
Merge tag 'sound-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "As usual in rc6, this update contains only a few HD-audio and
  USB-audio device-specific quirks: yet another Thinkpad noise fixes,
  Dell headphone mic fixes, and AudioQuest DragonFly fixes"

* tag 'sound-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add a fixup for Thinkpad X1 Carbon 2nd
  ALSA: hda - Set codec to D3 at reboot/shutdown on Thinkpads
  ALSA: hda - Apply click noise workaround for Thinkpads generically
  ALSA: hda - Fix headphone mic input on a few Dell ALC293 machines
  ALSA: usb-audio: Add sample rate inquiry quirk for AudioQuest DragonFly
  ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly

9 years agoMerge tag 'for-linus-20151217' of git://git.infradead.org/linux-mtd
Linus Torvalds [Fri, 18 Dec 2015 19:19:16 +0000 (11:19 -0800)]
Merge tag 'for-linus-20151217' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
 "I was holding out on this pull request for a bit, since there are a
  few other small issues being discussed that look like 4.4-rc
  regressions.  Hopefully I can get those stabilized soon, but these are
  ready at any rate:

   - A little bit of a last-minute change for the device tree "fixed
     partition" binding.  This is needed because we might want to reuse
     the 'partitions' subnode for other sorts of partitioning
     descriptions -- e.g., for describing which on-flash partition
     format(s) might be used on the system.

   - Also tone down a warning message, since it is probably going to
     show up on a lot of systems where it should just be ignored"

* tag 'for-linus-20151217' of git://git.infradead.org/linux-mtd:
  doc: dt: mtd: partitions: add compatible property to "partitions" node
  mtd: ofpart: don't complain about missing 'partitions' node too loudly

9 years agoMerge branch 'freespace-tree' into for-linus-4.5
Chris Mason [Fri, 18 Dec 2015 19:11:10 +0000 (11:11 -0800)]
Merge branch 'freespace-tree' into for-linus-4.5

Signed-off-by: Chris Mason <[email protected]>
9 years agoUSB: fix invalid memory access in hub_activate()
Alan Stern [Wed, 16 Dec 2015 18:32:38 +0000 (13:32 -0500)]
USB: fix invalid memory access in hub_activate()

Commit 8520f38099cc ("USB: change hub initialization sleeps to
delayed_work") changed the hub_activate() routine to make part of it
run in a workqueue.  However, the commit failed to take a reference to
the usb_hub structure or to lock the hub interface while doing so.  As
a result, if a hub is plugged in and quickly unplugged before the work
routine can run, the routine will try to access memory that has been
deallocated.  Or, if the hub is unplugged while the routine is
running, the memory may be deallocated while it is in active use.

This patch fixes the problem by taking a reference to the usb_hub at
the start of hub_activate() and releasing it at the end (when the work
is finished), and by locking the hub interface while the work routine
is running.  It also adds a check at the start of the routine to see
if the hub has already been disconnected, in which nothing should be
done.

Signed-off-by: Alan Stern <[email protected]>
Reported-by: Alexandru Cornea <[email protected]>
Tested-by: Alexandru Cornea <[email protected]>
Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work")
CC: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
9 years agoUSB: ipaq.c: fix a timeout loop
Dan Carpenter [Wed, 16 Dec 2015 11:06:37 +0000 (14:06 +0300)]
USB: ipaq.c: fix a timeout loop

The code expects the loop to end with "retries" set to zero but, because
it is a post-op, it will end set to -1.  I have fixed this by moving the
decrement inside the loop.

Fixes: 014aa2a3c32e ('USB: ipaq: minor ipaq_open() cleanup.')
Signed-off-by: Dan Carpenter <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
9 years ago[media] airspy: increase USB control message buffer size
Antti Palosaari [Mon, 26 Oct 2015 20:58:14 +0000 (18:58 -0200)]
[media] airspy: increase USB control message buffer size

Driver requested device firmware version string during probe using
only 24 byte long buffer. That buffer is too small for newer firmware
versions, which causes device firmware hang - device stops responding
to any commands after that. Increase buffer size to 128 which should
be enough for any current and future version strings.

Link: https://github.com/airspy/host/issues/27
Cc: <[email protected]> # 3.17+
Reported-by: Benjamin Vernoux <[email protected]>
Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
9 years ago[media] hackrf: move RF gain ctrl enable behind module parameter
Antti Palosaari [Fri, 23 Oct 2015 22:01:31 +0000 (20:01 -0200)]
[media] hackrf: move RF gain ctrl enable behind module parameter

Used Avago MGA-81563 RF amplifier could be destroyed pretty easily
with too strong signal or transmitting to bad antenna.
Add module parameter 'enable_rf_gain_ctrl' which allows enabling
RF gain control - otherwise, default without the module parameter,
RF gain control is set to 'grabbed' state which prevents setting
value to the control.

Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
9 years ago[media] hackrf: fix possible null ptr on debug printing
Antti Palosaari [Wed, 21 Oct 2015 21:02:41 +0000 (19:02 -0200)]
[media] hackrf: fix possible null ptr on debug printing

drivers/media/usb/hackrf/hackrf.c:1533 hackrf_probe()
error: we previously assumed 'dev' could be null (see line 1366)

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
9 years ago[media] Revert "[media] ivtv: avoid going past input/audio array"
Mauro Carvalho Chehab [Wed, 11 Nov 2015 11:22:36 +0000 (09:22 -0200)]
[media] Revert "[media] ivtv: avoid going past input/audio array"

This patch broke ivtv logic, as reported at
 https://bugzilla.redhat.com/show_bug.cgi?id=1278942

This reverts commit 09290cc885937cab3b2d60a6d48fe3d2d3e04061.

Cc: [email protected] # for v4.1 and upper
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
9 years agoMerge tag 'phy-for-4.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon...
Greg Kroah-Hartman [Fri, 18 Dec 2015 17:24:50 +0000 (09:24 -0800)]
Merge tag 'phy-for-4.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-linus

Kishon writes:

phy: for 4.4 -rc

*) Add missing of_node_put in a bunch of PHY drivers
*) Add get_device in devm_of_phy_get_by_index()
*) Fix randconfig build error in sun9i usb driver

9 years agoxen-pciback: fix up cleanup path when alloc fails
Doug Goldstein [Thu, 26 Nov 2015 20:32:39 +0000 (14:32 -0600)]
xen-pciback: fix up cleanup path when alloc fails

When allocating a pciback device fails, clear the private
field. This could lead to an use-after free, however
the 'really_probe' takes care of setting
dev_set_drvdata(dev, NULL) in its failure path (which we would
exercise if the ->probe function failed), so we we
are OK. However lets be defensive as the code can change.

Going forward we should clean up the pci_set_drvdata(dev, NULL)
in the various code-base. That will be for another day.

Reviewed-by: Boris Ostrovsky <[email protected]>
Reported-by: Jonathan Creekmore <[email protected]>
Signed-off-by: Doug Goldstein <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agohwmon: (sht15) Select CONFIG_BITREVERSE
Arnd Bergmann [Fri, 18 Dec 2015 14:52:28 +0000 (15:52 +0100)]
hwmon: (sht15) Select CONFIG_BITREVERSE

If CONFIG_BITREVERSE is not built-in, the sht15 driver fails to link:

drivers/built-in.o: In function `sht15_crc8':
drivers/hwmon/sht15.c:195: undefined reference to `byte_rev_table'

This adds a Kconfig 'select' statement, like all other users of
bitrev.h have it.

Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: 33836ee98533 ("hwmon:change sht15_reverse()")
Signed-off-by: Guenter Roeck <[email protected]>
9 years agoxen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:13:27 +0000 (18:13 -0500)]
xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.

commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.

Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).

Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.

Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.

This is part of XSA-157

CC: [email protected]
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
Konrad Rzeszutek Wilk [Wed, 1 Apr 2015 14:49:47 +0000 (10:49 -0400)]
xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.

Otherwise just continue on, returning the same values as
previously (return of 0, and op->result has the PIRQ value).

This does not change the behavior of XEN_PCI_OP_disable_msi[|x].

The pci_disable_msi or pci_disable_msix have the checks for
msi_enabled or msix_enabled so they will error out immediately.

However the guest can still call these operations and cause
us to disable the 'ack_intr'. That means the backend IRQ handler
for the legacy interrupt will not respond to interrupts anymore.

This will lead to (if the device is causing an interrupt storm)
for the Linux generic code to disable the interrupt line.

Naturally this will only happen if the device in question
is plugged in on the motherboard on shared level interrupt GSI.

This is part of XSA-157

CC: [email protected]
Reviewed-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen/pciback: Do not install an IRQ handler for MSI interrupts.
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 22:24:08 +0000 (17:24 -0500)]
xen/pciback: Do not install an IRQ handler for MSI interrupts.

Otherwise an guest can subvert the generic MSI code to trigger
an BUG_ON condition during MSI interrupt freeing:

 for (i = 0; i < entry->nvec_used; i++)
        BUG_ON(irq_has_action(entry->irq + i));

Xen PCI backed installs an IRQ handler (request_irq) for
the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
(or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
done in case the device has legacy interrupts the GSI line
is shared by the backend devices.

To subvert the backend the guest needs to make the backend
to change the dev->irq from the GSI to the MSI interrupt line,
make the backend allocate an interrupt handler, and then command
the backend to free the MSI interrupt and hit the BUG_ON.

Since the backend only calls 'request_irq' when the guest
writes to the PCI_COMMAND register the guest needs to call
XEN_PCI_OP_enable_msi before any other operation. This will
cause the generic MSI code to setup an MSI entry and
populate dev->irq with the new PIRQ value.

Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
and cause the backend to setup an IRQ handler for dev->irq
(which instead of the GSI value has the MSI pirq). See
'xen_pcibk_control_isr'.

Then the guest disables the MSI: XEN_PCI_OP_disable_msi
which ends up triggering the BUG_ON condition in 'free_msi_irqs'
as there is an IRQ handler for the entry->irq (dev->irq).

Note that this cannot be done using MSI-X as the generic
code does not over-write dev->irq with the MSI-X PIRQ values.

The patch inhibits setting up the IRQ handler if MSI or
MSI-X (for symmetry reasons) code had been called successfully.

P.S.
Xen PCIBack when it sets up the device for the guest consumption
ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
XSA-120 addendum patch removed that - however when upstreaming said
addendum we found that it caused issues with qemu upstream. That
has now been fixed in qemu upstream.

This is part of XSA-157

CC: [email protected]
Reviewed-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X...
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:07:44 +0000 (18:07 -0500)]
xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled

The guest sequence of:

  a) XEN_PCI_OP_enable_msix
  b) XEN_PCI_OP_enable_msix

results in hitting an NULL pointer due to using freed pointers.

The device passed in the guest MUST have MSI-X capability.

The a) constructs and SysFS representation of MSI and MSI groups.
The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
in a) pdev->msi_irq_groups is still set) and also free's ALL of the
MSI-X entries of the device (the ones allocated in step a) and b)).

The unwind code: 'free_msi_irqs' deletes all the entries and tries to
delete the pdev->msi_irq_groups (which hasn't been set to NULL).
However the pointers in the SysFS are already freed and we hit an
NULL pointer further on when 'strlen' is attempted on a freed pointer.

The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
against that. The check for msi_enabled is not stricly neccessary.

This is part of XSA-157

CC: [email protected]
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 15:08:22 +0000 (11:08 -0400)]
xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled

The guest sequence of:

 a) XEN_PCI_OP_enable_msi
 b) XEN_PCI_OP_enable_msi
 c) XEN_PCI_OP_disable_msi

results in hitting an BUG_ON condition in the msi.c code.

The MSI code uses an dev->msi_list to which it adds MSI entries.
Under the above conditions an BUG_ON() can be hit. The device
passed in the guest MUST have MSI capability.

The a) adds the entry to the dev->msi_list and sets msi_enabled.
The b) adds a second entry but adding in to SysFS fails (duplicate entry)
and deletes all of the entries from msi_list and returns (with msi_enabled
is still set).  c) pci_disable_msi passes the msi_enabled checks and hits:

BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));

and blows up.

The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
against that. The check for msix_enabled is not stricly neccessary.

This is part of XSA-157.

CC: [email protected]
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen/pciback: Save xen_pci_op commands before processing it
Konrad Rzeszutek Wilk [Mon, 16 Nov 2015 17:40:48 +0000 (12:40 -0500)]
xen/pciback: Save xen_pci_op commands before processing it

Double fetch vulnerabilities that happen when a variable is
fetched twice from shared memory but a security check is only
performed the first time.

The xen_pcibk_do_op function performs a switch statements on the op->cmd
value which is stored in shared memory. Interestingly this can result
in a double fetch vulnerability depending on the performed compiler
optimization.

This patch fixes it by saving the xen_pci_op command before
processing it. We also use 'barrier' to make sure that the
compiler does not perform any optimization.

This is part of XSA155.

CC: [email protected]
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen-scsiback: safely copy requests
David Vrabel [Mon, 16 Nov 2015 18:02:32 +0000 (18:02 +0000)]
xen-scsiback: safely copy requests

The copy of the ring request was lacking a following barrier(),
potentially allowing the compiler to optimize the copy away.

Use RING_COPY_REQUEST() to ensure the request is copied to local
memory.

This is part of XSA155.

CC: [email protected]
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen-blkback: read from indirect descriptors only once
Roger Pau Monné [Tue, 3 Nov 2015 16:40:43 +0000 (16:40 +0000)]
xen-blkback: read from indirect descriptors only once

Since indirect descriptors are in memory shared with the frontend, the
frontend could alter the first_sect and last_sect values after they have
been validated but before they are recorded in the request.  This may
result in I/O requests that overflow the foreign page, possibly
overwriting local pages when the I/O request is executed.

When parsing indirect descriptors, only read first_sect and last_sect
once.

This is part of XSA155.

CC: [email protected]
Signed-off-by: Roger Pau Monné <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen-blkback: only read request operation from shared ring once
Roger Pau Monné [Tue, 3 Nov 2015 16:34:09 +0000 (16:34 +0000)]
xen-blkback: only read request operation from shared ring once

A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.

When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().

This is part of XSA155.

CC: [email protected]
Signed-off-by: Roger Pau Monné <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen-netback: use RING_COPY_REQUEST() throughout
David Vrabel [Fri, 30 Oct 2015 15:17:06 +0000 (15:17 +0000)]
xen-netback: use RING_COPY_REQUEST() throughout

Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.

This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.

This is part of XSA155.

CC: [email protected]
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen-netback: don't use last request to determine minimum Tx credit
David Vrabel [Fri, 30 Oct 2015 15:16:01 +0000 (15:16 +0000)]
xen-netback: don't use last request to determine minimum Tx credit

The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.

Instead allow for the worst case 128 KiB packet.

This is part of XSA155.

CC: [email protected]
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agoxen: Add RING_COPY_REQUEST()
David Vrabel [Fri, 30 Oct 2015 14:58:08 +0000 (14:58 +0000)]
xen: Add RING_COPY_REQUEST()

Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected).  Safe usage of a request
generally requires taking a local copy.

Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy().  This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.

Use a volatile source to prevent the compiler from reordering or
omitting the copy.

This is part of XSA155.

CC: [email protected]
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
9 years agopowerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"
Alistair Popple [Fri, 18 Dec 2015 06:16:17 +0000 (17:16 +1100)]
powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"

Commit 25642e1459ac ("powerpc/opal-irqchip: Fix double endian
conversion") fixed an endian bug by calling opal_handle_events() in
opal_event_unmask().

However this introduced a deadlock if we find an event is active
during unmasking and call opal_handle_events() again. The bad call
sequence is:

  opal_interrupt()
  -> opal_handle_events()
     -> generic_handle_irq()
        -> handle_level_irq()
           -> raw_spin_lock(&desc->lock)
              handle_irq_event(desc)
              unmask_irq(desc)
              -> opal_event_unmask()
                 -> opal_handle_events()
                    -> generic_handle_irq()
                       -> handle_level_irq()
                          -> raw_spin_lock(&desc->lock) (BOOM)

When generating multiple opal events in quick succession this would lead
to the following stall warnings:

EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
INFO: rcu_sched detected stalls on CPUs/tasks:

         12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
         15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
         (detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
INFO: rcu_sched detected stalls on CPUs/tasks:
         12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
         15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
         (detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)

This patch corrects the problem by queuing the work if an event is
active during unmasking, which is similar to the pre-endian fix
behaviour.

Fixes: 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion")
Signed-off-by: Alistair Popple <[email protected]>
Reported-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
9 years agoFix remove_and_add_spares removes drive added as spare in slot_store
Goldwyn Rodrigues [Fri, 18 Dec 2015 04:19:16 +0000 (15:19 +1100)]
Fix remove_and_add_spares removes drive added as spare in slot_store

Commit 2910ff17d154baa5eb50e362a91104e831eb2bb6
introduced a regression which would remove a recently added spare via
slot_store. Revert part of the patch which touches slot_store() and add
the disk directly using pers->hot_add_disk()

Fixes: 2910ff17d154 ("md: remove_and_add_spares() to activate specific
rdev")
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Signed-off-by: Pawel Baldysiak <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
9 years agomd: fix bug due to nested suspend
Mikulas Patocka [Fri, 18 Dec 2015 04:19:16 +0000 (15:19 +1100)]
md: fix bug due to nested suspend

The patch c7bfced9a6716ff66c9d61f934bb60af08d4688c committed to 4.4-rc
causes crash in LVM test shell/lvchange-raid.sh. The kernel crashes with
this BUG, the reason is that we attempt to suspend a device that is
already suspended. See also
https://bugzilla.redhat.com/show_bug.cgi?id=1283491

This patch fixes the bug by changing functions mddev_suspend and
mddev_resume to always nest.
The number of nested calls to mddev_nested_suspend is kept in the
variable mddev->suspended.
[neilb: made mddev_suspend() always nest instead of introduce mddev_nested_suspend]

kernel BUG at drivers/md/md.c:317!
CPU: 3 PID: 32754 Comm: lvm Not tainted 4.4.0-rc2 #1
task: 0000000047076040 ti: 0000000047014000 task.ti: 0000000047014000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000000000000001111 Not tainted
r00-03  000000000804000f 00000000102c5280 0000000010c7522c 000000007e3d1810
r04-07  0000000010c6f000 000000004ef37f20 000000007e3d1dd0 000000007e3d1810
r08-11  000000007c9f1600 0000000000000000 0000000000000001 ffffffffffffffff
r12-15  0000000010c1d000 0000000000000041 00000000f98d63c8 00000000f98e49e4
r16-19  00000000f98e49e4 00000000c138fd06 00000000f98d63c8 0000000000000001
r20-23  0000000000000002 000000004ef37f00 00000000000000b0 00000000000001d1
r24-27  00000000424783a0 000000007e3d1dd0 000000007e3d1810 00000000102b2000
r28-31  0000000000000001 0000000047014840 0000000047014930 0000000000000001
sr00-03  0000000007040800 0000000000000000 0000000000000000 0000000007040800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000102c538c 00000000102c5390
 IIR: 03ffe01f    ISR: 0000000000000000  IOR: 00000000102b2748
 CPU:        3   CR30: 0000000047014000 CR31: 0000000000000000
 ORIG_R28: 00000000000000b0
 IAOQ[0]: mddev_suspend+0x10c/0x160 [md_mod]
 IAOQ[1]: mddev_suspend+0x110/0x160 [md_mod]
 RP(r2): raid1_add_disk+0xd4/0x2c0 [raid1]
Backtrace:
 [<0000000010c7522c>] raid1_add_disk+0xd4/0x2c0 [raid1]
 [<0000000010c20078>] raid_resume+0x390/0x418 [dm_raid]
 [<00000000105833e8>] dm_table_resume_targets+0xc0/0x188 [dm_mod]
 [<000000001057f784>] dm_resume+0x144/0x1e0 [dm_mod]
 [<0000000010587dd4>] dev_suspend+0x1e4/0x568 [dm_mod]
 [<0000000010589278>] ctl_ioctl+0x1e8/0x428 [dm_mod]
 [<0000000010589518>] dm_compat_ctl_ioctl+0x18/0x68 [dm_mod]
 [<0000000040377b88>] compat_SyS_ioctl+0xd0/0x1558

Fixes: c7bfced9a671 ("md: suspend i/o during runtime blk_integrity_unregister")
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
9 years agoMD: change journal disk role to disk 0
Shaohua Li [Fri, 18 Dec 2015 04:19:16 +0000 (15:19 +1100)]
MD: change journal disk role to disk 0

Neil pointed out setting journal disk role to raid_disks will confuse
reshape if we support reshape eventually. Switching the role to 0 (we
should be fine as long as the value >=0) and skip sysfs file creation to
avoid error.

Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
9 years agomd/raid10: fix data corruption and crash during resync
Artur Paszkiewicz [Fri, 18 Dec 2015 04:19:16 +0000 (15:19 +1100)]
md/raid10: fix data corruption and crash during resync

The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
bio_copy_data()") replaced manual data copying with bio_copy_data() but
it doesn't work as intended. The source bio (fbio) is already processed,
so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt.  Because of
this, bio_copy_data() either does not copy anything, or worse, copies
data from the ->bi_next bio if it is set.  This causes wrong data to be
written to drives during resync and sometimes lockups/crashes in
bio_copy_data():

[  517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
[  517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[  517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
[  517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
[  517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
[  517.468529] RIP: 0010:[<ffffffff812e1888>]  [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
[  517.478164] RSP: 0018:ffff880150dfbc98  EFLAGS: 00000246
[  517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
[  517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
[  517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
[  517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
[  517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
[  517.525412] FS:  0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
[  517.534844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
[  517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  517.566144] Stack:
[  517.568626]  ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
[  517.577659]  0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
[  517.586715]  ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
[  517.595773] Call Trace:
[  517.598747]  [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
[  517.605610]  [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
[  517.611987]  [<ffffffff814ff206>] md_thread+0x106/0x140
[  517.618072]  [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
[  517.624252]  [<ffffffff814ff100>] ? super_1_load+0x520/0x520
[  517.630817]  [<ffffffff8109ef89>] kthread+0xc9/0xe0
[  517.636506]  [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
[  517.643653]  [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
[  517.649929]  [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70

Signed-off-by: Artur Paszkiewicz <[email protected]>
Reviewed-by: Shaohua Li <[email protected]>
Cc: [email protected] (v4.2+)
Fixes: c31df25f20e3 ("md/raid10: make sync_request_write() call bio_copy_data()")
Signed-off-by: NeilBrown <[email protected]>
9 years agoBtrfs: fix locking bugs when defragging leaves
Filipe Manana [Fri, 18 Dec 2015 01:57:29 +0000 (01:57 +0000)]
Btrfs: fix locking bugs when defragging leaves

When running fstests btrfs/070, with a higher number of fsstress
operations, I ran frequently into two different locking bugs when
defragging directories.

The first bug produced the following traces:

[133860.229792] ------------[ cut here ]------------
[133860.251062] WARNING: CPU: 2 PID: 26057 at fs/btrfs/locking.c:46 btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]()
[133860.253576] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 psmouse parport
[133860.282566] CPU: 2 PID: 26057 Comm: btrfs Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[133860.284393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[133860.286827]  0000000000000000 ffff880207697b78 ffffffff812566f4 0000000000000000
[133860.288341]  ffff880207697bb0 ffffffff8104d0a6 ffffffffa052d4c1 ffff880178f60e00
[133860.294219]  ffff880178f60e00 0000000000000000 00000000000000f6 ffff880207697bc0
[133860.295831] Call Trace:
[133860.306518]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[133860.307473]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[133860.308619]  [<ffffffffa052d4c1>] ? btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]
[133860.310068]  [<ffffffff8104d172>] warn_slowpath_null+0x1a/0x1c
[133860.312552]  [<ffffffffa052d4c1>] btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]
[133860.314630]  [<ffffffffa04d5787>] btrfs_set_lock_blocking+0xe/0x10 [btrfs]
[133860.323596]  [<ffffffffa04d99cb>] btrfs_realloc_node+0xb3/0x341 [btrfs]
[133860.325233]  [<ffffffffa050e396>] btrfs_defrag_leaves+0x239/0x2fa [btrfs]
[133860.332427]  [<ffffffffa04fc2ce>] btrfs_defrag_root+0x63/0xca [btrfs]
[133860.337259]  [<ffffffffa052a34e>] btrfs_ioctl_defrag+0x78/0x14e [btrfs]
[133860.340147]  [<ffffffffa052b00b>] btrfs_ioctl+0x746/0x24c6 [btrfs]
[133860.344833]  [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[133860.346343]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.353248]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.354242]  [<ffffffff8113adba>] ? __might_fault+0xa5/0xa7
[133860.355232]  [<ffffffff81171139>] ? cp_new_stat+0x15d/0x174
[133860.356237]  [<ffffffff8117c610>] do_vfs_ioctl+0x427/0x4e6
[133860.358587]  [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[133860.360195]  [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[133860.361380]  [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[133860.363578]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[133860.366217] ---[ end trace 2cadb2f653437e49 ]---
[133860.367399] ------------[ cut here ]------------
[133860.368162] kernel BUG at fs/btrfs/locking.c:307!
[133860.369430] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[133860.370205] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 psmouse parport
[133860.370205] CPU: 2 PID: 26057 Comm: btrfs Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[133860.370205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[133860.370205] task: ffff8800aec6db40 ti: ffff880207694000 task.ti: ffff880207694000
[133860.370205] RIP: 0010:[<ffffffffa052d466>]  [<ffffffffa052d466>] btrfs_assert_tree_locked+0x10/0x14 [btrfs]
[133860.370205] RSP: 0018:ffff880207697bc0  EFLAGS: 00010246
[133860.370205] RAX: 0000000000000000 RBX: ffff880178f60e00 RCX: 0000000000000000
[133860.370205] RDX: ffff88023ec4fb50 RSI: 00000000ffffffff RDI: ffff880178f60e00
[133860.370205] RBP: ffff880207697bc0 R08: 0000000000000001 R09: 0000000000000000
[133860.370205] R10: 0000160000000000 R11: ffffffff81651000 R12: ffff880178f60e00
[133860.370205] R13: 0000000000000000 R14: 00000000000000f6 R15: ffff8801ff409000
[133860.370205] FS:  00007f763efd48c0(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000
[133860.370205] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[133860.370205] CR2: 0000000002158048 CR3: 000000003fd6c000 CR4: 00000000000006e0
[133860.370205] Stack:
[133860.370205]  ffff880207697bd8 ffffffffa052d4d0 0000000000000000 ffff880207697be8
[133860.370205]  ffffffffa04d5787 ffff880207697c80 ffffffffa04d99cb ffff8801ff409590
[133860.370205]  ffff880207697ca8 000000f507697c80 ffff880183c11bb8 0000000000000000
[133860.370205] Call Trace:
[133860.370205]  [<ffffffffa052d4d0>] btrfs_set_lock_blocking_rw+0x66/0xbd [btrfs]
[133860.370205]  [<ffffffffa04d5787>] btrfs_set_lock_blocking+0xe/0x10 [btrfs]
[133860.370205]  [<ffffffffa04d99cb>] btrfs_realloc_node+0xb3/0x341 [btrfs]
[133860.370205]  [<ffffffffa050e396>] btrfs_defrag_leaves+0x239/0x2fa [btrfs]
[133860.370205]  [<ffffffffa04fc2ce>] btrfs_defrag_root+0x63/0xca [btrfs]
[133860.370205]  [<ffffffffa052a34e>] btrfs_ioctl_defrag+0x78/0x14e [btrfs]
[133860.370205]  [<ffffffffa052b00b>] btrfs_ioctl+0x746/0x24c6 [btrfs]
[133860.370205]  [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[133860.370205]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.370205]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[133860.370205]  [<ffffffff8113adba>] ? __might_fault+0xa5/0xa7
[133860.370205]  [<ffffffff81171139>] ? cp_new_stat+0x15d/0x174
[133860.370205]  [<ffffffff8117c610>] do_vfs_ioctl+0x427/0x4e6
[133860.370205]  [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[133860.370205]  [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[133860.370205]  [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[133860.370205]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f

This bug happened because we assumed that by setting keep_locks to 1 in
our search path, our path after a call to btrfs_search_slot() would have
all nodes locked, which is not always true because unlock_up() (called by
btrfs_search_slot()) will unlock a node in a path if the slot of the node
below it doesn't point to the last item or beyond the last item. For
example, when the tree has a heigth of 2 and path->slots[0] has a value
smaller than btrfs_header_nritems(path->nodes[0]) - 1, the node at level 2
will be unlocked (also because lowest_unlock is set to 1 due to the fact
that the value passed as ins_len to btrfs_search_slot is 0).
This resulted in btrfs_find_next_key(), called before btrfs_realloc_node(),
to release out path and call again btrfs_search_slot(), but this time with
the cow parameter set to 0, meaning the resulting path got only read locks.
Therefore when we called btrfs_realloc_node(), with path->nodes[1] having
a read lock, it resulted in the warning and BUG_ON when calling
btrfs_set_lock_blocking() against the node, as that function expects the
node to have a write lock.

The second bug happened often when the first bug didn't happen, and made
us hang and hitting the following warning at fs/btrfs/locking.c:

   251  void btrfs_tree_lock(struct extent_buffer *eb)
   252  {
   253          WARN_ON(eb->lock_owner == current->pid);

This happened because the tree search we made at btrfs_defrag_leaves()
before calling btrfs_find_next_key() locked a leaf and all the other
nodes in the path, so btrfs_find_next_key() had no need to release the
path and make a new search (with path->lowest_level set to 1). This
made btrfs_realloc_node() attempt to write lock the same leaf again,
resulting in a hang/deadlock.

So fix these issues by calling btrfs_find_next_key() after calling
btrfs_realloc_node() and setting the search path's lowest_level to 1
to avoid the hang/deadlock when attempting to write lock the leaves
at btrfs_realloc_node().

Signed-off-by: Filipe Manana <[email protected]>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 17 Dec 2015 22:05:22 +0000 (14:05 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix uninitialized variable warnings in nfnetlink_queue, a lot of
    people reported this...  From Arnd Bergmann.

 2) Don't init mutex twice in i40e driver, from Jesse Brandeburg.

 3) Fix spurious EBUSY in rhashtable, from Herbert Xu.

 4) Missing DMA unmaps in mvpp2 driver, from Marcin Wojtas.

 5) Fix race with work structure access in pppoe driver causing
    corruptions, from Guillaume Nault.

 6) Fix OOPS due to sh_eth_rx() not checking whether netdev_alloc_skb()
    actually succeeded or not, from Sergei Shtylyov.

 7) Don't lose flags when settifn IFA_F_OPTIMISTIC in ipv6 code, from
    Bjørn Mork.

 8) VXLAN_HD_RCO defined incorrectly, fix from Jiri Benc.

 9) Fix clock source used for cookies in SCTP, from Marcelo Ricardo
    Leitner.

10) aurora driver needs HAS_DMA dependency, from Geert Uytterhoeven.

11) ndo_fill_metadata_dst op of vxlan has to handle ipv6 tunneling
    properly as well, from Jiri Benc.

12) Handle request sockets properly in xfrm layer, from Eric Dumazet.

13) Double stats update in ipv6 geneve transmit path, fix from Pravin B
    Shelar.

14) sk->sk_policy[] needs RCU protection, and as a result
    xfrm_policy_destroy() needs to free policies using an RCU grace
    period, from Eric Dumazet.

15) SCTP needs to clone ipv6 tx options in order to avoid use after
    free, from Eric Dumazet.

16) Missing kbuild export if ila.h, from Stephen Hemminger.

17) Missing mdiobus_alloc() return value checking in mdio-mux.c, from
    Tobias Klauser.

18) Validate protocol value range in ->create() methods, from Hannes
    Frederic Sowa.

19) Fix early socket demux races that result in illegal dst reuse, from
    Eric Dumazet.

20) Validate socket address length in pptp code, from WANG Cong.

21) skb_reorder_vlan_header() uses incorrect offset and can corrupt
    packets, from Vlad Yasevich.

22) Fix memory leaks in nl80211 registry code, from Ola Olsson.

23) Timeout loop count handing fixes in mISDN, xgbe, qlge, sfc, and
    qlcnic.  From Dan Carpenter.

24) msg.msg_iocb needs to be cleared in recvfrom() otherwise, for
    example, AF_ALG will interpret it as an async call.  From Tadeusz
    Struk.

25) inetpeer_set_addr_v4 forgets to initialize the 'vif' field, from
    Eric Dumazet.

26) rhashtable enforces the minimum table size not early enough,
    breaking how we calculate the per-cpu lock allocations.  From
    Herbert Xu.

27) Fix FCC port lockup in 82xx driver, from Martin Roth.

28) FOU sockets need to be freed using RCU, from Hannes Frederic Sowa.

29) Fix out-of-bounds access in __skb_complete_tx_timestamp() and
    sock_setsockopt() wrt.  timestamp handling.  From WANG Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (117 commits)
  net: check both type and procotol for tcp sockets
  drivers: net: xgene: fix Tx flow control
  tcp: restore fastopen with no data in SYN packet
  af_unix: Revert 'lock_interruptible' in stream receive code
  fou: clean up socket with kfree_rcu
  82xx: FCC: Fixing a bug causing to FCC port lock-up
  gianfar: Don't enable RX Filer if not supported
  net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
  rhashtable: Fix walker list corruption
  rhashtable: Enforce minimum size on initial hash table
  inet: tcp: fix inetpeer_set_addr_v4()
  ipv6: automatically enable stable privacy mode if stable_secret set
  net: fix uninitialized variable issue
  bluetooth: Validate socket address length in sco_sock_bind().
  net_sched: make qdisc_tree_decrease_qlen() work for non mq
  ser_gigaset: remove unnecessary kfree() calls from release method
  ser_gigaset: fix deallocation of platform device structure
  ser_gigaset: turn nonsense checks into WARN_ON
  ser_gigaset: fix up NULL checks
  qlcnic: fix a timeout loop
  ...

9 years agonet: check both type and procotol for tcp sockets
WANG Cong [Thu, 17 Dec 2015 07:39:04 +0000 (23:39 -0800)]
net: check both type and procotol for tcp sockets

Dmitry reported the following out-of-bound access:

Call Trace:
 [<ffffffff816cec2e>] __asan_report_load4_noabort+0x3e/0x40
mm/kasan/report.c:294
 [<ffffffff84affb14>] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
 [<     inline     >] SYSC_setsockopt net/socket.c:1746
 [<ffffffff84aed7ee>] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
 [<ffffffff85c18c76>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

This is because we mistake a raw socket as a tcp socket.
We should check both sk->sk_type and sk->sk_protocol to ensure
it is a tcp socket.

Willem points out __skb_complete_tx_timestamp() needs to fix as well.

Reported-by: Dmitry Vyukov <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agodrivers: net: xgene: fix Tx flow control
Iyappan Subramanian [Thu, 17 Dec 2015 06:26:05 +0000 (22:26 -0800)]
drivers: net: xgene: fix Tx flow control

Currently the Tx flow control is based on reading the hardware state,
which is not accurate since it may not reflect the descriptors that
are not yet reached the memory.

To accurately control the Tx flow, changing it to be software based.

Signed-off-by: Iyappan Subramanian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agotcp: restore fastopen with no data in SYN packet
Eric Dumazet [Wed, 16 Dec 2015 21:53:10 +0000 (13:53 -0800)]
tcp: restore fastopen with no data in SYN packet

Yuchung tracked a regression caused by commit 57be5bdad759 ("ip: convert
tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.

Some Fast Open users do not actually add any data in the SYN packet.

Fixes: 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives")
Reported-by: Yuchung Cheng <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Al Viro <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoaf_unix: Revert 'lock_interruptible' in stream receive code
Rainer Weikusat [Wed, 16 Dec 2015 20:09:25 +0000 (20:09 +0000)]
af_unix: Revert 'lock_interruptible' in stream receive code

With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
receive code was changed from using mutex_lock(&u->readlock) to
mutex_lock_interruptible(&u->readlock) to prevent signals from being
delayed for an indefinite time if a thread sleeping on the mutex
happened to be selected for handling the signal. But this was never a
problem with the stream receive code (as opposed to its datagram
counterpart) as that never went to sleep waiting for new messages with the
mutex held and thus, wouldn't cause secondary readers to block on the
mutex waiting for the sleeping primary reader. As the interruptible
locking makes the code more complicated in exchange for no benefit,
change it back to using mutex_lock.

Signed-off-by: Rainer Weikusat <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoBtrfs: add free space tree mount option
Omar Sandoval [Wed, 30 Sep 2015 03:50:38 +0000 (20:50 -0700)]
Btrfs: add free space tree mount option

Now we can finally hook up everything so we can actually use free space
tree. The free space tree is enabled by passing the space_cache=v2 mount
option. On the first mount with the this option set, the free space tree
will be created and the FREE_SPACE_TREE read-only compat bit will be
set. Any time the filesystem is mounted from then on, we must use the
free space tree. The clear_cache option will also clear the free space
tree.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: wire up the free space tree to the extent tree
Omar Sandoval [Wed, 30 Sep 2015 03:50:37 +0000 (20:50 -0700)]
Btrfs: wire up the free space tree to the extent tree

The free space tree is updated in tandem with the extent tree. There are
only a handful of places where we need to hook in:

1. Block group creation
2. Block group deletion
3. Delayed refs (extent creation and deletion)
4. Block group caching

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: add free space tree sanity tests
Omar Sandoval [Wed, 30 Sep 2015 03:50:36 +0000 (20:50 -0700)]
Btrfs: add free space tree sanity tests

This tests the operations on the free space tree trying to excercise all
of the main cases for both formats. Between this and xfstests, the free
space tree should have pretty good coverage.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: implement the free space B-tree
Omar Sandoval [Wed, 30 Sep 2015 03:50:35 +0000 (20:50 -0700)]
Btrfs: implement the free space B-tree

The free space cache has turned out to be a scalability bottleneck on
large, busy filesystems. When the cache for a lot of block groups needs
to be written out, we can get extremely long commit times; if this
happens in the critical section, things are especially bad because we
block new transactions from happening.

The main problem with the free space cache is that it has to be written
out in its entirety and is managed in an ad hoc fashion. Using a B-tree
to store free space fixes this: updates can be done as needed and we get
all of the benefits of using a B-tree: checksumming, RAID handling,
well-understood behavior.

With the free space tree, we get commit times that are about the same as
the no cache case with load times slower than the free space cache case
but still much faster than the no cache case. Free space is represented
with extents until it becomes more space-efficient to use bitmaps,
giving us similar space overhead to the free space cache.

The operations on the free space tree are: adding and removing free
space, handling the creation and deletion of block groups, and loading
the free space for a block group. We can also create the free space tree
by walking the extent tree and clear the free space tree.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: introduce the free space B-tree on-disk format
Omar Sandoval [Wed, 30 Sep 2015 03:50:34 +0000 (20:50 -0700)]
Btrfs: introduce the free space B-tree on-disk format

The on-disk format for the free space tree is straightforward. Each
block group is represented in the free space tree by a free space info
item that stores accounting information: whether the free space for this
block group is stored as bitmaps or extents and how many extents of free
space exist for this block group (regardless of which format is being
used in the tree). Extents are (start, FREE_SPACE_EXTENT, length) keys
with no corresponding item, and bitmaps instead have the
FREE_SPACE_BITMAP type and have a bitmap item attached, which is just an
array of bytes.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: refactor caching_thread()
Omar Sandoval [Wed, 30 Sep 2015 03:50:33 +0000 (20:50 -0700)]
Btrfs: refactor caching_thread()

We're also going to load the free space tree from caching_thread(), so
we should refactor some of the common code.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: add helpers for read-only compat bits
Omar Sandoval [Wed, 30 Sep 2015 03:50:32 +0000 (20:50 -0700)]
Btrfs: add helpers for read-only compat bits

We're finally going to add one of these for the free space tree, so
let's add the same nice helpers that we have for the incompat bits.
While we're add it, also add helpers to clear the bits.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: add extent buffer bitmap sanity tests
Omar Sandoval [Wed, 30 Sep 2015 03:50:31 +0000 (20:50 -0700)]
Btrfs: add extent buffer bitmap sanity tests

Sanity test the extent buffer bitmap operations (test, set, and clear)
against the equivalent standard kernel operations.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoBtrfs: add extent buffer bitmap operations
Omar Sandoval [Wed, 30 Sep 2015 03:50:30 +0000 (20:50 -0700)]
Btrfs: add extent buffer bitmap operations

These are going to be used for the free space tree bitmap items.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
9 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Thu, 17 Dec 2015 19:55:29 +0000 (11:55 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Some i915 fixes, one omap fix, one core regression fix.

  Not even enough fixes for a twelve days of xmas song, which seemms
  good"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm: Don't overwrite UNVERFIED mode status to OK
  drm/omap: fix fbdev pix format to support all platforms
  drm/i915: Do a better job at disabling primary plane in the noatomic case.
  drm/i915/skl: Double RC6 WRL always on
  drm/i915/skl: Disable coarse power gating up until F0
  drm/i915: Remove incorrect warning in context cleanup

9 years agolocking/osq: Fix ordering of node initialisation in osq_lock
Will Deacon [Fri, 11 Dec 2015 17:46:41 +0000 (17:46 +0000)]
locking/osq: Fix ordering of node initialisation in osq_lock

The Cavium guys reported a soft lockup on their arm64 machine, caused by
commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"):

    mutex_optimistic_spin+0x9c/0x1d0
    __mutex_lock_slowpath+0x44/0x158
    mutex_lock+0x54/0x58
    kernfs_iop_permission+0x38/0x70
    __inode_permission+0x88/0xd8
    inode_permission+0x30/0x6c
    link_path_walk+0x68/0x4d4
    path_openat+0xb4/0x2bc
    do_filp_open+0x74/0xd0
    do_sys_open+0x14c/0x228
    SyS_openat+0x3c/0x48
    el0_svc_naked+0x24/0x28

This is because in osq_lock we initialise the node for the current CPU:

    node->locked = 0;
    node->next = NULL;
    node->cpu = curr;

and then publish the current CPU in the lock tail:

    old = atomic_xchg_acquire(&lock->tail, curr);

Once the update to lock->tail is visible to another CPU, the node is
then live and can be both read and updated by concurrent lockers.

Unfortunately, the ACQUIRE semantics of the xchg operation mean that
there is no guarantee the contents of the node will be visible before
lock tail is updated.  This can lead to lock corruption when, for
example, a concurrent locker races to set the next field.

Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics"):
Reported-by: David Daney <[email protected]>
Reported-by: Andrew Pinski <[email protected]>
Tested-by: Andrew Pinski <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
9 years agoMerge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdim...
Linus Torvalds [Thu, 17 Dec 2015 19:20:13 +0000 (11:20 -0800)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:

 - Two bug fixes for misuse of PAGE_MASK in scatterlist and dma-debug.
   These are tagged for -stable.  The scatterlist impact is potentially
  corrupted dma addresses on HIGHMEM enabled platforms.

 - A minor locking fix for the NFIT hot-add implementation that is new
   in 4.4-rc.  This would only trigger in the case a hot-add raced
   driver removal.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dma-debug: Fix dma_debug_entry offset calculation
  Revert "scatterlist: use sg_phys()"
  nfit: acpi_nfit_notify(): Do not leave device locked

9 years agoMerge remote-tracking branch 'mkp-scsi/4.4/scsi-fixes' into fixes
James Bottomley [Thu, 17 Dec 2015 15:32:08 +0000 (07:32 -0800)]
Merge remote-tracking branch 'mkp-scsi/4.4/scsi-fixes' into fixes

9 years agogpio: revert get() to non-errorprogating behaviour
Linus Walleij [Thu, 17 Dec 2015 09:14:24 +0000 (10:14 +0100)]
gpio: revert get() to non-errorprogating behaviour

commit e20538b82f1f
("gpio: Propagate errors from chip->get()")
started to propagate errors from the .get() functions since
we can get errors from the infrastructure of e.g. slowbus
GPIO expanders.

However it turns out a bunch of drivers relied on the core
to clamp the value, so we need to revert to the old behaviour
and go over all drivers and fix them to conform to the
expectations of the core before we go back to propagating
the error code.

Cc: [email protected] # 4.3+
Cc: Bjorn Andersson <[email protected]>
Cc: Vladimir Zapolskiy <[email protected]>
Fixes: e20538b82f1f ("gpio: Propagate errors from chip->get()")
Reported-by: Michael Trimarchi <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
9 years agogpio: generic: clamp values from bgpio_get_set()
Linus Walleij [Thu, 10 Dec 2015 14:55:29 +0000 (15:55 +0100)]
gpio: generic: clamp values from bgpio_get_set()

The bgpio_get_set() call should return a value clamped to [0,1],
the current code will return a negative value if reading
bit 31, which turns the value negative as this is a signed value
and thus gets interpreted as an error by the gpiolib core.
Found on the gpio-mxc but applies to any MMIO driver.

Cc: [email protected] # 4.3+
Cc: [email protected]
Cc: Vladimir Zapolskiy <[email protected]>
Fixes: e20538b82f1f ("gpio: Propagate errors from chip->get()")
Reported-by: Clemens Gruber <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
9 years agoBtrfs: fix leaking of ordered extents after direct IO write error
Filipe Manana [Tue, 8 Dec 2015 19:23:20 +0000 (19:23 +0000)]
Btrfs: fix leaking of ordered extents after direct IO write error

When doing a direct IO write, __blockdev_direct_IO() can call the
btrfs_get_blocks_direct() callback one or more times before it calls the
btrfs_submit_direct() callback. However it can fail after calling the
first callback and before calling the second callback, which is a problem
because the first one creates ordered extents and the second one is the
one that submits bios that cover the ordered extents created by the first
one. That means the ordered extents will never complete nor have any of
the flags BTRFS_ORDERED_IO_DONE / BTRFS_ORDERED_IOERR set, resulting in
subsequent operations (such as other direct IO writes, buffered writes or
hole punching) that lock the same IO range and lookup for ordered extents
in the range to hang forever waiting for those ordered extents because
they can not complete ever, since no bio was submitted.

Fix this by tracking a range of created ordered extents that don't have
yet corresponding bios submitted and completing the ordered extents in
the range if __blockdev_direct_IO() fails with an error.

Signed-off-by: Filipe Manana <[email protected]>
9 years agoBtrfs: fix deadlock between direct IO write and defrag/readpages
Filipe Manana [Tue, 8 Dec 2015 16:23:16 +0000 (16:23 +0000)]
Btrfs: fix deadlock between direct IO write and defrag/readpages

If readpages() (triggered by defrag or buffered reads) is called while a
direct IO write is in progress, we have a small time window where we can
deadlock, resulting in traces like the following being generated:

[84723.212993] INFO: task fio:2849 blocked for more than 120 seconds.
[84723.214310]       Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[84723.215640] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[84723.217313] fio        D ffff88023ec75218     0  2849   2835 0x00000000
[84723.218778]  ffff880122dfb6e8 0000000000000092 0000000000000000 ffff88023ec75200
[84723.220458]  ffff88000e05d2c0 ffff880122dfc000 ffff88023ec75200 7fffffffffffffff
[84723.230597]  0000000000000002 ffffffff8147891a ffff880122dfb700 ffffffff8147856a
[84723.232085] Call Trace:
[84723.232625]  [<ffffffff8147891a>] ? bit_wait+0x3c/0x3c
[84723.233529]  [<ffffffff8147856a>] schedule+0x7d/0x95
[84723.234398]  [<ffffffff8147baa3>] schedule_timeout+0x43/0x10b
[84723.235384]  [<ffffffff810f82eb>] ? time_hardirqs_on+0x15/0x28
[84723.236426]  [<ffffffff8108a23d>] ? trace_hardirqs_on+0xd/0xf
[84723.237502]  [<ffffffff810af8a3>] ? read_seqcount_begin.constprop.20+0x57/0x6d
[84723.238807]  [<ffffffff8108a09b>] ? trace_hardirqs_on_caller+0x16/0x1ab
[84723.242012]  [<ffffffff8108a23d>] ? trace_hardirqs_on+0xd/0xf
[84723.243064]  [<ffffffff810af2ad>] ? timekeeping_get_ns+0xe/0x33
[84723.244116]  [<ffffffff810afa2e>] ? ktime_get+0x41/0x52
[84723.245029]  [<ffffffff81477cff>] io_schedule_timeout+0xb7/0x12b
[84723.245942]  [<ffffffff81477cff>] ? io_schedule_timeout+0xb7/0x12b
[84723.246596]  [<ffffffff81478953>] bit_wait_io+0x39/0x45
[84723.247503]  [<ffffffff81478b93>] __wait_on_bit_lock+0x49/0x8d
[84723.248540]  [<ffffffff8111684f>] __lock_page+0x66/0x68
[84723.249558]  [<ffffffff81081c9b>] ? autoremove_wake_function+0x3a/0x3a
[84723.250844]  [<ffffffff81124a04>] lock_page+0x2c/0x2f
[84723.251871]  [<ffffffff81124afc>] invalidate_inode_pages2_range+0xf5/0x2aa
[84723.253274]  [<ffffffff81117c34>] ? filemap_fdatawait_range+0x12d/0x146
[84723.254757]  [<ffffffff81118191>] ? filemap_fdatawrite_range+0x13/0x15
[84723.256378]  [<ffffffffa05139a2>] btrfs_get_blocks_direct+0x1b0/0x664 [btrfs]
[84723.258556]  [<ffffffff8119e3f9>] ? submit_page_section+0x7b/0x111
[84723.260064]  [<ffffffff8119eb90>] do_blockdev_direct_IO+0x658/0xbdb
[84723.261479]  [<ffffffffa05137f2>] ? btrfs_page_exists_in_range+0x1a9/0x1a9 [btrfs]
[84723.262961]  [<ffffffffa050a8a6>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
[84723.264449]  [<ffffffff8119f144>] __blockdev_direct_IO+0x31/0x33
[84723.265614]  [<ffffffff8119f144>] ? __blockdev_direct_IO+0x31/0x33
[84723.266769]  [<ffffffffa050a8a6>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
[84723.268264]  [<ffffffffa050935d>] btrfs_direct_IO+0x1b9/0x259 [btrfs]
[84723.270954]  [<ffffffffa050a8a6>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
[84723.272465]  [<ffffffff8111878c>] generic_file_direct_write+0xb3/0x128
[84723.273734]  [<ffffffffa051955c>] btrfs_file_write_iter+0x228/0x404 [btrfs]
[84723.275101]  [<ffffffff8116ca6f>] __vfs_write+0x7c/0xa5
[84723.276200]  [<ffffffff8116cfab>] vfs_write+0xa0/0xe4
[84723.277298]  [<ffffffff8116d79d>] SyS_write+0x50/0x7e
[84723.278327]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[84723.279595] INFO: lockdep is turned off.
[84723.379035] INFO: task btrfs:2923 blocked for more than 120 seconds.
[84723.380323]       Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[84723.381608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[84723.383003] btrfs           D ffff88023ed75218     0  2923   2859 0x00000000
[84723.384277]  ffff88001311f860 0000000000000082 ffff88001311f840 ffff88023ed75200
[84723.385748]  ffff88012c6751c0 ffff880013120000 ffff88012042fe68 ffff88012042fe30
[84723.387152]  ffff880221571c88 0000000000000001 ffff88001311f878 ffffffff8147856a
[84723.388620] Call Trace:
[84723.389105]  [<ffffffff8147856a>] schedule+0x7d/0x95
[84723.391882]  [<ffffffffa051da32>] btrfs_start_ordered_extent+0x161/0x1fa [btrfs]
[84723.393718]  [<ffffffff81081c61>] ? signal_pending_state+0x31/0x31
[84723.395659]  [<ffffffffa0522c5b>] __do_contiguous_readpages.constprop.21+0x81/0xdc [btrfs]
[84723.397383]  [<ffffffffa050ac96>] ? btrfs_submit_direct+0x3f0/0x3f0 [btrfs]
[84723.398852]  [<ffffffffa0522da3>] __extent_readpages.constprop.20+0xed/0x100 [btrfs]
[84723.400561]  [<ffffffff81123f6c>] ? __lru_cache_add+0x5d/0x72
[84723.401787]  [<ffffffffa0523896>] extent_readpages+0x111/0x1a7 [btrfs]
[84723.403121]  [<ffffffffa050ac96>] ? btrfs_submit_direct+0x3f0/0x3f0 [btrfs]
[84723.404583]  [<ffffffffa05088fa>] btrfs_readpages+0x1f/0x21 [btrfs]
[84723.406007]  [<ffffffff811226df>] __do_page_cache_readahead+0x168/0x1f4
[84723.407502]  [<ffffffff81122988>] ondemand_readahead+0x21d/0x22e
[84723.408937]  [<ffffffff81122988>] ? ondemand_readahead+0x21d/0x22e
[84723.410487]  [<ffffffff81122af1>] page_cache_sync_readahead+0x3d/0x3f
[84723.411710]  [<ffffffffa0535388>] btrfs_defrag_file+0x419/0xaaf [btrfs]
[84723.413007]  [<ffffffffa0531db0>] ? kzalloc+0xf/0x11 [btrfs]
[84723.414085]  [<ffffffffa0535b43>] btrfs_ioctl_defrag+0x125/0x14e [btrfs]
[84723.415307]  [<ffffffffa0536753>] btrfs_ioctl+0x746/0x24c6 [btrfs]
[84723.416532]  [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[84723.417731]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[84723.418699]  [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[84723.421532]  [<ffffffff8113adba>] ? __might_fault+0xa5/0xa7
[84723.422629]  [<ffffffff81171139>] ? cp_new_stat+0x15d/0x174
[84723.423712]  [<ffffffff8117c610>] do_vfs_ioctl+0x427/0x4e6
[84723.424801]  [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[84723.425968]  [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[84723.427063]  [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[84723.428138]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f

Consider the following logical and physical file layout:

logical:    ... [ prealloc extent A ] [ prealloc extent B ] [ extent C ] ...
                4K                    8K                    16K

physical:   ... 12853248              12857344              1103101952   ...
                                      (= 12853248 + 4K)

Extents A and B are physically adjacent. The following diagram shows a
sequence of events that lead to the deadlock when we attempt to do a
direct IO write against the file range [4K, 16K[ and a defrag is triggered
simultaneously.

           CPU 1                                               CPU 2

 btrfs_direct_IO()

   btrfs_get_blocks_direct()
     creates ordered extent A, covering
     the 4k prealloc extent A (range [4K, 8K[)

                                                    btrfs_defrag_file()
                                                      page_cache_sync_readahead([0K, 1M[)
                                                        btrfs_readpages()
                                                          extent_readpages()

                                                            locks all pages in the file
                                                            range [0K, 128K[ through calls
                                                            to add_to_page_cache_lru()

                                                            __do_contiguous_readpages()

                                                               finds ordered extent A

                                                               waits for it to complete

   btrfs_get_blocks_direct() called again

     lock_extent_direct(range [8K, 16K[)

       finds a page in range [8K, 16K[ through
       btrfs_page_exists_in_range()

       invalidate_inode_pages2_range([8K, 16K[)

         --> tries to lock pages that are already
             locked by the task at CPU 2

         --> our task, running __blockdev_direct_IO(),
             hangs waiting to lock the pages and the
             submit bio callback, btrfs_submit_direct(),
             ends up never being called, resulting in the
             ordered extent A never completing (because a
             corresponding bio is never submitted) and
             CPU 2 will wait for it forever while holding
             the pages locked
              ---> deadlock!

Fix this by removing the page invalidation approach when attempting to
lock the range for IO from the callback btrfs_get_blocks_direct() and
falling back buffered IO. This was a rare case anyway and well behaved
applications do not mix concurrent direct IO writes with buffered reads
anyway, being a concurrent defrag the only normal case that could lead
to the deadlock.

Signed-off-by: Filipe Manana <[email protected]>
9 years agoBtrfs: fix error path when failing to submit bio for direct IO write
Filipe Manana [Tue, 24 Nov 2015 16:23:54 +0000 (16:23 +0000)]
Btrfs: fix error path when failing to submit bio for direct IO write

Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit
bio for direct IO") fixed problems with the error handling code after we
fail to submit a bio for direct IO. However there were 2 problems that it
did not address when the failure is due to memory allocation failures for
direct IO writes:

1) We considered that there could be only one ordered extent for the whole
   IO range, which is not always true, as we can have multiple;

2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
   which can make other tasks running btrfs_wait_logged_extents() hang
   forever, since they wait for that bit to be set. The general assumption
   is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
   and it precedes setting the bit BTRFS_ORDERED_COMPLETE.

Fix these issues by moving part of the btrfs_endio_direct_write() handler
into a new helper function and having that new helper function called when
we fail to allocate memory to submit the bio (and its private object) for
a direct IO write.

Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
9 years agoBtrfs: fix memory leaks after transaction is aborted
Filipe Manana [Fri, 27 Nov 2015 16:12:00 +0000 (16:12 +0000)]
Btrfs: fix memory leaks after transaction is aborted

When a transaction is aborted, or its commit fails before writing the new
superblock and calling btrfs_finish_extent_commit(), we leak reference
counts on the block groups attached to the transaction's delete_bgs list,
because btrfs_finish_extent_commit() is never called for those two cases.
Fix this by dropping their references at btrfs_put_transaction(), which
is called when transactions are aborted (by making the transaction kthread
commit the transaction) or if their commits fail.

Signed-off-by: Filipe Manana <[email protected]>
9 years agoBtrfs: fix race when finishing dev replace leading to transaction abort
Filipe Manana [Fri, 20 Nov 2015 10:42:47 +0000 (10:42 +0000)]
Btrfs: fix race when finishing dev replace leading to transaction abort

During the final phase of a device replace operation, I ran into a
transaction abort that resulted in the following trace:

[23919.655368] WARNING: CPU: 10 PID: 30175 at fs/btrfs/extent-tree.c:9843 btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]()
[23919.664742] BTRFS: Transaction aborted (error -2)
[23919.665749] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 parport psmouse acpi_cpufreq processor i2c_core evdev microcode pcspkr button serio_raw ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom virtio_scsi ata_generic ata_piix virtio_pci floppy virtio_ring libata e1000 virtio scsi_mod [last unloaded: btrfs]
[23919.679442] CPU: 10 PID: 30175 Comm: fsstress Not tainted 4.3.0-rc5-btrfs-next-17+ #1
[23919.682392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[23919.689151]  0000000000000000 ffff8804020cbb50 ffffffff812566f4 ffff8804020cbb98
[23919.692604]  ffff8804020cbb88 ffffffff8104d0a6 ffffffffa03eea69 ffff88041b678a48
[23919.694230]  ffff88042ac38000 ffff88041b678930 00000000fffffffe ffff8804020cbbf0
[23919.696716] Call Trace:
[23919.698669]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[23919.700597]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[23919.701958]  [<ffffffffa03eea69>] ? btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]
[23919.703612]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
[23919.705047]  [<ffffffffa03eea69>] btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]
[23919.706967]  [<ffffffffa0402097>] __btrfs_end_transaction+0x84/0x2dd [btrfs]
[23919.708611]  [<ffffffffa0402300>] btrfs_end_transaction+0x10/0x12 [btrfs]
[23919.710099]  [<ffffffffa03ef0b8>] btrfs_alloc_data_chunk_ondemand+0x121/0x28b [btrfs]
[23919.711970]  [<ffffffffa0413025>] btrfs_fallocate+0x7d3/0xc6d [btrfs]
[23919.713602]  [<ffffffff8108b78f>] ? lock_acquire+0x10d/0x194
[23919.714756]  [<ffffffff81086dbc>] ? percpu_down_read+0x51/0x78
[23919.716155]  [<ffffffff8116ef1d>] ? __sb_start_write+0x5f/0xb0
[23919.718918]  [<ffffffff8116ef1d>] ? __sb_start_write+0x5f/0xb0
[23919.724170]  [<ffffffff8116b579>] vfs_fallocate+0x170/0x1ff
[23919.725482]  [<ffffffff8117c1d7>] ioctl_preallocate+0x89/0x9b
[23919.726790]  [<ffffffff8117c5ef>] do_vfs_ioctl+0x406/0x4e6
[23919.728428]  [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[23919.729642]  [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[23919.730782]  [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[23919.731847]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[23919.733330] ---[ end trace 166ef301a335832a ]---

This is due to a race between device replace and chunk allocation, which
the following diagram illustrates:

         CPU 1                                    CPU 2

 btrfs_dev_replace_finishing()

   at this point
    dev_replace->tgtdev->devid ==
    BTRFS_DEV_REPLACE_DEVID (0ULL)

   ...

   btrfs_start_transaction()
   btrfs_commit_transaction()

                                               btrfs_fallocate()
                                                 btrfs_alloc_data_chunk_ondemand()
                                                   btrfs_join_transaction()
                                                     --> starts a new transaction
                                                   do_chunk_alloc()
                                                     lock fs_info->chunk_mutex
                                                       btrfs_alloc_chunk()
                                                         --> creates extent map for
                                                             the new chunk with
                                                             em->bdev->map->stripes[i]->dev->devid
                                                             == X (X > 0)
                                                         --> extent map is added to
                                                             fs_info->mapping_tree
                                                         --> initial phase of bg A
                                                             allocation completes
                                                     unlock fs_info->chunk_mutex

   lock fs_info->chunk_mutex

   btrfs_dev_replace_update_device_in_mapping_tree()
     --> iterates fs_info->mapping_tree and
         replaces the device in every extent
         map's map->stripes[] with
         dev_replace->tgtdev, which still has
         an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)

                                                   btrfs_end_transaction()
                                                     btrfs_create_pending_block_groups()
                                                       --> starts final phase of
                                                           bg A creation (update device,
                                                           extent, and chunk trees, etc)
                                                       btrfs_finish_chunk_alloc()

                                                         btrfs_update_device()
                                                           --> attempts to update a device
                                                               item with ID == 0ULL
                                                               (BTRFS_DEV_REPLACE_DEVID)
                                                               which is the current ID of
                                                               bg A's
                                                               em->bdev->map->stripes[i]->dev->devid
                                                           --> doesn't find such item
                                                               returns -ENOENT
                                                           --> the device id should have been X
                                                               and not 0ULL

                                                       got -ENOENT from
                                                       btrfs_finish_chunk_alloc()
                                                       and aborts current transaction

   finishes setting up the target device,
   namely it sets tgtdev->devid to the value
   of srcdev->devid, which is X (and X > 0)

   frees the srcdev

   unlock fs_info->chunk_mutex

So fix this by taking the device list mutex when processing the chunk's
extent map stripes to update the device items. This avoids getting the
wrong device id and use-after-free problems if the task finishing a
chunk allocation grabs the replaced device, which is freed while the
dev replace task is holding the device list mutex.

This happened while running fstest btrfs/071.

Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
9 years agopowerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
Stewart Smith [Fri, 11 Dec 2015 01:08:23 +0000 (12:08 +1100)]
powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type

When running on newer OPAL firmware that supports sending extra
OPAL_MSG types, we would print a warning on *every* message received.

This could be a problem for kernels that don't support OPAL_MSG_OCC
on machines that are running real close to thermal limits and the
OCC is throttling the chip. For a kernel that is paying attention to
the message queue, we could get these notifications quite often.

Conceivably, future message types could also come fairly often,
and printing that we didn't understand them 10,000 times provides
no further information than printing them once.

Cc: [email protected]
Signed-off-by: Stewart Smith <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
9 years agoARC: smp: Rename platform hook @init_cpu_smp -> @init_per_cpu
Vineet Gupta [Thu, 17 Dec 2015 06:52:21 +0000 (12:22 +0530)]
ARC: smp: Rename platform hook @init_cpu_smp -> @init_per_cpu

Makes it similar to smp_ops which also has callback with same name

Signed-off-by: Vineet Gupta <[email protected]>
9 years agoARC: rename smp operation init_irq_cpu() to init_per_cpu()
Noam Camus [Wed, 16 Dec 2015 01:10:27 +0000 (03:10 +0200)]
ARC: rename smp operation init_irq_cpu() to init_per_cpu()

This will better reflect its description i.e. "any needed setup..."
and not just do an "IPI request".

Signed-off-by: Noam Camus <[email protected]>
Acked-by: Vineet Gupta <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
9 years agoARC: dw2 unwind: Ignore CIE version !=1 gracefully instead of bailing
Vineet Gupta [Wed, 16 Dec 2015 11:47:00 +0000 (17:17 +0530)]
ARC: dw2 unwind: Ignore CIE version !=1 gracefully instead of bailing

ARC dwarf unwinder only supports CIE version == 1
The boot time dwarf sanitizer (part of binary lookup table constructor)
would simply bail if it saw CIE version == 3, rendering unwinder with a
NULL lookup table.

It seems libgcc linked with kernel does have such entries.

With fallback linear search removed, and a NULL binary lookup table,
unwinder fails to generate any stack trace.

So allow graceful ignoring of unsupported CIE entries.

This problem was initially seen in Alexey's setup (and not mine) as he
was using buildroot built toolchain (libgcc) which doesn't get built with
CFLAGS_FOR_TARGET="-gdwarf-2 which is my default

Fixes STAR 9000985048: "kernel unwinder broken with stock tools"

Fixes: 2e22502c080f ARC: dw2 unwind: Remove falllback linear search thru FDE entries
Reported-by Alexey Brodkin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
9 years agoARC: dw2 unwind: Reinstante unwinding out of modules
Vineet Gupta [Fri, 11 Dec 2015 12:34:22 +0000 (18:04 +0530)]
ARC: dw2 unwind: Reinstante unwinding out of modules

The fix which removed linear searching of dwarf (because binary lookup
data always exists) missed out on the fact that modules don't get the
binary lookup tables info. This caused unwinding out of modules to stop
working.

So add binary lookup header setup (equivalent of eh_frame_hdr setup) to
modules as well.

While at it, confine the header setup to within unwinder code,
reducing one API exposed out of unwinder code.

Fixes: 2e22502c080f ARC: dw2 unwind: Remove falllback linear search thru FDE entries
Cc: <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
9 years agoARC: [plat-sim] unbork non default CONFIG_LINUX_LINK_BASE
Vineet Gupta [Tue, 15 Dec 2015 08:27:16 +0000 (13:57 +0530)]
ARC: [plat-sim] unbork non default CONFIG_LINUX_LINK_BASE

HIGHMEM support bumped the default memory size for nsim platform to 1G.
Thus total memory ended at the very edge of start of peripherals address
space. With linux link base shifted, memory started bleeding into
peripheral space which caused early boot bad_page spew !

Fixes: 29e332261d2 ("ARC: mm: HIGHMEM: populate high memory from DT")
Reported-by: Anton Kolesov <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
9 years agofou: clean up socket with kfree_rcu
Hannes Frederic Sowa [Tue, 15 Dec 2015 20:01:53 +0000 (21:01 +0100)]
fou: clean up socket with kfree_rcu

fou->udp_offloads is managed by RCU. As it is actually included inside
the fou sockets, we cannot let the memory go out of scope before a grace
period. We either can synchronize_rcu or switch over to kfree_rcu to
manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
and geneve.

Fixes: 23461551c00628c ("fou: Support for foo-over-udp RX path")
Cc: Tom Herbert <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoMerge tag 'mac80211-for-davem-2015-12-15' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Wed, 16 Dec 2015 23:33:38 +0000 (18:33 -0500)]
Merge tag 'mac80211-for-davem-2015-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Another set of fixes:
 * memory leak fixes (from Ola)
 * operating mode notification spec compliance fix (from Eyal)
 * copy rfkill names in case pointer becomes invalid (myself)
 * two hardware restart fixes (myself)
 * get rid of "limiting TX power" log spam (myself)
====================

Signed-off-by: David S. Miller <[email protected]>
9 years ago82xx: FCC: Fixing a bug causing to FCC port lock-up
Martin Roth [Tue, 15 Dec 2015 02:17:53 +0000 (04:17 +0200)]
82xx: FCC: Fixing a bug causing to FCC port lock-up

The patch fixes FCC port lock-up, which occurs as a result of a bug
during underrun/collision handling. Within the tx_startup() function
in mac-fcc.c, the address of last BD is not calculated correctly.
As a result of wrong calculation of the last BD address, the next
transmitted BD may be set to an area out of the transmit BD ring.
This actually causes to port lock-up and it is not recoverable.

Signed-off-by: Martin Roth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agogianfar: Don't enable RX Filer if not supported
Hamish Martin [Tue, 15 Dec 2015 01:14:50 +0000 (14:14 +1300)]
gianfar: Don't enable RX Filer if not supported

After commit 15bf176db1fb ("gianfar: Don't enable the Filer w/o the
Parser"), 'TSEC' model controllers (for example as seen on MPC8541E)
always have 8 bytes stripped from the front of received frames.
Only 'eTSEC' gianfar controllers have the RX Filer capability (amongst
other enhancements). Previously this was treated as always enabled
for both 'TSEC' and 'eTSEC' controllers.
In commit 15bf176db1fb ("gianfar: Don't enable the Filer w/o the Parser")
a subtle change was made to the setting of 'uses_rxfcb' to effectively
always set it (since 'rx_filer_enable' was always true). This had the
side-effect of always stripping 8 bytes from the front of received frames
on 'TSEC' type controllers.

We now only enable the RX Filer capability on controller types that
support it, thereby avoiding the issue for 'TSEC' type controllers.

Reviewed-by: Chris Packham <[email protected]>
Reviewed-by: Mark Tomlinson <[email protected]>
Signed-off-by: Hamish Martin <[email protected]>
Reviewed-by: Claudiu Manoil <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agodma-debug: Fix dma_debug_entry offset calculation
Daniel Mentz [Wed, 16 Dec 2015 01:38:48 +0000 (17:38 -0800)]
dma-debug: Fix dma_debug_entry offset calculation

dma-debug uses struct dma_debug_entry to keep track of dma coherent
memory allocation requests. The virtual address is converted into a pfn
and an offset. Previously, the offset was calculated using an incorrect
bit mask.  As a result, we saw incorrect error messages from dma-debug
like the following:

"DMA-API: exceeded 7 overlapping mappings of cacheline 0x03e00000"

Cacheline 0x03e00000 does not exist on our platform.

Cc: <[email protected]>
Fixes: 0abdd7a81b7e ("dma-debug: introduce debug_dma_assert_idle()")
Signed-off-by: Daniel Mentz <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
9 years agoMerge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds [Wed, 16 Dec 2015 18:57:24 +0000 (10:57 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "Further ARM fixes:
   - Anson Huang noticed that we were corrupting a register we shouldn't
     be during suspend on some CPUs.
   - Shengjiu Wang spotted a bug in the 'swp' instruction emulation.
   - Will Deacon fixed a bug in the ASID allocator.
   - Laura Abbott fixed the kernel permission protection to apply to all
     threads running in the system.
   - I've fixed two bugs with the domain access control register
     handling, one to do with printing an appropriate value at oops
     time, and the other to further fix the uaccess_with_memcpy code"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8475/1: SWP emulation: Restore original *data when failed
  ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted
  ARM: fix uaccess_with_memcpy() with SW_DOMAIN_PAN
  ARM: report proper DACR value in oops dumps
  ARM: 8464/1: Update all mm structures with section adjustments
  ARM: 8465/1: mm: keep reserved ASIDs in sync with mm after multiple rollovers

9 years agonet: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
Hannes Frederic Sowa [Mon, 14 Dec 2015 22:30:43 +0000 (23:30 +0100)]
net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration

Docbook does not like the definition of macros inside a field declaration
and adds a warning. Move the definition out.

Fixes: 79462ad02e86180 ("net: add validation for the socket syscall protocol argument")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agorhashtable: Fix walker list corruption
Herbert Xu [Wed, 16 Dec 2015 08:45:54 +0000 (16:45 +0800)]
rhashtable: Fix walker list corruption

The commit ba7c95ea3870fe7b847466d39a049ab6f156aa2c ("rhashtable:
Fix sleeping inside RCU critical section in walk_stop") introduced
a new spinlock for the walker list.  However, it did not convert
all existing users of the list over to the new spin lock.  Some
continued to use the old mutext for this purpose.  This obviously
led to corruption of the list.

The fix is to use the spin lock everywhere where we touch the list.

This also allows us to do rcu_rad_lock before we take the lock in
rhashtable_walk_start.  With the old mutex this would've deadlocked
but it's safe with the new spin lock.

Fixes: ba7c95ea3870 ("rhashtable: Fix sleeping inside RCU...")
Reported-by: Colin Ian King <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agorhashtable: Enforce minimum size on initial hash table
Herbert Xu [Wed, 16 Dec 2015 10:13:14 +0000 (18:13 +0800)]
rhashtable: Enforce minimum size on initial hash table

William Hua <[email protected]> wrote:
>
> I wasn't aware there was an enforced minimum size. I simply set the
> nelem_hint in the rhastable_params struct to 1, expecting it to grow as
> needed. This caused a segfault afterwards when trying to insert an
> element.

OK we're doing the size computation before we enforce the limit
on min_size.

---8<---
We need to do the initial hash table size computation after we
have obtained the correct min_size/max_size parameters.  Otherwise
we may end up with a hash table whose size is outside the allowed
envelope.

Fixes: a998f712f77e ("rhashtable: Round up/down min/max_size to...")
Reported-by: William Hua <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoMerge remote-tracking branches 'spi/fix/dspi' and 'spi/fix/spidev' into spi-linus
Mark Brown [Wed, 16 Dec 2015 13:28:32 +0000 (13:28 +0000)]
Merge remote-tracking branches 'spi/fix/dspi' and 'spi/fix/spidev' into spi-linus

9 years agoMerge remote-tracking branch 'spi/fix/core' into spi-linus
Mark Brown [Wed, 16 Dec 2015 13:28:31 +0000 (13:28 +0000)]
Merge remote-tracking branch 'spi/fix/core' into spi-linus

9 years agospi: fix parent-device reference leak
Johan Hovold [Mon, 14 Dec 2015 15:16:19 +0000 (16:16 +0100)]
spi: fix parent-device reference leak

Fix parent-device reference leak due to SPI-core taking an unnecessary
reference to the parent when allocating the master structure, a
reference that was never released.

Note that driver core takes its own reference to the parent when the
master device is registered.

Fixes: 49dce689ad4e ("spi doesn't need class_device")
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Cc: [email protected]
9 years agospi: spidev: Hold spi_lock over all defererences of spi in release()
Mark Brown [Mon, 16 Nov 2015 13:57:37 +0000 (13:57 +0000)]
spi: spidev: Hold spi_lock over all defererences of spi in release()

We use the spi_lock spinlock to protect against races between the device
being removed and file operations on the spidev.  This means that in the
removal path all references to the device need to be done under lock as
in removal we dropping references to the device.

Reported-by: Vegard Nossum <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
9 years agoPartial revert of "powerpc: Individual System V IPC system calls"
Michael Ellerman [Wed, 16 Dec 2015 10:26:28 +0000 (21:26 +1100)]
Partial revert of "powerpc: Individual System V IPC system calls"

This partially reverts commit a34236155afb1cc41945e58388ac988431bcb0b8.

While reviewing the glibc patch to exploit the individual IPC calls,
Arnd & Andreas noticed that we were still requiring userspace to pass
IPC_64 in order to get the new style IPC API.

With a bit of cleanup in the kernel we can drop that requirement, and
instead only provide the new style API, which will simplify things for
userspace.

Rather than try and sneak that patch into 4.4, instead we will drop the
individual IPC calls for powerpc, and merge them again in 4.5 once the
cleanup patch has gone in.

Because we've already added sys_mlock2() as syscall #378, we don't do a
full revert of the IPC calls. Instead we drop the __NR #defines, and
send those now undefined syscall numbers to sys_ni_syscall(). This
leaves a gap in the syscall numbers, but we'll reuse them when we merge
the individual IPC calls.

Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
9 years agoinet: tcp: fix inetpeer_set_addr_v4()
Eric Dumazet [Wed, 16 Dec 2015 04:56:44 +0000 (20:56 -0800)]
inet: tcp: fix inetpeer_set_addr_v4()

David Ahern added a vif field in the a4 part of inetpeer_addr struct.

This broke IPv4 TCP fast open client side and more generally tcp metrics
cache, because inetpeer_addr_cmp() is now comparing two u32 instead of
one.

inetpeer_set_addr_v4() needs to properly init vif field, otherwise
the comparison result depends on uninitialized data.

Fixes: 192132b9a034 ("net: Add support for VRFs to inetpeer cache")
Reported-by: Yuchung Cheng <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoipv6: automatically enable stable privacy mode if stable_secret set
Hannes Frederic Sowa [Tue, 15 Dec 2015 21:59:12 +0000 (22:59 +0100)]
ipv6: automatically enable stable privacy mode if stable_secret set

Bjørn reported that while we switch all interfaces to privacy stable mode
when setting the secret, we don't set this mode for new interfaces. This
does not make sense, so change this behaviour.

Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Reported-by: Bjørn Mork <[email protected]>
Cc: Bjørn Mork <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoRevert "scatterlist: use sg_phys()"
Dan Williams [Tue, 15 Dec 2015 20:54:06 +0000 (12:54 -0800)]
Revert "scatterlist: use sg_phys()"

commit db0fa0cb0157 "scatterlist: use sg_phys()" did replacements of
the form:

    phys_addr_t phys = page_to_phys(sg_page(s));
    phys_addr_t phys = sg_phys(s) & PAGE_MASK;

However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long).  Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.

Cc: <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Russell King <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: db0fa0cb0157 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <[email protected]>
Reported-by: Vitaly Lavrov <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
9 years agonet: fix uninitialized variable issue
[email protected] [Tue, 15 Dec 2015 18:46:17 +0000 (10:46 -0800)]
net: fix uninitialized variable issue

msg_iocb needs to be initialized on the recv/recvfrom path.
Otherwise afalg will wrongly interpret it as an async call.

Cc: [email protected]
Reported-by: Harald Freudenberger <[email protected]>
Signed-off-by: Tadeusz Struk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agobluetooth: Validate socket address length in sco_sock_bind().
David S. Miller [Tue, 15 Dec 2015 20:39:08 +0000 (15:39 -0500)]
bluetooth: Validate socket address length in sco_sock_bind().

Signed-off-by: David S. Miller <[email protected]>
9 years agoInput: elan_i2c - set input device's vendor and product IDs
Charlie Mooney [Tue, 15 Dec 2015 19:32:10 +0000 (11:32 -0800)]
Input: elan_i2c - set input device's vendor and product IDs

Previously the "vendor" and "product" IDs for the elan_i2c driver simply
reported 0000.  This patch modifies the elan_i2c driver to include the
Elan vendor ID and the touchpad's product id under
input/input*/{vendor,product}.

Specifically, this is to allow us to apply a generic Elan gestures config
that will apply to all Elan touchpads on ChromeOS.  These configs  match to
input devices in various ways, but one major way is by matching on vendor
ID.  Adding this patch allows the default Elan touchpad config to be
applied to Elan touchpads in this kernel by matching on devices that have
vendor ID 04f3.

Note that product ID is also available via custom sysfs entry "product_id"
as well.

Signed-off-by: Charlie Mooney <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
9 years agoMerge tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Tue, 15 Dec 2015 18:56:39 +0000 (10:56 -0800)]
Merge tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "This has fixes spread thru driver, notably among them:

   - edma fixes for recent edma DT changes which went into 4.4
   - odd fixes for at_hdmac
   - minor fixes on bc dma and mic dma"

* tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: at_xdmac: fix at_xdmac_prep_dma_memcpy()
  dmaengine: edma: DT: Change reserved slot array from 16bit to 32bit type
  dmaengine: edma: DT: Change memcpy channel array from 16bit to 32bit type
  dmaengine: mic_x100: add missing spin_unlock
  dmaengine: bcm2835-dma: Convert to use DMA pool
  dmaengine: at_xdmac: fix bad behavior in interleaved mode
  dmaengine: at_xdmac: fix false condition for memset_sg transfers
  dmaengine: at_xdmac: fix macro typo

9 years agoMerge tag 'fbdev-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba...
Linus Torvalds [Tue, 15 Dec 2015 18:50:13 +0000 (10:50 -0800)]
Merge tag 'fbdev-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux

Pull two fbdev fixes from Tomi Valkeinen:
 - OMAP: fix analog tv-out when using omapdrm
 - fsl: Fix kernel crash when diu_ops is not implemented

* tag 'fbdev-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  OMAPDSS: fix timings for VENC to match what omapdrm expects
  video: fbdev: fsl: Fix kernel crash when diu_ops is not implemented

9 years agoMerge tag 'please-pull-mlock2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl...
Linus Torvalds [Tue, 15 Dec 2015 18:45:29 +0000 (10:45 -0800)]
Merge tag 'please-pull-mlock2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull ia64 fix from Tony Luck:
 "Wire up mlock2() syscall for ia64"

* tag 'please-pull-mlock2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Enable mlock2 syscall for ia64

9 years agonet_sched: make qdisc_tree_decrease_qlen() work for non mq
Eric Dumazet [Tue, 15 Dec 2015 17:43:12 +0000 (09:43 -0800)]
net_sched: make qdisc_tree_decrease_qlen() work for non mq

Stas Nichiporovich reported a regression in his HFSC qdisc setup
on a non multi queue device.

It turns out I mistakenly added a TCQ_F_NOPARENT flag on all qdisc
allocated in qdisc_create() for non multi queue devices, which was
rather buggy. I was clearly mislead by the TCQ_F_ONETXQUEUE that is
also set here for no good reason, since it only matters for the root
qdisc.

Fixes: 4eaf3b84f288 ("net_sched: fix qdisc_tree_decrease_qlen() races")
Reported-by: Stas Nichiporovich <[email protected]>
Tested-by: Stas Nichiporovich <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoMerge branch 'ser_gigaset-platform-device-dealloc'
David S. Miller [Tue, 15 Dec 2015 18:24:22 +0000 (13:24 -0500)]
Merge branch 'ser_gigaset-platform-device-dealloc'

Paul Bolle says:

====================
ser_gigaset: fix deallocation of platform device structure

Sascha Levin reported that the syzkaller fuzzer triggered a WARNING in
ser_gigaset (see https://lkml.kernel.org/g/56587467.8050102@oracle.com ). It
turned out that ser_gigaset has always deallocated its platform device
structure incorrectly. Tilman submitted the patch that fixes that (3/4) and a
related cleanup (4/4).

Tilman also submitted a minor cleanup of some NULL checks (1/4) that prompted
Alan to turn those checks into WARN_ONs (2/4). If no one hits these WARN_ONs in
the next couple of releases these WARN_ONs should be removed.
====================

Signed-off-by: David S. Miller <[email protected]>
9 years agoser_gigaset: remove unnecessary kfree() calls from release method
Tilman Schmidt [Tue, 15 Dec 2015 17:11:31 +0000 (18:11 +0100)]
ser_gigaset: remove unnecessary kfree() calls from release method

device->platform_data and platform_device->resource are never used
and remain NULL through their entire life. Drops the kfree() calls
for them from the device release method.

Signed-off-by: Tilman Schmidt <[email protected]>
Signed-off-by: Paul Bolle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoser_gigaset: fix deallocation of platform device structure
Tilman Schmidt [Tue, 15 Dec 2015 17:11:30 +0000 (18:11 +0100)]
ser_gigaset: fix deallocation of platform device structure

When shutting down the device, the struct ser_cardstate must not be
kfree()d immediately after the call to platform_device_unregister()
since the embedded struct platform_device is still in use.
Move the kfree() call to the release method instead.

Signed-off-by: Tilman Schmidt <[email protected]>
Fixes: 2869b23e4b95 ("drivers/isdn/gigaset: new M101 driver (v2)")
Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Paul Bolle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoser_gigaset: turn nonsense checks into WARN_ON
Alan Cox [Tue, 15 Dec 2015 17:11:29 +0000 (18:11 +0100)]
ser_gigaset: turn nonsense checks into WARN_ON

These checks do nothing useful to protect the code from races. On the
other hand if the old code has been masking a real bug we would like to
know about it.

The check for tiocmset is kept because it is valid for a tty driver to
have a NULL tiocmset method. That in itself is probably a mistake given
modern coding practices - but needs fixing in the tty layer.

Signed-off-by: Alan Cox <[email protected]>
Acked-by: Tilman Schmidt <[email protected]>
Signed-off-by: Paul Bolle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoser_gigaset: fix up NULL checks
Tilman Schmidt [Tue, 15 Dec 2015 17:11:28 +0000 (18:11 +0100)]
ser_gigaset: fix up NULL checks

Commit f34d7a5b7010 ("tty: The big operations rework") changed
tty->driver to tty->ops but left NULL checks for tty->driver untouched.
Fix.

Signed-off-by: Tilman Schmidt <[email protected]>
[pebolle: removed Fixes tag]
Signed-off-by: Paul Bolle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Tue, 15 Dec 2015 18:21:04 +0000 (10:21 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a boundary condition in the blkcipher SG walking code that
  can lead to a crash when used with the new chacha20 algorithm"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - Copy iv from desc even for 0-len walks

9 years agoFix user-visible spelling error
Linus Torvalds [Tue, 15 Dec 2015 18:15:57 +0000 (10:15 -0800)]
Fix user-visible spelling error

Pavel Machek reports a warning about W+X pages found in the "Persisent"
kmap area.  After grepping for it (using the correct spelling), and not
finding it, I noticed how the debug printk was just misspelled.  Fix it.

The actual mapping bug that Pavel reported is still open.  It's
apparently a separate issue from the known EFI page tables, looks like
it's related to the HIGHMEM mappings.

Signed-off-by: Linus Torvalds <[email protected]>
9 years agoqlcnic: fix a timeout loop
Dan Carpenter [Tue, 15 Dec 2015 13:56:16 +0000 (16:56 +0300)]
qlcnic: fix a timeout loop

The problem here is that at the end of the loop we test for if
idc->vnic_wait_limit is zero, but since idc->vnic_wait_limit-- is a
post-op, it actually ends up set to (u8)-1.  I have fixed this by
moving the decrement inside the loop.

Fixes: 486a5bc77a4a ('qlcnic: Add support for 83xx suspend and resume.')
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agosfc: fix a timeout loop
Dan Carpenter [Tue, 15 Dec 2015 11:06:08 +0000 (14:06 +0300)]
sfc: fix a timeout loop

We test for if "tries" is zero at the end but "tries--" is a post-op so
it will end with "tries" set to -1.  I have changed it to a pre-op
instead.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoqlge: fix a timeout loop in ql_change_rx_buffers()
Dan Carpenter [Tue, 15 Dec 2015 10:52:36 +0000 (13:52 +0300)]
qlge: fix a timeout loop in ql_change_rx_buffers()

The problem here is that after the loop we test for "if (!i) " but
because "i--" is a post-op we exit with i set to -1.  I have fixed this
by changing it to a pre-op instead.  I had to change the starting value
from 3 to 4 so that we still iterate 3 times.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 years agoamd-xgbe: fix a couple timeout loops
Dan Carpenter [Tue, 15 Dec 2015 10:12:29 +0000 (13:12 +0300)]
amd-xgbe: fix a couple timeout loops

At the end of the loop we test "if (!count)" but because "count--" is
a post-op then the loop will end with count set to -1.  I have fixed
this by changing it to --count.

Fixes: c5aa9e3b8156 ('amd-xgbe: Initial AMD 10GbE platform driver')
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This page took 0.160273 seconds and 4 git commands to generate.