Git Repo - linux.git/log

Drivers: hv: hv_fcopy: fix a race condition for SMP guest

We should schedule the 5s "timer work" before starting the data transfer,
otherwise, the data transfer code may finish so fast on another
virtual cpu that when the code(fcopy_write()) trying to cancel the 5s
"timer work" can occasionally fail because the "timer work" may haven't
been scheduled yet and as a result the fcopy process will be aborted
wrongly by fcopy_work_func() in 5s.

Thank Liz Zhang <[email protected]> for the initial investigation
on the bug.

This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1118123

Tested-by: Liz Zhang <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: [email protected]
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge branches 'pm-sleep' and 'pm-cpufreq'

* pm-sleep:
  PM / sleep: fix freeze_ops NULL pointer dereferences
  PM / sleep: Fix request_firmware() error at resume

* pm-cpufreq:
  cpufreq: make table sentinel macros unsigned to match use
  cpufreq: move policy kobj to policy->cpu at resume
  cpufreq: cpu0: OPPs can be populated at runtime
  cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD
  cpufreq: imx6q: Select PM_OPP
  cpufreq: sa1110: set memory type for h3600

cpufreq: make table sentinel macros unsigned to match use

Commit 5eeaf1f18973 (cpufreq: Fix build error on some platforms that
use cpufreq_for_each_*) moved function cpufreq_next_valid() to a public
header.  Warnings are now generated when objects including that header
are built with -Wsign-compare (as an out-of-tree module might be):

.../include/linux/cpufreq.h: In function ‘cpufreq_next_valid’:
.../include/linux/cpufreq.h:519:27: warning: comparison between signed
and unsigned integer expressions [-Wsign-compare]
  while ((*pos)->frequency != CPUFREQ_TABLE_END)
                           ^
.../include/linux/cpufreq.h:520:25: warning: comparison between signed
and unsigned integer expressions [-Wsign-compare]
   if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
                         ^

Constants CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END are signed, but
are used with unsigned member 'frequency' of cpufreq_frequency_table.
Update the macro definitions to be explicitly unsigned to match their
use.

This also corrects potentially wrong behavior of clk_rate_table_iter()
if unsigned long is wider than usigned int.

Fixes: 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*)
Signed-off-by: Brian W Hart <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

usb: Check if port status is equal to RxDetect

When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller
[1022:7814], the second hotplugging will experience the USB 3.0 pen
drive is recognized as high-speed device. After bisecting the kernel,
I found the commit number 41e7e056cdc662f704fa9262e5c6e213b4ab45dd
(USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing
some experiments, the bug can be fixed by avoiding executing the function
hub_usb3_port_disable(). Because the port status with [AMD] FCH USB
XHCI Controlleris [1022:7814] is already in RxDetect
(I tried printing out the port status before setting to Disabled state),
it's reasonable to check the port status before really executing
hub_usb3_port_disable().

Fixes: 41e7e056cdc6 (USB: Allow USB 3.0 ports to be disabled.)
Signed-off-by: Gavin Guo <[email protected]>
Acked-by: Alan Stern <[email protected]>
Cc: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge branch 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

A few more fixes for 3.16.  The pageflipping fixes I dropped last week
have finally shaped up so this is mostly fixes for fallout from the
pageflipping code changes.  Also fix a memory leak and a black screen
when restoring the backlight on console unblanking.

* 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: Make classic pageflip completion path less racy.
  drm/radeon: Add missing vblank_put in pageflip ioctl error path.
  drm/radeon: Remove redundant fence unref in pageflip path.
  drm/radeon: Complete page flip even if waiting on the BO fence fails
  drm/radeon: Move pinning the BO back to radeon_crtc_page_flip()
  drm/radeon: Prevent too early kms-pageflips triggered by vblank.
  drm/radeon: set default bl level to something reasonable
  drm/radeon: avoid leaking edid data

usb: chipidea: udc: Disable auto ZLP generation on ep0

There are 2 methods for ZLP (zero-length packet) generation:
1) In software
2) Automatic generation by device controller

1) is implemented in UDC driver and it attaches ZLP to IN packet if
   descriptor->size < wLength
2) can be enabled/disabled by setting ZLT bit in the QH

When gadget ffs is connected to ubuntu host, the host sends
get descriptor request and wLength in setup packet is 255 while the
size of descriptor which will be sent by gadget in IN packet is
64 byte. So the composite driver sets req->zero = 1.
In UDC driver following code will be executed then

        if (hwreq->req.zero && hwreq->req.length
            && (hwreq->req.length % hwep->ep.maxpacket == 0))
                add_td_to_list(hwep, hwreq, 0);

Case-A:
So in case of ubuntu host, UDC driver will attach a ZLP to the IN packet.
ubuntu host will request 255 byte in IN request, gadget will send 64 byte
with ZLP and host will come to know that there is no more data.
But hold on, by default ZLT=0 for endpoint 0 so hardware also tries to
automatically generate the ZLP which blocks enumeration for ~6 seconds due
to endpoint 0 STALL, NAKs are sent to host for any requests (OUT/PING)

Case-B:
In case when gadget ffs is connected to Apple device, Apple device sends
setup packet with wLength=64. So descriptor->size = 64 and wLength=64
therefore req->zero = 0 and UDC driver will not attach any ZLP to the
IN packet. Apple device requests 64 bytes, gets 64 bytes and doesn't
further request for IN data. But ZLT=0 by default for endpoint 0 so
hardware tries to automatically generate the ZLP which blocks enumeration
for ~6 seconds due to endpoint 0 STALL, NAKs are sent to host for any
requests (OUT/PING)

According to USB2.0 specs:

    8.5.3.2 Variable-length Data Stage
    A control pipe may have a variable-length data phase in which the
    host requests more data than is contained in the specified data
    structure. When all of the data structure is returned to the host,
    the function should indicate that the Data stage is ended by
    returning a packet that is shorter than the MaxPacketSize for the
    pipe. If the data structure is an exact multiple of wMaxPacketSize
    for the pipe, the function will return a zero-length packet to indicate
    the end of the Data stage.

In Case-A mentioned above:
If we disable software ZLP generation & ZLT=0 for endpoint 0 OR if software
ZLP generation is not disabled but we set ZLT=1 for endpoint 0 then
enumeration doesn't block for 6 seconds.

In Case-B mentioned above:
If we disable software ZLP generation & ZLT=0 for endpoint then enumeration
still blocks due to ZLP automatically generated by hardware and host not needing
it. But if we keep software ZLP generation enabled but we set ZLT=1 for
endpoint 0 then enumeration doesn't block for 6 seconds.

So the proper solution for this issue seems to disable automatic ZLP generation
by hardware (i.e by setting ZLT=1 for endpoint 0) and let software (UDC driver)
handle the ZLP generation based on req->zero field.

Cc: [email protected]
Signed-off-by: Abbas Raza <[email protected]>
Signed-off-by: Peter Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: huawei_cdc_ncm: add "subclass 3" devices

Huawei's usage of the subclass and protocol fields is not 100%
clear to us, but there appears to be a very strict system.

A device with the "shared" device ID 12d1:1506 and this NCM
function was recently reported (showing only default altsetting):

    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      3
      bInterfaceProtocol     22
      iInterface              8 CDC Network Control Model (NCM)
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  06 24 1a 00 01 1f
      ** UNRECOGNIZED:  0c 24 1b 00 01 00 04 10 14 dc 05 20
      ** UNRECOGNIZED:  0d 24 0f 0a 0f 00 00 00 ea 05 03 00 01
      ** UNRECOGNIZED:  05 24 06 01 01
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0010  1x 16 bytes
        bInterval               9

Cc: Enrico Mioso <[email protected]>
Signed-off-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qmi_wwan: add two Sierra Wireless/Netgear devices

Add two device IDs found in an out-of-tree driver downloadable
from Netgear.

Signed-off-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

wan/x25_asy: integer overflow in x25_asy_change_mtu()

If "newmtu * 2 + 4" is too large then it can cause an integer overflow
leading to memory corruption. Eric Dumazet suggests that 65534 is a
reasonable upper limit.

Btw, "newmtu" is not allowed to be a negative number because of the
check in dev_set_mtu(), so that's ok.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branches 'acpi-scan' and 'acpi-video'

* acpi-scan:
  ACPI / documentation: Remove reference to acpi_platform_device_ids from enumeration.txt

* acpi-video:
  ACPI / video: Add use_native_backlight quirk for HP ProBook 4540s
  Revert "ACPI / video: change acpi-video brightness_switch_enabled default to 0"

Merge tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
"Two fixes found during migration of PV guests.  David would be the one
  doing this pull but he is on vacation.

  Fixes:
   - fix console deadlock when resuming PV guests
   - fix regression hit when ballooning and resuming PV guests"

* tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: set ballooned out pages as invalid in p2m
  xen/manage: fix potential deadlock when resuming the console

Merge tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"A few more fixes for ftrace infrastructure.

  I was cleaning out my INBOX and found two fixes from zhangwei from a
  year ago that were lost in my mail.  These fix an inconsistency
  between trace_puts() and the way trace_printk() works.  The reason
  this is important to fix is because when trace_printk() doesn't have
  any arguments, it turns into a trace_puts().  Not being able to enable
  a stack trace against trace_printk() because it does not have any
  arguments is quite confusing.  Also, the fix is rather trivial and low
  risk.

  While porting some changes to PowerPC I discovered that it still has
  the function graph tracer filter bug that if you also enable stack
  tracing the function graph tracer filter is ignored.  I fixed that up.

  Finally, Martin Lau, fixed a bug that would cause readers of the
  ftrace ring buffer to block forever even though it was suppose to be
  NONBLOCK"

This also includes the fix from an earlier pull request:

"Oleg Nesterov fixed a memory leak that happens if a user creates a
  tracing instance, sets up a filter in an event, and then removes that
  instance.  The filter allocates memory that is never freed when the
  instance is destroyed"

* tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Fix polling on trace_pipe
  tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs
  tracing: Fix graph tracer with stack tracer on other archs
  tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
  tracing: instance_rmdir() leaks ftrace_event_file->filter

drm/radeon: Make classic pageflip completion path less racy.

Need to protect mmio flip programming by event lock as well.

Need to also first enable pflip irq, then mmio program,
otherwise a flip completion may get unnoticed in the vblank
of actual completion if the flip is programmed, but
radeon_flip_work_func gets preempted immediately after
mmio programming and before vblank. In that case the
vblank irq handler wouldn't run radeon_crtc_handle_vblank()
with the completion check routine, miss the completed flip,
and only notice one vblank after actual completion, causing
a false/delayed report of flip completion.

Signed-off-by: Mario Kleiner <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon: Add missing vblank_put in pageflip ioctl error path.

Signed-off-by: Mario Kleiner <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon: Remove redundant fence unref in pageflip path.

Not needed anymore, as it is already unreffed within
radeon_flip_work_func() after its only use.

Signed-off-by: Mario Kleiner <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon: Complete page flip even if waiting on the BO fence fails

Otherwise the DRM core and userspace will be confused about which BO the
CRTC is scanning out.

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon: Move pinning the BO back to radeon_crtc_page_flip()

As well as enabling the vblank interrupt. These shouldn't take any
significant amount of time, but at least pinning the BO has actually been
seen to fail in practice before, in which case we need to let userspace
know about it.

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon: Prevent too early kms-pageflips triggered by vblank.

Since 3.16-rc1 we have this new failure:

When the userspace XOrg ddx schedules vblank events to
trigger deferred kms-pageflips, e.g., via the OML_sync_control
extension call glXSwapBuffersMscOML(), or if a glXSwapBuffers()
is called immediately after completion of a previous swapbuffers
call, e.g., in a tight rendering loop with minimal rendering,
it happens frequently that the pageflip ioctl() is executed
within the same vblank in which a previous kms-pageflip completed,
or - for deferred swaps - always one vblank earlier than requested
by the client app.

This causes premature pageflips and detection of failure by
the ddx, e.g., XOrg log warnings like...

"(WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
completion event has impossible msc 201025 < target_msc 201026"

... and error/invalid return values of glXWaitForSbcOML() and
Intel_swap_events extension.

Reason is the new way in which kms-pageflips are programmed
since 3.16.

This commit changes the time window in which the hw can
execute pending programmed pageflips. Before, a pending flip
would get executed anywhere within the vblank interval. Now
a pending flip only gets executed at the leading edge of
vblank (start of front porch), making sure that a invocation
of the pageflip ioctl() within a given vblank interval will
only lead to pageflip completion in the following vblank.

Tested to death on a DCE-4 card.

Signed-off-by: Mario Kleiner <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon: set default bl level to something reasonable

If the value in the scratch register is 0, set it to the
max level. This fixes an issue where the console fb blanking
code calls back into the backlight driver on unblank and then
sets the backlight level to 0 after the driver has already
set the mode and enabled the backlight.

bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=81382
https://bugs.freedesktop.org/show_bug.cgi?id=70207

Signed-off-by: Alex Deucher <[email protected]>
Tested-by: David Heidelberger <[email protected]>
Cc: [email protected]

drm/radeon: avoid leaking edid data

In some cases we fetch the edid in the detect() callback
in order to determine what sort of monitor is connected.
If that happens, don't fetch the edid again in the get_modes()
callback or we will leak the edid.

Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

irqchip: gic: Add binding probe for ARM GIC400

Commit 3ab72f9156bb "dt-bindings: add GIC-400 binding" added the
"arm,gic-400" compatible string, but the corresponding IRQCHIP_DECLARE
was never added to the gic driver.

Therefore add the missing irqchip declaration for it.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Removed additional empty line and adapted commit message to mark it
as fixing an issue.
Signed-off-by: Heiko Stuebner <[email protected]>
Acked-by: Will Deacon <[email protected]>
Fixes: 3ab72f9156bb ("dt-bindings: add GIC-400 binding")
Cc: <[email protected]> # v3.14+
Link: https://lkml.kernel.org/r/2621565.f5eISveXXJ@diego
Signed-off-by: Jason Cooper <[email protected]>

drivers/ata/pata_ep93xx.c: use signed int type for result of platform_get_irq()

[linux-3.16-rc5/drivers/ata/pata_ep93xx.c:929]: (style) Checking if unsigned
variable 'irq' is less than zero.

Source code is

    irq = platform_get_irq(pdev, 0);
    if (irq < 0) {

but

    unsigned int irq;

$ fgrep platform_get_irq `find . -name \*.h -print`
./include/linux/platform_device.h:extern int platform_get_irq(struct
platform_device *, unsigned int);

Now using "int" type instead of "unsigned int" for "irq" variable.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80401
Reported-by: David Binderman <[email protected]>
Signed-off-by: Andrey Utkin <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

cpufreq: move policy kobj to policy->cpu at resume

This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.

Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.

Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from
update_policy_cpu().

But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.

This bug got introduced with commit 42f921a, which overlooked this scenario.

To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.

Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc: 3.13+ <[email protected]> # 3.13+
Reported-and-tested-by: Bu Yitian <[email protected]>
Reported-by: Saravana Kannan <[email protected]>
Reviewed-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

net: ppp: fix creating PPP pass and active filters

Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use
sk_unattached_filter api") inadvertently changed the logic when setting
PPP pass and active filters. This applies to both the generic PPP subsystem
implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem
implemented by drivers/isdn/i4l/isdn_ppp.c. The original code in ppp_ioctl()
(or isdn_ppp_ioctl(), resp.) handling PPPIOCSPASS and PPPIOCSACTIVE allowed to
remove a pass/active filter previously set by using a filter of length zero.
However, with the new code this is not possible anymore as this case is not
explicitly checked for, which leads to passing NULL as a filter to
sk_unattached_filter_create(). This results in returning EINVAL to the caller.

Additionally, the variables ppp->pass_filter and ppp->active_filter (or
is->pass_filter and is->active_filter, resp.) are not reset to NULL, although
the filters they point to may have been destroyed by
sk_unattached_filter_destroy(), so in this EINVAL case dangling pointers are
left behind (provided the pointers were previously non-NULL).

This patch corrects both problems by checking whether the filter passed is
empty or non-empty, and prevents sk_unattached_filter_create() from being
called in the first case. Moreover, the pointers are always reset to NULL
as soon as sk_unattached_filter_destroy() returns.

Signed-off-by: Christoph Schulz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's

Fix a regression introduced by commit 35f6f45 ("net/mlx4_en: Don't use
irq_affinity_notifier to track changes in IRQ affinity map").
When core is started in legacy EQ's (number of IRQ's < rx rings), cq->irq_desc
was NULL. This caused a kernel crash under heavy traffic - when having more
than rx NAPI budget completions.
Fixed to have it set for both EQ modes.

Signed-off-by: Amir Vadai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branches 'cxgb4' and 'mlx5' into for-next

IB/mlx5: Enable "block multicast loopback" for kernel consumers

In commit f360d88a2efd, we advertise blocking multicast loopback to both
kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB)
to use it with their UD QPs. Fix that.

Fixes: f360d88a2efd ("IB/mlx5: Add block multicast loopback support")
Reported-by: Haggai Eran <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

sunvnet: clean up objects created in vnet_new() on vnet_exit()

Nothing cleans up the objects created by
vnet_new(), they are completely leaked.

vnet_exit(), after doing the vio_unregister_driver() to clean
up ports, should call a helper function that iterates over vnet_list
and cleans up those objects. This includes unregister_netdevice()
as well as free_netdev().

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Dave Kleikamp <[email protected]>
Reviewed-by: Karl Volz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40

The ethernet port on my ASUS A88X Pro mainboard stopped working
several times a day, with messages like these in dmesg:

AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x001e address=0x0000000000003000 flags=0x0050]

Searching the web for these messages led me to similar reports about
different hardware supported by r8169, and eventually to commits
3ced8c955e74d319f3e3997f7169c79d524dfd06 ('r8169: enforce RX_MULTI_EN
for the 8168f.') and eb2dc35d99028b698cdedba4f5522bc43e576bd2 ('r8169:
RxConfig hack for the 8168evl'). So I tried this change, and it fixes
the problem for me.

Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

hwmon: (adt7470) Fix writes to temperature limit registers

Temperature limit registers are signed. Limits therefore need
to be clamped to (-128, 127) degrees C and not to (0, 255)
degrees C.

Without this fix, writing a limit of 128 degrees C sets the
actual limit to -128 degrees C.

Signed-off-by: Guenter Roeck <[email protected]>
Cc: [email protected]
Reviewed-by: Axel Lin <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/nf_tables fixes

The following patchset contains nf_tables fixes, they are:

1) Fix wrong transaction handling when the table flags are not
   modified.

2) Fix missing rcu read_lock section in the netlink dump path, which
   is not protected by the nfnl_lock.

3) Set NLM_F_DUMP_INTR in the netlink dump path to indicate
   interferences with updates.

4) Fix 64 bits chain counters when they are retrieved from a 32 bits
   arch, from Eric Dumazet.
====================

Signed-off-by: David S. Miller <[email protected]>

drm/qxl: return IRQ_NONE if it was not our irq

Return IRQ_NONE if it was not our irq. This is necessary for the case
when qxl is sharing irq line with a device A in a crash kernel. If qxl
is initialized before A and A's irq was raised during this gap,
returning IRQ_HANDLED in this case will cause this irq to be raised
again after EOI since kernel think it was handled but in fact it was
not.

Cc: Gerd Hoffmann <[email protected]>
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

net-gre-gro: Fix a bug that breaks the forwarding path

Fixed a bug that was introduced by my GRE-GRO patch
(bf5a755f5e9186406bbf50f4087100af5bd68e40 net-gre-gro: Add GRE
support to the GRO stack) that breaks the forwarding path
because various GSO related fields were not set. The bug will
cause on the egress path either the GSO code to fail, or a
GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following
fix has been tested for both cases.

Signed-off-by: H.K. Jerry Chu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-linus-20140716' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:

- Fix ELM suspend/resume

- Reduce warnings if NAND ECC is too weak

- Add CFI support for Sharp LH28F640BF NOR

   The last fix is coming in because other commits in the 3.16 cycle
   depended on this support.

* tag 'for-linus-20140716' of git://git.infradead.org/linux-mtd:
  mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR
  mtd: nand: reduce the warning noise when the ECC is too weak
  mtd: devices: elm: fix elm_context_save() and elm_context_restore() functions

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"A cpufreq lockup fix and a compiler warning fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix compiler warnings
x86, tsc: Fix cpufreq lockup

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Tooling fixes and an Intel PMU driver fixlet"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do not allow optimized switch for non-cloned events
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  perf symbols: Get kernel start address by symbol name
  perf tools: Fix segfault in cumulative.callchain report

Merge tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Things seem to calm down so far, just a small few HD-audio fixes
  (regression fixes and a new codec ID addition) popping up"

* tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix broken PM due to incomplete i915 initialization
  ALSA: hda - Revert stream assignment order for Intel controllers
  ALSA: hda - Add new GPU codec ID 0x10de0070 to snd-hda
  ALSA: hda: Fix build warning

cpufreq: cpu0: OPPs can be populated at runtime

OPPs can be populated statically, via DT, or added at run time with
dev_pm_opp_add().

While this driver handles the first case correctly, it would fail to populate
OPPs added at runtime. Because call to of_init_opp_table() would fail as there
are no OPPs in DT and probe will return early.

To fix this, remove error checking and call dev_pm_opp_init_cpufreq_table()
unconditionally.

Update bindings as well.

Suggested-by: Stephen Boyd <[email protected]>
Tested-by: Stephen Boyd <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD

Commit ff1f0018cf66080d8e6f59791e552615648a033a ("drivers: Enable
building of Kirkwood drivers for mach-mvebu") added Kirkwood into
mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files.

The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with
MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD ||
MACH_KIRKWOOD.

As a consequence of this change, the cpufreq driver is no longer enabled
for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for
ARCH_KIRKWOOD.

Signed-off-by: Quentin Armitage <[email protected]>
Acked-by: Andrew Lunn <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'liblockdep-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux into locking/urgent

Pull liblockdep fixes from Sasha Levin.

Signed-off-by: Ingo Molnar <[email protected]>

locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER

Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
encapsulate the dependencies for rwsem optimistic spinning.
No logical changes here as it continues to depend on both
SMP and the XADD algorithm variant.

Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Jason Low <[email protected]>
[ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Cc: Chris Mason <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

locking/mutex: Disable optimistic spinning on some architectures

The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.

There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.

Opt in for known good archs.

Signed-off-by: Peter Zijlstra <[email protected]>
Reported-by: Mikulas Patocka <[email protected]>
Cc: David Miller <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Jason Low <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: John David Anglin <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/rwsem: Reduce the size of struct rw_semaphore

Recent optimistic spinning additions to rwsem provide significant performance
benefits on many workloads on large machines. The cost of it was increasing
the size of the rwsem structure by up to 128 bits.

However, now that the previous patches in this series bring the overhead of
struct optimistic_spin_queue to 32 bits, this patch reorders some fields in
struct rw_semaphore such that we can reduce the overhead of the rwsem structure
by 64 bits (on 64 bit systems).

The extra overhead required for rwsem optimistic spinning would now be up
to 8 additional bytes instead of up to 16 bytes. Additionally, the size of
rwsem would now be more in line with mutexes.

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/rwsem: Rename 'activity' to 'count'

There are two definitions of struct rw_semaphore, one in linux/rwsem.h
and one in linux/rwsem-spinlock.h.

For some reason they have different names for the initial field. This
makes it impossible to use C99 named initialization for
__RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
along with the structure definitions.

The simpler patch is renaming the rwsem-spinlock variant to match the
regular rwsem.

This allows us to switch to C99 named initialization.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

cpufreq: imx6q: Select PM_OPP

PM_OPP is a library used by several of the existing cpufreq drivers.
ARM IMX6Q cpufreq driver uses this library for its functionality.
Thus, it should be selected in Kconfig.

Reported-by: Ezequiel Garcia <[email protected]>
Signed-off-by: Nicolas Del Piano <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: sa1110: set memory type for h3600

The Compaq iPAQ h3600 also has the K4S281632b-1H memory type.
Verified by prying apart a broken board.

Signed-off-by: Linus Walleij <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI / video: Add use_native_backlight quirk for HP ProBook 4540s

As reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1025690
This is yet another model which needs this quirk.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1025690
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

sched: Fix possible divide by zero in avg_atom() calculation

proc_sched_show_task() does:

if (nr_switches)
do_div(avg_atom, nr_switches);

nr_switches is unsigned long and do_div truncates it to 32 bits, which
means it can test non-zero on e.g. x86-64 and be truncated to zero for
division.

Fix the problem by using div64_ul() instead.

As a side effect calculations of avg_atom for big nr_switches are now correct.

Signed-off-by: Mateusz Guzik <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/spinlocks/mcs: Micro-optimize osq_unlock()

In the unlock function of the cancellable MCS spinlock, the first
thing we do is to retrive the current CPU's osq node. However, due to
the changes made in the previous patch, in the common case where the
lock is not contended, we wouldn't need to access the current CPU's
osq node anymore.

This patch optimizes this by only retriving this CPU's osq node
after we attempt the initial cmpxchg to unlock the osq and found
that its contended.

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/spinlocks/mcs: Introduce and use init macro and function for osq locks

Currently, we initialize the osq lock by directly setting the lock's values. It
would be preferable if we use an init macro to do the initialization like we do
with other locks.

This patch introduces and uses a macro and function for initializing the osq lock.

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead

The cancellable MCS spinlock is currently used to queue threads that are
doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
the lock would access and queue the local node corresponding to the CPU that
it's running on. Currently, the cancellable MCS lock is implemented by using
pointers to these nodes.

In this patch, instead of operating on pointers to the per-cpu nodes, we
store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
A similar concept is used with the qspinlock.

By operating on the CPU # of the nodes using atomic_t instead of pointers
to those nodes, this can reduce the overhead of the cancellable MCS spinlock
by 32 bits (on 64 bit systems).

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Josef Bacik <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()

Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
named "optimistic_spin_queue". However, in a follow up patch in the series
we will be introducing a new structure that serves as the new "handle" for
the lock. It would make more sense if that structure is named
"optimistic_spin_queue". Additionally, since the current use of the
"optimistic_spin_queue" structure are "nodes", it might be better if we
rename them to "node" anyway.

This preparatory patch renames all current "optimistic_spin_queue"
to "optimistic_spin_node".

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Josef Bacik <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/rwsem: Allow conservative optimistic spinning when readers have lock

Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning")
introduced a major performance regression for workloads such as
xfs_repair which mix read and write locking of the mmap_sem across
many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
than on 3.15 and using 20x more system CPU time.

Perf profiles indicate in some workloads that significant time can
be spent spinning on !owner. This is because we don't set the lock
owner when readers(s) obtain the rwsem.

In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
return false if there is no lock owner. The rationale is that if we
just entered the slowpath, yet there is no lock owner, then there is
a possibility that a reader has the lock. To be conservative, we'll
avoid spinning in these situations.

This patch reduced the total run time of the xfs_repair workload from
about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
back to close to the same performance as on 3.15.

Retesting of AIM7, which were some of the workloads used to test the
original optimistic spinning code, confirmed that we still get big
performance gains with optimistic spinning, even with this additional
regression fix. Davidlohr found that while the 'custom' workload took
a performance hit of ~-14% to throughput for >300 users with this
additional patch, the overall gain with optimistic spinning is
still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users.

Tested-by: Dave Chinner <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
Signed-off-by: Ingo Molnar <[email protected]>

x86: Remove unused variable "polling"

Compile tested. "polling" is unused since commit f80c5b39b80a
("sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG").

Signed-off-by: Paul Bolle <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/1404138749.2978.6.camel@x41
Signed-off-by: Ingo Molnar <[email protected]>

s390: fix restore of invalid floating-point-control

The fixup of the inline assembly to restore the floating-point-control
register needs to check for instruction address *after* the lfcp
instruction as the specification and data exceptions are suppresssing.

Signed-off-by: Martin Schwidefsky <[email protected]>

s390/zcrypt: improve device probing for zcrypt adapter cards

Improve device probing process for zcrypt adapters to
transmit service request during registration process.

Signed-off-by: Ingo Tuchscherer <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

s390/ptrace: fix PSW mask check

The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect.
The PSW_MASK_USER define contains the PSW_MASK_ASC bits, the ptrace
interface accepts all combinations for the address-space-control
bits. To protect the kernel space the PSW mask check in ptrace needs
to reject the address-space-control bit combination for home space.

Fixes CVE-2014-3534

Cc: [email protected]
Signed-off-by: Martin Schwidefsky <[email protected]>

s390/MSI: Use standard mask and unmask funtions

MSI irqchip in s390 has its own mask and unmask MSI irq
functions, zpci_enable_irq() and zpci_disable_irq().
They mask and unmask MSI irq in standard ways, no arch
special. MSI driver provides two global standard functions
mask_msi_irq() and unmask_msi_irq(). Local zpci_enable_irq()
and zpci_disable_irq() are almost the same as the standard
two. the difference is local mask/unmask functions
read the mask status before mask and unmask everytime.
Then change the value and rewrite to hardware. In standard
functions, save the mask status after mask and unmask msi
irq, and use the cached status to change the mask status.
When we mask or unmask a MSI irq, we always cache its
mask status except we know need not to cache it, like in
pci_msi_shutdown. So use the standard functions to replace
the local is safe.

Signed-off-by: Yijing Wang <[email protected]>
[sebott: fixed inverted function pointers]
Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

s390/3270: correct size detection with the read-partition command

The size detection for 3270 terminals with the read-partition command is
broken. The raw3270_reset_device_cb function clears the init_data array,
but if raw3270_writesf_readpart has been called the read-partition command
is queued which needs the init_data array. In this case the size detection
will fail and the invalid command does funny things to the terminal.

Signed-off-by: Martin Schwidefsky <[email protected]>

s390: require mvcos facility, not tod clock steering facility

Inlined uaccess functions require the mvcos facility (bit 27), not the tod
clock steering facility (bit 28) for z10 and newer machines.

Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

UBI: fastmap: do not miss bit-flips

The return value from 'ubi_io_read_ec_hdr()' was stored in 'err', not in 'ret'.
This fix makes sure Fastmap-enabled UBI does not miss bit-flip while reading EC
headers, events and scrubs the affected PEBs.

This issue was reported by Coverity Scan.

Artem: improved the commit message.

Signed-off-by: Brian Norris <[email protected]>
Acked-by: Richard Weinberger <[email protected]>
Signed-off-by: Artem Bityutskiy <[email protected]>

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota fix from Jan Kara:
"Fix locking of dquot shrinker"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: missing lock in dqcache_shrink_scan()

Merge tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
"Fix up some merge confusion from the merge window"

* tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mcp23s08: Eliminates redundant checking.

ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack

commit 8f4e0a18682d91 ("IPVS netns exit causes crash in conntrack")
added second ip_vs_conn_drop_conntrack call instead of just adding
the needed check. As result, the first call still can cause
crash on netns exit. Remove it.

Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Hans Schillstrom <[email protected]>
Signed-off-by: Simon Horman <[email protected]>

ring-buffer: Fix polling on trace_pipe

ring_buffer_poll_wait() should always put the poll_table to its wait_queue
even there is immediate data available.  Otherwise, the following epoll and
read sequence will eventually hang forever:

1. Put some data to make the trace_pipe ring_buffer read ready first
2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
3. epoll_wait()
4. read(trace_pipe_fd) till EAGAIN
5. Add some more data to the trace_pipe ring_buffer
6. epoll_wait() -> this epoll_wait() will block forever

~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
  ring_buffer_poll_wait() returns immediately without adding poll_table,
  which has poll_table->_qproc pointing to ep_poll_callback(), to its
  wait_queue.
~ During the epoll_wait() call in step 3 and step 6,
  ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
  because the poll_table->_qproc is NULL and it is how epoll works.
~ When there is new data available in step 6, ring_buffer does not know
  it has to call ep_poll_callback() because it is not in its wait queue.
  Hence, block forever.

Other poll implementation seems to call poll_wait() unconditionally as the very
first thing to do.  For example, tcp_poll() in tcp.c.

Link: http://lkml.kernel.org/p/[email protected]
Cc: [email protected] # 2.6.27
Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled"
Reviewed-by: Chris Mason <[email protected]>
Signed-off-by: Martin Lau <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

quota: missing lock in dqcache_shrink_scan()

Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API)
accidentally removed locking from quota shrinker. Fix it -
dqcache_shrink_scan() should use dq_list_lock to protect the
scan on free_dquots list.

CC: [email protected]
Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de
Signed-off-by: Niu Yawei <[email protected]>
Signed-off-by: Jan Kara <[email protected]>

dm cache metadata: do not allow the data block size to change

The block size for the dm-cache's data device must remained fixed for
the life of the cache. Disallow any attempt to change the cache's data
block size.

Signed-off-by: Mike Snitzer <[email protected]>
Acked-by: Joe Thornber <[email protected]>
Cc: [email protected]

dm thin metadata: do not allow the data block size to change

The block size for the thin-pool's data device must remained fixed for
the life of the thin-pool. Disallow any attempt to change the
thin-pool's data block size.

It should be noted that attempting to change the data block size via
thin-pool table reload will be ignored as a side-effect of the thin-pool
handover that the thin-pool target does during thin-pool table reload.

Here is an example outcome of attempting to load a thin-pool table that
reduced the thin-pool's data block size from 1024K to 512K.

Before:
kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks

After:
kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported
kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object
kernel: device-mapper: ioctl: error adding target to table

Signed-off-by: Mike Snitzer <[email protected]>
Acked-by: Joe Thornber <[email protected]>
Cc: [email protected]

tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs

The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
so add it, to be consistent with __trace_printk/__trace_bprintk.
Those functions are all called by the same function: trace_printk().

Link: http://lkml.kernel.org/p/[email protected]
Cc: [email protected] # 3.11+
Signed-off-by: zhangwei(Jovi) <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
"This contains miscellaneous fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: replace count*size kzalloc by kcalloc
  fuse: release temporary page if fuse_writepage_locked() failed
  fuse: restructure ->rename2()
  fuse: avoid scheduling while atomic
  fuse: handle large user and group ID
  fuse: inode: drop cast
  fuse: ignore entry-timeout on LOOKUP_REVAL
  fuse: timeout comparison fix

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Bluetooth pairing fixes from Johan Hedberg.

2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB,
    from Max Stepanov.

3) New iwlwifi chip IDs, from Oren Givon.

4) bnx2x driver reads wrong PCI config space MSI register, from Yijing
    Wang.

5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu.

6) Fix double SKB free in openvswitch, from Andy Zhou.

7) Fix sk_dst_set() being racey with UDP sockets, leading to strange
    crashes, from Eric Dumazet.

8) Interpret the NAPI budget correctly in the new systemport driver,
    from Florian Fainelli.

9) VLAN code frees percpu stats in the wrong place, leading to crashes
    in the get stats handler.  From Eric Dumazet.

10) TCP sockets doing a repair can crash with a divide by zero, because
    we invoke tcp_push() with an MSS value of zero.  Just skip that part
    of the sendmsg paths in repair mode.  From Christoph Paasch.

11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai.

12) Don't ignore path MTU icmp messages with a zero mtu, machines out
    there still spit them out, and all of our per-protocol handlers for
    PMTU can cope with it just fine.  From Edward Allcutt.

13) Some NETDEV_CHANGE notifier invocations were not passing in the
    correct kind of cookie as the argument, from Loic Prylli.

14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul
    Maloy.

15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix
    from Dmitry Popov.

16) Fix skb->sk assigned without taking a reference to 'sk' in
    appletalk, from Andrey Utkin.

17) Fix some info leaks in ULP event signalling to userspace in SCTP,
    from Daniel Borkmann.

18) Fix deadlocks in HSO driver, from Olivier Sobrie.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
  hso: fix deadlock when receiving bursts of data
  hso: remove unused workqueue
  net: ppp: don't call sk_chk_filter twice
  mlx4: mark napi id for gro_skb
  bonding: fix ad_select module param check
  net: pppoe: use correct channel MTU when using Multilink PPP
  neigh: sysctl - simplify address calculation of gc_* variables
  net: sctp: fix information leaks in ulpevent layer
  MAINTAINERS: update r8169 maintainer
  net: bcmgenet: fix RGMII_MODE_EN bit
  tipc: clear 'next'-pointer of message fragments before reassembly
  r8152: fix r8152_csum_workaround function
  be2net: set EQ DB clear-intr bit in be_open()
  GRE: enable offloads for GRE
  farsync: fix invalid memory accesses in fst_add_one() and fst_init_card()
  igb: do a reset on SR-IOV re-init if device is down
  igb: Workaround for i210 Errata 25: Slow System Clock
  usbnet: smsc95xx: add reset_resume function with reset operation
  dp83640: Always decode received status frames
  r8169: disable L23
  ...

libata: EH should handle AMNF error condition as a media error

libata-eh.c should handle AMNF error condition (error byte bit 0,
usually code 0x01) in libata-eh.c along with UNC as a media error so
SCSI stack can handle it properly (translation code 0x01 is already
present in libata-scsi.c) but was never passed down due to lack of
handling in EH.

While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the
controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data
from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure
AMNF 0x01 error code generates generic "device error" that is retried
several times by SCSI stack instead of "media error" that is passed up to
software.

So we may assume deprecated AMNF error code is surely not dead yet, and
it's better for it to be handled properly. As we may see it is used by
modern enough devices, and used properly: drive returned AMNF only when IDs
for track cannot be read completely due to dying head or positioning,
otherwise it returned UNC(orrectables).

Not handling it causes wrong generic error code ("device error") reporting
down the stack, can damage failing drives further because of excessive
retries, and slows salvaging down a lot. Also, there is handling code in
libata-scsi.c for 0x01 AMNF error already.

https://bugzilla.kernel.org/show_bug.cgi?id=80031

tj: Shortened $SUBJ and moved its content to the first paragraph.

Signed-off-by: Alexey Asemov <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

tracing: Fix graph tracer with stack tracer on other archs

Running my ftrace tests on PowerPC, it failed the test that checks
if function_graph tracer is affected by the stack tracer. It was.
Looking into this, I found that the update_function_graph_func()
must be called even if the trampoline function is not changed.
This is because archs like PowerPC do not support ftrace_ops being
passed by assembly and instead uses a helper function (what the
trampoline function points to). Since this function is not changed
even when multiple ftrace_ops are added to the code, the test that
falls out before calling update_function_graph_func() will miss that
the update must still be done.

Call update_function_graph_function() for all calls to
update_ftrace_function()

Cc: [email protected] # 3.3+
Signed-off-by: Steven Rostedt <[email protected]>

tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs

Currently trace option stacktrace is not applicable for
trace_printk with constant string argument, the reason is
in __trace_puts/__trace_bputs ftrace_trace_stack is missing.

In contrast, when using trace_printk with non constant string
argument(will call into __trace_printk/__trace_bprintk), then
trace option stacktrace is workable, this inconstant result
will confuses users a lot.

Link: http://lkml.kernel.org/p/[email protected]
Cc: [email protected] # 3.10+
Signed-off-by: zhangwei(Jovi) <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

ALSA: hda - Fix broken PM due to incomplete i915 initialization

When the initialization of Intel HDMI controller fails due to missing
i915 kernel symbols (e.g. HD-audio is built in while i915 is module),
the driver discontinues the probe. However, since the probe was done
asynchronously, the driver object still remains, thus the relevant PM
ops are still called at suspend/resume. This results in the bad access
to the incomplete audio card object, eventually leads to Oops or stall
at PM.

This patch adds the missing checks of chip->init_failed flag at each
PM callback in order to fix the problem above.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

PM / sleep: fix freeze_ops NULL pointer dereferences

This patch fixes a NULL pointer dereference issue introduced by
commit 1f0b63866fc1 (ACPI / PM: Hold ACPI scan lock over the "freeze"
sleep state).

Fixes: 1f0b63866fc1 (ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state)
Link: http://marc.info/?l=linux-pm&m=140541182017443&w=2
Reported-and-tested-by: Alexander Stein <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

PM / sleep: Fix request_firmware() error at resume

The commit [247bc037: PM / Sleep: Mitigate race between the freezer
and request_firmware()] introduced the finer state control, but it
also leads to a new bug; for example, a bug report regarding the
firmware loading of intel BT device at suspend/resume:
  https://bugzilla.novell.com/show_bug.cgi?id=873790

The root cause seems to be a small window between the process resume
and the clear of usermodehelper lock.  The request_firmware() function
checks the UMH lock and gives up when it's in UMH_DISABLE state.  This
is for avoiding the invalid  f/w loading during suspend/resume phase.
The problem is, however, that usermodehelper_enable() is called at the
end of thaw_processes().  Thus, a thawed process in between can kick
off the f/w loader code path (in this case, via btusb_setup_intel())
even before the call of usermodehelper_enable().  Then
usermodehelper_read_trylock() returns an error and request_firmware()
spews WARN_ON() in the end.

This oneliner patch fixes the issue just by setting to UMH_FREEZING
state again before restarting tasks, so that the call of
request_firmware() will be blocked until the end of this function
instead of returning an error.

Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware())
Link: https://bugzilla.novell.com/show_bug.cgi?id=873790
Cc: 3.4+ <[email protected]> # 3.4+
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'linux-3.16' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

deadlock fix.

* 'linux-3.16' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/therm: fix a potential deadlock in the therm monitoring code

drm/nouveau/therm: fix a potential deadlock in the therm monitoring code

Signed-off-by: Martin Peres <[email protected]>
Tested-by: Stefan Ringel <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

hso: fix deadlock when receiving bursts of data

When the module sends bursts of data, sometimes a deadlock happens in
the hso driver when the tty buffer doesn't get the chance to be flushed
quickly enough.

Remove the endless while loop in function put_rxbuf_data() which is
called by the urb completion handler.
If there isn't enough room in the tty buffer, discards all the data
received in the URB.

Cc: David Miller <[email protected]>
Cc: David Laight <[email protected]>
Cc: One Thousand Gnomes <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jan Dumon <[email protected]>
Signed-off-by: Olivier Sobrie <[email protected]>
Acked-by: Alan Cox <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

hso: remove unused workqueue

The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere
in the code. So, remove it.

Signed-off-by: Olivier Sobrie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Stefan Richter:
"The 1394 drivers cannot and are not supposed to be built on platforms
  which don't provide the DMA mapping API (regression since v3.16-rc1
  with CONFIG_COMPILE_TEST=y on some architectures)"

* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA

Merge git://git.kvack.org/~bcrl/aio-fixes

Pull another aio fix from Ben LaHaise:
"put_reqs_available() can now be called from within irq context, which
  means that it (and its sibling function get_reqs_available()) now need
  to be irq-safe, not just preempt-safe"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: protect reqs_available updates from changes in interrupt handlers

[media] gspca_pac7302: Add new usb-id for Genius i-Look 317

Tested-and-reported-by: yullaw <[email protected]>
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

[media] tda10071: fix returned symbol rate calculation

Detected symbol rate value was returned too small.

Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

[media] tda10071: fix spec inversion reporting

Inversion ON was reported as inversion OFF and vice versa.

Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

[media] tda10071: add missing DVB-S2/PSK-8 FEC AUTO

FEC AUTO is valid for PSK-8 modulation too. Add it.

Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

[media] tda10071: force modulation to QPSK on DVB-S

Only supported modulation for DVB-S is QPSK. Modulation parameter
contains invalid value for DVB-S on some cases, which leads driver
refusing tuning attempt. Due to that, hard code modulation to QPSK
in case of DVB-S.

Cc: [email protected]
Signed-off-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

net/l2tp: don't fall back on UDP [get|set]sockopt

The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

"If we wanted this to work, it'd have to look up the tunnel and then
use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <[email protected]>
Acked-by: James Chapman <[email protected]>
Acked-by: David Miller <[email protected]>
Cc: Phil Turnbull <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

net: ppp: don't call sk_chk_filter twice

Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use
sk_unattached_filter api") causes sk_chk_filter() to be called twice when
setting a PPP pass or active filter. This applies to both the generic PPP
subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP
subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from
within get_filter(). The second one is through the call chain

  ppp_ioctl() or isdn_ppp_ioctl()
  --> sk_unattached_filter_create()
      --> __sk_prepare_filter()
          --> sk_chk_filter()

The first call from within get_filter() should be deleted as get_filter() is
called just before calling sk_unattached_filter_create() later on, which
eventually calls sk_chk_filter() anyway.

For 3.15.x, this proposed change is a bugfix rather than a pure optimization as
in that branch, sk_chk_filter() may replace filter codes by other codes which
are not recognized when executing sk_chk_filter() a second time. So with
3.15.x, if sk_chk_filter() is called twice, the second invocation may yield
EINVAL (this depends on the filter codes found in the filter to be set, but
because the replacement is done for frequently used codes, this is almost
always the case). The net effect is that setting pass and/or active PPP filters
does not work anymore, since sk_unattached_filter_create() always returns
EINVAL due to the second call to sk_chk_filter(), regardless whether the filter
was originally sane or not.

Signed-off-by: Christoph Schulz <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4: mark napi id for gro_skb

Napi id was not marked for gro_skb, this will lead rx busy loop won't
work correctly since they stack never try to call low latency receive
method because of a zero socket napi id. Fix this by marking napi id
for gro_skb.

The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
(from 20531.68 to 30610.88).

Cc: Amir Vadai <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: fix ad_select module param check

Obvious copy/paste error when I converted the ad_select to the new
option API. "lacp_rate" there should be "ad_select" so we can get the
proper value.

CC: Jay Vosburgh <[email protected]>
CC: Veaceslav Falico <[email protected]>
CC: Andy Gospodarek <[email protected]>
CC: David S. Miller <[email protected]>
Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option
API")
Reported-by: Karim Scheik <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pppoe: use correct channel MTU when using Multilink PPP

The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():

/*
* hdrlen includes the 2-byte PPP protocol field, but the
* MTU counts only the payload excluding the protocol field.
* (RFC1661 Section 2)
*/
mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)

Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

   2948 (echo payload)
+    8 (ICMPv4 header)
+   20 (IPv4 header)
---------------------
   2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:

   1489 (PPP payload)
+    4 (MP header)
+    2 (PPP protocol field for the MP payload (0x3d))
+    6 (PPPoE header)
--------------------------
   1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded.

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
  Flags [begin], length 1493

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
  Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
  length 2976)
    192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
      length 2956

The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.

Signed-off-by: Christoph Schulz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

neigh: sysctl - simplify address calculation of gc_* variables

The code in neigh_sysctl_register() relies on a specific layout of
struct neigh_table, namely that the 'gc_*' variables are directly
following the 'parms' member in a specific order. The code, though,
expresses this in the most ugly way.

Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
the table. This way we can refer to the 'gc_*' variables directly.

Similarly seen in the grsecurity patch, written by Brad Spengler.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Brad Spengler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

xfs: null unused quota inodes when quota is on

When quota is on, it is expected that unused quota inodes have a
value of NULLFSINO. The changes to support a separate project quota
in 3.12 broken this rule for non-project quota inode enabled
filesystem, as the code now refuses to write the group quota inode
if neither group or project quotas are enabled. This regression was
introduced by commit d892d58 ("xfs: Start using pquotaino from the
superblock").

In this case, we should be writing NULLFSINO rather than nothing to
ensure that we leave the group quota inode in a valid state while
quotas are enabled.

Failure to do so doesn't cause a current kernel to break - the
separate project quota inodes introduced translation code to always
treat a zero inode as NULLFSINO. This was introduced by commit
0102629 ("xfs: Initialize all quota inodes to be NULLFSINO") with is
also in 3.12 but older kernels do not do this and hence taking a
filesystem back to an older kernel can result in quotas failing
initialisation at mount time. When that happens, we see this in
dmesg:

[ 1649.215390] XFS (sdb): Mounting Filesystem
[ 1649.316894] XFS (sdb): Failed to initialize disk quotas.
[ 1649.316902] XFS (sdb): Ending clean mount

By ensuring that we write NULLFSINO to quota inodes that aren't
active, we avoid this problem. We have to be really careful when
determining if the quota inodes are active or not, because we don't
want to write a NULLFSINO if the quota inodes are active and we
simply aren't updating them.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>

net: sctp: fix information leaks in ulpevent layer

While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;
  };

* 6.1.3. SCTP_REMOTE_ERROR:

  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];
  };

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "drm/i915: reverse dp link param selection, prefer fast over wide again"

This reverts commit 38aecea0ccbb909d635619cba22f1891e589b434.

This breaks Haswell Thinkpad + Lenovo dock in SST mode with a HDMI monitor attached.

Before this we can 1920x1200 mode, after this we only ever get 1024x768, and
a lot of deferring.

This didn't revert clean, but this should be fine.

bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1117008
Cc: [email protected] # v3.15
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>

xfs: refine the allocation stack switch

The allocation stack switch at xfs_bmapi_allocate() has served it's
purpose, but is no longer a sufficient solution to the stack usage
problem we have in the XFS allocation path.

Whilst the kernel stack size is now 16k, that is not a valid reason
for undoing all our "keep stack usage down" modifications. What it
does allow us to do is have the freedom to refine and perfect the
modifications knowing that if we get it wrong it won't blow up in
our faces - we have a safety net now.

This is important because we still have the issue of older kernels
having smaller stacks and that they are still supported and are
demonstrating a wide range of different stack overflows. Red Hat
has several open bugs for allocation based stack overflows from
directory modifications and direct IO block allocation and these
problems still need to be solved. If we can solve them upstream,
then distro's won't need to bake their own unique solutions.

To that end, I've observed that every allocation based stack
overflow report has had a specific characteristic - it has happened
during or directly after a bmap btree block split. That event
requires a new block to be allocated to the tree, and so we
effectively stack one allocation stack on top of another, and that's
when we get into trouble.

A further observation is that bmap btree block splits are much rarer
than writeback allocation - over a range of different workloads I've
observed the ratio of bmap btree inserts to splits ranges from 100:1
(xfstests run) to 10000:1 (local VM image server with sparse files
that range in the hundreds of thousands to millions of extents).
Either way, bmap btree split events are much, much rarer than
allocation events.

Finally, we have to move the kswapd state to the allocation workqueue
work when allocation is done on behalf of kswapd. This is proving to
cause significant perturbation in performance under memory pressure
and appears to be generating allocation deadlock warnings under some
workloads, so avoiding the use of a workqueue for the majority of
kswapd writeback allocation will minimise the impact of such
behaviour.

Hence it makes sense to move the stack switch to xfs_btree_split()
and only do it for bmap btree splits. Stack switches during
allocation will be much rarer, so there won't be significant
performacne overhead caused by switching stacks. The worse case
stack from all allocation paths will be split, not just writeback.
And the majority of memory allocations will be done in the correct
context (e.g. kswapd) without causing additional latency, and so we
simplify the memory reclaim interactions between processes,
workqueues and kswapd.

The worst stack I've been able to generate with this patch in place
is 5600 bytes deep. It's very revealing because we exit XFS at:

37) 1768 64 kmem_cache_alloc+0x13b/0x170

about 1800 bytes of stack consumed, and the remaining 3800 bytes
(and 36 functions) is memory reclaim, swap and the IO stack. And
this occurs in the inode allocation from an open(O_CREAT) syscall,
not writeback.

The amount of stack being used is much less than I've previously be
able to generate - fs_mark testing has been able to generate stack
usage of around 7k without too much trouble; with this patch it's
only just getting to 5.5k. This is primarily because the metadata
allocation paths (e.g. directory blocks) are no longer causing
double splits on the same stack, and hence now stack tracing is
showing swapping being the worst stack consumer rather than XFS.

Performance of fs_mark inode create workloads is unchanged.
Performance of fs_mark async fsync workloads is consistently good
with context switches reduced by around 150,000/s (30%).
Performance of dbench, streaming IO and postmark is unchanged.
Allocation deadlock warnings have not been seen on the workloads
that generated them since adding this patch.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>

Revert "xfs: block allocation work needs to be kswapd aware"

This reverts commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679.

This commit resulted in regressions in performance in low
memory situations where kswapd was doing writeback of delayed
allocation blocks. It resulted in significant parallelism of the
kswapd work and with the special kswapd flags meant that hundreds of
active allocation could dip into kswapd specific memory reserves and
avoid being throttled. This cause a large amount of performance
variation, as well as random OOM-killer invocations that didn't
previously exist.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>

x86/espfix/xen: Fix allocation of pages for paravirt page tables

init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.

The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.

More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.

Signed-off-by: Boris Ostrovsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>