]> Git Repo - linux.git/log
linux.git
8 years agoima: support restoring multiple template formats
Mimi Zohar [Tue, 20 Dec 2016 00:22:54 +0000 (16:22 -0800)]
ima: support restoring multiple template formats

The configured IMA measurement list template format can be replaced at
runtime on the boot command line, including a custom template format.
This patch adds support for restoring a measuremement list containing
multiple builtin/custom template formats.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Dmitry Kasatkin <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agoima: store the builtin/custom template definitions in a list
Mimi Zohar [Tue, 20 Dec 2016 00:22:51 +0000 (16:22 -0800)]
ima: store the builtin/custom template definitions in a list

The builtin and single custom templates are currently stored in an
array.  In preparation for being able to restore a measurement list
containing multiple builtin/custom templates, this patch stores the
builtin and custom templates as a linked list.  This will permit
defining more than one custom template per boot.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Dmitry Kasatkin <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agoima: on soft reboot, save the measurement list
Mimi Zohar [Tue, 20 Dec 2016 00:22:48 +0000 (16:22 -0800)]
ima: on soft reboot, save the measurement list

The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot.

This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Dmitry Kasatkin <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agopowerpc: ima: send the kexec buffer to the next kernel
Thiago Jung Bauermann [Tue, 20 Dec 2016 00:22:45 +0000 (16:22 -0800)]
powerpc: ima: send the kexec buffer to the next kernel

The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.

This is the architecture-specific part of setting up the IMA kexec
buffer for the next kernel.  It will be used in the next patch.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Dmitry Kasatkin <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agoima: maintain memory size needed for serializing the measurement list
Mimi Zohar [Tue, 20 Dec 2016 00:22:42 +0000 (16:22 -0800)]
ima: maintain memory size needed for serializing the measurement list

In preparation for serializing the binary_runtime_measurements, this
patch maintains the amount of memory required.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Dmitry Kasatkin <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agoima: permit duplicate measurement list entries
Mimi Zohar [Tue, 20 Dec 2016 00:22:38 +0000 (16:22 -0800)]
ima: permit duplicate measurement list entries

Measurements carried across kexec need to be added to the IMA
measurement list, but should not prevent measurements of the newly
booted kernel from being added to the measurement list.  This patch adds
support for allowing duplicate measurements.

The "boot_aggregate" measurement entry is the delimiter between soft
boots.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Dmitry Kasatkin <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agoima: on soft reboot, restore the measurement list
Mimi Zohar [Tue, 20 Dec 2016 00:22:35 +0000 (16:22 -0800)]
ima: on soft reboot, restore the measurement list

The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot.  This
patch restores the measurement list.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Dmitry Kasatkin <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agopowerpc: ima: get the kexec buffer passed by the previous kernel
Thiago Jung Bauermann [Tue, 20 Dec 2016 00:22:32 +0000 (16:22 -0800)]
powerpc: ima: get the kexec buffer passed by the previous kernel

Patch series "ima: carry the measurement list across kexec", v8.

The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
list of the running kernel must be saved and then restored on the
subsequent boot, possibly of a different architecture.

The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list.  This patch
set serializes the measurement list in this format and restores it.

Up to now, the binary_runtime_measurements was defined as architecture
native format.  The assumption being that userspace could and would
handle any architecture conversions.  With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash.  To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.

The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg.  TPM 2.0
support for larger digests).

A simplified method of Thiago Bauermann's "kexec buffer handover" patch
series for carrying the IMA measurement list across kexec is included in
this patch set.  The simplified method requires all file measurements be
taken prior to executing the kexec load, as subsequent measurements will
not be carried across the kexec and restored.

This patch (of 10):

The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.
The second kernel can check whether the previous kernel sent the buffer
and retrieve it.

This is the architecture-specific part which enables IMA to receive the
measurement list passed by the previous kernel.  It will be used in the
next patch.

The change in machine_kexec_64.c is to factor out the logic of removing
an FDT memory reservation so that it can be used by remove_ima_buffer.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Andreas Steffen <[email protected]>
Cc: Dmitry Kasatkin <[email protected]>
Cc: Josh Sklar <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stewart Smith <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
8 years agoipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data...
zheng li [Mon, 12 Dec 2016 01:56:05 +0000 (09:56 +0800)]
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.

That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.

Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.

Signed-off-by: Zheng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoath10k: free host-mem with DMA_BIRECTIONAL flag
Ben Greear [Thu, 15 Dec 2016 09:23:19 +0000 (11:23 +0200)]
ath10k: free host-mem with DMA_BIRECTIONAL flag

Hopefully this fixes the problem reported by Kalle:

Noticed this in my log, but I don't have time to investigate this in
detail right now:

[  413.795346] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  414.158755] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[  477.439659] ath10k_pci 0000:02:00.0: could not get mac80211 beacon
[  481.666630] ------------[ cut here ]------------
[  481.666669] WARNING: CPU: 0 PID: 1978 at lib/dma-debug.c:1155 check_unmap+0x320/0x8e0
[  481.666688] ath10k_pci 0000:02:00.0: DMA-API: device driver frees DMA memory with different direction [device address=0x000000002d130000] [size=63800 bytes] [mapped with DMA_BIDIRECTIONAL] [unmapped with DMA_TO_DEVICE]
[  481.666703] Modules linked in: ctr ccm ath10k_pci(E-) ath10k_core(E) ath(E) mac80211(E) cfg80211(E) snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi arc4 snd_rawmidi snd_seq_midi_event snd_seq btusb btintel snd_seq_device joydev coret
[  481.671468] CPU: 0 PID: 1978 Comm: rmmod Tainted: G            E   4.9.0-rc7-wt+ #54
[  481.671478] Hardware name: Hewlett-Packard HP ProBook 6540b/1722, BIOS 68CDD Ver. F.04 01/27/2010
[  481.671489]  ef49dcec c842ee92 c8b5830e ef49dd34 ef49dd20 c80850f5 c8b5a13c ef49dd50
[  481.671560]  000007ba c8b5830e 00000483 c8461830 c8461830 00000483 ef49ddcc f34e64b8
[  481.671641]  c8b58360 ef49dd3c c80851bb 00000009 00000000 ef49dd34 c8b5a13c ef49dd50
[  481.671716] Call Trace:
[  481.671731]  [<c842ee92>] dump_stack+0x76/0xb4
[  481.671745]  [<c80850f5>] __warn+0xe5/0x100
[  481.671757]  [<c8461830>] ? check_unmap+0x320/0x8e0
[  481.671769]  [<c8461830>] ? check_unmap+0x320/0x8e0
[  481.671780]  [<c80851bb>] warn_slowpath_fmt+0x3b/0x40
[  481.671791]  [<c8461830>] check_unmap+0x320/0x8e0
[  481.671804]  [<c8462054>] debug_dma_unmap_page+0x84/0xa0
[  481.671835]  [<f937cd7a>] ath10k_wmi_free_host_mem+0x9a/0xe0 [ath10k_core]
[  481.671861]  [<f9363400>] ath10k_core_destroy+0x50/0x60 [ath10k_core]
[  481.671875]  [<f8e13969>] ath10k_pci_remove+0x79/0xa0 [ath10k_pci]
[  481.671889]  [<c848d8d8>] pci_device_remove+0x38/0xb0
[  481.671901]  [<c859fe4b>] __device_release_driver+0x7b/0x110
[  481.671913]  [<c85a00e7>] driver_detach+0x97/0xa0
[  481.671923]  [<c859ef8b>] bus_remove_driver+0x4b/0xb0
[  481.671934]  [<c85a0cda>] driver_unregister+0x2a/0x60
[  481.671949]  [<c848c888>] pci_unregister_driver+0x18/0x70
[  481.671965]  [<f8e14dae>] ath10k_pci_exit+0xd/0x25f [ath10k_pci]
[  481.671979]  [<c812bb84>] SyS_delete_module+0xf4/0x180
[  481.671995]  [<c81f801b>] ? __might_fault+0x8b/0xa0
[  481.672009]  [<c80037d0>] do_fast_syscall_32+0xa0/0x1e0
[  481.672025]  [<c88d4c88>] sysenter_past_esp+0x45/0x74
[  481.672037] ---[ end trace 3fd23759e17e1622 ]---
[  481.672049] Mapped at:
[  481.672060]  [  481.672072] [<c846062c>] debug_dma_map_page.part.25+0x1c/0xf0
[  481.672083]  [  481.672095] [<c8460799>] debug_dma_map_page+0x99/0xc0
[  481.672106]  [  481.672132] [<f93745ec>] ath10k_wmi_alloc_chunk+0x12c/0x1f0 [ath10k_core]
[  481.672142]  [  481.672168] [<f937d0c4>] ath10k_wmi_event_service_ready_work+0x304/0x540 [ath10k_core]
[  481.672178]  [  481.672190] [<c80a3643>] process_one_work+0x1c3/0x670
[  482.137134] ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[  482.313144] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:02:00.0.bin failed with error -2
[  482.313274] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/cal-pci-0000:02:00.0.bin failed with error -2
[  482.313768] ath10k_pci 0000:02:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[  482.313777] ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 0 testmode 1
[  482.313974] ath10k_pci 0000:02:00.0: firmware ver 10.2.4.70.59-2 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 4159f498
[  482.369858] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/board-2.bin failed with error -2
[  482.370011] ath10k_pci 0000:02:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[  483.596770] ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
[  483.701686] ath: EEPROM regdomain: 0x0
[  483.701706] ath: EEPROM indicates default country code should be used
[  483.701713] ath: doing EEPROM country->regdmn map search
[  483.701721] ath: country maps to regdmn code: 0x3a
[  483.701730] ath: Country alpha2 being used: US
[  483.701737] ath: Regpair used: 0x3a

Reported-by: Kalle Valo <[email protected]>
Signed-off-by: Ben Greear <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
8 years agosamples/bpf: Move open_raw_sock to separate header
Joe Stringer [Fri, 9 Dec 2016 02:46:20 +0000 (18:46 -0800)]
samples/bpf: Move open_raw_sock to separate header

This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.

Signed-off-by: Joe Stringer <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agosamples/bpf: Remove perf_event_open() declaration
Joe Stringer [Fri, 9 Dec 2016 02:46:19 +0000 (18:46 -0800)]
samples/bpf: Remove perf_event_open() declaration

This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.

Committer notes:

Testing it:

  $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
  make[1]: Entering directory '/home/build/v4.9.0-rc8+'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /home/acme/git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    CHK     include/generated/timeconst.h
    CHK     include/generated/bounds.h
    CHK     include/generated/asm-offsets.h
    CALL    /home/acme/git/linux/scripts/checksyscalls.sh
    HOSTCC  samples/bpf/test_verifier.o
    HOSTCC  samples/bpf/libbpf.o
    HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
    HOSTCC  samples/bpf/test_maps.o
    HOSTCC  samples/bpf/sock_example.o
    HOSTCC  samples/bpf/bpf_load.o
<SNIP>
    HOSTLD  samples/bpf/trace_event
    HOSTLD  samples/bpf/sampleip
    HOSTLD  samples/bpf/tc_l2_redirect
  make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
  $

Also tested the offwaketime resulting from the rebuild, seems to work as
before.

Signed-off-by: Joe Stringer <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agosamples/bpf: Be consistent with bpf_load_program bpf_insn parameter
Arnaldo Carvalho de Melo [Tue, 20 Dec 2016 14:03:13 +0000 (11:03 -0300)]
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter

Only one of the examples declare the bpf_insn bpf proggie as a const:

  $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
  samples/bpf/fds_example.c: static const struct bpf_insn insns[] = {
  samples/bpf/sock_example.c: struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach2.c: struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach.c: struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_sock.c: struct bpf_insn prog[] = {
  $

Which causes this warning:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  <SNIP>
     HOSTCC  samples/bpf/fds_example.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex1_user.o

So just ditch that 'const' to reduce build noise, leaving changing the
bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
adequate.

Cc: Joe Stringer <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agotools lib bpf: Add bpf_prog_{attach,detach}
Joe Stringer [Wed, 14 Dec 2016 22:05:26 +0000 (14:05 -0800)]
tools lib bpf: Add bpf_prog_{attach,detach}

Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.

Committer notes:

Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.

Signed-off-by: Joe Stringer <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agosamples/bpf: Switch over to libbpf
Joe Stringer [Wed, 14 Dec 2016 22:43:39 +0000 (14:43 -0800)]
samples/bpf: Switch over to libbpf

Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  <SNIP>
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    HOSTCC  scripts/basic/fixdep
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    HOSTCC  scripts/basic/bin2c
    HOSTCC  arch/x86/tools/relocs_32.o
    HOSTCC  arch/x86/tools/relocs_64.o
    LD      samples/bpf/built-in.o
  <SNIP>
    HOSTCC  samples/bpf/fds_example.o
    HOSTCC  samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex2_user.o
  <SNIP>
    HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
  -Wno-compare-distinct-pointer-types \
  -Wno-gnu-variable-sized-type-not-at-end \
  -Wno-address-of-packed-member -Wno-tautological-compare \
  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD  samples/bpf/tc_l2_redirect
  <SNIP>
    HOSTLD  samples/bpf/lwt_len_hist
    HOSTLD  samples/bpf/xdp_tx_iptunnel
  make[1]: Leaving directory '/tmp/build/linux'
  [root@f5065a7d6272 linux]#

And then, in the host:

  [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
  /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
  [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
  [root@jouet bpf]# file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
  [root@jouet bpf]# readelf -SW offwaketime
  offwaketime         offwaketime_kern.o  offwaketime_user.o
  [root@jouet bpf]# readelf -SW offwaketime_kern.o
  There are 11 section headers, starting at offset 0x700:

  Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
    [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
    [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
    [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
    [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
    [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
    [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
    [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
    [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
    [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
    [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
    [root@jouet bpf]# ./offwaketime | head -3
  qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
  swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
  [root@jouet bpf]#

Signed-off-by: Joe Stringer <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
Link: http://lkml.kernel.org/n/[email protected]
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agoperf diff: Do not overwrite valid build id
Kan Liang [Tue, 13 Dec 2016 15:29:44 +0000 (10:29 -0500)]
perf diff: Do not overwrite valid build id

Fixes a perf diff regression issue which was introduced by commit
5baecbcd9c9a ("perf symbols: we can now read separate debug-info files
based on a build ID")

The binary name could be same when perf diff different binaries. Build
id is used to distinguish between them.
However, the previous patch assumes the same binary name has same build
id. So it overwrites the build id according to the binary name,
regardless of whether the build id is set or not.

Check the has_build_id in dso__load. If the build id is already set, use
it.

Before the fix:

  $ perf diff 1.perf.data 2.perf.data
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object     Symbol
  # ........  .......  ................  .............................
  #
    99.83%  -99.80%  tchain_edit       [.] f2
     0.12%  +99.81%  tchain_edit       [.] f3
     0.02%   -0.01%  [ixgbe]           [k] ixgbe_read_reg

  After the fix:
  $ perf diff 1.perf.data 2.perf.data
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object     Symbol
  # ........  .......  ................  .............................
  #
    99.83%   +0.10%  tchain_edit       [.] f3
     0.12%   -0.08%  tchain_edit       [.] f2

Signed-off-by: Kan Liang <[email protected]>
Cc: Andi Kleen <[email protected]>
CC: Dima Kogan <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Fixes: 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files based on a build ID")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agoperf annotate: Don't throw error for zero length symbols
Ravi Bangoria [Tue, 22 Nov 2016 08:40:50 +0000 (14:10 +0530)]
perf annotate: Don't throw error for zero length symbols

'perf report --tui' exits with error when it finds a sample of zero
length symbol (i.e. addr == sym->start == sym->end). Actually these are
valid samples. Don't exit TUI and show report with such symbols.

Reported-and-Tested-by: Anton Blanchard <[email protected]>
Link: https://lkml.org/lkml/2016/10/8/189
Signed-off-by: Ravi Bangoria <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Riyder <[email protected]>
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected] # v4.9+
Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agodrm/i915: skip the first 4k of stolen memory on everything >= gen8
Paulo Zanoni [Wed, 14 Dec 2016 14:55:37 +0000 (12:55 -0200)]
drm/i915: skip the first 4k of stolen memory on everything >= gen8

BSpec got updated and this workaround is now listed as standard
required programming for all subsequent projects. This is confirmed to
fix Skylake screen flickering issues (probably caused by the fact that
we initialized a ring in the first page of stolen, but I didn't 100%
confirm this theory).

v2: this is the patch that fixes the screen flickering, document it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94605
Cc: [email protected]
Tested-by: Dominik Klementowski <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
Acked-by: Chris Wilson <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d43537610470d8829ebd17cd7842f47176e35ebd)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
Chris Wilson [Mon, 19 Dec 2016 12:43:45 +0000 (12:43 +0000)]
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping

If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

 i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Fixes: 871dfbd67d4e ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Imre Deak <[email protected]>
Cc: <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Daniel Vetter <[email protected]>
(cherry picked from commit d766ef53006c2c38a7fe2bef0904105a793383f2)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: Fix use after free in logical_render_ring_init
Tvrtko Ursulin [Fri, 16 Dec 2016 13:18:42 +0000 (13:18 +0000)]
drm/i915: Fix use after free in logical_render_ring_init

Commit 3b3f1650b1ca ("drm/i915: Allocate intel_engine_cs
structure only for the enabled engines") introduced the
dynanically allocated engine instances and created an
potential use after free scenario in logical_render_ring_init
where lrc_destroy_wa_ctx_obj could be called after the engine
instance has been freed.

This can only happen during engine setup/init error handling
which luckily does not happen ever in practice.

Fix is to not call lrc_destroy_wa_ctx_obj since it would have
already been executed from the preceding engine cleanup.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Fixes: 3b3f1650b1ca ("drm/i915: Allocate intel_engine_cs structure only for the enabled engines")
Cc: Chris Wilson <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jani Nikula <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d038fc7e4fff14d6b026130007faef35cbf5e956)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: disable PSR by default on HSW/BDW
Paulo Zanoni [Tue, 13 Dec 2016 20:57:44 +0000 (18:57 -0200)]
drm/i915: disable PSR by default on HSW/BDW

We've been ignoring the poor bugzilla reporters that say PSR causes
system lockups and all other sorts of problems. The earliest bug
report is from April, so I think we can use the "revert the offending
commit if no fixes are presented within 8 months" rule here.

Fixes: 9b58e352b463 ("drm/i915: Enable PSR by default on Haswell and Broadwell.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97602
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97515
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96736
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96704
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96569
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95176
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94985
Cc: <[email protected]> # v4.6+
Cc: Rodrigo Vivi <[email protected]>
Cc: Jim Bride <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2ee7dc497e348eecbb82adbb1ea9e9a7e29fe921)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: Fix setting of boost freq tunable
Mika Kuoppala [Wed, 14 Dec 2016 12:26:20 +0000 (14:26 +0200)]
drm/i915: Fix setting of boost freq tunable

For limiting the max frequency of gpu, the max freq tunable
is not enough to hard limit the max gap. We now have also per
client boost max freq. When this tunable was introduced,
it was mistakenly made read only. Allow user to gain control by
setting it writable.

Fixes: 29ecd78d3b79 ("drm/i915: Define a separate variable and control for RPS waitboost frequency")
Cc: <[email protected]> # v4.9+
Cc: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jani Nikula <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 73a798711314b54cbd4fe224e24db92c306a8d8c)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: tune down the fast link training vs boot fail
Daniel Vetter [Tue, 13 Dec 2016 19:54:14 +0000 (20:54 +0100)]
drm/i915: tune down the fast link training vs boot fail

It's been unfixed since a while and no one is immediately working on
this. And we have the FIXME already. And now also a task in the DP
team's backlog.

Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: Ville Syrjälä <[email protected]>
References: https://lists.freedesktop.org/archives/intel-gfx/2016-July/101951.html
Acked-by: Ville Syrjälä <[email protected]>
[danvet: Adjust comment per Ville's feedback.]
Signed-off-by: Daniel Vetter <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2dd85aeb5bc99e3763dd192cdb95ff405a102c8a)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: Reorder phys backing storage release
Chris Wilson [Wed, 7 Dec 2016 13:34:11 +0000 (13:34 +0000)]
drm/i915: Reorder phys backing storage release

In commit a4f5ea64f0a8 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <[email protected]>
Fixes: a4f5ea64f0a8 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Reviewed-by: Joonas Lahtinen <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit dbb4351bab0a8440f6b02895c142bce6c30b7097)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915/gen9: Fix PCODE polling during SAGV disabling
Imre Deak [Mon, 5 Dec 2016 16:27:38 +0000 (18:27 +0200)]
drm/i915/gen9: Fix PCODE polling during SAGV disabling

According to the previous patch, it's possible atm that we call
intel_do_sagv_disable() only once during the 1ms period and time out if
that call fails. As opposed to this the spec says that we need to keep
retrying this request for a 1ms duration, so let's do this similarly to
the CDCLK change notification request.

v4-5:
- Rebased on the reply_mask, reply change.
v6:
- Remove w/s change. (Lyude)
- Rebased on the timeout_base argument change.

Cc: Lyude <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Chris Wilson <[email protected]>
Fixes: 656d1b89e5ff ("drm/i915/skl: Add support for the SAGV, fix underrun hangs")
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: Lyude <[email protected]> (v4)
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit b3b8e99984a4eace91bc097e8f8cec71441cae16)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915/gen9: Fix PCODE polling during CDCLK change notification
Imre Deak [Mon, 5 Dec 2016 16:27:37 +0000 (18:27 +0200)]
drm/i915/gen9: Fix PCODE polling during CDCLK change notification

commit 848496e5902833600f7992f4faa82dc1546051ba
Author: Ville Syrjälä <[email protected]>
Date:   Wed Jul 13 16:32:03 2016 +0300

    drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL

increased the timeout to match the spec, but we still see a timeout on
at least one SKL. A CDCLK change request following the failed one will
succeed nevertheless.

I could reproduce this problem easily by running kms_pipe_crc_basic in a
loop. In all failure cases _wait_for() was pre-empted for >3ms and so in
the worst case - when the pre-emption happened right after calculating
timeout__ in _wait_for() - we called skl_cdclk_wait_for_pcu_ready() only
once which failed and so _wait_for() timed out. As opposed to this the
spec says to keep retrying the request for at most a 3ms period.

To fix this send the first request explicitly to guarantee that there is
3ms between the first and last request. Though this matches the spec, I
noticed that in rare cases this can still time out if we sent only a few
requests (in the worst case 2) _and_ PCODE is busy for some reason even
after a previous request and a 3ms delay. To work around this retry the
polling with pre-emption disabled to maximize the number of requests.
Also increase the timeout to 10ms to account for interrupts that could
reduce the number of requests. With this change I couldn't trigger
the problem.

v2:
- Use 1ms poll period instead of 10us. (Chris)
v3:
- Poll with pre-emption disabled to increase the number of request
  attempts. (Ville, Chris)
- Factor out a helper to poll, it's also needed by the next patch.
v4:
- Pass reply_mask, reply to skl_pcode_request(), instead of assuming the
  reply is generic. (Ville)
v5:
- List the request specific timeout values as code comment. (Ville)
v6:
- Try the poll first with preemption enabled.
- Add code comment about first request being queued by PCODE. (Art)
- Add timeout_base_ms argument. (Ville)
v7:
- Clarify code comment about first queued request. (Chris)

Cc: Ville Syrjälä <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Art Runyan <[email protected]>
Cc: <[email protected]> # v4.2- : 3b2c171 : drm/i915: Wait up to 3ms
Cc: <[email protected]> # v4.2-
Fixes: 5d96d8afcfbb ("drm/i915/skl: Deinit/init the display at suspend/resume")
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=97929
Testcase: igt/kms_pipe_crc_basic/suspend-read-crc-pipe-B
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a0b8a1fe34430c3a82258e8cb45f5968bdf31afd)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
Hans de Goede [Thu, 1 Dec 2016 20:29:09 +0000 (21:29 +0100)]
drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting

Set the CHV_GPIO_GPIOEN bit when updating GPIOs from chv_exec_gpio.

Fixes: a0a6d4ffd2ad ("drm/i915/dsi: add support for gpio elements on CHV")
Cc: [email protected]
Cc: Jani Nikula <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Ville Syrjälä <[email protected]>
(cherry picked from commit b2b45fcd921e864a5e9bbc7aa55dee96d5e11c06)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
Hans de Goede [Fri, 2 Dec 2016 15:01:28 +0000 (16:01 +0100)]
drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET

Looking at the ADF code from the Android kernel sources for a
cherrytrail tablet I noticed that it is calling the
MIPI_SEQ_ASSERT_RESET sequence from the panel prepare hook.

Until commit b1cb1bd29189 ("drm/i915/dsi: update reset and power sequences
in panel prepare/unprepare hooks") the mainline i915 code was doing the
same. That commits effectively swaps the calling of MIPI_SEQ_ASSERT_RESET /
MIPI_SEQ_DEASSERT_RESET.

Looking at the naming of the sequences that is the right thing to do,
but the problem is, that the old mainline code and the ADF code was
actually calling the right sequence (tested on a cube iwork8 air tablet),
and the swapping of the calling breaks things.

This breakage was likely not noticed in testing because on cherrytrail,
currently chv_exec_gpio ends up disabling the gpio pins rather then
setting them (this is fixed in the next patch in this patch-set).

This commit fixes the swapping by fixing MIPI_SEQ_ASSERT/DEASSERT_RESET's
places in the enum defining them, so that their (new) names match their
actual use.

Changes in v2:
-Add a comment to the enum explaining that the assert/reassert names
 are swapped in the spec

Fixes: b1cb1bd29189 ("drm/i915/dsi: update reset and power sequences...")
Cc: Jani Nikula <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Ville Syrjälä <[email protected]>
(cherry picked from commit 2b8208ac93be2783edc627fc02d9ca50cc479923)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
Hans de Goede [Fri, 2 Dec 2016 14:29:04 +0000 (15:29 +0100)]
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating

On my Cherrytrail CUBE iwork8 Air tablet PIPE-A would get stuck on loading
i915 at boot 1 out of every 3 boots, resulting in a non functional LCD.
Once the i915 driver has successfully loaded, the panel can be disabled /
enabled without hitting this issue.

The getting stuck is caused by vlv_init_display_clock_gating() clearing
the DPOUNIT_CLOCK_GATE_DISABLE bit in DSPCLK_GATE_D when called from
chv_pipe_power_well_ops.enable() on driver load, while a pipe is enabled
driving the DSI LCD by the BIOS.

Clearing this bit while DSI is in use is a known issue and
intel_dsi_pre_enable() / intel_dsi_post_disable() already set / clear it
as appropriate.

This commit modifies vlv_init_display_clock_gating() to leave the
DPOUNIT_CLOCK_GATE_DISABLE bit alone fixing the pipe getting stuck.

Changes in v2:
-Replace PIPE-A with "a pipe" or "the pipe" in the commit msg and
comment

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97330
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Ville Syrjälä <[email protected]>
(cherry picked from commit 721d484563e1a51ada760089c490cbc47e909756)
Signed-off-by: Jani Nikula <[email protected]>
8 years agodrm/i915: drop the struct_mutex when wedged or trying to reset
Matthew Auld [Mon, 28 Nov 2016 10:36:48 +0000 (10:36 +0000)]
drm/i915: drop the struct_mutex when wedged or trying to reset

We grab the struct_mutex in intel_crtc_page_flip, but if we are wedged
or a reset is in progress we bail early but never seem to actually
release the lock.

Fixes: 7f1847ebf48b ("drm/i915: Simplify checking of GPU reset_counter in display pageflips")
Cc: Chris Wilson <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Joonas Lahtinen <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Cc: <[email protected]> # v4.7+
(cherry picked from commit ddbb271aea87fc6004d3c8bcdb0710e980c7ec85)
Signed-off-by: Jani Nikula <[email protected]>
(cherry picked from commit e411072d5740a49cdc9d0713798c30440757e451)
Signed-off-by: Jani Nikula <[email protected]>
8 years agobrcmfmac: fix uninitialized field in scheduled scan ssid configuration
Arend Van Spriel [Fri, 9 Dec 2016 11:34:14 +0000 (11:34 +0000)]
brcmfmac: fix uninitialized field in scheduled scan ssid configuration

The scheduled scan ssid configuration in firmware has a flags field that
was not initialized resulting in unexpected behaviour.

Fixes: e3bdb7cc0300 ("brcmfmac: fix handling ssids in .sched_scan_start() callback")
Reviewed-by: Hante Meuleman <[email protected]>
Reviewed-by: Pieter-Paul Giesberts <[email protected]>
Reviewed-by: Franky Lin <[email protected]>
Signed-off-by: Arend van Spriel <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
8 years agobrcmfmac: fix memory leak in brcmf_cfg80211_attach()
Arend Van Spriel [Fri, 9 Dec 2016 11:34:13 +0000 (11:34 +0000)]
brcmfmac: fix memory leak in brcmf_cfg80211_attach()

In brcmf_cfg80211_attach() there was one error path not properly
handled as it leaked memory allocated in brcmf_btcoex_attach().

Reviewed-by: Hante Meuleman <[email protected]>
Reviewed-by: Pieter-Paul Giesberts <[email protected]>
Reviewed-by: Franky Lin <[email protected]>
Signed-off-by: Arend van Spriel <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
8 years agoperf bench futex: Fix lock-pi help string
Davidlohr Bueso [Thu, 15 Dec 2016 19:36:24 +0000 (11:36 -0800)]
perf bench futex: Fix lock-pi help string

Obvious copy/paste typo from the requeue program.

Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agoperf trace: Check if MAP_32BIT is defined (again)
Jiri Olsa [Thu, 15 Dec 2016 19:56:54 +0000 (20:56 +0100)]
perf trace: Check if MAP_32BIT is defined (again)

There might be systems where MAP_32BIT is not defined, like some some
RHEL7 powerpc versions.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 256763b01741 ("perf trace beauty mmap: Add more conditional defines")
Link: http://lkml.kernel.org/r/[email protected]
[ Changed the Fixme cset to the one removing the conditional switch case for MAP_32BIT ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agosamples/bpf: Make perf_event_read() static
Arnaldo Carvalho de Melo [Thu, 15 Dec 2016 15:29:27 +0000 (12:29 -0300)]
samples/bpf: Make perf_event_read() static

While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ I noticed
some warnings building samples/bpf/ on a Fedora Rawhide container, with
clang/llvm 3.9 I noticed this:

  [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
  <SNIP>
    HOSTCC  samples/bpf/trace_output_user.o
  /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
  prototype for 'perf_event_read' [-Wmissing-prototypes]
   void perf_event_read(print_fn fn)
        ^~~~~~~~~~~~~~~
    HOSTLD  samples/bpf/trace_output
  make[1]: Leaving directory '/tmp/build/linux'

Shut up the compiler by making that function static.

Acked-by: Daniel Borkmann <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Joe Stringer <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
8 years agommc: core: Further fix thread wake-up
Adrian Hunter [Mon, 19 Dec 2016 13:57:34 +0000 (15:57 +0200)]
mmc: core: Further fix thread wake-up

Commit e0097cf5f2f1 ("mmc: queue: Fix queue thread wake-up") did not go far
enough. mmc_wait_for_data_req_done() still contains some problems and can
be further simplified.  First it should not touch
context_info->is_waiting_last_req because that is a wake-up control used by
the owner of the context. Secondly, it should always return when one of its
wake-up conditions is met because, again, that is contolled by the owner of
the context.

While the current block driver does not have an issue, these problems were
exposed during testing of the Software Command Queue patches.

Fixes: e0097cf5f2f1 ("mmc: queue: Fix queue thread wake-up")
Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Harjani Ritesh <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
8 years agommc: sdhci: Fix to handle MMC_POWER_UNDEFINED
Adrian Hunter [Mon, 19 Dec 2016 13:33:11 +0000 (15:33 +0200)]
mmc: sdhci: Fix to handle MMC_POWER_UNDEFINED

Since commit c2c24819b280 ("mmc: core: Don't power off the card when
starting the host"), the power state can still be MMC_POWER_UNDEFINED after
mmc_start_host() is called. That can trigger a warning in SDHCI during
runtime resume as it tries to restore the I/O state. Handle
MMC_POWER_UNDEFINED simply by not updating the I/O state in that case.

Fixes: c2c24819b280 ("mmc: core: Don't power off the card when starting the host")
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
8 years agommc: sdhci-cadence: add Socionext UniPhier specific compatible string
Masahiro Yamada [Wed, 14 Dec 2016 02:10:46 +0000 (11:10 +0900)]
mmc: sdhci-cadence: add Socionext UniPhier specific compatible string

Add a Socionext SoC specific compatible (suggested by Rob Herring).

No SoC specific data are associated with the compatible strings for
now, but other SoC vendors may use this IP and want to differentiate
IP variants in the future.

Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
8 years agox86/platform/intel/quark: Add printf attribute to imr_self_test_result()
Nicolas Iooss [Mon, 19 Dec 2016 13:21:44 +0000 (14:21 +0100)]
x86/platform/intel/quark: Add printf attribute to imr_self_test_result()

__printf() attributes help detecting issues in printf() format strings at
compile time.

Even though imr_selftest.c is only compiled with
CONFIG_DEBUG_IMR_SELFTEST=y, GCC complains about a missing format
attribute when compiling allmodconfig with -Wmissing-format-attribute.

Silence this warning by adding the attribute.

Signed-off-by: Nicolas Iooss <[email protected]>
Acked-by: Bryan O'Donoghue <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
8 years agox86/platform/intel-mid: Switch MPU3050 driver to IIO
Linus Walleij [Wed, 14 Dec 2016 13:39:54 +0000 (14:39 +0100)]
x86/platform/intel-mid: Switch MPU3050 driver to IIO

The Intel Mid goes in and creates a I2C device for the
MPU3050 if the input driver for MPU-3050 is activated.

As of commit:

  3904b28efb2c ("iio: gyro: Add driver for the MPU-3050 gyroscope")

.. there is a proper and fully featured IIO driver for this
device, so deprecate the use of the incomplete input driver
by augmenting the device population code to react to the
presence of the IIO driver's Kconfig symbol instead.

Signed-off-by: Linus Walleij <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
8 years agox86/alternatives: Do not use sync_core() to serialize I$
Borislav Petkov [Sat, 3 Dec 2016 15:02:58 +0000 (16:02 +0100)]
x86/alternatives: Do not use sync_core() to serialize I$

We use sync_core() in the alternatives code to stop speculative
execution of prefetched instructions because we are potentially changing
them and don't want to execute stale bytes.

What it does on most machines is call CPUID which is a serializing
instruction. And that's expensive.

However, the instruction cache is serialized when we're on the local CPU
and are changing the data through the same virtual address. So then, we
don't need the serializing CPUID but a simple control flow change. Last
being accomplished with a CALL/RET which the noinline causes.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: Matthew Whitehead <[email protected]>
Cc: One Thousand Gnomes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
8 years agox86/topology: Document cpu_llc_id
Borislav Petkov [Thu, 17 Nov 2016 09:45:57 +0000 (10:45 +0100)]
x86/topology: Document cpu_llc_id

It means different things on Intel and AMD so write it down so that
there's no confusion.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
8 years agox86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
Vitaly Kuznetsov [Fri, 2 Dec 2016 10:07:20 +0000 (11:07 +0100)]
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic

There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects NMI to the guest. We may want to crash the guest and do kdump
on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
allow the kdump kernel to re-establish VMBus connection so it will see
VMBus devices (storage, network,..).

To properly unload VMBus making it possible to start over during kdump we
need to do the following:

 - Send an 'unload' message to the hypervisor. This can be done on any CPU
   so we do this the crashing CPU.

 - Receive the 'unload finished' reply message. WS2012R2 delivers this
   message to the CPU which was used to establish VMBus connection during
   module load and this CPU may differ from the CPU sending 'unload'.

Receiving a VMBus message means the following:

 - There is a per-CPU slot in memory for one message. This slot can in
   theory be accessed by any CPU.

 - We get an interrupt on the CPU when a message was placed into the slot.

 - When we read the message we need to clear the slot and signal the fact
   to the hypervisor. In case there are more messages to this CPU pending
   the hypervisor will deliver the next message. The signaling is done by
   writing to an MSR so this can only be done on the appropriate CPU.

To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
function which checks message slots for all CPUs in a loop waiting for the
'unload finished' messages. However, there is an issue which arises when
these conditions are met:

 - We're crashing on a CPU which is different from the one which was used
   to initially contact the hypervisor.

 - The CPU which was used for the initial contact is blocked with interrupts
   disabled and there is a message pending in the message slot.

In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.

The suggested solution is to handle unknown NMIs for Hyper-V guests on the
first CPU which gets them only. This will allow us to rely on VMBus
interrupt handler being able to receive the 'unload finish' message in
case it is delivered to a different CPU.

The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
boot CPU only, WS2012R2 and earlier Hyper-V versions are affected.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Acked-by: K. Y. Srinivasan <[email protected]>
Cc: [email protected]
Cc: Haiyang Zhang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
8 years agoNFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES
Trond Myklebust [Mon, 19 Dec 2016 15:23:10 +0000 (10:23 -0500)]
NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES

If our DELEGRETURN RPC call is rejected with an EACCES call, then we should
remove the GETATTR call from the compound RPC and retry.
This could potentially happen when there is a conflict between an
ACL denying attribute reads and our use of SP4_MACH_CRED.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES
Trond Myklebust [Mon, 19 Dec 2016 15:34:14 +0000 (10:34 -0500)]
NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES

If our CLOSE RPC call is rejected with an EACCES call, then we should
remove the GETATTR call from the compound RPC and retry.
This could potentially happen when there is a conflict between an
ACL denying attribute reads and our use of SP4_MACH_CRED.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4: Place the GETATTR operation before the CLOSE
Trond Myklebust [Mon, 19 Dec 2016 17:14:44 +0000 (12:14 -0500)]
NFSv4: Place the GETATTR operation before the CLOSE

In order to benefit from the DENY share lock protection, we should
put the GETATTR operation before the CLOSE. Otherwise, we might race
with a Windows machine that thinks it is now safe to modify the file.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4: Also ask for attributes when downgrading to a READ-only state
Trond Myklebust [Mon, 19 Dec 2016 16:36:41 +0000 (11:36 -0500)]
NFSv4: Also ask for attributes when downgrading to a READ-only state

If we're downgrading from a READ+WRITE mode to a READ-only mode, then
ask for cache consistency attributes so that we avoid the revalidation
in nfs_close_context()

Fixes: 3947b74d0f9d ("NFSv4: Don't request a GETATTR on open_downgrade.")
Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()
Trond Myklebust [Mon, 19 Dec 2016 14:47:32 +0000 (09:47 -0500)]
NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()

The NFS_INO_REVAL_FORCED flag now really only has meaning for the
case when we've just been handed a delegation for a file that was already
cached, and we're unsure about that cache.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agopNFS: Return RW layouts on OPEN_DOWNGRADE
Trond Myklebust [Mon, 21 Nov 2016 15:56:38 +0000 (10:56 -0500)]
pNFS: Return RW layouts on OPEN_DOWNGRADE

If the client holds no more writeable open state, and does not hold a
write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE.

We do this only for writes, since some layout drivers may require you to
also hold a read layout if you are doing a R/W workload.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE
Trond Myklebust [Sun, 20 Nov 2016 18:34:16 +0000 (13:34 -0500)]
NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE

While we do not need to return the RW layout when downgrading from a
read/write open state to read-only, we might want to do so in order
to reduce the burden on the metadataserver so that it does not need
to check for changed data when responding to GETATTR requests.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
NeilBrown [Mon, 19 Dec 2016 00:48:23 +0000 (11:48 +1100)]
NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up.  nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree.  As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing.  The request is then retried, with a
predicatable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned.  For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner.  With this the indefinite retries no longer happen.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4: ensure __nfs4_find_lock_state returns consistent result.
NeilBrown [Mon, 19 Dec 2016 00:33:13 +0000 (11:33 +1100)]
NFSv4: ensure __nfs4_find_lock_state returns consistent result.

If a file has both flock locks and OFD locks, then it is possible that
two different nfs4 lock states could apply to file accesses from a
single process.

It is not possible to know, efficiently, which one is "correct".
Presumably the state which represents a lock that covers the region
undergoing IO would be the "correct" one to use, but finding that has
a non-trivial cost and would provide miniscule value.

Currently we just return whichever is first in the list, which could
result in inconsistent behaviour if an application ever put it self in
this position.  As consistent behaviour is preferable (when perfectly
correct behaviour is not available), change the search to return a
consistent result in this circumstance.
Specifically: if there is both a flock and OFD lock state, always return
the flock one.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
NeilBrown [Mon, 19 Dec 2016 00:19:31 +0000 (11:19 +1100)]
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessasrily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <[email protected]>
Acked-by: Olga Kornievskaia <[email protected]>
Acked-by: Adamson, Andy <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
8 years agopNFS/flexfiles: delete deviceid, don't mark inactive
Weston Andros Adamson [Wed, 14 Dec 2016 21:31:55 +0000 (16:31 -0500)]
pNFS/flexfiles: delete deviceid, don't mark inactive

Instead of marking a device inactive, remove it from the cache entirely.

Flexfiles has a way to report errors back to the server, so we don't want
to stop devices from being tried again for 120 seconds.

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Clean up nfs_attribute_timeout()
Trond Myklebust [Fri, 16 Dec 2016 23:51:15 +0000 (18:51 -0500)]
NFS: Clean up nfs_attribute_timeout()

It can be made static.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Remove unused function nfs_revalidate_inode_rcu()
Trond Myklebust [Fri, 16 Dec 2016 23:49:38 +0000 (18:49 -0500)]
NFS: Remove unused function nfs_revalidate_inode_rcu()

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Fix and clean up the access cache validity checking
Trond Myklebust [Fri, 16 Dec 2016 23:40:03 +0000 (18:40 -0500)]
NFS: Fix and clean up the access cache validity checking

The access cache needs to check whether or not the mode bits, ownership,
or ACL has changed or the cache has timed out.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Only look at the change attribute cache state in nfs_weak_revalidate()
Trond Myklebust [Fri, 16 Dec 2016 23:04:47 +0000 (18:04 -0500)]
NFS: Only look at the change attribute cache state in nfs_weak_revalidate()

Just like in nfs_check_verifier(), we want to use
nfs_mapping_need_revalidate_inode() to check our knowledge of the
change attribute is up to date.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Clean up cache validity checking
Trond Myklebust [Thu, 8 Dec 2016 23:18:38 +0000 (18:18 -0500)]
NFS: Clean up cache validity checking

Consolidate the open-coded checking of NFS_I(inode)->cache_validity
into a couple of helper functions.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFS: Don't revalidate the file on close if we hold a delegation
Trond Myklebust [Fri, 16 Dec 2016 22:39:58 +0000 (17:39 -0500)]
NFS: Don't revalidate the file on close if we hold a delegation

If we're holding a delegation, we can skip sending the close-to-open
GETATTR until we're returning that delegation.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN
Trond Myklebust [Sat, 17 Dec 2016 00:48:09 +0000 (19:48 -0500)]
NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN

DELEGRETURN will always carry a reference to the inode except when
the latter is being freed, so let's ensure that we always use that
inode information to ensure close-to-open cache consistency, even
when the DELEGRETURN call is asynchronous.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agoNFSv4: Update the attribute cache info in update_changeattr
Trond Myklebust [Fri, 16 Dec 2016 21:55:55 +0000 (16:55 -0500)]
NFSv4: Update the attribute cache info in update_changeattr

If we successfully updated the change attribute, we should timestamp the
cache. While we do know that the other attributes are not completely up
to date, we have the NFS_INO_INVALID_ATTR flag that let us know that,
so it is valid to say that the cache has not timed out.
We can also clear NFS_INO_REVAL_PAGECACHE, since our change attribute
is now known to be valid.

Conversely, if the change attribute did not match, we should make sure to
also revalidate the access and ACL caches.

Signed-off-by: Trond Myklebust <[email protected]>
8 years agodrm/amdgpu: fix cursor setting of dce6/dce8
Flora Cui [Wed, 14 Dec 2016 06:36:42 +0000 (14:36 +0800)]
drm/amdgpu: fix cursor setting of dce6/dce8

Fixes: 7c83d7abc999 ("drm/amdgpu: Only update the CUR_SIZE register when
necessary")
Signed-off-by: Flora Cui <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years agoARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache
Vineet Gupta [Mon, 19 Dec 2016 19:38:38 +0000 (11:38 -0800)]
ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache

An ARC700 customer reported linux boot crashes when upgrading to bigger
L1 dcache (64K from 32K). Turns out they had an aliasing VIPT config and
current code only assumed 2 colours, while theirs had 4. So default to 4
colours and complain if there are fewer. Ideally this needs to be a
Kconfig option, but heck that's too much of hassle for a single user.

Cc: [email protected]
Signed-off-by: Vineet Gupta <[email protected]>
8 years agoARC: mm: No need to save cache version in @cpuinfo
Vineet Gupta [Mon, 19 Dec 2016 19:24:08 +0000 (11:24 -0800)]
ARC: mm: No need to save cache version in @cpuinfo

Historical MMU revisions have been paired with Cache revision updates
which are captured in MMU and Cache Build Configuration Registers respectively.

This was used in boot code to check for configurations mismatches,
speically in simulations (such as running with non existent caches,
non pairing MMU and Cache version etc). This can instead be inferred
from other cache params such as line size. So remove @ver from post
processed @cpuinfo which could be used later to save soem other
interesting info.

Signed-off-by: Vineet Gupta <[email protected]>
8 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Linus Torvalds [Mon, 19 Dec 2016 16:23:53 +0000 (08:23 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota, fsnotify and ext2 updates from Jan Kara:
 "Changes to locking of some quota operations from dedicated quota mutex
  to s_umount semaphore, a fsnotify fix and a simple ext2 fix"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Fix bogus warning in dquot_disable()
  fsnotify: Fix possible use-after-free in inode iteration on umount
  ext2: reject inodes with negative size
  quota: Remove dqonoff_mutex
  ocfs2: Use s_umount for quota recovery protection
  quota: Remove dqonoff_mutex from dquot_scan_active()
  ocfs2: Protect periodic quota syncing with s_umount semaphore
  quota: Use s_umount protection for quota operations
  quota: Hold s_umount in exclusive mode when enabling / disabling quotas
  fs: Provide function to get superblock with exclusive s_umount

8 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Mon, 19 Dec 2016 16:21:29 +0000 (08:21 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Early fixes for x86.

  Instead of the (botched) revert, the lockdep/might_sleep splat has a
  real fix provided by Andrea"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)
  kvm: take srcu lock around kvm_steal_time_set_preempted()
  kvm: fix schedule in atomic in kvm_steal_time_set_preempted()
  KVM: hyperv: fix locking of struct kvm_hv fields
  KVM: x86: Expose Intel AVX512IFMA/AVX512VBMI/SHA features to guest.
  kvm: nVMX: Correct a VMX instruction error code for VMPTRLD

8 years agoblock: check partition alignment
Stefan Haberland [Mon, 19 Dec 2016 16:15:50 +0000 (17:15 +0100)]
block: check partition alignment

Partitions that are not aligned to the blocksize of a device may cause
invalid I/O requests because the blocklayer cares only about alignment
within the partition when building requests on partitions.

device
|--------4096--------|--------4096--------|--------4096--------|
partition offset 512byte
|-512-|--------4096--------|--------4096--------|--------4096--------|

When reading/writing one 4k block of the partition this maps to
reading/writing with an offset of 512 byte of the device leading to
unaligned requests for the device which in turn may cause unexpected
behavior of the device driver.

For DASD devices we have to translate the block number into a cylinder,
head, record format. The unaligned requests lead to wrong calculation
and therefore to misdirected I/O. In a "good" case this leads to I/O
errors because the underlying hardware detects the wrong addressing.
In a worst case scenario this might destroy data on the device.

To prevent partitions that are not aligned to the physical blocksize
of a device check for the alignment in the blkpg_ioctl.

Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
8 years agoMerge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvar...
Linus Torvalds [Mon, 19 Dec 2016 16:18:58 +0000 (08:18 -0800)]
Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi fix from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi_scan: Always show system identification string

8 years agoMerge tag 'mfd-for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Mon, 19 Dec 2016 16:16:26 +0000 (08:16 -0800)]
Merge tag 'mfd-for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
 "New Device Support
   - Add support for Ricoh RC5T619 PMIC to rn5t618
   - Add support for PM8821 PMIC to qcom-pm8xxx

  New Functionality:
   - Add support for GPIO to lpc_ich
   - Add support for GPADC to sun4i
   - Add ability for rk808 to shutdown

  Fix-ups:
   - Simplify/strip unnecessary code; tps65218, palmas, tps65217
   - Device Tree binding updates; tps65218, altera-a10sr
   - Provide/export device ID info; tps65218, axp20x-i2c, hi655x-pmic,
     fsl-imx25-tsadc, intel_soc_pmic_bxtwc
   - Use MFD API instead of of_platform_populate(); tps65218
   - Generalise name-space; pm8xxx
   - Supply/edit regmap configuration; axp20x, cs47l24-tables, axp20x
   - Enable compile testing; max77620, max77686, exynos-lpass,
     abx500-core
   - Coding style issues; wm8994-core, wm5102-tables
   - Supply endian support; syscon
   - Remove module support; ab3100-core, ab8500-debugfs, ab8500-gpadc,
     abx500-core

  Bug Fixes:
   - Fix ordering issues; wm8994
   - Fix dependencies (build-time/run-time); exynos_lpass, sun4i-gpadc
   - Fix compiler warnings; sun4i-gpadc
   - Fix leaks; mfd-core
   - Fix page fault during module unload; tps65217"

* tag 'mfd-for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (49 commits)
  mfd: tps65217: Support an interrupt pin as the system wakeup
  mfd: tps65217: Make an interrupt handler simpler
  mfd: tps65217: Update register interrupt mask bits instead of writing operation
  mfd: tps65217: Specify the IRQ name
  mfd: tps65217: Fix page fault on unloading modules
  mfd: palmas: Remove redundant check in palmas_power_off
  mfd: arizona: Disable IRQs during driver remove
  mfd: pm8xxx: add support to pm8821
  mfd: intel-lpss: Try to enable Memory-Write-Invalidate
  mfd: rn5t618: Add Ricoh RC5T619 PMIC support
  mfd: axp20x: Add address extension registers for AXP806 regmap
  mfd: intel_soc_pmic_bxtwc: Fix a typo in MODULE_DEVICE_TABLE()
  mfd: core: Fix device reference leak in mfd_clone_cell
  mfd: bcm590xx: Simplify a test
  mfd: sun4i-gpadc: Select regmap-irq
  mfd: abx500-core: drop unused MODULE_ tags from non-modular code
  mfd: ab8500: make sysctrl explicitly non-modular
  mfd: ab8500-gpadc: Make it explicitly non-modular
  mfd: ab8500-debugfs: Make it explicitly non-modular
  mfd: ab8500-core: Make it explicitly non-modular
  ...

8 years agostmmac: fix memory barriers
Pavel Machek [Sun, 18 Dec 2016 20:38:12 +0000 (21:38 +0100)]
stmmac: fix memory barriers

Fix up memory barriers in stmmac driver. They are meant to protect
against DMA engine, so smp_ variants are certainly wrong, and dma_
variants are preferable.

Signed-off-by: Pavel Machek <[email protected]>
Tested-by: Niklas Cassel <[email protected]>
Acked-by: Giuseppe Cavallaro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from devm_ioremap
Arvind Yadav [Wed, 14 Dec 2016 19:03:30 +0000 (00:33 +0530)]
net: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from devm_ioremap

Here, If devm_ioremap will fail. It will return NULL.
Kernel can run into a NULL-pointer dereference.
This error check will avoid NULL pointer dereference.

Signed-off-by: Arvind Yadav <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonvme : Use correct scnprintf in cmb show
Stephen Bates [Fri, 16 Dec 2016 18:54:50 +0000 (11:54 -0700)]
nvme : Use correct scnprintf in cmb show

Make sure we are using the correct scnprintf in the sysfs show
function for the CMB.

Signed-off-by: Stephen Bates <[email protected]>
Reviewed-by Jon Derrick: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
8 years agoblock: allow WRITE_SAME commands with the SG_IO ioctl
Mauricio Faria de Oliveira [Thu, 15 Dec 2016 17:48:18 +0000 (11:48 -0600)]
block: allow WRITE_SAME commands with the SG_IO ioctl

The WRITE_SAME commands are not present in the blk_default_cmd_filter
write_ok list, and thus are failed with -EPERM when the SG_IO ioctl()
is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users).
[ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ]

The problem can be reproduced with the sg_write_same command

  # sg_write_same --num 1 --xferlen 512 /dev/sda
  #

  # capsh --drop=cap_sys_rawio -- -c \
    'sg_write_same --num 1 --xferlen 512 /dev/sda'
    Write same: pass through os error: Operation not permitted
  #

For comparison, the WRITE_VERIFY command does not observe this problem,
since it is in that list:

  # capsh --drop=cap_sys_rawio -- -c \
    'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda'
  #

So, this patch adds the WRITE_SAME commands to the list, in order
for the SG_IO ioctl to finish successfully:

  # capsh --drop=cap_sys_rawio -- -c \
    'sg_write_same --num 1 --xferlen 512 /dev/sda'
  #

That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
(qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]),
which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu).

In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls,
which are translated to write-same calls in the guest kernel, and then into
SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest:

  [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
  [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
  [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
  [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00
  [...] blk_update_request: I/O error, dev sda, sector 17096824

Links:
[1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')

Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Brahadambal Srinivasan <[email protected]>
Reported-by: Manjunatha H R <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
8 years agokvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)
Jim Mattson [Mon, 12 Dec 2016 19:01:37 +0000 (11:01 -0800)]
kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF)

When L2 exits to L0 due to "exception or NMI", software exceptions
(#BP and #OF) for which L1 has requested an intercept should be
handled by L1 rather than L0. Previously, only hardware exceptions
were forwarded to L1.

Signed-off-by: Jim Mattson <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
8 years agokvm: take srcu lock around kvm_steal_time_set_preempted()
Andrea Arcangeli [Sat, 17 Dec 2016 18:13:32 +0000 (19:13 +0100)]
kvm: take srcu lock around kvm_steal_time_set_preempted()

kvm_memslots() will be called by kvm_write_guest_offset_cached() so
take the srcu lock.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
8 years agokvm: fix schedule in atomic in kvm_steal_time_set_preempted()
Andrea Arcangeli [Sat, 17 Dec 2016 17:43:52 +0000 (18:43 +0100)]
kvm: fix schedule in atomic in kvm_steal_time_set_preempted()

kvm_steal_time_set_preempted() isn't disabling the pagefaults before
calling __copy_to_user and the kernel debug notices.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
8 years agomailbox: mailbox-test: allow reserved areas in SRAM
Sudeep Holla [Tue, 29 Nov 2016 14:37:05 +0000 (14:37 +0000)]
mailbox: mailbox-test: allow reserved areas in SRAM

When CONFIG_SRAM is enable and the SRAM region is found, the entire SRAM
region resource is requested and marked as occupied by SRAM driver even
if certain parts of regions is marked reserved.

It's quite possible that a small region of the SRAM is reserved for all
the mailbox communication and hence it may fail to request the region
as it's already marked busy region.

This patch tries to just do a ioremap of this mailbox memory region if
it finds it busy.

Cc: Lee Jones <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: mailbox-test: add support for fasync/poll
Sudeep Holla [Tue, 29 Nov 2016 14:37:04 +0000 (14:37 +0000)]
mailbox: mailbox-test: add support for fasync/poll

Currently the read operation on the message debug file returns error if
there's no data ready to be read. It expects the userspace to retry if
it fails. Since the mailbox response could be asynchronous, it would be
good to add support to block the read until the data is available.

We can also implement poll file operations so that the userspace can
wait to become ready to perform any I/O.

This patch implements the poll and fasync file operation callback for
the test mailbox device.

Cc: Lee Jones <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Remove unnecessary void* casts
Rob Rice [Mon, 14 Nov 2016 18:26:05 +0000 (13:26 -0500)]
mailbox: bcm-pdc: Remove unnecessary void* casts

Remove unnecessary void* casts in register writes. Fix two other
minor formatting issues.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Jon Mason <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Simplify interrupt handler logic
Rob Rice [Mon, 14 Nov 2016 18:26:04 +0000 (13:26 -0500)]
mailbox: bcm-pdc: Simplify interrupt handler logic

Earlier versions of the PDC driver registered for both
transmit and receive interrupts. The hard IRQ handler had to
communicate to the soft handler which interrupt(s) had occurred.
The PDC driver no longer registers for tx interrupts. So there is
no reason to save the intstatus. So remove the intstatus member
of the PDC state.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Performance improvements
Rob Rice [Mon, 14 Nov 2016 18:26:03 +0000 (13:26 -0500)]
mailbox: bcm-pdc: Performance improvements

Three changes to improve performance in the PDC driver:
- disable and reenable interrupts while the interrupt handler is
running
- update rxin and txin descriptor indexes more efficiently
- group receive descriptor context into a structure and keep
context in a single array rather than five to improve locality
of reference

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Don't use iowrite32 to write DMA descriptors
Rob Rice [Mon, 14 Nov 2016 18:26:02 +0000 (13:26 -0500)]
mailbox: bcm-pdc: Don't use iowrite32 to write DMA descriptors

In PDC driver, it is not necessary to use iowrite32()
when writing DMA descriptors to the transmit and receive rings.
The ring memory is in host memory. So convert to normal
assignment statements.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Convert from threaded IRQ to tasklet
Rob Rice [Mon, 14 Nov 2016 18:26:01 +0000 (13:26 -0500)]
mailbox: bcm-pdc: Convert from threaded IRQ to tasklet

Previously used threaded IRQs in the PDC driver to defer
processing the rx DMA ring after getting an rx done interrupt.
Instead, use a tasklet at normal priority for deferred processing.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Try to improve branch prediction
Rob Rice [Mon, 14 Nov 2016 18:26:00 +0000 (13:26 -0500)]
mailbox: bcm-pdc: Try to improve branch prediction

Use likely/unlikely directives to improve branch prediction.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: streamline rx code
Rob Rice [Mon, 14 Nov 2016 18:25:59 +0000 (13:25 -0500)]
mailbox: bcm-pdc: streamline rx code

Remove the unnecessary rmb() from the receive path.

If the rx ring has multiple messages ready, avoid reading
last_rx_curr multiple times from the register.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Convert from interrupts to poll for tx done
Rob Rice [Mon, 14 Nov 2016 18:25:58 +0000 (13:25 -0500)]
mailbox: bcm-pdc: Convert from interrupts to poll for tx done

The PDC driver is a mailbox controller. A mailbox controller
can report that a mailbox message has been "transmitted" either when
a tx interrupt fires or by having the mailbox framework poll. This
commit converts the PDC driver to the poll method. We found that the
tx interrupt happens when the descriptors are read by the SPU hw. Thus,
the interrupt method does not allow more than one tx message in the PDC
tx DMA ring at a time. To keep the SPU hw busy, we would like to keep
the tx ring full under heavy load.

With the poll method, the PDC driver responds that the previous message
has been transmitted if the tx ring has space for another message.
SPU request messages take a variable number of descriptors. If 15
descriptors are available, there is a good chance another message will
fit. Also increased the ring size from 128 to 512 descriptors.

With this change, I found the PDC driver hangs on its spinlock under
heavy load. The PDC spinlock is not required; so I removed it. Calls
to pdc_send_data() are already synchronized because of the channel
spinlock in the mailbox framework. Other references to ring indexes
should not require locking because they only written on either the
tx or rx side.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: PDC driver leaves debugfs files after removal
Steve Lin [Mon, 14 Nov 2016 18:25:57 +0000 (13:25 -0500)]
mailbox: bcm-pdc: PDC driver leaves debugfs files after removal

Minor fix to ensure that debugfs stats pseudo-files are
removed when driver module is unloaded.  Previously, the call to
debugfs_remove_recursive() was never being called since the
directory was not empty, and a seg fault would occur if another
process tried to access these leftover files.

Signed-off-by: Steve Lin <[email protected]>
Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Changes so mbox client can be removed / re-inserted
Steve Lin [Mon, 14 Nov 2016 18:25:56 +0000 (13:25 -0500)]
mailbox: bcm-pdc: Changes so mbox client can be removed / re-inserted

Ensure that DMA is disabled, and pointers reset, when changing
DMA base addresses in pdc_ring_init().  This allows a mailbox client
to be re-inserted after being removed.  Otherwise, the DMA doesn't
restart so the client hangs while being reinserted.

Signed-off-by: Steve Lin <[email protected]>
Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: bcm-pdc: Use octal permissions rather than symbolic
Rob Rice [Mon, 14 Nov 2016 18:25:55 +0000 (13:25 -0500)]
mailbox: bcm-pdc: Use octal permissions rather than symbolic

When creating the debugfs files for the PDC driver, use
octal file permissions rather than symbolic file permissions.

Signed-off-by: Rob Rice <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: sti: Fix module autoload for OF registration
Javier Martinez Canillas [Thu, 20 Oct 2016 03:34:05 +0000 (00:34 -0300)]
mailbox: sti: Fix module autoload for OF registration

If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.

Export the module alias information using the MODULE_DEVICE_TABLE() macro.

Before this patch:

$ modinfo drivers/mailbox/mailbox-sti.ko | grep alias
alias:          platform:mailbox-sti

After this patch:

$ modinfo drivers/mailbox/mailbox-sti.ko | grep alias
alias:          platform:mailbox-sti
alias:          of:N*T*Cst,stih407-mailboxC*
alias:          of:N*T*Cst,stih407-mailbox

Signed-off-by: Javier Martinez Canillas <[email protected]>
Acked-by: Lee Jones <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agomailbox: mailbox-test: Fix module autoload
Javier Martinez Canillas [Thu, 20 Oct 2016 03:34:04 +0000 (00:34 -0300)]
mailbox: mailbox-test: Fix module autoload

If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.

Export the module alias information using the MODULE_DEVICE_TABLE() macro.

Before this patch:

$ modinfo drivers/mailbox/mailbox-test.ko | grep alias
$

After this patch:

$ modinfo drivers/mailbox/mailbox-test.ko | grep alias
alias:          of:N*T*Cmailbox-testC*
alias:          of:N*T*Cmailbox-test

Signed-off-by: Javier Martinez Canillas <[email protected]>
Acked-by: Lee Jones <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
8 years agoquota: Fix bogus warning in dquot_disable()
Jan Kara [Mon, 19 Dec 2016 13:01:39 +0000 (14:01 +0100)]
quota: Fix bogus warning in dquot_disable()

dquot_disable() was warning when sb_has_quota_loaded() was true when
invalidating page cache for quota files. The thinking behind this
warning was that we must have raced with somebody else turning quotas on
and this should not happen because all places modifying quota state must
hold s_umount exclusively now. However sb_has_quota_loaded() can be also
true at this point when we are just suspending quotas on remount
read-only. Just restore the behavior to situation before commit
c3b004460d77 ("quota: Remove dqonoff_mutex") which introduced the
warning.

The code in dquot_disable() can be further simplified with the new
locking of quota state changes however let's leave that to a separate
commit that can get more testing exposure.

Fixes: c3b004460d77bf3f980d877be539016f2df4df12
Signed-off-by: Jan Kara <[email protected]>
8 years agox86/asm: Rewrite sync_core() to use IRET-to-self
Andy Lutomirski [Fri, 9 Dec 2016 18:24:08 +0000 (10:24 -0800)]
x86/asm: Rewrite sync_core() to use IRET-to-self

Aside from being excessively slow, CPUID is problematic: Linux runs
on a handful of CPUs that don't have CPUID.  Use IRET-to-self
instead.  IRET-to-self works everywhere, so it makes testing easy.

For reference, On my laptop, IRET-to-self is ~110ns,
CPUID(eax=1, ecx=0) is ~83ns on native and very very slow under KVM,
and MOV-to-CR2 is ~42ns.

While we're at it: sync_core() serves a very specific purpose.
Document it.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: One Thousand Gnomes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Matthew Whitehead <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: xen-devel <[email protected]>
Link: http://lkml.kernel.org/r/5c79f0225f68bc8c40335612bf624511abb78941.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <[email protected]>
8 years agox86/microcode/intel: Replace sync_core() with native_cpuid()
Andy Lutomirski [Fri, 9 Dec 2016 18:24:07 +0000 (10:24 -0800)]
x86/microcode/intel: Replace sync_core() with native_cpuid()

The Intel microcode driver is using sync_core() to mean "do CPUID
with EAX=1".  I want to rework sync_core(), but first the Intel
microcode driver needs to stop depending on its current behavior.

Reported-by: Henrique de Moraes Holschuh <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: One Thousand Gnomes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Matthew Whitehead <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: xen-devel <[email protected]>
Link: http://lkml.kernel.org/r/535a025bb91fed1a019c5412b036337ad239e5bb.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <[email protected]>
8 years agoRevert "x86/boot: Fail the boot if !M486 and CPUID is missing"
Andy Lutomirski [Fri, 9 Dec 2016 18:24:06 +0000 (10:24 -0800)]
Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"

This reverts commit ed68d7e9b9cfb64f3045ffbcb108df03c09a0f98.

The patch wasn't quite correct -- there are non-Intel (and hence
non-486) CPUs that we support that don't have CPUID.  Since we no
longer require CPUID for sync_core(), just revert the patch.

I think the relevant CPUs are Geode and Elan, but I'm not sure.

In principle, we should try to do better at identifying CPUID-less
CPUs in early boot, but that's more complicated.

Reported-by: One Thousand Gnomes <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Matthew Whitehead <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: xen-devel <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/82acde18a108b8e353180dd6febcc2876df33f24.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <[email protected]>
8 years agox86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
Andy Lutomirski [Fri, 9 Dec 2016 18:24:05 +0000 (10:24 -0800)]
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels

We support various non-Intel CPUs that don't have the CPUID
instruction, so the M486 test was wrong.  For now, fix it with a big
hammer: handle missing CPUID on all 32-bit CPUs.

Reported-by: One Thousand Gnomes <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Matthew Whitehead <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Henrique de Moraes Holschuh <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: xen-devel <[email protected]>
Link: http://lkml.kernel.org/r/685bd083a7c036f7769510b6846315b17d6ba71f.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <[email protected]>
8 years agox86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
Andy Lutomirski [Thu, 15 Dec 2016 18:14:42 +0000 (10:14 -0800)]
x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6

A typo (or mis-merge?) resulted in leaf 6 only being probed if
cpuid_level >= 7.

Fixes: 2ccd71f1b278 ("x86/cpufeature: Move some of the scattered feature bits to x86_capability")
Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Link: http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <[email protected]>
8 years agox86/tools: Fix gcc-7 warning in relocs.c
Markus Trippelsdorf [Thu, 15 Dec 2016 12:45:13 +0000 (13:45 +0100)]
x86/tools: Fix gcc-7 warning in relocs.c

gcc-7 warns:

In file included from arch/x86/tools/relocs_64.c:17:0:
arch/x86/tools/relocs.c: In function ‘process_64’:
arch/x86/tools/relocs.c:953:2: warning: argument 1 null where non-null expected [-Wnonnull]
  qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from arch/x86/tools/relocs.h:6:0,
                 from arch/x86/tools/relocs_64.c:1:
/usr/include/stdlib.h:741:13: note: in a call to function ‘qsort’ declared here
 extern void qsort

This happens because relocs16 is not used for ELF_BITS == 64,
so there is no point in trying to sort it.

Make the sort_relocs(&relocs16) call 32bit only.

Signed-off-by: Markus Trippelsdorf <[email protected]>
Link: http://lkml.kernel.org/r/20161215124513.GA289@x4
Signed-off-by: Thomas Gleixner <[email protected]>
8 years agox86/unwind: Dump stack data on warnings
Josh Poimboeuf [Fri, 16 Dec 2016 16:05:06 +0000 (10:05 -0600)]
x86/unwind: Dump stack data on warnings

The unwinder warnings are good at finding unexpected unwinder issues,
but they often don't give enough data to be able to fully diagnose them.
Print a one-time stack dump when a warning is detected.

Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Link: http://lkml.kernel.org/r/15607370e3ddb1732b6a73d5c65937864df16ac8.1481904011.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <[email protected]>
This page took 0.138158 seconds and 4 git commands to generate.