Git Repo - linux.git/log

Merge branch 'perf/urgent' into perf/core

Pick up the latest fixes, refresh the development tree.

Signed-off-by: Ingo Molnar <[email protected]>

perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h

On AMD family 10h we see following error messages while waking up from
S3 for all non-boot CPUs leading to a failed IBS initialization:

Enabling non-boot CPUs ...
smpboot: Booting Node 0 Processor 1 APIC 0x1
[Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
perf: IBS APIC setup failed on cpu #1
process: Switch to broadcast mode on CPU1
CPU1 is up
...
ACPI: Waking up from system sleep state S3

Reason for this is that during suspend the LVT offset for the IBS
vector gets lost and needs to be reinialized while resuming.

The offset is read from the IBSCTL msr. On family 10h the offset needs
to be 1 as offset 0 is used for the MCE threshold interrupt, but
firmware assings it for IBS to 0 too. The kernel needs to reprogram
the vector. The msr is a readonly node msr, but a new value can be
written via pci config space access. The reinitialization is
implemented for family 10h in setup_ibs_ctl() which is forced during
IBS setup.

This patch fixes IBS setup after waking up from S3 by adding
resume/supend hooks for the boot cpu which does the offset
reinitialization.

Marking it as stable to let distros pick up this fix.

Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: <[email protected]> v3.2..
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86, mm, perf: Allow recursive faults from interrupts

Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
the PMI in turn managed to trigger a fault while obtaining a stack
trace. This triggered the sig_on_uaccess_error recursive fault logic
and killed the process dead.

Fix this by explicitly excluding interrupts from the recursive fault
logic.

Reported-and-Tested-by: Waiman Long <[email protected]>
Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()")
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge branches 'sched-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler and timer fixes from Ingo Molnar:
"Contains a fix for a scheduler bug that manifested itself as a 3D
  performance regression and a crash fix for the ARM Cadence TTC clock
  driver"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Calculate effective load even if local weight is 0

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: cadence_ttc: Fix mutex taken inside interrupt context

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
"Two fixes from lockdep coverage of seqlocks, which fix deadlocks on
  lockdep-enabled ARM systems"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched_clock: Disable seqlock lockdep usage in sched_clock()
  seqlock: Use raw_ prefix instead of _no_lockdep

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fix attribute length problem in coretemp driver"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (coretemp) Fix truncated name of alarm attributes

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Another few fixes for ARM, nothing major here"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
  ARM: 7939/1: traps: fix opcode endianness when read from user memory
  ARM: 7937/1: perf_event: Silence sparse warning
  ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous match
  Revert "ARM: 7908/1: mm: Fix the arm_dma_limit calculation"

Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback fix from Wu Fengguang:
"Fix data corruption on NFS writeback.

It has been in linux-next for one month"

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Fix data corruption on NFS

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c bugfix from Wolfram Sang.

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Re-instate body of i2c_parent_is_i2c_adapter()

Merge branch 'akpm' (incoming from Andrew)

Merge patches from Andrew Morton:
"Six fixes"

* emailed patches from Andrew Morton <[email protected]>:
  lib/percpu_counter.c: fix __percpu_counter_add()
  crash_dump: fix compilation error (on MIPS at least)
  mm: fix crash when using XFS on loopback
  MIPS: fix blast_icache32 on loongson2
  MIPS: fix case mismatch in local_r4k_flush_icache_range()
  nilfs2: fix segctor bug that causes file system corruption

Merge tag 'md/3.13-fixes' of git://neil.brown.name/md

Pull late md fixes from Neil Brown:
"Half a dozen md bug fixes.

  All of these fix real bugs the people have hit, and are tagged for
  -stable.  Sorry they are late ....  Christmas holidays and all that.
  Hopefully they can still squeak into 3.13"

* tag 'md/3.13-fixes' of git://neil.brown.name/md:
  md: fix problem when adding device to read-only array with bitmap.
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md/raid5: fix a recently broken BUG_ON().
  md/raid1: fix request counting bug in new 'barrier' code.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid5: Fix possible confusion when multiple write errors occur.

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"One nouveau regression fix on older cards, i915 black screen fixes,
  and a revert for a strange G33 intel problem"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau: fix null ptr dereferences on some boards
  Revert "drm: copy mode type in drm_mode_connector_list_update()"
  drm/i915/bdw: make sure south port interrupts are enabled properly v2
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  drm/i915: fix DDI PLLs HW state readout code

lib/percpu_counter.c: fix __percpu_counter_add()

__percpu_counter_add() may be called in softirq/hardirq handler (such
as, blk_mq_queue_exit() is typically called in hardirq/softirq handler),
so we need to call this_cpu_add()(irq safe helper) to update percpu
counter, otherwise counts may be lost.

This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue()
because of miscounting of request_queue->mq_usage_counter.

This patch is the v1 of previous one of "lib/percpu_counter.c:
disable local irq when updating percpu couter", and takes Andrew's
approach which may be more efficient for ARCHs(x86, s390) that
have optimized this_cpu_add().

Signed-off-by: Ming Lei <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Fan Du <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

crash_dump: fix compilation error (on MIPS at least)

In file included from kernel/crash_dump.c:2:0:
include/linux/crash_dump.h:22:27: error: unknown type name `pgprot_t'

when CONFIG_CRASH_DUMP=y

The error was traced back to commit 9cb218131de1 ("vmcore: introduce
remap_oldmem_pfn_range()")

include <asm/pgtable.h> to get the missing definition

Signed-off-by: Qais Yousef <[email protected]>
Reviewed-by: James Hogan <[email protected]>
Cc: Michael Holzheu <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
Cc: <[email protected]> [3.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix crash when using XFS on loopback

Commit 8456a648cf44 ("slab: use struct page for slab management") causes
a crash in the LVM2 testsuite on PA-RISC (the crashing test is
fsadm.sh).  The testsuite doesn't crash on 3.12, crashes on 3.13-rc1 and
later.

Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d)
CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5
task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001101111100100001110 Not tainted
r00-03  000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0
r04-07  00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40
r08-11  0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000
r12-15  0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000
r16-19  000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001
r20-23  0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080
r24-27  00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0
r28-31  202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001
sr00-03  000000000532c000 0000000000000000 0000000000000000 000000000532c000
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430
  IIR: 539c0030    ISR: 00000000202d6000  IOR: 000006202224647d
  CPU:        3   CR30: 000000413edd8000 CR31: 0000000000000000
  ORIG_R28: 00000000405a95e0
  IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48
  IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48
  RP(r2): flush_dcache_page+0x128/0x388
Backtrace:
   flush_dcache_page+0x128/0x388
   lo_splice_actor+0x90/0x148 [loop]
   splice_from_pipe_feed+0xc0/0x1d0
   __splice_from_pipe+0xac/0xc0
   lo_direct_splice_actor+0x1c/0x70 [loop]
   splice_direct_to_actor+0xec/0x228
   lo_receive+0xe4/0x298 [loop]
   loop_thread+0x478/0x640 [loop]
   kthread+0x134/0x168
   end_fault_vector+0x20/0x28
   xfs_setsize_buftarg+0x0/0x90 [xfs]

Kernel panic - not syncing: Bad Address (null pointer deref?)

Commit 8456a648cf44 changes the page structure so that the slab
subsystem reuses the page->mapping field.

The crash happens in the following way:
* XFS allocates some memory from slab and issues a bio to read data
   into it.
* the bio is sent to the loopback device.
* lo_receive creates an actor and calls splice_direct_to_actor.
* lo_splice_actor copies data to the target page.
* lo_splice_actor calls flush_dcache_page because the page may be
   mapped by userspace.  In that case we need to flush the kernel cache.
* flush_dcache_page asks for the list of userspace mappings, however
   that page->mapping field is reused by the slab subsystem for a
   different purpose.  This causes the crash.

Note that other architectures without coherent caches (sparc, arm, mips)
also call page_mapping from flush_dcache_page, so they may crash in the
same way.

This patch fixes this bug by testing if the page is a slab page in
page_mapping and returning NULL if it is.

The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
earlier kernels in the same scenario on architectures without cache
coherence when CONFIG_DEBUG_VM is enabled - so it should be backported
to stable kernels.

In the old kernels, the function page_mapping is placed in
include/linux/mm.h, so you should modify the patch accordingly when
backporting it.

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: John David Anglin <[email protected]>]
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Reviewed-by: Joonsoo Kim <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MIPS: fix blast_icache32 on loongson2

Commit 14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery
all over arch/mips") failed to add Loongson2 specific blast_icache32
functions.  Fix that.

The patch fixes the following crash seen with 3.13-rc1:

  Reserved instruction in kernel code[#1]:
  [...]
  Call Trace:
    blast_icache32_page+0x8/0xb0
    r4k_flush_cache_page+0x19c/0x200
    do_wp_page.isra.97+0x47c/0xe08
    handle_mm_fault+0x938/0x1118
    __do_page_fault+0x140/0x540
    resume_userspace_check+0x0/0x10
  Code: 00200825  64834000  00200825 <bc900000> bc900020  bc900040  bc900060  bc900080  bc9000a0

Signed-off-by: Aaro Koskinen <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Acked-by: John Crispin <[email protected]>
Cc: Ralf Baechle <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MIPS: fix case mismatch in local_r4k_flush_icache_range()

Currently, Loongson-2 call protected_blast_icache_range() and others
call protected_loongson23_blast_icache_range(), but I think the correct
behavior should be the opposite. BTW, Loongson-3's cache-ops is
compatible with MIPS64, but not compatible with Loongson-2. So, rename
xxx_loongson23_yyy things to xxx_loongson2_yyy.

The patch fixes early boot hang with 3.13-rc1, introduced in commit
14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery all over
arch/mips").

Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Aaro Koskinen <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Acked-by: John Crispin <[email protected]>
Cc: Ralf Baechle <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: fix segctor bug that causes file system corruption

There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <[email protected]>
Reviewed-by: Ryusuke Konishi <[email protected]>
Tested-by: Andreas Rohner <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'clockevents/3.13-fixes' of git://git.linaro.org/people/daniel.lezcano/linux into timers/urgent

Pull clock driver fix from Daniel Lezcano:

" * Soren Brinkmann fixed the cadence_ttc driver where a call to
     clk_get_rate happens in an interrupt context. More precisely in an IPI
     when the broadcast timer is initialized for each cpu in the cpuidle
     driver. "

Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'drm-nouveau-next' of git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

Single regression fix for nouveau

* 'drm-nouveau-next' of git://git.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: fix null ptr dereferences on some boards

drm/nouveau: fix null ptr dereferences on some boards

Regression from "device: populate master subdev pointer only when fully
constructed"

Reported-by: Bob Gleitsmann <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

hwmon: (coretemp) Fix truncated name of alarm attributes

When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.

Reported-by: Andreas Hollmann <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>
Cc: [email protected]
Signed-off-by: Guenter Roeck <[email protected]>

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf tooling updates from Arnaldo Carvalho de Melo:

New features:

* perf record: Add --initial-delay option (Andi Kleen)

* Column colouring improvements in 'diff' (Ramkumar Ramachandra)

Fixes:

* Don't show counter information when workload fails (Arnaldo Carvalho de Melo)

* Fixup leak on error path in parse events test. (Arnaldo Carvalho de Melo)

* Fix --delay option in 'stat' man page (Andi Kleen)

* Use the DWARF unwind info only if loaded (Jean Pihet):

Developer stuff:

* Improve forked workload error reporting by sending the errno in the signal
   data queueing integer field, using sigqueue and by doing the signal setup in
   the evlist methods, removing open coded equivalents in various tools. (Arnaldo Carvalho de Melo)

* Do more auto exit cleanup shores in the 'evlist' destructor, so that the tools
   don't have to all do that sequence. (Arnaldo Carvalho de Melo)

* Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho de Melo)

* Include tools/lib/api/ in MANIFEST, fixing detached tarballs (Arnaldo Carvalho de Melo)

* Add test for building detached source tarballs (Arnaldo Carvalho de Melo)

* Shut up libtracevent plugins make message (Jiri Olsa)

* Fix installation tests path setup (Jiri Olsa)

* Fix id_hdr_size initialization (Jiri Olsa)

* Move some header files from tools/perf/ to tools/include/ to make them available to
   other tools/ dwelling codebases (Namhyung Kim)

* Fix 'probe' build when DWARF support libraries not present (Arnaldo Carvalho de Melo)

Refactorings:

* Move logic to warn about kptr_restrict'ed kernels to separate
   function in 'report' (Arnaldo Carvalho de Melo)

* Move hist browser selection code to separate function (Arnaldo Carvalho de Melo)

* Move histogram entries collapsing to separate function (Arnaldo Carvalho de Melo)

* Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo)

* Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri Olsa)

* Move arch setup into seprate Makefile (Jiri Olsa)

Trivial stuff:

* Remove misplaced __maybe_unused in 'stat' (Arnaldo Carvalho de Melo)

* Remove old evsel_list usage in 'record' (Arnaldo Carvalho de Melo)

* Comment typo fix (Cody P Schafer)

* Remove unused test-volatile-register-var.c (Yann Droneaud)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

i2c: Re-instate body of i2c_parent_is_i2c_adapter()

The body of i2c_parent_is_i2c_adapter() is currently guarded by
I2C_MUX. It should be CONFIG_I2C_MUX instead.

Among potentially other problems, this resulted in i2c_lock_adapter()
only locking I2C mux child adapters, and not the parent adapter. In
turn, this could allow inter-mingling of mux child selection and I2C
transactions, which could result in I2C transactions being directed to
the wrong I2C bus, and possibly even switching between busses in the
middle of a transaction.

One concrete issue caused by this bug was corrupted HDMI EDID reads
during boot on the NVIDIA Tegra Seaboard system, although this only
became apparent in recent linux-next, when the boot timing was changed
just enough to trigger the race condition.

Fixes: 3923172b3d70 ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case")
Cc: Phil Carmody <[email protected]>
Cc: <[email protected]>
Signed-off-by: Stephen Warren <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

md: fix problem when adding device to read-only array with bitmap.

If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update. We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag. If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <[email protected]>
Fixes: d70ed2e4fafdbef0800e73942482bb075c21578b
Cc: [email protected] (3.2+)
Signed-off-by: NeilBrown <[email protected]>

md/raid10: fix bug when raid10 recovery fails to recover a block.

commit e875ecea266a543e643b19e44cf472f1412708f9
md/raid10 record bad blocks as needed during recovery.

added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.

So move the freeing of r10bio down where it is safe.

Cc: [email protected] (v3.1+)
Fixes: e875ecea266a543e643b19e44cf472f1412708f9
Reported-by: Damian Nowak <[email protected]>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <[email protected]>

md/raid5: fix a recently broken BUG_ON().

commit 6d183de4077191d1201283a9035ce57a9b05254d
md/raid5: fix newly-broken locking in get_active_stripe.

simplified a BUG_ON, but removed too much so now it sometimes fires
when it shouldn't.

When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
special list while multiple stripe_heads are collected, or it might
not be on any list, even a 'free' list when the refcount is zero. As
long as STRIPE_EXPANDING is set, it will be found and added back to a
list eventually.

So both of the BUG_ONs which test for the ->lru being empty or not
need to avoid the case where STRIPE_EXPANDING is set.

The patch which broke this was marked for -stable, so this patch needs
to be applied to any branch that received 6d183de4

Fixes: 6d183de4077191d1201283a9035ce57a9b05254d
Cc: [email protected] (any release to which above was applied)
Signed-off-by: NeilBrown <[email protected]>

md/raid1: fix request counting bug in new 'barrier' code.

The new iobarrier implementation in raid1 (which keeps normal writes
and resync activity separate) counts every request what is not before
the current resync point in either next_window_requests or
current_window_requests.
It flags that the request is counted by setting ->start_next_window.

allow_barrier follows this model exactly and decrements one of the
*_window_requests if and only if ->start_next_window is set.

However wait_barrier(), which increments *_window_requests uses a
slightly different test for setting -.start_next_window (which is set
from the return value of this function).
So there is a possibility of the counts getting out of sync, and this
leads to the resync hanging.

So change wait_barrier() to return a non-zero value in exactly the
same cases that it increments *_window_requests.

But was introduced in 3.13-rc1.

Reported-by: Bruno Wolff III <[email protected]>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061
Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Cc: majianpeng <[email protected]>
Signed-off-by: NeilBrown <[email protected]>

md/raid10: fix two bugs in handling of known-bad-blocks.

If we discover a bad block when reading we split the request and
potentially read some of it from a different device.

The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
seen by comparison with raid1.c

This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.

Cc: [email protected] (v3.1+)
Fixes: 856e08e23762dfb92ffc68fd0a8d228f9e152160
Reported-by: Damian Nowak <[email protected]>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <[email protected]>

md/raid5: Fix possible confusion when multiple write errors occur.

commit 5d8c71f9e5fbdd95650be00294d238e27a363b5c
md: raid5 crash during degradation

Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.

commit 14a75d3e07c784c004b4b44b34af996b8e4ac453
md/raid5: preferentially read from replacement device if possible.

Fixed the same bug if a more effective way, so we can now revert
the original commit.

Reported-and-tested-by: Alexander Lyakas <[email protected]>
Cc: [email protected] (3.2+ - 3.2 will need a different fix though)
Fixes: 5d8c71f9e5fbdd95650be00294d238e27a363b5c
Signed-off-by: NeilBrown <[email protected]>

Revert "drm: copy mode type in drm_mode_connector_list_update()"

This reverts commit 3fbd6439e4639ecaeaae6c079e0aa497a1ac3482.

This caused some strange booting lockup issues on an Intel G33
belonging to Daniel Vetter, very unusual, I was hoping Daniel
would track this down, but it looks like instead I'll have to hack
a different fix for -next.

Signed-off-by: Dave Airlie <[email protected]>

Merge tag 'drm-intel-fixes-2014-01-13' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes

Black screen fixes, one for hsw+bdw each and a regression fix for
locking+load detection.

* tag 'drm-intel-fixes-2014-01-13' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915/bdw: make sure south port interrupts are enabled properly v2
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  drm/i915: fix DDI PLLs HW state readout code

perf tools: Remove unused test-volatile-register-var.c

Since commit 01287e2cb7ad, test-volatile-register-var.c is no more built
as part of the automatic feature check.

This patch remove the unneeded file.

Signed-off-by: Yann Droneaud <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/339d86ad76741ed929defd18541f774b404003b4.1389461371.git.ydroneaud@opteya.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf probe: Fix build when DWARF support libraries not present

On a freshly installed system, after libelf-dev is installed we get:

    CC       /tmp/build/perf/util/probe-event.o
  util/probe-event.c: In function ‘try_to_find_probe_trace_events’:
  util/probe-event.c:753:46: error: unused parameter ‘target’ [-Werror=unused-parameter]
       int max_tevs __maybe_unused, const char *target)
                                                ^
    CC       /tmp/build/perf/util/cgroup.o
  util/probe-event.c: At top level:
  util/probe-event.c:193:12: error: ‘get_text_start_address’ defined but not used [-Werror=unused-function]
   static int get_text_start_address(const char *exec, unsigned long *address)
            ^
  cc1: all warnings being treated as errors
  make[1]: *** [/tmp/build/perf/util/probe-event.o] Error 1
  make[1]: *** Waiting for unfinished jobs....
  make: *** [install] Error 2

Fix it by enclosing functions only used when those libraries are installed
under the suitable preprocessor define and using __maybe_unused to a function
that is only built when DWARF support is disabled.

Problem introduced in this changeset:

  commit fb7345bbf7fad9bf72ef63a19c707970b9685812
  Author: Masami Hiramatsu <[email protected]>
  Date:   Thu Dec 26 05:41:53 2013 +0000

      perf probe: Support basic dwarf-based operations on uprobe events

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf diff: Color the Weighted Diff column

In

$ perf diff -c wdiff:M,N

color the numbers in the Weighted Diff column using color_snprintf(),
picking the colors using get_percent_color().

Signed-off-by: Ramkumar Ramachandra <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf diff: Color the Ratio column

In

$ perf diff -c ratio

color the Ratio column using value_color_snprintf(), a new function that
operates exactly like percent_color_snprintf().

At first glance, it looks like percent_color_snprintf() can be turned
into a non-variadic function simplifying things; however, 53805ec (perf
tools: Remove cast of non-variadic function to variadic, 2013-10-31)
explains why it needs to be a variadic function.

Signed-off-by: Ramkumar Ramachandra <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf diff: Color the Delta column

Color the numbers in the Delta column using percent_color_snprintf().

Generalize the coloring function so that we can accommodate all three
comparison methods in future patches: delta, ratio, and wdiff.

Signed-off-by: Ramkumar Ramachandra <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Generalize percent_color_snprintf()

Make percent_color_snprintf() handle negative values correctly.

Signed-off-by: Ramkumar Ramachandra <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools include: Include <linux/compiler.h> from asm/bug.h

Since it uses unlikely() macro inside WARN()

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf record: Add --initial-delay option

perf stat has a --delay option to delay measuring the workload.

This is useful to skip measuring the startup phase of the program, which
is often very different from the main workload.

The same is useful for perf record when sampling.

--no-delay was already taken, so add a --initial-delay
to perf record too.
-D was already taken for record, so there is only a long option.

v2: Don't disable group members (Namhyung Kim)
v3: port to latest perf/core
rename to --initial-delay to avoid conflict with --no-delay

Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Use the DWARF unwind info only if loaded

Use the info only if it has been found in the .debug_frame section of
the ELF binary.

Signed-off-by: Jean Pihet <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Add test for building detached source tarballs

Test one of the main kernel Makefile targets to generate a perf sources
tarball suitable for build outside the full kernel sources.

This is to test that the tools/perf/MANIFEST file lists all the files
needed to be in such tarball, which sometimes gets broken when we move
files around, like when we made some files that were in tools/perf/
available to other tools/ codebases by moving it to tools/include/, etc.

Now everytime we use 'make -C tools/perf -f tests/make' this test will
be performed, helping detect such problems earlier in the devel cycle.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Include tools/lib/api/ in MANIFEST

When 553873e1df63 renamed tools/lib/lk to tools/lib/api we forgot to
do the switch in tools/perf/MANIFEST, breaking tarball building:

  [acme@ssdandy linux]$ make perf-targz-src-pkg
    TAR
  [acme@ssdandy linux]$ tar xf perf-3.13.0-rc4.tar.gz -C /tmp/tmp.OgdYyvp77p/
  [acme@ssdandy linux]$ make -C /tmp/tmp.OgdYyvp77p/perf-3.13.0-rc4/tools/perf
  make: Entering directory
  `/tmp/tmp.OgdYyvp77p/perf-3.13.0-rc4/tools/perf'
    BUILD:   Doing 'make -j8' parallel build
    FLEX     util/pmu-flex.c
    CC       util/evlist.o
    CC       util/evsel.o
  util/evsel.c:12:28: fatal error: api/fs/debugfs.h: No such file or directory compilation terminated.
  In file included from util/cache.h:5:0,
  <SNIP>

Fix it.

Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools include: Move perf's bug.h to a generic place

So that it can be shared with others like libtraceevent.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Added the new header to tools/perf/MANIFEST ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools include: Define likely/unlikely in linux/compiler.h

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Added the new header to tools/perf/MANIFEST ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools include: Move perf's linux/compiler.h to a generic place

So that it can be shared with others like libtraceevent.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Introduce evlist__for_each() & friends

For the common evsel list traversal, so that it becomes more compact.

Use the opportunity to start ditching the 'perf_' from 'perf_evlist__',
as discussed, as the whole conversion touches a lot of places, lets do
it piecemeal when we have the chance due to other work, like in this
case.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf report: Move histogram entries collapsing to separate function

Further uncluttering the main 'report' function by group related code in
separate function.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf report: Move hist browser selection code to separate function

To unclutter the main function.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf report: Move logic to warn about kptr_restrict'ed kernels to separate function

Its too big, better have a separate function for it so that the main
logic gets shorter/clearer.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools perf: Comment typo fix

s/temr/term/

Signed-off-by: Cody P Schafer <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf stat: Fix --delay option in man page

The --delay option was documented as --initial-delay in the manpage. Fix this.

Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Make perf_event__synthesize_mmap_events global

Making perf_event__synthesize_mmap_events global, it will be used in
following patch from test code.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jean Pihet <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf machine: Fix id_hdr_size initialization

The id_hdr_size field was not properly initialized, set it to zero, as
the machine struct may have come from some non zeroing allocation
routine or from the stack without any field being initialized.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jean Pihet <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables

Instead of explicitly adding same value into
FEATURE_CHECK_(C|LD)FLAGS-all variables we can do that automatically.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jean Pihet <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf trace: Pack 'struct trace'

Initial struct stats:

/* size: 368, cachelines: 6, members: 24 */
/* sum members: 353, holes: 3, sum holes: 15 */
/* last cacheline: 48 bytes */

After reorg:

[acme@ssdandy linux]$ pahole -C trace ~/bin/trace | tail -4
/* size: 360, cachelines: 6, members: 24 */
/* padding: 7 */
/* last cacheline: 40 bytes */
};
[acme@ssdandy linux]$

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf header: Pack 'struct perf_session_env'

Initial struct:

[acme@ssdandy linux]$ pahole -C perf_session_env ~/bin/perf
struct perf_session_env {
char *                     hostname;             /*     0     8 */
char *                     os_release;           /*     8     8 */
char *                     version;              /*    16     8 */
char *                     arch;                 /*    24     8 */
int                        nr_cpus_online;       /*    32     4 */
int                        nr_cpus_avail;        /*    36     4 */
char *                     cpu_desc;             /*    40     8 */
char *                     cpuid;                /*    48     8 */
long long unsigned int     total_mem;            /*    56     8 */
/* --- cacheline 1 boundary (64 bytes) --- */
int                        nr_cmdline;           /*    64     4 */

/* XXX 4 bytes hole, try to pack */

char *                     cmdline;              /*    72     8 */
int                        nr_sibling_cores;     /*    80     4 */

/* XXX 4 bytes hole, try to pack */

char *                     sibling_cores;        /*    88     8 */
int                        nr_sibling_threads;   /*    96     4 */

/* XXX 4 bytes hole, try to pack */

char *                     sibling_threads;      /*   104     8 */
int                        nr_numa_nodes;        /*   112     4 */

/* XXX 4 bytes hole, try to pack */

char *                     numa_nodes;           /*   120     8 */
/* --- cacheline 2 boundary (128 bytes) --- */
int                        nr_pmu_mappings;      /*   128     4 */

/* XXX 4 bytes hole, try to pack */

char *                     pmu_mappings;         /*   136     8 */
int                        nr_groups;            /*   144     4 */

/* size: 152, cachelines: 3, members: 20 */
/* sum members: 128, holes: 5, sum holes: 20 */
/* padding: 4 */
/* last cacheline: 24 bytes */
};
[acme@ssdandy linux]$

[acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | grep ^/ | grep -v Final
/* Moving 'nr_sibling_cores' from after 'cmdline' to after 'nr_cmdline' */
/* Moving 'nr_numa_nodes' from after 'sibling_threads' to after 'nr_sibling_threads' */
/* Moving 'nr_groups' from after 'pmu_mappings' to after 'nr_pmu_mappings' */
[acme@ssdandy linux]$

Final struct stats:

[acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | tail -4
/* --- cacheline 2 boundary (128 bytes) --- */

/* size: 128, cachelines: 2, members: 20 */
};   /* saved 24 bytes and 1 cacheline! */
[acme@ssdandy linux]$

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools lib traceevent: Shut up plugins make message

Getting rid of following build output:

  $ make O=/tmp/build/perf -C tools/perf/ install-bin
  ...
  make[3]: Nothing to be done for `plugins'.
  make[2]: Nothing to be done for `plugins'.
  ...

which triggers when traceevent library needs to be rebuilt, but we have
plugins built already.

Adding extra 'plugins' target with nop which is visible and triggers in
both Makefile parts (for detached output directory (O=...) the
traceevent Makefile spawns sub make for the build itself).

Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools lib traceevent: Replace tabs with spaces for all non-commands statements

The tabbed indentation in non-commands statements could be sometimes
considered as follow up for the rule command in the Makefile.

This error is hard to find, so as a precaution replacing tabs with
spaces for all non-commands statements.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://marc.info/?t=136484403900003&r=1&w=2
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tests: Fix installation tests path setup

Currently installation tests work only over x86_64, adding arch check to
make it work over i386 as well.

NOTE looks like x86 is the only arch running tests, we need some
IS_(32/64) flag to make this generic.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Move arch setup into seprate Makefile

I need to use arch related setup in the tests/make, so moving arch setup
into Makefile.arch.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf stat: Remove misplaced __maybe_unused

That 'argc' argument _is_ being used.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tests: Fixup leak on error path in parse events test

We need to call the evlist destructor when failing to parse events.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Auto unmap on destructor

Removing further boilerplate after making sure perf_evlist__munmap can
be called multiple times for the same evlist.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Close fds on destructor

Since it is safe to call perf_evlist__close() multiple times, autoclose
it and remove the calls to the close from existing tools, reducing the
tooling boilerplate.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Move destruction of maps to evlist destructor

Instead of requiring tools to do an extra destructor call just before
calling perf_evlist__delete.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf record: Remove old evsel_list usage

To be consistent with other places, use just 'evlist' for the evsel list
variable, and since we have it in 'struct record', use it directly from
there.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Move the SIGUSR1 error reporting logic to prepare_workload

So that we have the boilerplate in the preparation method, instead of
open coded in tools wanting the reporting when the exec fails.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Send the errno in the signal when workload fails

When a tool uses perf_evlist__start_workload and the supplied workload
fails (e.g.: its binary wasn't found), perror was being used to print
the error reason.

This is undesirable, as the caller may be a GUI, when it wants to have
total control of the error reporting process.

So move to using sigaction(SA_SIGINFO) + siginfo_t->sa_value->sival_int
to communicate to the caller the errno and let it print it using the UI
of its choosing.

Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf stat: Don't show counter information when workload fails

When starting a workload 'stat' wasn't using prepare_workload evlist
method's signal based exec() error reporting mechanism.

Use it so that the we don't report 'not counted' counters.

Before:

  [acme@zoo linux]$ perf stat dfadsfa
  dfadsfa: No such file or directory

   Performance counter stats for 'dfadsfa':

       <not counted>      task-clock
       <not counted>      context-switches
       <not counted>      cpu-migrations
       <not counted>      page-faults
       <not counted>      cycles
       <not counted>      stalled-cycles-frontend
     <not supported>      stalled-cycles-backend
       <not counted>      instructions
       <not counted>      branches
       <not counted>      branch-misses

         0.001831462 seconds time elapsed

  [acme@zoo linux]$

After:

  [acme@zoo linux]$ perf stat dfadsfa
  dfadsfa: No such file or directory
  [acme@zoo linux]$

Reported-by: David Ahern <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fix from Ben Herrenschmidt:
"Here's one regression fix for 3.13 that I would appreciate if you
  could still pull in.  It was an "interesting" one to debug, basically
  it's an old bug that got somewhat "exposed" by new code breaking the
  boot on PA Semi boards (yes, it does appear that some people are still
  using these!)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Check return value of instance-to-package OF call

Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Sorry, meant to push out this batch earlier this weekend"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
ftrace/x86: Load ftrace_ops in parameter not the variable holding it

powerpc: Check return value of instance-to-package OF call

On PA-Semi firmware, the instance-to-package callback doesn't seem
to be implemented. We didn't check for error, however, thus
subsequently passed the -1 value returned into stdout_node to
thins like prom_getprop etc...

Thus caused the firmware to load values around 0 (physical) internally
as node structures. It somewhat "worked" as long as we had a NULL in the
right place (address 8) at the beginning of the kernel, we didn't "see"
the bug. But commit 5c0484e25ec03243d4c2f2d4416d4a13efc77f6a
"powerpc: Endian safe trampoline" changed the kernel entry point causing
that old bug to now cause a crash early during boot.

This fixes booting on PA-Semi board by properly checking the return
value from instance-to-package.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Tested-by: Olof Johansson <[email protected]>
---

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf updates from Arnaldo Carvalho de Melo:

User visible changes:

Improvements:

* Support showing source code, asking for variables to be collected
   at probe time and other 'perf probe' operations that use DWARF information.

   This supports only binaries with debugging information at this time, detached
   debuginfo (aka debuginfo packages) support should come in later patches.
   (Masami Hiramatsu)

* Add a perf.data file header window in the 'perf report' TUI, associated
   with the 'i' hotkey, providing a counterpart to the --header option in the
   stdio UI. (Namhyung Kim)

* Guest related improvements to 'perf kvm', including allowing to
   specify a directory with guest specific /proc information. (Dongsheng Yang)

* Print session information only if --stdio is given (Namhyung Kim)

Developer stuff:

Fixes:

* Get rid of a duplicate va_end() in error reporting (Namhyung Kim)

* If a hist entry doesn't have symbol information, compare it with its
   address. Affects upcoming new feature (--cumulate) (Namhyung Kim)

Improvements:

* Make libtraceevent install target quieter (Jiri Olsa)

* Make tests/make output more compact (Jiri Olsa)

* Ignore generated files in feature-checks (Chunwei Chen)

New APIs:

* Introduce pevent_filter_strerror() in libtraceevent, similar in
   purpose to libc's strerror() function. (Namhyung Kim)

Refactorings:

* Use perf_data_file methods to write output file in 'record' and
   'inject' (Jiri Olsa)

* Use pr_*() functions where applicable in 'report' (Namhyumg Kim)

* Add 'machine' 'addr_location' struct to have full picture (machine,
   thread, map, symbol, addr) for a (partially) resolved address, reducing
   function signatures (Arnaldo Carvalho de Melo)

* Reduce code duplication in the histogram entry creation/insertion. (Arnaldo Carvalho de Melo)

* Auto allocate annotation histogram data structures, (Arnaldo Carvalho de Melo)

* No need to test against NULL before calling free, also set
   freed memory in struct pointers to NULL, to help fixing use after
   free bugs. (Arnaldo Carvalho de Melo>

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling

Kexec disables outer cache before jumping to reboot code, but it doesn't
flush it explicitly. Flush is done implicitly inside of l2x0_disable().
But some SoC's override default .disable handler and don't flush cache.
This may lead to a corrupted memory during Kexec reboot on these
platforms.

This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable()
handlers to make it consistent with default l2x0_disable().

Acked-by: Rob Herring <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Tony Lindgren <[email protected]>
Signed-off-by: Taras Kondratiuk <[email protected]>
Signed-off-by: Russell King <[email protected]>

Linux 3.13-rc8

SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()

While running stress tests on adding and deleting ftrace instances I hit
this bug:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: selinux_inode_permission+0x85/0x160
  PGD 63681067 PUD 7ddbe067 PMD 0
  Oops: 0000 [#1] PREEMPT
  CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
  Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
  task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
  RIP: 0010:[<ffffffff812d8bc5>]  [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
  RSP: 0018:ffff88007ddb1c48  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
  RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
  RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
  R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
  R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
  FS:  00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
  CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
  Call Trace:
    security_inode_permission+0x1c/0x30
    __inode_permission+0x41/0xa0
    inode_permission+0x18/0x50
    link_path_walk+0x66/0x920
    path_openat+0xa6/0x6c0
    do_filp_open+0x43/0xa0
    do_sys_open+0x146/0x240
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b
  Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
  RIP  selinux_inode_permission+0x85/0x160
  CR2: 0000000000000020

Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.

in selinux_inode_permission():

isec = inode->i_security;

rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);

Note, the crash came from stressing the deletion and reading of debugfs
files.  I was not able to recreate this via normal files.  But I'm not
sure they are safe.  It may just be that the race window is much harder
to hit.

What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().

The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct.  Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here.  (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).

Note, this is a hack, but it fixes the problem at hand.  A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback.  But that is a major job to do, and requires a
lot of work.  For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only

We see General Protection Fault on RSI in copy_page_rep: that RSI is
what you get from a NULL struct page pointer.

  RIP: 0010:[<ffffffff81154955>]  [<ffffffff81154955>] copy_page_rep+0x5/0x10
  RSP: 0000:ffff880136e15c00  EFLAGS: 00010286
  RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200
  RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000
  RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c
  R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001
  R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0
  Call Trace:
    copy_user_huge_page+0x93/0xab
    do_huge_pmd_wp_page+0x710/0x815
    handle_mm_fault+0x15d8/0x1d70
    __do_page_fault+0x14d/0x840
    do_page_fault+0x2f/0x90
    page_fault+0x22/0x30

do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but
since shrink_huge_zero_page() can free the huge_zero_page, and we have
no hold of our own on it here (except where the fourth test holds
page_table_lock and has checked pmd_same), it's possible for it to
answer yes the first time, but no to the second or third test.  Change
all those last three to tests for NULL page.

(Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
in https://lkml.org/lkml/2013/3/29/103.  I believe that one is due
to the source page being split, and a tail page freed, while copy
is in progress; and not a problem without DEBUG_PAGEALLOC, since
the pmd_same check will prevent a miscopy from being made visible.)

Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: [email protected] # v3.10 v3.11 v3.12
Signed-off-by: Linus Torvalds <[email protected]>

block: null_blk: fix queue leak inside removing device

When queue_mode is NULL_Q_MQ and null_blk is being removed,
blk_cleanup_queue() isn't called to cleanup queue, so the queue
allocated won't be freed.

This patch calls blk_cleanup_queue() for MQ to drain all pending
requests first and release the reference counter of queue kobject, then
blk_mq_free_queue() will be called in queue kobject's release handler
when queue kobject's reference counter drops to zero.

Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

perf: Introduce a flag to enable close-on-exec in perf_event_open()

Unlike recent modern userspace API such as:

  epoll_create1 (EPOLL_CLOEXEC), eventfd (EFD_CLOEXEC),
  fanotify_init (FAN_CLOEXEC), inotify_init1 (IN_CLOEXEC),
  signalfd (SFD_CLOEXEC), timerfd_create (TFD_CLOEXEC),
  or the venerable general purpose open (O_CLOEXEC),

perf_event_open() syscall lack a flag to atomically set FD_CLOEXEC
(eg. close-on-exec) flag on file descriptor it returns to userspace.

The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow
perf_event_open() syscall to atomically set close-on-exec.

Having this flag will enable userspace to remove the file descriptor
from the list of file descriptors being inherited across exec,
without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the
associated race condition between the current thread and another
thread calling fork(2) then execve(2).

Links:

- Secure File Descriptor Handling (Ulrich Drepper, 2008)
   http://udrepper.livejournal.com/20407.html

- Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012)
   http://danwalsh.livejournal.com/53603.html

- Notes in DMA buffer sharing: leak and security hole
   http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/dma-buf-sharing.txt?id=v3.13-rc3#n428

Signed-off-by: Yann Droneaud <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/8c03f54e1598b1727c19706f3af03f98685d9fe6.1388952061.git.ydroneaud@opteya.com
Signed-off-by: Ingo Molnar <[email protected]>

perf/x86/intel: Add Intel RAPL PP1 energy counter support

This patch adds support for the Intel RAPL energy counter
PP1 (Power Plane 1).

On client processors, it usually corresponds to the
energy consumption of the builtin graphic card. That
is why the sysfs event is called energy-gpu.

New event:
- name: power/energy-gpu/
- code: event=0x4
- unit: 2^-32 Joules

On processors without graphics, this should count 0.
The patch only enables this event on client processors.

Reviewed-by: Maria Dimakopoulou <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

perf/x86: Fix active_entry initialization

This patch fixes a problem with the initialization of the
struct perf_event active_entry field. It is defined inside
an anonymous union and was initialized in perf_event_alloc()
using INIT_LIST_HEAD(). However at that time, we do not know
whether the event is going to use active_entry or hlist_entry (SW).
Or at last, we don't want to make that determination there.
The problem is that hlist and list_head are not initialized
the same way. One is okay with NULL (from kzmalloc), the other
needs to pointers to point to self.

This patch resolves this problem by dropping the union.
This will avoid problems later on, if someone starts using
active_entry or hlist_entry without verifying that they
actually overlap. This also solves the initialization
problem.

Signed-off-by: Stephane Eranian <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

sched_clock: Disable seqlock lockdep usage in sched_clock()

Unfortunately the seqlock lockdep enablement can't be used
in sched_clock(), since the lockdep infrastructure eventually
calls into sched_clock(), which causes a deadlock.

Thus, this patch changes all generic sched_clock() usage
to use the raw_* methods.

Acked-by: Linus Torvalds <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Reported-by: Krzysztof Hałasa <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Cc: Uwe Kleine-König <[email protected]>
Cc: Willy Tarreau <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

seqlock: Use raw_ prefix instead of _no_lockdep

Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.

This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.

Acked-by: Linus Torvalds <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Krzysztof Hałasa <[email protected]>
Cc: Uwe Kleine-König <[email protected]>
Cc: Willy Tarreau <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

sched: Calculate effective load even if local weight is 0

Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
node") as the problem which had modified the behaviour of effective_load.

Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
on a preferred NUMA node") changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.

wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.

Reported-and-tested-by: Thomas Hellstrom <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
[ Wrote changelog]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround

Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <[email protected]>

ARM: 7939/1: traps: fix opcode endianness when read from user memory

Currently code has an inverted logic: opcode from user memory
is swapped to a proper endianness only in case of read error.
While normally opcode should be swapped only if it was read
correctly from user memory.

Reviewed-by: Victor Kamensky <[email protected]>
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Taras Kondratiuk <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7937/1: perf_event: Silence sparse warning

arch/arm/kernel/perf_event_cpu.c:274:25: warning: incorrect type in assignment (different modifiers)
arch/arm/kernel/perf_event_cpu.c:274:25: expected int ( *init_fn )( ... )
arch/arm/kernel/perf_event_cpu.c:274:25: got void const *const data

Acked-by: Will Deacon <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous match

The MPIDR contains specific bitfields(MPIDR.Aff{2..0}) which uniquely
identify a CPU, in addition to some non-identifying information and
reserved bits. The ARM cpu binding defines the 'reg' property to only
contain the affinity bits, and any cpu nodes with other bits set in
their 'reg' entry are skipped.

As such it is not necessary to mask the phys_id with MPIDR_HWID_BITMASK,
and doing so could lead to matching erroneous CPU nodes in the device
tree. This patch removes the masking of the physical identifier.

Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Russell King <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Famouse last words: "final pull request" :-)

  I'm sending this because Jason Wang's fixes are pretty important

   1) Add missing per-cpu stats initialization to ip6_vti.  Otherwise
      lockdep spits out a call trace.  From Li RongQing.

   2) Fix NULL oops in wireless hwsim, from Javier Lopez

   3) TIPC deferred packet queue unlink must NULL out skb->next to avoid
      crashes.  From Erik Hugne

   4) Fix access to uninitialized buffer in nf_nat netfilter code, from
      Daniel Borkmann

   5) Fix lifetime of ipv6 loopback and SIT tunnel addresses, otherwise
      they basically timeout immediately.  From Hannes Frederic Sowa

   6) Fix DMA unmapping of TSO packets in bnx2x driver, from Michal
      Schmidt

   7) Do not allow L2 forwarding offload via macvtap device, the way
      things are now it will not end up being forwaded at all.  From
      Jason Wang

   8) Fix transmit queue selection via ndo_dfwd_start_xmit(), fixing
      things like applying NETIF_F_LLTX to the wrong device (!!) and
      eliding the proper transmit watchdog handling

   9) qlcnic driver was not updating tx statistics at all, from Manish
      Chopra"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  qlcnic: Fix ethtool statistics length calculation
  qlcnic: Fix bug in TX statistics
  net: core: explicitly select a txq before doing l2 forwarding
  macvlan: forbid L2 fowarding offload for macvtap
  bnx2x: fix DMA unmapping of TSO split BDs
  ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME
  bnx2x: prevent WARN during driver unload
  tipc: correctly unlink packets from deferred packet queue
  ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c
  netfilter: only warn once on wrong seqadj usage
  netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
  NFC: Fix target mode p2p link establishment
  iwlwifi: add new devices for 7265 series
  mac80211: move "bufferable MMPDU" check to fix AP mode scan
  mac80211_hwsim: Fix NULL pointer dereference

Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
"Here we have a bugfix for an off-by-one in the remote attribute
  verifier that results in a forced shutdown which you can hit with v5
  superblock by creating a 64k xattr, and a fix for a missing
  destroy_work_on_stack() in the allocation worker.

  It's a bit late, but they are both fairly straightforward"

* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify

Merge branch 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds

Pull LED fix from Bryan Wu:
"Pali Rohár and Pavel Machek reported the LED of Nokia N900 doesn't
work with our latest 3.13-rc6 kernel. Milo fixed the regression here"

* 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: lp5521/5523: Remove duplicate mutex

Merge tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:

- Recent commits modifying the lists of C-states in the intel_idle
   driver introduced bugs leading to crashes on some systems.  Two fixes
   from Jiang Liu.

- The ACPI AC driver should receive all types of notifications, but
   recent change made it ignore some of them.  Fix from Alexander Mezin.

- intel_pstate's validity checks for MSRs it depends on are not
   sufficient to catch the lack of support in nested KVM setups, so they
   are extended to cover that case.  From Dirk Brandewie.

- NEC LZ750/LS has a botched up _BIX method in its ACPI tables, so our
   ACPI battery driver needs a quirk for it.  From Lan Tianyu.

- The tpm_ppi driver sometimes leaks memory allocated by
   acpi_get_name().  Fix from Jiang Liu.

* tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: close avn_cstates array with correct marker
  Revert "intel_idle: mark states tables with __initdata tag"
  ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
  intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
  ACPI / TPM: fix memory leak when walking ACPI namespace
  ACPI / AC: change notification handler type to ACPI_ALL_NOTIFY

Merge tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes

Pull MFD fix from Samuel Ortiz:
"This is the 2nd MFD pull request for 3.13

  It only contains one fix for the rtsx_pcr driver.  Without it we see a
  kernel panic on some machines, when resuming from suspend to RAM"

* tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
  mfd: rtsx_pcr: Disable interrupts before cancelling delayed works

leds: lp5521/5523: Remove duplicate mutex

It can be a problem when a pattern is loaded via the firmware interface.
LP55xx common driver has already locked the mutex in 'lp55xx_firmware_loaded()'.
So it should be deleted.

On the other hand, locks are required in store_engine_load()
on updating program memory.

Reported-by: Pali Rohár <[email protected]>
Reported-by: Pavel Machek <[email protected]>
Signed-off-by: Milo Kim <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
Cc: <[email protected]>

drm/i915/bdw: make sure south port interrupts are enabled properly v2

We were apparently relying on the defaults on BDW, which resulted in no
hotplug or AUX interrupts. So be sure to call the ibx_irq_preinstall to
enable all interrupts.

v2: use preinstall instead of redundant SDIER write

References: https://bugs.freedesktop.org/show_bug.cgi?id=72834
References: https://bugs.freedesktop.org/show_bug.cgi?id=72833
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>

xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()

In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <[email protected]>
Reviewed-by: Ben Myers <[email protected]>
Signed-off-by: Ben Myers <[email protected]>
(cherry picked from commit 6f96b3063cdd473c68664a190524ed966ac0cd92)

xfs: fix off-by-one error in xfs_attr3_rmt_verify

With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.

Signed-off-by: Jie Liu <[email protected]>
Reviewed-by: Mark Tinguely <[email protected]>
Signed-off-by: Ben Myers <[email protected]>
(cherry picked from commit 85dd0707f0cad26d60f2dc574d17a5ab948d10f7)

qlcnic: Fix ethtool statistics length calculation

o Consider number of Tx queues while calculating the length of
Tx statistics as part of ethtool stats.
o Calculate statistics lenght properly for 82xx and 83xx adapter

Signed-off-by: Shahed Shaikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qlcnic: Fix bug in TX statistics

o Driver was not updating TX stats so it was not populating
statistics in `ifconfig` command output.

Signed-off-by: Manish Chopra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>