Robert Richter [Wed, 15 Jan 2014 14:57:29 +0000 (15:57 +0100)]
perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
On AMD family 10h we see following error messages while waking up from
S3 for all non-boot CPUs leading to a failed IBS initialization:
Enabling non-boot CPUs ...
smpboot: Booting Node 0 Processor 1 APIC 0x1
[Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
perf: IBS APIC setup failed on cpu #1
process: Switch to broadcast mode on CPU1
CPU1 is up
...
ACPI: Waking up from system sleep state S3
Reason for this is that during suspend the LVT offset for the IBS
vector gets lost and needs to be reinialized while resuming.
The offset is read from the IBSCTL msr. On family 10h the offset needs
to be 1 as offset 0 is used for the MCE threshold interrupt, but
firmware assings it for IBS to 0 too. The kernel needs to reprogram
the vector. The msr is a readonly node msr, but a new value can be
written via pci config space access. The reinitialization is
implemented for family 10h in setup_ibs_ctl() which is forced during
IBS setup.
This patch fixes IBS setup after waking up from S3 by adding
resume/supend hooks for the boot cpu which does the offset
reinitialization.
Marking it as stable to let distros pick up this fix.
Peter Zijlstra [Fri, 10 Jan 2014 20:06:03 +0000 (21:06 +0100)]
x86, mm, perf: Allow recursive faults from interrupts
Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
the PMI in turn managed to trigger a fault while obtaining a stack
trace. This triggered the sig_on_uaccess_error recursive fault logic
and killed the process dead.
Fix this by explicitly excluding interrupts from the recursive fault
logic.
Linus Torvalds [Thu, 16 Jan 2014 01:33:21 +0000 (08:33 +0700)]
Merge branches 'sched-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler and timer fixes from Ingo Molnar:
"Contains a fix for a scheduler bug that manifested itself as a 3D
performance regression and a crash fix for the ARM Cadence TTC clock
driver"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Calculate effective load even if local weight is 0
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: cadence_ttc: Fix mutex taken inside interrupt context
Linus Torvalds [Thu, 16 Jan 2014 01:31:55 +0000 (08:31 +0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two fixes from lockdep coverage of seqlocks, which fix deadlocks on
lockdep-enabled ARM systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched_clock: Disable seqlock lockdep usage in sched_clock()
seqlock: Use raw_ prefix instead of _no_lockdep
Linus Torvalds [Thu, 16 Jan 2014 01:26:44 +0000 (08:26 +0700)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix attribute length problem in coretemp driver"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (coretemp) Fix truncated name of alarm attributes
Linus Torvalds [Wed, 15 Jan 2014 08:42:11 +0000 (15:42 +0700)]
Merge branch 'akpm' (incoming from Andrew)
Merge patches from Andrew Morton:
"Six fixes"
* emailed patches from Andrew Morton <[email protected]>:
lib/percpu_counter.c: fix __percpu_counter_add()
crash_dump: fix compilation error (on MIPS at least)
mm: fix crash when using XFS on loopback
MIPS: fix blast_icache32 on loongson2
MIPS: fix case mismatch in local_r4k_flush_icache_range()
nilfs2: fix segctor bug that causes file system corruption
Linus Torvalds [Wed, 15 Jan 2014 08:07:36 +0000 (15:07 +0700)]
Merge tag 'md/3.13-fixes' of git://neil.brown.name/md
Pull late md fixes from Neil Brown:
"Half a dozen md bug fixes.
All of these fix real bugs the people have hit, and are tagged for
-stable. Sorry they are late .... Christmas holidays and all that.
Hopefully they can still squeak into 3.13"
* tag 'md/3.13-fixes' of git://neil.brown.name/md:
md: fix problem when adding device to read-only array with bitmap.
md/raid10: fix bug when raid10 recovery fails to recover a block.
md/raid5: fix a recently broken BUG_ON().
md/raid1: fix request counting bug in new 'barrier' code.
md/raid10: fix two bugs in handling of known-bad-blocks.
md/raid5: Fix possible confusion when multiple write errors occur.
Linus Torvalds [Wed, 15 Jan 2014 08:06:14 +0000 (15:06 +0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One nouveau regression fix on older cards, i915 black screen fixes,
and a revert for a strange G33 intel problem"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: fix null ptr dereferences on some boards
Revert "drm: copy mode type in drm_mode_connector_list_update()"
drm/i915/bdw: make sure south port interrupts are enabled properly v2
drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
drm/i915: fix DDI PLLs HW state readout code
Ming Lei [Wed, 15 Jan 2014 01:56:42 +0000 (17:56 -0800)]
lib/percpu_counter.c: fix __percpu_counter_add()
__percpu_counter_add() may be called in softirq/hardirq handler (such
as, blk_mq_queue_exit() is typically called in hardirq/softirq handler),
so we need to call this_cpu_add()(irq safe helper) to update percpu
counter, otherwise counts may be lost.
This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue()
because of miscounting of request_queue->mq_usage_counter.
This patch is the v1 of previous one of "lib/percpu_counter.c:
disable local irq when updating percpu couter", and takes Andrew's
approach which may be more efficient for ARCHs(x86, s390) that
have optimized this_cpu_add().
Mikulas Patocka [Wed, 15 Jan 2014 01:56:40 +0000 (17:56 -0800)]
mm: fix crash when using XFS on loopback
Commit 8456a648cf44 ("slab: use struct page for slab management") causes
a crash in the LVM2 testsuite on PA-RISC (the crashing test is
fsadm.sh). The testsuite doesn't crash on 3.12, crashes on 3.13-rc1 and
later.
Kernel panic - not syncing: Bad Address (null pointer deref?)
Commit 8456a648cf44 changes the page structure so that the slab
subsystem reuses the page->mapping field.
The crash happens in the following way:
* XFS allocates some memory from slab and issues a bio to read data
into it.
* the bio is sent to the loopback device.
* lo_receive creates an actor and calls splice_direct_to_actor.
* lo_splice_actor copies data to the target page.
* lo_splice_actor calls flush_dcache_page because the page may be
mapped by userspace. In that case we need to flush the kernel cache.
* flush_dcache_page asks for the list of userspace mappings, however
that page->mapping field is reused by the slab subsystem for a
different purpose. This causes the crash.
Note that other architectures without coherent caches (sparc, arm, mips)
also call page_mapping from flush_dcache_page, so they may crash in the
same way.
This patch fixes this bug by testing if the page is a slab page in
page_mapping and returning NULL if it is.
The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
earlier kernels in the same scenario on architectures without cache
coherence when CONFIG_DEBUG_VM is enabled - so it should be backported
to stable kernels.
In the old kernels, the function page_mapping is placed in
include/linux/mm.h, so you should modify the patch accordingly when
backporting it.
Aaro Koskinen [Wed, 15 Jan 2014 01:56:38 +0000 (17:56 -0800)]
MIPS: fix blast_icache32 on loongson2
Commit 14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery
all over arch/mips") failed to add Loongson2 specific blast_icache32
functions. Fix that.
The patch fixes the following crash seen with 3.13-rc1:
Huacai Chen [Wed, 15 Jan 2014 01:56:37 +0000 (17:56 -0800)]
MIPS: fix case mismatch in local_r4k_flush_icache_range()
Currently, Loongson-2 call protected_blast_icache_range() and others
call protected_loongson23_blast_icache_range(), but I think the correct
behavior should be the opposite. BTW, Loongson-3's cache-ops is
compatible with MIPS64, but not compatible with Loongson-2. So, rename
xxx_loongson23_yyy things to xxx_loongson2_yyy.
The patch fixes early boot hang with 3.13-rc1, introduced in commit 14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery all over
arch/mips").
Andreas Rohner [Wed, 15 Jan 2014 01:56:36 +0000 (17:56 -0800)]
nilfs2: fix segctor bug that causes file system corruption
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean. It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.
The problem shows itself with the following kernel log message:
nilfs_sufile_do_cancel_free: segment 6533 must be clean
Usually a few hours later the file system gets corrupted:
The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.
This is what happens:
1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
including the newly allocated segment, which will contain active
data and can be allocated at a later time
10. A few hours later another segment construction allocates the
segment and causes file system corruption
This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.
Ingo Molnar [Wed, 15 Jan 2014 06:39:30 +0000 (07:39 +0100)]
Merge branch 'clockevents/3.13-fixes' of git://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
Pull clock driver fix from Daniel Lezcano:
" * Soren Brinkmann fixed the cadence_ttc driver where a call to
clk_get_rate happens in an interrupt context. More precisely in an IPI
when the broadcast timer is initialized for each cpu in the cpuidle
driver. "
Jean Delvare [Tue, 14 Jan 2014 14:59:55 +0000 (15:59 +0100)]
hwmon: (coretemp) Fix truncated name of alarm attributes
When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.
* Column colouring improvements in 'diff' (Ramkumar Ramachandra)
Fixes:
* Don't show counter information when workload fails (Arnaldo Carvalho de Melo)
* Fixup leak on error path in parse events test. (Arnaldo Carvalho de Melo)
* Fix --delay option in 'stat' man page (Andi Kleen)
* Use the DWARF unwind info only if loaded (Jean Pihet):
Developer stuff:
* Improve forked workload error reporting by sending the errno in the signal
data queueing integer field, using sigqueue and by doing the signal setup in
the evlist methods, removing open coded equivalents in various tools. (Arnaldo Carvalho de Melo)
* Do more auto exit cleanup shores in the 'evlist' destructor, so that the tools
don't have to all do that sequence. (Arnaldo Carvalho de Melo)
* Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho de Melo)
* Include tools/lib/api/ in MANIFEST, fixing detached tarballs (Arnaldo Carvalho de Melo)
* Add test for building detached source tarballs (Arnaldo Carvalho de Melo)
* Shut up libtracevent plugins make message (Jiri Olsa)
* Fix installation tests path setup (Jiri Olsa)
* Fix id_hdr_size initialization (Jiri Olsa)
* Move some header files from tools/perf/ to tools/include/ to make them available to
other tools/ dwelling codebases (Namhyung Kim)
* Fix 'probe' build when DWARF support libraries not present (Arnaldo Carvalho de Melo)
Refactorings:
* Move logic to warn about kptr_restrict'ed kernels to separate
function in 'report' (Arnaldo Carvalho de Melo)
* Move hist browser selection code to separate function (Arnaldo Carvalho de Melo)
* Move histogram entries collapsing to separate function (Arnaldo Carvalho de Melo)
* Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo)
* Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri Olsa)
* Move arch setup into seprate Makefile (Jiri Olsa)
Trivial stuff:
* Remove misplaced __maybe_unused in 'stat' (Arnaldo Carvalho de Melo)
* Remove old evsel_list usage in 'record' (Arnaldo Carvalho de Melo)
Stephen Warren [Mon, 13 Jan 2014 21:29:04 +0000 (14:29 -0700)]
i2c: Re-instate body of i2c_parent_is_i2c_adapter()
The body of i2c_parent_is_i2c_adapter() is currently guarded by
I2C_MUX. It should be CONFIG_I2C_MUX instead.
Among potentially other problems, this resulted in i2c_lock_adapter()
only locking I2C mux child adapters, and not the parent adapter. In
turn, this could allow inter-mingling of mux child selection and I2C
transactions, which could result in I2C transactions being directed to
the wrong I2C bus, and possibly even switching between busses in the
middle of a transaction.
One concrete issue caused by this bug was corrupted HDMI EDID reads
during boot on the NVIDIA Tegra Seaboard system, although this only
became apparent in recent linux-next, when the boot timing was changed
just enough to trigger the race condition.
NeilBrown [Wed, 11 Dec 2013 23:13:33 +0000 (10:13 +1100)]
md: fix problem when adding device to read-only array with bitmap.
If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.
If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.
However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update. We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.
This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag. If that is set,
then the device will not be added to a read-only array.
added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.
So move the freeing of r10bio down where it is safe.
simplified a BUG_ON, but removed too much so now it sometimes fires
when it shouldn't.
When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
special list while multiple stripe_heads are collected, or it might
not be on any list, even a 'free' list when the refcount is zero. As
long as STRIPE_EXPANDING is set, it will be found and added back to a
list eventually.
So both of the BUG_ONs which test for the ->lru being empty or not
need to avoid the case where STRIPE_EXPANDING is set.
The patch which broke this was marked for -stable, so this patch needs
to be applied to any branch that received 6d183de4
Fixes: 6d183de4077191d1201283a9035ce57a9b05254d Cc: [email protected] (any release to which above was applied) Signed-off-by: NeilBrown <[email protected]>
NeilBrown [Tue, 14 Jan 2014 00:56:14 +0000 (11:56 +1100)]
md/raid1: fix request counting bug in new 'barrier' code.
The new iobarrier implementation in raid1 (which keeps normal writes
and resync activity separate) counts every request what is not before
the current resync point in either next_window_requests or
current_window_requests.
It flags that the request is counted by setting ->start_next_window.
allow_barrier follows this model exactly and decrements one of the
*_window_requests if and only if ->start_next_window is set.
However wait_barrier(), which increments *_window_requests uses a
slightly different test for setting -.start_next_window (which is set
from the return value of this function).
So there is a possibility of the counts getting out of sync, and this
leads to the resync hanging.
So change wait_barrier() to return a non-zero value in exactly the
same cases that it increments *_window_requests.
But was introduced in 3.13-rc1.
Reported-by: Bruno Wolff III <[email protected]>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061 Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 Cc: majianpeng <[email protected]> Signed-off-by: NeilBrown <[email protected]>
NeilBrown [Mon, 13 Jan 2014 23:38:09 +0000 (10:38 +1100)]
md/raid10: fix two bugs in handling of known-bad-blocks.
If we discover a bad block when reading we split the request and
potentially read some of it from a different device.
The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
seen by comparison with raid1.c
This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.
Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.
Fixed the same bug if a more effective way, so we can now revert
the original commit.
Reported-and-tested-by: Alexander Lyakas <[email protected]> Cc: [email protected] (3.2+ - 3.2 will need a different fix though) Fixes: 5d8c71f9e5fbdd95650be00294d238e27a363b5c Signed-off-by: NeilBrown <[email protected]>
This caused some strange booting lockup issues on an Intel G33
belonging to Daniel Vetter, very unusual, I was hoping Daniel
would track this down, but it looks like instead I'll have to hack
a different fix for -next.
Dave Airlie [Tue, 14 Jan 2014 02:44:48 +0000 (12:44 +1000)]
Merge tag 'drm-intel-fixes-2014-01-13' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Black screen fixes, one for hsw+bdw each and a regression fix for
locking+load detection.
* tag 'drm-intel-fixes-2014-01-13' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915/bdw: make sure south port interrupts are enabled properly v2
drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
drm/i915: fix DDI PLLs HW state readout code
perf probe: Fix build when DWARF support libraries not present
On a freshly installed system, after libelf-dev is installed we get:
CC /tmp/build/perf/util/probe-event.o
util/probe-event.c: In function ‘try_to_find_probe_trace_events’:
util/probe-event.c:753:46: error: unused parameter ‘target’ [-Werror=unused-parameter]
int max_tevs __maybe_unused, const char *target)
^
CC /tmp/build/perf/util/cgroup.o
util/probe-event.c: At top level:
util/probe-event.c:193:12: error: ‘get_text_start_address’ defined but not used [-Werror=unused-function]
static int get_text_start_address(const char *exec, unsigned long *address)
^
cc1: all warnings being treated as errors
make[1]: *** [/tmp/build/perf/util/probe-event.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [install] Error 2
Fix it by enclosing functions only used when those libraries are installed
under the suitable preprocessor define and using __maybe_unused to a function
that is only built when DWARF support is disabled.
color the Ratio column using value_color_snprintf(), a new function that
operates exactly like percent_color_snprintf().
At first glance, it looks like percent_color_snprintf() can be turned
into a non-variadic function simplifying things; however, 53805ec (perf
tools: Remove cast of non-variadic function to variadic, 2013-10-31)
explains why it needs to be a variadic function.
perf tools: Add test for building detached source tarballs
Test one of the main kernel Makefile targets to generate a perf sources
tarball suitable for build outside the full kernel sources.
This is to test that the tools/perf/MANIFEST file lists all the files
needed to be in such tarball, which sometimes gets broken when we move
files around, like when we made some files that were in tools/perf/
available to other tools/ codebases by moving it to tools/include/, etc.
Now everytime we use 'make -C tools/perf -f tests/make' this test will
be performed, helping detect such problems earlier in the devel cycle.
When 553873e1df63 renamed tools/lib/lk to tools/lib/api we forgot to
do the switch in tools/perf/MANIFEST, breaking tarball building:
[acme@ssdandy linux]$ make perf-targz-src-pkg
TAR
[acme@ssdandy linux]$ tar xf perf-3.13.0-rc4.tar.gz -C /tmp/tmp.OgdYyvp77p/
[acme@ssdandy linux]$ make -C /tmp/tmp.OgdYyvp77p/perf-3.13.0-rc4/tools/perf
make: Entering directory
`/tmp/tmp.OgdYyvp77p/perf-3.13.0-rc4/tools/perf'
BUILD: Doing 'make -j8' parallel build
FLEX util/pmu-flex.c
CC util/evlist.o
CC util/evsel.o
util/evsel.c:12:28: fatal error: api/fs/debugfs.h: No such file or directory compilation terminated.
In file included from util/cache.h:5:0,
<SNIP>
For the common evsel list traversal, so that it becomes more compact.
Use the opportunity to start ditching the 'perf_' from 'perf_evlist__',
as discussed, as the whole conversion touches a lot of places, lets do
it piecemeal when we have the chance due to other work, like in this
case.
Jiri Olsa [Tue, 7 Jan 2014 12:47:19 +0000 (13:47 +0100)]
perf machine: Fix id_hdr_size initialization
The id_hdr_size field was not properly initialized, set it to zero, as
the machine struct may have come from some non zeroing allocation
routine or from the stack without any field being initialized.
[acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | grep ^/ | grep -v Final
/* Moving 'nr_sibling_cores' from after 'cmdline' to after 'nr_cmdline' */
/* Moving 'nr_numa_nodes' from after 'sibling_threads' to after 'nr_sibling_threads' */
/* Moving 'nr_groups' from after 'pmu_mappings' to after 'nr_pmu_mappings' */
[acme@ssdandy linux]$
Jiri Olsa [Wed, 1 Jan 2014 16:50:50 +0000 (17:50 +0100)]
tools lib traceevent: Shut up plugins make message
Getting rid of following build output:
$ make O=/tmp/build/perf -C tools/perf/ install-bin
...
make[3]: Nothing to be done for `plugins'.
make[2]: Nothing to be done for `plugins'.
...
which triggers when traceevent library needs to be rebuilt, but we have
plugins built already.
Adding extra 'plugins' target with nop which is visible and triggers in
both Makefile parts (for detached output directory (O=...) the
traceevent Makefile spawns sub make for the build itself).
Since it is safe to call perf_evlist__close() multiple times, autoclose
it and remove the calls to the close from existing tools, reducing the
tooling boilerplate.
To be consistent with other places, use just 'evlist' for the evsel list
variable, and since we have it in 'struct record', use it directly from
there.
perf evlist: Send the errno in the signal when workload fails
When a tool uses perf_evlist__start_workload and the supplied workload
fails (e.g.: its binary wasn't found), perror was being used to print
the error reason.
This is undesirable, as the caller may be a GUI, when it wants to have
total control of the error reporting process.
So move to using sigaction(SA_SIGINFO) + siginfo_t->sa_value->sival_int
to communicate to the caller the errno and let it print it using the UI
of its choosing.
Linus Torvalds [Mon, 13 Jan 2014 03:59:05 +0000 (10:59 +0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix from Ben Herrenschmidt:
"Here's one regression fix for 3.13 that I would appreciate if you
could still pull in. It was an "interesting" one to debug, basically
it's an old bug that got somewhat "exposed" by new code breaking the
boot on PA Semi boards (yes, it does appear that some people are still
using these!)"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Check return value of instance-to-package OF call
Linus Torvalds [Mon, 13 Jan 2014 00:28:49 +0000 (07:28 +0700)]
Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Sorry, meant to push out this batch earlier this weekend"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
ftrace/x86: Load ftrace_ops in parameter not the variable holding it
powerpc: Check return value of instance-to-package OF call
On PA-Semi firmware, the instance-to-package callback doesn't seem
to be implemented. We didn't check for error, however, thus
subsequently passed the -1 value returned into stdout_node to
thins like prom_getprop etc...
Thus caused the firmware to load values around 0 (physical) internally
as node structures. It somewhat "worked" as long as we had a NULL in the
right place (address 8) at the beginning of the kernel, we didn't "see"
the bug. But commit 5c0484e25ec03243d4c2f2d4416d4a13efc77f6a
"powerpc: Endian safe trampoline" changed the kernel entry point causing
that old bug to now cause a crash early during boot.
This fixes booting on PA-Semi board by properly checking the return
value from instance-to-package.
Ingo Molnar [Sun, 12 Jan 2014 16:39:47 +0000 (17:39 +0100)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf updates from Arnaldo Carvalho de Melo:
User visible changes:
Improvements:
* Support showing source code, asking for variables to be collected
at probe time and other 'perf probe' operations that use DWARF information.
This supports only binaries with debugging information at this time, detached
debuginfo (aka debuginfo packages) support should come in later patches.
(Masami Hiramatsu)
* Add a perf.data file header window in the 'perf report' TUI, associated
with the 'i' hotkey, providing a counterpart to the --header option in the
stdio UI. (Namhyung Kim)
* Guest related improvements to 'perf kvm', including allowing to
specify a directory with guest specific /proc information. (Dongsheng Yang)
* Print session information only if --stdio is given (Namhyung Kim)
Developer stuff:
Fixes:
* Get rid of a duplicate va_end() in error reporting (Namhyung Kim)
* If a hist entry doesn't have symbol information, compare it with its
address. Affects upcoming new feature (--cumulate) (Namhyung Kim)
Improvements:
* Make libtraceevent install target quieter (Jiri Olsa)
* Make tests/make output more compact (Jiri Olsa)
* Ignore generated files in feature-checks (Chunwei Chen)
New APIs:
* Introduce pevent_filter_strerror() in libtraceevent, similar in
purpose to libc's strerror() function. (Namhyung Kim)
Refactorings:
* Use perf_data_file methods to write output file in 'record' and
'inject' (Jiri Olsa)
* Use pr_*() functions where applicable in 'report' (Namhyumg Kim)
* Add 'machine' 'addr_location' struct to have full picture (machine,
thread, map, symbol, addr) for a (partially) resolved address, reducing
function signatures (Arnaldo Carvalho de Melo)
* Reduce code duplication in the histogram entry creation/insertion. (Arnaldo Carvalho de Melo)
* Auto allocate annotation histogram data structures, (Arnaldo Carvalho de Melo)
* No need to test against NULL before calling free, also set
freed memory in struct pointers to NULL, to help fixing use after
free bugs. (Arnaldo Carvalho de Melo>
Taras Kondratiuk [Fri, 10 Jan 2014 00:27:08 +0000 (01:27 +0100)]
ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
Kexec disables outer cache before jumping to reboot code, but it doesn't
flush it explicitly. Flush is done implicitly inside of l2x0_disable().
But some SoC's override default .disable handler and don't flush cache.
This may lead to a corrupted memory during Kexec reboot on these
platforms.
This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable()
handlers to make it consistent with default l2x0_disable().
Note, the crash came from stressing the deletion and reading of debugfs
files. I was not able to recreate this via normal files. But I'm not
sure they are safe. It may just be that the race window is much harder
to hit.
What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().
The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct. Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here. (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).
Note, this is a hack, but it fixes the problem at hand. A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback. But that is a major job to do, and requires a
lot of work. For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.
do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but
since shrink_huge_zero_page() can free the huge_zero_page, and we have
no hold of our own on it here (except where the fourth test holds
page_table_lock and has checked pmd_same), it's possible for it to
answer yes the first time, but no to the second or third test. Change
all those last three to tests for NULL page.
(Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
in https://lkml.org/lkml/2013/3/29/103. I believe that one is due
to the source page being split, and a tail page freed, while copy
is in progress; and not a problem without DEBUG_PAGEALLOC, since
the pmd_same check will prevent a miscopy from being made visible.)
When queue_mode is NULL_Q_MQ and null_blk is being removed,
blk_cleanup_queue() isn't called to cleanup queue, so the queue
allocated won't be freed.
This patch calls blk_cleanup_queue() for MQ to drain all pending
requests first and release the reference counter of queue kobject, then
blk_mq_free_queue() will be called in queue kobject's release handler
when queue kobject's reference counter drops to zero.
Yann Droneaud [Sun, 5 Jan 2014 20:36:33 +0000 (21:36 +0100)]
perf: Introduce a flag to enable close-on-exec in perf_event_open()
Unlike recent modern userspace API such as:
epoll_create1 (EPOLL_CLOEXEC), eventfd (EFD_CLOEXEC),
fanotify_init (FAN_CLOEXEC), inotify_init1 (IN_CLOEXEC),
signalfd (SFD_CLOEXEC), timerfd_create (TFD_CLOEXEC),
or the venerable general purpose open (O_CLOEXEC),
perf_event_open() syscall lack a flag to atomically set FD_CLOEXEC
(eg. close-on-exec) flag on file descriptor it returns to userspace.
The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow
perf_event_open() syscall to atomically set close-on-exec.
Having this flag will enable userspace to remove the file descriptor
from the list of file descriptors being inherited across exec,
without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the
associated race condition between the current thread and another
thread calling fork(2) then execve(2).
This patch fixes a problem with the initialization of the
struct perf_event active_entry field. It is defined inside
an anonymous union and was initialized in perf_event_alloc()
using INIT_LIST_HEAD(). However at that time, we do not know
whether the event is going to use active_entry or hlist_entry (SW).
Or at last, we don't want to make that determination there.
The problem is that hlist and list_head are not initialized
the same way. One is okay with NULL (from kzmalloc), the other
needs to pointers to point to self.
This patch resolves this problem by dropping the union.
This will avoid problems later on, if someone starts using
active_entry or hlist_entry without verifying that they
actually overlap. This also solves the initialization
problem.
John Stultz [Thu, 2 Jan 2014 23:11:14 +0000 (15:11 -0800)]
sched_clock: Disable seqlock lockdep usage in sched_clock()
Unfortunately the seqlock lockdep enablement can't be used
in sched_clock(), since the lockdep infrastructure eventually
calls into sched_clock(), which causes a deadlock.
Thus, this patch changes all generic sched_clock() usage
to use the raw_* methods.
Rik van Riel [Mon, 6 Jan 2014 11:39:12 +0000 (11:39 +0000)]
sched: Calculate effective load even if local weight is 0
Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
node") as the problem which had modified the behaviour of effective_load.
Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
on a preferred NUMA node") changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.
wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.
Linus Torvalds [Sun, 12 Jan 2014 03:15:52 +0000 (19:15 -0800)]
x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.
Taras Kondratiuk [Fri, 10 Jan 2014 00:36:00 +0000 (01:36 +0100)]
ARM: 7939/1: traps: fix opcode endianness when read from user memory
Currently code has an inverted logic: opcode from user memory
is swapped to a proper endianness only in case of read error.
While normally opcode should be swapped only if it was read
correctly from user memory.
Sudeep Holla [Wed, 8 Jan 2014 17:24:21 +0000 (18:24 +0100)]
ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous match
The MPIDR contains specific bitfields(MPIDR.Aff{2..0}) which uniquely
identify a CPU, in addition to some non-identifying information and
reserved bits. The ARM cpu binding defines the 'reg' property to only
contain the affinity bits, and any cpu nodes with other bits set in
their 'reg' entry are skipped.
As such it is not necessary to mask the phys_id with MPIDR_HWID_BITMASK,
and doing so could lead to matching erroneous CPU nodes in the device
tree. This patch removes the masking of the physical identifier.
Pull networking fixes from David Miller:
"Famouse last words: "final pull request" :-)
I'm sending this because Jason Wang's fixes are pretty important
1) Add missing per-cpu stats initialization to ip6_vti. Otherwise
lockdep spits out a call trace. From Li RongQing.
2) Fix NULL oops in wireless hwsim, from Javier Lopez
3) TIPC deferred packet queue unlink must NULL out skb->next to avoid
crashes. From Erik Hugne
4) Fix access to uninitialized buffer in nf_nat netfilter code, from
Daniel Borkmann
5) Fix lifetime of ipv6 loopback and SIT tunnel addresses, otherwise
they basically timeout immediately. From Hannes Frederic Sowa
6) Fix DMA unmapping of TSO packets in bnx2x driver, from Michal
Schmidt
7) Do not allow L2 forwarding offload via macvtap device, the way
things are now it will not end up being forwaded at all. From
Jason Wang
8) Fix transmit queue selection via ndo_dfwd_start_xmit(), fixing
things like applying NETIF_F_LLTX to the wrong device (!!) and
eliding the proper transmit watchdog handling
9) qlcnic driver was not updating tx statistics at all, from Manish
Chopra"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
qlcnic: Fix ethtool statistics length calculation
qlcnic: Fix bug in TX statistics
net: core: explicitly select a txq before doing l2 forwarding
macvlan: forbid L2 fowarding offload for macvtap
bnx2x: fix DMA unmapping of TSO split BDs
ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME
bnx2x: prevent WARN during driver unload
tipc: correctly unlink packets from deferred packet queue
ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c
netfilter: only warn once on wrong seqadj usage
netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
NFC: Fix target mode p2p link establishment
iwlwifi: add new devices for 7265 series
mac80211: move "bufferable MMPDU" check to fix AP mode scan
mac80211_hwsim: Fix NULL pointer dereference
Linus Torvalds [Fri, 10 Jan 2014 23:33:03 +0000 (06:33 +0700)]
Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
"Here we have a bugfix for an off-by-one in the remote attribute
verifier that results in a forced shutdown which you can hit with v5
superblock by creating a 64k xattr, and a fix for a missing
destroy_work_on_stack() in the allocation worker.
It's a bit late, but they are both fairly straightforward"
* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
xfs: fix off-by-one error in xfs_attr3_rmt_verify
Linus Torvalds [Fri, 10 Jan 2014 23:26:27 +0000 (06:26 +0700)]
Merge branch 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
Pull LED fix from Bryan Wu:
"Pali Rohár and Pavel Machek reported the LED of Nokia N900 doesn't
work with our latest 3.13-rc6 kernel. Milo fixed the regression here"
* 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: lp5521/5523: Remove duplicate mutex
Linus Torvalds [Fri, 10 Jan 2014 23:25:02 +0000 (06:25 +0700)]
Merge tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- Recent commits modifying the lists of C-states in the intel_idle
driver introduced bugs leading to crashes on some systems. Two fixes
from Jiang Liu.
- The ACPI AC driver should receive all types of notifications, but
recent change made it ignore some of them. Fix from Alexander Mezin.
- intel_pstate's validity checks for MSRs it depends on are not
sufficient to catch the lack of support in nested KVM setups, so they
are extended to cover that case. From Dirk Brandewie.
- NEC LZ750/LS has a botched up _BIX method in its ACPI tables, so our
ACPI battery driver needs a quirk for it. From Lan Tianyu.
- The tpm_ppi driver sometimes leaks memory allocated by
acpi_get_name(). Fix from Jiang Liu.
* tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: close avn_cstates array with correct marker
Revert "intel_idle: mark states tables with __initdata tag"
ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
ACPI / TPM: fix memory leak when walking ACPI namespace
ACPI / AC: change notification handler type to ACPI_ALL_NOTIFY
Linus Torvalds [Fri, 10 Jan 2014 23:23:57 +0000 (06:23 +0700)]
Merge tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes
Pull MFD fix from Samuel Ortiz:
"This is the 2nd MFD pull request for 3.13
It only contains one fix for the rtsx_pcr driver. Without it we see a
kernel panic on some machines, when resuming from suspend to RAM"
* tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
mfd: rtsx_pcr: Disable interrupts before cancelling delayed works
Milo Kim [Tue, 3 Dec 2013 01:21:44 +0000 (17:21 -0800)]
leds: lp5521/5523: Remove duplicate mutex
It can be a problem when a pattern is loaded via the firmware interface.
LP55xx common driver has already locked the mutex in 'lp55xx_firmware_loaded()'.
So it should be deleted.
On the other hand, locks are required in store_engine_load()
on updating program memory.
Jesse Barnes [Fri, 10 Jan 2014 21:13:09 +0000 (13:13 -0800)]
drm/i915/bdw: make sure south port interrupts are enabled properly v2
We were apparently relying on the defaults on BDW, which resulted in no
hotplug or AUX interrupts. So be sure to call the ibx_irq_preinstall to
enable all interrupts.
v2: use preinstall instead of redundant SDIER write
Chuansheng Liu [Tue, 7 Jan 2014 08:53:34 +0000 (16:53 +0800)]
xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().
Jie Liu [Wed, 1 Jan 2014 11:28:03 +0000 (19:28 +0800)]
xfs: fix off-by-one error in xfs_attr3_rmt_verify
With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:
This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.
Shahed Shaikh [Thu, 9 Jan 2014 17:41:05 +0000 (12:41 -0500)]
qlcnic: Fix ethtool statistics length calculation
o Consider number of Tx queues while calculating the length of
Tx statistics as part of ethtool stats.
o Calculate statistics lenght properly for 82xx and 83xx adapter