Russell King [Sun, 6 Apr 2014 15:17:39 +0000 (16:17 +0100)]
ARM: add missing system_misc.h include to process.c
arm_pm_restart(), arm_pm_idle() and soft_restart() are all declared in
system_misc.h, but this file is not included in process.c. Add this
missing include. Found via sparse:
arch/arm/kernel/process.c:98:6: warning: symbol 'soft_restart' was not declared. Should it be static?
arch/arm/kernel/process.c:127:6: warning: symbol 'arm_pm_restart' was not declared. Should it be static?
arch/arm/kernel/process.c:134:6: warning: symbol 'arm_pm_idle' was not declared. Should it be static?
Levente Kurusa [Fri, 7 Feb 2014 08:43:21 +0000 (09:43 +0100)]
backlight: core: Replace kfree with put_device
As per the comments on device_register, we shouldn't call kfree()
right after a device_register() failure. Instead call put_device(),
which in turn will call bl_device_release resulting in a kfree to the
full structure.
PCM and S/PDIF drivers referenced mach headers for a trivial
data structure. This caused build errors on multiplatform builds
as machine headers are not accessible from driver files. Move the data
structure definition to the driver header and remove the dependency.
While at it rename the structure to avoid multiple definition errors
as the same structure is also used by the platform code.
ASoC: fsl_sai: Fix Bit Clock Polarity configurations
The BCP bit in TCR4/RCR4 register rules as followings:
0 Bit clock is active high with drive outputs on rising edge
and sample inputs on falling edge.
1 Bit clock is active low with drive outputs on falling edge
and sample inputs on rising edge.
For all formats currently supported in the fsl_sai driver, they're exactly
sending data on the falling edge and sampling on the rising edge.
However, the driver clears this BCP bit for all of them which results click
noise when working with SGTL5000 and big noise with WM8962.
Thus this patch corrects the BCP settings for all the formats here to fix
the nosie issue.
* pm-cpufreq:
cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
cpufreq: create another field .flags in cpufreq_frequency_table
cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
cpufreq: don't print value of .driver_data from core
cpufreq: ia64: don't set .driver_data to index
cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
cpufreq: powernv: cpufreq driver for powernv platform
cpufreq: at32ap: don't declare local variable as static
cpufreq: loongson2_cpufreq: don't declare local variable as static
cpufreq: unicore32: fix typo issue for 'clk'
cpufreq: exynos: Disable on multiplatform build
Ming Lei [Sun, 6 Apr 2014 17:36:08 +0000 (01:36 +0800)]
arm, kvm: fix double lock on cpu_add_remove_lock
Commit 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
holds the lock before calling the two functions:
kvm_vgic_hyp_init()
kvm_timer_hyp_init()
and both the two functions are calling register_cpu_notifier()
to register cpu notifier, so cause double lock on cpu_add_remove_lock.
Considered that both two functions are only called inside
kvm_arch_init() with holding cpu_add_remove_lock, so simply use
__register_cpu_notifier() to fix the problem.
Fixes: 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration) Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Srivatsa S. Bhat <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
Paul Bolle [Mon, 7 Apr 2014 14:15:45 +0000 (16:15 +0200)]
spi: qup: Depend on ARCH_QCOM
Commit 8fc1b0f87d9f ("ARM: qcom: Split Qualcomm support into legacy and
multiplatform") removed Kconfig symbol ARCH_MSM_DT. But that commit
left one (optional) dependency on ARCH_MSM_DT untouched.
Three Kconfig symbols used to depend on ARCH_MSM_DT: ARCH_MSM8X60,
ARCH_MSM8960, and ARCH_MSM8974. These three symbols now depend on
ARCH_QCOM. So it appears this driver needs to depend on ARCH_QCOM too.
arm64: Fix DMA range invalidation for cache line unaligned buffers
If the buffer needing cache invalidation for inbound DMA does start or
end on a cache line aligned address, we need to use the non-destructive
clean&invalidate operation. This issue was introduced by commit 7363590d2c46 (arm64: Implement coherent DMA API based on swiotlb).
Daniel Lezcano [Mon, 17 Mar 2014 11:17:30 +0000 (12:17 +0100)]
cpuidle: sysfs: Export target residency information
From user space, there is no way to know the target residency for each idle
state. If we want to write tools to measure the accuracy of the idle state
selection from the governor, we need this info.
As the exit latency is exported through sysfs, exporting the target residency
in the same place makes sense.
ALSA: hda - Do not assign streams in reverse order
Currently stream numbers are assigned in reverse order.
Unfortunately commit 7546abfb8e1f9933b5 ("ALSA: hda - Increment
default stream numbers for AMD HDMI controllers") assumed this was not
the case (specifically, it had the "old cards had single device only"
=> "extra unused stream numbers do not matter" assumption), causing
non-working audio regressions for AMD Radeon HDMI users.
Change the stream numbers to be assigned in forward order.
The benefit is that regular audio playback will still work even if the
assumed stream count is too high, downside is that a too high stream
count may remain hidden.
Kailang Yang [Tue, 8 Apr 2014 07:19:49 +0000 (15:19 +0800)]
ALSA: hda/realtek - Change model name alias for ChromeOS
Chrome OS was use model name of alc283-dac-wcaps for loading model as default.
Change the model name to same as model name of Chrome OS for future support.
Patrick Titiano [Fri, 28 Feb 2014 13:10:04 +0000 (14:10 +0100)]
thermal: rcar-thermal: update thermal zone only when temperature changes
Avoid updating the thermal zone in case an IRQ was triggered but the
temperature didn't effectively change.
Note this is not a driver issue.
Below is a captured debug trace illustrating the purpose of this patch:
out of 8 thermal zone updates, only 2 are actually necessary.
[ 41.120000] rcar_thermal_work(): cctemp=25000
[ 41.120000] rcar_thermal_work(): nctemp=30000
[ 41.120000] rcar_thermal_work(): temp is now 30000C, update thermal zone
[ 58.990000] rcar_thermal_work(): cctemp=30000
[ 58.990000] rcar_thermal_work(): nctemp=30000
[ 58.990000] rcar_thermal_work(): same temp, do not update thermal zone
[ 59.290000] rcar_thermal_work(): cctemp=30000
[ 59.290000] rcar_thermal_work(): nctemp=30000
[ 59.290000] rcar_thermal_work(): same temp, do not update thermal zone
[ 59.590000] rcar_thermal_work(): cctemp=30000
[ 59.590000] rcar_thermal_work(): nctemp=30000
[ 59.590000] rcar_thermal_work(): same temp, do not update thermal zone
[ 59.890000] rcar_thermal_work(): cctemp=30000
[ 59.890000] rcar_thermal_work(): nctemp=30000
[ 59.890000] rcar_thermal_work(): same temp, do not update thermal zone
[ 60.190000] rcar_thermal_work(): cctemp=30000
[ 60.190000] rcar_thermal_work(): nctemp=30000
[ 60.190000] rcar_thermal_work(): same temp, do not update thermal zone
[ 60.490000] rcar_thermal_work(): cctemp=30000
[ 60.490000] rcar_thermal_work(): nctemp=30000
[ 60.490000] rcar_thermal_work(): same temp, do not update thermal zone
[ 60.790000] rcar_thermal_work(): cctemp=30000
[ 60.790000] rcar_thermal_work(): nctemp=35000
[ 60.790000] rcar_thermal_work(): temp is now 35000C, update thermal zone
I suspect this may be due to sensor sampling accuracy / fluctuation,
but no formal proof.
Anson Huang [Wed, 12 Feb 2014 10:06:35 +0000 (18:06 +0800)]
thermal: imx: update formula for thermal sensor
Thermal sensor used to need two calibration points which are
in fuse map to get a slope for converting thermal sensor's raw
data to real temperature in degree C. Due to the chip calibration
limitation, hardware team provides an universal formula to get
real temperature from internal thermal sensor raw data:
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
"various cleanups for ext2, ext3, udf, isofs, a documentation update
for quota, and a fix of a race in reiserfs readdir implementation"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: fix race in readdir
ext2: acl: remove unneeded include of linux/capability.h
ext3: explicitly remove inode from orphan list after failed direct io
fs/isofs/inode.c add __init to init_inodecache()
ext3: Speedup WB_SYNC_ALL pass
fs/quota/Kconfig: Update filesystems
ext3: Update outdated comment before ext3_ordered_writepage()
ext3: Update PF_MEMALLOC handling in ext3_write_inode()
ext2/3: use prandom_u32() instead of get_random_bytes()
ext3: remove an unneeded check in ext3_new_blocks()
ext3: remove unneeded check in ext3_ordered_writepage()
fs: Mark function as static in ext3/xattr_security.c
fs: Mark function as static in ext3/dir.c
fs: Mark function as static in ext2/xattr_security.c
ext3: Add __init macro to init_inodecache
ext2: Add __init macro to init_inodecache
udf: Add __init macro to init_inodecache
fs: udf: parse_options: blocksize check
Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild changes from Michal Marek:
- cleanups in the main Makefiles and Documentation/DocBook/Makefile
- make O=... directory is automatically created if needed
- mrproper/distclean removes the old include/linux/version.h to make
life easier when bisecting across the commit that moved the version.h
file
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: docbook: fix the include error when executing "make help"
kbuild: create a build directory automatically for out-of-tree build
kbuild: remove redundant '.*.cmd' pattern from make distclean
kbuild: move "quote" to Kbuild.include to be consistent
kbuild: docbook: use $(obj) and $(src) rather than specific path
kbuild: unconditionally clobber include/linux/version.h on distclean
kbuild: docbook: specify KERNELDOC dependency correctly
kbuild: docbook: include cmd files more simply
kbuild: specify build_docproc as a phony target
Merge tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC changes from Vineet Gupta:
- Support for external initrd from Noam
- Fix broken serial console in nsimosci Virtual Platform
- Reuse of ENTRY/END assembler macros across hand asm code
- Other minor fixes here and there
* tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [nsimosci] Unbork console
ARC: [nsimosci] Change .dts to use generic 8250 UART
ARC: [SMP] General Fixes
ARC: Remove unused DT template file
ARC: [clockevent] simplify timer ISR
ARC: [clockevent] can't be SoC specific
ARC: Remove ARC_HAS_COH_RTSC
ARC: switch to generic ENTRY/END assembler annotations
ARC: support external initrd
ARC: add uImage to .gitignore
ARC: [arcfpga] Fix __initconst data const-correctness
Russell King [Mon, 7 Apr 2014 11:00:17 +0000 (12:00 +0100)]
DRM: armada: fix corruption while loading cursors
Loading cursors to the LCD controller's SRAM can be corrupted when the
configured pixel clock is relatively slow. This seems to be caused
when we write back-to-back to the SRAM registers.
There doesn't appear to be any status register we can read to check
when an access has completed.
Inserting a dummy read between the writes appears to fix the problem.
Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things
* emailed patches from Andrew Morton <[email protected]>: (156 commits)
MAINTAINERS: update Intel C600 SAS driver maintainers
fs/ufs: remove unused ufs_super_block_third pointer
fs/ufs: remove unused ufs_super_block_second pointer
fs/ufs: remove unused ufs_super_block_first pointer
fs/ufs/super.c: add __init to init_inodecache()
doc/kernel-parameters.txt: add early_ioremap_debug
arm64: add early_ioremap support
arm64: initialize pgprot info earlier in boot
x86: use generic early_ioremap
mm: create generic early_ioremap() support
x86/mm: sparse warning fix for early_memremap
lglock: map to spinlock when !CONFIG_SMP
percpu: add preemption checks to __this_cpu ops
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
slub: use raw_cpu_inc for incrementing statistics
net: replace __this_cpu_inc in route.c with raw_cpu_inc
modules: use raw_cpu_write for initialization of per cpu refcount.
mm: use raw_cpu ops for determining current NUMA node
percpu: add raw_cpu_ops
slub: fix leak of 'name' in sysfs_slab_add
...
Pointer 'usb3' to struct ufs_super_block_third acquired via
ubh_get_usb_third() is never used in function
ufs_read_cylinder_structures(). Thus remove it.
Mark Salter [Mon, 7 Apr 2014 22:39:52 +0000 (15:39 -0700)]
arm64: add early_ioremap support
Add support for early IO or memory mappings which are needed before the
normal ioremap() is usable. This also adds fixmap support for permanent
fixed mappings such as that used by the earlyprintk device register
region.
Mark Salter [Mon, 7 Apr 2014 22:39:51 +0000 (15:39 -0700)]
arm64: initialize pgprot info earlier in boot
Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.
The new fixmap and early_ioremap support also needs to use these macros
before paging_init() is called. This patch moves the init_mem_pgprot()
call out of paging_init() and into setup_arch() so that pgprot_default
gets initialized in time for fixmap and early_ioremap.
Mark Salter [Mon, 7 Apr 2014 22:39:48 +0000 (15:39 -0700)]
mm: create generic early_ioremap() support
This patch creates a generic implementation of early_ioremap() support
based on the existing x86 implementation. early_ioremp() is useful for
early boot code which needs to temporarily map I/O or memory regions
before normal mapping functions such as ioremap() are available.
Some architectures have optional MMU. In the no-MMU case, the remap
functions simply return the passed in physical address and the unmap
functions do nothing.
Dave Young [Mon, 7 Apr 2014 22:39:46 +0000 (15:39 -0700)]
x86/mm: sparse warning fix for early_memremap
This patch series takes the common bits from the x86 early ioremap
implementation and creates a generic implementation which may be used by
other architectures. The early ioremap interfaces are intended for
situations where boot code needs to make temporary virtual mappings
before the normal ioremap interfaces are available. Typically, this
means before paging_init() has run.
This patch (of 6):
There's a lot of sparse warnings for code like below: void *a =
early_memremap(phys_addr, size);
early_memremap intend to map kernel memory with ioremap facility, the
return pointer should be a kernel ram pointer instead of iomem one.
For making the function clearer and supressing sparse warnings this patch
do below two things:
1. cast to (__force void *) for the return value of early_memremap
2. add early_memunmap function and pass (__force void __iomem *) to iounmap
From Boris:
"Ingo told me yesterday, it makes sense too. I'd guess we can try it.
FWIW, all callers of early_memremap use the memory they get remapped
as normal memory so we should be safe"
We define a check function in order to avoid trouble with the include
files. Then the higher level __this_cpu macros are modified to invoke
the preemption check.
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
vm counters are allowed to be racy. Use raw_cpu_ops to avoid the
local_irq_disable overhead and to avoid preemption checks which will be
added to the __this_cpu operations.
Statistics are not critical to the operation of the allocation but
should also not cause too much overhead.
When __this_cpu_inc is altered to check if preemption is disabled this
triggers. Use raw_cpu_inc to avoid the checks. Using this_cpu_ops may
cause interrupt disable/enable sequences on various arches which may
significantly impact allocator performance.
net: replace __this_cpu_inc in route.c with raw_cpu_inc
The RT_CACHE_STAT_INC macro triggers the new preemption checks
for __this_cpu ops.
I do not see any other synchronization that would allow the use of a
__this_cpu operation here however in commit dbd2915ce87e ("[IPV4]:
RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
raw_smp_processor_id() here because "we do not care" about races. In
the past we agreed that the price of disabling interrupts here to get
consistent counters would be too high. These counters may be inaccurate
due to race conditions.
The use of __this_cpu op improves the situation already from what commit dbd2915ce87e did since the single instruction emitted on x86 does not
allow the race to occur anymore. However, non x86 platforms could still
experience a race here.
modules: use raw_cpu_write for initialization of per cpu refcount.
The initialization of a structure is not subject to synchronization.
The use of __this_cpu would trigger a false positive with the additional
preemption checks for __this_cpu ops.
So simply disable the check through the use of raw_cpu ops.
Therefore we need to use raw_cpu_read here to avoid false postives.
Note that this issue has been discussed in prior years. If the process
changes nodes after retrieving the current numa node then that is
acceptable since most uses of numa_node etc are for optimization and not
for correctness.
There were suggestions to implement a raw_numa_node_id in order to do
preempt checks for numa_node_id as well. But I think we better defer
that to another patch since that would mean investigating how
numa_node_id() is used throughout the kernel which would increase the
scope of this patchset significantly. After all preemption was never
checked before when numa_node_id() was used.
The kernel has never been audited to ensure that this_cpu operations are
consistently used throughout the kernel. The code generated in many
places can be improved through the use of this_cpu operations (which
uses a segment register for relocation of per cpu offsets instead of
performing address calculations).
The patch set also addresses various consistency issues in general with
the per cpu macros.
A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
because checks are skipped. This is typically shown through a raw_
prefix. So this patch set changes the places where __this_cpu_ptr()
is used to raw_cpu_ptr().
B. There has been the long term wish by some that __this_cpu operations
would check for preemption. However, there are cases where preemption
checks need to be skipped. This patch set adds raw_cpu operations that
do not check for preemption and then adds preemption checks to the
__this_cpu operations.
C. The use of __get_cpu_var is always a reference to a percpu variable
that can also be handled via a this_cpu operation. This patch set
replaces all uses of __get_cpu_var with this_cpu operations.
D. We can then use this_cpu RMW operations in various places replacing
sequences of instructions by a single one.
E. The use of this_cpu operations throughout will allow other arches than
x86 to implement optimized references and RMV operations to work with
per cpu local data.
F. The use of this_cpu operations opens up the possibility to
further optimize code that relies on synchronization through
per cpu data.
The patch set works in a couple of stages:
I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
Also converts the existing __this_cpu_xx_# primitive in the x86
code to raw_cpu_xx_#.
II. Patch 2-4 use the raw_cpu operations in places that would give
us false positives once they are enabled.
III. Patch 5 adds preemption checks to __this_cpu operations to allow
checking if preemption is properly disabled when these functions
are used.
IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
with this_cpu_ptr. They do not depend on any changes to the percpu
code. No preemption tests are skipped if they are applied.
V. Patches 21-46 are conversion patches that use this_cpu operations
in various kernel subsystems/drivers or arch code.
VI. Patches 47/48 (not included in this series) remove no longer used
functions (__this_cpu_ptr and __get_cpu_var). These should only be
applied after all the conversion patches have made it and after we
have done additional passes through the kernel to ensure that none of
the uses of these functions remain.
This patch (of 46):
The patches following this one will add preemption checks to __this_cpu
ops so we need to have an alternative way to use this_cpu operations
without preemption checks.
raw_cpu_ops will be the basis for all other ops since these will be the
operations that do not implement any checks.
Primitive operations are renamed by this patch from __this_cpu_xxx to
raw_cpu_xxxx.
Also change the uses of the x86 percpu primitives in preempt.h.
These depend directly on asm/percpu.h (header #include nesting issue).
Dave Jones [Mon, 7 Apr 2014 22:39:32 +0000 (15:39 -0700)]
slub: fix leak of 'name' in sysfs_slab_add
The failure paths of sysfs_slab_add don't release the allocation of
'name' made by create_unique_id() a few lines above the context of the
diff below. Create a common exit path to make it more obvious what
needs freeing.
Currently, we try to arrange sysfs entries for memcg caches in the same
manner as for global caches. Apart from turning /sys/kernel/slab into a
mess when there are a lot of kmem-active memcgs created, it actually
does not work properly - we won't create more than one link to a memcg
cache in case its parent is merged with another cache. For instance, if
A is a root cache merged with another root cache B, we will have the
following sysfs setup:
X
A -> X
B -> X
where X is some unique id (see create_unique_id()). Now if memcgs M and
N start to allocate from cache A (or B, which is the same), we will get:
X
X:M
X:N
A -> X
B -> X
A:M -> X:M
A:N -> X:N
Since B is an alias for A, we won't get entries B:M and B:N, which is
confusing.
It is more logical to have entries for memcg caches under the
corresponding root cache's sysfs directory. This would allow us to keep
sysfs layout clean, and avoid such inconsistencies like one described
above.
This patch does the trick. It creates a "cgroup" kset in each root
cache kobject to keep its children caches there.
memcg, slab: do not destroy children caches if parent has aliases
Currently we destroy children caches at the very beginning of
kmem_cache_destroy(). This is wrong, because the root cache will not
necessarily be destroyed in the end - if it has aliases (refcount > 0),
kmem_cache_destroy() will simply decrement its refcount and return. In
this case, at best we will get a bunch of warnings in dmesg, like this
one:
kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
CPU: 1 PID: 7139 Comm: modprobe Tainted: G B W 3.13.0+ #117
Call Trace:
dump_stack+0x49/0x5b
kmem_cache_destroy+0xdf/0xf0
kmem_cache_destroy_memcg_children+0x97/0xc0
kmem_cache_destroy+0xf/0xf0
xfs_mru_cache_uninit+0x21/0x30 [xfs]
exit_xfs_fs+0x2e/0xc44 [xfs]
SyS_delete_module+0x198/0x1f0
system_call_fastpath+0x16/0x1b
At worst - if kmem_cache_destroy() will race with an allocation from a
memcg cache - the kernel will panic.
This patch fixes this by moving children caches destruction after the
check if the cache has aliases. Plus, it forbids destroying a root
cache if it still has children caches, because each children cache keeps
a reference to its parent.
memcg, slab: unregister cache from memcg before starting to destroy it
Currently, memcg_unregister_cache(), which deletes the cache being
destroyed from the memcg_slab_caches list, is called after
__kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to
destroy the cache.
As a result, one can access a partially destroyed cache while traversing
a memcg_slab_caches list, which can have deadly consequences (for
instance, cache_show() called for each cache on a memcg_slab_caches list
from mem_cgroup_slabinfo_read() will dereference pointers to already
freed data).
To fix this, let's move memcg_unregister_cache() before the cache
destruction process beginning, issuing memcg_register_cache() on failure.
memcg, slab: separate memcg vs root cache creation paths
Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
memcg-only and except-for-memcg calls. To clean this up, let's move the
code responsible for memcg cache creation to a separate function.
This patch cleans up the memcg cache creation path as follows:
- Move memcg cache name creation to a separate function to be called
from kmem_cache_create_memcg(). This allows us to get rid of the mutex
protecting the temporary buffer used for the name formatting, because
the whole cache creation path is protected by the slab_mutex.
- Get rid of memcg_create_kmem_cache(). This function serves as a proxy
to kmem_cache_create_memcg(). After separating the cache name creation
path, it would be reduced to a function call, so let's inline it.
When a kmem cache is created (kmem_cache_create_memcg()), we first try to
find a compatible cache that already exists and can handle requests from
the new cache, i.e. has the same object size, alignment, ctor, etc. If
there is such a cache, we do not create any new caches, instead we simply
increment the refcount of the cache found and return it.
Currently we do this procedure not only when creating root caches, but
also for memcg caches. However, there is no point in that, because, as
every memcg cache has exactly the same parameters as its parent and cache
merging cannot be turned off in runtime (only on boot by passing
"slub_nomerge"), the root caches of any two potentially mergeable memcg
caches should be merged already, i.e. it must be the same root cache, and
therefore we couldn't even get to the memcg cache creation, because it
already exists.
The only exception is boot caches - they are explicitly forbidden to be
merged by setting their refcount to -1. There are currently only two of
them - kmem_cache and kmem_cache_node, which are used in slab internals (I
do not count kmalloc caches as their refcount is set to 1 immediately
after creation). Since they are prevented from merging preliminary I
guess we should avoid to merge their children too.
So let's remove the useless code responsible for merging memcg caches.
kernel: use macros from compiler.h instead of __attribute__((...))
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs. Eg: __weak for
__attribute__((weak)). I've replaced all instances of gcc attributes
with the right macro in the kernel subsystem.
If the renamed symbol is defined lib/iomap.c implements ioport_map and
ioport_unmap and currently (nearly) all platforms define the port
accessor functions outb/inb and friend unconditionally. So
HAS_IOPORT_MAP is the better name for this.
Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.
The motivation for this change is to reintroduce a symbol HAS_IOPORT
that signals if outb/int et al are available. I will address that at
least one merge window later though to keep surprises to a minimum and
catch new introductions of (HAS|NO)_IOPORT.
Daniel M. Weeks [Mon, 7 Apr 2014 22:39:16 +0000 (15:39 -0700)]
initramfs: debug detected compression method
This can greatly aid in narrowing down the real source of initramfs
problems such as failures related to the compression of the in-kernel
initramfs when an external initramfs is in use as well. Existing errors
are ambiguous as to which initramfs is a problem and why.
x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUG
This ensures that BUG() always has a definition that causes a trap (via
an undefined instruction), and that the compiler still recognizes the
code following BUG() as unreachable, avoiding warnings that would
otherwise appear (such as on non-void functions that don't return a
value after BUG()).
In addition to saving a few bytes over the generic infinite-loop
implementation, this implementation traps rather than looping, which
potentially allows for better error-recovery behavior (such as by
rebooting).
When !CONFIG_BUG and !HAVE_ARCH_BUG, define the generic BUG() as an
infinite loop rather than a no-op. This avoids undefined behavior if
execution ever actually reaches BUG(), and avoids warnings about code
after BUG() (such as on non-void functions calling BUG() and then not
returning).
bug: when !CONFIG_BUG, simplify WARN_ON_ONCE and family
When !CONFIG_BUG, WARN_ON and family become simple passthroughs of their
condition argument; however, WARN_ON_ONCE and family still have conditions
and a boolean to detect one-time invocation, even though the warning
they'd emit doesn't exist. Make the existing definitions conditional on
CONFIG_BUG, and add definitions for !CONFIG_BUG that map to the
passthrough versions of WARN and WARN_ON.
This saves 4.4k on a minimized configuration (smaller than allnoconfig),
and 20.6k with defconfig plus CONFIG_BUG=n.
kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT
"make allnoconfig" exists to ease testing of minimal configurations.
Documentation/SubmitChecklist includes a note to test with allnoconfig.
This helps catch missing dependencies on common-but-not-required
functionality, which might otherwise go unnoticed.
However, allnoconfig still leaves many symbols enabled, because they're
hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig
still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't
typically get build-tested with those disabled.
To address this, introduce a new Kconfig option "allnoconfig_y", used on
symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED
(which then selects CONFIG_EXPERT). allnoconfig will then disable all the
symbols hidden behind those.
ia64: select CONFIG_TTY for use of tty_write_message in unaligned
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
arch/ia64/kernel/unaligned.c uses tty_write_message to print an
unaligned access exception to the TTY of the current user process.
Enable TTY to prevent a build error.
Minimal fix, on the basis that few people on ia64 will care deeply about
kernel size enough to turn off TTY. Ideally, I'd instead suggest
dropping the tty_write_message entirely, and just leaving the printk.
Bonus: no need to sprintf first.
Currently, booting without initrd specified on 80x25 screen gives a call
trace followed by atkbd : Spurious ACK. Original message ("VFS: Unable
to mount root fs") is not available. Of course this could happen in
other situations...
This patch displays panic reason after call trace which could help lot
of people even if it's not the very last line on screen.
Also, convert all panic.c printk(KERN_EMERG to pr_emerg(
affs: add mount option to avoid filename truncates
Normal behavior for filenames exceeding specific filesystem limits is to
refuse operation.
AFFS standard name length being only 30 characters against 255 for usual
Linux filesystems, original implementation does filename truncate by
default with a define value AFFS_NO_TRUNCATE which can be enabled but
needs module compilation.
This patch adds 'nofilenametruncate' mount option so that user can
easily activate that feature and avoid a lot of problems (eg overwrite
files ...)
fs/affs/dir.c: unlock/brelse dir on failure + code clean-up
Commit 0edf977d2ae3 ("[readdir] convert affs") returns directly -EIO
without unlocking dir inode and releasing dir bh when second affs_bread
sequence fails. This patch restores initial behaviour. It also fixes
pr_debug and affs_error to fit in 80 columns + removes reference to
filldir (replaced by dir_emit in the commit above).
Liu Hua [Mon, 7 Apr 2014 22:38:57 +0000 (15:38 -0700)]
hung_task: check the value of "sysctl_hung_task_timeout_sec"
As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :
rapidio: rework device hierarchy and introduce mport class of devices
This patch removes an artificial RapidIO bus root device and establishes
actual device hierarchy by providing reference to real parent devices.
It also introduces device class for RapidIO controller devices (on-chip
or an eternal bridge, known as "mport").
Existing implementation was sufficient for SoC-based platforms that have
a single RapidIO controller. With introduction of devices using
multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme
is very limiting or does not work at all. The implemented changes allow
to properly reference platform's local RapidIO mport devices and provide
device details needed for upper layers.
This change to RapidIO device hierarchy does not break any known
existing kernel or user space interfaces.
drivers/rapidio/devices/tsi721_dma.c: optimize use of BDMA descriptors
Combine SG entries describing single contiguous memory block into one
Tsi721 BDMA descriptor. This reduces number of hardware descriptors
required for large data transfers and improves performance on the PCIe
side by reducing number of descriptor fetch requests.
Replace rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)
The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure. And in the
case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL)
WANG Chao [Mon, 7 Apr 2014 22:38:51 +0000 (15:38 -0700)]
vmcore: continue vmcore initialization if PT_NOTE is found empty
Currently when an empty PT_NOTE is detected, vmcore initialization
fails. It sounds too harsh. Because PT_NOTE could be empty, for
example, one offlined a cpu but never restarted kdump service, and after
crash, PT_NOTE program header is there but no data contains. It's
better to warn about the empty PT_NOTE and continue to initialise
vmcore.
And ultimately the multiple PT_NOTE are merged into a single one, all
empty PT_NOTE are discarded naturally during the merge. So empty
PT_NOTE is not visible to user space and vmcore is as good as expected.
wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process
Even if the main thread is dead the process still can stop/continue.
However, if the leader is ptraced wait_consider_task(ptrace => false)
always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
WCONTINUED can never work for the natural parent in this case.
Move the "A zombie ptracee is only visible to its ptracer" check into the
"if (!delay_group_leader(p))" block. ->notask_error is cleared by the
"fall through" code below.
This depends on the previous change, wait_task_stopped/continued must be
avoided if !delay_group_leader() and the tracer is ->real_parent.
Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
is already dead (single-threaded or not). If it is traced by another task
then the "stopped" state is fine until the debugger detaches and reveals a
zombie state.
Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped() is
never called.
Note: this doesn't fix all problems with a zombie delay_group_leader(),
WCONTINUED | WEXITED check is not exactly right. debugger can't assume it
will be notified if another thread reaps the whole thread group.
it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie
child. This is because wait_consider_task(ptrace => 0) sees p->ptrace and
cleares ->notask_error assuming that the debugger should detach and notify
us.
Change wait_consider_task(ptrace => 0) to pretend that ptrace == T if the
child is traced by us. This really simplifies the logic and allows us to
do more fixes, see the next changes. This also hides the unwanted group
stop state automatically, we can remove another ptrace_reparented() check.
Unfortunately, this adds the following behavioural changes:
1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap
a natural child if it is traced by the caller's sub-thread.
Hopefully nobody will ever notice this change, and I think
that nobody should rely on this behaviour anyway.
2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if
it is real parent.
While this change comes as a side effect, I think it is good
by itself. The group continued state can not be consumed by
another process in this case, it doesn't depend on ptrace,
it doesn't make sense to hide it from real parent.
Perhaps we should add the thread_group_leader() check before
wait_task_continued()? May be, but this shouldn't depend on
ptrace_reparented().
wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space
get_task_state() uses the most significant bit to report the state to
user-space, this means that EXIT_ZOMBIE->EXIT_TRACE->EXIT_DEAD transition
can be noticed via /proc as Z -> X -> Z change. Note that this was
possible even before EXIT_TRACE was introduced.
This is not really bad but imho it make sense to hide EXIT_TRACE from
user-space completely. So the patch simply swaps EXIT_ZOMBIE and
EXIT_DEAD, this way EXIT_TRACE will be seen as EXIT_ZOMBIE by user-space.
wait: use EXIT_TRACE only if thread_group_leader(zombie)
wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if
ptrace_reparented(). This is suboptimal and a bit confusing: we do not
need do_notify_parent(p) if !thread_group_leader(p) and in this case we
also do not need ptrace_unlink(), we can rely on ptrace_release_task().
Change wait_task_zombie() to check thread_group_leader() along with
ptrace_reparented() and simplify the final p->exit_state transition.
wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition
wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock. If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race". wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else. So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable. This was fixed by
the previous commit, but it was the temporary hack.
1. Add the new exit_state, EXIT_TRACE. It means that the task is the
traced zombie, debugger is going to detach and notify its natural
parent.
This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
can avoid the changes in proc/kgdb code, get_task_state() still
reports "X (dead)" in this case.
Note: with or without this change userspace can see Z -> X -> Z
transition. Not really bad, but probably makes sense to fix.
2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
if we need to notify the ->real_parent.
3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
is always the final state we can safely ignore such a task.
4. Change wait_consider_task() to check EXIT_TRACE separately and kill
the racy and no longer needed ptrace_reparented() case.
If ptrace == T an EXIT_TRACE thread should be simply ignored, the
owner of this state is going to ptrace_unlink() this task. We can
pretend that it was already removed from ->ptraced list.
Otherwise we should skip this thread too but clear ->notask_error,
we must be the natural parent and debugger is going to untrace and
notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace"
even if the task was already untraced.
wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race
wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock. If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race". wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else. So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.
Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.