Davidlohr Bueso [Thu, 11 Nov 2010 22:05:24 +0000 (14:05 -0800)]
drivers/leds/leds-gpio.c: properly initialize return value
In the event that none of the configs are set (CONFIG_LEDS_GPIO_PLATFORM,
CONFIG_LEDS_GPIO_OF), we will return a bogus
value when initializing the module.
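The failure mode can be sketched as follows. This is a minimal user-space illustration, not the driver's code: the two register_* helpers are hypothetical stand-ins, and the point is only that the return variable needs a defined value when neither #ifdef branch is compiled in.

```c
/* Hypothetical stand-ins for the two driver registration paths. */
int register_platform_driver_sketch(void) { return 0; }
int register_of_driver_sketch(void) { return 0; }

/* Sketch of a module init routine whose body depends on config options. */
int gpio_led_init_sketch(void)
{
	int ret = 0;	/* defined result even when no option is enabled */

#ifdef CONFIG_LEDS_GPIO_PLATFORM
	ret = register_platform_driver_sketch();
#endif
#ifdef CONFIG_LEDS_GPIO_OF
	if (ret == 0)
		ret = register_of_driver_sketch();
#endif
	return ret;
}
```

Without the initializer, building with neither option defined would return whatever happened to be in ret.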
Samu Onkalo [Thu, 11 Nov 2010 22:05:22 +0000 (14:05 -0800)]
leds: driver for National Semiconductors LP5523 chip
The LP5523 chip is a nine-channel LED driver with programmable engines. The
driver provides support for that chip for direct access via the LED class or
via programmable engines.
Samu Onkalo [Thu, 11 Nov 2010 22:05:22 +0000 (14:05 -0800)]
leds: driver for National Semiconductor LP5521 chip
This patchset provides support for LP5521 and LP5523 LED driver chips from
National Semiconductor. Both drivers support programmable engines and
naturally LED class features.
Documentation is provided as a part of the patchset. I created "leds"
subdirectory under Documentation. Perhaps the rest of the leds*
documentation should be moved there.
Datasheets are freely available at National Semiconductor www pages.
This patch:
The LP5521 chip is a three-channel LED driver with programmable engines. The
driver provides support for that chip for direct access via the LED class or
via programmable engines.
Johannes Berg [Thu, 11 Nov 2010 22:05:21 +0000 (14:05 -0800)]
led-class: always implement blinking
Currently, blinking LEDs can be awkward because it is not guaranteed that
all LEDs implement blinking. The trigger that wants it to blink then
needs to implement its own timer solution.
Rather than require that, add led_blink_set() API that triggers can use.
This function will attempt to use hw blinking, but if that fails
implements a timer for it. To stop blinking again, brightness_set() also
needs to be wrapped into API that will stop the software blink.
As a result of this, the timer trigger becomes a very trivial one, and
hopefully we can finally see triggers using blinking as well because it's
always easy to use.
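The fallback described above might look roughly like this. It is a simplified user-space sketch: struct led_sketch and both functions are hypothetical and are not the kernel's struct led_classdev or its API, and the "software timer" is reduced to a flag.

```c
#include <stddef.h>

/* Hypothetical, simplified LED descriptor. */
struct led_sketch {
	/* Optional hardware blink hook; returns 0 on success. */
	int (*blink_set)(struct led_sketch *led,
			 unsigned long on_ms, unsigned long off_ms);
	int hw_blinking;
	int sw_blinking;	/* stands in for an armed software timer */
	unsigned long on_ms, off_ms;
	int brightness;
};

/* Try hardware blinking first; fall back to software blinking. */
void led_blink_set_sketch(struct led_sketch *led,
			  unsigned long on_ms, unsigned long off_ms)
{
	if (led->blink_set != NULL && led->blink_set(led, on_ms, off_ms) == 0) {
		led->hw_blinking = 1;
		return;
	}
	led->on_ms = on_ms;
	led->off_ms = off_ms;
	led->sw_blinking = 1;
}

/* Wrapped brightness setter: stops any software blink before applying. */
void led_brightness_set_sketch(struct led_sketch *led, int value)
{
	led->sw_blinking = 0;
	led->brightness = value;
}
```

A trigger then calls only led_blink_set_sketch() and never needs to know whether the hardware can blink on its own.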
Nick Piggin [Thu, 11 Nov 2010 22:05:19 +0000 (14:05 -0800)]
radix-tree: fix RCU bug
Salman Qazi describes the following radix-tree bug:
In the following case, we can get a deadlock:
0. The radix tree contains two items, one has the index 0.
1. The reader (in this case find_get_pages) takes the rcu_read_lock.
2. The reader acquires slot(s) for item(s) including the index 0 item.
3. The non-zero index item is deleted, and as a consequence the other item is
moved to the root of the tree. The place where it used to be is queued for
deletion after the readers finish.
3b. The zero item is deleted, removing it from the direct slot, it remains in
the rcu-delayed indirect node.
4. The reader looks at the index 0 slot, and finds that the page has 0 ref
count
5. The reader looks at it again, hoping that the item will either be freed or
the ref count will increase. This never happens, as the slot it is looking
at will never be updated. Also, this slot can never be reclaimed because
the reader is holding rcu_read_lock and is in an infinite loop.
The fix is to re-use the same "indirect" pointer case that requires a slot
lookup retry into a general "retry the lookup" bit.
Dan Rosenberg [Thu, 11 Nov 2010 22:05:18 +0000 (14:05 -0800)]
Restrict unprivileged access to kernel syslog
The kernel syslog contains debugging information that is often useful
during exploitation of other vulnerabilities, such as kernel heap
addresses. Rather than futilely attempt to sanitize hundreds (or
thousands) of printk statements and simultaneously cripple useful
debugging functionality, it is far simpler to create an option that
prevents unprivileged users from reading the syslog.
This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
dmesg_restrict sysctl. When set to "0", the default, no restrictions are
enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
kernel syslog via dmesg(8) or other mechanisms.
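The resulting permission check can be sketched like this. The variable and function names are hypothetical, and the capability test is passed in as a flag rather than queried from the kernel.

```c
#include <errno.h>

/* Stand-in for the dmesg_restrict sysctl value; 0 is the default. */
int dmesg_restrict_sketch = 0;

/*
 * Returns 0 if the caller may read the syslog, -EPERM otherwise.
 * has_cap_sys_admin is nonzero when the caller holds CAP_SYS_ADMIN.
 */
int syslog_permission_sketch(int has_cap_sys_admin)
{
	if (dmesg_restrict_sketch && !has_cap_sys_admin)
		return -EPERM;
	return 0;
}
```

With the default value of 0 every caller passes; with 1 only privileged callers do.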
David Rientjes [Thu, 11 Nov 2010 22:05:18 +0000 (14:05 -0800)]
oom: document obsolete oom_adj tunable
/proc/pid/oom_adj was deprecated in August 2010 with the introduction of
the new oom killer heuristic.
This patch copies the Documentation/feature-removal-schedule.txt entry for
this tunable to the Documentation/ABI/obsolete directory so nobody misses
it.
Shaohua Li [Thu, 11 Nov 2010 22:05:17 +0000 (14:05 -0800)]
vmscan: avoid setting zone congested if no page dirty
nr_dirty and nr_congested are increased only when the page is dirty. So
if all pages are clean, both of them will be zero. In this case, we should
not mark the zone congested.
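A sketch of the corrected condition, assuming the earlier test compared the two counters for equality (an assumption, since the patch itself is not shown here). The function name is hypothetical.

```c
#include <stdbool.h>

/*
 * Mark the zone congested only if at least one dirty page was seen and
 * every dirty page was also congested. With no dirty pages both counters
 * are zero, and 0 == 0 alone must not count as congestion.
 */
bool should_mark_zone_congested(unsigned long nr_dirty,
				unsigned long nr_congested)
{
	return nr_dirty != 0 && nr_dirty == nr_congested;
}
```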
Ken Chen [Thu, 11 Nov 2010 22:05:16 +0000 (14:05 -0800)]
latencytop: fix per task accumulator
Per task latencytop accumulator prematurely terminates due to erroneous
placement of latency_record_count. It should be incremented whenever a
new record is allocated instead of being incremented on every latencytop event.
Also fix search iterator to only search known record events instead of
blindly searching all pre-allocated space.
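Both fixes can be illustrated with a small accumulator. This is a user-space sketch: the struct layout is hypothetical, and records are keyed by a string here purely for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 32

struct record_sketch {
	char reason[32];	/* hypothetical key identifying the event */
	unsigned int count;
	unsigned long time;
};

struct task_sketch {
	struct record_sketch rec[MAX_RECORDS];
	int record_count;	/* number of records actually in use */
};

void account_latency_sketch(struct task_sketch *t, const char *reason,
			    unsigned long usecs)
{
	int i;

	/* Search only the records known to be in use. */
	for (i = 0; i < t->record_count; i++) {
		if (strcmp(t->rec[i].reason, reason) == 0) {
			t->rec[i].count++;
			t->rec[i].time += usecs;
			return;
		}
	}

	if (t->record_count >= MAX_RECORDS)
		return;		/* table full, drop the event */

	/* The counter advances only when a new record is allocated. */
	i = t->record_count++;
	snprintf(t->rec[i].reason, sizeof(t->rec[i].reason), "%s", reason);
	t->rec[i].count = 1;
	t->rec[i].time = usecs;
}
```

If the counter were bumped on every event instead, it would reach MAX_RECORDS after a handful of repeated events and accumulation would stop early.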
Dave Hansen [Thu, 11 Nov 2010 22:05:15 +0000 (14:05 -0800)]
mm/vfs: revalidate page->mapping in do_generic_file_read()
70 hours into some stress tests of a 2.6.32-based enterprise kernel, we
ran into a NULL dereference in here:
int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
unsigned long from)
{
----> struct inode *inode = page->mapping->host;
It looks like page->mapping was the culprit. (xmon trace is below).
After closer examination, I realized that do_generic_file_read() does a
find_get_page(), and eventually locks the page before calling
block_is_partially_uptodate(). However, it doesn't revalidate the
page->mapping after the page is locked. So, there's a small window
between the find_get_page() and ->is_partially_uptodate() where the page
could get truncated and page->mapping cleared.
We _have_ a reference, so it can't get reclaimed, but it certainly
can be truncated.
I think the correct thing is to check page->mapping after the
trylock_page(), and jump out if it got truncated. This patch has been
running in the test environment for a month or so now, and we have not
seen this bug pop up again.
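The idea of revalidating after taking the lock can be sketched in user space. The types and the function are hypothetical stand-ins, and the page lock is reduced to a flag.

```c
#include <stddef.h>

struct mapping_sketch { int id; };

struct page_sketch {
	struct mapping_sketch *mapping;	/* NULL once the page is truncated */
	int locked;
};

/*
 * Try to lock the page, then re-check page->mapping under the lock.
 * Returns 1 if the page is locked and still attached to a mapping,
 * 0 if the lock was unavailable or the page was truncated meanwhile.
 */
int lock_and_revalidate_sketch(struct page_sketch *page)
{
	if (page->locked)
		return 0;		/* trylock failed */
	page->locked = 1;

	if (page->mapping == NULL) {	/* truncated before we got the lock */
		page->locked = 0;
		return 0;
	}
	return 1;
}
```

Only when this returns 1 is it safe to dereference page->mapping.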
kernel/range.c: fix clean_sort_range() for the case of full array
clean_sort_range() should return a number of nonempty elements of range
array, but if the array is full clean_sort_range() returns 0.
The problem is that the number of nonempty elements is evaluated by
finding the first empty element of the array. If there is no such element
it returns an initial value of local variable nr_range that is zero.
The fix is trivial: it changes initial value of nr_range to size of the
array.
The bug can lead to loss of information regarding all ranges, since
typically returned value of clean_sort_range() is considered as an actual
number of ranges in the array after a series of add/subtract operations.
Found by Analytical Verification project of Linux Verification Center
(linuxtesting.org), thanks to Alexander Kolosov.
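The counting step can be sketched as below. This is a simplified illustration that assumes the array is already compacted and that an entry with end == 0 is empty; the struct and function names are hypothetical.

```c
struct range_sketch {
	unsigned long long start, end;
};

/*
 * Count the leading non-empty entries. nr_range starts at the array
 * size so that a full array, which has no empty slot to find, reports
 * every element instead of zero.
 */
int count_used_ranges(const struct range_sketch *range, int az)
{
	int i, nr_range = az;

	for (i = 0; i < az; i++) {
		if (!range[i].end) {
			nr_range = i;
			break;
		}
	}
	return nr_range;
}
```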
Dan Carpenter [Thu, 11 Nov 2010 22:05:13 +0000 (14:05 -0800)]
drivers/misc/bh1770glc.c: error handling in bh1770_power_state_store()
There was a signedness bug so "ret" was never less than zero and that
breaks the error handling. Also, the original code would overwrite
ret, so the result is still negative but it's a bogus number instead of the
correct error code.
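A sketch of the corrected pattern, with hypothetical names: the helper's result is kept in a signed variable and returned unchanged on failure.

```c
#include <errno.h>

/* Hypothetical helper that returns 0 or a negative errno value. */
int set_power_sketch(int fail)
{
	return fail ? -EIO : 0;
}

/*
 * Sketch of a sysfs-style store routine. If ret were an unsigned type,
 * "ret < 0" could never be true and the error would go unnoticed.
 */
long power_state_store_sketch(int fail, unsigned long count)
{
	int ret;	/* signed, so a negative error code is detectable */

	ret = set_power_sketch(fail);
	if (ret < 0)
		return ret;	/* propagate the original error code */

	return (long)count;	/* success: report the bytes consumed */
}
```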
Dan Carpenter [Thu, 11 Nov 2010 22:05:12 +0000 (14:05 -0800)]
memcg: null dereference on allocation failure
The original code had a null dereference if alloc_percpu() failed. This
was introduced in commit 711d3d2c9bc3 ("memcg: cpu hotplug aware percpu
count updates")
Catalin Marinas [Thu, 11 Nov 2010 22:05:10 +0000 (14:05 -0800)]
include/linux/highmem.h needs hardirq.h
Commit 3e4d3af501cc ("mm: stack based kmap_atomic()") introduced the
kmap_atomic_idx_push() function which warns on in_irq() with
CONFIG_DEBUG_HIGHMEM enabled. This patch includes linux/hardirq.h for
the in_irq definition.
Eric Dumazet [Thu, 11 Nov 2010 22:05:08 +0000 (14:05 -0800)]
atomic: add atomic_inc_not_zero_hint()
Followup of perf tools session in Netfilter WorkShop 2010
In the network stack we make high usage of atomic_inc_not_zero() in
contexts we know the probable value of atomic before increment (2 for udp
sockets for example)
Using a special version of atomic_inc_not_zero() giving this hint can help
processor to use less bus transactions.
On x86 (MESI protocol) for example, this avoids entering Shared state,
because "lock cmpxchg" issues an RFO (Read For Ownership)
akpm: Adds a new include/linux/atomic.h. This means that new code should
henceforth include linux/atomic.h and not asm/atomic.h. The presence of
include/linux/atomic.h will in fact cause checkpatch.pl to warn about use
of asm/atomic.h. The new include/linux/atomic.h becomes the place where
arch-neutral atomic_t code should be placed.
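The technique can be sketched with C11 atomics. This is a user-space illustration, not the kernel's atomic_t implementation: the compare-and-swap is attempted with the hinted value first, so no separate initial read is needed when the hint is right.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero; hint is the value we expect to find. */
bool inc_not_zero_hint_sketch(atomic_int *v, int hint)
{
	int expected = hint;

	if (expected == 0)	/* a zero hint is useless, read the real value */
		expected = atomic_load(v);

	while (expected != 0) {
		/* On failure, expected is refreshed with the current value. */
		if (atomic_compare_exchange_strong(v, &expected, expected + 1))
			return true;
	}
	return false;
}
```

A wrong hint costs one extra compare-and-swap; a counter that has already dropped to zero is left untouched.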
Dan Carpenter [Thu, 11 Nov 2010 22:05:07 +0000 (14:05 -0800)]
rapidio: use resource_size()
The size calculation is done incorrectly here because it should include
both the start and end (end - start + 1). It's easiest to just use
resource_size() which does the right thing.
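For reference, the inclusive-range arithmetic looks like this. The struct and function are hypothetical user-space stand-ins for struct resource and resource_size().

```c
struct resource_sketch {
	unsigned long long start;
	unsigned long long end;		/* inclusive last address */
};

/* Size of an inclusive [start, end] range. */
unsigned long long resource_size_sketch(const struct resource_sketch *res)
{
	return res->end - res->start + 1;
}
```

A resource spanning 0x1000 to 0x1fff therefore has size 0x1000, not 0xfff.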
I was worried there was something non-standard going on because the
printk() subtracts "end - 1", but the rest of the file uses the normal
resource size calculations. This function is only called from
fsl_rio_setup() in arch/powerpc/sysdev/fsl_rio.c and the calculation
there is also:
drivers/macintosh/adb-iop.c: flags should be unsigned long
Fix these warnings:
drivers/macintosh/adb-iop.c: In function `adb_iop_complete':
drivers/macintosh/adb-iop.c:85: warning: comparison of distinct pointer types lacks a cast
drivers/macintosh/adb-iop.c:92: warning: comparison of distinct pointer types lacks a cast
drivers/macintosh/adb-iop.c: In function `adb_iop_listen':
drivers/macintosh/adb-iop.c:111: warning: comparison of distinct pointer types lacks a cast
drivers/macintosh/adb-iop.c:151: warning: comparison of distinct pointer types lacks a cast
Both commits 0a3d763f1a68 ("ptrace: cleanup arch_ptrace() on um") and 9b05a69e0534 ("ptrace: change signature of arch_ptrace()") broke the um
build. This patch fixes the issues.
0a3d763f1a68 introduced the undeclared variable "datavp". The patch seems
completely untested. :-(
9b05a69e0534 changed arch_ptrace()'s signature but did not update
um/include/asm/ptrace-generic.h.
Software generated interrupts (SGI) are used for IPIs by the kernel.
While previous revisions of the GIC hardware were specified not to
implement enable bits for SGIs, more recent hardware is now permitted
to implement these bits in a per-CPU banked register.
The priority registers for the PPI and SGIs are also per-CPU banked
registers, so ensure that these are also appropriately initialized.
Eric Paris [Fri, 12 Nov 2010 07:26:06 +0000 (08:26 +0100)]
netfilter: NF_HOOK_COND has wrong conditional
The NF_HOOK_COND returns 0 when it shouldn't due to what I believe to be an
error in the code as the order of operations is not what was intended. C will
evaluate == before =. Which means ret is getting set to the bool result,
rather than the return value of the function call. The code says
if (ret = function() == 1)
when it meant to say:
if ((ret = function()) == 1)
Normally the compiler would warn, but it doesn't notice it because it's
actually a complex conditional and so the wrong code is wrapped in an explicit
set of () [exactly what the compiler wants you to do if this was intentional].
Fixing this means that errors when netfilter denies a packet get propagated
back up the stack rather than lost.
Problem introduced by commit 2249065f (netfilter: get rid of the grossness
in netfilter.h).
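The precedence problem is easy to reproduce in isolation. In this sketch the hook's return value and the okfn result are passed in as plain integers; both function names are hypothetical.

```c
/* Buggy form: == binds tighter than =, so ret gets the comparison result. */
int buggy_sketch(int hook_ret, int okfn_result)
{
	int ret;

	if ((ret = hook_ret == 1))	/* ret is 0 or 1, never hook_ret */
		ret = okfn_result;
	return ret;
}

/* Fixed form: assign first, then compare. */
int fixed_sketch(int hook_ret, int okfn_result)
{
	int ret;

	if ((ret = hook_ret) == 1)
		ret = okfn_result;
	return ret;
}
```

With a hook result of -1, the buggy form returns 0 and the error is lost, while the fixed form returns -1.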
Sonic Zhang [Wed, 27 Oct 2010 08:16:48 +0000 (04:16 -0400)]
serial: bfin_5xx: remove redundant SSYNC to improve TX speed
We don't need to force a SSYNC here as the LSR register will already
be updated by the time we get back to reading it. This speeds up TX
throughput and lowers general system overhead (since SSYNC is system
wide, not peripheral-specific).
Sonic Zhang [Wed, 27 Oct 2010 08:16:47 +0000 (04:16 -0400)]
serial: bfin_5xx: always include DMA headers
On Blackfin systems, peripherals that have optional DMA support always
route their interrupts through the corresponding DMA channel -- even
when DMA is not being used. So in PIO mode, we still need to request
the DMA channel (so interrupts are delivered) which means we need to
always include the DMA header for the DMA defines/functions.
Nicolas Pitre [Wed, 10 Nov 2010 06:33:12 +0000 (01:33 -0500)]
vcs: make proper usage of the poll flags
Kay Sievers pointed out that usage of POLLIN is well defined by POSIX,
and the current usage here doesn't follow that definition. So let's
duplicate the same semantics as implemented by sysfs_poll() instead.
Lawrence Rust [Wed, 27 Oct 2010 12:41:02 +0000 (14:41 +0200)]
8250: Fix tcsetattr to avoid ioctl(TIOCMIWAIT) hang
Calling tcsetattr prevents any thread(s) currently suspended in ioctl
TIOCMIWAIT for the same device from ever resuming.
If a thread is suspended inside a call to ioctl TIOCMIWAIT, waiting for
a modem status change, then the 8250 driver enables modem status
interrupts (MSI). The device interrupt service routine resumes the
suspended thread(s) on the next MSI.
If while the thread(s) are suspended, another thread calls tcsetattr
then the 8250 driver disables MSI (unless CTS/RTS handshaking is
enabled) thus preventing the suspended thread(s) from ever being
resumed.
This patch only disables MSI in tcsetattr handling if there are no
suspended threads.
Axel Lin [Tue, 9 Nov 2010 08:41:48 +0000 (08:41 +0000)]
hwmon: (gpio-fan) Fix fan_ctrl_init error path
In the current implementation, the sysfs entries are not removed before returning -ENODEV.
Creating the sysfs attribute should be the last thing done by the function,
after all the rest has been successful.
Otherwise there is a small window during which user-space can access the attribute
but the driver isn't ready to deal with the requests.
Fix it by moving sysfs_create_group to be the last thing done by the function.
Jesse Barnes [Fri, 5 Nov 2010 19:16:36 +0000 (15:16 -0400)]
PCI: read current power state at enable time
When we enable a PCI device, we avoid doing a lot of the initial setup
work if the device's enable count is non-zero. If we don't fetch the
power state though, we may later fail to set up MSI due to the unknown
status. So pick it up before we short circuit the rest due to a
pre-existing enable or mismatched enable/disable pair (as happens with
VGA devices, which are special in a special way).
Martin Wilck [Wed, 10 Nov 2010 10:03:21 +0000 (11:03 +0100)]
PCI: fix size checks for mmap() on /proc/bus/pci files
The checks for valid mmaps of PCI resources made through /proc/bus/pci files
that were introduced in 9eff02e2042f96fb2aedd02e032eca1c5333d767 have several
problems:
1. mmap() calls on /proc/bus/pci files are made with real file offsets > 0,
whereas under /sys/bus/pci/devices, the start of the resource corresponds
to offset 0. This may lead to false negatives in pci_mmap_fits(), which
implicitly assumes the /sys/bus/pci/devices layout.
2. The loop in proc_bus_pci_mmap doesn't skip empty resources. This leads
to false positives, because pci_mmap_fits() doesn't treat empty resources
correctly (the calculated size is 1 << (8*sizeof(resource_size_t)-PAGE_SHIFT)
in this case!).
3. If a user maps resources with BAR > 0, pci_mmap_fits will emit bogus
WARNINGS for the first resources that don't fit until the correct one is found.
On many controllers the first 2-4 BARs are used, and the others are empty.
In this case, an mmap attempt will first fail on the non-empty BARs
(including the "right" BAR because of 1.) and emit bogus WARNINGS because
of 3., and finally succeed on the first empty BAR because of 2.
This is certainly not the intended behaviour.
This patch addresses all 3 issues.
Updated with an enum type for the additional parameter for pci_mmap_fits().
If we simply insert these as children of iomem_resource, the second window
fails because it conflicts with the first, and the third is inserted as a
child of the first, i.e.,
When we claim PCI device resources, this can cause collisions like this
if we put them in the first window:
pci 0000:00:01.0: address space collision: [mem 0xff300000-0xff4fffff] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xffffffff]
Host bridge windows are top-level resources by definition, so it doesn't
make sense to make the third window a child of the first. This patch
coalesces any host bridge windows that overlap. For the example above,
the result is this single window:
The start of the iomap was at f7e44c00 and had a size of 5120,
making the end f7e46000. We start with an offset of 0x180 or
384, giving the first read at 0xf7e44d80. Reading that location
yields 65283, which is much bigger than the 5120 that was allocated
and makes the next read at f7e54b03 which is outside the mapped area.
Perhaps this is a bug in the driver, or buggy hardware, but this patch
is more about not crashing my box on start up and just giving a warning
if it detects this error.
This patch at least lets my box boot with just a warning.
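The kind of check involved can be sketched with the numbers above. The function is hypothetical and simply tests whether an access at a given offset stays inside a mapping of a given size.

```c
#include <stddef.h>

/*
 * Returns 1 if access_len bytes starting at offset lie entirely within
 * a mapping of map_size bytes, 0 otherwise. Written to avoid overflow
 * in offset + access_len.
 */
int offset_in_mapping_sketch(size_t map_size, size_t offset, size_t access_len)
{
	return offset <= map_size && access_len <= map_size - offset;
}
```

For a 5120-byte mapping, offset 0x180 passes, while offset 65283 is rejected before any read is attempted.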
Jesper Juhl [Sun, 7 Nov 2010 21:04:43 +0000 (22:04 +0100)]
UWB: Return UWB_RSV_ALLOC_NOT_FOUND rather than crashing on NULL dereference if kzalloc fails
Crashing on a null pointer deref is never a nice thing to do. It seems
to me that it's better to simply return UWB_RSV_ALLOC_NOT_FOUND if
kzalloc() fails in uwb_rsv_find_best_allocation().
Vasiliy Kulikov [Sat, 6 Nov 2010 14:41:28 +0000 (17:41 +0300)]
usb: core: fix information leak to userland
Structure usbdevfs_connectinfo is copied to userland with padding bytes
after "slow" field uninitialized. It leads to leaking of contents of
kernel stack memory.
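One common way to avoid this class of leak is to zero the whole structure before filling it in. The sketch below uses a hypothetical struct with the same shape of problem (a one-byte field followed by padding); it is not necessarily how this patch fixes it.

```c
#include <string.h>

/* Hypothetical struct: on typical ABIs, padding follows "slow". */
struct info_sketch {
	unsigned int devnum;
	unsigned char slow;
};

void fill_info_sketch(struct info_sketch *ci, unsigned int devnum,
		      int is_low_speed)
{
	memset(ci, 0, sizeof(*ci));	/* clears padding as well as fields */
	ci->devnum = devnum;
	ci->slow = is_low_speed ? 1 : 0;
}
```

Copying the struct out after this exposes only the two fields and zero bytes, never stale stack contents.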
Vasiliy Kulikov [Sat, 6 Nov 2010 14:41:31 +0000 (17:41 +0300)]
usb: misc: iowarrior: fix information leak to userland
Structure iowarrior_info is copied to userland with padding bytes
between "serial" and "revision" fields uninitialized. It leads to
leaking of contents of kernel stack memory.
Jim Sung [Fri, 5 Nov 2010 01:47:51 +0000 (18:47 -0700)]
usb: subtle increased memory usage in u_serial
OK, the USB gadget serial driver actually has a couple of problems. On
gs_open(), it always allocates and queues an additional QUEUE_SIZE (16)
worth of requests, so with a loop like this:
i=1 ; while echo $i > /dev/ttyGS0 ; do let i++ ; done
eventually we run into OOM (Out of Memory).
Technically, it is not a leak as everything gets freed up when the USB
connection is broken, but not on gs_close().
With a USB device/gadget controller driver that has limited resources
(e.g., Marvell has a MAX_XDS_FOR_TR_CALLS of 64 for transmit and
receive), even after 4
stty -F /dev/ttyGS0
we cannot transmit anymore. We can still receive (not necessarily
reliably) as now we have 16 * 4 = 64 descriptors/buffers ready, but the
device is otherwise not usable.
ma rui [Mon, 1 Nov 2010 03:32:18 +0000 (11:32 +0800)]
USB: option: fix when the driver is loaded incorrectly for some Huawei devices.
When a Huawei datacard with PID 0x14AC is inserted into a Linux system, the
present kernel will load the "option" driver for all the interfaces. But
actually, some interfaces serve other functions and do not need the "option"
driver.
In this patch, we modify the id_tables so that when the PID is 0x14ac and the
VID is 0x12d1, the "option" driver is used only when the interface's Class is
0xff, Subclass is 0xff, and Protocol is 0xff.
Andy Whitcroft [Wed, 3 Nov 2010 18:02:38 +0000 (18:02 +0000)]
usb: gadget: goku_udc: add registered flag bit, fixing build
The commit below cleaned up error handling, in part by introducing a
registered flag bit. This however was not added to the device
structure leading to build failures:
CC drivers/usb/host/ehci-hcd.o
In file included from drivers/usb/host/ehci-hcd.c:1166:
drivers/usb/host/ehci-mxc.c: In function 'ehci_mxc_drv_probe':
drivers/usb/host/ehci-mxc.c:192: error: 'ehci' undeclared (first use in this function)
drivers/usb/host/ehci-mxc.c:192: error: (Each undeclared identifier is reported only once
drivers/usb/host/ehci-mxc.c:192: error: for each function it appears in.)
drivers/usb/host/ehci-mxc.c:117: warning: unused variable 'temp'
make[3]: *** [drivers/usb/host/ehci-hcd.o] Error 1
make[2]: *** [drivers/usb/host/ehci-hcd.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
Fix it together with the warning about the unused variable and use
msleep instead of mdelay as requested by Alan Stern.
USB: Fix FSL USB driver on non Open Firmware systems
Commit 126512e3f274802ca65ebeca8660237f0361ad48 added support for FSL's USB
controller on powerpc. In this commit the Open Firmware code was selected
and compiled unconditionally.
This breaks on ARM systems from FSL which use the same driver (i.e. the i.MX
series), because ARM doesn't have OF support (yet). This patch fixes the problem
by only selecting the OF code on systems with Open Firmware support.
Staging: Merge 'tidspbridge-2.6.37-rc1' into staging-linus
This is a big revert of a lot of -rc1 tidspbridge patches in order to
get the driver back into a working state. It also includes an OMAP patch
that was approved by the OMAP maintainer.
Dmitry Torokhov [Tue, 9 Nov 2010 05:51:25 +0000 (21:51 -0800)]
Input: do not pass injected events back to the originating handler
Sometimes input handlers (as opposed to input devices) have a need to
inject (or re-inject) events back into input core. For example sysrq
filter may want to inject previously suppressed Alt-SysRq so that user
can take a screen print. In this case we do not want to pass such events
back to the same handler that injected them to avoid loops.
Dan Carpenter [Thu, 11 Nov 2010 07:59:20 +0000 (23:59 -0800)]
Input: pcf8574_keypad - fix error handling in pcf8574_kp_probe
It is not allowed to call input_free_device() after calling
input_unregister_device() because input devices are refcounted and
unregister will free the device if we were holding the last reference.
The preferred style in input/ is to make input_register_device() the
last function in the probe which can fail. That way we don't need to
call input_unregister_device().
Also, there is no need to call input_set_drvdata() as nothing in the driver
uses the data.
David S. Miller [Thu, 11 Nov 2010 05:35:37 +0000 (21:35 -0800)]
tcp: Increase TCP_MAXSEG socket option minimum.
As noted by Steve Chen, since commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390 ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.
The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.
Ian Campbell [Thu, 28 Oct 2010 18:32:29 +0000 (11:32 -0700)]
xen: do not release any memory under 1M in domain 0
We already deliberately setup a 1-1 P2M for the region up to 1M in
order to allow code which assumes this region is already mapped to
work without having to convert everything to ioremap.
Domain 0 should not return any apparently unused memory regions
(reserved or otherwise) in this region to Xen since the e820 may not
accurately reflect what the BIOS has stashed in this region.
Ian Campbell [Mon, 1 Nov 2010 16:30:09 +0000 (16:30 +0000)]
xen: events: do not unmask event channels on resume
The IRQ core code will take care of disabling and reenabling
interrupts over suspend resume automatically, therefore we do not need
to do this in the Xen event channel code.
The only exception is those event channels marked IRQF_NO_SUSPEND
which the IRQ core ignores. We must unmask these ourselves, taking
care to obey the current IRQ_DISABLED status. Failure to check for
IRQ_DISABLED leads to enabling polled only event channels, such as
that associated with the pv spinlocks, which must never be enabled:
Felipe Contreras [Tue, 19 Oct 2010 07:37:24 +0000 (10:37 +0300)]
omap: dsp: remove shm from normal memory
Also, don't be picky about the location, which incidentally fixes the
build since MEMBLOCK_REAL_LIMIT is gone on 2.6.37.
arch/arm/plat-omap/devices.c: In function 'omap_dsp_reserve_sdram_memblock':
arch/arm/plat-omap/devices.c:287: error: 'MEMBLOCK_REAL_LIMIT'
undeclared (first use in this function)
Stephane Eranian [Tue, 26 Oct 2010 14:08:01 +0000 (16:08 +0200)]
perf_events: Fix time tracking in samples
This patch corrects time tracking in samples. Without this patch
both time_enabled and time_running are bogus when user asks for
PERF_SAMPLE_READ.
One uses PERF_SAMPLE_READ to sample the values of other counters
in each sample. Because of multiplexing, it is necessary to know
both time_enabled and time_running to be able to scale counts correctly.
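The conventional scaling of a multiplexed counter can be sketched as below. The function name is hypothetical; it estimates the full-period count by multiplying the raw count by time_enabled / time_running.

```c
#include <stdint.h>

/*
 * Estimate the count the event would have reached had it been running
 * for the whole time it was enabled. Plain 64-bit arithmetic is used,
 * so very large inputs could overflow in the multiplication.
 */
uint64_t scale_count_sketch(uint64_t raw, uint64_t time_enabled,
			    uint64_t time_running)
{
	if (time_running == 0)
		return 0;	/* the event never ran, nothing to scale */
	return raw * time_enabled / time_running;
}
```

If either time value is bogus, the estimate is bogus too, which is why both must be tracked correctly in each sample.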
In this second version of the patch, we maintain a shadow
copy of ctx->time which allows us to compute ctx->time without
calling update_context_time() from NMI context. We avoid the
issue that update_context_time() must always be called with
ctx->lock held.
We do not keep shadow copies of the other event timings
because if the lead event is overflowing then it is active
and thus it's been scheduled in via event_sched_in() in
which case neither tstamp_stopped nor tstamp_running can be modified.
This timing logic only applies to samples when PERF_SAMPLE_READ
is used.
Note that this patch does not address timing issues related
to sampling inheritance between tasks. This will be addressed
in a future patch.
With this patch, the libpfm4 example task_smpl now reports
correct counts (shown on 2.4GHz Core 2):
In commit 20cb52ebd1b5ca6fa8a5d9b6b1392292f5ca8a45, titled
"xfs: simplify xfs_vm_writepage" I added an assert that any !mapped and
uptodate buffers are not dirty. That assert turns out to trigger a lot
when running fsx on filesystems with small block sizes. The reason for
that is that the assert is simply incorrect. !mapped and uptodate
just mean this buffer covers a hole, and whenever we do a set_page_dirty
we mark all blocks in the page dirty, no matter if they have data or
not. So remove the assert, and update the comment above the condition
to match reality.
policy->name is a substring of policy->hname. If prefix is not NULL,
policy_init() allocates strlen(prefix) + strlen(name) + 3 bytes for policy->hname.
Using kzfree(ns->base.name) will cause a memory leak if alloc_namespace() fails.
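The relationship between the two pointers can be sketched in user space. The struct and functions are hypothetical, and the "//" separator is an assumption chosen to match the "+ 3" (two separator characters plus the terminating NUL).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct policy_sketch {
	char *hname;		/* owns the allocation */
	const char *name;	/* points into hname, never freed directly */
};

int policy_init_sketch(struct policy_sketch *p, const char *prefix,
		       const char *name)
{
	size_t len;

	if (prefix != NULL) {
		len = strlen(prefix) + strlen(name) + 3; /* "//" plus NUL */
		p->hname = malloc(len);
		if (p->hname == NULL)
			return -1;
		snprintf(p->hname, len, "%s//%s", prefix, name);
		p->name = p->hname + strlen(prefix) + 2;
	} else {
		len = strlen(name) + 1;
		p->hname = malloc(len);
		if (p->hname == NULL)
			return -1;
		memcpy(p->hname, name, len);
		p->name = p->hname;
	}
	return 0;
}

void policy_destroy_sketch(struct policy_sketch *p)
{
	free(p->hname);		/* free the owner, not the substring */
	p->hname = NULL;
	p->name = NULL;
}
```

Because name may point into the middle of the buffer, only hname is a valid argument to the deallocator.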
Vasiliy Kulikov [Wed, 10 Nov 2010 20:09:10 +0000 (12:09 -0800)]
net: packet: fix information leak to userland
packet_getname_spkt() doesn't initialize all members of sa_data field of
sockaddr struct if strlen(dev->name) < 13. This structure is then copied
to userland. It leads to leaking of contents of kernel stack memory.
We have to fully fill sa_data with strncpy() instead of strlcpy().
The same with packet_getname(): it doesn't initialize sll_pkttype field of
sockaddr_ll. Set it to zero.
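The difference matters because strncpy() pads the remainder of the destination with NUL bytes when the source is shorter than the limit. A minimal sketch, with a hypothetical function name and a 14-byte buffer standing in for sa_data:

```c
#include <string.h>

#define SA_DATA_LEN 14	/* size of sa_data in struct sockaddr */

/* Copy a device name into a fixed-size field, zero-filling the rest. */
void fill_sa_data_sketch(char sa_data[SA_DATA_LEN], const char *dev_name)
{
	/*
	 * Every one of the SA_DATA_LEN bytes is written. Note that the
	 * result is not NUL-terminated if dev_name has SA_DATA_LEN or
	 * more characters.
	 */
	strncpy(sa_data, dev_name, SA_DATA_LEN);
}
```

A copy that stops after the terminating NUL would leave the bytes past the name untouched.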
J. Bruce Fields [Wed, 3 Nov 2010 22:09:18 +0000 (18:09 -0400)]
locks: remove dead lease error-handling code
A minor oversight from f7347ce4ee7c65415f84be915c018473e7076f31,
"fasync: re-organize fasync entry insertion to allow it under a
spinlock": this cleanup-on-error was only needed to handle -ENOMEM. Now
that we're preallocating it's unneeded.
David S. Miller [Wed, 10 Nov 2010 18:38:24 +0000 (10:38 -0800)]
filter: make sure filters dont read uninitialized memory
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.
Initializing mem[] array, like Dan Rosenberg suggested in his patch is
expensive since most filters don't even use this array.
It's hard to do the filter validation in sk_chk_filter() because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.
For other filters, additional cost is a single instruction.
[ Since we access fentry->k a lot now, cache it in a local variable
and mark filter entry pointer as const. -DaveM ]
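The bitmap idea can be sketched as follows. This is a user-space illustration with hypothetical names; returning 0 for a slot that was never stored is an assumption about the chosen behaviour.

```c
#define MEMWORDS 16	/* number of scratch slots */

struct scratch_sketch {
	unsigned int mem[MEMWORDS];	/* deliberately not cleared */
	unsigned long valid;		/* bit k set once mem[k] is stored */
};

/* Only the bitmap is initialized, which costs a single store. */
void scratch_init(struct scratch_sketch *s)
{
	s->valid = 0;
}

void scratch_store(struct scratch_sketch *s, unsigned int k,
		   unsigned int value)
{
	if (k < MEMWORDS) {
		s->mem[k] = value;
		s->valid |= 1UL << k;
	}
}

/* A load from a slot that was never stored yields 0, not stale data. */
unsigned int scratch_load(const struct scratch_sketch *s, unsigned int k)
{
	if (k < MEMWORDS && (s->valid & (1UL << k)))
		return s->mem[k];
	return 0;
}
```

Filters that never touch the scratch array pay only for clearing one word instead of the whole array.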
Vasiliy Kulikov [Wed, 10 Nov 2010 18:14:33 +0000 (10:14 -0800)]
net: ax25: fix information leak to userland
Sometimes ax25_getname() doesn't initialize all members of fsa_digipeater
field of fsa struct, also the struct has padding bytes between
sax25_call and sax25_ndigis fields. This structure is then copied to
userland. It leads to leaking of contents of kernel stack memory.
XFS does not need its inodes to actually be hashed in the VFS inode
cache, but we require the inode to be marked hashed for the
writeback code to work.
Instead of using insert_inode_hash, which requires a second
inode_lock roundtrip after the partial merge of the inode
scalability patches in 2.6.37-rc, simply use the new hlist_add_fake
helper to mark it hashed without requiring a lock or touching a
global cache line.
xfs: fix a few compiler warnings with CONFIG_XFS_QUOTA=n
Andi Kleen reported that gcc-4.5 gives lots of warnings for him
inside the XFS code. It turned out most of them are due to the
quota stubs being macros, and gcc now complaining about macros
evaluating to 0 that are not assigned to variables.
xfs: tell lockdep about parent iolock usage in filestreams
The filestreams code may take the iolock on the parent inode while
holding it on a child. This is the only place in XFS where we take
both the child and parent iolock, so just telling lockdep about it
is enough. The lock flag required for that was already added as
part of the ilock lockdep annotations and unused so far.
Dave Chinner [Mon, 8 Nov 2010 08:55:05 +0000 (08:55 +0000)]
xfs: move delayed write buffer trace
The delayed write buffer split trace currently issues a trace for
every buffer it scans. These buffers are not necessarily queued for
delayed write. Indeed, when buffers are pinned, there can be
thousands of traces of buffers that aren't actually queued for
delayed write and the ones that are are lost in the noise. Move the
trace point to record only buffers that are split out for IO to be
issued on.
Dave Chinner [Mon, 8 Nov 2010 08:55:04 +0000 (08:55 +0000)]
xfs: fix per-ag reference counting in inode reclaim tree walking
The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.
Kulikov Vasiliy [Sat, 30 Oct 2010 14:26:17 +0000 (14:26 +0000)]
xfs: xfs_ioctl: fix information leak to userland
al_hreq is copied from userland. If al_hreq.buflen is not properly aligned
then xfs_attr_list will ignore the last bytes of kbuf. These bytes are
uninitialized. It leads to leaking of contents of kernel stack memory.