Ed Cashin [Fri, 5 Oct 2012 00:16:20 +0000 (17:16 -0700)]
aoe: for performance support larger packet payloads
tAdd adds the ability to work with large packets composed of a number of
segments, using the scatter gather feature of the block layer (biovecs)
and the network layer (skb frag array). The motivation is the performance
gained by using a packet data payload greater than a page size and by
using the network card's scatter gather feature.
Users of the out-of-tree aoe driver already had these changes, but since
early 2011, they have complained of increased memory utilization and
higher CPU utilization during heavy writes.[1] The commit below appears
related, as it disables scatter gather on non-IP protocols inside the
harmonize_features function, even when the NIC supports sg.
net offloading: Generalize netif_get_vlan_features().
With that regression in place, transmits always linearize sg AoE packets,
but in-kernel users did not have this patch. Before 2.6.38, though, these
changes were working to allow sg to increase performance.
Paul Clements [Fri, 5 Oct 2012 00:16:18 +0000 (17:16 -0700)]
nbd: handle discard requests
Add discard support to nbd. If the nbd-server supports discard, it will
send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
for the device (QUEUE_FLAG_DISCARD).
If discard support is enabled, then when the nbd client system receives a
discard request, this will be passed along to the nbd-server. When the
discard request is received by the nbd-server, it will perform:
fallocate(.. FALLOC_FL_PUNCH_HOLE ..)
To punch a hole in the backend storage, which is no longer needed.
Paul Clements [Fri, 5 Oct 2012 00:16:15 +0000 (17:16 -0700)]
nbd: add set flags ioctl
Add a set-flags ioctl, allowing various option flags to be set on an nbd
device. This allows the nbd-client to set the device flags (to enable
read-only mode, or enable discard support, etc.).
Flags are typically specified by the nbd-server. During the negotiation
phase of the nbd connection, the server sends its flags to the client.
The client then uses NBD_SET_FLAGS to inform the kernel of the options.
Also included is a one-line fix to debug output for the set-timeout ioctl.
Replace the single global destination ID counter with per-net allocation
mechanism to allow independent destID management for each available
RapidIO network. Using bitmap based mechanism instead of counters allows
destination ID release and reuse in systems that support hot-swap.
rapidio/rionet: rework to support multiple RIO master ports
Make RIONET driver multi-net safe/capable by introducing per-net lists of
RapidIO network peers. Rework registration of network adapters to support
all available RIO master port devices.
Modify mport initialization routine to run the RapidIO discovery process
asynchronously. This allows to have an arbitrary order of enumerating and
discovering ports in systems with multiple RapidIO controllers without
creating a deadlock situation if enumerator port is registered after a
discovering one.
Making netID matching to mportID ensures consistent net ID assignment in
multiport RapidIO systems with asynchronous discovery process (global
counter implementation is affected by race between threads).
rapidio: use device lists handling on per-net basis
Modify handling of device lists to resolve issues caused by using single
global list of RIO devices during enumeration/discovery. The most common
sign of existing issue is incorrect contents of switch routing tables in
systems with multiple mport controllers while single-port configuration
performs as expected.
The following set of patches provides modifications targeting support of
multiple RapidIO master port (mport) devices on a CPU-side of
RapidIO-capable board. While the RapidIO subsystem code has definitions
suitable for multi-controller/multi-net support, the existing
implementation cannot be considered ready for multiple mport
configurations.
=========== NOTES: =============
a) The patches below do not address RapidIO side view of multiport
processing elements defined in Part 6 of RapidIO spec Rev.2.1 (section
6.4.1). These devices have Base Device ID CSR (0x60) and Component Tag
CSR (0x6C) shared by all SRIO ports. For example, Freescale's P4080,
P3041 and P5020 have a dual-port SRIO controller implemented according
the specification. Enumeration/discovery of such devices from RapidIO
side may require device-specific fixups.
b) Devices referenced above may also require implementation specific
code to setup a host device ID for mport device. These operations are
not addressed by patches in this package.
=================================
Details about provided patches:
1. Fix blocking wait for discovery ready
While it does not happen on PowerPC based platforms, there is
possibility of stalled CPU warning dump on x86 based platforms that run
RapidIO discovery process if they wait too long for being enumerated.
Currently users can avoid it by disabling the soft-lockup detector
using "nosoftlockup" kernel parameter OR by ensuring that enumeration
is completed before soft-lockup is detected.
This patch eliminates blocking wait and keeps a scheduler running.
It also is required for patch 3 below which introduces asynchronous
discovery process.
2. Use device lists handling on per-net basis
This patch allows to correctly support multiple RapidIO nets and
resolves possible issues caused by using single global list of devices
during RapidIO system enumeration/discovery. The most common sign of
existing issue is incorrect contents of switch routing tables in
systems with multiple mport controllers while single-port configuration
performs as expected.
The patch does not eliminate the global RapidIO device list but
changes some routines in enumeration/discovery to use per-net device
lists instead. This way compatibility with upper layer RIO routines is
preserved.
3. Run discovery as an asynchronous process
This patch modifies RapidIO initialization routine to asynchronously
run the discovery process for each corresponding mport. This allows
having an arbitrary order of enumerating and discovering mports without
creating a deadlock situation if an enumerator port was registered
after a discovering one.
On boards with multiple discovering mports it also eliminates order
dependency between mports and may reduce total time of RapidIO
subsystem initialization.
Making netID matching to mportID ensures consistent netID assignment
in multiport RapidIO systems with asynchronous discovery process
(global counter implementation is affected by race between threads).
4. Rework RIONET to support multiple RIO master ports
In the current version of the driver rionet_probe() has comment "XXX
Make multi-net safe". Now it is a good time to address this comment.
This patch makes RIONET driver multi-net safe/capable by introducing
per-net lists of RapidIO network peers. It also enables to register
network adapters for all available mport devices.
5. Add destination ID allocation mechanism
The patch replaces a single global destination ID counter with
per-net allocation mechanism to allow independent destID management for
each available RapidIO network. Using bitmap based mechanism instead
of counters allows destination ID release and reuse in systems that
support hot-swap.
This patch:
Fix blocking wait loop in the RapidIO discovery routine to avoid warning
dumps about stalled CPU on x86 platforms.
rapidio: apply RX/TX enable to active switch ports only
Apply port RX/TX enable operations only to active switch ports.
RapidIO specification (Part 6: LP-Serial Physical Layer) recommends to
keep Output Port Enable (TX) and Input Port Enable (RX) control bits in
disabled state (0b0) after device reset. It also allows to have
implementation specific reset state for these bits.
This patch ensures that TX/RX enable action is applied only to active
switch's ports while preserving an initial state of inactive ones.
This patch is intended to keep inactive switch ports with inbound and
outbound packet transfers disabled to block unexpected packets during hot
insertion event. While it does not fix any visible malfunction it is
intended to prevent such events in future.
Add common inbound memory mapping/unmapping interface. This allows to make
local memory space accessible from the RapidIO side using hardware mapping
capabilities of RapidIO bridging devices. The new interface is intended to
enable data transfers between RapidIO devices in combination with DMA engine
support.
This patch is based on patch submitted by Li Yang <[email protected]>
(https://lists.ozlabs.org/pipermail/linuxppc-dev/2009-April/071210.html)
Modify RapidIO mport device name assignment to include device name of PCIe
side of Tsi721 bridge. The new name format is intended to provide
definitive reference between RapidIO and PCIe sides of the bridge in
systems with multiple Tsi721 bridges.
Fix multicast packet transmit logic to account for repetitive transmission
of single skb:
- correct check for available buffers (this bug may produce NULL pointer
crash dump in case of heavy traffic);
- update skb user count (incorrect user counter causes a warning dump from
net_tx_action routine during multicast transfers in systems with three or
more rionet participants).
yan [Fri, 5 Oct 2012 00:15:41 +0000 (17:15 -0700)]
proc: no need to initialize proc_inode->fd in proc_get_inode()
proc_get_inode() obtains the inode via a call to iget_locked().
iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
clears proc_inode.fd, so there is no need to clear this field in
proc_get_inode().
If iget_locked() instead found the inode via find_inode_fast(), that inode
will not have I_NEW set so this change has no effect.
Denys Vlasenko [Fri, 5 Oct 2012 00:15:36 +0000 (17:15 -0700)]
coredump: extend core dump note section to contain file names of mapped files
This note has the following format:
long count -- how many files are mapped
long page_size -- units for file_ofs
array of [COUNT] elements of
long start
long end
long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
Denys Vlasenko [Fri, 5 Oct 2012 00:15:35 +0000 (17:15 -0700)]
coredump: add a new elf note with siginfo of the signal
Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
from the siginfo of the signal which caused core to be dumped.
There are tools which try to analyze crashes for possible security
implications, and they want to use, among other data, si_addr field from
the SIGSEGV.
This patch adds a new elf note, NT_SIGINFO, which contains the complete
siginfo_t of the signal which killed the process.
Denys Vlasenko [Fri, 5 Oct 2012 00:15:31 +0000 (17:15 -0700)]
compat: move compat_siginfo_t definition to asm/compat.h
This is a preparatory patch for the introduction of NT_SIGINFO elf note.
Make the location of compat_siginfo_t uniform across eight architectures
which have it. Now it can be pulled in by including asm/compat.h or
linux/compat.h.
Most of the copies are verbatim. compat_uid[32]_t had to be replaced by
__compat_uid[32]_t. compat_uptr_t had to be moved up before
compat_siginfo_t in asm/compat.h on a several architectures (tile already
had it moved up). compat_sigval_t had to be relocated from linux/compat.h
to asm/compat.h.
Denys Vlasenko [Fri, 5 Oct 2012 00:15:29 +0000 (17:15 -0700)]
coredump: pass siginfo_t* to do_coredump() and below, not merely signr
This is a preparatory patch for the introduction of NT_SIGINFO elf note.
With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
do_coredump() and put it into coredump_params. It will be used by the
next patch. Most changes are simple s/signr/siginfo->si_signo/.
Oleg Nesterov [Fri, 5 Oct 2012 00:15:25 +0000 (17:15 -0700)]
coredump: add support for %d=__get_dumpable() in core name
Some coredump handlers want to create a core file in a way compatible with
standard behavior. Standard behavior with fs.suid_dumpable = 2 is to
create core file with uid=gid=0. However, there was no way for coredump
handler to know that the process being dumped was suid'ed.
This patch adds the new %d specifier for format_corename() which simply
reports __get_dumpable(mm->flags), this is compatible with
/proc/sys/fs/suid_dumpable we already have.
Alex Kelly [Fri, 5 Oct 2012 00:15:24 +0000 (17:15 -0700)]
coredump: update coredump-related headers
Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c. It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.
Alex Kelly [Fri, 5 Oct 2012 00:15:23 +0000 (17:15 -0700)]
coredump: make core dump functionality optional
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump. This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.
CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.
device_cgroup: convert device_cgroup internally to policy + exceptions
The original model of device_cgroup is having a whitelist where all the
allowed devices are listed. The problem with this approach is that is
impossible to have the case of allowing everything but few devices.
The reason for that lies in the way the whitelist is handled internally:
since there's only a whitelist, the "all devices" entry would have to be
removed and replaced by the entire list of possible devices but the ones
that are being denied. Since dev_t is 32 bits long, representing the allowed
devices as a bitfield is not memory efficient.
This patch replaces the "whitelist" by a "exceptions" list and the default
policy is kept as "deny_all" variable in dev_cgroup structure.
The current interface determines that whenever "a" is written to devices.allow
or devices.deny, the entry masking all devices will be added or removed,
respectively. This behavior is kept and it's what will determine the default
policy:
# cat devices.list
a *:* rwm
# echo a >devices.deny
# cat devices.list
# echo a >devices.allow
# cat devices.list
a *:* rwm
The interface is also preserved. For example, if one wants to block only access
to /dev/null:
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Jul 24 16:17 /dev/null
# echo a >devices.allow
# echo "c 1:3 rwm" >devices.deny
# cat /dev/null
cat: /dev/null: Operation not permitted
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 r" >devices.allow
# cat /dev/null
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rw" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rwm" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
#
Note that I didn't rename the functions/variables in this patch, but in the
next one to make reviewing easier.
Daniel Santos [Fri, 5 Oct 2012 00:15:10 +0000 (17:15 -0700)]
kernel-doc: don't mangle whitespace in Example section
A section with the name "Example" (case-insensitive) has a special meaning
to kernel-doc. These sections are output using mono-type fonts. However,
leading whitespace is stripped, thus robbing a lot of meaning from this,
as indented code examples will be mangled.
This patch preserves the leading whitespace for "Example" sections. More
accurately, it preserves it for all sections, but removes it later if the
section isn't an "Example" section.
Daniel Santos [Fri, 5 Oct 2012 00:15:08 +0000 (17:15 -0700)]
kernel-doc: bugfix - empty line in Example section
If you have a section named "Example" that contains an empty line,
attempting to generate htmldocs give you the error:
/path/Documentation/DocBook/kernel-api.xml:3455: parser error : Opening and ending tag mismatch: programlisting line 3449 and para
</para><para>
^
/path/Documentation/DocBook/kernel-api.xml:3473: parser error : Opening and ending tag mismatch: para line 3467 and programlisting
</programlisting></informalexample>
^
/path/Documentation/DocBook/kernel-api.xml:3678: parser error : Opening and ending tag mismatch: programlisting line 3672 and para
</para><para>
^
/path/Documentation/DocBook/kernel-api.xml:3701: parser error : Opening and ending tag mismatch: para line 3690 and programlisting
</programlisting></informalexample>
^
unable to parse
/path/Documentation/DocBook/kernel-api.xml
Essentially, the script attempts to close a <programlisting> with a
closing tag for a <para> block. This patch corrects the problem by
simply not outputting anything extra when we're dumping pre-formatted
text, since the empty line will be rendered correctly anyway.
This patch fixes the issue by appending all lines ending in a blackslash
(optionally followed by whitespace), removing the backslash and any
whitespace after it prior to appending (just like the C pre-processor
would).
This fixes a break in kerel-doc introduced by the additions to rbtree.h.
This does the following:
1: Splits the arguments of a function call to stop it
from exceeding 80 characters
2: Re-indents the arguments of another function call
to prevent the splitting of a quoted string.
Maintain an index of directory inodes by starting cluster, so that
fat_get_parent() can return the proper cached inode rather than inventing
one that cannot be traced back to the filesystem root.
Add a new msdos/vfat binary mount option "nfs" so that FAT filesystems
that are _not_ exported via NFS are not saddled with maintenance of an
index they will never use.
Finally, simplify NFS file handle generation and lookups. An
ext2-congruent implementation is adequate for FAT needs.
Under memory pressure, the system may evict dentries from cache. When the
FAT driver receives a NFS request involving an evicted dentry, it is
unable to reconnect it to the filesystem root. This causes the request to
fail, often with ENOENT.
This is partially due to ineffectiveness of the current FAT NFS
implementation, and partially due to an unimplemented fh_to_parent method.
The latter can cause file accesses to fail on shares exported with
subtree_check.
This patch set provides the FAT driver with the ability to
reconnect dentries. NFS file handle generation and lookups are simplified
and made congruent with ext2.
Testing has involved a memory-starved virtual machine running 3.5-rc5 that
exports a ~2 GB vfat filesystem containing a kernel tree (~770 MB, ~40000
files, 9 levels). Both 'cp -r' and 'ls -lR' operations were performed
from a client, some overlapping, some consecutive. Exports with
'subtree_check' and 'no_subtree_check' have been tested.
Note that while this patch set improves FAT's NFS support, it does not
eliminate ESTALE errors completely.
The following should be considered for NFS clients who are sensitive to ESTALE:
* Mounting with lookupcache=none
Unfortunately this can degrade performance severely, particularly for deep
filesystems.
* Incorporating VFS patches to retry ESTALE failures on the client-side,
such as https://lkml.org/lkml/2012/6/29/381
* Handling ESTALE errors in client application code
This patch:
Move NFS-related code into its own C file. No functional changes.
Commit c3b79770e51a ("rtc: m41t80: Workaround broken alarm
functionality") disabled m41t80's alarm functions. But since those
functions were not touched, building this driver triggers these GCC
warnings:
drivers/rtc/rtc-m41t80.c:216:12: warning: 'm41t80_rtc_alarm_irq_enable' defined but not used [-Wunused-function]
drivers/rtc/rtc-m41t80.c:238:12: warning: 'm41t80_rtc_set_alarm' defined but not used [-Wunused-function]
drivers/rtc/rtc-m41t80.c:308:12: warning: 'm41t80_rtc_read_alarm' defined but not used [-Wunused-function]
Remove these functions (and the commented out references to them) to
silence these warnings. Anyone wanting to fix the alarm irq functionality
can easily find the removed code in the git log of this file or through
some web searches.
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:29 +0000 (17:14 -0700)]
drivers/rtc/rtc-em3027.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:27 +0000 (17:14 -0700)]
drivers/rtc/rtc-isl1208.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:25 +0000 (17:14 -0700)]
drivers/rtc/rtc-pcf8563.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:22 +0000 (17:14 -0700)]
drivers/rtc/rtc-rs5c372.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future. while at it also fix a checkpatch
warn WARNING: sizeof rs5c->buf should be sizeof(rs5c->buf)
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:21 +0000 (17:14 -0700)]
drivers/rtc/rtc-s35390a.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:18 +0000 (17:14 -0700)]
drivers/rtc/rtc-x1205.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:17 +0000 (17:14 -0700)]
drivers/rtc/rtc-ds1672.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
David Fries [Fri, 5 Oct 2012 00:14:12 +0000 (17:14 -0700)]
rtc_sysfs_show_hctosys(): display 0 if resume failed
Without this patch /sys/class/rtc/$CONFIG_RTC_HCTOSYS_DEVICE/hctosys
contains a 1 (meaning "This rtc was used to initialize the system
clock") even if setting the time by do_settimeofday() at bootup failed.
The RTC can also be used to set the clock on resume, if it did 1,
otherwise 0. Previously there was no indication if the RTC was used
to set the clock in resume.
This uses only CONFIG_RTC_HCTOSYS_DEVICE for conditional compilation
instead of it and CONFIG_RTC_HCTOSYS to be more consistent.
rtc_hctosys_ret was moved to class.c so class.c no longer depends on
hctosys.c.
Julia Lawall [Fri, 5 Oct 2012 00:14:07 +0000 (17:14 -0700)]
drivers/rtc/rtc-coh901331.c: use clk_prepare_enable() and clk_disable_unprepare()
clk_prepare_enable and clk_disable_unprepare combine clk_prepare and
clk_enable, and clk_disable and clk_unprepare. They make the code more
concise, and ensure that clk_unprepare is called when clk_enable fails.
A simplified version of the semantic patch that introduces calls to these
functions is as follows: (http://coccinelle.lip6.fr/)
Venu Byravarasu [Fri, 5 Oct 2012 00:14:04 +0000 (17:14 -0700)]
rtc: rc5t583: add ricoh rc5t583 RTC driver
Add an RTC driver for the RTC device on Ricoh MFD Rc5t583. Ricoh RTC has
3 types of alarms. The current patch adds support for the Y-Alarm of
RC5t583 RTC.
There are several comparisons of a unsigned int to less than zero int
spear RTC driver. Such a check will always be true. In all these cases a
signed int is assigned to the unsigned variable, which is checked, before.
So the right fix is to make the checked variable signed as well. In one
case the check can be dropped completely, because all it does it returns
'err' if 'err' is less than zero, otherwise it returns 0. Since in this
particular case 'err' is always either 0 or less this is the same as just
returning 'err'.
The issue has been found using the following coccinelle semantic patch:
//<smpl>
@@
type T;
unsigned T i;
@@
(
*i < 0
|
*i >= 0
)
//</smpl>
The irq field of the jz4740_irc struct is unsigned. Yet we assign the
result of platform_get_irq() to it. platform_get_irq() may return a
negative error code and the code checks for this condition by checking if
'irq' is less than zero. But since 'irq' is unsigned this test will
always be false. Fix it by making 'irq' signed.
The issue was found using the following coccinelle semantic patch:
//<smpl>
@@
type T;
unsigned T i;
@@
(
*i < 0
|
*i >= 0
)
//</smpl>
Stephen Warren [Fri, 5 Oct 2012 00:13:56 +0000 (17:13 -0700)]
rtc: add MAX8907 RTC driver
The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.
The driver is based on an original by or fixed by:
* Tom Cherry
* Prashant Gaikwad
* Joseph Yoon
During upstreaming, I (swarren):
* Converted to regmap.
* Fixed handling of RTC_HOUR register containing 12.
* Fixed handling of RTC_WEEKDAY register.
* General cleanup.
Vincent Palatin [Fri, 5 Oct 2012 00:13:52 +0000 (17:13 -0700)]
rtc: recycle id when unloading a rtc driver
When calling rtc_device_unregister, we are not freeing the id used by the
driver. So when doing a unload/load cycle for a RTC driver (e.g. rmmod
rtc_cmos && modprobe rtc_cmos), its id is incremented by one. As a
consequence, we no longer have neither an rtc0 driver nor a
/proc/driver/rtc (as it only exists for the first driver).
Kim, Milo [Fri, 5 Oct 2012 00:13:45 +0000 (17:13 -0700)]
rtc-proc: permit the /proc/driver/rtc device to use other devices
To get time information via /proc/driver/rtc, only the first device (rtc0)
is used. If the rtcN (eg. rtc1 or rtc2) is used for the system clock,
there is no way to get information of rtcN via /proc/driver/rtc. With
this patch, the time data can be retrieved from the system clock RTC.
If the RTC_HCTOSYS_DEVICE is not defined, then rtc0 is used by default.
Ben Gardner [Fri, 5 Oct 2012 00:13:44 +0000 (17:13 -0700)]
drivers/rtc/rtc-isl1208.c: add support for the ISL1218
The ISL1218 chip is identical to the ISL1208, except that it has 6
additional user-storage registers. This patch does not enable access to
those additional registers, but only adds the chip name to the list.
Alan Cox [Fri, 5 Oct 2012 00:13:42 +0000 (17:13 -0700)]
binfmt_elf: Uninitialized variable
load_elf_interp() has interp_map_addr carefully described as
"uninitialized_var" and marked so as to avoid a warning. However if you
trace the code it is passed into load_elf_interp and then this value is
checked against NULL.
As this return value isn't used this is actually safe but it freaks
various analysis tools that see un-initialized memory addresses being read
before their value is ever defined.
Set it to NULL as a matter of programming good taste if nothing else
Paton J. Lewis [Fri, 5 Oct 2012 00:13:39 +0000 (17:13 -0700)]
epoll: support for disabling items, and a self-test app
Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item. If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment. Also added a new
test_epoll self- test app to both demonstrate the need for this feature
and test it.
Tejun Heo [Fri, 5 Oct 2012 00:13:28 +0000 (17:13 -0700)]
scatterlist: atomic sg_mapping_iter() no longer needs disabled IRQs
SG mapping iterator w/ SG_MITER_ATOMIC set required IRQ disabled because
it originally used KM_BIO_SRC_IRQ to allow use from IRQ handlers.
kmap_atomic() has long been updated to handle stacking atomic mapping
requests on per-cpu basis and only requires not sleeping while mapped.
Update sg_mapping_iter such that atomic iterators only require disabling
preemption instead of disabling IRQ.
While at it, convert wte weird @ARG@ notations to @ARG in the comment of
sg_miter_start().
Jan Beulich [Fri, 5 Oct 2012 00:13:24 +0000 (17:13 -0700)]
lib/vsprintf.c: improve standard conformance of sscanf()
Xen's pciback points out a couple of deficiencies with vsscanf()'s
standard conformance:
- Trailing character matching cannot be checked by the caller: With a
format string of "(%x:%x.%x) %n" absence of the closing parenthesis
cannot be checked, as input of "(00:00.0)" doesn't cause the %n to be
evaluated (because of the code not skipping white space before the
trailing %n).
- The parameter corresponding to a trailing %n could get filled even if
there was a matching error: With a format string of "(%x:%x.%x)%n",
input of "(00:00.0]" would still fill the respective variable pointed to
(and hence again make the mismatch non-detectable by the caller).
This patch aims at fixing those, but leaves other non-conforming aspects
of it untouched, among them these possibly relevant ones:
- improper handling of the assignment suppression character '*' (blindly
discarding all succeeding non-white space from the format and input
strings),
- not honoring conversion specifiers for %n, - not recognizing the C99
conversion specifier 't' (recognized by vsprintf()).
lib/spinlock_debug: avoid livelock in do_raw_spin_lock()
The logic in do_raw_spin_lock() attempts to acquire a spinlock by invoking
arch_spin_trylock() in a loop with a delay between each attempt. Now
consider the following situation in a 2 CPU system:
1. CPU-0 continually acquires and releases a spinlock in a
tight loop; it stays in this loop until some condition X
is satisfied. X can only be satisfied by another CPU.
2. CPU-1 tries to acquire the same spinlock, in an attempt
to satisfy the aforementioned condition X. However, it
never sees the unlocked value of the lock because the
debug spinlock code uses trylock instead of just lock;
it checks at all the wrong moments - whenever CPU-0 has
locked the lock.
Now in the absence of debug spinlocks, the architecture specific spinlock
code can correctly allow CPU-1 to wait in a "queue" (e.g., ticket
spinlocks), ensuring that it acquires the lock at some point. However,
with the debug spinlock code, livelock can easily occur due to the use of
try_lock, which obviously cannot put the CPU in that "queue". This
queueing mechanism is implemented in both x86 and ARM spinlock code.
Note that the situation mentioned above is not hypothetical. A real
problem was encountered where CPU-0 was running hrtimer_cancel with
interrupts disabled, and CPU-1 was attempting to run the hrtimer that
CPU-0 was trying to cancel.
Address this by actually attempting arch_spin_lock once it is suspected
that there is a spinlock lockup. If we're in a situation that is
described above, the arch_spin_lock should succeed; otherwise other
timeout mechanisms (e.g., watchdog) should alert the system of a lockup.
Therefore, if there is a genuine system problem and the spinlock can't be
acquired, the end result (irrespective of this change being present) is
the same. If there is a livelock caused by the debug code, this change
will allow the lock to be acquired, depending on the implementation of the
lower level arch specific spinlock code.
genalloc: make it possible to use a custom allocation algorithm
Premit use of another algorithm than the default first-fit one. For
example a custom algorithm could be used to manage alignment requirements.
As I can't predict all the possible requirements/needs for all allocation
uses cases, I add a "free" field 'void *data' to pass any needed
information to the allocation function. For example 'data' could be used
to handle a structure where you store the alignment, the expected memory
bank, the requester device, or any information that could influence the
allocation algorithm.
An usage example may look like this:
struct my_pool_constraints {
int align;
int bank;
...
};
unsigned long my_custom_algo(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data)
{
struct my_pool_constraints *constraints = data;
...
deal with allocation contraints
...
return the index in bitmap where perform the allocation
}
Add of best-fit algorithm function:
most of the time best-fit is slower then first-fit but memory fragmentation
is lower. The random buffer allocation/free tests don't show any arithmetic
relation between the allocation time and fragmentation but the
best-fit algorithm
is sometime able to perform the allocation when the first-fit can't.
This new algorithm help to remove static allocations on ESRAM, a small but
fast on-chip RAM of few KB, used for high-performance uses cases like DMA
linked lists, graphic accelerators, encoders/decoders. On the Ux500
(in the ARM tree) we have define 5 ESRAM banks of 128 KB each and use of
static allocations becomes unmaintainable:
cd arch/arm/mach-ux500 && grep -r ESRAM .
./include/mach/db8500-regs.h:/* Base address and bank offsets for ESRAM */
./include/mach/db8500-regs.h:#define U8500_ESRAM_BASE 0x40000000
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK_SIZE 0x00020000
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK0 U8500_ESRAM_BASE
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK1 (U8500_ESRAM_BASE + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK2 (U8500_ESRAM_BANK1 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK3 (U8500_ESRAM_BANK2 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK4 (U8500_ESRAM_BANK3 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_DMA_LCPA_OFFSET 0x10000
./include/mach/db8500-regs.h:#define U8500_DMA_LCPA_BASE
(U8500_ESRAM_BANK0 + U8500_ESRAM_DMA_LCPA_OFFSET)
./include/mach/db8500-regs.h:#define U8500_DMA_LCLA_BASE U8500_ESRAM_BANK4
I want to use genalloc to do dynamic allocations but I need to be able to
fine tune the allocation algorithm. I my case best-fit algorithm give
better results than first-fit, but it will not be true for every use case.
Jan Beulich [Fri, 5 Oct 2012 00:13:17 +0000 (17:13 -0700)]
lib/Kconfig.debug: adjust hard-lockup related Kconfig options
The main option should not appear in the resulting .config when the
dependencies aren't met (i.e. use "depends on" rather than directly
setting the default from the combined dependency values).
The sub-options should depend on the main option rather than a more
generic higher level one.
Alex Elder [Fri, 5 Oct 2012 00:13:16 +0000 (17:13 -0700)]
lib/parser.c: avoid overflow in match_number()
The result of converting an integer value to another signed integer type
that's unable to represent the original value is implementation defined.
(See notes in section 6.3.1.3 of the C standard.)
In match_number(), the result of simple_strtol() (which returns type long)
is assigned to a value of type int.
Instead, handle the result of simple_strtol() in a well-defined way, and
return -ERANGE if the result won't fit in the int variable used to hold
the parsed result.
No current callers pay attention to the particular error value returned,
so this additional return code shouldn't do any harm.