Paolo Bonzini [Fri, 3 Feb 2012 10:57:23 +0000 (11:57 +0100)]
qom: clean up/optimize object_dynamic_cast
The interface loop can be performed only on the parent object. It
does not need to be done on each interface. Similarly, we can
simplify the code by switching early from the implementation
object to the parent object.
Blue Swirl [Sat, 4 Feb 2012 12:18:36 +0000 (12:18 +0000)]
Merge branch 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu
* 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu:
linux-user: Fix sa_flags byte swaps for mips
linux-user: Define TARGET_QEMU_ESIGRETURN for mips64
linux-user: Define TARGET_QEMU_ESIGRETURN for mipsn32
linux-user: Add default configs for mips64[el]
linux-user: Add default-configs for mipsn32[el]
linux-user: Implement *listxattr syscalls
linux-user/syscall.c: Implement f and l versions of set/get/removexattr
linux-user: Allow NULL value pointer in setxattr and getxattr
linux-user: fix wait* syscall status returns
linux-user/strace.c: Correct errno printing for mmap etc
linux-user: fix QEMU_STRACE=1 segfault
linux-user: add SO_PEERCRED support for getsockopt
linux-user/main.c: Add option to user-mode emulation so that user can specify log file name
linux-user: fake /proc/self/auxv
linux-user: fake /proc/self/stat
linux-user: fake /proc/self/maps
linux-user: add open() hijack infrastructure
linux-user: save auxv length
linux-user: stack_base is now mandatory on all targets
Indeed, compilation failed for w32, so the bridge code is now
conditional. Hosts which don't support it can simply remove the
definition of CONFIG_NET_BRIDGE.
Anthony Liguori [Fri, 23 Dec 2011 14:47:39 +0000 (08:47 -0600)]
object: sure up reference counting
Now we have the following behavior:
1) object_new() returns an object with ref = 1
2) object_initialize() does not increase the reference count (ref may be 0).
3) object_deref() will finalize the object when ref = 0. It does not free the
memory associated with the object.
4) both link and child properties correctly set the reference count.
The expected usage is the following:
1) child devices should generally be created via object_initialize() using
memory from the parent device. Adding the object as a child property will
take ownership of the object and tie the child's life cycle to the parent.
2) If a child device is created via qdev_create() or some other form of
object_new(), there must be an object_delete() call in the parent device's
finalize function.
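The reference counting rules listed above can be illustrated with a toy model. This is only a minimal sketch and not QEMU's implementation: `ToyObject`, `toy_new`, `toy_initialize`, `toy_ref` and `toy_unref` are hypothetical names, and the `finalized` flag exists only to make the behavior observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QOM object; not QEMU's Object type. */
typedef struct ToyObject {
    unsigned ref;
    bool finalized;
} ToyObject;

/* Rule 2: initializing caller-provided memory leaves ref at 0. */
static void toy_initialize(ToyObject *obj)
{
    obj->ref = 0;
    obj->finalized = false;
}

/* Rule 1: a heap-allocated object starts out with ref = 1. */
static ToyObject *toy_new(void)
{
    ToyObject *obj = malloc(sizeof(*obj));

    assert(obj != NULL);
    toy_initialize(obj);
    obj->ref = 1;
    return obj;
}

static void toy_ref(ToyObject *obj)
{
    obj->ref++;
}

/* Rule 3: dropping the last reference finalizes but does not free. */
static void toy_unref(ToyObject *obj)
{
    assert(obj->ref > 0);
    if (--obj->ref == 0) {
        obj->finalized = true;
    }
}
```

In this model the memory of an embedded child belongs to the parent structure, which is why finalization and freeing are kept as separate steps.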
Anthony Liguori [Fri, 23 Dec 2011 14:35:43 +0000 (08:35 -0600)]
qom: accept any compatible type when setting a link property
Links had limited utility before as they only allowed a concrete type to be
specified. Now we can support abstract types and interfaces which means it's
now possible to have a link<PCIDevice>.
Anthony Liguori [Mon, 30 Jan 2012 14:55:55 +0000 (08:55 -0600)]
qom: move properties from qdev to object
This is mostly code movement although not entirely. This makes properties part
of the Object base class which means that we can now start using Object in a
meaningful way outside of qdev.
Anthony Liguori [Thu, 22 Dec 2011 20:40:54 +0000 (14:40 -0600)]
qom: add new command to search for types
This adds a command that allows searching for types that implement a property.
This allows you to do things like search for all available PCIDevices. In the
future, we'll also have a standard interface for things with a BlockDriverState
property that a PCIDevice could implement.
This will enable search queries like, "any type that implements the BlockDevice
interface" which would allow management tools to present available block devices
without having to hard code device names. Since an object can implement
multiple interfaces, one device could act both as a BlockDevice and a
NetworkDevice.
Anthony Liguori [Thu, 22 Dec 2011 17:05:00 +0000 (11:05 -0600)]
qdev: remove baked in notion of aliases (v2)
Limit them to the device_add functionality. Device aliases were a hack based
on the fact that virtio was modeled the wrong way. The mechanism for aliasing
is very limited in that only one alias can exist for any device.
We have to support it for the purposes of compatibility but we only need to
support it in device_add so restrict it to that piece of code.
Signed-off-by: Anthony Liguori
---
v1 -> v2
- Use a table for aliases (Paolo)
Anthony Liguori [Thu, 8 Dec 2011 03:34:16 +0000 (21:34 -0600)]
qdev: register all types natively through QEMU Object Model
This was done in a mostly automated fashion. I did it in three steps and then
rebased it into a single step which avoids repeatedly touching every file in
the tree.
The first step was a sed-based addition of the parent type to the subclass
registration functions.
The second step was another sed-based removal of subclass registration functions
while also adding virtual functions from the base class into a class_init
function as appropriate.
Finally, a python script was used to convert the DeviceInfo structures and
qdev_register_subclass functions to TypeInfo structures, class_init functions,
and type_register_static calls.
We are almost fully converted to QOM after this commit.
Anthony Liguori [Sun, 4 Dec 2011 22:13:14 +0000 (16:13 -0600)]
usb-hid: simplify class initialization a bit
We can probably model USBHidDevice as a base class to get even better code
sharing but for now, just use a common function to initialize the common class
members.
Alexander Graf [Wed, 23 Nov 2011 23:44:43 +0000 (00:44 +0100)]
linux-user: fix wait* syscall status returns
When calling wait4 or waitpid with a status pointer and WNOHANG, the
syscall can potentially not modify the status pointer input. Now if we
have guest code like:
    int status = 0;
    waitpid(pid, &status, WNOHANG);
    if (status)
        <breakage>
then we have to make sure that, if the status did not change, we actually
return the guest's initialized status variable instead of our own uninitialized
one. We fail to do so today, as we proxy everything through an uninitialized
status variable, which for me always ended up containing the last error code.
This patch fixes some test cases when building yast2-core in OBS for ARM.
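The fix described above amounts to writing the status back to the guest only when the host call actually reported a child. A minimal sketch, assuming a hypothetical helper `toy_copy_wait_status` and a plain `int` pointer in place of a guest memory access; it is not the code from linux-user/syscall.c:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: propagate the host's wait status to the guest.
 * host_ret is the return value of the host wait call. With WNOHANG a
 * return of 0 means that no child changed state, so host_status was
 * never written and must not overwrite the guest's variable.
 */
static void toy_copy_wait_status(long host_ret, int host_status,
                                 int *guest_status)
{
    /* Assumption: the caller has already handled host errors. */
    assert(host_ret >= 0);

    if (guest_status != NULL && host_ret != 0) {
        *guest_status = host_status;
    }
}
```

With this check, the guest snippet shown earlier keeps seeing its own `status = 0` when no child has exited.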
Peter Maydell [Mon, 21 Nov 2011 12:21:19 +0000 (12:21 +0000)]
linux-user/strace.c: Correct errno printing for mmap etc
Correct the printing of errnos for syscalls which are handled
via print_syscall_ret_addr (mmap, mmap2, brk, shmat): errnos
are returned as negative returned values at this level, not
via the host 'errno' variable.
Alexander Graf [Mon, 21 Nov 2011 11:04:07 +0000 (12:04 +0100)]
linux-user: fix QEMU_STRACE=1 segfault
While debugging some issues with QEMU_STRACE I stumbled over segmentation
faults that were pretty reproducible. Turns out we tried to treat a
normal return value as errno, resulting in an access over array boundaries
for the resolution.
Fix this by allowing failure to resolve invalid errnos into strings.
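The idea of letting the errno lookup fail instead of indexing past the table can be sketched as follows. The table contents, the names `toy_errno_name` and `toy_format_ret`, the output format and the -4096 bound are assumptions made for this illustration, not taken from linux-user/strace.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, tiny errno-name table; a real table is much larger. */
static const char *const toy_errno_names[] = {
    NULL,       /* 0: not an error */
    "EPERM",    /* 1 */
    "ENOENT",   /* 2 */
    "ESRCH",    /* 3 */
    "EINTR",    /* 4 */
};

#define TOY_ERRNO_COUNT \
    (sizeof(toy_errno_names) / sizeof(toy_errno_names[0]))

/* Return the name for a positive errno, or NULL if it is out of range. */
static const char *toy_errno_name(long err)
{
    if (err <= 0 || (unsigned long)err >= TOY_ERRNO_COUNT) {
        return NULL;
    }
    return toy_errno_names[err];
}

/*
 * Format a syscall return value. Negative values in the assumed errno
 * range are resolved to a name; everything else, including values the
 * table does not know, is printed as a plain number.
 */
static void toy_format_ret(long ret, char *buf, size_t len)
{
    const char *name = NULL;

    assert(buf != NULL && len > 0);
    if (ret < 0 && ret > -4096) {
        name = toy_errno_name(-ret);
    }
    if (name != NULL) {
        snprintf(buf, len, "-1 errno=%ld (%s)", -ret, name);
    } else {
        snprintf(buf, len, "%ld", ret);
    }
}
```

The important property is that a lookup miss is an ordinary result that the caller handles, so a normal return value can never be used as an array index.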
陳韋任 [Tue, 8 Nov 2011 09:46:44 +0000 (17:46 +0800)]
linux-user/main.c: Add option to user-mode emulation so that user can specify log file name
QEMU linux user-mode's default log file name is "/tmp/qemu.log". To change
the log file name, users currently have to modify the source code and then
recompile QEMU. This patch allows users to specify the log file name with the
"-D logfile" option.
Alexander Graf [Wed, 2 Nov 2011 19:23:26 +0000 (20:23 +0100)]
linux-user: fake /proc/self/auxv
Gtk tries to read /proc/self/auxv to find its auxv table instead of
taking it from its own program memory space.
However, when running with linux-user, we see the host's auxv, which
clearly exposes wrong information. So let's instead expose the guest
memory backed auxv tables via /proc/self/auxv as well.
Alexander Graf [Wed, 2 Nov 2011 19:23:25 +0000 (20:23 +0100)]
linux-user: fake /proc/self/stat
The boehm gc finds the program's stack starting pointer by
checking /proc/self/stat. Unfortunately, so far it reads
qemu's stack pointer which clearly is wrong.
So let's instead fake the file so the guest program sees the
right address.
Alexander Graf [Wed, 2 Nov 2011 19:23:24 +0000 (20:23 +0100)]
linux-user: fake /proc/self/maps
glibc's pthread_attr_getstack tries to find the stack range from
/proc/self/maps. Unfortunately, /proc is usually the host's /proc
which means linux-user guests see qemu's stack there.
Fake the file with a constructed maps entry that exposes the guest's
stack range.
Alexander Graf [Wed, 2 Nov 2011 19:23:23 +0000 (20:23 +0100)]
linux-user: add open() hijack infrastructure
There are a number of files in /proc that expose host information
to the guest program. This patch adds infrastructure to override
the open() syscall for guest programs, so that we can generate
guest sensible files on the fly.
Alexander Graf [Sat, 28 Jan 2012 19:12:14 +0000 (21:12 +0200)]
linux-user: save auxv length
We create our own AUXV segment on stack and save a pointer to it.
However we don't save the length of it, so any code that wants to
do anything useful with it later on has to walk it again.
Instead, let's remember the length of our AUXV segment. This
simplifies later uses by a lot.
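The difference between re-walking the vector and remembering its length can be sketched like this. `ToyAuxvEntry`, `ToyImageInfo`, `toy_auxv_len` and `toy_save_auxv` are hypothetical names for this illustration; the only fact relied on is that an auxv is terminated by an entry of type 0 (AT_NULL):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical auxv entry; type 0 (AT_NULL) terminates the vector. */
typedef struct {
    unsigned long a_type;
    unsigned long a_val;
} ToyAuxvEntry;

/* Hypothetical per-image bookkeeping: pointer plus saved length. */
typedef struct {
    const ToyAuxvEntry *auxv;
    size_t auxv_len;    /* bytes, terminator included */
} ToyImageInfo;

/* Walk the vector once and return its size in bytes. */
static size_t toy_auxv_len(const ToyAuxvEntry *auxv)
{
    size_t n = 0;

    assert(auxv != NULL);
    while (auxv[n].a_type != 0) {
        n++;
    }
    return (n + 1) * sizeof(auxv[0]);
}

/* Record the length at creation time so later users need not walk. */
static void toy_save_auxv(ToyImageInfo *info, const ToyAuxvEntry *auxv)
{
    info->auxv = auxv;
    info->auxv_len = toy_auxv_len(auxv);
}
```

A later consumer, such as code that serves the vector as a file, can then copy `auxv_len` bytes directly.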
Alexander Graf [Tue, 31 Jan 2012 02:46:55 +0000 (03:46 +0100)]
PPC: E500: Populate L1CFG0 SPR
When running Linux on e500 with powersave-nap enabled, Linux tries to
read out the L1CFG0 register and calculates some things from it. Passing
0 there ends up in a division by 0, resulting in -1, resulting in badness.
So let's populate the L1CFG0 register with reasonable defaults. That way
guests aren't completely confused.
Alexander Graf [Tue, 31 Jan 2012 02:19:23 +0000 (03:19 +0100)]
PPC: E500: Implement msgsnd
This patch implements the msgsnd instruction. It is part of the
Embedded.Processor Control specification and allows one CPU to
IPI another CPU without going through an interrupt controller.
Alexander Graf [Tue, 31 Jan 2012 02:18:35 +0000 (03:18 +0100)]
PPC: E500: Implement msgclr
This patch implements the msgclr instruction. It is part of the
Embedded.Processor Control specification and clears pending doorbell
interrupts on the current CPU.
Alexander Graf [Wed, 25 Jan 2012 16:06:30 +0000 (17:06 +0100)]
PPC: booke206: move avail check to tlbwe
We can have TLBs that only support a single page size. This is defined
by the absence of the AVAIL flag in TLBnCFG. If this is the case, we
currently write invalid size info into the TLB, but override it on
internal fault.
Let's move the check over to tlbwe, so we don't have the AVAIL check in
the hotter fault path.
Alexander Graf [Wed, 25 Jan 2012 15:27:26 +0000 (16:27 +0100)]
PPC: booke206: Check for TLB overrun
Our internal helpers to fetch TLB entries were not able to tell us
that an entry doesn't even exist. Pass an error out if we hit such
a case, so that we don't accidentally read beyond the TLB array.
Alexander Graf [Fri, 20 Jan 2012 03:09:15 +0000 (04:09 +0100)]
PPC: booke206: Implement tlbilx
The PowerPC 2.06 BookE ISA defines an opcode called "tlbilx" which is used
to flush TLB entries. It's the recommended way of flushing in virtualized
environments.
So far we got away without implementing it, but Linux for e500mc uses this
instruction, so we better add it :).
Alexander Graf [Fri, 20 Jan 2012 03:07:51 +0000 (04:07 +0100)]
PPC: booke206: Check for min/max TLB entry size
When setting a TLB entry, we need to check if the TLB we're putting it in
actually supports the given size. According to the 2.06 PowerPC ISA, a
value that's out of range can either be redefined to something implementation
dependent or we can raise an illegal opcode exception. We do the latter.
Alexander Graf [Fri, 20 Jan 2012 03:06:18 +0000 (04:06 +0100)]
PPC: booke206: allow NULL raddr in ppcmas_tlb_check
We might want to call the tlb check function without actually caring about
the real address resolution. Check if we really should write the value
back.
Alexander Graf [Thu, 19 Jan 2012 18:51:50 +0000 (19:51 +0100)]
PPC: e500: msync is 440 only, e500 has real sync
The e500 CPUs don't use 440's msync which falls on the same opcode IDs,
but instead use the real powerpc sync instruction. This is important,
since the invalid mask differs between the two.
Alexander Graf [Fri, 6 Jan 2012 03:02:24 +0000 (04:02 +0100)]
PPC: KVM: Update HIOR code to new interface
Unfortunately the HIOR setting code slipped into upstream QEMU
before it was pulled into upstream KVM. And since Murphy is always
right, comments on the patches only emerged on the pull request
leading to changes in the interface.
So here's an update to the HIOR setting. While at it, I also relaxed
it a bit, since HV KVM can already run fine without it, and 3.2
works just fine with HV KVM when HIOR is not set. We will only
need this when running PAPR in PR KVM.
Since we accidentally changed the ABI and API along the way, we have
to update the underlying kernel headers together with the code that
uses them to not break bisectability.
Alexander Graf [Fri, 20 Jan 2012 13:41:12 +0000 (14:41 +0100)]
KVM: Update headers (except HIOR mess)
This patch is basically what ./scripts/update-linux-headers.sh against
upstream KVM's next branch outputs except that all the HIOR bits are
removed. These we have to update with the code that uses them.
Corey Bryant [Thu, 26 Jan 2012 14:42:27 +0000 (09:42 -0500)]
Add support for net bridge
The most common use of -net tap is to connect a tap device to a bridge. This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.
This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu. The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.
By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users. We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.
Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.
A typical invocation would be similar to one of the following:
The default bridge that we attach to is br0. The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.
Alternatively, if a user wants to use a different bridge, a typical invocation
would be similar to one of the following:
Corey Bryant [Thu, 26 Jan 2012 14:42:26 +0000 (09:42 -0500)]
Add cap reduction support to enable use as SUID
The ideal way to use qemu-bridge-helper is to give it an fscap using:
setcap cap_net_admin=ep qemu-bridge-helper
Unfortunately, most distros still do not have a mechanism to package files
with fscaps applied. This means they'll have to SUID the qemu-bridge-helper
binary.
To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user. This is
hopefully close to equivalent to fscap support from a security perspective.
Corey Bryant [Thu, 26 Jan 2012 14:42:25 +0000 (09:42 -0500)]
Add access control support to qemu bridge helper
We go to great lengths to restrict ourselves to just cap_net_admin as an OS
enforced security mechanism. However, we further restrict what we allow users
to do to simply adding a tap device to a bridge interface by virtue of the fact
that this is the only functionality we expose.
This is not good enough though. An administrator is likely to want to restrict
the bridges that an unprivileged user can access, in particular, to restrict
an unprivileged user from putting a guest on what should be isolated networks.
This patch implements an ACL mechanism that is enforced by qemu-bridge-helper.
The ACLs are fairly simple whitelist/blacklist mechanisms with a wildcard of
'all'. All users are blacklisted by default, and deny takes precedence over
allow.
An interesting feature of this ACL mechanism is that you can include external
ACL files. The main reason to support this is so that you can set different
file system permissions on those external ACL files. This allows an
administrator to implement rather sophisticated ACL policies based on
user/group policies via the file system.
As an example:
/etc/qemu/bridge.conf root:qemu 0640
  allow br0
  include /etc/qemu/alice.conf
  include /etc/qemu/bob.conf
  include /etc/qemu/charlie.conf
/etc/qemu/alice.conf root:alice 0640
  allow br1
/etc/qemu/bob.conf root:bob 0640
  allow br2
/etc/qemu/charlie.conf root:charlie 0640
  deny all
This ACL pattern allows any user in the qemu group to get a tap device
connected to br0 (which is bridged to the physical network).
Users in the alice group can additionally get a tap device connected to br1.
This allows br1 to act as a private bridge for the alice group.
Users in the bob group can additionally get a tap device connected to br2.
This allows br2 to act as a private bridge for the bob group.
Users in the charlie group cannot get a tap device connected to any bridge.
Under no circumstance can the bob group get access to br1 or can the alice
group get access to br2. And under no circumstance can the charlie group
get access to any bridge.
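The evaluation rules stated above (default deny, an 'all' wildcard, deny taking precedence over allow) can be sketched as a small function. This is an illustration only: `ToyAclRule` and `toy_acl_permits` are hypothetical names, and the sketch assumes that rules from included files the user cannot read have simply been skipped before evaluation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_ACL_ALLOW, TOY_ACL_DENY } ToyAclKind;

/* Hypothetical rule: a kind plus a bridge name or the wildcard "all". */
typedef struct {
    ToyAclKind kind;
    const char *bridge;
} ToyAclRule;

static bool toy_rule_matches(const ToyAclRule *rule, const char *bridge)
{
    return strcmp(rule->bridge, "all") == 0 ||
           strcmp(rule->bridge, bridge) == 0;
}

/* Return true only if some rule allows the bridge and none denies it. */
static bool toy_acl_permits(const ToyAclRule *rules, size_t n,
                            const char *bridge)
{
    bool allowed = false;   /* everything is denied by default */
    size_t i;

    assert(bridge != NULL);
    for (i = 0; i < n; i++) {
        if (!toy_rule_matches(&rules[i], bridge)) {
            continue;
        }
        if (rules[i].kind == TOY_ACL_DENY) {
            return false;   /* deny takes precedence over allow */
        }
        allowed = true;
    }
    return allowed;
}
```

For a user in the qemu and alice groups, the readable rules would be "allow br0" and "allow br1", so br0 and br1 are permitted and br2 is not. For a user in the charlie group, "deny all" overrides the earlier "allow br0".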
Corey Bryant [Thu, 26 Jan 2012 14:42:24 +0000 (09:42 -0500)]
Add basic version of bridge helper
This patch adds a helper that can be used to create a tap device attached to
a bridge device. Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.
The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor. The descriptor is one
end of a socketpair() of domain sockets. This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.
The helper can then exit and let qemu use the tap device.
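Passing an open descriptor over a Unix domain socket is usually done with an SCM_RIGHTS control message. The sketch below shows that general technique with hypothetical function names (`toy_send_fd`, `toy_recv_fd`, `toy_demo_roundtrip`); it uses a pipe in place of a tap device and is not the helper's actual code.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
typedef union {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
} ToyFdControl;

/* Send one descriptor over a Unix domain socket. Returns 0 on success. */
static int toy_send_fd(int sock, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    ToyFdControl control;
    char byte = 0;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    iov.iov_base = &byte;   /* one byte of ordinary data carries the message */
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new descriptor, or -1 on failure. */
static int toy_recv_fd(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    ToyFdControl control;
    char byte;
    int fd;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Round trip in one process; error paths skip cleanup for brevity. */
static int toy_demo_roundtrip(void)
{
    int sv[2], pipefd[2], passed;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pipefd) < 0) {
        return -1;
    }
    if (toy_send_fd(sv[0], pipefd[1]) < 0) {
        return -1;
    }
    passed = toy_recv_fd(sv[1]);
    if (passed < 0) {
        return -1;
    }
    /* Data written through the received descriptor arrives on the pipe. */
    if (write(passed, "x", 1) != 1 || read(pipefd[0], &c, 1) != 1) {
        return -1;
    }
    close(passed);
    close(pipefd[0]);
    close(pipefd[1]);
    close(sv[0]);
    close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets the sender exit afterwards while the receiver keeps using it.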
Bugfix for a reboot while vmmouse is enabled, followed by booting another OS
that uses e.g. a PS/2 mouse.
Details:
When a guest activated the vmmouse and then rebooted, the vmmouse was still
enabled and the PS/2 mouse was therefore unusable. When another guest is then
booted without vmmouse support (e.g. PS/2 mouse), the mouse does not work.
The reason is that VMMouse has priority and disables all other mouse entities,
and therefore must be disabled on reset.
Test scenario:
1.) Boot e.g. an OS with VMMouse support (e.g. Windows with VMMouse tools)
2.) Reboot
3.) Boot e.g. an OS without VMMouse support (e.g. DOS) => the PS/2 mouse doesn't
work any more. This patch fixes that issue.
Test scenario 2 by Jan Kiszka:
Confirmed that this patch fixes a real issue. Setup: qemu.git,
opensuse 11.4 guest, SDL graphic, system_reset while the guest is using the
vmmouse. Without the patch, the vmmouse becomes unusable after the
reboot. Also, the mouse stays in absolute mode even before X starts again.
Fixed by:
Disabling the vmmouse in its reset handler.
Laszlo Ersek [Fri, 27 Jan 2012 13:34:05 +0000 (14:34 +0100)]
keep the PID file locked for the lifetime of the process
The lockf() call in qemu_create_pidfile() aims at ensuring mutual
exclusion. We shouldn't close the pidfile on success (as introduced by
commit 1bbd1592), because that drops the lock as well [1]:
"File locks shall be released on first close by the locking process
of any file descriptor for the file."
Coverity may complain again about the leaked file descriptor; let's
worry about that later.
v1->v2:
- add reference to 1bbd1592
- explain the intentional fd leak in the source
Michael Roth [Sat, 21 Jan 2012 17:13:53 +0000 (11:13 -0600)]
main-loop: For tools, initialize timers as part of qemu_init_main_loop()
In some cases initializing the alarm timers can lead to non-negligible
overhead from programs that link against qemu-tool.o. At least,
setting a max-resolution WinMM alarm timer via mm_start_timer() (the
current default for Windows) can increase the "tick rate" on Windows
OSs and affect frequency scaling, and in the case of tools that run
in guest OSs such as qemu-ga, the impact can be fairly dramatic
(+20%/20% user/sys time on a core 2 processor was observed from an idle
Windows XP guest).
This patch doesn't address the issue directly (not sure what a good
solution would be for Windows, or what other situations it might be
noticeable), but it at least limits the scope of the issue to programs
that "opt-in" to using the main-loop.c functions by only enabling alarm
timers when qemu_init_main_loop() is called, which is already required
to make use of those facilities, so existing users shouldn't be
affected.
Michael Roth [Sat, 21 Jan 2012 01:08:27 +0000 (19:08 -0600)]
main-loop: Fix SetEvent() on uninitialized handle on win32
The __attribute__((constructor)) init_main_loop() automatically gets
called if qemu-tool.o is linked in. On win32, this leads to
a qemu_notify_event() call which attempts to SetEvent() on a HANDLE that
won't be initialized until qemu_init_main_loop() is manually called,
breaking qemu-tool.o programs on Windows at runtime.
This patch checks for an initialized event handle before attempting to
set it, which is analogous to how we deal with an uninitialized
io_thread_fd in the posix implementation.
Jan Kiszka [Mon, 30 Jan 2012 10:27:33 +0000 (11:27 +0100)]
optionroms: Silence intermediate file removal
The build process of optionroms spits out an "rm ..." line. Moreover, it
removes all .o files that can be handy for debugging purposes. So
disable automatic intermediate removal.
Jan Kiszka [Tue, 31 Jan 2012 12:45:31 +0000 (13:45 +0100)]
sdl: Limit sdl_grab_end in handle_activation to Windows hosts
There are scenarios on Linux with some SDL versions where
handle_activation is continuously invoked with state = SDL_APPINPUTFOCUS
and gain = 0 while we grabbed the input. This causes a ping-pong when we
grab the input after an absolute mouse entered the window.
As this sdl_grab_end was once introduced to work around a Windows-only
issue (0294ffb9c8), limit it to that platform.
Jan Kiszka [Tue, 31 Jan 2012 12:45:30 +0000 (13:45 +0100)]
sdl: Grab input on end of non-absolute mouse click
By grabbing the input already on button down, we leave the button in
that state for the host GUI. Thus it takes another click after releasing
the input again to synchronize the mouse button state.
SDL_WM_GrabInput does not reliably bail out if grabbing is impossible.
So if we get here, we already lost and will block. But this can no
longer happen due to the check in sdl_grab_start. So this patch became
obsolete.
Jan Kiszka [Tue, 31 Jan 2012 12:45:28 +0000 (13:45 +0100)]
sdl: Fix block prevention of SDL_WM_GrabInput
Consistently check for SDL_APPINPUTFOCUS before trying to grab the input
focus. Just checking for SDL_APPACTIVE doesn't work. Moving the check to
sdl_grab_start allows for some consolidation.