Alexander Graf [Wed, 25 Jan 2012 16:06:30 +0000 (17:06 +0100)]
PPC: booke206: move avail check to tlbwe
We can have TLBs that only support a single page size. This is defined
by the absence of the AVAIL flag in TLBnCFG. If this is the case, we
currently write invalid size info into the TLB, but override it on
internal fault.
Let's move the check over to tlbwe, so we don't have the AVAIL check in
the hotter fault path.
Alexander Graf [Wed, 25 Jan 2012 15:27:26 +0000 (16:27 +0100)]
PPC: booke206: Check for TLB overrun
Our internal helpers to fetch TLB entries were not able to tell us
that an entry doesn't even exist. Pass an error out if we hit such
a case so we don't accidentally run past the end of the TLB array.
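A minimal sketch of such a bounds-checked lookup, assuming the TLB is a flat array. The type and function names (sketch_tlb_entry, sketch_tlb_get) are hypothetical and are not the QEMU helpers touched by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLB entry; the real QEMU structure differs. */
typedef struct {
    uint32_t mas1;
    uint64_t mas2;
} sketch_tlb_entry;

/* Return the entry at idx, or NULL if no such entry exists. */
static sketch_tlb_entry *sketch_tlb_get(sketch_tlb_entry *tlb,
                                        size_t nb_entries, size_t idx)
{
    assert(tlb != NULL);
    if (idx >= nb_entries) {
        return NULL;   /* report the error instead of indexing past the array */
    }
    return &tlb[idx];
}
```

Callers then have to handle the NULL case explicitly rather than trusting the index.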
Alexander Graf [Fri, 20 Jan 2012 03:09:15 +0000 (04:09 +0100)]
PPC: booke206: Implement tlbilx
The PowerPC 2.06 BookE ISA defines an opcode called "tlbilx" which is used
to flush TLB entries. It's the recommended way of flushing in virtualized
environments.
So far we got away without implementing it, but Linux for e500mc uses this
instruction, so we better add it :).
Alexander Graf [Fri, 20 Jan 2012 03:07:51 +0000 (04:07 +0100)]
PPC: booke206: Check for min/max TLB entry size
When setting a TLB entry, we need to check if the TLB we're putting it in
actually supports the given size. According to the 2.06 PowerPC ISA, a
value that's out of range can either be redefined to something implementation
dependent or we can raise an illegal opcode exception. We do the latter.
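The range check could look roughly like the sketch below. It assumes the minimum and maximum size encodings of the TLB array have already been extracted from its configuration register; all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for a tlbwe-style write. */
enum sketch_tlbwe_result {
    SKETCH_TLBWE_OK = 0,
    SKETCH_TLBWE_ILLEGAL_OPCODE = -1
};

/* True if the requested size encoding lies within what the TLB supports. */
static bool sketch_size_supported(uint32_t tsize, uint32_t min_size,
                                  uint32_t max_size)
{
    assert(min_size <= max_size);
    return tsize >= min_size && tsize <= max_size;
}

/* Reject out-of-range sizes instead of redefining them. */
static enum sketch_tlbwe_result sketch_tlbwe(uint32_t tsize, uint32_t min_size,
                                             uint32_t max_size)
{
    if (!sketch_size_supported(tsize, min_size, max_size)) {
        return SKETCH_TLBWE_ILLEGAL_OPCODE;
    }
    return SKETCH_TLBWE_OK;
}
```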
Alexander Graf [Fri, 20 Jan 2012 03:06:18 +0000 (04:06 +0100)]
PPC: booke206: allow NULL raddr in ppcmas_tlb_check
We might want to call the tlb check function without actually caring about
the real address resolution. Check if we really should write the value
back.
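As an illustration of the optional output parameter, here is a small sketch with a simplified, hypothetical signature (the real ppcmas_tlb_check takes different arguments). The real address is only written back when the caller passes a non-NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 0 on a hit and -1 on a miss. */
static int sketch_tlb_check(uint64_t entry_base, uint64_t entry_size,
                            uint64_t real_base, uint64_t addr,
                            uint64_t *raddr)
{
    assert(entry_size > 0);
    if (addr < entry_base || addr - entry_base >= entry_size) {
        return -1;
    }
    if (raddr != NULL) {
        /* Only resolve the real address if the caller asked for it. */
        *raddr = real_base + (addr - entry_base);
    }
    return 0;
}
```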
Alexander Graf [Thu, 19 Jan 2012 18:51:50 +0000 (19:51 +0100)]
PPC: e500: msync is 440 only, e500 has real sync
The e500 CPUs don't use 440's msync which falls on the same opcode IDs,
but instead use the real powerpc sync instruction. This is important,
since the invalid mask differs between the two.
Alexander Graf [Fri, 6 Jan 2012 03:02:24 +0000 (04:02 +0100)]
PPC: KVM: Update HIOR code to new interface
Unfortunately the HIOR setting code slipped into upstream QEMU
before it was pulled into upstream KVM. And since Murphy is always
right, comments on the patches only emerged on the pull request
leading to changes in the interface.
So here's an update to the HIOR setting. While at it, I also relaxed it
a bit: HV KVM already runs fine without it, and 3.2 works just fine with
HV KVM even when HIOR is not set. We will only need this when running
PAPR in PR KVM.
Since we accidentally changed the ABI and API along the way, we have
to update the underlying kernel headers together with the code that
uses it to not break bisectability.
Alexander Graf [Fri, 20 Jan 2012 13:41:12 +0000 (14:41 +0100)]
KVM: Update headers (except HIOR mess)
This patch is basically what ./scripts/update-linux-headers.sh against
upstream KVM's next branch outputs except that all the HIOR bits are
removed. These we have to update with the code that uses them.
Corey Bryant [Thu, 26 Jan 2012 14:42:27 +0000 (09:42 -0500)]
Add support for net bridge
The most common use of -net tap is to connect a tap device to a bridge. This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.
This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu. The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.
By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users. We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.
Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.
A typical invocation would be similar to one of the following:
The default bridge that we attach to is br0. The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.
Alternatively, if a user wants to use a different bridge, a typical invocation
would be similar to one of the following:
Corey Bryant [Thu, 26 Jan 2012 14:42:26 +0000 (09:42 -0500)]
Add cap reduction support to enable use as SUID
The ideal way to use qemu-bridge-helper is to give it an fscap using:
setcap cap_net_admin=ep qemu-bridge-helper
Unfortunately, most distros still do not have a mechanism to package files
with fscaps applied. This means they'll have to SUID the qemu-bridge-helper
binary.
To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user. This is
hopefully close to equivalent to fscap support from a security perspective.
Corey Bryant [Thu, 26 Jan 2012 14:42:25 +0000 (09:42 -0500)]
Add access control support to qemu bridge helper
We go to great lengths to restrict ourselves to just cap_net_admin as an OS
enforced security mechanism. However, we further restrict what we allow users
to do to simply adding a tap device to a bridge interface by virtue of the fact
that this is the only functionality we expose.
This is not good enough though. An administrator is likely to want to restrict
the bridges that an unprivileged user can access, in particular, to restrict
an unprivileged user from putting a guest on what should be isolated networks.
This patch implements an ACL mechanism that is enforced by qemu-bridge-helper.
The ACLs are fairly simple whitelist/blacklist mechanisms with a wildcard of
'all'. All users are blacklisted by default, and deny takes precedence over
allow.
An interesting feature of this ACL mechanism is that you can include external
ACL files. The main reason to support this is so that you can set different
file system permissions on those external ACL files. This allows an
administrator to implement rather sophisticated ACL policies based on
user/group policies via the file system.
As an example:
/etc/qemu/bridge.conf root:qemu 0640
allow br0
include /etc/qemu/alice.conf
include /etc/qemu/bob.conf
include /etc/qemu/charlie.conf
/etc/qemu/alice.conf root:alice 0640
allow br1
/etc/qemu/bob.conf root:bob 0640
allow br2
/etc/qemu/charlie.conf root:charlie 0640
deny all
This ACL pattern allows any user in the qemu group to get a tap device
connected to br0 (which is bridged to the physical network).
Users in the alice group can additionally get a tap device connected to br1.
This allows br1 to act as a private bridge for the alice group.
Users in the bob group can additionally get a tap device connected to br2.
This allows br2 to act as a private bridge for the bob group.
Users in the charlie group cannot get a tap device connected to any bridge.
Under no circumstance can the bob group get access to br1 or can the alice
group get access to br2. And under no circumstance can the charlie group
get access to any bridge.
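The evaluation rules described above (deny by default, deny takes precedence over allow, 'all' as wildcard) can be sketched as follows. This only models the decision over a rule list that has already been loaded; parsing and include handling are left out, and all type and function names are hypothetical rather than taken from qemu-bridge-helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sketch_acl_type { SKETCH_ACL_ALLOW, SKETCH_ACL_DENY };

struct sketch_acl_rule {
    enum sketch_acl_type type;
    const char *bridge;          /* bridge name, or "all" as wildcard */
};

static bool sketch_rule_matches(const struct sketch_acl_rule *rule,
                                const char *bridge)
{
    return strcmp(rule->bridge, "all") == 0 ||
           strcmp(rule->bridge, bridge) == 0;
}

/* Decide whether the rules readable by the calling user permit bridge. */
static bool sketch_acl_permits(const struct sketch_acl_rule *rules, size_t n,
                               const char *bridge)
{
    bool allowed = false;        /* everything is denied by default */

    assert(bridge != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!sketch_rule_matches(&rules[i], bridge)) {
            continue;
        }
        if (rules[i].type == SKETCH_ACL_DENY) {
            return false;        /* deny takes precedence over allow */
        }
        allowed = true;
    }
    return allowed;
}
```

With the example files above, a user in the alice group would end up with the rules "allow br0" and "allow br1", while a user in the charlie group would end up with "allow br0" and "deny all", and is therefore refused on every bridge.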
Corey Bryant [Thu, 26 Jan 2012 14:42:24 +0000 (09:42 -0500)]
Add basic version of bridge helper
This patch adds a helper that can be used to create a tap device attached to
a bridge device. Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.
The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor. The descriptor is one
end of a socketpair() of domain sockets. This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.
The helper can then exit and let qemu use the tap device.
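Passing a descriptor across a domain socket is done with an SCM_RIGHTS control message. The following is a generic sketch of that technique, not the helper's actual code; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union sketch_ctrl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the UNIX domain socket sock. Returns 0 or -1. */
static int sketch_send_fd(int sock, int fd)
{
    char byte = 0;               /* at least one byte of payload is sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union sketch_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from sock. Returns the new descriptor or -1. */
static int sketch_recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union sketch_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len < CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the scheme described above, the helper would play the sending side with the opened tap device, and qemu the receiving side on its end of the socketpair().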
Bugfix for the case where a guest enabled the vmmouse and, after a reboot,
another OS is booted that uses e.g. a PS/2 mouse.
Details:
When a guest activated the vmmouse and then rebooted, the vmmouse was still
enabled and the PS/2 mouse was therefore unusable. When another guest without
vmmouse support (e.g. one using a PS/2 mouse) is then booted, the mouse does
not work. The reason is that the VMMouse has priority and disables all other
mouse entities, so it must be disabled on reset.
Testscenario:
1.) Boot e.g. an OS with VMMouse support (e.g. Windows with VMMouse tools)
2.) Reboot
3.) Boot e.g. an OS without VMMouse support (e.g. DOS) => the PS/2 mouse doesn't
work any more. This patch fixes that issue.
Testscenario 2 by Jan Kiszka:
Confirm that this patch fixes a real issue. Setup: qemu.git,
opensuse 11.4 guest, SDL graphic, system_reset while guest is using the
vmmouse. Without the patch, the vmmouse becomes unusable after the
reboot. Also, the mouse stays in absolute mode even before X starts again.
Fixed by:
Disabling the vmmouse in its reset handler.
Laszlo Ersek [Fri, 27 Jan 2012 13:34:05 +0000 (14:34 +0100)]
keep the PID file locked for the lifetime of the process
The lockf() call in qemu_create_pidfile() aims at ensuring mutual
exclusion. We shouldn't close the pidfile on success (as introduced by
commit 1bbd1592), because that drops the lock as well [1]:
"File locks shall be released on first close by the locking process
of any file descriptor for the file."
Coverity may complain again about the leaked file descriptor; let's
worry about that later.
v1->v2:
- add reference to 1bbd1592
- explain the intentional fd leak in the source
Michael Roth [Sat, 21 Jan 2012 17:13:53 +0000 (11:13 -0600)]
main-loop: For tools, initialize timers as part of qemu_init_main_loop()
In some cases initializing the alarm timers can lead to non-negligible
overhead for programs that link against qemu-tool.o. At least,
setting a max-resolution WinMM alarm timer via mm_start_timer() (the
current default for Windows) can increase the "tick rate" on Windows
OSs and affect frequency scaling, and in the case of tools that run
in guest OSs such as qemu-ga, the impact can be fairly dramatic
(+20%/20% user/sys time on a core 2 processor was observed from an idle
Windows XP guest).
This patch doesn't address the issue directly (not sure what a good
solution would be for Windows, or what other situations it might be
noticeable), but it at least limits the scope of the issue to programs
that "opt-in" to using the main-loop.c functions by only enabling alarm
timers when qemu_init_main_loop() is called, which is already required
to make use of those facilities, so existing users shouldn't be
affected.
Michael Roth [Sat, 21 Jan 2012 01:08:27 +0000 (19:08 -0600)]
main-loop: Fix SetEvent() on uninitialized handle on win32
The __attribute__((constructor)) init_main_loop() automatically gets
called if qemu-tool.o is linked in. On win32, this leads to
a qemu_notify_event() call which attempts to SetEvent() on a HANDLE that
won't be initialized until qemu_init_main_loop() is manually called,
breaking qemu-tool.o programs on Windows at runtime.
This patch checks for an initialized event handle before attempting to
set it, which is analogous to how we deal with an uninitialized
io_thread_fd in the posix implementation.
Jan Kiszka [Mon, 30 Jan 2012 10:27:33 +0000 (11:27 +0100)]
optionroms: Silence intermediate file removal
The build process of optionroms spits out an "rm ..." line. Moreover, it
removes all .o files that can be handy for debugging purposes. So
disable automatic intermediate removal.
Jan Kiszka [Tue, 31 Jan 2012 12:45:31 +0000 (13:45 +0100)]
sdl: Limit sdl_grab_end in handle_activation to Windows hosts
There are scenarios on Linux with some SDL versions where
handle_activation is continuously invoked with state = SDL_APPINPUTFOCUS
and gain = 0 while we grabbed the input. This causes a ping-pong when we
grab the input after an absolute mouse entered the window.
As this sdl_grab_end was once introduced to work around a Windows-only
issue (0294ffb9c8), limit it to that platform.
Jan Kiszka [Tue, 31 Jan 2012 12:45:30 +0000 (13:45 +0100)]
sdl: Grab input on end of non-absolute mouse click
By grabbing the input already on button down, we leave the button in
that state for the host GUI. Thus it takes another click after releasing
the input again to synchronize the mouse button state.
SDL_WM_GrabInput does not reliably bail out if grabbing is impossible.
So if we get here, we already lost and will block. But this can no
longer happen due to the check in sdl_grab_start. So this patch became
obsolete.
Jan Kiszka [Tue, 31 Jan 2012 12:45:28 +0000 (13:45 +0100)]
sdl: Fix block prevention of SDL_WM_GrabInput
Consistently check for SDL_APPINPUTFOCUS before trying to grab the input
focus. Just checking for SDL_APPACTIVE doesn't work. Moving the check to
sdl_grab_start allows for some consolidation.
Jan Kiszka [Fri, 27 Jan 2012 18:55:43 +0000 (19:55 +0100)]
Improve default machine options usability
So far we overwrite the machine options completely with defaults if no
accel=value is provided. More user friendly is to fill in only
unspecified options. The new qemu_opts_set_defaults enables this.
Jan Kiszka [Fri, 27 Jan 2012 18:54:54 +0000 (19:54 +0100)]
qemu-option: Introduce default mechanism
This adds qemu_opts_set_defaults, an interface to provide default values
for a QemuOpts set. Default options are parsed from a string and then
prepended to the list of existing options, or they serve as the sole
QemuOpts set.
Jan Kiszka [Mon, 23 Jan 2012 19:15:11 +0000 (20:15 +0100)]
qdev: Introduce lost tick policy property
Potentially tick-generating timer devices will gain a common property:
lost_tick_policy. It encodes four different ways to deal with tick
events the guest did not process in time:
discard - ignore lost ticks (e.g. if the guest compensates for them
          already)
delay   - replay all lost ticks in a row once the guest accepts them
          again
merge   - if multiple ticks are lost, all of them are merged into one
          which is replayed once the guest accepts it again
slew    - lost ticks are gradually replayed at a higher frequency than
          the original tick
Not all timer devices will need to support all modes. However, all need
to accept the configuration via this common property.
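To make the four modes concrete, the sketch below computes how many ticks a device would inject once the guest accepts ticks again. The enum and function names are illustrative assumptions, and the pacing of the slew mode (a higher injection frequency) is only noted in a comment, not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative policy values mirroring the four modes listed above. */
enum sketch_lost_tick_policy {
    SKETCH_TICK_DISCARD,
    SKETCH_TICK_DELAY,
    SKETCH_TICK_MERGE,
    SKETCH_TICK_SLEW
};

/* Number of ticks to replay, given how many ticks were lost. */
static uint32_t sketch_ticks_to_replay(enum sketch_lost_tick_policy policy,
                                       uint32_t lost)
{
    switch (policy) {
    case SKETCH_TICK_DISCARD:
        return 0;                    /* lost ticks are ignored */
    case SKETCH_TICK_DELAY:
        return lost;                 /* all of them, back to back */
    case SKETCH_TICK_MERGE:
        return lost > 0 ? 1 : 0;     /* collapsed into a single tick */
    case SKETCH_TICK_SLEW:
        return lost;                 /* all of them, but spread out at a
                                        higher rate than the original tick */
    }
    assert(0 && "unknown lost tick policy");
    return 0;
}
```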
Peter Maydell [Tue, 17 Jan 2012 13:23:13 +0000 (13:23 +0000)]
exec.c: Clarify comment about tlb_flush() flush_global parameter
Clarify the comment about tlb_flush()'s flush_global parameter,
so it is clearer what it does and why it is OK that the implementation
currently ignores it.
Paolo Bonzini [Fri, 20 Jan 2012 12:05:00 +0000 (13:05 +0100)]
m48t59: use rtc_clock for alarm timer
This lets the RTC get adjustments from the host NTP client.
The watchdog still uses the vm_clock. The previous behavior is
available with "-rtc clock=vm".
hw/9pfs: Remove O_NOATIME flag from 9pfs open() calls in readonly mode
When 2c74c2cb4bedddbfa67628fbd5f9273b4e0e9903 added support for
the 'readonly' flag against 9p filesystems, it also made QEMU
add the O_NOATIME flag as a side-effect.
The O_NOATIME flag, however, may only be set by the file owner,
or a user with CAP_FOWNER capability. QEMU cannot assume that
this is the case for filesystems exported to QEMU.
E.g., run QEMU as non-root, and attempt to pass the host OS
filesystem through to the guest OS with readonly enabled.
The result is that the guest OS cannot open any files at
all.
If O_NOATIME is really required, it should be optionally
enabled via a separate QEMU command line flag.
M. Mohan Kumar [Thu, 19 Jan 2012 06:51:12 +0000 (12:21 +0530)]
hw/9pfs: Preserve S_ISGID
In the passthrough security model of the local fs driver, chown and chmod
are done after file creation to set the file credentials and mode as
requested by the 9p client. But if the request was to create a file with the
S_ISGID bit, doing chown on that file resets the S_ISGID bit. So call chown
first and then invoke chmod with the proper mode bits, which retains S_ISGID
(if present/requested).
This resulted in LTP mknod02, mknod03, mknod05, open10 test case
failures. This patch fixes this issue.
man 2 chown
When the owner or group of an executable file are changed by an unprivileged
user the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify
whether this also should happen when root does the chown(); the Linux behavior
depends on the kernel version.
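The ordering can be sketched as follows, with a hypothetical helper name (this is not the function in the local fs driver): ownership is changed first, and the mode is applied last so that a chown which cleared S_ISUID or S_ISGID cannot undo it.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Apply ownership first, then the mode. Returns 0 on success, -1 on error. */
static int sketch_set_credentials(const char *path, uid_t uid, gid_t gid,
                                  mode_t mode)
{
    assert(path != NULL);
    if (chown(path, uid, gid) < 0) {
        return -1;
    }
    /* chmod after chown: restores S_ISGID if chown dropped it. */
    if (chmod(path, mode & 07777) < 0) {
        return -1;
    }
    return 0;
}
```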
Blue Swirl [Sat, 28 Jan 2012 13:11:20 +0000 (13:11 +0000)]
Merge branch 'target-arm.for-upstream' of git://git.linaro.org/people/pmaydell/qemu-arm
* 'target-arm.for-upstream' of git://git.linaro.org/people/pmaydell/qemu-arm:
Add Cortex-A15 CPU definition
Add dummy implementation of generic timer cp15 registers
arm: store the config_base_register during cpu_reset
target-arm/helper.c: Don't assume softfloat int32 is 32 bits only
target-arm: Fix implementation of TLB invalidate operations
Andreas Färber [Fri, 27 Jan 2012 19:08:52 +0000 (20:08 +0100)]
unin_pci: Fix typos in device names
Commit 999e12bbe85c5dcf49bef13bce4f97399c7105f4 (sysbus: apic: ioapic:
convert to QEMU Object Model) introduced two typos, one of which broke
the mac99 machine.
Anthony Liguori [Tue, 24 Jan 2012 19:12:29 +0000 (13:12 -0600)]
sysbus: apic: ioapic: convert to QEMU Object Model
This converts three devices because apic and ioapic are subclasses of sysbus.
Converting subclasses independently of their base class is prohibitively hard.
Anthony Liguori [Sun, 4 Dec 2011 20:37:06 +0000 (14:37 -0600)]
qdev: add class_init to DeviceInfo
Since we are still dynamically creating TypeInfo, we need to chain the
class_init function in order to be able to make use of it within subclasses of
TYPE_DEVICE.
This will disappear once we register TypeInfos directly.
Anthony Liguori [Thu, 15 Dec 2011 20:40:29 +0000 (14:40 -0600)]
qdev: add an interface to register subclasses
In order to introduce inheritance while still using the qdev registration
interfaces, we need to be able to use a parent other than TYPE_DEVICE. Add a
new interface that allows this.
Anthony Liguori [Sun, 4 Dec 2011 17:08:36 +0000 (11:08 -0600)]
qdev: move qdev->info to class
Right now, DeviceInfo acts as the class for qdev. In order to switch to a
proper ObjectClass derivative, we need to wean all of the callers off of
interacting directly with the info pointer.
Anthony Liguori [Fri, 16 Dec 2011 20:34:46 +0000 (14:34 -0600)]
qdev: integrate with QEMU Object Model (v2)
This is a very shallow integration. We register a TYPE_DEVICE but only use
QOM as basically a memory allocator. This will make all devices show up as
QOM objects but they will all carry the TYPE_DEVICE.
Signed-off-by: Anthony Liguori
---
v1 -> v2
- update for new location of object.h
Anthony Liguori [Sat, 3 Dec 2011 23:10:08 +0000 (17:10 -0600)]
qom: add the base Object class (v2)
This class provides the main building block for QEMU Object Model and is
extensively documented in the header file. It is largely inspired by GObject.
Signed-off-by: Anthony Liguori
---
v1 -> v2
- remove printf() in type registration
- fix typo in comment (Paolo)
- make Interface private
- move object into a new directory and move header into include/qemu/
- don't make object.h depend on qemu-common.h
- remove Type and replace it with TypeImpl * (Paolo)
- use hash table to store types (Paolo)
- aggressively cache parent type (Paolo)
- make a type_register and use it with interfaces (Paolo)
- fix interface cast comment (Paolo)
- add a few more functions required in later series
Anthony Liguori [Fri, 27 Jan 2012 15:00:03 +0000 (09:00 -0600)]
Merge remote-tracking branch 'pmaydell/arm-devs.for-upstream' into staging
* pmaydell/arm-devs.for-upstream:
arm: SoC model for Calxeda Highbank
arm_boot: support board IDs more than 16 bits wide
arm: add secondary cpu boot callbacks to arm_boot.c
ahci: add support for non-PCI based controllers
Add xgmac ethernet model
Thomas Higdon [Tue, 24 Jan 2012 17:19:44 +0000 (12:19 -0500)]
scsi: Guard against buflen exceeding req->cmd.xfer in scsi_disk_emulate_command
Limit the return value (corresponding to the length of the buffer to be
DMAed back to the initiator) to the value in req->cmd.xfer, which is the
amount of data that the initiator expects. Eliminate now-duplicate code
that does this guarding in the functions for individual commands.
Without this, the SCRIPTS code in the emulated LSI device eventually
raises a DMA interrupt for a data overrun when an INQUIRY command whose
buflen exceeds req->cmd.xfer is processed. It's the responsibility of
the client to provide a request buffer and allocation length that are
large enough for the result of the command.
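The guard amounts to clamping the reply length to what the initiator asked for, in one central place. A minimal sketch with hypothetical names, where xfer stands for req->cmd.xfer and buflen for the length the command emulation produced:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy at most xfer bytes of the reply; returns the number of bytes copied. */
static size_t sketch_copy_reply(uint8_t *out, size_t xfer,
                                const uint8_t *reply, size_t buflen)
{
    size_t len;

    assert(out != NULL && reply != NULL);
    len = buflen < xfer ? buflen : xfer;   /* never exceed what was requested */
    memcpy(out, reply, len);
    return len;
}
```

For example, a 36-byte INQUIRY reply requested with an allocation length of 5 would be cut to 5 bytes.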
Li Zhi Hui [Mon, 21 Nov 2011 07:40:39 +0000 (15:40 +0800)]
qcow: Use bdrv functions to replace file operation
The common file operation functions lack error detection and issue many
more I/O syscalls, so replace them with the bdrv family of functions and
reduce the number of I/O requests.
Stefan Weil [Sat, 21 Jan 2012 12:54:24 +0000 (13:54 +0100)]
block/vdi: Zero unused parts when allocating a new block (fix #919242)
The new block was filled with zero when it was allocated by g_malloc0,
but when it was reused later and only partially used, data from the
previously allocated block were still present and written to the new
block.
This caused the problems reported by bug #919242
(https://bugs.launchpad.net/qemu/+bug/919242).
Now the unused parts of the new block which are before and after the data
are always filled with zero, so it is no longer necessary to zero the whole
block with g_malloc0.
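The fix can be pictured with the sketch below, which zeroes only the parts of a (possibly reused) block buffer before and after the data being written. The function name and parameters are hypothetical and do not come from block/vdi.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Place len bytes of data at offset inside block and zero everything
 * before and after it, so stale contents of a reused buffer never leak.
 */
static void sketch_fill_new_block(uint8_t *block, size_t block_size,
                                  size_t offset, const uint8_t *data,
                                  size_t len)
{
    assert(block != NULL && data != NULL);
    assert(offset <= block_size && len <= block_size - offset);

    memset(block, 0, offset);                        /* part before the data */
    memcpy(block + offset, data, len);               /* the data itself */
    memset(block + offset + len, 0,
           block_size - offset - len);               /* part after the data */
}
```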
There already exists a virtio_blk_handle_write trace event as well as
completion events. Add the virtio_blk_handle_read event so it's easy to
trace virtio-blk requests for both read and write operations.
Stefan Hajnoczi [Wed, 18 Jan 2012 14:40:50 +0000 (14:40 +0000)]
blockdev: make image streaming safe across hotplug
Unplugging a storage interface like virtio-blk causes the host block
device to be deleted too. Long-running operations like block migration
must take a DriveInfo reference to prevent the BlockDriverState from
being freed. For image streaming we can do the same thing.
Note that it is not possible to acquire/release the drive reference in
block.c where the block job functions live because
drive_get_ref()/drive_put_ref() are blockdev.c functions. Calling them
from block.c would be a layering violation - tools like qemu-img don't
even link against blockdev.c.