FAILURE: in first configuration

mc5686 · 21 June 2023 15:02

I think I found the problem (see below).
I will test in a non-IPFire but equivalent VM to ascertain correctness of my analysis, but I have no clue about how to compile for IPFire.
I’ll report ASAP.

bonnietwin · 21 June 2023 15:09

There was nothing “below” indicating what the cause is thought to be. Maybe you missed to attach or copy and paste something?

In the c code the following article is listed

http://lwn.net/Articles/301888/

and this covers the principle of what the code is doing by looking at some standard memory locations defined by Intel & AMD for hypervisor info.

mc5686 · 21 June 2023 15:29

My bad.
I was answering your question so I assumed my previous comment would have been “below”, while it is actually “above”.
Sorry for the confusion.
I copy it here so to avoid further mishap:

Analyzing code it seems there’s a problem:

in my case /sys/hypervisor/type is not present, so the __asm__ check is performed.

I am under an hypervisor (QEmu) and thus presumably has_hypervisor is TRUE

if sig.text is not empty the check loop is entered.

if string is not found before loop will try strcmp(hypervisor_ids[HYPER_OTHER], sig.text)

but hypervisor_ids[HYPER_OTHER] = NULL

BOOOOMMM!!!

Can someone cross-check my analysis?

Thanks for your patience
Mauro

bonnietwin · 21 June 2023 15:45

Ah okay. Yes, you have to be careful with below and above in the forum. Better to specifically mention a post number or make a link to it.

I am not familiar at all with c code but I would struggle to understand why a c strcmp between a string and a null (which is still a string) would cause the kernel to throw a segmentation fault.

It would make more sense if one of the assembler related commands accessing specific memory locations is accessing something that is protected in some way or is the wrong memory location for that info. That would make more sense to me for causing a segmentation fault.
What kernel version are you using on the system running the QEMU instance.

I think this needs one of the more experienced developers to look at it.

Hopefully @xperimental is able to reproduce the fault because then one of the experienced developers can do the same and that enables them to do special builds where they can access tools like gdb etc to track the root cause of the problem.

mc5686 · 21 June 2023 16:07

My analysis seems confirmed.

Note: in C a NULL is not a string and passing a NULL pointer to a function expecting a valid pointer is very likely to bomb with Segmentation fault.

The minimally modified program:

#include <errno.h>
#include <fcntl.h>
#include <linux/hdreg.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

/* hypervisor vendors */
enum hypervisors {
	HYPER_NONE       = 0,
	HYPER_XEN,
	HYPER_KVM,
	HYPER_MSHV,
	HYPER_VMWARE,
	HYPER_OTHER,
	HYPER_LAST /* for loop - must be last*/
};

const char *hypervisor_ids[] = {
	[HYPER_NONE]    = NULL,
	[HYPER_XEN]     = "XenVMMXenVMM",
	[HYPER_KVM]     = "KVMKVMKVM",
	/* http://msdn.microsoft.com/en-us/library/ff542428.aspx */
	[HYPER_MSHV]    = "Microsoft Hv",
	/* http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458 */
	[HYPER_VMWARE]  = "VMwareVMware",
	[HYPER_OTHER]   = NULL
};

const char *hypervisor_vendors[] = {
	[HYPER_NONE]    = NULL,
	[HYPER_XEN]     = "Xen",
	[HYPER_KVM]     = "KVM",
	[HYPER_MSHV]    = "Microsoft",
	[HYPER_VMWARE]  = "VMWare",
	[HYPER_OTHER]   = "other"
};

#define NEWLINE "\n\r"

static void truncate_nl(char *s) {
	assert(s);

	s[strcspn(s, NEWLINE)] = '\0';
}

static int read_one_line_file(const char *filename, char **line) {
	char t[2048];

	if (!filename || !line)
		return -EINVAL;

	FILE* f = fopen(filename, "re");
	if (!f)
		return -errno;

	if (!fgets(t, sizeof(t), f)) {
		if (ferror(f))
			return errno ? -errno : -EIO;

		t[0] = 0;
	}

	char *c = strdup(t);
	if (!c)
		return -ENOMEM;

	truncate_nl(c);

	*line = c;
	return 0;
}

/*
 * This CPUID leaf returns the information about the hypervisor.
 * EAX : maximum input value for CPUID supported by the hypervisor.
 * EBX, ECX, EDX : Hypervisor vendor ID signature. E.g. VMwareVMware.
 */
#define HYPERVISOR_INFO_LEAF   0x40000000

int detect_hypervisor(int *hypervisor) {
#if defined(__x86_64__) || defined(__i386__)
	/* Try high-level hypervisor sysfs file first: */
	char *hvtype = NULL;
	int r = read_one_line_file("/sys/hypervisor/type", &hvtype);
	if (r >= 0) {
		if (strcmp(hvtype, "xen") == 0) {
			*hypervisor = HYPER_XEN;
			return 1;
		}
	} else if (r != -ENOENT)
		return r;

	/* http://lwn.net/Articles/301888/ */

#if defined(__amd64__)
#define REG_a "rax"
#define REG_b "rbx"
#elif defined(__i386__)
#define REG_a "eax"
#define REG_b "ebx"
#endif

	uint32_t eax = 1;
	uint32_t ecx;
	union {
		uint32_t sig32[3];
		char text[13];
	} sig = {};

	__asm__ __volatile__ (
		/* ebx/rbx is being used for PIC! */
		"  push %%"REG_b"       \n\t"
		"  cpuid                \n\t"
		"  pop %%"REG_b"        \n\t"

		: "=a" (eax), "=c" (ecx)
		: "0" (eax)
	);

	bool has_hypervisor = !!(ecx & 0x80000000U);

	if (has_hypervisor) {
		/* There is a hypervisor, see what it is... */
		eax = 0x40000000U;
		__asm__ __volatile__ (
			"  push %%"REG_b"       \n\t"
			"  cpuid                \n\t"
			"  mov %%ebx, %1        \n\t"
			"  pop %%"REG_b"        \n\t"

			: "=a" (eax), "=r" (sig.sig32[0]), "=c" (sig.sig32[1]), "=d" (sig.sig32[2])
			: "0" (eax)
		);
		sig.text[12] = '\0';

		*hypervisor = HYPER_OTHER;

		if (*sig.text) {
			printf("sig.text: '%s'\n", sig.text);
			for (int id = HYPER_NONE + 1; id < HYPER_LAST; id++) {
				printf("checking: '%s'\n", hypervisor_ids[id]);
				if (strcmp(hypervisor_ids[id], sig.text) == 0) {
					*hypervisor = id;
					break;
				}
			}
		}

		return 1;
	}
#endif
	return 0;
}

int
main() {
	/*
		Get hypervisor from the cpuid command.
	*/
	int hypervisor = HYPER_NONE;

	int r = detect_hypervisor(&hypervisor);
	if (r >= 1) {
		const char* hypervisor_vendor = hypervisor_vendors[hypervisor];
		if (!hypervisor_vendor)
			printf("No hypervisor found\n");
		else
			printf("Hypervisor '%s' found!", hypervisor_vendor);
	}

	return 0;
}

outputs (as expected):

root@t2:~# gcc -Wall -o test test.c 
root@t2:~# ./test 
sig.text: 'Linux KVM Hv'
checking: 'XenVMMXenVMM'
checking: 'KVMKVMKVM'
checking: 'Microsoft Hv'
checking: 'VMwareVMware'
checking: '(null)'
Segmentation fault
root@t2:~#

Fix is trivial:

--- test.c.orig	2023-06-21 15:55:38.616000000 +0000
+++ test.c	2023-06-21 16:13:35.240000000 +0000
@@ -15,8 +15,8 @@
 	HYPER_KVM,
 	HYPER_MSHV,
 	HYPER_VMWARE,
-	HYPER_OTHER,
-	HYPER_LAST /* for loop - must be last*/
+	HYPER_QEMU,
+	HYPER_OTHER     /* for loop - must be last*/
 };
 
 const char *hypervisor_ids[] = {
@@ -27,6 +27,7 @@
 	[HYPER_MSHV]    = "Microsoft Hv",
 	/* http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458 */
 	[HYPER_VMWARE]  = "VMwareVMware",
+	[HYPER_QEMU]    = "Linux KVM Hv",
 	[HYPER_OTHER]   = NULL
 };
 
@@ -36,6 +37,7 @@
 	[HYPER_KVM]     = "KVM",
 	[HYPER_MSHV]    = "Microsoft",
 	[HYPER_VMWARE]  = "VMWare",
+	[HYPER_QEMU]    = "QEmu",
 	[HYPER_OTHER]   = "other"
 };
 
@@ -141,7 +143,7 @@
 
 		if (*sig.text) {
 			printf("sig.text: '%s'\n", sig.text);
-			for (int id = HYPER_NONE + 1; id < HYPER_LAST; id++) {
+			for (int id = HYPER_NONE + 1; id < HYPER_OTHER; id++) {
 				printf("checking: '%s'\n", hypervisor_ids[id]);
 				if (strcmp(hypervisor_ids[id], sig.text) == 0) {
 					*hypervisor = id;

Let me know if I should open a bug report or if this is enough, fr the time being.

Regards
Mauro

cfusco · 21 June 2023 16:11

@mc5686 wonderful work. Congratulations! Definitely I would open a bug report if I were you. However, let’s see what is @bonnietwin opinion.

cfusco · 21 June 2023 16:15

@mc5686 Very naive question from one that has never understood C. Do you think your patch could introduce a bug in some other context?

bonnietwin · 21 June 2023 16:23

I think you need to open a bug report. Just having it reported here won’t guarantee anything getting done with it. The core developers don’t have the time to look through the forum on a regular basis and the wiki page on bug reporting explicitly says that the forum is not the place to report a bug. It can be useful to confirm that the problem found is a bug and I think we are definitely there now with your information.

This version of fireinfo has been like this for around two years. Maybe nobody (or very few people) have installed IPFire into QEMU and hence why this is the first report of this problem. The recommendation fro IPFire anyway is to only run IPFire in a vm for testing purposes and not for an actual production system so this could be good news that IPFire in QEMU is not very common.

When you report the bug add the results from your various investigations into the bug itself, rather than just making a link from the bug to this forum post. It makes it much easier for the person who will take ownership of this bug to fix it when all the required information is in the bug report itself.

cfusco · 21 June 2023 16:38

@mc5686 In case this can help you out writing the bug report, this is chatGPT-4 explanation of your analysis and patch.

ChatGPT Text

The developer’s analysis explains the cause of a segmentation fault in the software, and it is consistent with the patch you’ve previously shown. Here’s a step-by-step explanation:

The code first tries to identify the hypervisor by reading the file /sys/hypervisor/type. However, this file does not exist in the developer’s QEMU/LXD environment.
As the file does not exist, the code falls back to using the CPUID assembly instruction to check if the system is running under any hypervisor (using the __asm__ block). In this case, the system is running under QEMU, so the has_hypervisor flag is set to true.
If has_hypervisor is true, the code checks the sig.text to see if it matches with any known hypervisor signatures.
If sig.text does not match any known hypervisor signatures, the loop continues until it reaches the HYPER_OTHER case.
Here lies the problem: the hypervisor_ids[HYPER_OTHER] is NULL. When the strcmp function tries to compare sig.text with this NULL pointer, it results in a segmentation fault, because strcmp expects both its arguments to be valid null-terminated strings.

The patch fixes this by adding an explicit case for QEMU, so the hypervisor is recognized correctly and the code doesn’t end up trying to do a string comparison (strcmp) with a NULL pointer, avoiding the segmentation fault. The HYPER_OTHER case is now the last in the enumeration, but its corresponding string in the hypervisor_ids array remains NULL, which should be fine because the loop in the detect_hypervisor function now stops before reaching this case.

mc5686 · 21 June 2023 16:40

@cfusco No, i don’t think it will open new bugs (but I could be proved wrong, of course).

Patch actually does two things:

closed the Segmentation Fault problem avoiding the “one off” problem.
adds support for QEmu.

Both are very localized and straightforward.

cfusco · 21 June 2023 16:43

@mc5686 Chat-GPT agrees, but it has some suggestion. Can you tell me your opinion of its criticism? I am trying to learn how these machines perform in code analysis.

EDIT, here the link to the complete chat-log. I will keep it up for few days, in case you are curious.

ChatGPT suggestions

The patch, as presented, should not introduce new bugs in the context of the original code’s purpose and environment. It is well contained, adding support for QEMU hypervisor detection and preventing a null pointer dereference, which was the cause of the segmentation fault.

However, there could be some potential issues in different contexts:

Hardcoding Hypervisor Signatures: The hypervisor signatures are hardcoded in this code, so if the signatures change in future versions of the hypervisors, this function may not correctly identify them.
Hypervisor Identification Order: The order of hypervisor identification matters in this code. If a system could feasibly return the signatures for multiple hypervisors (though unlikely), the first one in the list would always be chosen.
Unsupported Hypervisors: For any other hypervisors that are not in the list, the code will not identify them. This was the original problem with QEMU, and the patch fixes it by adding QEMU to the list. But if there are other hypervisors that return a non-empty signature and are not in the list, the code will hit the HYPER_OTHER case and, as it currently stands, this would again lead to a segmentation fault because hypervisor_ids[HYPER_OTHER] is NULL.

To further solidify the patch, you might consider adding a safety check before the strcmp operation to ensure that none of the strings being compared are NULL. This would provide a fail-safe for scenarios where a hypervisor might not be accounted for in the list.

Do remember, changes should always be thoroughly tested in all expected operational contexts to ensure they work as intended and don’t introduce new issues.

mc5686 · 21 June 2023 17:48

Suggestions (1) and (2) are valid, but would require a much larger patch and I doubt it would be worthwhile.
I tried to keep changes minimal.

Suggestion (3) is not valid because of the change in loop test: hypervisor_ids[HYPER_OTHER] will not be tested because id < HYPER_OTHER is FALSE and loop is terminated.

Patch “solidification” is generally valid, but I would advise to go that route if/when these functions are turned into lib functions called with unchecked arguments; as-is there’s no real need.

bonnietwin · 21 June 2023 17:57

Thinking about this a bit more and then looking through the virtual environments page of the IPFire Wiki

https://wiki.ipfire.org/hardware/virtual

I have found that it doesn’t mention IPFire as having been successfully tested on QEMU. It mentions VMware, KVM, Xen, Microsoft Hyper-V and Virtualbox.

This might be because QEMU can be used as an emulator or as an Hypervisor but I have read in a few places that QEMU is better to be used together with KVM as a hypervisor as the emulated QEMU section will be able to take advantage of the hardware virtualisation from the Kernel and operate at a reasonable speed whereas a fully emulated option will be slow.

I wondered about VirtualBox, which I use, not being listed in the fireinfo code. I just checked my fireinfo details and it shows that the virtualisation vendor is listed as KVM.
Reading up about VirtualBox it needs to have the hardware virtualisation turned on in the host system kernel and this is then detected as KVM. No KVM then no virtualisation with VirtualBox.

I am not so familiar with QEMU but it is also a Type 2 hypervisor, similar to VirtualBox so I wonder if you also need the hardware virtualisation turned on if you want to use it as a hypervisor as opposed to an emulator.

cfusco · 21 June 2023 18:05

I use QEMU with KVM in a Debian host and I can confirm it runs quite nicely.

bonnietwin · 21 June 2023 18:19

The question I have here is if @mc5686 is using KVM with the QEMU or just the QEMU on its own.

I would suspect only the QEMU with the KVM turned off. That would explain why the fireinfo code is not finding the KVM label. If KVM is being used then the question would be why is it not being found
If just the QEMU then I am wondering how well IPFire will run in a QEMU only environment without any hardware virtualisation, only software.

In fact looking at the code patch it is not detecting QEMU as a hypervisor it is just saying that if nothing else is found then the QEMU will be the [HYPER_OTHER] default value and I am not sure that is the right thing.

The first part of the patch even has the name for [HYPER_QEMU] set as “Linux KVM Hv” so that would suggest it should be using KVM and the KVM should be detected with the existing code.

cfusco · 21 June 2023 18:21

he is using QEMU in a container, namely LXD.

bonnietwin · 21 June 2023 18:23

Ah, Okay. I misread that originally for LXDE and thought it was a desktop environment in Linux.

I haven’t touched containers so I have no knowledge about them.

cfusco · 21 June 2023 18:29

Same for me. In case, the notes provided by @mc5686 are quite nice. Easy to follow, well documented tutorial.

bonnietwin · 21 June 2023 18:30

I suspect that the fireinfo code was not created to work with containers and some information is getting lost between the host kernel and the contents of the container. I presume that a container limits or stops two way traffic in and out of the container except for what is wanted.

mc5686 · 21 June 2023 18:31

@bonnietwin I think this has nothing to do with the bug I found.

detect_hypervisor() should not segfault, even in unexpected environment.

I can concur running a Firewall in a VM might thwart the very concept of separation, but, again, this is beyond the point.

Note that IPFire, in spite of this “malfunction”, runs perfectly fine in my environment and I’m currently sending this through it. I did not (yet) find any limitation, besides the “FAILURE:” printed, so I should say IPFire runs fine in my environment (LXD --vm which uses QEmu with accel = "kvm" under the hood).
In this condition the reported Hypervisor string is “Linux KVM Hv” and not plain “KVM” as expected.

I strongly suspect someone changed that string upstream and thus it might be the “Right Thing” to do is to change the HYPER_KVM string instead of adding a new one.
I would leave final decision to developers, but I will try to inquire with KVM people.