Pakfire upgrade never closes (but seems to finish)

When upgrading from Core 185 → 186 the Pakfire Configuration screen gets to some point… today it was [Jun 13 13:13:13 ipfire pakfire: DECRYPT STARTED: core-upgrade-186] and seems to hang there. How long should I wait before taking some action? I have seen this a couple of times in the past and waited as much as 4-6 hours, but the upgrade stayed “stuck” in the same place. In the past I have usually just rebooted the router from the console, then checked the Pakfire log in ‘System Logs’. There it usually shows the installation completed, and the home screen shows the correct new version.

A couple of minutes and it will finish and the wui will continue.

I have tested it out during the testing phase and did not have any problems.

I have updated three systems today and all completed with no problems.

The issue that was there in the past few updates has been fixed in cu186.

Hi all, I would just like to confirm that I have just upgraded to CU186 and there were no issues like the previous update, no WUI freezing, nothing. Update to CU186 went perfectly. Thank you to all who worked on this, much appreciated.


Core 185->186

I waited more than 10 min - In my case I had to start the httpd manually to get the wui back.
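
For anyone who ends up in the same state, here is a minimal sketch of starting Apache by hand only when no httpd process is found. The initscript path /etc/rc.d/init.d/apache is the one named later in this thread; the function name start_if_not_running and its two parameters are made up for illustration.

```shell
# Start a daemon through its initscript only if no process of that name exists.
# Hypothetical helper: both arguments default to the Apache values from this thread.
start_if_not_running() {
    name="${1:-httpd}"
    initscript="${2:-/etc/rc.d/init.d/apache}"
    if pidof "$name" >/dev/null 2>&1; then
        echo "$name is already running"
    else
        "$initscript" start
    fi
}
```

Run as root on the console, calling start_if_not_running with no arguments should be enough to get the wui back if httpd is not running.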

I also got a message on the console: chmod: cannot access ‘/var/run/dhcpd.pid’ : No such file or directory

Can you show the contents of your core update log, so we can see what messages there were relating to the apache server?

The file is /var/log/pakfire/update-core-upgrade-186.log

The following lines should be starting from line number 10480

removed '/var/ipfire/ovpn/openssl/ovpn.cnf'
removed directory '/var/ipfire/ovpn/openssl'
Searching in /usr/lib...
Removing /usr/lib/libkmod.so.2.4.1...
Searching in /lib...
Starting Apache daemon...
e[1Ae[0Ge[-8Ge[1;34m[e[1;32m  OK  e[1;34m]e[0;39m
dracut: Executing: /usr/bin/dracut --kver=6.6.32-ipfire --force
dracut: *** Including module: modsign ***
dracut: *** Including module: i18n ***
dracut: *** Including module: btrfs ***
dracut: *** Including module: dm ***

The above is from my CU186 updated system. It shows Starting Apache daemon… and the following line has OK in it, which means that Apache successfully started after the upgrade.

What do you have in your CU186 logfile for the Apache start? If it did not start, then in place of OK there would be FAIL plus some message about why it failed.
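
Rather than scrolling to the line number, the relevant lines can be pulled out with grep. This is a small sketch; the function name show_apache_lines is just for illustration, and the default path is the log file named above.

```shell
# Print every "Starting/Stopping Apache daemon" line with its line number,
# plus the status line (OK or FAIL) that follows it.
show_apache_lines() {
    log="${1:-/var/log/pakfire/update-core-upgrade-186.log}"
    grep -n -A1 'Apache daemon' "$log"
}
```

Calling show_apache_lines without arguments reads the Core Update 186 log; a different log file can be passed as the first argument.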

That message indicates that when you rebooted, the dhcp server failed to start and therefore did not create the dhcpd.pid file. The startup command that changes the permissions of the pid file from 640 to 644 then failed because the dhcpd.pid file was not present.

I have no idea why the dhcp server would fail to start when the reboot was carried out.
If you go to the WUI menu page Logs - System Logs, select DHCP Server in the drop down box labelled Section:, select the date when you did the upgrade and then press the Update button, you will get the logs related to the restart of the dhcp server from the reboot. In those logs there should be some messages relating to why the dhcp server failed to start.
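
The same lines can probably also be found from the console. The sketch below assumes the system log is /var/log/messages and that dhcp server entries contain the word dhcpd; both are assumptions, as is the function name dhcp_log_for_day. The date uses the syslog format seen earlier in this thread (for example "Jun 13").

```shell
# Show dhcpd entries for one day from a syslog-style file.
# Note: syslog pads single-digit days with two spaces, e.g. "Jun  3".
dhcp_log_for_day() {
    day="$1"                        # e.g. "Jun 13"
    log="${2:-/var/log/messages}"   # assumed log location
    grep "^$day " "$log" | grep 'dhcpd'
}
```

For example, dhcp_log_for_day "Jun 13" would list that day's dhcp server messages, if the assumptions above hold.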

I am having the same issue on nearly all of our IPFire installations.
Adolf, here is the copy of the update log you requested…

Hi @fkienker

Thanks for the log message.

That log shows that apache was still running when it was told to start.

That suggests that when apache was told to stop at the start of the update script it failed to do so for some reason.

That would give a problem because apache was then effectively not restarted, as it was still running while the update occurred. The reboot would have restarted apache.

However, something prevented apache from stopping and we need to understand what. I didn’t see that in any of the cu186 testing that I evaluated. There must be some corner case setting preventing apache from stopping.

Can you show the lines at the start of that log where apache is stopped? Hopefully there will be some message for what prevented it from stopping.


Rather than sending the log piecemeal, is there a way to attach the log to a forum message? I can also send it via the development list if that is easier.

You can copy the content in a block section but the size is a bit large.

The simplest is probably to send it via the development list, giving the forum post number in the subject line. Hopefully there will be some indication why it happened so that I can replicate it and therefore also evaluate any fix that is created.
The previous fix worked in my setups but obviously not in yours, and we need to understand what the difference is. There is obviously some marginality in the apache initscript that means that it fails to stop the daemon under certain conditions or within a certain timeframe.

Where would the forum post number be found?
I will send the file right away.

Edit - never mind, I believe I found it.
Check your email.

When I tried to email the log, I received the following message:

Your message to the Development mailing-list was rejected for the following reasons:

The message is not from a list member

Do you have a Plan B?

Ah, you have to be a member of the list. No anonymous postings are allowed, so you would need to join the list and your email address would be included in the allowed list.

Only alternative would be if you have a dropbox account or a pastebin or something like that where you can store the file and let me have a link to it.

You can do this to compress it and then post the .tar.gz. It should be found in the root directory

tar -cvzf update-core-upgrade-186.log.tar.gz /var/log/pakfire/update-core-upgrade-186.log
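
A variant of the same idea, as a sketch: using -C stores only the file name in the archive instead of the full path, and listing the archive afterwards confirms what was packed. The function name pack_log and its parameters are hypothetical.

```shell
# Compress one log file into a .tar.gz and list the archive contents.
pack_log() {
    src="${1:-/var/log/pakfire/update-core-upgrade-186.log}"
    dest="$2"
    # Default archive name: <log file name>.tar.gz in the current directory.
    [ -n "$dest" ] || dest="$(basename "$src").tar.gz"
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" &&
        tar -tzf "$dest"
}
```

Called without arguments, pack_log writes update-core-upgrade-186.log.tar.gz into whatever directory you are currently in.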

I believe I am a member of the dev distribution list. I receive postings to the dev mail list, and have sent emails in the past as well. Being rejected is new to me, but it is what it is.

I already know how to work around this during upgrades, so this isn’t a big deal to me.

Here is the beginning of the file per your request.

Stopping Apache daemon…
e[1Ae[0Ge[-8Ge[1;34m[e[1;32m OK e[1;34m]e[0;39m
removed ‘/boot/System.map-6.6.15-ipfire’
removed ‘/boot/config-6.6.15-ipfire’
removed ‘/boot/initramfs-6.6.15-ipfire.img’
removed ‘/boot/vmlinuz-6.6.15-ipfire’

Since it shows Apache stopping correctly at the beginning, the issue, hopefully, is somewhere between these two points. But my suspicion is there is an issue in the stop, and it’s not really stopping. A quick and dirty fix would be to use a restart rather than start where it is failing.

Best regards, and best of luck!
Fred

Same experience here as the OP – this also was the case for the last CU. Apache didn’t restart.

Hi @fkienker

The apache initscript in Core Update 185 used

        stop)
                boot_mesg "Stopping Apache daemon..."
                /usr/sbin/apachectl -k stop
                evaluate_retval
                ;;

This apachectl command told apache to stop but did not wait until all of its child processes, each with its own pid, had finished.

The fix for this was put into Core Update 186

        stop)
                boot_mesg "Stopping Apache daemon..."
                killproc /usr/sbin/httpd
                ;;

This command runs the kill command for each pid file that apache has open. It also does not move on until it gets back the status output for the whole command.
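
To illustrate the difference between "ask it to stop and move on" and "ask it to stop and wait until it is gone", here is a generic sketch. It is not the killproc implementation from the IPFire initscript functions; stop_and_wait and its parameters are made-up names, and it handles a single pid only.

```shell
# Ask a process to terminate, then block until it is really gone
# or until the timeout (in seconds) expires.
stop_and_wait() {
    pid="$1"
    timeout="${2:-10}"
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop
    waited=0
    # kill -0 sends no signal; it only tests whether the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A script that only sends the signal can reach its next step while the old process is still alive, which is the situation described above for the old stop command.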

To ensure that anyone installing Core Update 185 from fresh did not have the previous problem the above fix was also placed into the Core Update 185 install iso from May 10th.

Therefore the existing apache initscript in all Core Update 185 systems would still have the original stop command, which did not consistently stop all running apache instances for all setups, so some users experienced the problem and others didn’t.

Therefore your update to Core Update 186, when it ran the stop apache command, was running the old version.

After your reboot you will now have the new apache initscript. You can check this by looking at the stop command section of the initscript (/etc/rc.d/init.d/apache) and seeing that it matches with the second one I showed above.
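
A quick way to do that check from the console, as a sketch: print the stop) branch of the initscript and look for the killproc line shown above. The function name apache_stop_uses_killproc is hypothetical.

```shell
# Succeeds (exit status 0) if the stop) branch of the initscript
# uses "killproc /usr/sbin/httpd", fails otherwise.
apache_stop_uses_killproc() {
    script="${1:-/etc/rc.d/init.d/apache}"
    # Extract from the "stop)" line up to the next ";;" and search that part only.
    sed -n '/^[[:space:]]*stop)/,/;;/p' "$script" |
        grep -q 'killproc /usr/sbin/httpd'
}
```

For example: apache_stop_uses_killproc && echo "new stop command" || echo "old stop command"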

After upgrade to Core Update 186 all users will have the updated apache initscript that uses the killproc command to stop all apache instances and will not move on until it has a return status of 0, ie success.

Users on Core Update 185 still have the old apache initscript version, where the stop command does not operate consistently and where apache does not wait for a return status for all instances before moving on. Whether this causes a problem or not depends on several factors for how IPFire is loaded, in terms of active functions, and so it does not show up for all users or in all cases for the same user.

There were some fixes put into previous Core Updates, but those turned out not to address the real root cause of the issue.

In most situations the devs’ systems did not show the problem, and hence the feedback that a previous fix had not worked was only obtained after a full Core Update was released and the users having the problem reported on it.

My expectation now is that from Core Update 186 onwards apache will stop properly, as the killproc command is used and this ensures that all instances of apache are shut down before the command moves on.


After checking one of the systems upgraded to 186, the change is there.

But the code which did not work in the stop option appears to be retained in the restart option: Was there a reason for this?

That is the reload command, which keeps apache running but re-reads the configuration files. My understanding is that the reload command stops the child apache processes but leaves the main process running, which then loads the new configuration files and re-starts the child processes.

The restart command is the one above which is stop followed by start and so will use the updated stop command of killproc.

EDIT:
I would also say that generally the reload command is not used in the upgrades, because we usually want to not just reload the config files but also start with the new binary due to changed library linkages.
Therefore we use either restart at the end of the upgrade script or a stop at the beginning followed by a start at the end, and that ensures that the new binary with its updated libraries is installed.
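
The stop at the beginning / start at the end pattern can be sketched like this. It is only an illustration and not the actual update script; with_service_stopped is a hypothetical helper that takes an initscript path followed by the command to run while the service is down.

```shell
# Stop a service, run the given command, then start the service again.
# Returns the exit status of the command if stop and start both succeed.
with_service_stopped() {
    initscript="$1"
    shift
    "$initscript" stop || return 1
    "$@"
    status=$?
    "$initscript" start || return 1
    return "$status"
}
```

For example, with_service_stopped /etc/rc.d/init.d/apache some_update_step would run a (placeholder) some_update_step between the stop and the start. As discussed above, this only helps if the stop really waits until every apache instance has gone.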


2 posts were split to a new topic: Chmod: cannot access ‘/var/run/dhcpd.pid’ : No such file or directory