Speedtest-cli ERROR: Unable to connect to servers to test latency when run from fcrontab

So the script ran at 11:30 but it came back with an error.

Jan  3 10:37:52 ipfire qos-automatic.sh[9288]: DEBUG - RAW_SPEED = 234402086.8770652
Jan  3 11:30:03 ipfire qos-automatic.sh[13412]: Speedtest failed. Retrying. Result: ERROR: Unable to connect to servers to test latency.
Jan  3 11:30:05 ipfire qos-automatic.sh[13412]: Speedtest failed. Result: ERROR: Unable to connect to servers to test latency.

I would expect the 12:30 and 01:30 runs to do the same thing if this happened at 11:30.

Will have a closer look now at the script details to try and figure out why it can’t connect to any servers.

The error message
ERROR: Unable to connect to servers to test latency.
rang a bell in my head.

I had referenced it in another post.
I found that the speedtest-cli program gives that message when it thinks too many attempts have been made too often to access the servers and it blocks the IP.

At that time I found a pull request in the speedtest-cli github that refers to this error. Many users there believe that it is not due to too many attempts on their part but to a bug in the code. It seems to occur more often when the access is made at the half hour. It seemed to be getting triggered by home automation software running the speedtest, which for some reason failed with the unable to connect to servers error.

https://github.com/sivel/speedtest-cli/pull/796

A user supplied a patch as a pull request back in Feb 2023, but there has been no response from the github owner since then, although there have been more responses to the issue thread in github during 2024. It looks like the software is no longer being supported, as the last commit was 4 years ago. The last response by the github owner was in Apr 2022 and that was just to close the pull request as not being done.

Some of the users reporting on the issue in github have tried the patch, which changes the software to get the server list via json and some of the home automation users have tested it and reported that it worked for them.

I will look at testing this out on my vm dev testbed but first need to confirm that those systems also have the same issue.
The tests so far have been run on my production system but I won’t modify code on that system.

After remembering that the speedtest software seems to have a problem when it looks for servers at the half hour, I changed my fcrontab entry from 12:30 to 12:15.

The speedtest check worked!!

So maybe a simple workaround is that you use 01:15 or 01:45 instead of 01:30 until I can test out the change to the speedtest python code and confirm if it works properly or not.

EDIT:
Added the 12:30 back in to the fcrontab and it failed again. So I think this confirms that the speedtest code for finding the servers to use has a bug in it that issues an error message for too many attempts if the attempt is done at the half hour. This looks like it is not in the speedtest.py code itself but in the php code that speedtest.net runs to get the server list.
This should probably happen also if manually done but that is more difficult to test as it depends on the time window that the speedtest software has a problem with.

EDIT2:
I manually ran the following command
speedtest-cli --no-upload --secure --csv
at precisely 14:30
and got the ERROR: Unable to connect to servers to test latency. message.
Running the same command at 14:32 worked without issues.
I think this definitely confirms that speedtest has a problem with any requests at the half hour.

EDIT3:
Confirmed that the problem also occurred at 15:00, so it occurs on the hour as well as at the half hour.
I then retested running the command at 15:00:10 and still got the error message. Same at 15:00:15, but at 15:00:30 the command worked. So the boundary of the problem is somewhere between 15 secs and 30 secs after the hour or half hour.
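
Given those timings, a retry that waits longer than the failure window should get through where an immediate retry does not (the log above shows the retry happening only 2 seconds after the first failure). The following is only a sketch of that idea. The function name retry_with_delay and the 35 second margin are my own assumptions, and it assumes the command exits non-zero when it fails.

```shell
#!/bin/sh
# retry_with_delay ATTEMPTS DELAY COMMAND [ARGS...]
# Run COMMAND up to ATTEMPTS times, sleeping DELAY seconds between failed tries.
# Returns 0 on the first success, 1 if every attempt failed.
# (Hypothetical helper, not part of qos-automatic.sh or speedtest-cli.)
retry_with_delay() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt is still to come.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example use (not executed here). 35 seconds is an assumed margin past the
# roughly 30 second window seen in the tests above:
#   retry_with_delay 3 35 speedtest-cli --no-upload --secure --csv
```

This only works around the symptom. It does not explain why the server list request fails in that window.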


Good news.
First I confirmed that I had the same problem on my vm testbed system. It had the error message if run at the hour or at the half hour but worked otherwise.
Then I applied the patch to the speedtest.py file and it just ran at 18:00 and there was no Error message.

Here is the log from the qos-automatic.sh cut-down script.

Jan  3 16:00:01 ipfire qos-automatic.sh[15042]: Speedtest failed. Retrying. Result: ERROR: Unable to connect to servers to test latency.
Jan  3 16:00:02 ipfire qos-automatic.sh[15042]: Speedtest failed. Result: ERROR: Unable to connect to servers to test latency.
Jan  3 17:30:01 ipfire qos-automatic.sh[24801]: Speedtest failed. Retrying. Result: ERROR: Unable to connect to servers to test latency.
Jan  3 17:30:02 ipfire qos-automatic.sh[24801]: Speedtest failed. Result: ERROR: Unable to connect to servers to test latency.
Jan  3 17:50:06 ipfire qos-automatic.sh[27028]: DEBUG - RAW_SPEED = 420379379.6093119
Jan  3 18:00:14 ipfire qos-automatic.sh[28122]: DEBUG - RAW_SPEED = 327218854.6605606

The first four entries are when I confirmed it didn’t work on the hour and the half hour running the script via fcron.
Then the entry at 17:50 is when I ran the script manually after patching to confirm that speedtest still worked as it should.
The 18:00 entry is the one run via fcron.

I will now do a patch submission into IPFire to add the patch into our build of speedtest. Hopefully it should get into CU192.
Until then you can run your script at 01:31 and it should run okay, or at least you can then see if the rest of the script is working or not.


Thank you very much for spending the time on this @bonnietwin !

I’ll manually copy that Speedtest pull request and try it now.

Is the theory that there are so many automated speed tests running on the half hour that the servers get overwhelmed?

Your theory explains why it always worked when I manually tested with Cron - it wasn’t due to my session being inactive, but instead due to the random times I was using, which weren’t on the hour or half hour.

I might modify my script to check if it is running within 30 seconds of an hour or half hour and if so sleep for 30 seconds.

# Never run on the hour or at 30 minutes past the hour, when Speedtest servers can get overwhelmed
MINUTE=$( date +%M )
if [ "$MINUTE" -eq 0 ] || [ "$MINUTE" -eq 30 ]
then
  # sleep half a minute to avoid busy time
  sleep 30
fi
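
A variation on the guard above could also look at the seconds, so that the script only sleeps for as long as is needed to get past the window rather than a fixed 30 seconds. This is a rough sketch only. The function name seconds_to_wait is made up for illustration and the 30 second default window is an assumption based on the timings reported earlier in the thread.

```shell
#!/bin/sh
# seconds_to_wait MINUTE SECOND [WINDOW]
# Print how many seconds to sleep so that the run starts after the failure
# window that follows each hour and half hour. WINDOW defaults to 30 seconds,
# which is an assumption taken from the tests reported in this thread.
seconds_to_wait() {
    # Strip one leading zero so that 08 and 09 are not read as octal
    # in the arithmetic below.
    min=${1#0}; min=${min:-0}
    sec=${2#0}; sec=${sec:-0}
    window=${3:-30}
    # Seconds elapsed since the most recent hour or half hour mark.
    elapsed=$(( (min % 30) * 60 + sec ))
    if [ "$elapsed" -lt "$window" ]; then
        echo $(( window - elapsed ))
    else
        echo 0
    fi
}

# Example use inside a script (not executed here):
#   delay=$(seconds_to_wait "$(date +%M)" "$(date +%S)")
#   if [ "$delay" -gt 0 ]; then sleep "$delay"; fi
```

Called with 00 and 10 it prints 20, and called with 08 and 09 it prints 0, so a run started by fcron at exactly the half hour would wait the full 30 seconds while a run at any other time would not wait at all.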

That is what some users on the speedtest github site have suggested.

I am not convinced of that. I would not expect that the servers would be overwhelmed every half hour throughout the whole day. I would expect some half hours would be okay. None of the ones I tested worked.

Also if it is overwhelmed by too many requests in total I would not expect to get an error message related to too many requests from my specific IP.

The old system in the speedtest code gets the server list info by calling and running a php script on the speedtest.net web server.

The fix no longer uses the php script but gets the json list from the speedtest.net web server. If there was a problem with the loading on the server at the half hours I would expect that would also be a problem for getting the json list from the servers. It is the same server, just accessed in a different way.

I suspect that speedtest.net have a bug in their php code that creates the list when it is called.

As speedtest.net do all their web based speedtests with sockets now and not via the http approach (from around 6 years ago), I am sure they are not doing any maintenance on the php code they have as it is not part of their core approach anymore.

So using json to get the server lists works at the half hour but using the old php code doesn’t.

It wouldn’t surprise me if eventually the speedtest-cli code stops working as it is not being maintained.

There is already a pull request in the speedtest github for the use of a deprecated date method that no longer works with python 3.12, which has not been responded to by the speedtest github owner.

I will have to look at adding that patch in when I start working on the next python update which will be shortly, otherwise speedtest will just not work.

The problem will be when something changes in python that stops speedtest working but no user of speedtest is able to fix it with a patch.


I have created a patch set for speedtest-cli that applies the fix 429 errors patch, the python 3.10 support patch and the python 3.11 updates and fixes patch.

I have then manually installed that modified addon into my vm testbed system and have been running the qos-automatic script via fcron for the last hour or so.

It has worked correctly at 13:30 and 14:00

I will leave it running a bit longer just to confirm.

EDIT:
It has now been running for nine half hour periods and fcron has successfully got the data back from speedtest at every one of them. So I think this confirms it. Using the server list accessed via the json service resolves the issue if you end up running an fcron entry at the hour or half hour slots.

I will redo the speedtest build to also include a python 3.12 removal of deprecated date method patch that is a pull request in the speedtest github.
This uses the correct date method depending on the python version: one method for versions prior to 3.11 and another for 3.11 and later. The method for 3.11 and later only exists from 3.11 onwards, and the method from before 3.11 is no longer valid from 3.12 onwards.
I think it is worth adding this patch in now as I will shortly be working on updating IPFire to the latest python, together with updates on 43 out of 61 python3 packages used in IPFire.

I would expect that the patch set for this fix to the speedtest package should be able to get into CU192.


It turns out that, as the four patches being used have not all been merged into the same github repository (because there is no active support from the speedtest github owner), the python-3.11 patch changes the line that the python-3.12 patch is meant to modify. The python-3.12 patch therefore fails, as it can’t find the line to be modified because it now looks different.

So I am going to have to create my own version of the python-3.12 patch that takes account of that change.
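
This kind of clash can be caught before a full build by applying the patches in order to a throwaway copy of the source. The following is a minimal sketch of that check, not the IPFire build process. The function name check_patch_series is hypothetical, and it assumes the patches were made with a/ and b/ path prefixes so that -p1 is the right strip level.

```shell
#!/bin/sh
# check_patch_series SRC_DIR PATCH...
# Apply each patch in order to a temporary copy of SRC_DIR and report the
# first one that no longer applies. SRC_DIR itself is never modified.
# Returns 0 if all patches apply, 1 if one fails, 2 on a setup error.
# (Hypothetical helper for illustration only.)
check_patch_series() {
    src_dir=$1
    shift
    work_dir=$(mktemp -d) || return 2
    cp -R "$src_dir"/. "$work_dir"/ || { rm -rf "$work_dir"; return 2; }
    status=0
    for p in "$@"; do
        # -p1 strips the leading a/ or b/ path component,
        # -f stops patch from asking questions when a hunk does not fit.
        if (cd "$work_dir" && patch -p1 -f) < "$p" >/dev/null 2>&1; then
            echo "OK: $p"
        else
            echo "FAILED: $p"
            status=1
            break
        fi
    done
    rm -rf "$work_dir"
    return "$status"
}

# Example use (not executed here; directory and file names are placeholders):
#   check_patch_series speedtest-cli-src first.patch second.patch
```

If a later patch was written against the unpatched upstream file, it shows up as FAILED at its position in the series, which is the situation described above.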

Oh well. Will not get submitted today then. More likely tomorrow or Monday as I not only want to successfully do a complete build but also test out installing the speedtest-cli addon to make sure that everything still works as it should.


Thank you so much for all of this!

Thank you especially for clarifying the issue with the php version of speedtest.net.

I tested the copy I’d linked above, from the specific speedtest-cli/pull/796 on Github you’d mentioned and it worked at exactly the half hour, so the dodgy 30 second sleep logic I’d added wasn’t needed.

It sounds like the version I’m currently running won’t work with Python 3.12 though.

Thank you very much! I’m ashamed to say I’ve never had reason to learn python and aside from modifying existing scripts, my ability with it is limited.

It will still work. I have tested on my desktop which is using Arch Linux for the OS and that has python 3.13 and it still works on there as well.

The datetime.datetime.utcnow() function has been deprecated by python. It is still available, but python will remove it in some future version.

The deprecation notice gives us some early warning that it will disappear in the future.

So nothing to panic about yet and I will be able to apply the python 3.12 patch fix to the speedtest-cli that we have in IPFire.


The updated speedtest-cli with the additional patches to fix the problem highlighted in this post thread has been merged into next and will end up in Core Update 192.
