Getting martian source errors in kernel and dropping network

I have a couple of ASIX 88179 ethernet adapters, and starting with CU 190 they periodically drop their network connection in IPFire. I tried a different USB port, swapped out the cable, and even used a different port on the switch. Nothing worked until I swapped the dongle out with a spare using the same chipset. That seemed to be stable for a while, but it just started dropping again.

The only thing in the logs that coincides with the network dropping is a cascade of IPv4: martian source errors in the kernel log. The 88179 is serving as my green interface and when it dies, I can’t connect to the firewall from anything local. I can still connect to my OpenVPN server on the IPFire box from outside the network, but IPFire’s WUI is sluggish and non-responsive when I do.

Just swapped out with another dongle that uses a Realtek 8152 chipset to see if that resolves the issue.

Edit: Update, even the Realtek 8152 adapter is getting this issue. I’m guessing it might be a problem with my cheap Linksys LN1300 mesh system. I’m going to try to swap out to the old mesh I had and see if that fixes the issue.

Edit2: Still getting martian sources like this even with the new wireless router, not sure of the source:

21:00:10 	kernel: 	IPv4: martian source 169.254.38.63 from 169.254.38.63, on dev green0

MAC address listed in the entry below seems to match the Mesh wifi and they are using the ethernet as a backhaul. Is there an issue with IP Fire not managing to ignore those correctly? Why is my switch broadcasting the backhaul to the firewall to begin with?

It’s intended that the kernel warns when it receives a packet on an interface from an IP outside that network’s range. Usually this is an attack or a serious config issue. To me this looks like there is a device in your green network with a statically configured IP address (from the APIPA/Windows autoconfig range).
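
To make the "outside the network range" check concrete, here is a minimal Python sketch using the standard `ipaddress` module. The green subnet `192.168.1.0/24` is an assumption for illustration only (substitute your own), and the labels are just descriptions, not what the kernel itself reports. The first two test addresses are ones that appear in the log entries in this thread.

```python
import ipaddress

# Assumption: green0 serves 192.168.1.0/24; replace with your own green subnet.
GREEN_NET = ipaddress.ip_network("192.168.1.0/24")

# Well-known special ranges that often show up as martian sources.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")  # self-assigned when no DHCP lease (RFC 3927)
CGNAT = ipaddress.ip_network("100.64.0.0/10")        # shared address space for carrier NAT (RFC 6598)


def classify_source(addr: str) -> str:
    """Describe where a source address sits relative to the green subnet."""
    ip = ipaddress.ip_address(addr)
    if ip in GREEN_NET:
        return "inside green subnet"
    if ip in LINK_LOCAL:
        return "link-local (no DHCP lease obtained)"
    if ip in CGNAT:
        return "CGNAT shared range (RFC 6598)"
    return "outside green subnet"


for source in ("169.254.38.63", "100.74.63.71", "192.168.1.20"):
    print(source, "->", classify_source(source))
```

Anything that does not come back as "inside green subnet" is a candidate for a martian source warning on green0, because the device is sending from an address it did not get from green’s DHCP.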

I’m still not sure what is going on here, though. The system will go for days without an issue, then start getting an avalanche of these martian source errors, coinciding with the green interface failing entirely. Is this an attack that IPFire is somehow shutting down, or is there something else at work, like the wired backhaul of my mesh system causing a crash?

17:14:02 	kernel: 	IPv4: martian source 162.159.36.5 from 100.74.63.71, on dev green0
17:14:02 	kernel: 	ll header: 00000000: 00 0e c6 c5 c9 c0 ae 7f 53 e3 85 72 08 00
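
The `ll header` line is the raw 14-byte Ethernet header: 6 bytes of destination MAC, 6 bytes of source MAC, then 2 bytes of EtherType. A small Python sketch (the function name `parse_ll_header` is made up for illustration) splits it apart and also checks the locally administered bit of the source MAC:

```python
def parse_ll_header(hex_bytes: str) -> dict:
    """Split a 14-byte Ethernet header dump into its three fields."""
    octets = hex_bytes.split()
    if len(octets) != 14:
        raise ValueError("expected 14 bytes of Ethernet header")
    first_src_octet = int(octets[6], 16)
    return {
        "dst_mac": ":".join(octets[0:6]),
        "src_mac": ":".join(octets[6:12]),
        "ethertype": "0x" + "".join(octets[12:14]),
        # Bit 0x02 of the first octet marks a locally administered address,
        # the kind used for randomized ("private") Wi-Fi MACs.
        "src_locally_administered": bool(first_src_octet & 0x02),
    }


header = parse_ll_header("00 0e c6 c5 c9 c0 ae 7f 53 e3 85 72 08 00")
for field, value in header.items():
    print(field, "=", value)
```

For the entry above, the source MAC is ae:7f:53:e3:85:72 and its locally administered bit is set. That would be consistent with a device using a randomized MAC, though on its own it does not identify the device.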

It started yesterday with that. The MAC ae:7f:53:e3:85:72 might be a guest device from one of my kid’s friends since it’s not on my list of known devices. That continues for a while then stops, then more show up with two devices, a laptop and an audio receiver.

Then an entry:

19:46:25 	kernel: 	net_ratelimit: 15 callbacks suppressed 

The laptop and receiver stop getting martian addresses, but then an Apple Watch and a sprinkler controller get thrown into the mix instead. Then the green0 interface fails completely, with no log entry showing it failing. It just doesn’t respond anymore.

Not sure what is happening here.

I had weird issues like this when I was upgrading across different versions in stable. So I’d suggest you print the settings you put into IPFire through the web gui, then download and install the Jan21 download and reload the settings, because I don’t trust a settings backup across different versions. Maybe this is unnecessary, but at least you have a paper copy of your settings.

Then I would go a step further and change the distro type to testing and use the new Linux kernel, because they did fix a few USB drivers and my USB ethernet (ASIX) works correctly. I did update from 189 to 190, but some of the default firewall rules were different than they should be, and the previous 190 iso I downloaded in December had the same issue when I loaded it on a replacement machine.

I may have to try a full reinstall once I have time. For now, I swapped to an older mesh wifi system that seemed to have martian source issues when it first came up but has stopped for now. I’ll see if it gets them again.

The other possibility is that the LN1300’s I have aren’t technically supposed to support mesh but you can enable it through a hidden menu on their WUI. It may be that the unsupported mesh has issues with the backhaul that is causing cascading failures that just started popping up with the last two CU’s.

Mesh systems don’t handle DHCP reliably by themselves. So all the mesh units need static addresses outside green’s DHCP pool, with the gateway and DNS IP set to the IPFire server’s green interface. Also, WAN has to be disabled, as well as the DHCP server on each node. And for the primary, connect from a LAN port to the green network port of the IPFire server.

But as for the USB3 ASIX ethernet, it was running in USB 2.0 high-speed mode (480 Mbps) instead of USB 3.0 (needed for the full 1 Gbps) and dropping out randomly until I manually updated to the kernel that is now in the “testing” repository of pakfire in the web gui. To get it, switch the repository, and the update to 192 testing will appear in the top right section.

This driver broke at the kernel level, and all Linux distributions incorporated that kernel in their stable versions, so you had to either switch to their testing release or manually install the driver. In this distribution, it’s just easier to switch to testing than to rebuild the OS from source, or to install all the build tools to compile, sign, and install a driver.

If I’m reading this correctly, it’s not enough to have the mesh system be in access point (or bridging) mode with WAN turned off, I can’t even have them using DHCP as a client to get their own static IP’s from IPFire? I need to manually configure the static IP on each node, even if IP Fire has them defined as static IP’s?

All 3 nodes are wired to a managed 16 port gigabit switch which also connects to the green interface on my IPFire box. They were using wired backhaul but these are finicky and seemed to keep dropping back to wireless backhaul on their own no matter what I did.

Mine is connected at 5000M according to lsusb --tree, so I’m guessing this might not be my problem. Maybe a different ASIX model?

The DHCP server on green would serve it all if WAN and DHCP are turned off on the nodes and the cable is plugged into a LAN port of each.

But what you have to do to set up a wired backhaul on the linksys/wrt style mesh is create the chain by wireless, then one at a time plug them in by their LAN port. But WAN and DHCP have to be off on them. That will switch the wireless backhaul to the LAN port.

But you don’t even need to use 802.11s (mesh) anyway if you are using wired backhaul. It’s only needed when you have node(s) that are all wireless.

Because if you set the SSID, password and auth security the same, disable DHCP, set the radios to different channels, put them into AP mode and plug them in (any port), it works the same. You don’t even have to buy special equipment or have them be from the same manufacturer if you have a wired backhaul.

He could have patched the driver, but I run the 6.12.3 kernel or higher since there were so many fixes in it, including some security fixes. That is why I upgraded to it. Besides, my equipment seems to run faster. Even the web gui.

There are only two ASIX chips that span the two decades. The only thing that changes is the USB PHY chip. But this was one of a group of them whose drivers malfunctioned when Linux mainline broke it.

Correct me if I’m wrong, but what you’re describing is a DIY mesh. Real mesh uses the backhaul to allow the nodes to communicate and hand traffic off between each other balancing signal strength and load rather than relying on the client device itself deciding to drop one node and move to the other.

I’ve tried DIY mesh and had issues where devices cling to the further node for dear life even when it has a much stronger signal from a closer node. This is really more of a problem with cell phones and tablets that get moved throughout the house while streaming things, then you end up with 5mbps because it’s still got usable signal from the node in the basement when you’re upstairs and could get 200+ from the closer node.

It’s a wifi mesh without wifi extend, which is called ‘fixed mesh’.

Since you are moving around with devices whose OS has not solved the issue of grouping preferred networks and picking the strongest signal (most systems do this by default, and even some Android versions do this or have the capability), you would need to do the first example: set the mesh up wirelessly, then plug the LAN cable in to switch the backhaul of the nodes. The switch to wired backhaul is automatic, and most units don’t expose the wifi extend functions with local backhaul in their setup screens. Switching the backhaul this way is baked into wrt/linksys style operating systems.

Otherwise, you create network loops that do the unpredictable things you are experiencing.

Wifi mesh is just a system that has more than one access point with the same SSID and security. The wifi extend part is another layer on top of it, where most of the cross traffic between access points is backhaul and devices that move position. This was added to the technology later and is called “easy mesh” by the Wi-Fi Alliance. So, you need to use your wifi mesh system as an “easy mesh” instead of the legacy ‘fixed mesh’ system that does not use wifi extend.

I’m using 3 LN1301 (aka MX4300) and configuring them in Linksys’ own WUI to be mesh. The option is hidden and unsupported on the LN1301 branded model, but still there. They show up in the Linksys app as mesh and will show either wireless backhaul or wired. I set them up to be wired and did not configure them to be wireless, but they seem to occasionally (and randomly) drop to a wireless backhaul anyway. I can’t find any information as to whether they use easy mesh or fixed mesh. It would seem (from practical usage) that they are easy mesh since my devices switch over between routers fine.

My older system is a 2 node Netgear MR60 which are configured out of the box to be a 2 node mesh. They claim to be easy mesh compatible. They automatically go to a wired backhaul when they see one available.

Before, I was using a DIY ‘fixed mesh’ system with two completely different brands of AC routers set up as access points with the same SSID and key but otherwise not talking to each other, and had the ‘clinging’ issue.

I’m getting martian sources with both the linksys and netgear, but it seems to be happening far more often with the LN1301. This might be because of the LN1301’s crummy firmware or it might be because it’s 3 nodes instead of the netgear’s two.

The other really odd thing here is that at least two of the MAC addresses I’ve seen show up with martian sources are wired over ethernet and don’t even use the wifi at all. One of them was an Xbox One that wouldn’t even be connecting to anything else on the network using Wifi unless someone was streaming a game (which no one has been).

This suggests that these are losses of connection to the Green network.
I have used ASIX and even Realtek USB Ethernet dongles for a long time and had connection loss problems, and this forum also lists some (do an ASIX or RTL search to see).
The martian addresses 169.254.xx.xx are automatically assigned by Microsoft devices when they do not obtain a DHCP address.
The connection loss happens before that.
You may need to look for other errors preceding these errors in the /var/log/messages file.
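
As a rough aid for that kind of log review, here is a Python sketch that tallies martian source entries per sender. It assumes each `martian source` line is followed by its `ll header` line, as in the excerpts in this thread; the function name `tally_martians` and the sample lines are illustrative, and the exact prefix of lines in your /var/log/messages may differ, which is why only the message text is matched.

```python
import re
from collections import Counter

# Kernel message text: "martian source <destination> from <sender>, on dev <iface>"
MARTIAN_RE = re.compile(r"martian source ([\d.]+) from ([\d.]+), on dev ([\w.-]+)")
# Hex dump of the 14-byte Ethernet header that follows each martian line.
LL_HEADER_RE = re.compile(r"ll header: [0-9a-f]+: ((?:[0-9a-f]{2}\s?){14})")


def tally_martians(lines):
    """Count martian entries per (sender IP, sender MAC, interface)."""
    counts = Counter()
    pending = None  # martian line still waiting for its ll header
    for line in lines:
        martian = MARTIAN_RE.search(line)
        if martian:
            pending = (martian.group(2), martian.group(3))
            continue
        header = LL_HEADER_RE.search(line)
        if header and pending:
            octets = header.group(1).split()
            src_mac = ":".join(octets[6:12])  # bytes 7-12 are the source MAC
            counts[(pending[0], src_mac, pending[1])] += 1
            pending = None
    return counts


sample = [
    "17:14:02 kernel: IPv4: martian source 162.159.36.5 from 100.74.63.71, on dev green0",
    "17:14:02 kernel: ll header: 00000000: 00 0e c6 c5 c9 c0 ae 7f 53 e3 85 72 08 00",
]
counts = tally_martians(sample)
for (ip, mac, dev), n in counts.most_common():
    print(f"{n:5d}  {ip:15s}  {mac}  {dev}")
```

To run it against the real log, pass an open file instead of `sample`, for example `tally_martians(open("/var/log/messages"))`. Because the kernel rate-limits these messages (the `net_ratelimit: ... callbacks suppressed` entries), the counts are lower bounds.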

Check this side and the connections upstream of your switch. It may just be a faulty cable, or the switch itself.

So to summarize, what you need to set up is an easy mesh system.

But it has to be set up a certain way when it’s used as an access point, because the device’s NAT and DHCP modules have bugs in their OS and malfunction when the device is placed on a network.

I would find the instructions to set the mesh system up manually instead of using an app, so you understand how it all links together. The steps are going to be similar regardless of brand, just that one unit is the master and the rest are slave units. All have the same domain name and, if not auto generated, different host names. WAN disabled, DHCP disabled. All nodes are statically assigned a green address, with gateway and DNS set to the green IPFire interface. Backhaul plugged in one by one on a LAN port on all nodes AFTER setting up the mesh by wireless.
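
The addressing part of that checklist can be sanity-checked before touching the nodes. The Python sketch below is only an illustration: the subnet, the IPFire green address, the DHCP pool bounds and the node names and addresses are all hypothetical placeholders, so substitute the values from your own green configuration.

```python
import ipaddress

# All values below are hypothetical; substitute your own green settings.
GREEN_NET = ipaddress.ip_network("192.168.1.0/24")
GREEN_GATEWAY = ipaddress.ip_address("192.168.1.1")  # IPFire green interface
DHCP_POOL = (ipaddress.ip_address("192.168.1.100"),
             ipaddress.ip_address("192.168.1.200"))
MESH_NODES = {
    "mesh-main": "192.168.1.2",
    "mesh-node2": "192.168.1.3",
    "mesh-node3": "192.168.1.4",
}


def check_node(addr: str) -> list:
    """Return a list of problems with a planned static address (empty if fine)."""
    ip = ipaddress.ip_address(addr)
    problems = []
    if ip not in GREEN_NET:
        problems.append("not in green subnet")
    if DHCP_POOL[0] <= ip <= DHCP_POOL[1]:
        problems.append("inside the dynamic DHCP pool")
    if ip == GREEN_GATEWAY:
        problems.append("collides with the IPFire green address")
    return problems


for name, addr in MESH_NODES.items():
    print(name, addr, check_node(addr) or "ok")

# Each node also needs its own address; duplicates would conflict on the LAN.
assert len(set(MESH_NODES.values())) == len(MESH_NODES), "duplicate node address"
```

On each node the gateway and DNS would then both be set to the IPFire green address, as described above.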

169.254 is self-assigned when a DHCP server is not found (BTW, it’s not OS-specific and Microsoft has nothing to do with anyone’s networking). What is happening is that they are trying to use the mesh system’s DHCP server, which crashes when the unit is placed on a DHCP network. That is a known issue with the wireless OS’ mesh DHCP program, and I explained the only solution to it.

The WUI is the only way to set up mesh and it just has two boxes that say “add wired node” and “add wireless node”, no other options. If I flash it to dd-wrt or openwrt and use that instead, would that possibly fix this issue or is it an issue pervasive in the underlying hardware itself?

The issue exists in all the other OSes as well. The wired setup (if it’s there) doesn’t work correctly. So you should log into the router, turn off WAN and DHCP, set up the wireless connection to the node, then connect the node to the wired backhaul. It will switch automatically to wired backhaul after a few seconds. The unit must be on and connected to the wireless mesh before connecting it to the LAN.

I can’t get the LN1301 to set up the way you suggested. In initial set-up, I have to set it to ‘Bridging Mode’ to disable the firewall and DHCP, etc. This greys out all other options for DHCP, DNS, gateway, etc.

Then, I can’t pair the nodes at all without wiring them directly to the main. I tried adding them as wireless nodes without connecting them physically and it didn’t do anything at all.

In the process of doing this, though, I had factory reset them all and paired them again. Since then, they are behaving differently, oddly enough. The linksys app had a lot of issues before thinking it wasn’t connected to the same network, but now it seems to be working fine and populating very quickly compared to before. Additionally, pointing my browser to the IP of a sub-node doesn’t open a WUI for that node anymore, it redirects me to the main unit instead.

I’ll leave it up and see if my problem comes back. I may need to actually drop money on a proper mesh system instead of trying to hodgepodge something together if this doesn’t work.

If that falls apart, it should be easy to do it in the web gui.
I just have not found a screenshot of the network tab.

That might be what I’d see without bridge mode active, but with bridging mode enabled, it goes to this:

And I want Bridge Mode active since I’m using IP Fire for the firewall, DHCP, DNS, and routing.

For now, I don’t see cascading martian source errors, just one set of them from my Xbox One since I did the factory reset. I’m hoping that the buggy firmware may have just had an issue before and doing the full reset may have resolved it.

It should be fine then, because once the static entry is put in the router it’s going to keep it. There is a bug in their firmware: the first node in the chain has to get a static address so the others link properly, and when it does you will be able to use the DNS name that goes back to the first one, which sounds like what you ended up with. Their OS is supposed to show the static address of the first node on the network screen, but it doesn’t. I don’t think it has an SSH server to log into its version of Linux/BSD to actually look at how the network is set up. But working backwards seemed to solve the issue, since the static address doesn’t go away when resetting it. It’s supposed to automatically reserve the IP address that is on it, but it can run into problems if the IP is already used or occupied in the ARP table, which can happen if you use the default IP scheme and reset it. That comes down to programming flaws in the Linksys unit.