TOT CONSTANTLY terminating my ssh TCP sessions: anyone else experiencing this?


lsemprini

I have a TOT Fiber connection in Pai that is otherwise very reliable and fast (consistently 600 Mbps down and up on speedtests within Thailand, and HTTPS traffic never has problems).

 

But whenever I try to ssh to my server in the US, it appears that TOT is CONSTANTLY killing my ssh TCP connection at random intervals, regardless of whether the ssh TCP session is idle or busy transferring data.

 

Sometimes ssh windows/connections can survive for 48 hours unmolested, sometimes minutes, and sometimes they are killed seconds after connection.   

 

There is ZERO correlation between the killing and whether or not I am (or my household is) currently using a lot of bandwidth.  And also ZERO correlation between the killing and whether or not my computer (or household) currently has a lot of TCP connections open. 

 

This appears to be specific to TOT: I used to have a second CAT connection (as recently as late 2022) and CAT didn't cut my ssh sessions (sadly CAT is no longer available at my new house).  And I am using the same local computer to test, same ssh software, same remote server.

 

And it appears to be specific to ssh protocol: only my ssh, scp, and rsyncs get killed.  When I download large files with HTTP/HTTPS I do not see random kills.

 

I have tried VPN just to see if that might jar the problem away, no dice.

 

Perhaps it is some kind of ill-advised effort by TOT to kill a certain kind of traffic that is spilling over to ssh traffic?

 

Anyone else seeing this?  Did you find a solution?

 


56 minutes ago, lsemprini said:

I have tried VPN just to see if that might jar the problem away, no dice.

 

Well they wouldn't see any ssh connections if it's going over a VPN as they would be hidden in whatever tunnel the VPN is using.

 

So if this happens while using the VPN it suggests to me that the cause is something else not directly related to ssh - at least not on your end.

 

I had a TOT connection many years ago and it used to disconnect/reconnect as well, but I think it was more to do with the DHCP lease time between the router and the ISP itself. I didn't bother to dig deeper into it.

 

Edited by ukrules

20 minutes ago, ukrules said:

 

Well they wouldn't see any ssh connections if it's going over a VPN as they would be hidden in whatever tunnel the VPN is using.

 

So if this happens while using the VPN it suggests to me that the cause is something else not directly related to ssh - at least not on your end.

 

I had a TOT connection many years ago and it used to disconnect/reconnect as well, but I think it was more to do with the DHCP lease time between the router and the ISP itself. I didn't bother to dig deeper into it.

 

 

Yes good point -- whatever TOT is doing is not specific to the ssh port or ssh traffic bytes, but apparently still specific to the general pattern of when and how much data is sent/received.  I remember the Awful Firewall of China long ago used heuristics based on packet size/timing to try to thwart VPN.  Wonder if TOT is trying to do the same for some reason.

 

I don't think it's related to DHCP lease time (either between me and router or router and ISP) because when the ssh gets killed (which it sometimes does 10 times in 1 minute), no other long-lived TCP connection is affected in any way.  And also because the frequency of drops is so random (sometimes hours go by between drops, sometimes seconds) and one would expect a DHCP lease problem to happen at regular intervals.

 

 

 


40 minutes ago, ozimoron said:

I have a 3BB fiber connection and I can leave my session open for days.

 

If you have a mac or linux, put this in your .ssh/config file

 

Host *
 ServerAliveInterval 10


Great idea, but sadly no dice.  I added ServerAliveInterval 10 to ~/.ssh/config and am still seeing connections cut.  I also added ServerAliveCountMax 9999999 just to make sure a lack of ServerAlive responses isn't causing the client to disconnect, and also "TCPKeepAlive no" for the same thing at the TCP level (see man ssh_config), but same behavior.  That was on the cygwin ssh command line.

 

And on my other client (Windows putty) I did the same thing: set 10 second keepalive (there is no ServerAliveCountMax setting sadly) and no TCP keepalives.  And shell was still killed just as randomly.

 


I think the difference with the other kind of traffic is the single socket vs. multi-stream.

 

TOT might have a flaky router somewhere.

 

You could try running PingPlotter; maybe you'll get lucky and see some router losing more packets than it should.


20 minutes ago, ozimoron said:

Can you try with a different ISP, like by using a hotspot? Can you use ethernet to connect to the router?

 

I already established that using the CAT ISP does not show this problem.  I suspect it would be the same with my AIS 4G connection via hotspot, but sadly I can't use that all the time due to limited speed and data cap.  I could test temporarily but not sure if that adds much new data.

 

I already normally connect my computer to the TOT router using gigabit ethernet (I get full 600Mbps up/down to Thai speedtest server), so WiFi is not the source of the problem (no WiFi in use anywhere).

 

Any other ideas?

 

 


21 minutes ago, tgw said:

I think the difference with the other kind of traffic is the single socket vs. multi-stream.

 

TOT might have a flaky router somewhere.

 

You could try running PingPlotter; maybe you'll get lucky and see some router losing more packets than it should.

 

Interesting idea, but in the case of a flaky router we would expect to see all TCP connections affected the same, right?

 

I am able to keep big HTTP downloads (even those running in a command-line wget process with a single HTTPS connection, as opposed to a browser that might have a pool of open TCP connections to a given server) flowing with no problem even when ssh shells are killed left and right during the same download.

 

 

 

 


7 minutes ago, lsemprini said:

 

Interesting idea, but in the case of a flaky router we would expect to see all TCP connections affected the same, right?

 

I am able to keep big HTTP downloads (even those running in a command-line wget process with a single HTTPS connection, as opposed to a browser that might have a pool of open TCP connections to a given server) flowing with no problem even when ssh shells are killed left and right during the same download.

 

It's not a fix but try using screen. You could then reconnect to the same session if the pipe breaks.

 

Also have you looked at the server logs for clues? Check free and top to make sure you're not running out of resources on the server.

Edited by ozimoron

18 minutes ago, ozimoron said:

 

It's not a fix but try using screen. You could then reconnect to the same session if the pipe breaks.

 

Also have you looked at the server logs for clues?

 

I did try screen for some operations, but it doesn't work for all the things I need to do (keep a long rsync from my house computer for off-site backups running, for example).  And of course screen messes up anything that uses fancy terminal/curses stuff.
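For a long rsync that keeps getting cut, one workaround (not a fix) is to re-run the transfer until it finishes. Below is a minimal sketch of that idea. The helper name `run_until_success` and its parameters are hypothetical and are not something used in this thread. It assumes the command exits non-zero when the connection drops.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_until_success(
    cmd: Sequence[str],
    max_attempts: int = 50,
    delay: float = 10.0,
    runner: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run cmd until it exits 0; return the number of attempts used.

    Raises RuntimeError if the command is still failing after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        # Wait before retrying, but not after the final failed attempt.
        if attempt < max_attempts:
            sleep(delay)
    raise RuntimeError(f"{cmd[0]} still failing after {max_attempts} attempts")
```

A call could look like `run_until_success(["rsync", "-a", "--partial", "--timeout=60", "/path/to/data/", "user@example.com:backup/"])`, where the paths and host are placeholders. rsync's `--partial` option keeps partially transferred files, so a retry does not have to start a large file from scratch.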

 

I don't think it's a server issue, because I never had the problem from my old CAT connection, and the admins of my shared hosting server did look at logs at one point and simply saw the sshd disconnect without any particular error (i.e. it looked like a normal client disconnect to the server).  They verified that there is no process killing going on on the server side (which they would only do for out-of-memory, but not even that is happening).

 

 

Edited by lsemprini
add out of memory note

14 minutes ago, lsemprini said:

 

Interesting idea, but in the case of a flaky router we would expect to see all TCP connections affected the same, right?

 

I am able to keep big HTTP downloads (even those running in a command-line wget process with a single HTTPS connection, as opposed to a browser that might have a pool of open TCP connections to a given server) flowing with no problem even when ssh shells are killed left and right during the same download.

 

And how do you know the downloads use a single HTTPS stream?

HTTPS is not at the same level as SSH, HTTPS is application-managed, while SSH is transport layer.

Even if wget doesn't support multiple sockets itself, the routers between your machine and the file source would.

Edited by tgw

2 minutes ago, DudleySquat said:

Check for updates on your computer, especially the network card.  

 

While you are doing that, switch out the ethernet cable. Always check Layer 1 first.

 

Well I can do that, but as I mentioned, all other network access continues smoothly (even long HTTP downloads continue) while the ssh is killed left and right, often many times per second.  So it seems unlikely to be at layer 1 since it uniquely affects the ssh traffic.

 


43 minutes ago, lsemprini said:

 

Well I can do that, but as I mentioned, all other network access continues smoothly (even long HTTP downloads continue) while the ssh is killed left and right, often many times per second.  So it seems unlikely to be at layer 1 since it uniquely affects the ssh traffic.

 

 

Also, if this is Windows, check the network card's settings. Turn off power saving and anything else that would put the card to sleep.  

 


Another datapoint: I had a wired TP-Link Archer AC1200 router between my computer and the ISP's ZTE F612 modem/ONU (but everything gigabit ethernet, no WiFi).  I temporarily eliminated the AC1200 and connected direct to the ZTE, same problem. 

 

 


Another datapoint: I tried turning off my Avast antivirus (because I noticed the horrible thing was intercepting all https traffic via a hack driver and substituting its own certificate so it could spy on my traffic, so who knows what else it was doing to other outgoing TCP connections).  Initially that seemed to make a difference, but psyche! the drops were soon back.  So no go there either.

 

 


2 minutes ago, lsemprini said:

Another datapoint: I tried turning off my Avast antivirus (because I noticed the horrible thing was intercepting all https traffic via a hack driver and substituting its own certificate so it could spy on my traffic, so who knows what else it was doing to other outgoing TCP connections).  Initially that seemed to make a difference, but psyche! the drops were soon back.  So no go there either.

 

Avast was good until it wasn't (many years ago already).

A VPN was mentioned in this thread, but the only information is that you tried it. Was it a proper VPN?

 

One thing you might try to eliminate another potential source is to activate VPN at router level.

 


1 minute ago, tgw said:

 

Avast was good until it wasn't (many years ago already).

A VPN was mentioned in this thread, but the only information is that you tried it. Was it a proper VPN?

 

One thing you might try to eliminate another potential source is to activate VPN at router level.

 

 

Yes it was not a company VPN, but rather a VPN server running on a relative's external router at a home address in the US.  Normally when I am using that VPN server, I don't experience any new drops of long-running TCP connections (in other words, that VPN doesn't add any new drops that wouldn't also be there without the VPN).  But when I used that VPN recently, I was still seeing ssh drop just as often as explained in the OP.  So either the drops are from something as yet unidentified on my PC or in my TOT router, or the drops are from something in TOT's network that targets traffic based on packet size/timing (since the VPN traffic no longer looks like ssh traffic, but still has the same general pattern of packets over time).

 

Another thing I noticed recently was that the drops happen a heck of a lot more during the daytime and especially early evening, which tends to point more to TOT's network since that is the busy time, and not so much late at night.  But will collect more data there.
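To check the time-of-day pattern, the drop times can be counted per hour. This is a minimal sketch, assuming each drop is recorded by hand or by a script as an ISO-8601 local timestamp. The function names `drops_per_hour` and `render` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def drops_per_hour(timestamps: list[str]) -> Counter:
    """Count drops by local hour of day from ISO-8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def render(counts: Counter) -> str:
    """Return a 24-line text histogram, one '#' per drop in that hour."""
    lines = []
    for hour in range(24):
        lines.append(f"{hour:02d}:00 {'#' * counts.get(hour, 0)}")
    return "\n".join(lines)
```

If the drops really do cluster in the daytime and early evening, the bars for those hours should be clearly longer once a few days of timestamps have been collected.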

 

Apropos tgw's earlier theory, I suppose it is possible that TOT's network redirects all my HTTPS requests to some edge caching server which has good and proper connectivity with the rest of the internet, whereas TOT's network directs other traffic (like ssh) to some faulty network that drops TCP connections left and right.  I could try to set up some random server on the internet at a random TCP port to see if TOT drops those connections too.
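The test with a random server on a random TCP port could be sketched roughly as follows. This is only an illustration under assumptions: `serve_echo` would run on some server outside TOT's network, `probe` would run at home, and both names, the heartbeat payload, and the intervals are hypothetical choices.

```python
import socket
import threading
import time


def serve_echo(listener: socket.socket, max_conns: int = 1) -> None:
    """Accept up to max_conns connections and echo bytes until the peer closes."""
    for _ in range(max_conns):
        conn, _addr = listener.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data)


def probe(host: str, port: int, interval: float, max_beats: int) -> dict:
    """Send one heartbeat every `interval` seconds over a single TCP connection.

    Returns how many heartbeats were echoed, how long the run lasted, and the
    error (if any) that ended it.
    """
    result = {"beats": 0, "seconds": 0.0, "error": None}
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            for _ in range(max_beats):
                sock.sendall(b"ping\n")
                if sock.recv(1024) == b"":
                    raise ConnectionError("peer closed the connection")
                result["beats"] += 1
                time.sleep(interval)
    except OSError as exc:  # resets, timeouts, and closes all land here
        result["error"] = repr(exc)
    result["seconds"] = time.monotonic() - start
    return result
```

If a plain connection like this survives for days on a non-ssh port while ssh sessions to the same host keep dying, that would point toward something that treats ssh-like traffic differently. If it dies just as often, the problem is more likely general to long-lived TCP connections on that route.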

 

 


1 hour ago, timendres said:

SSH has a configuration for timing out connections that are idle for a period of time. This was constantly frustrating for me until I found the configuration and set it to never timeout. You can Google this, but here is a place to start:

 

https://stackoverflow.com/questions/4936807/how-to-set-ssh-timeout

 

Yup thanks, ozimoron suggested that above and I tried both the ssh-level and TCP-level options to disable timeout, but sadly that didn't fix it. 

In my case, the connection drop happens regardless of whether the connection is idle or actively sending data back and forth.

 


20 minutes ago, Satcommlee said:

Are you sharing your public IP address with others? Common these days...

 

Yes, good to check if they have put you behind a carrier-grade NAT (CGNAT).

It's easy to check if they have: look at the WAN address reported at your router,

then check that against a "what is my IP" website,

e.g. https://whatsmyip.com/

If the IP addresses are not the same, then it's CGNAT and things can get weird.
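The comparison above can also be written down as a small check. This is a minimal sketch and `classify_wan` is a hypothetical helper. Both addresses are typed in by hand (one from the router's status page, one from the website). The only outside fact it relies on is that 100.64.0.0/10 is the address block reserved for carrier-grade NAT (RFC 6598).

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def classify_wan(wan_ip: str, public_ip: str) -> str:
    """Compare the router's WAN address with the address a website sees."""
    wan = ipaddress.ip_address(wan_ip)
    public = ipaddress.ip_address(public_ip)
    if wan == public:
        return "no CGNAT: router holds the public address"
    if wan in CGNAT_RANGE:
        return "CGNAT: WAN address is in 100.64.0.0/10"
    if wan.is_private:
        return "NAT upstream: WAN address is private"
    return "addresses differ: some other translation or proxy is in the path"
```

The first branch is the case where both addresses match, which means the router itself holds the public address.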

Edited by johng

3 hours ago, johng said:

 

Yes, good to check if they have put you behind a carrier-grade NAT (CGNAT).

It's easy to check if they have: look at the WAN address reported at your router,

then check that against a "what is my IP" website,

e.g. https://whatsmyip.com/

If the IP addresses are not the same, then it's CGNAT and things can get weird.

 

It would appear that I am not behind a CGNAT, because my fiber ONU's external IP is the same as the IP that whatismyip.com reports.

 

Just curious, what weirdness does CGNAT cause that regular on-premises NAT doesn't also cause?

 

On a related topic, my external TOT IP address changes, but only about once per week (I have been tracking it).  So that is DEFINITELY not the source of the hundreds of ssh drops per day that I often see.

 


On 12/27/2023 at 7:06 PM, lsemprini said:

But whenever I try to ssh to my server in the US, it appears that TOT is CONSTANTLY killing my ssh TCP connection at random intervals, regardless of whether the ssh TCP session is idle or busy transferring data.

With ssh it takes two to tango, plus the internet connection. 
The ISP is just one place the problem could be occurring.  It could also originate with your ssh server (software or hardware) or your client (software or hardware).
Do you have a secondary, public ssh server that you can attach to and test your client connection?
And then you have to question the health of the server-side ssh connection as well.  Is all well on your server?
 


21 hours ago, connda said:

With ssh it takes two to tango, plus the internet connection. 
The ISP is just one place the problem could be occurring.  It could also originate with your ssh server (software or hardware) or your client (software or hardware).
Do you have a secondary, public ssh server that you can attach to and test your client connection?
And then you have to question the health of the server-side ssh connection as well.  Is all well on your server?
 

 

All good questions.  I believe my server and client are both OK because as I mentioned in the OP, I never had problems when connecting to the same server from the same client (same laptop computer, software, setup) over a CAT connection.  Only when switching to a TOT connection do I get the endless drops.  My old house used to have 2 fiber connections so I was able to switch back and forth instantaneously and the pattern was clear.  And now in my new house where I have only a TOT connection, I am seeing the same drops that I saw with TOT in the old house.
