About 3 months ago we got hit with “sticker shock” from Openvpn.net: a huge price increase over what we paid last year.
So we did the research, and Outline really looked great! We did 3 weeks of internal testing with a server on AWS (m5.large) on about 10 machines, and felt confident enough to roll it out to our user base (about 60 people). Totally happy with the Outline API, btw, which let us confidently synchronize accounts against 2 servers in a round-robin DNS config for high availability (since retired, in desperation).
When we took it to production, we doubled our compute, going to an m5.xlarge instance size.
But what we’re experiencing is a dramatic drop in bandwidth when using Outline. like … dramatic.
Using speed test sites, I’m getting 1.5 Mbps when Outline is activated, compared to 300 Mbps normally, and this is past peak hours on a Friday afternoon. The Amazon metrics show that the machine doesn’t even break 5% CPU, and the network isn’t saturated.
The other thing I’m noticing is a LARGE number of connections that seem to be left stagnating on the Outline servers:
$ netstat -tan |grep : |awk '{print $NF}' |sort |uniq -c |sort -rn
530 ESTABLISHED
164 TIME_WAIT
73 FIN_WAIT2
18 LISTEN
16 FIN_WAIT1
8 CLOSE_WAIT
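One place to start digging might be to break the ESTABLISHED count down by remote address, to see whether a handful of peers hold most of the connections. This is only a rough sketch: `top_peers` is a made-up helper name, it assumes the usual `netstat -tan` column layout (foreign address in column 5, state in column 6), and on a proxy the peers will include both clients and the sites the server is connecting out to.

```shell
# top_peers [N]: read `netstat -tan` output on stdin and print the N remote
# addresses (default 10) with the most ESTABLISHED connections.
top_peers() {
  awk '$6 == "ESTABLISHED" {
         peer = $5
         sub(/:[0-9]+$/, "", peer)   # drop the trailing :port
         n[peer]++
       }
       END { for (p in n) print n[p], p }' |
    sort -rn | head -n "${1:-10}"
}

# Only run against the live system if netstat is actually installed.
if command -v netstat >/dev/null 2>&1; then
  netstat -tan | top_peers 10
fi
```

If one or two addresses dominate the list, that points at specific clients or destinations rather than at the server as a whole.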
Is this normal at all? Can anybody point me at a place to start digging to try and diagnose this?
Thanks for listening… yes, I feel better already…
I just ran two back-to-back tests. The Digital Ocean one, with the Outline iOS client talking to a stock Outline server, was about the same as the direct connection. A bit faster, actually: I reached 237 Mbps download and 108 Mbps upload.
What client did you use? What test did you run? Do you have a modified server?
OS settings may affect it too. For example, did you enable TCP BBR? We have that in the install script.
Also, try running on different ports. Perhaps your network is throttling you.
Thank you for your reply, and sorry for not giving more detail!
We are using a mix of Mac and Windows clients. On Mac we are running with 1.13.0 client. On Windows we are running 1.12.1 client. (Allegedly the 1.13.0 build has issues installing the PCAP driver, or so my IT folks tell me. I’m a Mac guy)
The test consists of going to internet speed tests, such as https://fast.com or https://speedtest.net
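Browser speed tests are hard to repeat exactly, so a scriptable cross-check could be to time one fixed download with curl, once with Outline connected and once without. This is a sketch under assumptions: the helper names `to_mbps` and `measure` are made up, and the URL in the usage comment is a placeholder for any large file you are allowed to fetch. It relies on curl's `%{speed_download}` write-out variable, which reports average bytes per second.

```shell
# to_mbps BYTES_PER_SEC: convert bytes/second to megabits/second (1 decimal).
to_mbps() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 8 / 1000000 }'
}

# measure URL: download URL, discard the body, print the average speed in Mbps.
measure() {
  bps=$(curl -sS -o /dev/null -w '%{speed_download}' "$1") || return 1
  to_mbps "$bps"
}

# Usage (placeholder URL, not a real test file):
#   measure "https://example.com/large-test-file.bin"
```

Running the same `measure` call a few times in each state gives numbers that are easier to compare than two different speed test sites.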
We are running a totally ‘stock’ server, as Docker containers, from these images:
REPOSITORY                  TAG      IMAGE ID       CREATED        SIZE
quay.io/outline/shadowbox   stable   4d7319b73269   3 months ago   316MB
containrrr/watchtower       latest   e7dd50d07b86   6 months ago   14.7MB
The only OS tuning that I did was to try and cull the excessive lingering connections by shortening the FIN timeout, via:
$ echo 10 > /proc/sys/net/ipv4/tcp_fin_timeout
We are running on Amazon Linux 2:
Linux ip-10-0-0-106 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
TCP BBR is not enabled:
$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic
$ sysctl net.core.default_qdisc
net.core.default_qdisc = fq_codel
Since at this point I have two identical brethren boxes, I can enable BBR on one and switch the DNS over to it after hours to test it.
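For reference, a rough sketch of what checking for and enabling BBR on the test box might look like. The sysctl keys are the standard kernel ones shown above; `has_bbr` is a made-up helper, pairing BBR with the `fq` qdisc is the commonly suggested setup rather than something confirmed from the Outline install script, and the script only prints the commands that change settings instead of running them.

```shell
# has_bbr LIST: succeed if "bbr" is one of the space-separated algorithms in LIST.
has_bbr() {
  case " $1 " in
    *" bbr "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the running kernel which congestion control algorithms it offers.
available=$(sysctl -n net.ipv4.tcp_available_congestion_control 2>/dev/null || true)

if has_bbr "$available"; then
  echo "bbr is available. To switch (as root):"
else
  echo "bbr is not listed. The module may need loading first (as root): modprobe tcp_bbr"
  echo "Then, to switch (as root):"
fi
# Printed rather than executed, so nothing changes until you run them yourself.
echo "  sysctl -w net.core.default_qdisc=fq"
echo "  sysctl -w net.ipv4.tcp_congestion_control=bbr"
```

Settings applied with `sysctl -w` do not survive a reboot, which is convenient for an after-hours test: rebooting the box puts it back on cubic.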
We are running on Amazon Linux, and have never had port issues before, but I’m thinking of maybe switching to TCP port 1194 to look more like OpenVPN, since that’s a standard.