gallery.badmanners.xyz/src/data/blog/ssh-all-the-way-down.mdx
2024-12-05 21:43:04 -03:00

---
title: SSH all the way down!
pubDate: 2024-09-22
isAgeRestricted: false
authors: bad-manners
thumbnail: /src/assets/thumbnails/blog/ssh_all_the_way_down.png
description: |
  A long investigation on how reverse port forwarding works in SSH; for fun at first, and then, fully embracing it.
next: supercharged-ssh-apps-on-sish
posts:
  mastodon: https://meow.social/@BadManners/113182634117109306
tags:
  - technical post
  - programming
---
import { Image } from "astro:assets";
import TocMdx from "@components/TocMdx.astro";
import imageServeoWebpage from "@assets/images/ssh/serveo_webpage.png";
import imageServeoPlainData from "@assets/images/ssh/serveo_plain_data.png";
import imageSishPublic from "@assets/images/ssh/sish_public.png";
import imageVpsArchitectureTraefik from "@assets/images/ssh/vps_architecture_traefik.png";
import imageVpsArchitectureSish from "@assets/images/ssh/vps_architecture_sish.png";
This is my first technical post! It's a long one, for sure, but if you find the subject interesting, I hope you enjoy the read. Comments are more than welcome!
<TocMdx headings={getHeadings()} />
## A journey of self-host discovery
So I've been getting into self-hosting some of my content, as evidenced by this gallery itself. I've never been much of a DevOps/infrastructure guy; I'm more of a full-stack developer. Still, I figured: why not try my hand at it? I have a bunch of stuff that I wanted to make available on the web:
- [A Git service](https://forgejo.org/), to self-host my adult-oriented source code without worries about other sites' ToS.
- [An image board](https://github.com/shish/shimmie2/), to have a single place from where I can share the artwork that I got from others.
- [A privacy-first search engine](https://github.com/searxng/searxng), since the instance that I normally use often crashes or gets rate-limited.
And more. Plus, I had a [Raspberry Pi](https://www.raspberrypi.com/products/raspberry-pi-3-model-b/) lying around, and it was already online all of the time in order to run a [Discord bot](https://discordpy.readthedocs.io/en/stable/). So how hard could it be?
Well... almost impossible, actually. Never mind that the Raspberry Pi struggled to run all of these services at once; that's the least of my issues. No, the **REAL** problem is that, even if I had a better computer plugged into my router 24/7, I couldn't self-host my websites.
And I don't mean buying a [VPS](https://en.wikipedia.org/wiki/Virtual_private_server). I mean **actually** self-hosting Internet services from my home.
If you've had any experience trying to do the same, you are probably wondering if the issue is [Network Address Translation](https://en.wikipedia.org/wiki/Network_address_translation), in which case you'd be correct!
If you're unfamiliar with NAT, in short, it means that the public address that I access the Internet with doesn't match my router's address. So even if I try to expose a server through [port forwarding](https://en.wikipedia.org/wiki/Port_forwarding) on my router, there's no way for anyone on the outside to reach it.
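
As a rough way to check whether you're in the same situation, you can look at the WAN address that your router reports and see whether it falls in a range reserved for private networks or for carrier-grade NAT. The sketch below is only an illustration: the function name and the sample address are made up, and it assumes a well-formed dotted-quad IPv4 address.

```sh
# Hypothetical helper: succeeds (exit status 0) if the IPv4 address is in a
# range that is not directly reachable from the public Internet.
is_private_or_cgnat() {
  # Grab the first two octets of the dotted quad.
  first=${1%%.*}
  rest=${1#*.}
  second=${rest%%.*}
  case "$first" in
    10) return 0 ;;                                       # 10.0.0.0/8
    172) [ "$second" -ge 16 ] && [ "$second" -le 31 ] ;;  # 172.16.0.0/12
    192) [ "$second" -eq 168 ] ;;                         # 192.168.0.0/16
    100) [ "$second" -ge 64 ] && [ "$second" -le 127 ] ;; # 100.64.0.0/10 (carrier-grade NAT)
    *) return 1 ;;
  esac
}

# Made-up sample value; the real one comes from the router's status page.
wan_ip="100.72.13.5"
if is_private_or_cgnat "$wan_ip"; then
  echo "$wan_ip is not publicly routable: port forwarding alone won't help"
else
  echo "$wan_ip looks public: port forwarding on the router may be enough"
fi
```

If the router's WAN address is in one of these ranges, there is at least one more layer of translation between the router and the Internet, and it's one that you don't control.
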
Well, bummer. What can we do about it?
One solution is to only use [IPv6](https://en.wikipedia.org/wiki/IPv6). Unfortunately, I have this pesky requirement where I want _anyone_ to access my site. I like IPv6, but it's not [as widespread as it should be](https://www.google.com/intl/en/ipv6/statistics.html#tab=ipv6-adoption).
Another solution, provided that you're still committed to running your services on your home server, is to use a proxy server. Instead of our IP, we'll use someone else's IP, which is already exposed to the Internet without any address translation. Then any traffic to our websites gets redirected through a tunnel, and our server transparently responds to requests as they come through the network.
That's something that a VPN really excels at. In fact, I've used Mullvad VPN in the past partially for that reason, while they still had port forwarding from the local computer to the Internet.
And I say _had_, because Mullvad [removed support for port forwarding](https://mullvad.net/en/blog/removing-the-support-for-forwarded-ports), alleging legal reasons for the decision.
Crap. And I guess Mullvad lost a customer. Back to square one...
To their credit, VPNs are excellent for this kind of setup. It's common practice to share data channels between servers through a secure tunnel, over a virtual network. But I personally have three issues with public VPNs (I'll leave the third one as a secret for later; see if you can guess it as a challenge):
1. You generally have to pay for them, or set them up properly, where any mistake can break your security.
2. It requires installing software I'm unfamiliar with, so there's a good chance I'll break stuff irrecoverably.
3. ???
That's not what I want. Half of the reason I took on this challenge of self-hosting stuff was for fun, to see what I could pull off with minimal effort. Plus, I want results in an afternoon, not over a week. What else can I do...?
## How may I Serveo you?
Some options that I immediately found were [ngrok](https://ngrok.com/) and [Cloudflare Tunnel](https://www.cloudflare.com/products/tunnel/), both of which are API gateways. In short, they provide you with a tool to expose local servers to the Internet, by connecting to their service through a private tunnel. It's not so different from a VPN, except that it requires using their specific application.
I was a bit on the fence, since ngrok seemed to be a paid service, and Cloudflare requires full control over your [domain](https://en.wikipedia.org/wiki/Domain_Name_System). But it was interesting to know that things like this already existed. And it was while researching ngrok specifically that I ended up learning about [Serveo](https://serveo.net/).
Chances are that when you click the link to Serveo in the last paragraph, it won't work. Serveo seems to get DDoS'd all the time. So here's what the page normally looks like:
<figure>
<Image
src={imageServeoWebpage}
alt="A screenshot of Serveo's homepage, with the tagline 'Expose local servers to the internet; No installation, no signup' and a simple SSH reverse port forwarding command that it tells you to copy and paste into your terminal."
/>
<figcaption>What the Serveo webpage looks like when it's up. © Trevor Dixon</figcaption>
</figure>
To explain the gist of it, unlike ngrok, you don't pay any fees or install any special tools to use their proxy service. Instead, all you need is an SSH client.
Wait, really?!
Yes, really. As it turns out, the secure shell protocol supports what is called a [reverse SSH tunnel](https://goteleport.com/blog/ssh-tunneling-explained/) (definitely read this article if you wanna learn more). In fact, SSH can do much more than simply create a secure session to display a shell from a remote machine; it's loaded with functionality that you might not expect it to have!
I was familiar with SSH already, or at least its common use cases. Creating keys, adding them to a server for authentication, [pushing Git commits](https://docs.github.com/en/authentication/connecting-to-github-with-ssh), and so on. It's always been a part of my dev career, and arguably, it's one of the fundamental skills you invariably pick up.
And I'd heard of port forwarding through SSH, but never gave it much thought. But is it really that easy to set up?
Well, I gave it a go, trying out the simple single-line command that the Serveo website told me to follow:
<figure>
{/* prettier-ignore-start */}
```sh
ssh -R gitbadmanners:80:localhost:3000 serveo.net
```
{/* prettier-ignore-end */}
<figcaption>A very basic reverse SSH tunnel command for Serveo.</figcaption>
</figure>
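
To unpack that argument a bit: OpenSSH documents the `-R` option as `[bind_address:]port:host:hostport`, and going by the result described here, Serveo treats the first field as the subdomain to request. The helper below is a hypothetical sketch (its name and parameters are mine) that only assembles and prints the command instead of running it, so that each field is labeled.

```sh
# Hypothetical helper: print (without running) a reverse-forwarding command.
# Usage: tunnel_cmd SUBDOMAIN REMOTE_PORT LOCAL_PORT PROXY_HOST
tunnel_cmd() {
  subdomain=$1    # name requested on the proxy, e.g. gitbadmanners
  remote_port=$2  # port the proxy listens on, e.g. 80 for HTTP
  local_port=$3   # port of the service on our own machine
  proxy_host=$4   # SSH server doing the proxying, e.g. serveo.net
  printf 'ssh -R %s:%s:localhost:%s %s\n' \
    "$subdomain" "$remote_port" "$local_port" "$proxy_host"
}

tunnel_cmd gitbadmanners 80 3000 serveo.net
```

Read from left to right, the forward says: "listen for `gitbadmanners` on port 80 over there, and hand each connection to `localhost:3000` over here".
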
And, to my delight, it worked! No extra configuration, no tool installation, no questions asked: my RPi server was live on gitbadmanners.serveo.net, and I could access it from any device connected to the Internet!
With a bit more fiddling, I even got my custom domain to work. By pointing the CNAME record to their service, adding a special TXT record with my key's fingerprint, and adjusting the command slightly, I got both HTTP and SSH (with a caveat\*) to work, too.
<figure>
{/* prettier-ignore-start */}
```sh
ssh -R git.badmanners.xyz:80:localhost:3000 -R git.badmanners.xyz:22:localhost:22 serveo.net
```
{/* prettier-ignore-end */}
<figcaption>Using multiple port forwardings with a custom domain in Serveo.</figcaption>
</figure>
So that's pretty cool, huh? It even supports HTTPS! This feature comes from Serveo (as it does with most reverse proxy solutions), not from SSH itself. But to any developer, having HTTPS is not merely a huge plus; [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) support is kind of expected out of the box for anything Internet-facing nowadays.
In fact, there's nothing unique about Serveo doing what it does. [localhost.run](https://localhost.run/) works very similarly, by also leveraging an SSH connection as a way to both authenticate the server and pass incoming data through a secure tunnel.
But how is SSH able to do all of this? It might be worth it to go over the fundamentals.
## A SSHort-ish primer
I'll be using the terms OpenSSH and SSH interchangeably, although the former is an implementation of the latter (the most common implementation, in fact). I'll also make other assumptions and simplifications throughout, so feel free to check the links if you wanna learn more.
When two computers need to communicate over a network, they do so over a connection. That connection happens through sockets, an abstraction at the level of the operating system over finer networking details (such as [addressing and sending electrical bits](https://en.wikipedia.org/wiki/OSI_model)).
For any application, sending and reading data through a socket is [not so different from interacting with files](https://en.wikipedia.org/wiki/Berkeley_sockets). Whether you connect to the Internet to play a game, [access webmail](https://en.wikipedia.org/wiki/Webmail), [access non-web mail](https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol), and so on, your computer is connecting through a network socket (usually [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol)) and sending data to your router, before it traverses the electricity-powered spaghetti that we call the Internet.
The main aspects distinguishing each application on the Internet are the kind of protocol used for communication, as well as the code running on the server and on your machine (usually called the "client"). With a previously agreed [API](https://en.wikipedia.org/wiki/API), both parties can communicate, usually with the server being the authority over the client, and that model is how [most of the modern web has been built](https://en.wikipedia.org/wiki/Client%E2%80%93server_model).
One such application that runs over the Internet is SSH, short for [Secure Shell](https://en.wikipedia.org/wiki/Secure_Shell). It's a cryptography-based protocol, designed to replace the less secure remote shell protocols of its time, by making sure that all traffic between a server and a client is end-to-end encrypted.
That's a lot of terms in a single sentence, so I'll explain them one by one.
Cryptography is a way to secure communication between two parties, or in our case, two computers over a network. There are mainly two reasons to encrypt (i.e. secure) this communication:
1. To prevent others from spying on the transmitted data. Without a secure channel, things like your password or credit card data would be passed in plaintext, for anybody along the path of your data to see and steal!
2. To prevent others from tampering with the transmitted data. For example, your [ISP](https://en.wikipedia.org/wiki/Internet_service_provider) might decide to inject ads on the pages you access ([that's a real thing!](https://superuser.com/questions/902635/isp-is-inserting-ads-into-web-pages)), or modify/censor the pages you access as it sees fit, without you even realizing that it's happening.
Whenever you access a link that starts with [`https://`](https://en.wikipedia.org/wiki/HTTPS), you're using the secure version of HTTP. While it doesn't mean that you're protected from anything a malicious actor might do (or even that the other side necessarily is who they claim to be), it ensures that your traffic will be protected in the two ways mentioned above. When only the two ends (i.e. the client and the server) can understand the data, we say that the channel is [end-to-end encrypted](https://en.wikipedia.org/wiki/End-to-end_encryption).
Now, what's a [shell](https://en.wikipedia.org/wiki/Unix_shell)? It's a kind of application that comes with every operating system, allowing you to interact with the machine on a high level through text commands, rather than a [graphical interface](https://en.wikipedia.org/wiki/Graphical_user_interface). It requires a [terminal](https://en.wikipedia.org/wiki/Terminal_emulator), like those you see in hacker stock footage, and isn't much different from running any other program, except that all communication is done with individual characters and [escape codes](https://en.wikipedia.org/wiki/Escape_sequence).
Typically, a shell runs on the same machine that you're using, but with SSH or [telnet](https://en.wikipedia.org/wiki/Telnet), you can access a remote shell, that is, a shell on a remote server.
In order to keep its security guarantees, SSH handles authentication of users (which usually map to users in the operating system) with methods such as a password, keyboard prompts, or most importantly for us, [public keys](https://en.wikipedia.org/wiki/Public-key_cryptography). I won't go over public key cryptography in this post, but suffice it to say that your private key is a special file that proves that you're really the one accessing the system. For accessing a remote shell, it is pretty useful, and more secure than a simple password.
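
For a concrete picture, here's a minimal sketch of creating such a key pair with OpenSSH's `ssh-keygen` and printing its fingerprint. The temporary directory, the empty passphrase, and the `forgejo-tunnel` comment are placeholders for illustration, not my actual setup.

```sh
# Create a throwaway directory so that no existing key gets overwritten.
key_dir=$(mktemp -d)

# Generate an Ed25519 key pair with no passphrase (-N "") and a
# placeholder comment (-C), without the progress output (-q).
ssh-keygen -q -t ed25519 -N "" -C "forgejo-tunnel" -f "$key_dir/id_ed25519"

# The private key (id_ed25519) never leaves this machine; the public half
# (id_ed25519.pub) is what gets registered on the server.
ls "$key_dir"

# The fingerprint is a short hash that identifies the public key.
fingerprint=$(ssh-keygen -l -f "$key_dir/id_ed25519.pub")
echo "$fingerprint"
```

A specific key can then be selected for a connection with `ssh -i "$key_dir/id_ed25519" ...`, which comes in handy when each service gets its own key.
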
And that's how I learned to use SSH: as just a way to use my private key to access a remote server or [transfer files](https://en.wikipedia.org/wiki/Secure_copy_protocol), without giving it much thought. But as we've seen with Serveo, SSH can do more. Much more. For instance, it is a viable way of exposing services to the wider Internet, as long as somebody else is willing to proxy the connection for us.
## But is it really secure?
So we've seen that reverse SSH forwarding solves two of the problems I had before. I don't need any fancy setup to expose a service, and with Serveo, it's all painless and free. But it still comes at a cost, namely (and here's the reveal of the third issue that I also had with other solutions):
3. All traffic is passed unencrypted through the proxy.
And, as it turns out, a service like Serveo runs into that same issue.
"But wait," you say, "I thought all traffic in both HTTPS and SSH was end-to-end encrypted!"
And it is, hypothetical reader. But we have to consider where those _ends_ actually are.
<figure>
<Image
src={imageServeoPlainData}
alt="A diagram that shows the Raspberry Pi connecting to Serveo via SSH, and another computer connecting to Serveo via HTTPS. Inside of Serveo, these two parts connect together as plain data."
/>
<figcaption>
From our service (RPi) to the proxy, an SSH tunnel encrypts the session. From the other client to the proxy, HTTPS
secures the connection. But what about the middle bit?
</figcaption>
</figure>
It makes sense, if you think about it. Our [hole punching](<https://en.wikipedia.org/wiki/Hole_punching_(networking)>) solution doesn't do anything special, and all handling of HTTPS traffic is delegated to the proxy server. That means that the proxy server also handles encrypting and decrypting any messages from HTTPS or any other TCP connection, so we lose the guarantee that the data won't be inspected or modified in some way. This means that any data like passwords **shouldn't** be sent over our tunnel, as they could be stolen either by Serveo or by a hacker who has taken over the service. I knew this going in, so I didn't make the mistake of ever authenticating over the exposed service, and you shouldn't either!
The fix for this is obvious, as much as it pains me to admit it: I'd have to buy a VPS and host my own solution.
At least this way, I can guarantee that I'm the only one able to interact with the transmitted data, be it encrypted or plain. And before you say "that's still insecure", consider that it's literally what happens with any web server. Plus, even with encryption, [HTTPS doesn't mitigate malice](https://security.stackexchange.com/questions/66355/can-an-https-site-be-malicious-or-unsafe). But in this case, I'm hosting stuff for myself, and I can trust that guy for sure.
But then, I wonder if there's even an open-source version of Serveo that I can use...
## sish happens
It was easy enough to stumble upon [sish](https://github.com/antoniomika/sish) through a search engine. It is essentially an open-source version of Serveo (or ngrok, or localhost.run), complete with [a range of configurations](https://docs.ssi.sh/). And as you can see from the image that they host on their website, they offer exactly the same interface:
<figure>
<Image
src={imageSishPublic}
alt="Diagram entitled 'sish public', showing that Eric's machine with a service exposed on localhost port 3000 connects to sish via the command (ssh -R eric:80:localhost:3000 tuns.sh). This creates a bi-directional tunnel and exposes https://eric.tuns.sh to the Internet, which Tony accesses from a separate device."
/>
<figcaption>Even the command is the same: a simple SSH reverse port forwarding! © Antonio Mika</figcaption>
</figure>
After buying a new domain and VPS (I like Namecheap and Hetzner for those, respectively), and setting up an instance of sish through [Docker Compose](<https://en.wikipedia.org/wiki/Containerization_(computing)>) (which simplifies the deployment process a lot), I migrated all my Raspberry Pi services to that. Of course, since "migrating" is just changing the SSH command to point to a different URL, as well as updating some DNS entries, it was a pretty simple process!
Now I have all my services proxied through the sish instance, which can handle HTTPS for them, just like Serveo did before. And I can guarantee that the server won't go down sporadically, unless I turn off the virtual machine myself (or it crashes), so that's another bonus.
But my RPi still isn't handling the load of so many services at once. Hm. Maybe now that I have a VPS, it's worth offloading some of the services to run on that, instead...
## Less self-hosting is more hosting
So I have a bit of a weird setup. I use [Fastmail](https://www.fastmail.com/) to manage my domain. It lets me manage my e-mail addresses, but it means that I have to set up DNS records manually, for things hosted outside of Fastmail like [this gallery](https://gallery.badmanners.xyz) and [my personal site](https://badmanners.xyz). That's fine. I just use [CNAME and ANAME](https://en.wikipedia.org/wiki/CNAME_record#ANAME_record) to point those to my [NearlyFreeSpeech.NET](https://www.nearlyfreespeech.net/) websites.
Then there are also services like [my Forgejo instance](https://git.badmanners.xyz) that I mentioned earlier, which was running on my RPi and being exposed through sish. But it often caused the mini computer to struggle whenever it processed too much data or too many requests at once. In this case, it made sense to move Forgejo to the VPS as well.
And so, I started looking at how to set up some kind of proxy server in front of sish. I'd heard of Traefik and Caddy, two [reverse proxies](https://en.wikipedia.org/wiki/Reverse_proxy) that do some of the annoying stuff like managing TLS for you. Still, I couldn't find a trivial way to make them work well with sish. Aside from that, despite what I'd been led to believe, they still require a non-minimal amount of effort to set up and maintain, and I'm still intent on being maximally and efficiently lazy for this whole project.
I even started making a diagram, while trying to explain the architecture that I had in mind to a friend:
<figure>
<Image
src={imageVpsArchitectureTraefik}
alt="Diagram showing a VPS host with SSH, HTTP(S), and TCP exposed to the outside world by Traefik, as it internally connects through reverse proxy to both git.badmanners.xyz and the sish instance. A Raspberry Pi serving booru.badmanners.xyz connects via SSH, while another computer sends an HTTP request for any service. There is a tangled mess of wires as Traefik is supposed to handle all of these different parts."
/>
<figcaption>
Recreation of the diagram. My main issue was that I couldn't figure out how to expose sish's functionality through
Traefik...
</figcaption>
</figure>
I wasn't really seeing a way to make this work... But hmm. What Traefik is doing isn't so different from what sish is doing. They are both reverse proxies, in a way, although Traefik is a traditional one, while sish does it in a roundabout way with SSH.
So then I tweaked the diagram a bit...
<figure>
<Image
src={imageVpsArchitectureSish}
alt="Diagram showing a VPS host with SSH, HTTP(S), and TCP exposed to the outside world by sish, as it is internally connected through by git.badmanners.xyz via SSH. A Raspberry Pi serving booru.badmanners.xyz connects via SSH as well, while another computer sends an HTTP request for any service. There is no internal wiring, since all services connect through SSH and sish handles any reverse proxying within itself."
/>
<figcaption>
Updated version of the diagram. When I realized that everything (even internal services) could connect through
SSH, it finally clicked for me!
</figcaption>
</figure>
Aha! It turns out that exposing services through sish is always the same, whether it's running on the same machine or halfway across the globe. We just need to set our credentials and start a [permanent shell session](https://github.com/Autossh/autossh) that does reverse port forwarding for us. With Docker Compose, that's both trivial and safe: no ports get exposed outside of the container network. And sish even supports advanced features, such as multiple domains and [load balancing](https://docs.ssi.sh/advanced#load-balancing). It's an SSH-powered reverse proxy!
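
To give an idea of what such a permanent session boils down to, here's a minimal sketch that uses plain `ssh` in a retry loop rather than autossh. The function name, the attempt limit, and the five-second pause are my own choices for illustration; `-N`, `ExitOnForwardFailure`, `ServerAliveInterval`, and `ServerAliveCountMax` are standard OpenSSH options.

```sh
# Hypothetical helper: reopen a reverse tunnel whenever it drops.
# Usage: keep_tunnel MAX_ATTEMPTS FORWARD_SPEC PROXY_HOST
keep_tunnel() {
  max_attempts=$1
  forward_spec=$2
  proxy_host=$3
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    status=0
    # -N: don't run a remote command, only do the forwarding.
    # ExitOnForwardFailure: exit if the proxy refuses the forward, instead
    #   of staying connected without a working tunnel.
    # ServerAlive*: send a keepalive every 30 seconds and give up after
    #   3 unanswered ones, so that a dead connection gets noticed.
    ssh -N \
      -o ExitOnForwardFailure=yes \
      -o ServerAliveInterval=30 \
      -o ServerAliveCountMax=3 \
      -R "$forward_spec" "$proxy_host" || status=$?
    echo "ssh exited with status $status (attempt $attempt of $max_attempts)" >&2
    sleep 5
  done
}

# Example invocation (not run here):
#   keep_tunnel 1000 git.badmanners.xyz:80:localhost:3000 my.sish.instance
```

Inside a Docker Compose network, `localhost` in the forward would presumably be replaced by the name of the container running the service, which is how a service gets exposed without publishing any of its ports.
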
With everything configured just as before, and the appropriate private keys created to authenticate each service, we are able to finally expose multiple services from different sources with a single reverse proxy server. It's all transparent, too, despite how we're using several distinct domains and services, and despite how some of these things are running at my home instead of the VPS. Never mind that I ended up pretty much back where I started; it all just works!
It's SSH all the way down, baby!
All in all, this project made me realize just how powerful SSH can be. Reverse port forwarding in particular is a great way to expose local servers remotely, quickly and easily, as long as you bind a socket for HTTP.
...Although, you don't even _need_ to expose a socket on your machine to do this!
In the next post, I'll go over the actual nitty-gritty details of how SSH does this, along with some code I wrote. I'll also show how I've deployed an SSH-proxied, HTTP-based game without any need for binding a TCP socket.
## \* A caveat about recurSSHive forwarding
I mentioned way earlier that Serveo (and sish) are able to forward SSH traffic, even through the same port where you bind services. This is what SSH internally refers to as a proxy, i.e. the sish instance serves as a proxy to our Git service.
In order to be able to direct traffic to our underlying service, we must tell our client to go through the sish instance first. With the `ProxyJump` option, the client first opens an SSH connection to the jump host (here, the sish instance), and then has its standard input and output forwarded over that secure channel to the final target, which establishes the TCP forwarding.
The details aren't completely clear even for me, but we don't need to care about the details. If we just want to connect to our Git server for pushing, we'd simply set the following contents in our SSH config:
<figure>
{/* prettier-ignore-start */}
```ssh-config
Host git.badmanners.xyz
  ProxyJump my.sish.instance
```
{/* prettier-ignore-end */}
<figcaption>Two lines will make the necessary proxy jump transparent.</figcaption>
</figure>
Alternatively, if we were exposing a real SSH server on `ssh.badmanners.xyz`, we might use the following command:
<figure>
{/* prettier-ignore-start */}
```sh
ssh -J my.sish.instance ssh.badmanners.xyz
```
{/* prettier-ignore-end */}
<figcaption>We can be explicit with the `-J` flag.</figcaption>
</figure>
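
For the curious, a jump like this can also be spelled out in longhand with `ProxyCommand` and `-W`, which is the `ssh` flag that forwards the client's standard input and output to a given host and port over the secure channel. The wrapper below is a hypothetical sketch (the function name is mine), and I haven't verified that it behaves identically to `-J` against sish in every case.

```sh
# Hypothetical helper: connect to TARGET by way of JUMP_HOST.
# Usage: jump_via JUMP_HOST TARGET
jump_via() {
  jump_host=$1
  target=$2
  # %h and %p are filled in by ssh with the target's host name and port.
  # The inner ssh connects to the jump host and, via -W, pipes our
  # connection through it to the target.
  ssh -o "ProxyCommand=ssh -W %h:%p $jump_host" "$target"
}

# Example invocation (not run here):
#   jump_via my.sish.instance ssh.badmanners.xyz
```
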