• 2 Posts
  • 26 Comments
Joined 2 years ago
Cake day: November 15th, 2023


  • But the estimate assumes each NC instance gets half a vCPU and 1GB of memory. It is a super conservative estimate that doesn’t include anything besides a tiny Fargate deployment and Aurora instances.

    Edit: Fargate ($40/month), plus the tiniest Aurora instances at 20% utilization and with merely 50GB of storage ($120/month). Still missing: S3, which will easily cost $50 in storage and transfer (for only a few TB), plus ALBs and network traffic, especially outbound (easily $50-100 depending on volume).

    This basic solution’s real cost is already between $150 and $300/month. I don’t know NC enough to understand volumes on DBs and all usage, but I assume that it’s going to be lots of data in and out (backups, media, etc.). —edit—

    For a heavily used NC instance (assuming a company offering it as a service), the cost is going to become massive pretty fast.

    Also, as a side note, if a company is offering NC as a service but doesn’t manage a single piece of the NC deployment… what is the company’s product? And most importantly, how are they going to make money when AWS is going to eat a linearly scaling chunk of their revenue forever?
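
    To make the arithmetic in the edit above explicit, here is a minimal sketch that just adds up the line items quoted there. The figures are the rounded estimates from this comment, not AWS price data, and the grouping into "compute and DB" versus "everything" is an assumption made for illustration.

```python
# Monthly line items quoted in the comment, as (low, high) USD ranges.
# Single figures are stored as ranges with equal bounds.
LINE_ITEMS = {
    "fargate": (40, 40),
    "aurora": (120, 120),
    "s3_storage_and_transfer": (50, 50),
    "alb_and_network_egress": (50, 100),
}


def monthly_range(items, include):
    """Sum the low and high bounds of the selected line items."""
    low = sum(items[name][0] for name in include)
    high = sum(items[name][1] for name in include)
    return low, high


# Only the two items the original estimate covered.
compute_and_db = monthly_range(LINE_ITEMS, ["fargate", "aurora"])
# Adding the items the estimate left out.
everything = monthly_range(LINE_ITEMS, list(LINE_ITEMS))

print(compute_and_db)  # (160, 160)
print(everything)      # (260, 310)
```

    That lands in the same ballpark as the $150 to $300/month quoted above, and it still excludes anything not listed.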


  • Well yeah, it wouldn’t break the bank, but a conservative cost estimate (without considering network costs, for example, which are quite relevant for a data-intensive app) would bring this setup to about $40/month. That is about 5 times more expensive than a VPS with 4x the resources.

    OP said this is some sort of “enterprise self-hosting” solution, in which case I guess it kind of makes sense. For a company providing Nextcloud as a service, I would never lock myself into a vendor and let AWS take a huge chunk of my revenue forever, but I can imagine folks have different opinions.


  • In that case, the Pulumi permissions are too broad IMHO for what it has to do; an enterprise should adhere to least privilege. Likewise, as I wrote in another comment, the egress rules in the security groups are unclear to me (why is any egress traffic needed at all?) and the image consumed should be pinned to a digest. Or better yet, it should come from a private enterprise registry, ideally with an attestation that can be verified at runtime.

    I am not sure ECS Fargate makes sense vs an EC2 instance to run the workload. This setup alone will cost about $30/month assuming half a vCPU per replica with Fargate, plus about $12 for the memory (1GB/task). Two t2.micro instances could be run for ~$20, without even considering reservation discounts etc. Obviously the gap will become even larger at scale, which I suppose might be very interesting for an enterprise.
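
    As a rough illustration of how that gap grows, here is a small sketch using only the figures from this comment. It assumes those figures describe two replicas and that both options scale linearly per replica, which ignores reservation discounts and larger instance types.

```python
# USD/month figures taken from the comment, assumed to cover two replicas.
FARGATE_TWO_REPLICAS = 30 + 12  # vCPU cost + memory cost
EC2_TWO_T2_MICRO = 20


def monthly_gap(replicas):
    """Extra USD/month paid for Fargate over t2.micro, assuming linear scaling."""
    per_replica_gap = (FARGATE_TWO_REPLICAS - EC2_TWO_T2_MICRO) / 2
    return per_replica_gap * replicas


for n in (2, 20, 200):
    print(n, monthly_gap(n))
```

    With these assumptions the difference is $11 per replica per month, so it is small for a hobby setup and becomes noticeable only when the replica count is large.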



  • Oh yeah, I am aware. Mostly I would question the idea of having multi-AZ redundancy and using a managed service for the DB (which is indeed expensive). All of this when a $5 VPS could host the same thing (maybe still using S3 for storage), accepting a few hours of downtime in the rare event that your VPS explodes and you need to restore it from a backup.

    So from my PoV this is absolutely overkill, but I concede that it depends a lot on the requirements. For my personal stuff, I can’t ever imagine having requirements so tight that they need such infra to run (in fact, I think most businesses don’t have these requirements either; I have written on the topic at https://loudwhisper.me/blog/hating-clouds/)…


  • Everyone is free to pick their poison, but I have to ask… why? What is the target audience here? This is a massively overkill architecture IMHO. Not to mention the fact that you now need 3 managed services (Fargate, S3 and Aurora at least) for a single self-hosted tool, and that is being generous (not counting CloudWatch, ALBs, etc.).

    • Why do you need security groups to allow egress anywhere (or at all)?
    • I would pin the image to a digest, rather than using latest.
    • What is the average monthly cost of this infra for you?
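
    On the digest point, a minimal sketch of the kind of check a pipeline could run on an image reference before deploying. It assumes sha256 digests (the common case) and the image names and digest below are placeholders, not real references.

```python
import re

# A reference pinned to a digest ends with "@sha256:" plus 64 hex characters.
# Assumption: only sha256 digests are considered.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_to_digest(image_ref: str) -> bool:
    """Return True if the image reference is pinned to an immutable digest."""
    return _DIGEST_RE.search(image_ref) is not None


# Placeholder digest for illustration only.
FAKE_DIGEST = "a" * 64

print(is_pinned_to_digest("nextcloud:latest"))                  # False
print(is_pinned_to_digest(f"nextcloud@sha256:{FAKE_DIGEST}"))   # True
```

    A tag like latest can point to a different image tomorrow, while a digest always resolves to the same content, which is the whole reason for pinning.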


  • Comfort is the main reason, I suppose. If I mess up the WireGuard config, I need to go to the KVM console even just to debug the tunnel. It also means that if I am somewhere else and have to SSH into the box, I can’t simply plug in my Yubikey and SSH from there. It’s a rare occurrence, but still…

    Ultimately I do understand both points of view. The thing is, SSH bots pose no threat once the bare minimum hardening for SSH has been done. The resource consumption is negligible, so it has no real impact.

    To me the tradeoff is slight inconvenience vs slightly bigger attack surface (in case of CVEs). Ultimately everyone can decide which compromise is acceptable for them, but I would say that the choice is not really a big one.


  • Hey, the short answer is yes, you can.

    I would elaborate a little more:

    • First, you have the problem of sourcing the data. In essence, Crowdsec won’t be able to go and fetch those logs for you dynamically, but can go and take those logs from a file (you can do a dirty solution like a sidecar deployment) or from a stream. You can deploy crowdsec in multiple modes, and you can have many instances that talk to each other. You can also simply have some process tailing the pod logs and sending them to a file crowdsec has access to or serving them as a stream (see https://doc.crowdsec.net/docs/data_sources/intro).
    • The above means that it doesn’t really matter whether you run Crowdsec inside your cluster (it does have a Helm chart) or on the host. Ultimately all that matters is that crowdsec has access to your pods’ logs (for example, the logs of your ingress controller).
    • The next piece is the remediation component. What do you want crowdsec to do once it is able to detect bad IPs? If you just want to add IPs to the firewall, then it might make more sense to run it on the host(s) you use for ingress. If you want to add the IPs to network policies you can do that too, but you need to develop your own remediation component. I am planning to write a remediation component that will add the IPs to the Hetzner firewall (some other systems are already supported), which would be a way to basically block the IPs outside your cluster. For the nginx ingress controller there is already a pre-made remediation component.
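
    The "some process tailing the pod logs and sending them to a file" option from the first point could look roughly like the sketch below. It only covers the copying step: the function name and the sample log lines are made up for illustration, and the crowdsec acquisition config pointing at the output file is not shown.

```python
import io
from typing import Iterable, TextIO


def forward_lines(source: Iterable[str], sink: TextIO) -> int:
    """Copy non-empty log lines from source to sink, flushing after each one.

    Flushing per line keeps the file current for whatever is reading it.
    Returns the number of lines written.
    """
    count = 0
    for line in source:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sink.write(line + "\n")
        sink.flush()
        count += 1
    return count


# Demo with in-memory streams and made-up log lines.
demo_in = io.StringIO("1.2.3.4 GET /\n\n5.6.7.8 GET /admin\n")
demo_out = io.StringIO()
forwarded = forward_lines(demo_in, demo_out)
print(forwarded)  # 2
```

    In a real setup the source would be something like sys.stdin fed by a log-following command, and the sink a file opened in append mode at a path crowdsec is configured to read.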

    In practice I personally would choose a simple setup where the interesting logs are just forwarded (in Syslog format, for example) to a single crowdsec instance. If you have ingress from a single node, I’d go for running it on the host and banning via the firewall; if you have multiple ingress nodes, then I would run it inside the cluster and ban via a load balancer/cloud firewall/whatever you have in front.

    In essence, I would spend some time to think about your preferences, and it might take a little bit to make the setup clean, but I think you have plenty of flexibility to do what you prefer. Let me know if you want to bounce some more ideas!


  • Yeah I know (I mentioned it myself in the post), but realistically there is not much you can do besides upgrading. Unattended upgrades kick in once a day, so you will install the security patches ASAP. There are also virtual patches (crowdsec has a virtual patch for that CVE), but they might not be very effective.

    I argue that VPN software is a smaller attack surface, but the problem still exists (CVEs) for everything you expose.






  • Also hypervisors get escape vulnerabilities every now and then. I would say that in a realistic scale of difficulty of escape, a good container (doesn’t matter if using Docker or something else) is a good security boundary.

    If this is not the case, I wonder what your scale extremes are.

    A good container has very little attack surface, since it can have almost no code or tools available, a read-only fs, no user privileges or capabilities whatsoever and possibly even a syscall filter. Sure, the kernel is the same, but then the only alternative is to split that per application (VM-like), and then you move the problem to hypervisors.
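
    As a rough illustration, most of the properties listed above map onto standard docker run flags. The sketch below only builds the argument list; the helper name, the default UID and the image name are assumptions, and "almost no code or tools" depends on the image itself rather than on any flag.

```python
def hardened_run_args(image, uid=65534, seccomp_profile=None):
    """Build a docker run command line for a locked-down container."""
    args = [
        "docker", "run", "--rm",
        "--read-only",                       # read-only root filesystem
        "--cap-drop=ALL",                    # drop every Linux capability
        "--security-opt=no-new-privileges",  # no privilege gain via setuid binaries
        f"--user={uid}:{uid}",               # run as an unprivileged UID/GID
    ]
    if seccomp_profile:
        # Optional custom syscall filter; Docker applies a default profile otherwise.
        args.append(f"--security-opt=seccomp={seccomp_profile}")
    args.append(image)
    return args


# Hypothetical image name, for illustration only.
print(" ".join(hardened_run_args("example/app:1.0")))
```

    None of this changes the fact that the kernel is shared; it just shrinks what a compromised process can reach before it would even get to attempt an escape.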

    In the context of the question asked, I think the gains from reducing the attack surface are completely outweighed by the loss in functionality and the waste of resources.




  • Fair question. What I meant is that suggesting that would have made the whole post 10 lines long and not worth doing. So I avoided such suggestions that completely change the threat model.

    It’s not useless for achieving a good security posture (although you might have concerns about a monopoly gatekeeping the internet, privacy concerns around TLS traffic inspection, etc.). On the contrary, it makes everything I have written about here redundant (and provides more, like DDoS protection), as you are outsourcing the security controls.