Recently I stopped writing tech-related posts. The main reason is that I could not invest enough time in tech to make my posts anything more than a rehash of other people's solutions, articles, Medium posts, or Stack Overflow answers.
However, after the painful process of setting up a gRPC service on AWS EKS, I think I finally have something to talk about.
The problem we are trying to solve: we want to run a gRPC server in AWS Elastic Kubernetes Service (EKS).
This post assumes readers have a working knowledge of AWS, AWS load balancers, and the basics of EKS and Kubernetes.
If you know a little about gRPC, you will know it runs over HTTP/2 and needs layer 7 load balancing. In AWS, at the time of writing, none of the provided load balancers support this. (Though AWS plans to launch an Application Load Balancer with this feature next month, so basically this is a dead post :P)
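To make the layer 7 point concrete, here is a toy sketch (not a real load balancer). A gRPC client usually sends many requests over one long-lived HTTP/2 connection, so a balancer that picks a backend once per connection sends all of that client's traffic to a single pod. The pod names, function names, and the simple round-robin policy are all made up for illustration.

```python
from itertools import cycle


def connection_level(backends, connections, requests_per_connection):
    """Layer 4 style: a backend is picked once per connection,
    and every request on that connection lands on the same backend."""
    counts = {b: 0 for b in backends}
    picker = cycle(backends)
    for _ in range(connections):
        backend = next(picker)
        counts[backend] += requests_per_connection
    return counts


def request_level(backends, connections, requests_per_connection):
    """Layer 7 style: a backend is picked for every single request,
    no matter which connection the request arrived on."""
    counts = {b: 0 for b in backends}
    picker = cycle(backends)
    for _ in range(connections * requests_per_connection):
        counts[next(picker)] += 1
    return counts


# Hypothetical pods; one gRPC client holding one connection, sending 9 requests.
pods = ["pod-a", "pod-b", "pod-c"]
l4 = connection_level(pods, connections=1, requests_per_connection=9)
l7 = request_level(pods, connections=1, requests_per_connection=9)
print("connection-level:", l4)
print("request-level:   ", l7)
```

With a single connection, the connection-level version puts all 9 requests on `pod-a`, while the request-level version spreads them 3/3/3.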
So we need something in between to do the load balancing. There are lots of choices, e.g. Nginx-Ingress-Controller, Envoy, etc.
We chose Nginx-Ingress-Controller (NIC) for no particular reason.
We happily set up NIC, our server was running in EKS, and everything was good. Then we came to the TLS termination problem.
You see, NIC has a Helm chart, so there was no need for us to set up NIC manually, which left me with little understanding of how exactly NIC works.
The chart README mentions that users sometimes need to terminate TLS at the AWS Network Load Balancer (NLB). So I thought: we need to terminate TLS, and the README says something about terminating TLS, so they must be the same thing.
It turns out they are not.
TLS can be terminated at the NLB, but it can also be terminated at the Nginx Pod created by NIC.
What made things worse is that I also ignored the TLS-related code in the Ingress file of the gRPC example in the NIC repo.
So I set up TLS termination at the NLB, dropped the proper TLS termination code from the Ingress, and started testing. Naturally, it did not work. I immediately reached out to my infra team for help.
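For reference, this is roughly what the missing piece looks like: a minimal sketch of an Ingress for a gRPC backend that terminates TLS at the Nginx Pod. It is not my actual config. The host `grpc.example.com`, secret `grpc-tls-secret`, service `grpc-server`, port `50051`, and the `nginx` ingress class are all placeholders, and I am assuming the `networking.k8s.io/v1` Ingress API. The two annotations are the ones ingress-nginx documents for gRPC backends. It is written in Python only so the structure can be built and checked as data.

```python
import json


def grpc_ingress(host, tls_secret, service_name, service_port):
    """Build an Ingress manifest (as a dict) for a gRPC backend,
    with TLS terminated at the Nginx Pod via the `tls` section."""
    return {
        "apiVersion": "networking.k8s.io/v1",  # assumed API version
        "kind": "Ingress",
        "metadata": {
            "name": f"{service_name}-ingress",
            "annotations": {
                # Tell Nginx to speak gRPC to the backend.
                "nginx.ingress.kubernetes.io/backend-protocol": "GRPC",
                "nginx.ingress.kubernetes.io/ssl-redirect": "true",
            },
        },
        "spec": {
            "ingressClassName": "nginx",  # assumed class name
            # The part I dropped: the certificate the Nginx Pod serves.
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


# All four arguments are hypothetical placeholder values.
manifest = grpc_ingress("grpc.example.com", "grpc-tls-secret", "grpc-server", 50051)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML. The secret named in `tls` has to exist in the same namespace and hold the certificate and key.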
My infra team successfully figured out that I had left the TLS part out of the Ingress, even though they knew little about NIC. But the problem was that they thought managing certificates manually was too complicated, so we should use AWS Certificate Manager (ACM).
From this point on, my exploration officially veered off course, because at that point NIC did not support ACM.
It took me a week, and I even reached out to AWS support, before I realized we could not use ACM at all. The gRPC service worked immediately once I realized the problem, requested a single certificate from IT to encrypt my TLS connection, and set it up…
The basic lesson, at the very least: I should always finish reading those ugly, unreadable documents, and not easily trust other people's proposals if they are not really experienced in that area….
Hope this helps somebody.