Zhongshan District, Taipei City; 2+ years of experience; bachelor's degree or higher
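17LIVE welcomes Site Reliability Engineers who are interested in the work described below to join our family!
You will be responsible for:
- Owning the overall performance and reliability of 17LIVE's infrastructure and products.
- Automation: SREs can't stand work that lacks automation and tooling.
- System architecture: knowing the lifecycle of a running system (e.g., from startup, to serving external traffic, to shutdown).
- Deployment and change management: knowing service release workflows (e.g., GitFlow, GitHubFlow, GitLabFlow) and how to do version control; understanding GitOps.
- Monitoring services: understanding how to collect logs and metrics and build dashboards to monitor services.
- Improving availability: knowing how to deploy HA and DR architectures.
- Incident handling (improving the On-Call experience, tools, and procedures): being able to make an initial assessment of an incident's likely causes and to assist with post-incident analysis.
- Understanding IaC and being able to use at least one IaC tool, such as Terraform.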
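If you have the following skills and work experience, don't hesitate to apply right away:
- Understanding of the basic principles of how Linux works, and a willingness to learn more about Linux internals.
- Good programming skills in at least one of Go, C, C++, Python, or Java, plus the ability to learn other languages.
- Basic shell scripting skills.
- Operations experience with Kubernetes, CI/CD, and monitoring.
- Experience with IDC, AWS, GCP, or Azure.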
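Bonus qualifications:
- Kubernetes or cloud-related certifications.
- Knowledge of container technologies, such as Docker, containerd, or Podman.
- Knowledge of one of the following: MySQL, MongoDB, ELK, Datadog, Prometheus, or similar technologies.
- Understanding of caching and queue systems (Redis, Memcached, RabbitMQ, Apache Kafka…).
- Contributions to open-source software.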
We are currently hiring Site Reliability Engineers. SREs design and implement the tools that automate building reliable and performant systems. In this role, you will take part in:
- Ensuring the overall performance and reliability of 17LIVE's infrastructure and products.
- Automation: SREs can’t stand tasks that aren’t automated or tools that aren't in place.
- System Architecture: Understanding the lifecycle of a system (e.g., from startup to service provision to shutdown).
- Deployment and Change Management: Knowing the service release process (e.g., GitFlow, GitHubFlow, GitLabFlow) and how to manage version control; understanding GitOps.
- Monitoring Services: Understanding how to collect logs, metrics, and create dashboards for monitoring services.
- Enhancing Availability: Knowing how to deploy High Availability (HA) and Disaster Recovery (DR) architectures.
- Incident Management: Handling system incidents (improving the On-Call experience with tools and procedures), being able to preliminarily identify possible causes of incidents, and assisting with post-incident analysis.
- Understanding Infrastructure as Code (IaC) and being proficient with at least one IaC tool, such as Terraform.
Don't hesitate to apply if you have the following skills and experience:
- Understanding the basic principles of Linux and a willingness to delve deeper into Linux's internal structure.
- Strong programming skills in at least one of the following languages: Go, C, C++, Python, or Java, plus the ability to learn other languages.
- Basic shell scripting skills.
- Experience in maintaining Kubernetes, CI/CD, and Monitoring systems.
- Experience with IDC, AWS, GCP, or Azure.
You will be highly considered if you have the following experience:
- Possessing Kubernetes or cloud-related certifications.
- Knowledge of container technologies such as Docker, containerd, or Podman.
- Knowledge of one of the following: MySQL, MongoDB, ELK, Datadog, Prometheus, or similar technologies.
- Understanding of caching and queue systems like Redis, Memcached, RabbitMQ, Apache Kafka, etc.
- Contributions to open-source software.