Senior Site Reliability Engineer Full-time Job

1 week ago   Engineering   Dubai   Reference: 33985
Job Details

Tech stack

OS: Linux Ubuntu;

Web server: Nginx;

Monitoring: Grafana, Prometheus, Graylog, Jaeger;

CI/CD: Jenkins, Git, GitLab, Docker;

Automation: Python, Bash;

SCM: Ansible, Chef;

IaC: Terraform, Pulumi;

DB: PostgreSQL, Redis, KeyDB, MySQL;

Cloud: OpenStack, AWS, GCP, DO.

Examples of first tasks in the role:

Review processes, platform and infrastructure;

Implementation of Grafana OnCall;

Review and rework ITSM processes if needed.

Responsibilities in the role:

Identification of bottlenecks and preparation of recommendations to improve the reliability of services;

Responding to platform emergencies, isolating and resolving the causes of failures, and writing postmortem reports;

Development of monitoring and alerting tools that ensure high availability and quick detection of potential issues (Grafana, Grafana OnCall, Prometheus Alertmanager, etc.);

Active participation in change management processes, including assessment and coordination of changes to the infrastructure within Change Advisory Board (CAB) sessions;

Implementation and support of ITSM processes to optimize team workflow and enhance service quality;

Development and maintenance of documentation in an up-to-date state.

Requirements:

3+ years of experience in SRE/DevOps;

Understanding of SRE principles, practical experience in implementing SRE practices;

Understanding of principles and practical experience in building resilient systems;

Experience with monitoring and logging systems (Prometheus, Graylog, Grafana);

Experience with automation tools for software build and deployment (CI/CD): GitLab, Jenkins;

Understanding of virtualization and containerization principles;

Understanding of Infrastructure as Code (IaC) approaches and hands-on experience applying them;

Proficiency in a programming language for writing automation scripts (Python, Node.js, Golang, etc.), ability to understand service code;

Understanding of network protocols, topologies, and network models;

Experience with configuration management tools: Ansible, Chef;

Basic experience with relational databases, such as PostgreSQL;

Experience in administering Linux operating systems;

Fluency in English and Russian (B2 minimum).

Company Description
Quadcode is a fintech company specializing in financial brokerage activities and offering advanced financial products to clients globally. Our flagship product is our internal trading platform offered as a Software-as-a-Service (SaaS) solution to other brokers.
By being a financial broker ourselves (B2C) and offering our technology as a SaaS solution (B2B) to other brokers, we are able to identify opportunities and improve our offerings for both worlds.
As of now, there are over 700 employees and service providers working at Quadcode in 7 offices spread around the world: the UK, Gibraltar, the UAE, the Bahamas, Australia, and the headquarters in Cyprus.
By expanding its presence on an international level, Quadcode offers a remote or hybrid work model and a wide range of interesting tasks and challenges for developers, market research analysts, PR and marketing specialists, and many more.