Marcelo Candio
Intro

I run AI platforms in production, in regulated and high-traffic environments.

Chief Systems Engineer · Platform & SRE

Ten+ years operating mission-critical infrastructure on AWS and GCP. The last four have been deliberately AI-oriented: an LLM API gateway and the agents around it at EPAM, computer-vision model infrastructure at Agot.ai, Algorand blockchain SRE at C3.ai, and platform reliability for regulated Brazilian betting at Blaze.

Based
Remote
Tenure
10+ years SRE / Platform Eng.
Focus
AI Infrastructure / High-Load Traffic / Regulated
Status
Open to opportunities
Experience
Certifications & Tools
Tools I actually combine, plus what's on paper
Tools & Stack
Daily driver toolchain

Languages

Python
TypeScript
Go
Bash
JavaScript
SQL

Cloud & Compute

AWS
GCP
Azure
Anthos
S3
CloudFront

Containers & Orchestration

Kubernetes
K3s
K3d
Istio
Traefik
Telepresence

IaC & Config

Terraform
Terragrunt
Atlantis
Ansible
Packer
HashiCorp Vault

Data & Cache

PostgreSQL
MongoDB
DynamoDB
Redis
SSDB
Sphinx

Workflows & ML

Airflow
Prefect
Kubeflow
Step Functions
AWS Bedrock
OpenAI

Networking & Edge

Cloudflare
ALB
API Gateway
Kong
NGINX
Teleport

Observability

Datadog
New Relic
CloudWatch
Prometheus
Grafana
OpenTelemetry

CI/CD & Platform

GitHub
GitLab
CircleCI
ArgoCD
Port
SSM
Writing
Contact

Let's talk infrastructure.

Open to staff+ platform and SRE roles. I reply within two working days.

Base
Portugal · Remote

No tracking. No newsletter. Just a reply.
