Microservices Best Practices I Learned at Tokopedia
Microservices Best Practices I Learned at Tokopedia
During my time at Tokopedia, I had the privilege of working on systems that handle millions of requests daily. Here are the key lessons I learned about building and maintaining microservices at scale.
1. Design for Failure
In a distributed system, failures are inevitable. Design your services to handle them gracefully.
Circuit Breaker Pattern
Implement circuit breakers to prevent cascading failures:
type CircuitBreaker struct {
failures int
threshold int
state State
lastFailure time.Time
timeout time.Duration
}
func (cb *CircuitBreaker) Call(fn func() error) error {
if cb.state == Open {
if time.Since(cb.lastFailure) > cb.timeout {
cb.state = HalfOpen
} else {
return ErrCircuitOpen
}
}
err := fn()
if err != nil {
cb.recordFailure()
return err
}
cb.reset()
return nil
} Retry with Exponential Backoff
func RetryWithBackoff(fn func() error, maxRetries int) error {
var err error
for i := 0; i < maxRetries; i++ {
err = fn()
if err == nil {
return nil
}
backoff := time.Duration(math.Pow(2, float64(i))) * time.Second
time.Sleep(backoff)
}
return err
} 2. Observability is Critical
You can’t fix what you can’t see. Implement the three pillars of observability:
Logging
Use structured logging with consistent fields:
log.WithFields(log.Fields{
"request_id": ctx.Value("request_id"),
"user_id": userID,
"action": "create_order",
"latency_ms": latency.Milliseconds(),
}).Info("Order created successfully") Metrics
Track key metrics for each service:
- Request rate - Requests per second
- Error rate - Percentage of failed requests
- Latency - P50, P95, P99 response times
- Saturation - CPU, memory, connection pool usage
Distributed Tracing
Implement tracing to follow requests across services:
span, ctx := opentracing.StartSpanFromContext(ctx, "processOrder")
defer span.Finish()
span.SetTag("order_id", orderID)
span.LogFields(log.String("event", "processing")) 3. API Design Matters
Use gRPC for Internal Communication
gRPC provides:
- Strong typing with Protocol Buffers
- Efficient binary serialization
- Built-in streaming support
- Automatic code generation
service OrderService {
rpc CreateOrder(CreateOrderRequest) returns (Order);
rpc GetOrder(GetOrderRequest) returns (Order);
rpc ListOrders(ListOrdersRequest) returns (stream Order);
} REST for External APIs
For public-facing APIs, REST with clear versioning:
GET /api/v1/orders/{id}
POST /api/v1/orders 4. Database Per Service
Each microservice should own its data:
- Autonomy - Services can choose the best database for their needs
- Isolation - Schema changes don’t affect other services
- Scalability - Each database can scale independently
Handling Cross-Service Data
Use the Saga pattern for distributed transactions:
type OrderSaga struct {
steps []SagaStep
}
func (s *OrderSaga) Execute(ctx context.Context) error {
var completedSteps []SagaStep
for _, step := range s.steps {
if err := step.Execute(ctx); err != nil {
// Compensate in reverse order
for i := len(completedSteps) - 1; i >= 0; i-- {
completedSteps[i].Compensate(ctx)
}
return err
}
completedSteps = append(completedSteps, step)
}
return nil
} 5. Message Queues for Async Communication
Use message queues (NSQ, Kafka, RabbitMQ) for:
- Decoupling - Services don’t need to know about each other
- Resilience - Messages persist if consumers are down
- Scalability - Add consumers as needed
func (c *Consumer) HandleMessage(msg *nsq.Message) error {
var event OrderCreatedEvent
if err := json.Unmarshal(msg.Body, &event); err != nil {
return err
}
// Process the event
return c.processOrderCreated(event)
} Key Takeaways
- Design for failure - Implement circuit breakers, retries, and timeouts
- Invest in observability - Logging, metrics, and tracing from day one
- Choose the right communication pattern - gRPC internally, REST externally
- Own your data - Database per service with saga patterns for transactions
- Embrace async - Use message queues for non-blocking operations
Building microservices at scale is challenging, but these patterns have served us well at Tokopedia. Start small, iterate quickly, and always keep reliability in mind.
Have questions about microservices? Feel free to reach out on X or LinkedIn.

Ali Irsyaad
Software Engineer