How to gather OpenTelemetry Metrics in Instana with ‘no’ Instana agent on your ‘production’ infrastructure (part 2)

Alain Airom (Ayrom)
11 min read · Jul 12


This is the second part of a series of articles on exporting OpenTelemetry traces to an Instana back-end.

The previous article discussed a simple use-case of Instana / OpenTelemetry Integration for Go Applications.

The current article demonstrates a more complex, enterprise/production-oriented usage.

How to gather OpenTelemetry Metrics in Instana with ‘no’ Instana agent on your ‘production’ infrastructure

IBM® Instana® Observability is the gold standard of incident prevention with automated full-stack visibility, 1-second granularity, and 3 seconds to notify. With today’s highly dynamic and complex cloud environments, the average cost of an hour of downtime can reach six figures and beyond. Traditional application performance monitoring (APM) tools simply aren’t fast enough to keep up or thorough enough to contextualize the issues identified. Also, they are typically limited to super users who must complete months of training to learn.

IBM Instana Observability goes beyond traditional APM solutions by democratizing observability so anyone across DevOps, SRE, platform engineering, ITOps, and development can get the data they want with the context they need. Instana automatically delivers continuous high-fidelity data at 1-second granularity and end-to-end traces with the context of logical and physical dependencies across mobile, web, applications, and infrastructure.

Instana can monitor all sorts of platforms and application types once the agent is deployed on the target infrastructure:

  • web sites
  • mobile apps
  • containers
  • K8s clusters
  • databases
  • serverless apps
  • APIs
  • Microservices

All documentation (current and previous versions) can be found on the official page:

But there are use cases where an organization already uses an observability tool and would like to connect to the Instana back-end without deploying any Instana agent directly on its existing infrastructure. In this article, we will walk through exporting OpenTelemetry metrics to Instana.

This use case was brought up to us by one of our customers recently.

Testing the feasibility only requires a minimal setup. Basically, what you need is:

  • your code (duh)
  • a bastion server on which to install the Instana agent.

What is the Instana agent?

The demonstration/proof of concept

The demo code I used to put this use case in place is available in a GitHub repo contributed by Petr Styblo.

The GitHub repo mocks a microservices-based application with services written in different languages such as Node.js, Go, and Python, and using technologies such as Redis, Kafka…

The whole demo app can be run locally using docker compose, or it can be deployed to a Kubernetes/OpenShift cluster (local, remote, cloud-based). All the instructions to run the app in either environment are fully documented (commands to be executed, YAML files, or Helm chart for K8s deployments).

The use case mocks an Instana agent deployed on a single bastion server; all the OpenTelemetry traces/metrics go through this server to become visible on the Instana back-end.

Running everything locally to monitor a microservices application on Instana back-end

When running the demo app locally, the agent directory should contain a “.env” file like the example below:

# AGENT .env
# Instana agent configuration
INSTANA_AGENT_KEY="your backend instana agent key"
INSTANA_DOWNLOAD_KEY="your backend instana agent key"
INSTANA_AGENT_ENDPOINT="the instana backend you want to use"
INSTANA_AGENT_ENDPOINT_PORT=443 # this is the port for SaaS backend only
# EUM settings
INSTANA_AGENT_ZONE=otel-demo #the name to track on the backend
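
Before starting the agent, it can help to check that the “.env” file defines every key listed above. The sketch below is a minimal, hypothetical helper (the names parse_env and missing_keys are mine, not part of the repo). It assumes the simple KEY=value format shown above, with optional quotes and trailing comments, and is not a full implementation of the docker compose “.env” syntax.

```python
REQUIRED_KEYS = (
    "INSTANA_AGENT_KEY",
    "INSTANA_DOWNLOAD_KEY",
    "INSTANA_AGENT_ENDPOINT",
    "INSTANA_AGENT_ENDPOINT_PORT",
    "INSTANA_AGENT_ZONE",
)


def parse_env(text):
    """Parse KEY=value lines; skip blank lines and full-line comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep everything up to the closing quote.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing "# comment".
            value = value.split("#", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical sample, deliberately incomplete.
sample = '''
# AGENT .env
INSTANA_AGENT_KEY="abc"
INSTANA_AGENT_ENDPOINT_PORT=443 # this is the port for SaaS backend only
INSTANA_AGENT_ZONE=otel-demo #the name to track on the backend
'''
env = parse_env(sample)
print(env["INSTANA_AGENT_ENDPOINT_PORT"])
print(missing_keys(env))
```

Running it against the incomplete sample reports the two keys that still have to be filled in, which is easier to spot than a failing agent start.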

The microservices part needs an “.env” file too:

# App .env
# Images

# Instana

# Collector
# Demo Platform

# OpenTelemetry Collector

# OpenTelemetry Resource Definitions

# Metrics Temporality

# ******************
# Core Demo Services
# ******************
# Ad Service

# Cart Service

# Checkout Service

# Currency Service

# Email Service

# Feature Flag Service

# Frontend

# Frontend Proxy (Envoy)

# Load Generator

# Payment Service

# Product Catalog Service

# Quote Service

# Recommendation Service

# Shipping Service

# ******************
# Dependent Services
# ******************
# Kafka

# Redis

# ********************
# Telemetry Components
# ********************
# Grafana

# Jaeger

# Prometheus

We can also dig into the provided code to discover how each of the microservices sends its traces to OpenTelemetry. The example below is taken from the Python microservice, which is used as a recommender system on the mock-up eCommerce site that the application provides.

# Copyright The OpenTelemetry Authors
# SPDX-License-Identifier: Apache-2.0

import logging
import sys
from pythonjsonlogger import jsonlogger
from opentelemetry import trace


class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super(CustomJsonFormatter, self).add_fields(log_record, record, message_dict)
        # Inject the ids of the current span so logs can be correlated with traces
        if not log_record.get('otelTraceID'):
            log_record['otelTraceID'] = trace.format_trace_id(trace.get_current_span().get_span_context().trace_id)
        if not log_record.get('otelSpanID'):
            log_record['otelSpanID'] = trace.format_span_id(trace.get_current_span().get_span_context().span_id)


def getJSONLogger(name):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    formatter = CustomJsonFormatter('%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] [trace_id=%(otelTraceID)s span_id=%(otelSpanID)s] - %(message)s')
    # Wire the formatter and handler to the logger, otherwise they are never used
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

# Copyright The OpenTelemetry Authors
# SPDX-License-Identifier: Apache-2.0

def init_metrics(meter):

    # Recommendations counter
    app_recommendations_counter = meter.create_counter(
        'app_recommendations_counter', unit='recommendations', description="Counts the total number of given recommendations"
    )

    rec_svc_metrics = {
        "app_recommendations_counter": app_recommendations_counter,
    }

    return rec_svc_metrics

# Copyright The OpenTelemetry Authors
# SPDX-License-Identifier: Apache-2.0

# Python
import os
import random
from concurrent import futures

# Pip
import grpc
from opentelemetry import trace, metrics
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource

# Local
import logging
import demo_pb2
import demo_pb2_grpc
from grpc_health.v1 import health_pb2
from grpc_health.v1 import health_pb2_grpc

from metrics import (
    init_metrics
)

cached_ids = []
first_run = True


class RecommendationService(demo_pb2_grpc.RecommendationServiceServicer):
    def ListRecommendations(self, request, context):
        prod_list = get_product_list(request.product_ids)
        span = trace.get_current_span()
        span.set_attribute("app.products_recommended.count", len(prod_list))
        logger.info(f"Receive ListRecommendations for product ids:{prod_list}")

        # build and return response
        response = demo_pb2.ListRecommendationsResponse()
        response.product_ids.extend(prod_list)

        # Collect metrics for this service
        rec_svc_metrics["app_recommendations_counter"].add(len(prod_list), {'recommendation.type': 'catalog'})

        return response

    def Check(self, request, context):
        return health_pb2.HealthCheckResponse(
            status=health_pb2.HealthCheckResponse.SERVING)

    def Watch(self, request, context):
        return health_pb2.HealthCheckResponse(
            status=health_pb2.HealthCheckResponse.UNIMPLEMENTED)


def get_product_list(request_product_ids):
    global first_run
    global cached_ids
    with tracer.start_as_current_span("get_product_list") as span:
        max_responses = 5

        # Formulate the list of characters to list of strings
        request_product_ids_str = ''.join(request_product_ids)
        request_product_ids = request_product_ids_str.split(',')

        # Feature flag scenario - Cache Leak
        if check_feature_flag("recommendationCache"):
            span.set_attribute("app.recommendation.cache_enabled", True)
            if random.random() < 0.5 or first_run:
                first_run = False
                span.set_attribute("app.cache_hit", False)
                logger.info("get_product_list: cache miss")
                cat_response = product_catalog_stub.ListProducts(demo_pb2.Empty())
                response_ids = [x.id for x in cat_response.products]
                cached_ids = cached_ids + response_ids
                # Deliberate leak: re-append a quarter of the cache on every miss
                cached_ids = cached_ids + cached_ids[:len(cached_ids) // 4]
                product_ids = cached_ids
            else:
                span.set_attribute("app.cache_hit", True)
                logger.info("get_product_list: cache hit")
                product_ids = cached_ids
        else:
            span.set_attribute("app.recommendation.cache_enabled", False)
            cat_response = product_catalog_stub.ListProducts(demo_pb2.Empty())
            product_ids = [x.id for x in cat_response.products]

        span.set_attribute("app.products.count", len(product_ids))

        # Create a filtered list of products excluding the products received as input
        filtered_products = list(set(product_ids) - set(request_product_ids))
        num_products = len(filtered_products)
        span.set_attribute("app.filtered_products.count", num_products)
        num_return = min(max_responses, num_products)

        # Sample list of indices to return
        indices = random.sample(range(num_products), num_return)
        # Fetch product ids from indices
        prod_list = [filtered_products[i] for i in indices]

        span.set_attribute("app.filtered_products.list", prod_list)

        return prod_list


def must_map_env(key: str):
    value = os.environ.get(key)
    if value is None:
        raise Exception(f'{key} environment variable must be set')
    return value


def check_feature_flag(flag_name: str):
    flag = feature_flag_stub.GetFlag(demo_pb2.GetFlagRequest(name=flag_name)).flag
    return flag.enabled


if __name__ == "__main__":
    service_name = must_map_env('OTEL_SERVICE_NAME')

    # Initialize Traces and Metrics
    tracer = trace.get_tracer_provider().get_tracer(service_name)
    meter = metrics.get_meter_provider().get_meter(service_name)
    rec_svc_metrics = init_metrics(meter)

    # Initialize Logs
    logger_provider = LoggerProvider(
        resource=Resource.create(
            {
                'service.name': service_name,
            }
        ),
    )
    set_logger_provider(logger_provider)
    log_exporter = OTLPLogExporter(insecure=True)
    logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))
    handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

    # Attach OTLP handler to logger
    logger = logging.getLogger('main')
    logger.addHandler(handler)

    catalog_addr = must_map_env('PRODUCT_CATALOG_SERVICE_ADDR')
    ff_addr = must_map_env('FEATURE_FLAG_GRPC_SERVICE_ADDR')
    pc_channel = grpc.insecure_channel(catalog_addr)
    ff_channel = grpc.insecure_channel(ff_addr)
    product_catalog_stub = demo_pb2_grpc.ProductCatalogServiceStub(pc_channel)
    feature_flag_stub = demo_pb2_grpc.FeatureFlagServiceStub(ff_channel)

    # Create gRPC server
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))

    # Add class to gRPC server
    service = RecommendationService()
    demo_pb2_grpc.add_RecommendationServiceServicer_to_server(service, server)
    health_pb2_grpc.add_HealthServicer_to_server(service, server)

    # Start server
    port = must_map_env('RECOMMENDATION_SERVICE_PORT')
    server.add_insecure_port(f'[::]:{port}')
    server.start()
    logger.info(f'Recommendation service started, listening on port {port}')
    server.wait_for_termination()
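
To make the core of get_product_list easier to follow, here is a minimal sketch that isolates its two pieces of logic from gRPC and OpenTelemetry: the selection of recommendations and the deliberate “cache leak” of the recommendationCache feature flag. The function names pick_recommendations and leak_cache and the sample catalog are mine, introduced for illustration only; they are not part of the repo.

```python
import random


def pick_recommendations(product_ids, request_product_ids, max_responses=5, rng=random):
    """Return up to max_responses catalog ids that were not in the request."""
    # Exclude the products the caller already sent.
    filtered = list(set(product_ids) - set(request_product_ids))
    num_return = min(max_responses, len(filtered))
    # sample() picks num_return distinct items without replacement.
    return rng.sample(filtered, num_return)


def leak_cache(cached_ids, response_ids):
    """Mimic one cache miss of the recommendationCache flag scenario."""
    cached_ids = cached_ids + response_ids
    # Re-append the first quarter of the list, so the cache keeps growing.
    return cached_ids + cached_ids[: len(cached_ids) // 4]


# Hypothetical catalog of eight product ids.
catalog = ["A", "B", "C", "D", "E", "F", "G", "H"]
picked = pick_recommendations(catalog, ["A", "B"])
print(len(picked))

cache = []
sizes = []
for _ in range(3):
    cache = leak_cache(cache, catalog)
    sizes.append(len(cache))
print(sizes)
```

With an eight-item catalog, three cache misses already grow the list to 10, 22, and then 37 entries, which is the kind of memory growth the feature flag is meant to make visible in the monitoring back-end.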

Once the repo is cloned and the values set, you can use the docker compose commands provided to test/run the app.

The application is accessible at http://localhost:8080/

You can monitor the Instana back-end for traces (this screen capture comes from a test/local back-end, let’s move on).

Running a remote agent bastion

This is great, but now let’s try it with a remote machine used as the bastion for the Instana agent.

The high-level idea of the architecture is the following:

Test the app with a remote agent

  • I provisioned a minimal Ubuntu VSI on IBM Cloud (you can use either classic or VPC mode).
  • Then we need an Instana back-end to generate the agent for our server. For that purpose, I used the IBM Instana dev sandbox to generate an agent for my VSI.
  • Copy the agent installation script generated by the back-end, connect to the remote server with “ssh root@xx.xx.x.xx”, then paste and execute the script:

curl -o && chmod 700 ./ && sudo ./ -a xxxxxxxxxxxx -d xxxxxxxxxxxx -t dynamic -e
  • Edit the agent configuration file and uncomment/modify the following sections:
nano /opt/instana/agent/etc/instana/configuration.yaml
# Hardware & Zone
com.instana.plugin.generic.hardware:
  enabled: true # disabled by default
  availability-zone: 'the-name-you-choose-to-see-your-bastion-server-on-instana-backend'
  • This is my example (as it appears once the Instana back-end has picked it up, after a few seconds).
  • Also enable the OpenTelemetry section:
# OpenTelemetry Collector
com.instana.plugin.opentelemetry:
  enabled: true
  grpc:
    enabled: true
  http:
    enabled: true
  • Then you’ll need to restart the Instana agent on your bastion server.
cd /opt/instana/agent/bin
./status   # if you want to check

Now you can again run the microservices part of the cloned repo locally (or, for example, from your K8s deployment).

Network adjustments required in case of a remote bastion agent (present case on IBM Cloud)

As the IBM Cloud platform was used to provision a VSI and deploy the agent, some network configuration is required.

To enable the Instana back-end to gather services through the bastion agent, you should set up a reverse proxy such as Nginx.

Install Nginx as reverse proxy

Log into your bastion and install Nginx:

sudo apt update
sudo apt install nginx

If you have a firewall, you should also adjust it:

sudo ufw app list

In my case, I didn’t put in place any firewall configuration.

Configuration of the reverse proxy

As discussed earlier, there is a specific configuration of the microservices app, which now must be changed in order to reflect the usage of a reverse proxy.

First, we give a different port number for the Instana agent in the .env file.

# Instana

Then we configure our reverse proxy as shown below in /etc/nginx/sites-enabled/default (the file name is ‘default’):

# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# In most cases, administrators will remove this file from sites-enabled/ and
# leave it as reference inside of sites-available where it will continue to be
# updated by the nginx packaging team.
# This file will automatically load configuration files provided by other
# applications, such as Drupal or Wordpress. These applications will be made
# available underneath a path with that package name, such as /drupal8.
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.

# Default server configuration
server {
    listen 80 default_server;
    listen [::]:80 default_server;

    # SSL configuration
    # listen 443 ssl default_server;
    # listen [::]:443 ssl default_server;
    # Note: You should disable gzip for SSL traffic.
    # See:
    # Read up on ssl_ciphers to ensure a secure configuration.
    # See:
    # Self signed certs generated by the ssl-cert package
    # Don't use them in a production server!
    # include snippets/snakeoil.conf;

    root /var/www/html;

    # Add index.php to the list if you are using PHP
    index index.html index.htm index.nginx-debian.html;

    server_name _;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }

    # pass PHP scripts to FastCGI server
    #location ~ \.php$ {
    #    include snippets/fastcgi-php.conf;
    #    # With php-fpm (or other unix sockets):
    #    fastcgi_pass unix:/run/php/php7.4-fpm.sock;
    #    # With php-cgi (or other tcp sockets):
    #    fastcgi_pass;
    #}

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #location ~ /\.ht {
    #    deny all;
    #}
}

# Instana agent: public port 42700 -> local port 42699
server {
    access_log /var/log/nginx/access_42700.log;
    error_log /var/log/nginx/error.log error;
    listen 42700;
    listen [::]:42700;
    location / {
        proxy_pass http://localhost:42699;
    }
}

# OTLP over gRPC: public port 42717 -> local port 4317
server {
    access_log /var/log/nginx/access_42717.log;
    error_log /var/log/nginx/error.log error;
    listen 42717 http2;
    location / {
        grpc_pass grpc://localhost:4317;
    }
}

# OTLP over HTTP: public port 42718 -> local port 4318
server {
    access_log /var/log/nginx/access_42718.log;
    error_log /var/log/nginx/error.log error;
    listen 42718;
    listen [::]:42718;
    location / {
        proxy_pass http://localhost:4318;
    }
}

# Virtual Host configuration for
# You can move that to a different file under sites-available/ and symlink that
# to sites-enabled/ to enable it.
#server {
#    listen 80;
#    listen [::]:80;
#    server_name;
#    root /var/www/;
#    index index.html;
#    location / {
#        try_files $uri $uri/ =404;
#    }
#}

With this configuration, the OpenTelemetry-specific ports 4317 (gRPC) and 4318 (HTTP), which only listen on localhost, also become reachable from remote machines through the reverse proxy (on ports 42717 and 42718).
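
As a quick reference, the port mapping can be written down as data. This is only a sketch: the pairing of public port 42717 with local port 4317 is my reading of the port numbers in this section, the names PROXY_PORTS and public_endpoints are hypothetical, and the host is a placeholder documentation address rather than a real bastion.

```python
# Public nginx port -> local port on the bastion.
# Assumption: 42717 fronts the gRPC port 4317 and 42718 the HTTP port 4318.
PROXY_PORTS = {
    "agent": (42700, 42699),
    "otlp_grpc": (42717, 4317),
    "otlp_http": (42718, 4318),
}


def public_endpoints(bastion_host):
    """Build the addresses a remote service would use to reach the bastion."""
    return {
        name: f"http://{bastion_host}:{public_port}"
        for name, (public_port, _local_port) in PROXY_PORTS.items()
    }


# 203.0.113.10 is a documentation-only address, used here as a placeholder.
for name, url in public_endpoints("203.0.113.10").items():
    print(f"{name}: {url}")
```

Keeping the mapping in one place makes it easier to keep the microservices “.env” file and the Nginx configuration in sync when a port changes.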

Now we restart the Nginx server to apply the changed configuration.

systemctl restart nginx
systemctl status nginx
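
Once Nginx is back up, a quick way to confirm that the proxied ports answer is a plain TCP connection test from the machine that runs the microservices. The sketch below uses only the Python standard library; the helper names are mine, the default port list reflects the listeners used in this article, and the host in the example is a placeholder. It only shows that something accepts connections on each port, not that OTLP data actually reaches the agent.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(42700, 42717, 42718)):
    """Map each proxied port to whether it accepts TCP connections."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # Placeholder host: replace with the address of your bastion server.
    for port, is_open in check_ports("127.0.0.1").items():
        print(port, "open" if is_open else "closed")
```

If a port shows as closed from the outside but open when the check is run on the bastion itself, the cause is usually a firewall or security group rather than the Nginx configuration.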

OpenTelemetry traces visible on Instana Backend

Example of OpenTelemetry traces on the Instana backend.

And there we go!!!! We have the result we wanted.

The next step is to give more user-friendly names to the services! Stay tuned and thanks for reading.

The dream-team who contributed to this project

First, Petr Styblo (Cloud-Native Solutions Architect @IBM) for pointing me to his repo and adjusting his code on the fly for my needs.

Many thanks to Mathieu Figiel (Technical Sales Specialist Observability @IBM) for his valuable help on Instana and network configuration.

And also many thanks to Badreddine Boutanzit (SRE @IBM Client Engineering) for his valuable adjustments to the reverse proxy implementation.

Last but not least, my buddy Keyvan Tofighi (APM/ARM Specialist @IBM), who involved us in his project, for his help and knowledge on the Instana back-end.


