Algolia Telemetry

Why is Algolia collecting telemetry data, what do we collect, and what do we do with the data?

Last update: June 1st, 2020

Why is Algolia collecting telemetry data

With infrastructure distributed around the world, we passively select the best Algolia endpoint for you using GeoIP routing as part of DNS resolution. If you connect quickly to our API, great! If you fail to connect, we never find out: you don't receive a search result, and we don't know you had a problem. That's why, in 2020, we started this search telemetry project. Its goal is to gain visibility into the performance and availability that people using our API actually experience, to learn from it, and to make that experience better.

To do that, we’ve developed a privacy-focused system for collecting performance telemetry. It runs on this domain and is fully independent of the production APIs that provide search around the world.

What do we collect

We collect only the minimum dataset we need to understand how our APIs perform and how available they are.

Coming from France and based in California, we love GDPR, CCPA, and all privacy initiatives! Algolia’s business is not based on selling data, tracking, fingerprinting, or in any way violating people’s privacy. We charge fair prices for our services and we treat people fairly. You’re not the product!

Here is an example of the payload we store:

Stored HTTP Headers:

Stored Request Meta-Data:

Stored JSON:

{
  // Domain of the website collecting telemetry
  "o": "",

  // Search successes
  "d": [{
    // The request to the Search API, with application ID and index name,
    // and without query parameters
    "r": "",

    // DNS resolution time in milliseconds
    "d": 0,

    // HTTP round-trip time in milliseconds
    "t": 113,

    // Size of the search response in KB
    "sz": 19.79,

    // When the measurement was taken
    "ts": "2020-06-17T09:26:10.778Z"
  }],

  // Search errors
  "e": [{
    // The request to the Search API, with application ID and index name,
    // and without query parameters
    "r": "",

    // The HTTP status code as received by the Algolia Search API client
    "sc": 408,

    // The error message associated with the HTTP status code
    "m": "Request Timeout Error",

    // Is it a timeout error? 1 = yes, 0 = no
    "to": 0,

    // When the measurement was taken
    "ts": "2020-06-17T09:26:10.778Z"
  }]
}
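The following is a minimal Python sketch that models the sample payload with dataclasses and serializes it to compact JSON. It makes the sample's structure explicit. The class names (`SearchSuccess`, `SearchError`, `Payload`) and the field types are assumptions inferred from the sample values. They are not Algolia's actual schema code.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SearchSuccess:
    r: str     # Search API request (application ID + index name, no query parameters)
    d: float   # DNS resolution time, in milliseconds
    t: float   # HTTP round-trip time, in milliseconds
    sz: float  # response size, in KB
    ts: str    # ISO 8601 timestamp of the measurement


@dataclass
class SearchError:
    r: str     # Search API request (application ID + index name, no query parameters)
    sc: int    # HTTP status code as seen by the API client
    m: str     # error message associated with the status code
    to: int    # 1 if the error was a timeout, 0 otherwise
    ts: str    # ISO 8601 timestamp of the measurement


@dataclass
class Payload:
    o: str  # domain of the website collecting telemetry
    d: List[SearchSuccess] = field(default_factory=list)
    e: List[SearchError] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses held in the lists.
        return json.dumps(asdict(self), separators=(",", ":"))
```

For example, `Payload(o="", d=[SearchSuccess(r="", d=0, t=113, sz=19.79, ts="2020-06-17T09:26:10.778Z")]).to_json()` produces a compact version of the "Search successes" part of the sample.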

From this payload, we can get an idea of how to optimise our network to better serve people who use the same Internet Service Provider as you, but we cannot tell who you are.

Opting out of Algolia Telemetry

You can opt out of Algolia Telemetry by clicking the button below. This will set an opt-out cookie in your browser for the domain
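This page does not say how the opt-out cookie is honoured. One plausible design, sketched here purely as an assumption, is for the telemetry collector to inspect the `Cookie` header of each incoming request and discard the measurement when the opt-out cookie is present. The cookie name `telemetry_opt_out` and both function names are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical cookie name; the real name is not given on this page.
OPT_OUT_COOKIE = "telemetry_opt_out"


def is_opted_out(cookie_header: Optional[str]) -> bool:
    """Return True if the request's Cookie header carries the opt-out cookie."""
    if not cookie_header:
        return False
    jar = SimpleCookie()
    jar.load(cookie_header)  # parses "name=value; other=value" pairs
    return OPT_OUT_COOKIE in jar


def should_store(cookie_header: Optional[str]) -> bool:
    # Drop the measurement entirely when the visitor has opted out.
    return not is_opted_out(cookie_header)
```

Checking on the server side means an opted-out measurement is never written. An alternative design is to check in the browser, so the measurement is never sent at all.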

Questions & Answers

Do you comply with GDPR and CCPA?

Absolutely! We don’t track anyone, we don’t identify anyone, we don’t profile anyone, and we don’t sell our data to anyone. We only aggregate anonymous metrics to improve our service.

What do you use this data for?

We aggregate the data and look for regions and networks that don’t have a great experience with Algolia APIs. Then we work with our server providers and customers to improve the situation. Maybe a new peering arrangement will help. Maybe we can change the DNS routing and serve the network from a different location. Maybe our customer can reduce the size of the responses, or add regions to their Algolia setup to better serve their regional user base.
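As a rough illustration of this aggregation step, the sketch below groups measurements by a network key. It then flags networks with a high error rate or a slow median round-trip time. Several details are hypothetical and not taken from this page: the record shape, the grouping key (for example a /24 prefix or an ISP name), the minimum sample count, and both thresholds. The page does not describe Algolia's actual pipeline.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record shape: (network_key, rtt_ms), where rtt_ms is None for
# a failed request. network_key could be a /24 prefix or an ISP name.
Record = Tuple[str, Optional[float]]


def summarize(records: Iterable[Record], min_samples: int = 100) -> Dict[str, dict]:
    """Group measurements by network and compute error rate and median RTT."""
    groups: Dict[str, List[Optional[float]]] = defaultdict(list)
    for key, rtt in records:
        groups[key].append(rtt)

    summary: Dict[str, dict] = {}
    for key, rtts in groups.items():
        if len(rtts) < min_samples:
            continue  # too few samples to draw conclusions about this network
        ok = [r for r in rtts if r is not None]
        summary[key] = {
            "samples": len(rtts),
            "error_rate": (len(rtts) - len(ok)) / len(rtts),
            "median_rtt_ms": median(ok) if ok else None,
        }
    return summary


def networks_to_investigate(
    summary: Dict[str, dict],
    max_error_rate: float = 0.01,  # hypothetical threshold
    max_median_ms: float = 300.0,  # hypothetical threshold
) -> List[str]:
    """Return network keys whose error rate or median RTT exceeds a threshold."""
    return sorted(
        key
        for key, s in summary.items()
        if s["error_rate"] > max_error_rate
        or (s["median_rtt_ms"] is not None and s["median_rtt_ms"] > max_median_ms)
    )
```

Requiring a minimum number of samples per network keeps the analysis focused on groups of users rather than on any single visitor. It also avoids flagging networks on the basis of a handful of noisy measurements.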

Who are you sharing the data with?

No one. We run our own system on GCP with Cloudflare and use it only internally, to optimise our infrastructure. We don’t use a third party to collect the data, and we don’t share it with anyone.

Who has access to this data?

Algolia monitoring and SRE teams.

Do you track me? Can you say if I visited the same website before? Can you monitor my move across websites?

No, we can’t. We don’t identify you in any way, because we don’t want to and we don’t need to.

Why don’t you store the full IP address?

Because we don’t need it. The most specific IPv4 prefix generally accepted in global BGP routing today is a /24 (256 addresses), so we have no need to go any finer than that.
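Here is a minimal sketch of keeping only the /24 prefix. It assumes the truncation happens in Python with the standard `ipaddress` module before anything is written to storage. The function name is hypothetical. IPv6 handling, which this page does not describe, is deliberately left out.

```python
import ipaddress


def truncate_to_24(addr: str) -> str:
    """Return the /24 network containing an IPv4 address, e.g. '203.0.113.0/24'.

    Raises ipaddress.AddressValueError (a ValueError) for non-IPv4 input.
    """
    ip = ipaddress.IPv4Address(addr)
    # strict=False accepts a host address and masks off the last 8 host bits.
    return str(ipaddress.IPv4Network(f"{ip}/24", strict=False))
```

For instance, `truncate_to_24("203.0.113.57")` returns `"203.0.113.0/24"`. All 256 addresses in that block map to the same stored value.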

Why do you store the browser and its version?

We extract this information to identify problems introduced or fixed by new browser versions. Some of our customers require TLSv1.0 or TLSv1.1, but some new browsers no longer support those versions. Some browser versions have bugs that hurt the availability of our service. Some browsers have outdated certificate stores and fail to connect to our API.
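This page does not say how the browser and its version are extracted. As an illustration only, here is a deliberately simplified sketch that pulls a browser family and major version out of a `User-Agent` string with regular expressions. The patterns and function name are hypothetical. Real User-Agent strings have many more variants (for example Chrome and Firefox on iOS), so production code would normally rely on a maintained User-Agent parser instead.

```python
import re
from typing import Optional, Tuple

# Order matters: Edge and Opera User-Agents also contain "Chrome/", and Chrome
# User-Agents also contain "Safari/", so the more specific tokens come first.
_PATTERNS = [
    ("Edge", re.compile(r"Edg(?:e|A|iOS)?/(\d+)")),
    ("Opera", re.compile(r"OPR/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def browser_major_version(user_agent: str) -> Optional[Tuple[str, int]]:
    """Return (browser family, major version), or None if unrecognized."""
    for name, pattern in _PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return name, int(match.group(1))
    return None
```

Keeping only the family and major version is usually enough to correlate failures with a browser release. It also avoids storing the full User-Agent string.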

Why don’t you set a cookie, fingerprint each browser and correlate the results?

Because we don’t need to. We focus on large-scale optimizations for significant groups of users, not on individuals.

Do you store my search queries and the results?

No, we don’t, because we’re not trying to optimize the network for specific queries or specific results. We care about how big the result is: a 1 kB result behaves differently from a 50 kB result, and that is an interesting performance metric, but what is in the result doesn’t matter to us.

I have an idea how you can do better! How can I contact you?

That’s great! We always want to do better! Send us an email to