As part of our mission to ensure industry-wide transparency, the Sellers.guide team has compiled lists that you can download to help you solve some of the issues below.
Ads.txt is all about transparency. The IAB states, "ads.txt is a simple, flexible, and secure method that publishers and distributors can use to publicly declare the companies they authorize to sell their digital inventory." When we started working on Sellers.guide, we realized that the phrase "publicly declare," which is supposed to be the easiest part, needed more attention.
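For reference, each record in an ads.txt file is a comma-separated line: the advertising system's domain, the publisher's account ID within that system, the relationship (DIRECT or RESELLER), and an optional certification authority ID. A sketch with hypothetical account IDs:

```
# ads.txt for publisher.com (hypothetical values)
google.com, pub-1234567890123456, DIRECT, f08c47fec0942fa0
examplessp.com, 98765, RESELLER
```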
Between 6% and 12% of publishers’ ads.txt files are unreachable by the crawlers and bots built to read them. If an ads.txt file is unreachable, a buyer (or DSP) that cannot crawl it will not buy that publisher’s inventory through indirect channels, since it has no proof the inventory is authorized to be sold non-directly – leaving the publisher at risk of losing revenue.
Here are the most common publisher-side errors that prevent ads.txt files from being crawled:
Some sites (still) do not support HTTPS. Encrypting all content is a best practice on today’s internet, and some crawlers will not accept plain-HTTP or broken-HTTPS connections – and you guessed it, the ads.txt file won’t be crawled.
This is a widespread mistake. Publishers store their ads.txt file on the "www" subdomain – i.e., https://www.publisher.com/ads.txt – instead of placing it on the root domain, as it should be: https://publisher.com/ads.txt. Crawlers request the root domain by default and won’t fetch a subdomain’s ads.txt unless they are redirected to it.
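To make the default behavior concrete, here is a minimal sketch of how a crawler derives the URL it requests first. It only strips a leading "www."; real crawlers use a public-suffix list to find the registrable domain, so treat this as illustrative:

```python
from urllib.parse import urlsplit

def canonical_adstxt_url(site: str) -> str:
    """Build the root-domain, HTTPS ads.txt URL a crawler requests by default.

    Simplified sketch: strips only a leading "www."; production crawlers
    use a public-suffix list to determine the registrable domain.
    """
    # Accept either a bare hostname or a full URL.
    host = urlsplit(site if "//" in site else "//" + site).hostname or site
    if host.startswith("www."):
        host = host[len("www."):]
    return f"https://{host}/ads.txt"
```

If the file only exists at the www URL, the server should answer the root-domain request with a redirect to it.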
Some ads.txt files, instead of being served as plain text like they should be, are embedded inside an HTML document. Crawlers won’t strip the HTML tags and won’t analyze such files.
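A quick way to catch this is a heuristic check on the response body: a valid ads.txt starts with a comment or a comma-separated record, never with markup. A minimal sketch:

```python
def looks_like_html(body: str) -> bool:
    """Heuristic: an ads.txt response wrapped in HTML starts with a
    doctype or tag instead of a comment ('#') or a record line."""
    return body.lstrip().startswith("<")
```

Checking that the response's Content-Type header is text/plain is a good complementary test.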
Hosting services tend to block unfamiliar UAs (user agents). Crawlers use their own "signature" UA, so servers can tell that a machine is behind the request. Ads.txt files should be served to any UA, so that every crawler is able to fetch them.
Some hosting services block requests coming from data centers. This is meant to protect your servers from non-human traffic. However, since ads.txt crawlers are themselves machines running in data centers, the ads.txt file should always be excluded from this block so it remains available to all IP addresses.
Because of content delivery networks (CDNs), we’ve seen ads.txt files for certain domains differ depending on the geo-location they were retrieved from. We have also seen cases where the file did not exist at all (404) from a certain geo-location.
This can result in revenue loss in two instances:
If the wrong ads.txt file is used.
If the file cannot be reached.
A publisher should make sure that its CDN distributes its latest ads.txt file for all geos.
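One way to verify this is to fetch the file from several vantage points and compare the copies. The fetching itself depends on your monitoring setup, but the comparison step can be sketched as a pure function over {geo: body} pairs:

```python
import hashlib
from collections import Counter

def inconsistent_geos(copies: dict[str, str]) -> list[str]:
    """Given {geo: ads.txt body} fetched from different locations,
    return the geos whose copy differs from the most common version."""
    digests = {geo: hashlib.sha256(body.encode()).hexdigest()
               for geo, body in copies.items()}
    if not digests:
        return []
    # Treat the most frequent digest as the intended (latest) file.
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(geo for geo, d in digests.items() if d != majority)
```

A non-empty result means some CDN edge is serving a stale or missing file and its cache should be purged.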
Typically, a media or ad ops company with hundreds of websites manages a single master ads.txt file on its backend server for all of them. When those sites’ ads.txt files are crawled, the crawler fetches the (same) ads.txt for each domain. Sometimes the crawler gets blocked by the server hosting the ads.txt, because that server receives too many calls from the same source in a short time.
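From the crawler's side, the usual mitigation is to space out requests that land on the same backend host. A minimal sketch, where `host_of` is a hypothetical mapping from a domain to the server that actually serves its ads.txt (in practice you might resolve DNS or follow redirects to build it):

```python
from collections import defaultdict

def schedule_fetches(domains, host_of, min_gap=2.0):
    """Assign each domain a start offset (in seconds) so that requests
    hitting the same backend host are spaced at least `min_gap` apart.

    `host_of` is a caller-supplied function mapping domain -> backend host.
    """
    next_free = defaultdict(float)  # host -> earliest free time slot
    schedule = {}
    for d in domains:
        host = host_of(d)
        schedule[d] = next_free[host]
        next_free[host] += min_gap
    return schedule
```

Publishers on the hosting side can address the same problem by raising rate limits for known ads.txt crawler UAs instead.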
There are many mistakes that can jeopardize a publisher’s ads.txt file. The plus side is that most of these errors are simple to correct and easy to avoid. Analyze your domain for free with Sellers.guide and check whether your file needs some help from our clean-up crew. See the publishers making these errors by downloading the lists.