Bot traffic: What it is and why you should care about it

Yoast.com, June 29, 2022, Edwin Toonen


Bots have become an integral part of the digital space of today. They help us order groceries, play music on our Slack channel, and pay our colleagues back for the delicious smoothies they bought us. Bots also populate the internet to carry out the functions they’re designed for. But what does this all mean for website owners? And perhaps more importantly, what does this mean for the environment? Read on to find out what you need to know about bot traffic and why you should care about it!

What is bot traffic?

To begin, a bot is a software application created to perform automated tasks over the internet. Bots can imitate or replace the behavior of a real user. They’re very good at executing repetitive and mundane tasks. They’re also swift and efficient, which makes them a perfect choice if you need to do something on an enormous scale.

Bot traffic refers to non-human traffic to a website or app. If you own a website, you’ve likely been visited by a bot. Bot traffic accounts for more than 40% of the total internet traffic in 2022. We’ve seen this number rising in recent years, and we will continue to see this trend in the foreseeable future.

Bot traffic sometimes gets a bad name, and in many cases, the bots behind it are indeed bad. But there are good and legitimate bots too. It depends on the purpose of those bots. Some bots are essential for operating digital services like search engines or personal assistants. Others try to brute-force their way into your website and steal sensitive information. So which bot activities are ‘good’ and which are ‘bad’? Let’s go a bit deeper into these two kinds of bots.

The ‘good’ bots

The ‘good’ bots carry out specific functions that do not cause harm to your website or server. They announce themselves and let you know what they do on your website.

The most popular bots of this type are probably search engine crawlers. Without crawlers visiting your website to discover content, search engines would have no way to serve you information when you search for something. When we talk about ‘good’ bot traffic, we’re talking about these bots. It’s perfectly normal for a site to have a small percentage of traffic coming from ‘good’ bots. Other than search engine crawlers, some other good internet bots include:

  • SEO crawlers: If you’re in the SEO space, you’ve probably used tools like Semrush or Ahrefs to do keyword research or gain insight into competitors. For those tools to serve you information, they also need to send out bots to crawl the web to gather data.
  • Commercial bots: Commercial companies send these bots to crawl the web to gather information. For instance, research companies use them to monitor news on the market; ad networks need them to monitor and optimize display ads; ‘coupon’ websites gather discount codes and sales programs to serve users on their websites.
  • Site-monitoring bots: They help you monitor your website uptime and other website metrics. They periodically check and report data such as your server status and uptime duration so you can take action when something’s wrong with your site.
  • Feed/aggregator bots: They collect and combine newsworthy content to deliver to your website visitors or email subscribers.

The ‘bad’ bots

The ‘bad’ bots are created with malicious intentions in mind. You are probably familiar with spam bots that spam your website with nonsense comments, irrelevant backlinks, and atrocious advertisements. You’ve probably also heard of bots that take people’s spots in online raffles or those that buy out the good seats at concerts.

Because of these malicious bots, bot traffic gets a bad name. Unfortunately, a significant amount of bot traffic comes from such ‘bad’ bots. It is estimated that bad bot traffic will account for 27.7% of internet traffic in 2022. Here are some of the bots that you don’t want on your site:

  • Email scrapers: They harvest email addresses and send malicious emails to those contacts.
  • Comment spam bots: They spam your website with comments and links that redirect people to a malicious website. In many cases, they also spam your website to advertise or to try to get backlinks to their sites.
  • Scraper bots: These bots come to your website and download everything they can find. That can include your text, images, HTML files, and even videos. Bot operators will then reuse your content without permission.
  • Bots for credential stuffing or brute force attacks: These bots will try to gain access to your website to steal sensitive information. They do that by trying to log in like a real user.
  • Botnet, zombie computers: They are networks of infected devices used to perform DDoS attacks. DDoS stands for distributed denial-of-service. During a DDoS attack, the attacker uses such a network of devices to flood a website with bot traffic. This overwhelms your web server with requests, resulting in a slow or unusable website.
  • Inventory and ticket bots: They go to websites to buy up tickets for entertainment events or to bulk purchase newly-released products. Brokers use them to resell tickets or products at a higher price to make profits.

Why you should care about bot traffic

Now that you’ve got some knowledge about bot traffic, let’s talk about why you should care about it.

For your website security and performance

We’ve discussed several types of bad bots and their functions. You do not want malicious bots lurking around your website. They will undoubtedly wreak havoc on your website performance and security.

Malicious bots disguise themselves as regular human traffic, so they might not be visible when you check your website traffic statistics. That can hurt your business decisions because you don’t have the correct data. You might see random spikes in traffic but don’t understand why. Or you might be confused as to why you receive traffic but no conversion.

Beyond that, malicious bot traffic strains your web server and might sometimes overload it. These bots take up your server bandwidth with their requests, making your website slow or, in the case of a DDoS attack, utterly inaccessible. In the meantime, you might lose traffic and sales to competitors.

And malicious bots are bad for your site’s security. They will try to brute force their way into your website using various username/password combinations or seek out weak entry points and report to their operators. If you have security vulnerabilities, these malicious players might even attempt to install viruses on your website and spread those to your users. And if you own an online store, you will have to manage sensitive information like credit card details that hackers would love to steal.

For the environment

Let’s come back to the question at the beginning of the post. You need to care about bot traffic because it affects the environment more than you might think.

When a bot visits your site, it makes an HTTP request to your server asking for information. Your server needs to respond to this request and return the necessary information. Whenever this happens, your server must spend a small amount of energy to complete the request. But if you consider all the bots on the internet, then the amount of energy spent on bot traffic is enormous.

In this sense, it doesn’t matter if a good or bad bot visits your site because the process is still the same. They both use energy to perform their tasks, and they both have consequences on the environment. Even though search engines are an essential part of the internet, they are guilty of being wasteful too.

You know the basics by now: search engines send crawlers to your site to discover new content and refresh old content. But they can visit your site too many times and still not pick up the right changes. We recommend checking your server log to see how many times crawlers and bots visit your site. The crawl stats report in Google Search Console also tells you how many times Google crawls your site. You might be surprised by some of the numbers there.
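As a rough illustration of the server log check, here is a minimal Python sketch that counts requests and unique URLs per bot. It assumes your server writes the common "combined" access log format, where the user agent is the last quoted field. The `BOT_MARKERS` list and the function name are hypothetical and only serve as an example. Keep in mind that a user agent can be spoofed, so these counts are an indication rather than proof of who visited.

```python
import re
from collections import Counter

# Hypothetical substrings to look for in the User-Agent field; extend as needed.
BOT_MARKERS = ("Googlebot", "bingbot", "AhrefsBot", "SemrushBot")

# Captures the request path and the final quoted field (the user agent)
# of a line in the "combined" log format. This format is an assumption.
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*".*"(?P<agent>[^"]*)"$'
)


def count_bot_hits(lines):
    """Return (requests per bot, number of unique paths per bot)."""
    hits = Counter()
    paths = {}
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match["agent"].lower()
        for marker in BOT_MARKERS:
            if marker.lower() in agent:
                hits[marker] += 1
                paths.setdefault(marker, set()).add(match["path"])
                break
    return hits, {bot: len(seen) for bot, seen in paths.items()}
```

You could pass an open log file straight to `count_bot_hits`, since it only iterates over lines. Comparing the two results per bot shows how often the same URLs are being requested again.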

A small case study from Yoast

Let’s take Yoast, for instance. On a given day, Google crawlers can visit our website 10,000 times. It might seem reasonable to visit us a lot, but they only crawl 4,500 unique URLs. That means energy was used on crawling the same URLs over and over. Even though we regularly publish and update our website content, we probably don’t need all those crawls. These crawls aren’t just for pages; crawlers also go through our images, CSS, JavaScript, etc.

But that’s not all. Google bots are not the only ones visiting us. There are bots from other search engines, digital services, and even bad bots. Such unnecessary bot traffic strains our website server and wastes energy that could otherwise be used for other valuable activities.

Statistics on the crawl behavior of Google crawlers on Yoast.com in a day. In this example, Googlebot crawled Yoast 9,537 times and 4,458 links were crawled.
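To put the reported numbers in perspective, the share of repeat crawls can be worked out with a few lines of Python. This is only a sketch: it assumes that every crawled link was requested at least once and that all remaining requests were repeats, which the source does not state explicitly. The function name is hypothetical.

```python
def repeat_crawl_share(total_crawls: int, unique_urls: int) -> float:
    """Fraction of crawl requests that hit a URL already crawled that day."""
    if total_crawls <= 0:
        raise ValueError("total_crawls must be positive")
    return (total_crawls - unique_urls) / total_crawls


# Figures reported for Yoast.com: 9,537 crawls covering 4,458 links.
share = repeat_crawl_share(total_crawls=9537, unique_urls=4458)
print(f"{share:.1%} of crawls were repeats")  # prints: 53.3% of crawls were repeats
```

Under that assumption, a little over half of the day's crawl requests went to URLs that had already been crawled.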

What to do against ‘bad’ bots

You can try to detect bad bots and block them from entering your site. That will save you a lot of bandwidth and reduce strain on your server, which in turn helps save energy.

The most basic way to do this is to block an individual IP address or an entire range of IP addresses. If you identify irregular traffic from a source, you should block that IP address. This approach works, but it’s labor-intensive and time-consuming. Alternatively, you can use a bot management solution from providers like Cloudflare. These companies have an extensive database of good and bad bots. They also use AI and machine learning to detect malicious bots and block them before they can cause harm to your site.
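The manual approach can be sketched in Python with the standard `ipaddress` module. The blocklist below is hypothetical and uses address ranges reserved for documentation. In practice the check would usually live in your firewall or web server configuration rather than in application code.

```python
import ipaddress

# Hypothetical blocklist built from reserved documentation ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # an entire range
    ipaddress.ip_network("198.51.100.42/32"),  # a single address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

For example, `is_blocked("203.0.113.7")` is true because the address lies inside the blocked `/24` range, while `is_blocked("198.51.100.43")` is false because only the single neighboring address is listed.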

You should install a security plugin if you’re running a WordPress website. Some of the more popular security plugins (like Sucuri Security or Wordfence) are maintained by companies that employ security researchers who monitor and patch issues. Some security plugins automatically block specific ‘bad’ bots for you. Others let you see where unusual traffic comes from and decide how to deal with that traffic.

What about the ‘good’ bots?

As we mentioned earlier, the ‘good’ bots are good because they are essential and transparent in what they do. But they can consume a lot of energy while performing their tasks, which impacts the environment. On top of that, these good bots might not even be helpful for you. Even though what they do can be considered ‘good,’ they might still bring disadvantages to your website and, ultimately, to the environment. So what can you do about the good bots?

1. Block them if they are not useful

You need to think and decide whether or not you want these ‘good’ bots to crawl your site. Does their crawling benefit you? And, more importantly, does the benefit outweigh the cost to your servers, their servers, and the environment?

Let’s take search engine bots, for instance. You know that Google is not the only search engine out there. It’s most likely that crawlers from other search engines have visited you. Let’s say you check your server log and see that a search engine has crawled your site 500 times today, but it only brings you ten visitors. If that’s the case, would it be useful to let bots from that search engine crawl your site? Or should you block them because you don’t get much value from this search engine?
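One way to make that decision less of a gut feeling is to compare crawls with referred visitors. The Python sketch below uses the numbers from the example above. The threshold, the function names, and the `ExampleBot` user agent are all hypothetical, and the right cut-off depends on your own site. Note too that a robots.txt rule is a request, and only well-behaved bots honor it.

```python
def crawls_per_visitor(crawls: int, visitors: int) -> float:
    """Crawl requests a search engine makes for each visitor it refers."""
    return float("inf") if visitors == 0 else crawls / visitors


def robots_rule_to_block(user_agent: str) -> str:
    """robots.txt group asking one crawler to stay away from the whole site."""
    return f"User-agent: {user_agent}\nDisallow: /\n"


MAX_CRAWLS_PER_VISITOR = 20  # hypothetical threshold, tune for your own site

ratio = crawls_per_visitor(crawls=500, visitors=10)  # 50.0
if ratio > MAX_CRAWLS_PER_VISITOR:
    print(robots_rule_to_block("ExampleBot"))
```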

2. Limit the bot’s crawl rate

If they support the crawl-delay directive in robots.txt, you should try to limit their crawl rate so they don’t come back every 20 seconds and crawl the same links over and over. This is very useful for medium to large websites that crawlers often visit. But small websites also benefit from using crawl delays. Most likely, you don’t update your website content 100 times a day, even if you run a larger website. And if you have copyright bots visiting your site to check for copyright infringement, do they need to come every few hours?

You could experiment with the crawl rate and monitor its effect on your website. You can also assign a specific crawl delay to crawlers from different sources. Start with a slight delay and increase it when you’re sure it doesn’t have negative consequences. Unfortunately, Google doesn’t support crawl-delay, so there is no point in setting it for Google’s bots.
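Here is a minimal sketch of what per-crawler delays could look like, checked with Python's standard `urllib.robotparser`. The `ExampleBot` name and the delay values are hypothetical. The value is commonly read as a number of seconds between requests, but each crawler decides for itself how to interpret it, and whether to honor it at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a longer delay for one crawler, a short default.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10

User-agent: *
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that respects the directive would read its own value here.
print(parser.crawl_delay("ExampleBot"))    # delay for the named crawler
print(parser.crawl_delay("SomeOtherBot"))  # falls back to the "*" group
```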

3. Help them crawl more efficiently

You can decide which parts of your site you don’t want bots to crawl and block their access via robots.txt. This not only saves energy but also helps to optimize your crawl budget.

There are a lot of places on your website where crawlers have no business coming. That can be your internal search results, for instance. Nobody wants to see those on public search engines. Or, if you have a staging website, you probably don’t want people to find it.
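As an illustration, the sketch below keeps crawlers out of two sections and uses Python's standard `urllib.robotparser` to check the effect. The `/search/` and `/staging/` paths, the `example.com` domain, and the `ExampleBot` name are assumptions made for the example. Your internal search and staging environment may live at different URLs, and a staging site on its own host needs its own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of two sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Would a crawler that follows these rules be allowed to fetch this URL?"""
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/search/results"))     # blocked section
print(allowed("https://example.com/blog/bot-traffic/"))  # regular content
```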

Next, you can help bots crawl your site better by removing unnecessary links that your CMS and plugins automatically create. For instance, WordPress automatically creates an RSS feed for your website comments. Of course, this RSS feed has a link. But hardly anybody looks at it anyway, especially if you don’t have a lot of comments. Hence, the existence of this RSS feed might not bring you any value. It just creates another link for crawlers to crawl repeatedly, wasting energy in the process.

Optimize your website crawl with Yoast SEO

We’ve recently launched a feature in Yoast SEO Premium that lets you optimize your website to make it easier for crawlers to crawl your site. Within the crawl settings in Yoast SEO Premium, you’ll find many toggles that let you turn off various things WordPress automatically adds to your site that most sites won’t miss.

At the moment, there are 20 toggles available in the crawl settings. We’ve added a lot more options since the feature was first released in Yoast SEO Premium 18.6. Note that this feature is currently in beta. We will be working hard to improve this feature and add more settings to help you optimize your site’s crawlability. Check out this page to learn more about our crawl feature!

The post Bot traffic: What it is and why you should care about it appeared first on Yoast.

Source: Yoast.com