Log File Analysis Introduction
Not many in the SEO industry are aware of the enormous benefits of Log File Analysis. I spoke to some friends in my agency, and quite a few were taken aback by the term itself. To some it may sound a bit difficult, but in reality it’s easy to learn.
Many SEO specialists working in agencies feel that Log File Analysis isn’t required for small clients and is only beneficial for large-scale enterprise SEO clients.
Now, this is not true, my friends!
So What are Log Files?
Here’s a two-sentence explainer from Wikipedia: “A server log is a log file (or several files) automatically created and maintained by a server consisting of a list of activities it performed. A typical example is a web server log which maintains a history of page requests.”
To understand what a log file is, we should first understand what “requests” are. HTTP was developed in 1991 to fetch documents and send them to a client; in other words, it allows web-based applications to communicate and exchange data. A request is the message a client, such as a browser or a crawler, sends to a server to ask for a page or file. To put it briefly, this is how the parties involved in the World Wide Web communicate.
Simply put, log files contain details of who and what is making requests to our website. Details such as the date, time, IP address and so on are recorded in the file, giving us an exact record of how Googlebot and other crawlers are crawling our site.
Here’s what a line in a standard log file looks like. Every site stores the information differently, so the line below is just an example of how it usually looks.
22.214.171.124 [01/Jan/2017:09:00:00 +0000] "GET /contact.html HTTP/1.1" 200 250 500000 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
So let’s break the above line down into parts to understand the information provided on each line of the log file. The components, with their values from the example, are:
- Server IP: 22.214.171.124
- Date and time the request was received by the server: 01/Jan/2017:09:00:00 +0000
- Method of the request: GET
- Requested URL: /contact.html
- Response code of the page: 200
- Size of the response in bytes: 250
- Time taken to serve the request, in microseconds: 500000 (half a second)
- User-Agent: Googlebot
- Client IP (not included above, as only some log files record it)
- Referrer (Google does not provide this)
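If you’d like to see the breakdown above in code, here is a minimal Python sketch that splits the example line into its parts. It assumes the exact field order of the example and straight double quotes, as real log files use; the pattern and the field names (`server_ip`, `time_taken_us` and so on) are my own labels, not a standard, and your server’s format may well differ.

```python
import re
from datetime import datetime

# Pattern for the example layout only: server IP, [timestamp], "request",
# status code, response size, time taken in microseconds, "user agent".
LOG_PATTERN = re.compile(
    r'(?P<server_ip>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
    r'(?P<size>\d+) '
    r'(?P<time_taken_us>\d+) '
    r'"(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert the text fields that are really dates or numbers.
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    record["time_taken_us"] = int(record["time_taken_us"])
    return record


example = (
    '22.214.171.124 [01/Jan/2017:09:00:00 +0000] '
    '"GET /contact.html HTTP/1.1" 200 250 500000 '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
record = parse_line(example)
print(record["url"], record["status"], record["time_taken_us"])
```

Lines that do not fit the pattern come back as `None`, so a differently formatted file fails visibly instead of producing wrong numbers.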
The above format is a standard one that I’ve come across while doing Log File Analysis for a few of my clients. A few popular formats are W3C, Apache, and Elastic Load Balancing. We, as SEOs, do not need to learn the formats in depth, but feel free to do so if need be. Knowing too much never hurts.
These files are usually available in the hosting cPanel; if you have any difficulty retrieving them, you can always ask your developer to get them for you. Once you have the files, the next step is to analyze them for insights. What kind of insights, you ask? The benefits of Log File Analysis are plentiful, and I feel SEOs should do an LFA as part of a Technical SEO Audit for their clients.
Tools required for Log File Analysis
There are several tools available in the market, but from what I have researched, most of them are a bit expensive. The one that is truly “paisa vasool” (value for money) is Screaming Frog’s Log File Analyzer. I’ve been using the tool for the last year, ever since I was introduced to Log File Analysis.
I’m planning to write a three-article series on Log File Analysis; by next week, I should have published an article on setting up the tool and the insights any SEO can get by performing an LFA. So am I not going to tell you what insights we can get from Log File Analysis? Are you kidding me! That’s the best part, and leaving out the benefits would make this article incomplete. Worse, I’d fail to show my fellow SEO brethren the endless opportunities of an LFA.
As mentioned before, the log files that contain all the details about the requests are imported into a tool like Log File Analyzer, which turns them into information we can analyze to make well-informed decisions. In SEO, we all know that the fixes we recommend need to be backed by data. Critical technical SEO issues can be spotted easily, and I feel log files are the most accurate form of raw data available to us.
Insights from Log File Analysis
The most important benefit is seeing where Googlebot spends most of its crawling, in other words, how our crawl budget is used. For those who don’t know, crawl budget is, in simple terms, the crawl limit Googlebot allots to a site. The more prominent and popular the website, the higher the budget allocated.
Now, this is a term that has been coined by industry experts. The reason something like crawl budget exists is scale: there were around 5 billion pages indexed on Google as of mid-2019, according to WorldWideWebSize.com, so Googlebot has to prioritize. Googlebot may have the resources to crawl a site frequently, but why would it crawl a site that has a poor health score?
You can identify crawl issues over a period of time, which is something a one-off site crawl does not offer.
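To get a feel for watching crawl activity over time, here is a small Python sketch that counts Googlebot requests per day. The sample data is made up, and I’m assuming each request has already been reduced to a (day, user agent) pair; matching on the word “Googlebot” is only a rough filter.

```python
from collections import Counter
from datetime import date

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
OTHER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Made-up sample data: (day of the request, user agent string).
sample_requests = [
    (date(2017, 1, 1), GOOGLEBOT_UA),
    (date(2017, 1, 1), GOOGLEBOT_UA),
    (date(2017, 1, 1), OTHER_UA),
    (date(2017, 1, 2), GOOGLEBOT_UA),
    (date(2017, 1, 3), GOOGLEBOT_UA),
    (date(2017, 1, 3), GOOGLEBOT_UA),
    (date(2017, 1, 3), GOOGLEBOT_UA),
]


def googlebot_hits_per_day(requests):
    """Count requests per day whose user agent mentions Googlebot."""
    # A user agent string can be spoofed, so treat this as a rough filter.
    counts = Counter(day for day, agent in requests if "Googlebot" in agent)
    return sorted(counts.items())


daily_hits = googlebot_hits_per_day(sample_requests)
for day, hits in daily_hits:
    print(day.isoformat(), hits)
```

A sudden drop or spike in these daily numbers is the kind of trend a single site crawl cannot show.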
Crawled Pages & Files
As mentioned above, we can see which pages search engine bots crawl frequently. Pages that are deemed low quality can be deindexed, or modified to direct users to the essential pages via internal links.
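As a rough illustration, this Python sketch ranks URLs by how often Googlebot requested them. The records are invented, and the dictionary keys `url` and `user_agent` are my own naming for already-parsed log lines.

```python
from collections import Counter

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Invented sample records, one per request.
sample_records = [
    {"url": "/", "user_agent": GOOGLEBOT_UA},
    {"url": "/", "user_agent": BROWSER_UA},
    {"url": "/tag/old-page", "user_agent": GOOGLEBOT_UA},
    {"url": "/tag/old-page", "user_agent": GOOGLEBOT_UA},
    {"url": "/tag/old-page", "user_agent": GOOGLEBOT_UA},
    {"url": "/contact.html", "user_agent": GOOGLEBOT_UA},
    {"url": "/contact.html", "user_agent": GOOGLEBOT_UA},
]


def most_crawled(records, bot_token="Googlebot", top_n=2):
    """Return the top_n URLs requested most often by the given bot."""
    hits = Counter(
        r["url"] for r in records if bot_token in r["user_agent"]
    )
    return hits.most_common(top_n)


top_pages = most_crawled(sample_records)
print(top_pages)
```

In this made-up data a low-value tag page gets more of Googlebot’s attention than the home page, which is exactly the sort of thing worth investigating.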
Rectifying 404 pages
While analyzing, you can check the list of 404 pages on your website that Googlebot is requesting. The more requests a 404 page receives, the higher it should be on the priority list to fix.
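Here is a hedged Python sketch of that prioritization: it lists the URLs that returned a 404 to Googlebot, most requested first. The records and their keys (`url`, `status`, `user_agent`) are hypothetical stand-ins for parsed log lines.

```python
from collections import Counter

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Hypothetical sample records, one per request.
sample_records = [
    {"url": "/old-offer", "status": 404, "user_agent": GOOGLEBOT_UA},
    {"url": "/old-offer", "status": 404, "user_agent": GOOGLEBOT_UA},
    {"url": "/old-offer", "status": 404, "user_agent": GOOGLEBOT_UA},
    {"url": "/missing.png", "status": 404, "user_agent": GOOGLEBOT_UA},
    {"url": "/typo", "status": 404, "user_agent": BROWSER_UA},
    {"url": "/contact.html", "status": 200, "user_agent": GOOGLEBOT_UA},
]


def googlebot_404s(records):
    """Return (url, hits) pairs for 404 responses served to Googlebot."""
    misses = Counter(
        r["url"]
        for r in records
        if r["status"] == 404 and "Googlebot" in r["user_agent"]
    )
    # most_common() with no argument sorts every URL by hits, highest first.
    return misses.most_common()


priority_list = googlebot_404s(sample_records)
print(priority_list)
```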
Page Load Times
We can find the pages with high load times, images that consume a lot of resources, and so on.
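A quick Python sketch of the idea, assuming the log records the time taken to serve each request in microseconds, as in the example line earlier. Keep in mind this is server response time, used here as a stand-in for load time; the URLs and numbers are made up.

```python
from collections import defaultdict

# Made-up sample data: (requested URL, time taken in microseconds).
sample_timings = [
    ("/slow-report.html", 2_000_000),
    ("/slow-report.html", 1_000_000),
    ("/contact.html", 500_000),
    ("/images/banner.jpg", 250_000),
]


def slowest_urls(timings, top_n=2):
    """Return the top_n URLs by average time taken, in seconds."""
    totals = defaultdict(lambda: [0, 0])  # url -> [total microseconds, requests]
    for url, time_us in timings:
        totals[url][0] += time_us
        totals[url][1] += 1
    averages = {
        url: total / count / 1_000_000
        for url, (total, count) in totals.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top_n]


slowest = slowest_urls(sample_timings)
print(slowest)
```

Averaging per URL keeps one unusually slow request from pushing an otherwise fast page to the top of the list.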
Pages receiving the highest traffic
You can quickly see which pages receive the most traffic. Are they the right ones? Are a few pages that you feel are essential not receiving their fair share of traffic?
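To check that second question, a minimal Python sketch can compare a hand-picked list of essential pages against the requests seen in the log. Both the list of essential pages and the requested URLs below are hypothetical.

```python
from collections import Counter

# Hypothetical list of URLs taken from the log, one entry per request.
requested_urls = [
    "/", "/", "/", "/blog/", "/blog/", "/tag/old-page",
    "/tag/old-page", "/tag/old-page", "/tag/old-page", "/services.html",
]

# Hypothetical pages the site owner considers essential.
essential_pages = ["/", "/services.html", "/pricing.html"]


def requests_for_pages(urls, pages):
    """Return (page, hits) pairs for the given pages, least requested first."""
    counts = Counter(urls)
    # A Counter returns 0 for pages that never appear in the log.
    return sorted(((page, counts[page]) for page in pages), key=lambda item: item[1])


essential_traffic = requests_for_pages(requested_urls, essential_pages)
print(essential_traffic)
```

Pages at the top of this list, especially those with zero requests, are the ones not getting their fair share.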
To sum up, Log File Analysis will give you tons of insight into how Googlebot is interacting with our site, in other words, how the site looks from Google’s point of view. LFA may sound a bit difficult, but in reality it’s not. With a tool like Screaming Frog’s Log File Analyzer and a few hours spent analyzing the data, I can guarantee that you will discover quite a few crucial issues that you wouldn’t have spotted otherwise.