Server Logs Explained, Part 1: Introduction and Googlebot Traffic
(Editor’s note: This post has been updated since publication.)
In this post, I’ll cover some background on the web server logs I’ll be posting, and then walk through a basic example of what traffic from Googlebot should look like.
The logs I’ll be posting will be in the default combined format for Nginx, which looks like this:
$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent"
Let’s break that down. Each section that’s prefaced by a dollar sign is a variable representing one piece of information that will be logged. Nginx collects these values and writes them out on a single line so that each request is easy to scan. If a value is not available for a request, Nginx logs a hyphen in its place. Here is what each bit of information we’ll be collecting means:
$remote_addr: The IP address of the client.
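- (This hyphen is a literal character in the format. It fills the slot that the older Common Log Format reserved for the client identity, which is almost never available.)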
$remote_user: The authenticated user, if one exists.
$time_local: The time the request was processed, based on the server’s time zone settings.
$request: The requested resource. This will include the HTTP method used.
$status: The status code that the web server returned.
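$bytes_sent: The number of bytes the server sent to the client. Note that the combined format predefined by Nginx uses $body_bytes_sent in this position, which counts only the response body and excludes the response headers.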
$http_referer: The referrer URL, or the webpage that sent the visitor to your server using a link.
$http_user_agent: Information regarding what browser or operating system was used to make the request.
Fun fact: You may have noticed that “referer” is spelled incorrectly. The misspelling comes from the HTTP specification itself, and the header has kept that name ever since.
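To make the field list above concrete, here is a minimal Python sketch that splits one log line into the same named pieces. The regular expression, the `parse_line` helper, and the sample entry are my own illustration, not something Nginx ships. The sample uses a reserved documentation IP address and a made-up user agent, and the sketch assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Regex mirroring the log format shown above; group names follow the Nginx variables.
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A hyphen means the value was not provided; map it to None.
    for key in ("remote_user", "http_referer", "http_user_agent"):
        if fields[key] == "-":
            fields[key] = None
    fields["status"] = int(fields["status"])
    fields["bytes_sent"] = int(fields["bytes_sent"])
    # $time_local looks like 18/Jun/2016:08:36:33 -0400
    fields["time_local"] = datetime.strptime(
        fields["time_local"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


# Hypothetical entry: 203.0.113.7 is a reserved documentation address.
sample = (
    '203.0.113.7 - - [18/Jun/2016:08:36:33 -0400] '
    '"GET /about-us HTTP/1.1" 200 2233 "-" "ExampleBot/1.0"'
)
entry = parse_line(sample)
print(entry["remote_addr"], entry["status"], entry["request"])
```

Returning `None` for lines that do not match keeps the helper usable on real log files, where a custom format or a truncated line would otherwise raise an error partway through.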
Now let’s break down a real log entry, taken from one of my servers. I’m going to show how each piece of information in the entry matches up to the reference above.
220.127.116.11 - - [18/Jun/2016:08:36:33 -0400] "GET /about-us HTTP/1.1" 200 2233 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
$remote_addr is 220.127.116.11. This is a real Googlebot IP address, so we can be fairly certain this is a legitimate request.
$remote_user: Googlebot was not logged into the server, so this field is logged as a hyphen.
$time_local: The request came in at 8:36:33 AM on June 18th, 2016, in the server’s time zone (UTC-4).
$request: Googlebot requested the /about-us page using a GET request.
$status: The server responded with an HTTP 200 OK status code, indicating that it was able to successfully complete the request.
$bytes_sent: The server sent back 2,233 bytes.
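$http_referer: Googlebot did not send a referrer URL, so this field is logged as a hyphen.
$http_user_agent: This user agent string tells us that the request identified itself as Googlebot and not as a typical web browser. Because any client can send any user agent it likes, this field is only meaningful alongside the IP address.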
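Since the breakdown above relies on the IP address really belonging to Googlebot, here is a rough Python sketch of one way to check that claim: a reverse DNS lookup on the address, a check of the returned hostname's domain, and a forward lookup to confirm the hostname resolves back to the same address. The `is_verified_googlebot` function and the suffix list are my own assumptions for illustration, based on my understanding of Google's published guidance, so check Google's current documentation before relying on them. The lookups are passed in as parameters so the logic can be exercised without network access.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; confirm against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname's domain, then forward-resolve it back."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses)
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # The hostname must map back to the address that made the request.
    return ip in addresses
```

Calling `is_verified_googlebot(ip)` with the `$remote_addr` value from a log entry performs real DNS queries, so in practice you would cache the results instead of looking up every request. As written, the forward lookup only covers IPv4 addresses.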
Join us next week for an explanation of some less benign requests.