Month: June 2016

Posted by Arnon Erba in Server Logs Explained.

(Editor’s note: The IP addresses in this post have been replaced with reserved IP addresses for documentation purposes. This post has been updated since publication to include more information about the w00tw00t scan.)

What Is w00tw00t?

If you read last week’s post, you’ll remember that I promised to post a more interesting log excerpt this week. This one is from a pretty common bot scan that you’ll see if you’re running a web server for any length of time, and while it looks scary at first, you likely don’t need to worry if your server is configured properly.

192.0.2.1 - - [21/Jun/2016:06:35:55 -0400] "GET /w00tw00t.at.ISC.SANS.DFind:) HTTP/1.1" 400 0 "-" "ZmEu"

In this log excerpt, we see that an IP address that maps to the Netherlands made a GET request for /w00tw00t.at.ISC.SANS.DFind:), a nonexistent resource. However, the server returned a 400 Bad Request error rather than a 404 Not Found.

What This Means for You

Because I didn’t grab the accompanying error log entry that explains why Nginx returned a 400 error, I’m going to skip right to the explanation (spoiler alert). The w00tw00t entries are created by the ZmEu or DFind vulnerability scanners as part of an attempt at banner grabbing. Banner grabbing is an enumeration technique, and in this case the scanner was searching for information about my server that could reveal possible exploits. The process goes something like this: a bot, possibly an infected computer or a proxy server, sends an HTTP GET request with a bogus URI in the hope that the targeted server will respond with some information about its configuration. In my case, Nginx determined that the HTTP request was malformed in some way, so it rejected it with a 400 Bad Request status code. Most likely, the request was missing the Host header, in the hope that my server would fill it in or provide some other information.
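To make the banner-grabbing idea concrete, here’s a rough sketch in Python of what a probe like this boils down to. This is for illustration only (the real ZmEu/DFind scanners are not open source, and the path is just the well-known w00tw00t signature): the request deliberately omits the Host header that HTTP/1.1 requires, which is exactly the kind of malformed request Nginx answers with 400 Bad Request.

```python
import socket

def build_probe(path: str = "/w00tw00t.at.ISC.SANS.DFind:)") -> bytes:
    # An HTTP/1.1 request with no Host header, even though the spec
    # requires one; Nginx rejects such requests with 400 Bad Request.
    return f"GET {path} HTTP/1.1\r\n\r\n".encode("ascii")

def grab_banner(host: str, port: int = 80) -> str:
    # Send the malformed request and return whatever the server replies,
    # in the hope that the error page or Server header leaks version info.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_probe())
        return sock.recv(4096).decode("ascii", errors="replace")
```

A well-configured server gives this probe nothing useful beyond a terse 400 response.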

The bottom line is that if you’re running a web server, you’re going to come across these requests in your server logs at some point. The Internet is frequently scanned by script kiddies looking for various vulnerabilities, but as long as your server returns a 400 error for any w00tw00t requests, you shouldn’t have to worry. There are a few other variants of this scan as well, including one that requests /w00tw00t.at.blackhats.romanian.anti-sec:) instead.
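If the log noise bothers you, one option is to have Nginx drop these connections outright. This is just a sketch (the regex and the choice of status code are mine, not anything the scanner requires); 444 is an Nginx-specific code that closes the connection without sending any response at all:

```nginx
# Close the connection without replying to w00tw00t probes.
# 444 is Nginx-specific: no response is sent to the client.
location ~* ^/w00tw00t\. {
    return 444;
}
```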


Posted by Arnon Erba in Server Logs Explained.

In this post, I’m going to cover some background information about the web server logs I’ll be posting, and then walk through a basic example of what traffic from Googlebot should look like.

Log Format

The logs I’ll be posting will be in a combined-style format for Nginx (Nginx’s predefined combined format uses $body_bytes_sent instead of $bytes_sent, but the layout is otherwise identical), which looks like this:

$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent"

Let’s break that down. Each token prefixed by a dollar sign is one piece of information that will be logged. Nginx collects these values and concatenates them into a single line so that the log is easy to scan. If a particular value is unavailable for a request, Nginx substitutes a hyphen. Here is what each piece of information we’ll be collecting means:

$remote_addr: The IP address of the client.
-: A literal hyphen. This is a leftover from the Common Log Format’s unused identity field, and Nginx always logs it as a separator.
$remote_user: The authenticated user, if one exists.
$time_local: The time the request was processed, based on the server’s time zone settings.
$request: The requested resource. This will include the HTTP method used.
$status: The status code that the web server returned.
$bytes_sent: The number of bytes the server sent to the client.
$http_referer: The referrer URL, or the webpage that sent the visitor to your server using a link.
$http_user_agent: Information regarding what browser or operating system was used to make the request.

Fun fact: You may have noticed that “referer” is spelled incorrectly. The misspelling comes from the HTTP specification itself, and it has been kept ever since for compatibility.
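To make the format concrete, here’s a minimal sketch (not production code) of parsing a combined-format line in Python with a regular expression. The group names mirror the Nginx variables above, and the sample line uses a reserved documentation IP and a made-up request:

```python
import re

# Regex mirroring the combined-style Nginx log format described above.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

def parse_line(line: str) -> dict:
    """Return the log fields as a dict, or raise ValueError."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unparseable log line: {line!r}")
    return match.groupdict()

# Sample line (documentation IP, hypothetical request).
sample = ('192.0.2.1 - - [18/Jun/2016:08:36:33 -0400] '
          '"GET /about-us HTTP/1.1" 200 2233 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1)"')
fields = parse_line(sample)
```

Note that missing values still match, since Nginx logs them as a hyphen rather than leaving the field empty.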

Googlebot Traffic

Now let’s break down a real log entry, taken from one of my servers. I’m going to show how each piece of information in the log matches up to the reference above.

192.0.2.1 - - [18/Jun/2016:08:36:33 -0400] "GET /about-us HTTP/1.1" 200 2233 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

$remote_addr: This is a real Googlebot IP address, so we can be fairly certain this is a legitimate request.
$remote_user: Googlebot was not logged into the server, so this field is blank.
$time_local: The request came in at 8:36 AM on June 18th, 2016.
$request: Googlebot requested the /about-us page using a GET request.
$status: The server responded with an HTTP 200 OK status code, indicating that it was able to successfully complete the request.
$bytes_sent: The server sent back 2,233 bytes.
$http_referer: Googlebot did not list a referrer URL.
$http_user_agent: This custom user agent listing tells us that the request was made by Googlebot and not by a typical web browser.
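Since a user agent string is trivial to spoof, the IP check matters more than the user agent itself. Here’s a sketch of one common verification approach, a reverse DNS lookup followed by a forward confirmation (the domain suffixes are the ones Google’s crawlers resolve under; the network calls are shown for illustration):

```python
import socket

def is_googlebot_hostname(hostname: str) -> bool:
    # Google's crawlers resolve to hostnames under these domains.
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm.

    Network-dependent; shown here as a sketch of the technique.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS (PTR)
        if not is_googlebot_hostname(hostname):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP,
        # otherwise the PTR record could simply be forged.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False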

Join us next week for an explanation of some less benign requests.

Posted by Arnon Erba in Server Logs Explained.

To go along with the launch of my new blog, I’m going to be starting a post series with explanations of some of the bizarre server logs that turn up from my websites every few days. I’ll be covering what normal, healthy traffic looks like, but will mainly be focusing on some of the juicier would-be WordPress or PHP exploits that come rolling through now and then. Stay tuned, and check the “Server Logs Explained” category for all the posts in this series.

Posted by Arnon Erba in General.

Pop quiz: do you use the cloud? Even if you don’t know it, the answer is almost certainly “yes”. Cloud computing has become a ubiquitous part of modern-day computing, yet many people don’t know much about it.

Google defines “cloud computing” as the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer. That definition is still fairly technical, so let’s break it down.

When you edit a file locally, the file is stored and processed on your computer. This works fairly well, assuming you only have one computer and don’t need to access your file from anywhere else or share it with collaborators. However, if your computer is turned off, you can’t use the file without making a copy of it and placing it on another computer. In today’s world of smartphones and mobile devices, it’s crucial to have access to the same data from multiple locations without having to create redundant copies of files and deal with the hassle of moving them back and forth. The solution is to store the files in a separate, universal location and access those files across the Internet. This separate location takes the form of large, powerful computers run by companies such as Google, Microsoft, Apple, and Amazon, and is commonly referred to as the cloud.

A good example of the cloud in everyday life is modern email. If you use email on both your phone and your computer, and your inbox contains the same emails no matter what device you’re on, you’re most likely using the cloud. The standard configuration for Gmail, Yahoo! Mail, or other email accounts is to store all your emails on your email provider’s servers and to have your devices download temporary copies of them to view. In this example, all your email is stored in the cloud.

Another commonly used cloud service is Google Drive. Google Drive is a service that allows users to upload, edit, and share documents, pictures, and videos. When you use Google Drive, all your files are stored on Google’s cloud servers and are accessible when you sign in to Google Drive with your password.

iCloud on your iPhone or iPad is also a cloud service. iCloud allows you to store photos, backups, and other settings in the cloud so that they are accessible on all your Apple devices. If you use iCloud, you’re using Apple’s cloud servers to store your data.

Other examples include Pandora, Google Play Music, Dropbox, Microsoft Office 365, YouTube, and almost any other service that involves streaming, downloading, or storing content on the Internet.

The name “cloud computing” has nothing to do with the weather, as the term stems from the abstract depiction of remote servers or the Internet in general as a large, ambiguous cloud. However, that doesn’t mean that weather has no effect on the cloud. Since the cloud relies on massive physical computers to store data, a large storm or natural disaster could physically affect these servers. In 2012, Hurricane Sandy partially flooded the server farm of a company called Datagram, Inc. Datagram’s servers ran a number of popular websites, such as Lifehacker, Gizmodo, and Huffington Post, and these websites temporarily went offline as a result of the storm.