Server Logs Explained, Part 6: The Berkeley Research Scan
(Editor’s note: This post has been updated since publication.)
Some log entries are particularly bizarre, like the one we’ll be looking at today:
22.214.171.124 - - [26/Jun/2016:00:35:26 -0700] "\xD5H\xC5p*\xB7:\x8F\x91\x8A\xE1\xAA\xE0p\xD9\xF2[;\xAE\xE7c\xF7\x9C\xAB~\x98\xCB\xAD\xCB\xBE\xCE\xED\xAF\xEC\x8B\x19\xC6\x08D\xEB\xA8\x91\x1De\x10\x18 u\x01zHj\x00\x8D|\x15\x8B;\x98\x08RaSH" 400 166 "-" "-"
My server responded with
400 Bad Request, but the most interesting part is the giant
$request portion, which doesn’t include any of the normal components you would expect in an HTTP request:
Note: See my first Server Logs Explained post for an example of how to interpret the entire log entry.
If your first thought was that this looks like 64 bytes of garbage, then you’d be exactly right. As it turns out, I wasn’t the first person to see one of these bizarre log entries. According to this Information Security StackExchange question and answer, this server request is from an Internet-wide research scan led by the Electrical Engineering and Computer Sciences (EECS) department at the University of California at Berkeley.
A reverse DNS lookup of the IP address led to an illuminating hostname, researchscan1.EECS.Berkeley.EDU. It turns out there’s actually several machines related to the project:
If you access any of those IP addresses or hostnames in a web browser, you’ll see a brief description of the project. However, according to this answer on StackExchange, the text on those webpages has changed over time, so the most concise explanation comes in the form of a quote from the project leaders at Berkeley that’s reproduced in that answer:
We are performing a measurement study of a particular phenomenon on the Internet. To accurately asses the behavior we’re performing a daily scan of the IPv4 space by sending a single benign packet to every IP on port 80 consisting of 64 random bytes of data. […] No, we are not attempting to gain unauthorized access. […] It’s simply randomly generated data that conforms to a certain set of criteria.
I contacted the project team a few months ago as well, but have not heard anything back. Given that the StackExchange answer and my log entry both date back to 2016, it’s possible that the research project is already over and is now just Internet history. Either way, it’s interesting to finally know what it is.