Apache 2.4 Failing to Recognize Virtual Hosts

After upgrading from Apache 2.2 to 2.4 in a Windows development environment, all my virtualhosts stopped working. I could add syntax errors to the files which would make Apache refuse to start up, or get notices about invalid document roots, but the virtual host server names just wouldn’t catch on.

After removing the reference to Include conf/extra/httpd-vhosts.conf things suddenly started working! Weird. The reason seems to be that the default vhost referenced in httpd-vhosts.conf uses _default_ instead of * to reference the virtualhost name. I’ve used * in all my configuration files, and apparently Apache refuses to reference any * references if it hits a _default_ name in the VirtualHost configuration first. That seems weird, so if someone has any more information about what’s causing this, I’m very interested.

My setup now works again, so I’m not going to start digging into the source to find the reason for this just yet. :-)

AH01753: access check of ‘127.0.0.1’ to /xxx/ failed, reason: unable to get the remote host name

This error message can be caused by placing an IP instead of a hostname in a Require host statement in Apache 2.4+. After porting some old access rules to Apache 2.4 I had used Require host 127.0.0.1 instead of the correct Require ip 127.0.0.1. Switched it, and ahoy! It now works.

svn: OPTIONS of ‘‘: SSL handshake failed: SSL error: A TLS warning alert has been received. ()

If your svn client suddenly starts complaining about something similar to

svn: OPTIONS of '...': SSL handshake failed: SSL error: A TLS warning alert has been received. (...)

The reason might be that the host in the URL (https://example.com/ => example.com) doesn’t match the ServerName setting in the SSL host for your web server. You might not have configured this, so for Apache add:

ServerName example.com

.. and restart the server. It might just work again!

nginx and rewriting based on GET-parameter (URL-parameters/arguments)

Update: see the comment below from Alan Orth about how to implement this in a much cleaner way now!

When rewriting URLs in Apache through mod_rewrite, you have the possibility of using RewriteCond to only apply rewrites if the original resource has been called with a particular argument in the URL (such as “/file?oid=..”).

The solution in nginx was however a bit different, but thanks to Rewriting URL-params in nginx I got on the right track from the start.

In nginx this information is available through the $args variable, which will contain the complete query string. In Will’s example above he’ll replace the query string, but I were interested in inserting a specific parameter instead (and include the previous query string, so I couldn’t just do the “set $args ..” that he does in the example).

My first try was to simply use $1 in the rewrite destination, but this didn’t work – as rewrite will reset the captured patterns from the previous regular expression (since the rewrite source also is a regular expression). But by introducing my own, temporary variable I were able to save the value from the matching regular expression (for the GET parameter) and use it in my rewrite destination.

The following example shows how I ended up solving the issue. This will rewrite the URL only if the “oid” parameter is found at the beginning of the query string when the URL is requested, and the location = /oldURL limits the rewrite to requests for the old resource.

location = /oldURL {
    if ($args ~ "^oid=(\d+)") {
        set $key1 $1;
        rewrite ^.*$  /newURL?param1=foo¶m2=bar&key1=$key1 last;
    }
}

This will rewrite a request for /oldURL?oid=123&what=cheese to /newURL?param1=foo&param2=bar&key1=123&oid=123&what=cheese — if you want to exclude the previous arguments, you can either just set $args directly to key1=$1 and just use param1=foo and param2=bar in the rewrite destination:

        set $args key1=$1;
        rewrite ^.*$  /newURL?param1=foo¶m2=bar last;

This might be cleaner, depending on what you’re trying to do.

Solr, Memory Usage and Dynamic Fields

One of the many great things about Solr is that it allows you to add dynamic fields – you can define a certain pattern that a field will have to follow, but it can then use any field name that matches the pattern.

We’ve been using one such dynamic field to add a sort field for our documents:

xxx_Category_Subcategory: 300

This would allow us to sort by this field to get the priority of our documents in this particular category and subcategory. A document would contain somewhere between 1 and 15 such fields. The total selection of unique field names is somewhere around 1200 across all documents.

Be small, be happy

As long as our collection were quite small (<10k documents) this scheme worked great. When our collection grew to around 500k documents, we started seeing out of memory errors quite often. At the worst rate we got an out of memory exception every 30 minutes, and had to restart the Solr server. Performance didn't suffer, but obviously we couldn't continue restarting servers until we got bored. After removing a few other possible issues (such as our stable random sort) I were rather stumped that things didn't improve. The total amount of data in our dynamic fields were rather low, somewhere around 2.5 - 3.5m integers, or possibly somewhere around 50-70MB in total. The JVM should be able to fit everything about these fields in memory and query them for the fields we're trying to find, but a heap dump of the jvm just before it hit the out of memory exception revealed that we were getting quite a few GBs of Lucene's FieldCache objects. These objects cache the value of a field for the total set of documents available in the index, and you're sadly not able to tune this cache through the Solr configuration (at least not for 1.4 as far as I could find).

Less Dynamic Fields, More Manual Labor

After pondering this issue a bit I came to the conclusion that our problem was related to the dynamic fields we had, and the fact that we used them for sorting. Lucene / Solr keeps one set of field caches for each field when it’s used for sorting, to avoid having to do duplicate work later. For us, this meant that each time we sorted a new field, an array had to be created with the size of the total document set. As long as we just had 10k documents, these arrays were small enough that we had enough memory available – when the document set grew to almost 500k documents, not so much.

This means that the total memory required for field caches will be limited by DocumentsInIndex * FieldsSortedBy. As long as our DocumentsInIndex were just 10k, the available memory to the jvm was enough to keep sorting by the number of fields we did. When the number of documents grew, the memory usage grew by the same factor and we got our OutOfMemoryException.

The Solution

Our solution could probably be more elegant, but currently we’ve moved the sorting to our application layer instead of the data provider layer. We’re requesting the complete set of hits from the Solr-server in the category anyway, so we’re able to sort it in the application – and by using a response format other than XML we’re also doing it rather quickly. This means that we’re not using sorting at all, and are only querying against one multivalued field to see if the category key is present there at all.

Note: Other solutions we considered were to divide our index into several Solr cores. This would allow us to keep the number of documents in each core low, and therefor also keep the fieldcache size in check. We know that each category could very well live on just on core as we won’t be mixing it with data from the other cores (and for that we could keep a separate core with all the documents, just not use it for searching across dynamic fields). We dropped this plan because of the rather worrying increase in complexity in our Solr installation. This could however help in your own case. :-)

mod_jk and Internal Server Error (HTTP 500)

We’ve extended our previously single Solr-node to a few nodes in a cluster. This allows us to run queries against one node while updating or configuring another, distributing the load across several servers (although we’re not there yet load wise) and being able to handle any out of memory or other critical errors.

While Solr supports querying several cores or distributing the queries internally, we decided to move the load balancing and handling of failed nodes higher up in the hierarchy. We’re now doing simple load balancing and handling of failed nodes by using mod_jk in our existing Apache-based environment. mod_jk also handles failed servers without any administrator interaction. We were already using mod_jk for our main web frontend, and since we use Tomcat as our application container for Solr, things should be a breeze!

Well, no. After copying our existing mod_jk setup, configuring our new workers and restarting Apache, all I got was the well known 500 INTERNAL SERVER ERROR. Here’s the worker configuration file:

worker.list=loadbalancer,status

worker.solr1.port=8009
worker.solr1.host=10.0.0.4
worker.solr1.type=ajp13
worker.solr1.lbfactor=1
worker.solr1.cachesize=10

worker.solr2.port=8009
worker.solr2.host=10.0.0.5
worker.solr2.type=ajp13
worker.solr2.lbfactor=4
worker.solr2.cachesize=10

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=solr1,solr2
worker.loadbalancer.sticky_session=0

worker.status.type=status

This provides us with two solr servers and one status worker (the status worker is responsible for providing a simple web interface for enabling/disabling/seeing the status of the other workers), configured with a 1:4 load balancing (the second server has quite a bit more memory available for Solr).

I provided the configuration of the workers through the JkWorkersFile configuration setting (in a VirtualHost block… don’t do that):

JkWorkersFile conf/workers.properties

I’d also enable debug logging to attempt to find the problem (still in a VirtualHost block):

JkLogFile logs/mod_jk.log
JkLogLevel debug
JkLogStampFormat "[%a %b %d %H:%M:%S %Y]"

Other mod_jk settings (in the VirtualHost block) were:

JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
JkRequestLogFormat "%w %V %T"
JkShmFile logs/jk.shm
JkMount /* loadbalancer

<Location /jkstatus>
	JkMount status
	Order deny,allow
        Deny from all
        Allow from 127.0.0.1
</Location>

Still no solution. Peeking at the log files mod_jk provided, I were able to deduce the following:

[debug] map_uri_to_worker::jk_uri_worker_map.c (525): Attempting to map context URI '/jkstatus'
[debug] map_uri_to_worker::jk_uri_worker_map.c (550): Found an exact match status -> /jkstatus
[debug] jk_handler::mod_jk.c (1920): Into handler jakarta-servlet worker=status r->proxyreq=0
[debug] wc_get_worker_for_name::jk_worker.c (111): did not find a worker status
[info]  jk_handler::mod_jk.c (2071): Could not find a worker for worker name=status

This indicates that mod_jk was unable to find a worker matching the name I provided in the JkMount statement above; status. Weird. I added some garbage characters to the “JkWorkersFile” setting, and Apache complained that it were unable to find the workers file. Changed it back, reloaded, and still nothing. It was apparently unable to find the worker. The map did however work, as it tried to launch a worker.

Looking back at the start up sequence of mod_jk, the following were found in the log:

[debug] build_worker_map::jk_worker.c (236): creating worker ajp13
[debug] wc_create_worker::jk_worker.c (141): about to create instance ajp13 of ajp13
[debug] wc_create_worker::jk_worker.c (154): about to validate and init ajp13
[debug] ajp_validate::jk_ajp_common.c (1922): worker ajp13 contact is 'localhost:8009'
[debug] ajp_init::jk_ajp_common.c (2047): setting endpoint options:
[debug] ajp_init::jk_ajp_common.c (2050): keepalive:        0
[debug] ajp_init::jk_ajp_common.c (2054): timeout:          -1
[debug] ajp_init::jk_ajp_common.c (2058): buffer size:      0
ajp_init::jk_ajp_common.c (2062): pool timeout:     0
[debug] ajp_init::jk_ajp_common.c (2066): connect timeout:  0
[debug] ajp_init::jk_ajp_common.c (2070): reply timeout:    0
[debug] ajp_init::jk_ajp_common.c (2074): prepost timeout:  0
[debug] ajp_init::jk_ajp_common.c (2078): recovery options: 0
[debug] ajp_init::jk_ajp_common.c (2082): retries:          2
[debug] ajp_init::jk_ajp_common.c (2086): max packet size:  8192
[debug] ajp_create_endpoint_cache::jk_ajp_common.c (1959): setting connection pool size to 1 with min 0

It took a bit of time, but I realized that this tells me that mod_jk created _a default_ worker named ajp13. Apparently it was not reading my worker file at all, but it still complained if I changed the file name. You’d think that the setting which loads the configuration file would work when it complains when it doesn’t. But .. well. After an hour of attempting to find out why the workers didn’t load, revising the workers file to a minimal example, trying with just a single status worker, I concluded that my workers file was correct, and obviously mod_jk found it when it attempted to load it.

Then I suddenly discovered the small notice in the mod_jk configuration manual:

JkWorkersFile: This directive is only allowed once. It must be put into the global part of the configuration.

JkWorkersFile can not be defined in a <VirtualHost> section. It will NOT complain if you do it, it’ll just never define any workers. It will complain if the file doesn’t exist, even if it never tries to actually load it.

Confusing.

Moving the JkWorkersFile statement out from the <VirtualHost> block and to the LoadModule statement instead solved the issue. This is also the case for JkWorkerProperty.

Solr, Tomcat and HTTP/1.1 505 HTTP Version Not Supported

During today’s hacking aboot I came across the above error from our Solr query library. The error indicates that some part of Tomcat was unable to parse the “GET / HTTP/1.1” string – where it is unable to determine the “HTTP/1.1” part. A problem like this could be introduced by having a space in the query string (and it not being escaped properly), so that the request would have been for “GET /?a=b c HTTP/1.1”. After running through both the working and non-working query through ngrep and wireshark, this did however not seem to be the problem. My spaces were properly escaped using plus signs (GET /?a=b+c HTTP/1.1).

There does however seem to be a problem (at least with our version of Tomcat – 6.0.20) which results in the +-s being resolved before the request is handed off to the code that attempts to parse the header, so even though it is properly escaped using “+”, it still barfs.

The solution:

Use %20 to escape spaces instead of + signs; simply adding str_replace(” “, “%20”, ..); in our query layer solved the problem.

Escaping Characters in a Solr Query / Solr URL

We’re using our own Solr library at Derdubor at the moment, but we’ve only been using it for indexing content. The query part was never standardized in our common library as we usually used an alternative output format, but during the last days that has changed. We now have a parser for the default XML outputter and we’re also supporting facets and field queries (or constraints as they’re abstracted as in our library).

This means that we’re feeding content into the query that may contain foreign characters, in particular those who have special meaning in a Solr query. You can find the complete list of characters that need to be escaped in a SOLR or Lucene query in the Lucene manual.

To escape the characters we use this very simple and stupid PHP method:

    static public function escapeSolrValue($string)
    {
        $match = array('\\', '+', '-', '&', '|', '!', '(', ')', '{', '}', '[', ']', '^', '~', '*', '?', ':', '"', ';', ' ');
        $replace = array('\\\\', '\\+', '\\-', '\\&', '\\|', '\\!', '\\(', '\\)', '\\{', '\\}', '\\[', '\\]', '\\^', '\\~', '\\*', '\\?', '\\:', '\\"', '\\;', '\\ ');
        $string = str_replace($match, $replace, $string);

        return $string;
    }

We used a regular expression first, but the sheer amount of backslashes made it a regular .. hell … to read. So to make it easier for the persons maintaining this in the future, we went the easy to read / easy to maintain road for this one.

TCP: drop open request from ..

At some time during friday one of my web servers started to behave rather strangely. When attempting to connect to the web site, the requests would time out almost randomly. About half of them got through, while the other half seemed to time out or being left for dead. Restarting the web server helped, but the problem crept back in within 10 – 15 seconds. This seemed very strange, but digging through the logs of the server and checking the load of the database server didn’t show any apparent problems.

After heading over to check the syslog (/var/log/syslog) I found that the TCP stack was trying to tell me something:

TCP: drop open request from u.x.y.z/vvvv
printk: 228 messages suppressed.

Apparently this is one of the signs of an attempted (D)DoS-attack, when a computer on the other end sends as many TCP open requests as possible to a port on the computer, making the daemon busy with just handling idling connections that never go anywhere.

I realized that this fit the pattern I was seeing quite good: the web server accepted requests as normal after restarting it, before being hit with loads of bogus open requests right after. The requests were never proper HTTP requests, resulting in them not being logged to the normal error or access logs.

There are at least two ways of handling this on the server itself (there’s probably a couple of hundreds more, but the first one worked for me). Simply drop the traffic – or turn on TCP SYN Cookies.

If the attack is from a particular host or subnet, dropping the traffic works fine:

iptables -I INPUT -s u.x.y.z -j DROP

If the attack originates at several different locations, turning on TCP SYN Cookies while the attack is taking place is probably the best idea (as enabling TCP SYN Cookies will disable most high performance TCP options, you’ll want to disable it after the attack has subsided again).

You enable TCP SYN Cookies with:

echo 1 >> /proc/sys/net/ipv4/tcp_syncookies

You can read a bit more about how the tcp_syncookies setting works at securityfocus.

If you’re seeing these problems often I strongly recommend you talk with your hosting provider and ISP to get the problem fixed by Someone Who Knows What They’re Doing. Getting rid of the troublesome requests before they even arrive at your server is also a good idea.

Undefined symbol: php_pdo_declare_long_constant

After installing a new PDO module (PDO_PgSQL) into our compiled-from-the-ground-up version of PHP 5.2.8 (.. since RHEL4 doesn’t really stay updated, but we do), i ran head first into the following issue:


/usr/sbin/httpd: symbol lookup error: /usr/local/lib/php/extensions/no-debug-non-zts-20060613/pdo_pgsql.so: undefined symbol: php_pdo_declare_long_constant

Panic. Then tried updating pdo_mysql which actually still worked, which just led it to have the exact same problem. Luckily a bit of searching at Google pointed me to PDO_MYSQL causing Apache segfault over at the PECL bug tracker. The last comment provided the solution to the problem: a quick rebuild of PHP with –disable-pdo and then enabling pdo from PECL instead (so that PDO and the PDO plugins API actually match, instead of trying to load the wrong version into the process) solved the issue.

Be sure to build PDO from the SAME VERSION as your client libraries. Disable it in the PHP build itself if you need to build it from PECL.