JGrups on Linux With Multiple Network Interfaces

A few hints for those who want to run JBoss Cache in clustered mode ((a)synchronous replication or invalidation) on a Linux machine with multiple network interface cards (NICs), using UDP multicast (OSI layers 4 and 3). Here is an overview of the situation:

We need multicast communication through eth1 to other JBoss Cache nodes.

JBoss Cache uses JGroups for network communication, so I'm going to focus on the configuration of that service (the cache configuration itself is very easy). First of all, on a machine with multiple NICs you need to specify a bind address (the bind_addr attribute) in the ClusterConfig/UDP section of the configuration file, assuming you use an XML file to configure the services. AFAIK this can't be 0.0.0.0. You can probably use the address of any of your NICs, but I would suggest the one assigned to eth1. That is the logical choice, although it doesn't change how packets are generated and sent. Without this parameter the service throws an exception and fails to start.
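To find the address to put into the bind address setting, you can read it from the output of the ip tool. The snippet below is a minimal sketch, not part of the original setup: the helper name iface_ipv4 is hypothetical, and the sample line with 192.168.1.10 is made up so that the snippet runs on any machine.

```shell
#!/bin/sh
# Print the IPv4 address(es) found in `ip -4 -o addr show` output read from stdin.
# iface_ipv4 is a hypothetical helper name used only for this example.
iface_ipv4() {
    awk '$3 == "inet" { split($4, parts, "/"); print parts[1] }'
}

# On the cluster node itself (eth1 is the interface name used in this post):
#   ip -4 -o addr show dev eth1 | iface_ipv4

# Illustrative input line; the address is invented for the example.
sample='3: eth1    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1'
printf '%s\n' "$sample" | iface_ipv4
```

The printed address is the value that would go into the bind address of the UDP section.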

The second thing is to specify the outgoing NIC. On Linux, a process run by a non-root user can't specify the source address and outgoing interface for its packets. The only way to control this is through the routing table. Without a dedicated routing table entry for your cache's multicast address, all packets leave the machine via the default route, which in this setup means eth0. This is not what we want to achieve. You can correct it by adding a route for the multicast address through eth1:

$ ip route add 228.1.2.3/32 dev eth1
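To check which interface the kernel will actually pick for the multicast address, you can ask it with `ip route get`. The sketch below assumes the usual output format of that command, in which the device name follows the word "dev". The helper name route_dev is hypothetical, and the sample line is invented so that the snippet runs without touching the network.

```shell
#!/bin/sh
# Print the outgoing device named in `ip route get` output read from stdin.
# route_dev is a hypothetical helper name used only for this example.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# On the cluster node itself:
#   ip route get 228.1.2.3 | route_dev

# Illustrative input line; the source address is invented for the example.
sample='multicast 228.1.2.3 dev eth1 src 192.168.1.10 uid 1000'
printf '%s\n' "$sample" | route_dev
```

If the real command reports eth0 instead of eth1, the multicast route is missing or is shadowed by a more specific entry.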

The last thing is the packet fragmentation issue. Maybe I was the only one with this problem, but I spent a lot of time trying to solve it. Every replication of a larger amount of data resulted in a timeout exception. To get things working I had to set the timeout to 120 seconds! Of course that wasn't the right solution. Finally I changed the max_xmit_size and frag_size parameters to the MTU minus the approximate size of all headers, which is about 1400 bytes.
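The arithmetic behind "about 1400" can be sketched as follows. The 20-byte IPv4 header (without options) and the 8-byte UDP header are fixed sizes. The 72-byte allowance for JGroups protocol headers is only an assumption, chosen to land on the round figure above, because the real overhead depends on the protocol stack in use. The function name frag_size_for_mtu is hypothetical.

```shell
#!/bin/sh
# Estimate a fragment size that fits into one Ethernet frame.
# frag_size_for_mtu is a hypothetical helper name used only for this example.
frag_size_for_mtu() {
    mtu=$1
    ip_header=20          # IPv4 header without options
    udp_header=8          # UDP header
    jgroups_overhead=72   # ASSUMPTION: rough allowance for JGroups headers
    echo $((mtu - ip_header - udp_header - jgroups_overhead))
}

# Standard Ethernet MTU; read the real value with: ip link show dev eth1
frag_size_for_mtu 1500
```

Treat the result as a starting point and lower it if large replications still time out.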

If you still have problems, check the Ethernet switch connected to eth1. For example, if it has IGMP snooping enabled, check that it works correctly. I had problems with it on a 3Com SuperStack 3 Switch 3870 running software version 1.01.