Oracle Coherence – description of federation member states

The states of Oracle Coherence federation are not very well described in the documentation, so I thought it would be good to ask at the source. Thanks to Patrick F for all these explanations!

Stopped states – Federation is not federating data to the destination, nor is it keeping a backlog of changes to send to the destination.  A “start” operation must be performed to start federating data again:
  • STOPPED – A stop operation was issued, or federation was set to start in the stopped state.
  • ERROR – An error occurred from which federation was unable to continue federating data
Paused state – Federation is not federating data, but is keeping a (growing) backlog of changes to be sent once a “start” operation is issued:
  • PAUSED – A pause operation was issued, or federation was set to start in the paused state.
Normal states – Federation is federating data:
  • INITIAL – default startup state.  A Coherence node will stay in this state until there is data to be federated
  • IDLE – federation is active and connected to the destination.  There is no data currently to send
  • READY – federation is transitioning out of CONNECTING, or YIELDING, or BACKLOG_NORMAL and will go to SENDING
  • SENDING – federation has data to send
  • CONNECTING – federation is connecting to the destination cluster
  • CONNECT_WAIT – federation is disconnected and will make a new connect attempt.  There may be a delay before making the next attempt depending on the circumstances under which federation was disconnected
  • YIELDING – federation has data to send, but is pausing briefly, likely due to a BACKLOG_EXCESSIVE event
  • DISCONNECTED – federation was disconnected from the remote destination.  If there is no data to send, federation may stay in this state. NB: DISCONNECTED is a “normal” state.  It means that a member of the cluster lost its federation connection to the remote cluster (probably because the remote member it was connected to was shut down).  If the member in the DISCONNECTED state has no data to send, it will stay in the DISCONNECTED state – basically the same as being IDLE.  Once there is some data for this member to send it will issue a new connection.
The following states do not exist – although they may have in 12.2.1.0.0 (federation should not be used with this version):
  • BACKLOG_EXCESSIVE 
  • BACKLOG_NORMAL
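
The grouping above can be expressed as a small lookup, which is handy when a monitoring script has to decide whether a reported state needs attention. This is only a sketch: classify_state is a hypothetical helper, not part of Coherence, and it assumes the state name has already been read from somewhere (for example from JMX) as a plain string.

```shell
#!/bin/sh
# classify_state: hypothetical helper, maps a federation member state name
# to the group it belongs to in the list above.
classify_state() {
    case "$1" in
        STOPPED|ERROR)
            echo "stopped" ;;   # not federating, no backlog is kept
        PAUSED)
            echo "paused" ;;    # not federating, backlog keeps growing
        INITIAL|IDLE|READY|SENDING|CONNECTING|CONNECT_WAIT|YIELDING|DISCONNECTED)
            echo "normal" ;;    # federating, or ready to federate
        *)
            echo "unknown" ;;   # e.g. BACKLOG_EXCESSIVE, BACKLOG_NORMAL
    esac
}
```

For example, `classify_state DISCONNECTED` prints `normal`, which matches the note above that DISCONNECTED alone is not a reason for an alert.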

Shrinking VirtualBox VMs

This article is a note to self, as I have just gathered information from other sites:

Proxmox: enp0s31f6: Detected Hardware Unit Hang

For a few weeks now, my Proxmox lab has been having issues with the on-board network adapter. The adapter enters a “hang” state and the logs are full of recurring errors like the ones below:

[89276.274556] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
[89276.306147] vmbr0: port 1(enp0s31f6) entered disabled state
[89280.269563] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[89280.269626] vmbr0: port 1(enp0s31f6) entered blocking state
[89280.269631] vmbr0: port 1(enp0s31f6) entered forwarding state
[89282.226613] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                 TDH                  <0>
                 TDT                  <1>
                 next_to_use          <1>
                 next_to_clean        <0>
               buffer_info[next_to_clean]:
                 time_stamp           <10153702e>
                 next_to_watch        <0>
                 jiffies              <101537150>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <c8db>
               PHY 1000BASE-T Status  <a39b>
               PHY Extended Status    <ffff>
               PCI Status             <10>
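
To get a feel for how often the adapter hangs, the kernel log can be filtered for the message shown above. A minimal sketch, assuming the message text stays exactly as in this log; count_hangs is a hypothetical helper name.

```shell
#!/bin/sh
# count_hangs: hypothetical helper. Reads kernel log text on stdin and prints
# how many lines report a hardware unit hang for the given interface.
count_hangs() {
    # -F: match the text literally, -c: print the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function usable under "set -e" (the count 0 is still printed).
    grep -cF -- "$1: Detected Hardware Unit Hang" || true
}

# Typical use on the affected host:
#   dmesg | count_hangs enp0s31f6
```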

Initially I suspected a hardware issue, but the problem persisted even after I replaced the motherboard.

Next, I found this thread on the Proxmox forum. The workaround suggested there seems to work; below is how I implemented it:

One time fix

root@wieloryb-pve:/etc/rc.d/init.d# /sbin/ethtool -K enp0s31f6 tx off rx off
Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported
Actual changes:
rx-checksumming: off
tx-checksumming: off
    tx-checksum-ip-generic: off
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]

Preserve the change across reboots

root@wieloryb-pve:~# cat /etc/network/if-up.d/ethtool2
#!/bin/sh

/sbin/ethtool -K enp0s31f6 tx off rx off

root@wieloryb-pve:~# chmod 755 /etc/network/if-up.d/ethtool2
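
Scripts in /etc/network/if-up.d run for every interface that comes up, so the script above calls ethtool each time, whichever interface triggered it. A possible variant that acts only for the affected NIC is sketched below. It assumes that ifupdown exports the interface name in the IFACE variable (worth checking on your Proxmox version); TARGET_IFACE, ETHTOOL and disable_offload are illustrative names, not part of the original script.

```shell
#!/bin/sh
# Sketch of an if-up.d hook that is limited to one interface.
# Assumption: ifupdown sets IFACE to the interface being brought up.
TARGET_IFACE="enp0s31f6"
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"   # overridable, e.g. for a dry run

disable_offload() {
    # Skip every interface except the affected NIC (lo, vmbr0, ...).
    [ "$1" = "$TARGET_IFACE" ] || return 0
    "$ETHTOOL" -K "$1" tx off rx off
}

disable_offload "${IFACE:-}"
```

Running it by hand with `ETHTOOL=echo IFACE=enp0s31f6` only prints the ethtool arguments instead of changing the NIC, which is a cheap way to check the guard.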

Reboot and verify

root@wieloryb-pve:/etc#  shutdown -r now

root@wieloryb-pve:/etc/rc.d/init.d# /sbin/ethtool -k enp0s31f6
Features for enp0s31f6:
Cannot get device udp-fragmentation-offload settings: Operation not supported
rx-checksumming: off                   <--------- SHOULD BE OFF, HERE AND A FEW OTHER PLACES
tx-checksumming: off
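
The check after a reboot can also be scripted. The sketch below looks only at the two checksumming lines shown above and assumes the `ethtool -k` output format stays as in this listing; offload_is_off is a hypothetical helper name.

```shell
#!/bin/sh
# offload_is_off: hypothetical helper. Reads `ethtool -k` output on stdin and
# succeeds (exit status 0) only if rx- and tx-checksumming are both "off".
offload_is_off() {
    awk '
        /^rx-checksumming:/ { rx = $2 }
        /^tx-checksumming:/ { tx = $2 }
        END { exit !(rx == "off" && tx == "off") }
    '
}

# Typical use on the affected host:
#   /sbin/ethtool -k enp0s31f6 | offload_is_off && echo "offload disabled"
```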

 

PlantronicsHub – major memory leak?

I noticed that the SpokesUpdateService process had risen to the top of my memory list. Here is an example I captured today (just 24 hours after my last reboot); on another day I saw 2.4 GB wasted:

It turned out to be part of the PlantronicsHub software. I will disable it for a while and see how it behaves.

bash-4.4$ sudo launchctl remove com.PlantronicsUpdateService
bash-4.4$
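
To check from the shell whether the process is really gone, or how much memory it holds if it comes back, the output of ps can be summed up. A minimal sketch: rss_mb is a hypothetical helper, and it assumes ps reports the resident set size in kilobytes, which is what `ps -o rss` normally does.

```shell
#!/bin/sh
# rss_mb: hypothetical helper. Reads "RSS_KB COMMAND" lines on stdin and
# prints the total resident memory, in whole MB, of all processes whose
# command contains the given name. Prints 0 if no process matches.
rss_mb() {
    awk -v name="$1" '
        index($0, name) { kb += $1 }
        END { printf "%d\n", kb / 1024 }
    '
}

# Typical use:
#   ps -A -o rss= -o comm= | rss_mb SpokesUpdateService
```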

Note: I still need to find out how to prevent it from starting after every reboot.

Reports from other users describing the very same issue:

  • http://forums.macresource.com/read.php?1,2266926
  • https://arstechnica.com/civis/viewtopic.php?f=19&t=1430655
  • I’ve opened Plantronics Case 06390520: ref:_00D507IyW._50038xj5MQ:ref

 

Update 2018-08-21

I got this message from Plantronics Support:

Dear Jaroslaw,

 

Please update the hub to 3.11.2 this should solve the issue you were experiencing.

Please let me know how it goes.

The case number for your reference is 06390520. If you need any further assistance, please let us know.

Kind regards

Angela

Technical Support

I’ve upgraded to 3.11.2, will see 😀