EJBCA / ECA-5977

Continue to check connectivity to peers after MariaDB Galera Cluster error

    Details

    • Issue discovered during: Customer
    • Sprint: EJBCA Sprint 4

      Description

      Summary

      This ticket aims to solve the linked support issue where the CA stops connecting to RA peers after a RuntimeException is thrown in JBoss.

      Ticket description

      The linked support issue is caused by a combination of two things:

      1. An unstable network connection
      2. A bug in the peer connections module

      There is a message like this from the ICA2 JBoss log file:
      18:02:16,306 ERROR [org.ejbca.core.ejb.config.HealthCheckSessionBean] (ajp--0.0.0.0-8009-1) Error creating connection to database.: javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: WSREP has not yet prepared node for application use

      This exception is later handled by the PeerRaMasterServiceBean:
      18:02:19,424 INFO [org.ejbca.peerconnector.ra.PeerRaMasterServiceBean] (EJB default - 1) Failure during Peer RA connectivity check: org.hibernate.exception.JDBCConnectionException: WSREP has not yet prepared node for application use

      However, as a side effect, EJBCA stops checking connectivity to peers. From this point on the connection between the RA and the CA is permanently broken, causing the "blue screen with no buttons" symptom as described.

      org.hibernate.exception.JDBCConnectionException is thrown when the MariaDB Galera cluster is suspected to be split and the node is in the smaller partition, perhaps due to a previous network glitch in which the nodes temporarily lost contact with each other.
      See https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/

      Solution

      1. Continue to look for peers even after a RuntimeException has been thrown
      2. Display a notification message in the RA web if the RA master API reports "no access to anything" (which is caused either by a certificate with no access or by a broken connection between the CA and the RA).
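
      The first point can be illustrated with a minimal sketch. It is not the actual EJBCA fix: the ticket does not say how PeerRaMasterServiceBean schedules its checks, so the class ResilientPeerCheck, its methods and the use of a plain java.util.concurrent scheduler are all assumptions made for illustration. The idea shown is only that the next check is scheduled in a finally block, so a RuntimeException from one check cannot stop later checks.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of EJBCA: keeps running a periodic
// connectivity check even when an individual check throws.
final class ResilientPeerCheck {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger failures = new AtomicInteger();
    private final Runnable check;       // stand-in for the real peer check
    private final long intervalMillis;  // delay between two checks

    ResilientPeerCheck(Runnable check, long intervalMillis) {
        this.check = check;
        this.intervalMillis = intervalMillis;
    }

    // Runs one check. A RuntimeException is logged and counted, never rethrown.
    boolean safeCheck() {
        try {
            check.run();
            return true;
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            System.out.println("Failure during peer connectivity check: " + e.getMessage());
            return false;
        }
    }

    // Runs one check and always schedules the next one afterwards.
    private void tick() {
        try {
            safeCheck();
        } finally {
            try {
                scheduler.schedule(this::tick, intervalMillis, TimeUnit.MILLISECONDS);
            } catch (RejectedExecutionException ignored) {
                // The scheduler was stopped; no further checks are wanted.
            }
        }
    }

    void start() {
        scheduler.schedule(this::tick, 0, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    int failureCount() {
        return failures.get();
    }
}
```

      In this sketch a check that fails once with a database error is simply retried on the next tick, so connectivity to peers would be re-established once the database node is usable again.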

              People

              Assignee:
              bastianf Bastian Fredriksson
              Reporter:
              samuel Samuel Lidén Borell
              Verified by:
              Johan Eklund

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0m
                  Logged: 1w 3d 2h 30m