EJBCA / ECA-8170

Improve reliability of service workers in a cluster

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: EJBCA 7.0.1.4, EJBCA 7.2.0
    • Component/s: None
    • Labels:
      None
    • Provenance:
      Ordered by Customer
    • Sprint:
      EJBCA Team Alice - 2019 w18

      Description

      We've encountered the following situation in the wild:

      • A cluster of three nodes is up and running
      • One of the nodes loses connection with the HSM
      • The admin team does not remove that node from the cluster
      • When the services are triggered, the faulty node grabs the job and fails
      • Starting the job resets the timer, so the other nodes skip running the service
      • Because all three nodes run on the same interval, the faulty node keeps grabbing the service and blocks the others

      To fix this, I propose adding a sanity check to each service worker type. We can't stop the faulty node from resetting the timer, but we can set the timer on that node to skip an interval. One of the other nodes should then grab the service the next time the interval elapses.
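
      The proposal could be sketched roughly as follows. This is a minimal illustration, not EJBCA's actual service timer code: the class SkipIntervalSketch, the methods onTimeout and nextDelayMillis, and the idea of passing the sanity check as a BooleanSupplier are all hypothetical names chosen for this example. It also assumes that a node failing its check skips the work entirely and that "skipping an interval" means waiting two intervals before its next attempt.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the skip-an-interval idea; not EJBCA's real API. */
public class SkipIntervalSketch {

    /** Returns the delay until this node's timer should fire again. */
    static long nextDelayMillis(long intervalMillis, boolean healthy) {
        // A healthy node keeps the normal interval. A node that failed its
        // sanity check waits two intervals, so it sits out the next round
        // and another node in the cluster can pick up the service.
        return healthy ? intervalMillis : 2 * intervalMillis;
    }

    /**
     * Handles one timer tick on a node.
     *
     * @param intervalMillis the configured service interval
     * @param sanityCheck    assumed per-worker check, e.g. "is the HSM reachable?"
     * @param work           the actual service work
     * @return the delay to use when re-arming this node's timer
     */
    static long onTimeout(long intervalMillis, BooleanSupplier sanityCheck, Runnable work) {
        boolean healthy = sanityCheck.getAsBoolean();
        if (healthy) {
            work.run();
        }
        return nextDelayMillis(intervalMillis, healthy);
    }

    public static void main(String[] args) {
        long interval = 60_000L;
        long healthyDelay = onTimeout(interval, () -> true,
                () -> System.out.println("service ran"));
        long faultyDelay = onTimeout(interval, () -> false,
                () -> System.out.println("should not run"));
        System.out.println(healthyDelay + " " + faultyDelay);
    }
}
```

      Doubling the delay only on the failing node keeps the change local: the healthy nodes need no knowledge of the fault, and the faulty node returns to the normal interval as soon as its check passes again.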

              People

              Assignee:
              mikek Mike Agrenius Kushner
              Reporter:
              mikek Mike Agrenius Kushner
              Verified by:
              Henrik Sunmark

                  Time Tracking

                  Estimated: 2d
                  Remaining: 2d
                  Logged: Not Specified