• aeronmelon@lemmy.world
      link
      fedilink
      English
      arrow-up
      14
      ·
      2 days ago

      As with most IT troubleshooting,

      Time spent applying the fix: 15 minutes.

      Time spent identifying the problem then discovering where it is in the system: 15 hours.

      • pageflight@piefed.social
        link
        fedilink
        English
        arrow-up
        11
        ·
        2 days ago

        There’s a full postmortem from AWS. One piece that stands out to me:

        due to the large number of droplets, efforts to establish new droplet leases took long enough that the work could not be completed before they timed out. Additional work was queued to reattempt establishing the droplet lease. At this point, DWFM had entered a state of congestive collapse and was unable to make forward progress in recovering droplet leases.

        That is, the load that resulted from the initial failure was not something the system was designed to handle, so it had cascading effects / required manual cleanup.

      • artyom@piefed.social
        link
        fedilink
        English
        arrow-up
        5
        ·
        2 days ago

        “Professional”: Alexa, should I randomly reconfigure the DNS using a Magic 8 Ball?


        Alexa: “Wow what a brilliant idea! You’re so smart!”