Elixir and Errors: Let it crash

In complex systems, such as those managing millions of concurrent phone calls, some bugs are transient or rare, which makes them elusive and impractical to trace and resolve. Erlang, the platform whose virtual machine Elixir runs on, was designed with this in mind: it prioritizes robustness so that systems keep operating despite individual failures. This approach is captured in a philosophy known as “Let it crash.”

Achieving Robustness

Process Isolation: Erlang creates lightweight, independent processes, ensuring that the failure of one does not impact others.
Supervision Trees: A key component of the OTP framework, this hierarchical structure monitors child processes and manages their lifecycle. When a child fails, its supervisor can restart that process alone or an entire group of processes, depending on the configured strategy.
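
As a minimal sketch of both mechanisms, the script below starts a hypothetical Counter worker (the module and its functions are illustrative, not part of any library) under a one_for_one supervisor, kills it to simulate an unanticipated failure, and waits for the supervisor to start a replacement.

```elixir
defmodule Counter do
  # Hypothetical worker: an Agent holding an integer, registered under its module name.
  use Agent

  def start_link(initial) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

# The supervisor starts Counter with 0 and restarts it whenever it exits.
{:ok, _supervisor} = Supervisor.start_link([{Counter, 0}], strategy: :one_for_one)

Counter.increment()
old_pid = Process.whereis(Counter)

# Simulate an unanticipated failure by killing the worker.
Process.exit(old_pid, :kill)

# Poll until the supervisor has registered a replacement process.
wait_for_restart = fn wait ->
  case Process.whereis(Counter) do
    pid when is_pid(pid) and pid != old_pid ->
      pid

    _ ->
      Process.sleep(10)
      wait.(wait)
  end
end

new_pid = wait_for_restart.(wait_for_restart)

# The replacement starts from the initial state, not the state at the crash.
restarted_value = Counter.value()
```

The code that killed the worker never handles the failure itself. The supervisor notices the exit and brings the worker back in a known good state, at the cost of whatever state the old process held.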

“Let it crash” applies to errors you cannot anticipate: the failing process terminates, and its supervisor decides how to recover.

Local vs. Non-recoverable Errors

“Let it crash” does not mean “let everything fail.”

Local Errors: Errors that can be anticipated and managed within the context where they arise. For example, because user input is untrustworthy, we implement immediate feedback and correction; we don’t let an invalid email fall down to the supervisor for a restart.
Non-recoverable Errors: Errors that are not resolvable within the immediate context, such as corrupted application state or failures in external services. These may require the supervisor to restart components to maintain system integrity.
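
To make the local case concrete, here is a small sketch using a hypothetical Signup module. The "@" check is deliberately naive and stands in for real validation. The point is only that an anticipated error becomes a return value the caller turns into feedback, rather than a crash.

```elixir
defmodule Signup do
  # Hypothetical module; the "@" check is deliberately naive and only illustrative.
  def validate_email(email) when is_binary(email) do
    if String.contains?(email, "@") do
      {:ok, email}
    else
      # An anticipated, local error: report it to the caller instead of crashing.
      {:error, :invalid_email}
    end
  end
end

# The caller handles the error where it arises and gives immediate feedback.
feedback =
  case Signup.validate_email("not-an-email") do
    {:ok, email} -> "Welcome, #{email}"
    {:error, :invalid_email} -> "Please enter a valid email address"
  end
```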

Trust and Responsibility

Erlang is dynamically typed, so a function can be called with arguments of the wrong type. Instead of explicitly testing the input, Erlang code relies on pattern matching: input that matches no function clause raises a runtime error (function_clause in Erlang, FunctionClauseError in Elixir). The function itself does not verify input validity, operating on the trust that developers will adhere to specifications. When this trust is misplaced, we “Let it crash.”
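
A small sketch of this trust in practice, using a hypothetical Geometry module: the clauses describe the inputs the function accepts, and nothing else is checked. A call with any other shape of input matches no clause, and the runtime raises FunctionClauseError in the calling process.

```elixir
defmodule Geometry do
  # Hypothetical module: each clause states the input shape it accepts.
  def area({:circle, radius}) when is_number(radius) do
    :math.pi() * radius * radius
  end

  def area({:rectangle, width, height}) when is_number(width) and is_number(height) do
    width * height
  end
end

rectangle_area = Geometry.area({:rectangle, 2, 3})

# No defensive checks: the call below would match no clause and raise
# FunctionClauseError, crashing the calling process.
# Geometry.area("circle")
```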

Avoiding Hidden Control Flows

Hidden control flows often arise from exception handling, which interrupts the normal flow of execution and transfers control to wherever the exception is caught. Many functional languages address this with Option or Either types, often described as monads. An Option holds either a value or nothing, while an Either explicitly holds a result or an error.

This keeps errors contained within their contexts and not lost in the application layers. Monads require outcomes to be explicitly managed, making code more predictable and easier to debug.

Elixir and Control Flow

Elixir, operating without built-in types for monads or strict type enforcement, has developed a system of conventions for managing control flow and errors.

Exception Handling with Bang Functions: By convention, functions that raise an exception on failure, rather than returning an error value, are suffixed with a bang (!).
Explicit Tuple Returns: This method returns tuples to signal success or error, by convention as {:ok, result} or {:error, reason}. It mimics the Either monad to handle divergent paths.
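
Both conventions appear side by side in Elixir's standard library. As a short illustration, Date.from_iso8601/1 returns a tagged tuple, while its bang counterpart Date.from_iso8601!/1 returns the bare value or raises:

```elixir
# Tuple form: success and failure are both ordinary return values.
{:ok, leap_day} = Date.from_iso8601("2024-02-29")
{:error, reason} = Date.from_iso8601("2024-02-30")

# Bang form: returns the bare value, or raises ArgumentError on invalid input.
same_day = Date.from_iso8601!("2024-02-29")
```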

As a rule, I avoid hidden control flows. Relying on a naming convention such as the bang suffix to flag them is risky because it depends on discipline and awareness, both of which are likely to fail. Instead, I use explicit tuple returns, which push the calling code to handle divergent paths through pattern matching.

Updating Hidden Control Flow

Consider the following function:

@spec update_player!(Player.t(), map()) :: Player.t()
def update_player!(%Player{} = player, attrs) do
  current_player = Repo.get!(Player, player.id)
  current_player
  |> Player.update_changeset(attrs)
  |> Repo.update!()
end

This returns the updated player or raises an exception (Ecto.NoResultsError from Repo.get!, Ecto.InvalidChangesetError from Repo.update!), interrupting the flow of execution and relying on the caller to handle it.

Instead, use the tuple pattern:

@spec update_player(Player.t(), map()) ::
          {:ok, Player.t()} | {:error, :not_found | Ecto.Changeset.t()}
def update_player(%Player{} = player, attrs) do
  case Repo.get(Player, player.id) do
    nil ->
      {:error, :not_found}

    current_player ->
      changeset = Player.update_changeset(current_player, attrs)
      Repo.update(changeset)
  end
end

Each logic branch now returns a tuple, either {:ok, player} for a successful update or {:error, reason} for a failure, so every outcome is visible in the return value and the caller has to match on it. From a consumer’s perspective, handling {:ok, player}, {:error, :not_found}, and {:error, changeset} is straightforward. However, an unexpected result such as {:error, :unknown}, which the spec does not list, might indicate corruption in the process’s state. In that case it is appropriate to “Let it crash,” trusting that the supervisors will restart the process to a known good state.
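
One way a consumer might act on this, sketched with a hypothetical PlayerResponse module (the status codes and messages are assumptions for illustration): each expected result gets its own clause, and there is deliberately no catch-all. An unexpected value such as {:error, :unknown} then matches nothing and raises FunctionClauseError, crashing the calling process so that its supervisor can restart it.

```elixir
defmodule PlayerResponse do
  # Hypothetical consumer of update_player/2 results.
  def from_result({:ok, player}), do: {200, player}
  def from_result({:error, :not_found}), do: {404, "player not found"}
  # Matches an Ecto.Changeset (or any map) carrying an :errors field.
  def from_result({:error, %{errors: errors}}), do: {422, errors}
  # Deliberately no catch-all clause: anything else crashes the caller.
end

not_found = PlayerResponse.from_result({:error, :not_found})
```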