Author: Björn Gustavsson <bjorn(at)erlang(dot)org>
Status: Final/24.0 Implemented in OTP release 24
Type: Standards Track
Created: 2020-09-14
Erlang-Version: OTP-24.0
Post-History: 2020-10-14, 2020-11-27

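EEP 54: Provide more information about errors #

Abstract #

This EEP proposes a mechanism for reporting more human-readable information about what went wrong when a BIF raises an exception. Libraries and applications can use the same mechanism to provide more detailed error messages.

Specification #

In OTP 23 and earlier, when a call to a built-in function (BIF) fails, the shell prints a terse message: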

1> element(a,b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)

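The bad argument message tells us that one or more of the arguments of the call were incorrect in some way (in this example, both arguments have the wrong type).

We propose a mechanism that lets the shell print more helpful error messages. Here is how the message is printed with the reference implementation of this EEP: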

1> element(a, b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)
        *** argument 1: not an integer
        *** argument 2: not a tuple

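Note that the exact format and wording of the messages are implementation details outside the scope of this EEP. What is specified here is the API and the conventions that make these messages possible.

Proposals in this EEP #

An extension of the call stack backtrace (stacktrace) format to indicate that extended error information is available for the call, together with a convention for how the extended error information is provided.
A new erlang:error/3 BIF that allows libraries and applications to raise exceptions with extended error information in the stacktrace.
New functions erl_error:format_exception/3 and erl_error:format_exception/4 that allow libraries and applications to format stacktraces in the same style as the shell.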

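Extending the stacktrace #

The stacktrace is currently a list of tuples. For the purposes of this EEP, we are only interested in the first entry in the stacktrace. It has the form {Module,Function,Arguments,ExtraInfo}, where ExtraInfo is a list of two-tuples. To indicate that extended error information is available, we propose adding an {error_info,ErrorInfoMap} tuple to the ExtraInfo of the first entry in the stacktrace.

The map ErrorInfoMap may contain further information about the error, or hints about how to handle it.

Currently, three optional keys have a defined meaning:

The value of the key module is the name of a module that can be called to provide additional information about the error. The default is the Module in the stacktrace entry.
The value of the key function is the name of the function to call in the module that provides the error information. The default name is format_error.
The value of the key cause, if present, provides additional information about the error.

To obtain more information about the error, call the function named by the values of the module and function keys. For brevity, the rest of this document refers to that function as format_error/2.

The arguments to format_error/2 are the exception reason (usually badarg for BIF calls) and the stacktrace.

Thus, if a call to element/2 fails with a badarg exception and the first entry in the stacktrace is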

{erlang,element,[1,no_tuple],[{error_info,ErrorInfoMap}]}

and assuming that the stacktrace is bound to the variable StackTrace, the following call provides additional information about the error:

FormatModule = maps:get(module, ErrorInfoMap, erlang),
FormatFunction = maps:get(function, ErrorInfoMap, format_error),
FormatModule:FormatFunction(badarg, StackTrace)
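The lookup above can be wrapped in a small helper. The following is only a sketch: the module name error_info_lookup and the function extended_info/2 are hypothetical names chosen for illustration, and the helper simply applies the defaults described in this section (the Module of the stacktrace entry and the function name format_error).

```erlang
-module(error_info_lookup).
-export([extended_info/2]).

%% Hypothetical helper, not part of OTP.
%% Returns the map produced by the formatting callback, or an empty map
%% when the first stacktrace entry carries no error_info tuple.
extended_info(Reason, [{Module, _Function, _ArgsOrArity, ExtraInfo} | _] = StackTrace) ->
    case lists:keyfind(error_info, 1, ExtraInfo) of
        {error_info, ErrorInfoMap} when is_map(ErrorInfoMap) ->
            %% Apply the defaults specified for the module and function keys.
            FormatModule = maps:get(module, ErrorInfoMap, Module),
            FormatFunction = maps:get(function, ErrorInfoMap, format_error),
            FormatModule:FormatFunction(Reason, StackTrace);
        _ ->
            #{}
    end;
extended_info(_Reason, _StackTrace) ->
    #{}.
```

A caller would typically invoke it from a catch clause of the form Class:Reason:StackTrace and pass Reason and StackTrace on unchanged.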

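The format_error/2 function should return a map. For each argument in error, the map should have an element with the argument number as the key (that is, 1 for the first argument, 2 for the second, and so on) and a unicode:chardata() term as the value.

The atoms general and reason may also be returned as keys in the map. general denotes a general error that is not attributed to a particular argument (for example, the badarg for io:format("Hello") when the default device is dead). reason tells the error pretty-printer to print the returned string instead of the error reason. The values for general and reason should be unicode:chardata() terms.

When calling the format_error/2 function, it is possible to modify the error_info map of the first stacktrace entry to add an optional pretty_printer key whose value is an anonymous function of arity 1. The format_error/2 implementation can call that function with any Erlang term, and it must return a unicode:chardata() term containing a formatted representation of the given term.

For example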

Args = [1,no_tuple],
StackTrace = [{erlang, element, Args, [{error_info,Map}]}],
erlang:format_error(badarg, StackTrace)

might return

#{2 => <<"not a tuple">>}

and

Args = [0, b],
StackTrace = [{erlang, element, Args, [{error_info,Map}]}],
erlang:format_error(badarg, StackTrace)

might return

#{1 => <<"out of range">>, 2 => <<"not a tuple">>}

and

Args = ["Hello"],
StackTrace = [{io, format, Args, [{error_info,Map}]}],
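erlang:format_error(badarg, StackTrace)

might return

#{general => "the device has terminated"}

Note that the value of the key cause in ErrorInfoMap, if present, is intended only for use by format_error/2. The actual value for a particular error may change at any time.

The cause key will typically only be present when an error occurs in a BIF that depends on internal state in the runtime system (such as register/2 or the ETS BIFs), or in a BIF with complex arguments (such as system_flag/2), where working out which argument was in error would otherwise be difficult and error-prone.

Here is one way the format_error/2 function for the erlang module could be implemented: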

format_error(ExceptionReason, [{erlang, F, As, Info} | _]) ->
    ErrorInfoMap = proplists:get_value(error_info, Info, #{}),
    Cause = maps:get(cause, ErrorInfoMap, none),
    do_format_error(F, As, ExceptionReason, Cause).

do_format_error(_, _, system_limit, _) ->
    %% The explanation for system_limit is clear enough, so we don't
    %% need any detailed explanations for the arguments.
    #{};
do_format_error(F, As, _, Cause) ->
    do_format_error(F, As, Cause).

do_format_error(element, [Index, Tuple], _) ->
    Arg1 = if
               not is_integer(Index) ->
                   <<"not an integer">>;
               Index =< 0; Index > tuple_size(Tuple) ->
                   <<"out of range">>;
               true ->
                   []
           end,
    Arg2 = if
               not is_tuple(Tuple) -> <<"not a tuple">>;
               true -> []
           end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);

do_format_error(list_to_atom, _, _) ->
    #{1 => <<"not a flat list of characters">>};

do_format_error(register, [Name,PidOrPort], Cause) ->
    [Arg1, Arg2] =
    case Cause of
        registered_name ->
            [[],<<"this process or port already has a name">>];
        notalive ->
            [[],<<"the pid does not refer to an existing process">>];
        _ ->
            Errors =
                [if
                     Name =:= undefined -> <<"'undefined' is not a valid name">>;
                     is_atom(Name) -> [];
                     true -> <<"not an atom">>
                 end,
                 if
                     is_pid(PidOrPort) -> [];
                     is_port(PidOrPort) -> [];
                     true -> <<"not a pid or a port">>
                 end],
            case Errors of
                [[],[]] ->
                    [<<"name is in use">>,[]];
                [_,_] ->
                    Errors
            end
    end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);
      .
      .
      .

do_format_error(_, _, _) ->
    #{}.

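Note that different strategies are used for different BIFs to determine the extended error information:

system_limit exceptions are handled first (regardless of which BIF was called). No extended error information is returned, because the explanation for system_limit is clear enough.
If element/2 fails, the format_error/2 function only examines the arguments of element/2.
If list_to_atom/1 raises a badarg exception, there is only one possible cause, so the argument does not need to be examined.
If the register/2 BIF fails, the value of the cause key gives the specific cause for two of the possible failure reasons. If the cause is not one of those, format_error/2 works out the remaining causes from the arguments.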

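Providing extended error information with `erlang:error/3` #

A library or application can raise an error exception with extended error information by calling erlang:error(Reason, Arguments, Options). Reason should be the error reason (for example, badarg), Arguments should be the arguments of the calling function, and Options should be [{error_info,ErrorInfoMap}].

The caller of erlang:error/3 should provide a format_error/2 function (not necessarily with that name, if ErrorInfoMap has a function key) that behaves as described in the previous section.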
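As a minimal sketch of how a library might follow this convention, consider the module below. The module temperature and its function to_kelvin/1 are hypothetical and exist only for illustration. The ErrorInfoMap is left empty, so the defaults apply: the formatting module is the module in the stacktrace entry and the function is format_error.

```erlang
-module(temperature).
-export([to_kelvin/1, format_error/2]).

%% Hypothetical library function: converts degrees Celsius to kelvin.
to_kelvin(Celsius) when is_number(Celsius), Celsius >= -273.15 ->
    Celsius + 273.15;
to_kelvin(Celsius) ->
    %% Raise badarg with extended error information. No module or
    %% function key is given, so this module's format_error/2 is used.
    erlang:error(badarg, [Celsius], [{error_info, #{}}]).

%% Receives the exception reason and the full stacktrace, and returns
%% a map from argument number to a unicode:chardata() description.
format_error(badarg, [{?MODULE, to_kelvin, [Celsius], _ExtraInfo} | _]) ->
    if
        not is_number(Celsius) -> #{1 => <<"not a number">>};
        true -> #{1 => <<"below absolute zero">>}
    end;
format_error(_Reason, _StackTrace) ->
    #{}.
```

Here format_error/2 re-examines the argument instead of relying on a cause key, which is the same strategy the element/2 example in the previous section uses.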

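Formatting stacktraces #

To allow applications and libraries to format stacktraces in the same style as the shell, the functions erl_error:format_exception/3 and erl_error:format_exception/4 are provided. Here is an example of how erl_error:format_exception/3 can be used: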

try
    .
    .
    .
catch
    C:R:Stk ->
        Message = erl_error:format_exception(C, R, Stk),
        io:format(LogFile, "~ts\n", [Message])
end.

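The erl_error:format_exception/4 function is similar, but has a fourth options argument to support customizing the message. See the documentation in the reference implementation for details.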
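The following sketch illustrates one possible use of the options argument. It assumes the format_fun option as documented for erl_error in OTP 24, a fun that takes a term and a column and returns the printed term; check the erl_error documentation for your release before relying on it. The module name exception_text is hypothetical.

```erlang
-module(exception_text).
-export([format/3]).

%% Hypothetical helper: formats an exception in the style of the shell,
%% but prints every term with a depth limit so that very large
%% arguments do not flood a log file.
format(Class, Reason, StackTrace) ->
    Options = #{format_fun => fun(Term, _Column) ->
                                      %% ~tP prints Term down to the given depth.
                                      io_lib:format("~tP", [Term, 10])
                              end},
    Text = erl_error:format_exception(Class, Reason, StackTrace, Options),
    unicode:characters_to_binary(Text).
```

The result is returned as a binary here only to make it convenient to store or send; the chardata returned by erl_error:format_exception/4 can equally be passed straight to io:format/3 with ~ts.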

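Possible future extensions #

Because the error_info tuple in the stacktrace contains a map, more data can be added to the map in future extensions of this EEP.

Similarly, because the return value of format_error/2 is a map, other keys in the map can be given meanings in the future.

For example, the value of a key hint could be a longer message that gives more context or concrete advice on how to investigate or avoid the error.

More examples #

Let's look at some examples using ETS: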

1> T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5168>
2> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 2: not a key that exists in the table

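Note that when an error occurs while evaluating an expression entered in the shell, the evaluator process terminates and any ETS tables created by that process are deleted. Therefore, calling update_counter a second time with the same arguments results in a different message: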

3> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 1: the table identifier does not refer to an existing ETS table

Starting over, creating a new ETS table:

4> f(T), T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5205>
5> ets:insert(T, {k,a,0}).
true
6> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,1)
        *** argument 3: the value in the given position in the object is not an integer
7> ets:update_counter(T, k, bad).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,bad)
        *** argument 1: the table identifier does not refer to an existing ETS table
        *** argument 3: not a valid update operation

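Motivation #

When a call to a BIF fails with the reason badarg, it is not always obvious, even to an experienced developer, which argument was "bad" and in what way. For a newcomer, having to figure out what badarg means is yet another stumbling block in mastering a new language.

Even for an experienced developer, figuring out the reason for a badarg exception from some BIFs is difficult or impossible. For example, at the time of writing, the documentation for ets:update_counter/4 lists 8 situations in which ets:update_counter/4 will fail. That number is too low. Reasons missing from the list include, for example, that the ETS table has been deleted or that access rights are insufficient.

The general return key was added to allow information about the default I/O device in io:format("hello") to be given. It also gives third-party error_report implementations (for example, Elixir) more freedom in what they can return.

The reason return key was added to let third-party error_report implementations (for example, Elixir) influence what is printed to describe the actual error.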

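Rationale #

Why not change `badarg` to something more informative? #

Another way to provide more information about errors would be to introduce additional exception reasons. For example, the call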

element(a, b)

could raise the exception

{badarg,[{1,not_integer},{2,not_tuple}]}

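That change could break code that expects BIFs to raise a badarg exception. Existing code is much less likely to match on the fourth element (ExtraInfo) of a stacktrace entry.

A related reason is the amount of work needed to revise the error-handling code of all built-in functions. Building Erlang terms in C is tedious and error-prone. There is always a risk that a bug in that code crashes the runtime system when an error occurs. The test suite would have to be very thorough to ensure that all bugs are found, because error-handling code is typically executed infrequently.

Why can't the stacktrace contain the complete error reason? #

We did consider modifying the implementation of all BIFs so that they would produce complete error information in the stacktrace when they fail. However, as previously mentioned, building Erlang terms in C is tedious and error-prone.

With the approach we have taken, where Erlang code does most of the analysis of the error cause, the risk that error handling crashes the application or the runtime system is much lower.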

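Why is the map key named `cause` and not `reason`? #

To avoid confusion with the exception reason.

Why is the value of `cause` in `ErrorInfoMap` undocumented? #

The cause in ErrorInfoMap is not intended to be used to find out programmatically why an error occurred, but only by Module:format_error/2 to produce a human-readable message.

Furthermore, for many BIFs the cause key will not be present, because the Module:format_error/2 function will produce the message based only on the name of the BIF and its arguments.

What is the `module` key for? #

In OTP, all format_error/2 functions will live in modules separate from the implementing modules, to reduce the size of OTP on systems with limited storage. Using the module key avoids the need for a redirecting format_error/2 function in the implementing module.
A library or application might want a single module to implement format_error/2 for multiple modules. For example, in OTP we might have the module erl_stdlib_errors implementing format_error/2 for the modules binary, ets, lists, maps, string, and unicode.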

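What is the `function` key for? #

A module might already have a function named format_error/2.
In the future, we might want to extend the compiler to generate its own format_error/2 function to provide more information about badmatch or function_clause errors.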

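Backwards compatibility #

All exceptions from BIFs will now have an ExtraInfo element (called Location in the OTP 23 documentation) in the call stack backtrace (stacktrace) that includes an error_info tuple. In previous releases, the ExtraInfo element for a failed BIF call was an empty list.

Applications that explicitly match on the stacktrace and make assumptions about the layout of the ExtraInfo element (for example, assuming that Location is either an empty list or a list of file and line tuples in a particular order) might need to be modified. Note that such assumptions have never been safe, and that the documentation for error handling strongly discourages developers from relying on stacktrace entries for anything other than debugging.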
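Code that really does need to inspect ExtraInfo can avoid depending on its layout by treating it as a property list. The sketch below is one way to do that; the module stack_location and the function line/1 are hypothetical names used only for illustration.

```erlang
-module(stack_location).
-export([line/1]).

%% Hypothetical helper: returns {ok, Line} if the first stacktrace entry
%% carries a line tuple, and error otherwise. It looks the key up instead
%% of pattern matching on the exact shape of ExtraInfo, so extra tuples
%% such as error_info do not break it.
line([{_Module, _Function, _ArgsOrArity, ExtraInfo} | _]) ->
    case proplists:get_value(line, ExtraInfo) of
        undefined -> error;
        Line -> {ok, Line}
    end;
line(_StackTrace) ->
    error.
```

A pattern such as [{M,F,A,[]} | _], by contrast, stops matching failed BIF calls as soon as the error_info tuple is present.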

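Implementation #

The reference implementation includes extended error information for most BIFs implemented in C in the erlang and ets modules. It can be found in PR #2849.

Copyright #

This document has been placed in the public domain.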
