Robustness
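Several things are wrong with the messenger example in A Larger Example. For example, if a node where a user is logged on goes down without doing a logoff, the user remains in the server's User_List, but the client disappears. This makes it impossible for the user to log on again, because the server thinks the user is already logged on.
Or what happens if the server goes down in the middle of sending a message, leaving the sending client hanging forever in the await_result function?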
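Time-outs
Before improving the messenger program, let us look at some general principles, using the ping pong program as an example. Recall that when "ping" finishes, it tells "pong" that it has done so by sending the atom finished as a message, so that "pong" can also finish. Another way to let "pong" finish is to make it exit if it does not receive a message from "ping" within a certain time. This can be done by adding a time-out to pong, as shown in the following example: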
-module(tut19).

-export([start_ping/1, start_pong/0, ping/2, pong/0]).

%% The node name is not needed in the final clause, so it is
%% prefixed with an underscore to avoid an unused-variable warning.
ping(0, _Pong_Node) ->
    io:format("ping finished~n", []);

ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.

start_pong() ->
    register(pong, spawn(tut19, pong, [])).

start_ping(Pong_Node) ->
    spawn(tut19, ping, [3, Pong_Node]).
After this is compiled and the file tut19.beam is copied to the necessary directories, the following is seen on (pong@kosken):
(pong@kosken)1> tut19:start_pong().
true
Pong received ping
Pong received ping
Pong received ping
Pong timed out
And the following is seen on (ping@gollum):
(ping@gollum)1> tut19:start_ping(pong@kosken).
<0.36.0>
Ping received pong
Ping received pong
Ping received pong
ping finished
The time-out is set in:
pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.
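The time-out (after 5000) is started when receive is entered. The time-out is cancelled if {ping,Ping_PID} is received. If {ping,Ping_PID} is not received, the actions following the time-out are done after 5000 milliseconds. after must be last in the receive, that is, preceded by all other message reception specifications in the receive. It is also possible to call a function that returns an integer for the time-out: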
after pong_timeout() ->
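In general, there are better ways than using time-outs to supervise parts of a distributed Erlang system. Time-outs are usually appropriate for supervising external events, for example, if you expect a message from some external system within a specified time. For example, a time-out can be used to log a user out of the messenger system if they have not accessed it for ten minutes.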
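The idle log-out idea can be sketched as follows. This is a minimal illustration, not part of the messenger program: the module name idle_demo, the message {activity, What}, and the helper idle_timeout/0 are all assumptions made for this example, and the limit is set to one second rather than ten minutes so that the effect is easy to observe.

```erlang
-module(idle_demo).
-export([start/0, session/0, idle_timeout/0]).

%% Hypothetical helper: the idle limit in milliseconds.
%% One second is used here only to keep the demonstration short.
idle_timeout() ->
    1000.

start() ->
    spawn(idle_demo, session, []).

%% Each time session/0 is called again, receive is re-entered and
%% the time-out starts counting from zero.
session() ->
    receive
        {activity, What} ->
            io:format("Session saw ~p~n", [What]),
            session()
    after idle_timeout() ->
            io:format("Session idle, logging off~n", [])
    end.
```

Sending {activity, hello} to the pid returned by idle_demo:start() keeps the session alive; once no message arrives within the limit, the process prints its log-off line and terminates.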
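Error Handling
Before going into the details of supervision and error handling in an Erlang system, let us look at how Erlang processes terminate, or in Erlang terminology, exit.
A process which executes exit(normal) or simply runs out of things to do has a normal exit.
A process which encounters a runtime error (for example, divide by zero, bad match, trying to call a function that does not exist, and so on) exits with an error, that is, has an abnormal exit. A process which executes exit(Reason), where Reason is any Erlang term except the atom normal, also has an abnormal exit.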
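An Erlang process can set up links to other Erlang processes. If a process calls link(Other_Pid), it sets up a bidirectional link between itself and the process called Other_Pid. When a process terminates, it sends something called a signal to all the processes it has links to.
The signal carries information about the pid it was sent from and the exit reason.
The default behaviour of a process that receives a normal exit is to ignore the signal.
The default behaviour in the two other cases above (that is, abnormal exit) is to:
- Bypass all messages to the receiving process.
- Kill the receiving process.
- Propagate the same error signal to the links of the killed process.
In this way you can connect all processes in a transaction together using links. If one of the processes exits abnormally, all the processes in the transaction are killed. As it is often necessary to create a process and link to it at the same time, there is a special BIF, spawn_link, that does the same as spawn, but also creates a link to the spawned process.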
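The propagation of an abnormal exit over a link created by spawn_link can be seen in a small local sketch. The module name link_demo and its functions are made up for this illustration; the parent would wait forever for a stop message if the link did not bring it down.

```erlang
-module(link_demo).
-export([start/0, parent/0, child/0]).

start() ->
    spawn(link_demo, parent, []).

%% The parent creates the child and the link in one step, then waits
%% for a message that is never sent in this demonstration.
parent() ->
    spawn_link(link_demo, child, []),
    receive
        stop ->
            ok
    end.

%% The child exits abnormally; the exit signal travels over the link
%% and kills the parent, which does not trap exits.
child() ->
    exit(crashed).
```

After calling Pid = link_demo:start() and waiting a moment, is_process_alive(Pid) returns false, because the abnormal exit of the child has also terminated the parent.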
Now an example of the ping pong example using links to terminate "pong":
-module(tut20).

-export([start/1, ping/2, pong/0]).

ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).

ping1(0, _) ->
    exit(ping);

ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start(Ping_Node) ->
    PongPID = spawn(tut20, pong, []),
    spawn(Ping_Node, tut20, ping, [3, PongPID]).
(s1@bill)3> tut20:start(s2@kosken).
Pong received ping
<3820.41.0>
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong
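This is a slight modification of the ping pong program where both processes are spawned from the same start/1 function, and the "ping" process can be spawned on a separate node. Notice the use of the link BIF. "Ping" calls exit(ping) when it finishes, and this causes an exit signal to be sent to "pong", which also terminates.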
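It is possible to modify the default behaviour of a process so that it does not get killed when it receives abnormal exit signals. Instead, all signals are turned into normal messages of the format {'EXIT',FromPID,Reason} and added to the end of the receiving process' message queue. This behaviour is set by: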
process_flag(trap_exit, true)
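There are several other process flags, see erlang(3). Changing the default behaviour of a process in this way is usually not done in standard user programs, but is left to the supervisory programs in OTP. However, the ping pong program is modified here to illustrate exit trapping.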
-module(tut21).

-export([start/1, ping/2, pong/0]).

ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).

ping1(0, _) ->
    exit(ping);

ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).

pong() ->
    process_flag(trap_exit, true),
    pong1().

pong1() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong1();
        {'EXIT', From, Reason} ->
            io:format("pong exiting, got ~p~n", [{'EXIT', From, Reason}])
    end.

start(Ping_Node) ->
    PongPID = spawn(tut21, pong, []),
    spawn(Ping_Node, tut21, ping, [3, PongPID]).
(s1@bill)1> tut21:start(s2@gollum).
<3820.39.0>
Pong received ping
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong
pong exiting, got {'EXIT',<3820.39.0>,ping}
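Exit trapping can also be tried on a single node. The following sketch uses a made-up module trap_demo; it links to two short-lived workers, one exiting normally and one abnormally, and collects the reasons from the resulting 'EXIT' messages. It restores the caller's previous trap_exit setting before returning, which is a choice made for this example so that a shell process calling it is left unchanged.

```erlang
-module(trap_demo).
-export([run/0, worker/1]).

%% A worker that terminates at once with the given reason.
worker(Reason) ->
    exit(Reason).

run() ->
    %% process_flag/2 returns the previous value of the flag.
    Old = process_flag(trap_exit, true),
    Pid1 = spawn_link(trap_demo, worker, [normal]),
    Reason1 = receive {'EXIT', Pid1, Why1} -> Why1 end,
    Pid2 = spawn_link(trap_demo, worker, [oops]),
    Reason2 = receive {'EXIT', Pid2, Why2} -> Why2 end,
    process_flag(trap_exit, Old),
    {Reason1, Reason2}.
```

Calling trap_demo:run() returns {normal, oops}: while exits are trapped, both normal and abnormal exit signals arrive as ordinary messages, and the calling process survives.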
The Larger Example with Robustness Added
Let us return to the messenger program and add changes to make it more robust:
%%% Message passing utility.
%%% User interface:
%%% logon(Name)
%%%     One user at a time can log on from each Erlang node in the
%%%     system messenger: and choose a suitable Name. If the Name
%%%     is already logged on at another node or if someone else is
%%%     already logged on at the same node, logon will be rejected
%%%     with a suitable error message.
%%% logoff()
%%%     Logs off anybody at that node
%%% message(ToName, Message)
%%%     sends Message to ToName. Error messages if the user of this
%%%     function is not logged on or if ToName is not logged on at
%%%     any node.
%%%
%%% One node in the network of Erlang nodes runs a server which maintains
%%% data about the logged on users. The server is registered as "messenger"
%%% Each node where there is a user logged on runs a client process registered
%%% as "mess_client"
%%%
%%% Protocol between the client processes and the server
%%% ----------------------------------------------------
%%%
%%% To server: {ClientPid, logon, UserName}
%%% Reply {messenger, stop, user_exists_at_other_node} stops the client
%%% Reply {messenger, logged_on} logon was successful
%%%
%%% When the client terminates for some reason
%%% To server: {'EXIT', ClientPid, Reason}
%%%
%%% To server: {ClientPid, message_to, ToName, Message} send a message
%%% Reply: {messenger, stop, you_are_not_logged_on} stops the client
%%% Reply: {messenger, receiver_not_found} no user with this name logged on
%%% Reply: {messenger, sent} Message has been sent (but no guarantee)
%%%
%%% To client: {message_from, Name, Message},
%%%
%%% Protocol between the "commands" and the client
%%% ----------------------------------------------
%%%
%%% Started: messenger:client(Server_Node, Name)
%%% To client: logoff
%%% To client: {message_to, ToName, Message}
%%%
%%% Configuration: change the server_node() function to return the
%%% name of the node where the messenger server runs
-module(messenger).
-export([start_server/0, server/0,
         logon/1, logoff/0, message/2, client/2]).

%%% Change the function below to return the name of the node where the
%%% messenger server runs
server_node() ->
    messenger@super.

%%% This is the server process for the "messenger"
%%% the user list has the format [{ClientPid1, Name1},{ClientPid2, Name2},...]
server() ->
    process_flag(trap_exit, true),
    server([]).

server(User_List) ->
    receive
        {From, logon, Name} ->
            New_User_List = server_logon(From, Name, User_List),
            server(New_User_List);
        {'EXIT', From, _} ->
            New_User_List = server_logoff(From, User_List),
            server(New_User_List);
        {From, message_to, To, Message} ->
            server_transfer(From, To, Message, User_List),
            io:format("list is now: ~p~n", [User_List]),
            server(User_List)
    end.

%%% Start the server
start_server() ->
    register(messenger, spawn(messenger, server, [])).

%%% Server adds a new user to the user list
server_logon(From, Name, User_List) ->
    %% check if logged on anywhere else
    case lists:keymember(Name, 2, User_List) of
        true ->
            From ! {messenger, stop, user_exists_at_other_node},  %reject logon
            User_List;
        false ->
            From ! {messenger, logged_on},
            link(From),
            [{From, Name} | User_List]        %add user to the list
    end.

%%% Server deletes a user from the user list
server_logoff(From, User_List) ->
    lists:keydelete(From, 1, User_List).

%%% Server transfers a message between users
server_transfer(From, To, Message, User_List) ->
    %% check that the user is logged on and who he is
    case lists:keysearch(From, 1, User_List) of
        false ->
            From ! {messenger, stop, you_are_not_logged_on};
        {value, {_, Name}} ->
            server_transfer(From, Name, To, Message, User_List)
    end.

%%% If the user exists, send the message
server_transfer(From, Name, To, Message, User_List) ->
    %% Find the receiver and send the message
    case lists:keysearch(To, 2, User_List) of
        false ->
            From ! {messenger, receiver_not_found};
        {value, {ToPid, To}} ->
            ToPid ! {message_from, Name, Message},
            From ! {messenger, sent}
    end.

%%% User Commands
logon(Name) ->
    case whereis(mess_client) of
        undefined ->
            register(mess_client,
                     spawn(messenger, client, [server_node(), Name]));
        _ -> already_logged_on
    end.

logoff() ->
    mess_client ! logoff.

message(ToName, Message) ->
    case whereis(mess_client) of % Test if the client is running
        undefined ->
            not_logged_on;
        _ -> mess_client ! {message_to, ToName, Message},
             ok
    end.

%%% The client process which runs on each user node
client(Server_Node, Name) ->
    {messenger, Server_Node} ! {self(), logon, Name},
    await_result(),
    client(Server_Node).

client(Server_Node) ->
    receive
        logoff ->
            exit(normal);
        {message_to, ToName, Message} ->
            {messenger, Server_Node} ! {self(), message_to, ToName, Message},
            await_result();
        {message_from, FromName, Message} ->
            io:format("Message from ~p: ~p~n", [FromName, Message])
    end,
    client(Server_Node).

%%% wait for a response from the server
await_result() ->
    receive
        {messenger, stop, Why} -> % Stop the client
            io:format("~p~n", [Why]),
            exit(normal);
        {messenger, What} ->  % Normal response
            io:format("~p~n", [What])
    after 5000 ->
            io:format("No response from server~n", []),
            exit(timeout)
    end.
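The following changes are added:
The messenger server traps exits. If it receives an exit signal, {'EXIT',From,Reason}, this means that a client process has terminated or is unreachable for one of the following reasons:
- The user has logged off (the "logoff" message to the server has been removed).
- The network connection to the client is broken.
- The node on which the client process resides has gone down.
- The client process has done some illegal operation.
If an exit signal is received as above, the tuple {From,Name} is deleted from the server's User_List using the server_logoff function. If the node on which the server runs goes down, an exit signal (automatically generated by the system) is sent to all of the client processes: {'EXIT',MessengerPID,noconnection}, causing all the client processes to terminate.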
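Also, a time-out of five seconds has been introduced in the await_result function. That is, if the server does not reply within five seconds (5000 ms), the client terminates. This is only needed in the logon sequence before the client and the server are linked.
An interesting case is if the client terminates before the server links to it. This is taken care of, because linking to a non-existent process causes an exit signal, {'EXIT',From,noproc}, to be automatically generated. This is as if the process terminated immediately after the link operation.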
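The noproc case can be reproduced locally with a short sketch. The module noproc_demo and its functions are made up for this illustration, and the example assumes, as the messenger server does, that the linking process traps exits; a process that does not trap exits may instead see link/1 fail when the target is a local process that no longer exists.

```erlang
-module(noproc_demo).
-export([run/0, short_lived/0]).

%% A process that has nothing to do and terminates at once.
short_lived() ->
    ok.

run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn(noproc_demo, short_lived, []),
    timer:sleep(100),              % give the process time to terminate
    link(Pid),                     % Pid no longer exists at this point
    Result = receive
                 {'EXIT', Pid, Reason} ->
                     Reason
             after 1000 ->
                     no_exit_message
             end,
    process_flag(trap_exit, Old),  % restore the caller's setting
    Result.
```

Under these assumptions noproc_demo:run() is expected to return noproc, which is the same kind of message the server handles in its {'EXIT', From, _} clause when a client disappears before link(From) is executed.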