[Ray] ray.remote and options
创始人
2024-02-10 04:36:12

https://docs.ray.io/en/latest/ray-core/package-ref.html?highlight=ray.remote#ray-remote

1 ray.remote

Defines a remote function or an actor class.
ray.remote also accepts options for things such as restart behavior and resource allocation.

Usage 1: as a decorator

Apply it as a decorator to a function or a class.
For example:

>>> import ray
>>>
>>> @ray.remote
... def f(a, b, c):
...     return a + b + c
>>>
>>> object_ref = f.remote(1, 2, 3)
>>> result = ray.get(object_ref)
>>> assert result == (1 + 2 + 3)
>>>
>>> @ray.remote
... class Foo:
...     def __init__(self, arg):
...         self.x = arg
...
...     def method(self, a):
...         return self.x + a
>>>
>>> actor_handle = Foo.remote(123)
>>> object_ref = actor_handle.method.remote(321)
>>> result = ray.get(object_ref)
>>> assert result == (123 + 321)

Usage 2: as a function call

Call ray.remote directly to create a remote function or actor.

>>> def g(a, b, c):
...     return a + b + c
>>>
>>> remote_g = ray.remote(g)
>>> object_ref = remote_g.remote(1, 2, 3)
>>> assert ray.get(object_ref) == (1 + 2 + 3)
>>>
>>> class Bar:
...     def __init__(self, arg):
...         self.x = arg
...
...     def method(self, a):
...         return self.x + a
>>>
>>> RemoteBar = ray.remote(Bar)
>>> actor_handle = RemoteBar.remote(123)
>>> object_ref = actor_handle.method.remote(321)
>>> result = ray.get(object_ref)
>>> assert result == (123 + 321)

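2 options

Overrides, for a particular invocation, the parameters that were set when the remote function or actor was defined. The accepted arguments are the same as those that can be passed to ray.remote. Overriding max_calls is not supported.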

>>> @ray.remote(num_gpus=1, max_calls=1, num_returns=2)
... def f():
...     return 1, 2
>>>
>>> f_with_2_gpus = f.options(num_gpus=2)
>>> object_refs = f_with_2_gpus.remote()
>>> assert ray.get(object_refs) == [1, 2]
>>>
>>> @ray.remote(num_cpus=2, resources={"CustomResource": 1})
... class Foo:
...     def method(self):
...         return 1
>>>
>>> Foo_with_no_resources = Foo.options(num_cpus=1, resources=None)
>>> foo_actor = Foo_with_no_resources.remote()
>>> assert ray.get(foo_actor.method.remote()) == 1

3 ray.remote parameters

    num_returns – This is only for remote functions. It specifies the
    number of object refs returned by the remote function invocation.
    Pass “dynamic” to allow the task to decide how many return values to
    return during execution, and the caller will receive an
    ObjectRef[ObjectRefGenerator] (note, this setting is experimental).

    num_cpus – The quantity of CPU cores to reserve for this task or for
    the lifetime of the actor.

    num_gpus – The quantity of GPUs to reserve for this task or for the
    lifetime of the actor.

    resources (Dict[str, float]) – The quantity of various custom
    resources to reserve for this task or for the lifetime of the actor.
    This is a dictionary mapping strings (resource names) to floats.

    accelerator_type – If specified, requires that the task or actor run
    on a node with the specified type of accelerator. See
    ray.accelerators for accelerator types.

    memory – The heap memory request for this task/actor.

    max_calls – Only for remote functions. This specifies the maximum
    number of times that a given worker can execute the given remote
    function before it must exit (this can be used to address memory
    leaks in third-party libraries or to reclaim resources that cannot
    easily be released, e.g., GPU memory that was acquired by
    TensorFlow). By default this is infinite.

    max_restarts – Only for actors. This specifies the maximum number of
    times that the actor should be restarted when it dies unexpectedly.
    The minimum valid value is 0 (default), which indicates that the
    actor doesn’t need to be restarted. A value of -1 indicates that an
    actor should be restarted indefinitely.

    max_task_retries – Only for actors. How many times to retry an actor
    task if the task fails due to a system error, e.g., the actor has
    died. If set to -1, the system will retry the failed task until the
    task succeeds, or the actor has reached its max_restarts limit. If
    set to n > 0, the system will retry the failed task up to n times,
    after which the task will throw a RayActorError exception upon
    ray.get. Note that Python exceptions are not considered system
    errors and will not trigger retries.

    max_retries – Only for remote functions. This specifies the maximum
    number of times that the remote function should be rerun when the
    worker process executing it crashes unexpectedly. The minimum valid
    value is 0, the default is 3, and a value of -1 indicates
    infinite retries.

    runtime_env (Dict[str, Any]) – Specifies the runtime environment for
    this actor or task and its children. See Runtime environments for
    detailed documentation. This API is in beta and may change before
    becoming stable.

    retry_exceptions – Only for remote functions. This specifies whether
    application-level errors should be retried up to max_retries times.
    This can be a boolean or a list of exceptions that should be
    retried.

    scheduling_strategy – Strategy about how to schedule a remote
    function or actor. Possible values are None: ray will figure out the
    scheduling strategy to use, it will either be the
    PlacementGroupSchedulingStrategy using parent’s placement group if
    parent has one and has placement_group_capture_child_tasks set to
    true, or “DEFAULT”; “DEFAULT”: default hybrid scheduling; “SPREAD”:
    best effort spread scheduling; PlacementGroupSchedulingStrategy:
    placement group based scheduling.

    _metadata – Extended options for Ray libraries. For example, _metadata={“workflows.io/options”: } for Ray workflows.
