An Introduction to the Kubeflow Artifact Store

Date: 2020-11-01

Artifacts are a tool for presenting result information, serving mainly Kubeflow notebook servers and Kubeflow pipelines. This article introduces them through their applications.

Metadata

The Kubeflow artifact store was originally called the metadata store; its role is to record and manage the metadata of Kubeflow machine learning workflows.
To record metadata in your project, you need the dedicated Metadata SDK, which can be installed in Python with pip:

pip install kubeflow-metadata

Metadata SDK Overview
By default, the SDK is configured to work with the Kubeflow Metadata gRPC service. This section briefly introduces the SDK's main APIs.
First, the SDK's important basic classes:
Class Store

class Store(object):
  """Metadata Store that connects to the Metadata gRPC service."""
 
  def __init__(self,
               grpc_host: str = "metadata-grpc-service.kubeflow",
               grpc_port: int = 8080,
               root_certificates: Optional[bytes] = None,
               private_key: Optional[bytes] = None,
               certificate_chain: Optional[bytes] = None):
    """ Args: grpc_host: Required gRPC service host, e.g."metadata-grpc-service.kubeflow". grpc_host: Required gRPC service port. root_certificates: Optional SSL certificate for secure connection. private_key: Optional private_key for secure connection. certificate_chain: Optional certificate_chain for secure connection. The optional parameters are the same as in grpc.ssl_channel_credentials. https://grpc.github.io/grpc/python/grpc.html#grpc.ssl_channel_credentials """

The Store class defines the connection to the metadata gRPC service. Since current Kubeflow releases ship the full set of components, connecting with the default service name and port is sufficient.
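A minimal sketch of creating a store with the in-cluster defaults (this assumes a running Kubeflow Metadata gRPC service reachable under the default service name):

```python
from kubeflow.metadata import metadata

# Connect to the in-cluster Metadata gRPC service using the defaults.
store = metadata.Store(
    grpc_host="metadata-grpc-service.kubeflow",
    grpc_port=8080)
```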
Class Workspace

class Workspace(object):
  """Groups a set of runs of pipelines, notebooks and their related artifacts and executions. """
  CONTEXT_TYPE_NAME = "kubeflow.org/alpha/workspace"
 
  def __init__(self,
               store: Store = None,
               name: str = None,
               description: Optional[str] = None,
               labels: Optional[Mapping[str, str]] = None,
               reuse_workspace_if_exists: Optional[bool] = True,
               backend_url_prefix: Optional[str] = None):
    """ Args: store: Required store object to connect to MLMD gRPC service. name: Required name for the workspace. description: Optional string for description of the workspace. labels: Optional key/value string pairs to label the workspace. reuse_workspace_if_exists: Optional boolean value to indicate whether a workspace of the same name should be reused. backend_url_prefix: Deprecated. Please use 'store' parameter. Raises: ValueError: If a workspace of the same name already exists and `reuse_workspace_if_exists` is set to False. """

The Workspace class defines a workspace object, under which you can record workflow runs from pipelines and notebooks, along with their parameters and metrics.
Setting the reuse_workspace_if_exists parameter lets you reuse an existing workspace of the same name.
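A sketch of creating (or reusing) a workspace; the workspace name, description, and labels below are illustrative, and a reachable Metadata gRPC service is assumed:

```python
from kubeflow.metadata import metadata

store = metadata.Store(
    grpc_host="metadata-grpc-service.kubeflow", grpc_port=8080)

# reuse_workspace_if_exists defaults to True, so re-running this
# cell attaches to the existing workspace instead of failing.
ws = metadata.Workspace(
    store=store,
    name="demo-workspace",                  # illustrative name
    description="a workspace for demo runs",
    labels={"team": "demo"})
```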
Class Run

class Run(object):
  """Run captures a run of pipeline or notebooks in a workspace and group executions. """
 
  def __init__(self,
               workspace: Workspace = None,
               name: str = None,
               description: Optional[str] = None):
    """ Args: workspace: Required workspace object to which this run belongs. name: Required name of this run. description: Optional description. """

The Run class defines a run object within a workspace; multiple task executions can be recorded under it.
Classes in the SDK that describe workflow information:
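A sketch of creating a run; this assumes `ws` is a Workspace object created as above, and the timestamped run name is just one way to keep names unique:

```python
from datetime import datetime
from kubeflow.metadata import metadata

# Assumes `ws` is a Workspace created earlier.
run = metadata.Run(
    workspace=ws,
    name="run-" + datetime.utcnow().isoformat("T"),  # illustrative naming scheme
    description="a demo run")
```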
Class Execution

class Execution(object):
  """Captures a run of pipeline or notebooks in a workspace and group executions. Execution also serves as object for logging artifacts as its input or output. """
  EXECUTION_TYPE_NAME = "kubeflow.org/alpha/execution"
 
  def __init__(self,
               name: str = None,
               workspace: Workspace = None,
               run: Optional[Run] = None,
               description: Optional[str] = None):
    """ Args: name: Required name of this run. workspace: Required workspace object where this execution belongs to. run: Optional run object. description: Optional description. Creates a new execution in a workspace and run. The execution.log_XXX() methods will attach corresponding artifacts as the input or output of this execution. """

The Execution class can be used to log input and output artifact information.
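A sketch of creating an execution and attaching artifacts to it via the log_input()/log_output() methods; `ws` and `run` are assumed to have been created as in the previous sketches, and all names and URIs are illustrative:

```python
from kubeflow.metadata import metadata

# Assumes `ws` and `run` were created earlier.
execution = metadata.Execution(
    name="execution-demo",
    workspace=ws,
    run=run,
    description="a training step")

# Attach artifacts as this execution's input and output.
data_set = execution.log_input(
    metadata.DataSet(
        name="mytable-dump",
        uri="file://path/to/dataset",
        version="v1.0.0"))
model = execution.log_output(
    metadata.Model(
        name="MNIST",
        uri="gcs://my-bucket/mnist",
        version="v0.0.1"))
```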
Class DataSet

class DataSet(Artifact):
  """ Dataset captures a data set in a machine learning workflow. Attributes: uri: Required uri of the data set. name: Required name of the data set. workspace: Optional name of the workspace. description: Optional description of the data set. owner: Optional owner of the data set. version: Optional version tagged by the user. query: Optional query string on how this data set being fetched from a data source. labels: Optional string key value pairs for labels. Example: >>> metadata.DataSet(description="an example data", ... name="mytable-dump", ... owner="owner@my-company.org", ... uri="file://path/to/dataset", ... version="v1.0.0", ... query="SELECt * FROM mytable", ... labels={"label1","val1"})) """

The DataSet class records information about input and output data sets.
Class Model

class Model(Artifact):
  """Captures a machine learning model. Attributes: uri: Required uri of the model artifact, e.g. "gcs://path/to/model.h5". name: Required name of the model. workspace: Optional name of the workspace. description: Optional description of the model. owner: Optional owner of the model. model_type: Optional type of the model. training_framework: Optional framework used to train the model. hyperparameters: Optional map from hyper param name to its value. labels: Optional string key value pairs for labels. kwargs: Optional additional keyword arguments are saved as additional properties of this model. Example: >>> metadata.Model(name="MNIST", ... description="model to recognize handwritten digits", ... owner="someone@kubeflow.org", ... uri="gcs://my-bucket/mnist", ... model_type="neural network", ... training_framework={ ... "name": "tensorflow", ... "version": "v1.0" ... }, ... hyperparameters={ ... "learning_rate": 0.5, ... "layers": [10, 3, 1], ... "early_stop": True ... }, ... version="v0.0.1", ... labels={"mylabel": "l1"})) """

The Model class records information about output models.
Class Metrics

class Metrics(Artifact):
  """Captures an evaluation metrics of a model on a data set. Attributes: uri: Required uri of the metrics. name: Required name of the metrics. workspace: Optional name of the workspace. description: Optional description of the metrics. owner: Optional owner of the metrics. data_set_id: Optional id of the data set used for evaluation. model_id: Optional id of a evaluated model. metrics_type: Optional type of the evaluation. values: Optional map from metrics name to its value. labels: Optional string key value pairs for labels. Example: >>> metadata.Metrics( ... name="MNIST-evaluation", ... description= ... "validating the MNIST model to recognize handwritten digits", ... owner="someone@kubeflow.org", ... uri="gcs://my-bucket/mnist-eval.csv", ... data_set_id="123", ... model_id="12345", ... metrics_type=metadata.Metrics.VALIDATION, ... values={"accuracy": 0.95}, ... labels={"mylabel": "l1"})) """

The Metrics class records model evaluation information.

Using Artifacts

pipelines

notebook servers

For how to use the artifact store from notebook servers, refer to the example project:
http://gitlab.travelsky.com/BasicPlatform_DataProduct/kubeflow-examples/blob/master/artifact-example/notebook-servers/mnist-artifact.py
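The example referenced above follows roughly this pattern (a hedged sketch, not a copy of that file; all names, URIs, and values are illustrative, and a live Metadata gRPC service is required):

```python
from kubeflow.metadata import metadata

store = metadata.Store(
    grpc_host="metadata-grpc-service.kubeflow", grpc_port=8080)
ws = metadata.Workspace(store=store, name="mnist-workspace")
run = metadata.Run(workspace=ws, name="mnist-run")
execution = metadata.Execution(name="mnist-train", workspace=ws, run=run)

# Log the training data, the resulting model, and its evaluation metrics;
# all of these then appear in the Artifact Store UI under the workspace.
execution.log_input(metadata.DataSet(
    name="mnist-data", uri="file://path/to/mnist", version="v1.0.0"))
model = execution.log_output(metadata.Model(
    name="MNIST", uri="gcs://my-bucket/mnist", version="v0.0.1"))
execution.log_output(metadata.Metrics(
    name="MNIST-evaluation",
    uri="gcs://my-bucket/mnist-eval.csv",
    model_id=str(model.id),
    metrics_type=metadata.Metrics.VALIDATION,
    values={"accuracy": 0.95}))
```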
In the Artifact Store UI you can then view information about each stage of the workflow.
As boxed in the figure below: the data set, model, and model prediction information under the same workspace.

Along with the detailed information for each artifact.
