Artifacts are a tool for presenting result information, serving mainly Kubeflow notebook servers and Kubeflow Pipelines. This section introduces artifacts together with their typical uses.
Metadata
The Kubeflow artifact store was originally known as the metadata store; it is intended to record and manage the metadata produced in Kubeflow machine learning workflows.
To record metadata in your project, use the dedicated Metadata SDK, which can be installed in Python with pip:
pip install kubeflow-metadata
Metadata SDK Overview
By default the SDK is configured to work with the Kubeflow Metadata gRPC service. Its main APIs are briefly introduced here.
First, the important basic classes in the SDK:
Class Store
class Store(object):
    """Metadata Store that connects to the Metadata gRPC service."""

    def __init__(self,
                 grpc_host: str = "metadata-grpc-service.kubeflow",
                 grpc_port: int = 8080,
                 root_certificates: Optional[bytes] = None,
                 private_key: Optional[bytes] = None,
                 certificate_chain: Optional[bytes] = None):
        """
        Args:
          grpc_host: Required gRPC service host, e.g. "metadata-grpc-service.kubeflow".
          grpc_port: Required gRPC service port.
          root_certificates: Optional SSL certificate for secure connection.
          private_key: Optional private_key for secure connection.
          certificate_chain: Optional certificate_chain for secure connection.

          The optional parameters are the same as in grpc.ssl_channel_credentials.
          https://grpc.github.io/grpc/python/grpc.html#grpc.ssl_channel_credentials
        """
The Store class defines the connection to the Metadata gRPC service. Since the current Kubeflow release ships the full set of components, connecting with the default service name and port is sufficient.
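A minimal connection sketch based on the constructor above (the kubeflow.metadata.metadata module path comes from the kubeflow-metadata package; both arguments are the defaults and could be omitted):

from kubeflow.metadata import metadata

# Connect to the in-cluster Metadata gRPC service using the default
# service name and port shown above.
store = metadata.Store(
    grpc_host="metadata-grpc-service.kubeflow",
    grpc_port=8080)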
Class Workspace
class Workspace(object):
    """Groups a set of runs of pipelines, notebooks and their related artifacts
    and executions.
    """

    CONTEXT_TYPE_NAME = "kubeflow.org/alpha/workspace"

    def __init__(self,
                 store: Store = None,
                 name: str = None,
                 description: Optional[str] = None,
                 labels: Optional[Mapping[str, str]] = None,
                 reuse_workspace_if_exists: Optional[bool] = True,
                 backend_url_prefix: Optional[str] = None):
        """
        Args:
          store: Required store object to connect to MLMD gRPC service.
          name: Required name for the workspace.
          description: Optional string for description of the workspace.
          labels: Optional key/value string pairs to label the workspace.
          reuse_workspace_if_exists: Optional boolean value to indicate whether
            a workspace of the same name should be reused.
          backend_url_prefix: Deprecated. Please use 'store' parameter.

        Raises:
          ValueError: If a workspace of the same name already exists and
            `reuse_workspace_if_exists` is set to False.
        """
The Workspace class defines a workspace object under which workflow runs from pipelines and notebooks, together with their parameter and metric information, can be recorded.
Setting the reuse_workspace_if_exists parameter allows an existing workspace of the same name to be reused.
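For example, a workspace can be created (or reused) as sketched below; the name, description, and labels are placeholders, and store is the object connected in the previous sketch:

# Create the workspace, or reuse it if one with the same name exists.
ws = metadata.Workspace(
    store=store,                       # Store connected above
    name="demo-workspace",             # placeholder workspace name
    description="a workspace for the artifact examples",
    labels={"team": "demo"},           # placeholder labels
    reuse_workspace_if_exists=True)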
Class Run
class Run(object):
    """Run captures a run of pipeline or notebooks in a workspace and group
    executions.
    """

    def __init__(self,
                 workspace: Workspace = None,
                 name: str = None,
                 description: Optional[str] = None):
        """
        Args:
          workspace: Required workspace object to which this run belongs.
          name: Required name of this run.
          description: Optional description.
        """
The Run class defines a run object inside a workspace; multiple task executions can be recorded under it.
Classes in the SDK that describe workflow information:
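A run is created inside the workspace, for example (continuing from the workspace sketch above; the timestamped name is just a placeholder to keep run names unique):

from datetime import datetime

run = metadata.Run(
    workspace=ws,                                    # workspace created above
    name="run-" + datetime.utcnow().isoformat("T"),  # placeholder unique name
    description="an example run")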
Class Execution
class Execution(object):
    """Captures a run of pipeline or notebooks in a workspace and group
    executions. Execution also serves as object for logging artifacts as its
    input or output.
    """

    EXECUTION_TYPE_NAME = "kubeflow.org/alpha/execution"

    def __init__(self,
                 name: str = None,
                 workspace: Workspace = None,
                 run: Optional[Run] = None,
                 description: Optional[str] = None):
        """
        Args:
          name: Required name of this run.
          workspace: Required workspace object where this execution belongs to.
          run: Optional run object.
          description: Optional description.

        Creates a new execution in a workspace and run. The execution.log_XXX()
        methods will attach corresponding artifacts as the input or output of
        this execution.
        """
The Execution class is used to record input and output artifact information.
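A minimal sketch of creating an execution (the name is a placeholder; ws and run are the objects from the previous sketches). Artifacts logged through this object via its log_XXX() methods are attached as its input or output, as shown in the examples after each artifact class below:

execution = metadata.Execution(
    name="execution-" + datetime.utcnow().isoformat("T"),  # placeholder name
    workspace=ws,   # workspace created above
    run=run,        # run created above
    description="an example execution")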
Class DataSet
class DataSet(Artifact):
    """Dataset captures a data set in a machine learning workflow.

    Attributes:
      uri: Required uri of the data set.
      name: Required name of the data set.
      workspace: Optional name of the workspace.
      description: Optional description of the data set.
      owner: Optional owner of the data set.
      version: Optional version tagged by the user.
      query: Optional query string on how this data set being fetched from a
        data source.
      labels: Optional string key value pairs for labels.

    Example:
      >>> metadata.DataSet(description="an example data",
      ...                  name="mytable-dump",
      ...                  owner="owner@my-company.org",
      ...                  uri="file://path/to/dataset",
      ...                  version="v1.0.0",
      ...                  query="SELECT * FROM mytable",
      ...                  labels={"label1": "val1"})
    """
The DataSet class records information about input and output data sets.
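As an illustration, a data set can be attached as the execution's input; the log_input name follows the execution.log_XXX() convention mentioned in the Execution docstring (verify it against your SDK version), and the uri and version below are placeholders:

data_set = execution.log_input(
    metadata.DataSet(
        name="mytable-dump",              # placeholder data set name
        uri="file://path/to/dataset",     # placeholder data location
        description="an example data set",
        version="v1.0.0"))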
Class Model
class Model(Artifact):
    """Captures a machine learning model.

    Attributes:
      uri: Required uri of the model artifact, e.g. "gcs://path/to/model.h5".
      name: Required name of the model.
      workspace: Optional name of the workspace.
      description: Optional description of the model.
      owner: Optional owner of the model.
      model_type: Optional type of the model.
      training_framework: Optional framework used to train the model.
      hyperparameters: Optional map from hyper param name to its value.
      labels: Optional string key value pairs for labels.
      kwargs: Optional additional keyword arguments are saved as additional
        properties of this model.

    Example:
      >>> metadata.Model(name="MNIST",
      ...                description="model to recognize handwritten digits",
      ...                owner="someone@kubeflow.org",
      ...                uri="gcs://my-bucket/mnist",
      ...                model_type="neural network",
      ...                training_framework={
      ...                    "name": "tensorflow",
      ...                    "version": "v1.0"
      ...                },
      ...                hyperparameters={
      ...                    "learning_rate": 0.5,
      ...                    "layers": [10, 3, 1],
      ...                    "early_stop": True
      ...                },
      ...                version="v0.0.1",
      ...                labels={"mylabel": "l1"})
    """
The Model class records information about output models.
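Likewise, a trained model can be attached as the execution's output; log_output follows the same log_XXX() convention, and the uri, framework, and hyperparameters below are placeholders:

model = metadata_model = execution.log_output(
    metadata.Model(
        name="MNIST",                     # placeholder model name
        uri="gcs://my-bucket/mnist",      # placeholder model location
        model_type="neural network",
        training_framework={"name": "tensorflow", "version": "v1.0"},
        hyperparameters={"learning_rate": 0.5},
        version="v0.0.1"))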
Class Metrics
class Metrics(Artifact):
    """Captures an evaluation metrics of a model on a data set.

    Attributes:
      uri: Required uri of the metrics.
      name: Required name of the metrics.
      workspace: Optional name of the workspace.
      description: Optional description of the metrics.
      owner: Optional owner of the metrics.
      data_set_id: Optional id of the data set used for evaluation.
      model_id: Optional id of an evaluated model.
      metrics_type: Optional type of the evaluation.
      values: Optional map from metrics name to its value.
      labels: Optional string key value pairs for labels.

    Example:
      >>> metadata.Metrics(
      ...     name="MNIST-evaluation",
      ...     description="validating the MNIST model to recognize handwritten digits",
      ...     owner="someone@kubeflow.org",
      ...     uri="gcs://my-bucket/mnist-eval.csv",
      ...     data_set_id="123",
      ...     model_id="12345",
      ...     metrics_type=metadata.Metrics.VALIDATION,
      ...     values={"accuracy": 0.95},
      ...     labels={"mylabel": "l1"})
    """
The Metrics class records model evaluation information.
Artifact Applications
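Evaluation results can be logged the same way; this sketch assumes the artifacts logged above expose an id attribute, and uses it to link the metrics back to the data set and model (uri and values are placeholders):

metrics = execution.log_output(
    metadata.Metrics(
        name="MNIST-evaluation",
        uri="gcs://my-bucket/mnist-eval.csv",      # placeholder metrics location
        data_set_id=str(data_set.id),              # id of the data set logged above
        model_id=str(model.id),                    # id of the model logged above
        metrics_type=metadata.Metrics.VALIDATION,
        values={"accuracy": 0.95}))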
pipelines
notebook servers
For using the artifact store from notebook servers, refer to the project example:
http://gitlab.travelsky.com/BasicPlatform_DataProduct/kubeflow-examples/blob/master/artifact-example/notebook-servers/mnist-artifact.py
In the artifact store UI you can then view the information recorded at each stage of the workflow.
As highlighted in the figures below, these are the data set, model, and model prediction records under the same workspace,
as well as the detailed information of a single artifact.