Caption Pipeline

This guide will help you run the pipeline to caption your dataset.

Create the Configuration

The pipeline needs a custom pipeline_run_config.toml file that points it at the dataset you want to caption. Follow these steps to create the configuration.

  1. In the ./workspace/config/default_configs directory, you will find a pipeline_run_config.toml file. Copy this file into a different location in the ./workspace/config folder. We recommend using .local/ as a folder.

  2. Rename your copy of the pipeline_run_config.toml file, then open it.

  3. In the [pipeline] section, change the name to distinguish your pipeline from others.

  4. In the [io] section, set the dataset name to the name of the dataset you want to caption.

  5. Change the last subfolder of the input and output directories to match your dataset.

  6. In the [providers.default] section, change the name of the provider to one that exists in your provider.config.toml file, if necessary.

  7. In the [perspective.enabled] section, set each perspective to true or false, depending on which perspectives you want to use in the pipeline.

  8. Save your TOML file and note its filename and path. You will need them in the next section.
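
Steps 1 and 2 above (copy, then rename) can also be scripted. The following is a minimal sketch, not part of the Graphcap tooling: the function name copy_default_config and the example filename are hypothetical, and the .local subfolder is only the location recommended above. It assumes the workspace layout described in step 1.

```python
import shutil
from pathlib import Path


def copy_default_config(workspace: Path, new_name: str, subfolder: str = ".local") -> Path:
    """Copy the default pipeline_run_config.toml to workspace/config/<subfolder>/<new_name>.

    Hypothetical helper: it only copies and renames the file. The edits in
    steps 3 to 7 still have to be made in the copy afterwards.
    """
    source = workspace / "config" / "default_configs" / "pipeline_run_config.toml"
    if not source.is_file():
        raise FileNotFoundError(f"Default config not found: {source}")

    target_dir = workspace / "config" / subfolder
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / new_name
    if target.exists():
        # Refuse to overwrite a config that may already contain edits.
        raise FileExistsError(f"Config already exists: {target}")

    shutil.copyfile(source, target)
    return target
```

For example, copy_default_config(Path("./workspace"), "my_dataset_run_config.toml") would return the path of the new copy, which is the path to note in step 8. The helper refuses to overwrite an existing file so that an earlier, already edited configuration is not lost.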

Prepare the Pipeline

In this section, we'll point the default captioning pipeline at the TOML file created in the previous section.

  1. In the Graphcap Studio UI, at the top right, click on the Pipelines link. This will open a new tab in your browser to Dagster, which runs the pipelines. You should see the Overview screen.

  2. At the top of the Dagster UI, click on the Jobs link. This will show the standard jobs that are pre-configured in your Dagster installation.

  3. In the list of jobs, click on the basic_perspective_pipeline job. This reveals the captioning pipeline that will use all of the perspectives that were set to true in the earlier section.

  4. Click on the dropdown arrow next to the Materialize all button, and click on the Open launchpad option. The Launchpad displays the default configuration of the pipeline.

  5. In the configuration, replace the /workspace/config/default_configs/pipeline_run_config.toml path with the path to the TOML file you created in the earlier section.

  6. Click on the Materialize button at the bottom of the screen. The pipeline will start captioning your dataset.