Caption Pipeline#
This guide will help you run the pipeline to caption your dataset.
Create the Configuration#
To point the pipeline at the dataset you want to caption, you need to create a custom pipeline_run_config.toml file. Follow these steps to create the configuration.
1. In the ./workspace/config/default_configs directory, you will find a pipeline_run_config.toml file. Copy this file into a different location in the ./workspace/config folder. We recommend using .local/ as a folder.
2. Rename your copy of the pipeline_run_config.toml file, then open it.
3. In the [pipeline] section, change the name to distinguish your pipeline from others.
4. In the [io] section, change the dataset name to the same name as the dataset you wish to caption.
5. Change the last subfolder of the input and output directories to match your dataset.
6. In the [providers.default] section, change the name of the provider to one that exists in your provider.config.toml file, if necessary.
7. In the [perspective.enabled] section, set perspectives to true or false, depending on which perspectives you wish to use in the pipeline.
8. Save your toml file, and remember the filename and filepath of your file.
Prepare the Pipeline#
In this section, we’ll take the toml file that was created earlier, and modify the default captioning pipeline to use it.
1. In the Graphcap Studio UI, at the top right, click on the Pipelines link. This will open a new tab in your browser to Dagster, which runs the pipelines. You should see the Overview screen.
2. At the top of the Dagster UI, click on the Jobs link. This will show the standard jobs that are pre-configured in your Dagster installation.
3. In the list of jobs, click on the basic_perspective_pipeline job. This reveals the captioning pipeline that will use all of the perspectives that were set to true in the earlier section.
4. Click on the dropdown arrow next to the Materialize all button, and click on the Open launchpad option. The Launchpad displays the default configuration of the pipeline.
5. Replace the /workspace/config/default_configs/pipeline_run_config.toml in the configuration with the toml file from the earlier section.
6. Click on the Materialize button at the bottom of the screen. The pipeline will start captioning your dataset.