Open Model Initiative - Data Pipeline

Welcome to the Open Model Initiative Data Pipeline repository!

This project aims to provide open-source, community-driven training pipelines and code for developing baseline AI models for image generation. Support for other modalities may follow in future updates, either in this repository or in others.

About the Project

The Open Model Initiative Data Pipeline is a collaborative effort to create and maintain high-quality, openly licensed baseline AI models, and to provide tooling for curating and maintaining datasets for large training projects. Our goal is to empower individuals and organizations to build on these models for their own solutions, and to give creatives access to these emerging tools for their own work.

Key goals

Effective caption management and curation tools for large datasets
Standardized metadata format for using datasets across applications
Comprehensive documentation and examples

Contributing

We welcome contributions from the community! Whether you’re fixing bugs, improving documentation, or proposing new features, your input is valuable. Please read our Contribution Guidelines and our Contribution Guide for more information on how to get started.

License

We plan to license this project and its artifacts under the following permissive licenses:

Software source code: Apache License, Version 2.0

Model parameters, weights, and metadata: CDLA-Permissive 2.0 License

Please see the respective license files for full details.

Contact

For questions, suggestions, or discussions, please:

Open an issue in this repository
Join our community Discord server

We look forward to your participation in the Open Model Initiative!