1
50
MEGATHREAD (lemmy.dbzer0.com)
submitted 1 year ago by [email protected] to c/[email protected]

This is a copy of the /r/stablediffusion wiki, to help people who need access to that information.


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help you enjoy Stable Diffusion, whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated, but not necessary as you being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, nor provided by Stability AI.

# Stable Diffusion

Local Installation

Active Community Repos/Forks to install on your PC and keep it local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your Stable Diffusion skills, whether you are a beginner or an expert.

DreamBooth

How to train a custom model, plus resources on doing so.

Models

Models specially trained on certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, Gimp, etc.

Other useful tools

# Community

Games

  • PictionAIry : (Video|2-6 Players) - The image guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion requires a GPU with at least 4 GB of VRAM to run locally, but much beefier graphics cards (10-, 20-, or 30-series Nvidia cards) are necessary to generate high-resolution or high-step images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (For a quick local VRAM check, see the snippet after this list.)
  • Only Nvidia cards are officially supported.
  • AMD support is available here unofficially.
  • Apple M1 Chip support is available here unofficially.
  • Intel-based Macs currently do not work with Stable Diffusion.
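
If you are unsure how much VRAM your GPU has, a quick way to check is with PyTorch (a minimal sketch, assuming PyTorch with CUDA support is already installed, as most local Stable Diffusion installs set up):

```python
# Minimal sketch: report the name and total VRAM of the first GPU, if any.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA-capable GPU detected; consider an online service or a cloud GPU.")
```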

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.

2
4
submitted 2 days ago by [email protected] to c/[email protected]
3
3
submitted 2 days ago* (last edited 2 days ago) by [email protected] to c/[email protected]
4
7
submitted 3 days ago* (last edited 3 days ago) by [email protected] to c/[email protected]
5
12
lllyasviel/Paints-UNDO (lllyasviel.github.io)
submitted 6 days ago by [email protected] to c/[email protected]

PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings

Paints-Undo is a project aimed at providing base models of human drawing behaviors, in the hope that future AI models can better align with the real needs of human artists.

The name "Paints-Undo" comes from the fact that the model's outputs look like pressing the "undo" button (usually Ctrl+Z) many times in digital painting software.

Paints-Undo presents a family of models that take an image as input and then output the drawing sequence of that image. The model displays all kinds of human behaviors, including but not limited to sketching, inking, coloring, shading, transforming, left-right flipping, color curve tuning, changing the visibility of layers, and even changing the overall idea during the drawing process.

Code: https://github.com/lllyasviel/Paints-UNDO

Project Page: https://lllyasviel.github.io/pages/paints_undo/
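
In other words, the input is a single finished image and the output is a sequence of frames approximating how it might have been drawn. The sketch below is purely illustrative: `predict_undo_sequence` is a placeholder standing in for the real model (see the repository above for the actual inference code), here faking the sequence by fading the image so the script runs end to end.

```python
# Illustrative sketch only: predict_undo_sequence is a placeholder, NOT the
# Paints-UNDO API. It fakes "earlier stages" by fading the image toward white,
# just to show the image-in / frame-sequence-out shape of the task.
from PIL import Image, ImageEnhance

def predict_undo_sequence(image, num_frames=12):
    """Placeholder: return progressively 'earlier' stages of `image`."""
    return [ImageEnhance.Brightness(image).enhance(1.0 + i / num_frames)
            for i in range(num_frames)]

finished = Image.open("finished_illustration.png").convert("RGB")
frames = predict_undo_sequence(finished, num_frames=12)

# Save the predicted stages (finished piece -> progressively earlier states).
for i, frame in enumerate(frames):
    frame.save(f"undo_frame_{i:03d}.png")
```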

6
12
submitted 1 week ago by [email protected] to c/[email protected]

We design a new architecture that can support 10+ control types in conditional text-to-image generation and can generate high-resolution images visually comparable with Midjourney. The network is based on the original ControlNet architecture, and we propose two new modules to: (1) extend the original ControlNet to support different image conditions with the same set of network parameters, and (2) support multiple condition inputs without increasing the computational load, which is especially important for designers who want to edit an image in detail; different conditions share the same condition encoder, so no extra computation or parameters are added. We run thorough experiments on SDXL and achieve superior performance in both control ability and aesthetic score. We release the method and the model to the open-source community so that everyone can enjoy them.

More details can be found at: https://github.com/xinsir6/ControlNetPlus/tree/main
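
Because the different control types share one set of network weights, a single checkpoint can be dropped into a standard SDXL ControlNet pipeline. Below is a minimal sketch using Hugging Face diffusers; the checkpoint name is an assumption based on the project's Hugging Face release, and the repository ships its own pipeline class for feeding several conditions at once.

```python
# Minimal sketch, assuming the project's Hugging Face checkpoint name
# ("xinsir/controlnet-union-sdxl-1.0") and a stock diffusers SDXL pipeline.
# For true multi-condition input, use the custom pipeline from the ControlNetPlus repo.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-union-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# One control image (e.g. a depth map or an openpose skeleton) conditions the generation.
control_image = load_image("depth_map.png")
image = pipe(
    "a cozy reading nook, soft morning light, highly detailed",
    image=control_image,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=30,
).images[0]
image.save("controlled_output.png")
```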

7
9
submitted 1 week ago by [email protected] to c/[email protected]

Abstract

We present Kolors, a latent diffusion model for text-to-image synthesis, characterized by its profound understanding of both English and Chinese, as well as an impressive degree of photorealism. There are three key insights contributing to the development of Kolors. Firstly, unlike the T5 large language model used in Imagen and Stable Diffusion 3, Kolors is built upon the General Language Model (GLM), which enhances its comprehension capabilities in both English and Chinese. Moreover, we employ a multimodal large language model to recaption the extensive training dataset for fine-grained text understanding. These strategies significantly improve Kolors' ability to comprehend intricate semantics, particularly those involving multiple entities, and enable its advanced text rendering capabilities. Secondly, we divide the training of Kolors into two phases: the concept learning phase with broad knowledge and the quality improvement phase with specifically curated high-aesthetic data. Furthermore, we investigate the critical role of the noise schedule and introduce a novel schedule to optimize high-resolution image generation. These strategies collectively enhance the visual appeal of the generated high-resolution images. Lastly, we propose a category-balanced benchmark, KolorsPrompts, which serves as a guide for the training and evaluation of Kolors. Consequently, even when employing the commonly used U-Net backbone, Kolors has demonstrated remarkable performance in human evaluations, surpassing the existing open-source models and achieving Midjourney-v6 level performance, especially in terms of visual appeal. We will release the code and weights of Kolors at https://github.com/Kwai-Kolors/Kolors, and hope that it will benefit future research and applications in the visual generation community.

Technical Report: https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf

Code: https://github.com/Kwai-Kolors/Kolors

Hugging Face Spaces: https://huggingface.co/Kwai-Kolors/Kolors

Team Page: https://kwai-kolors.github.io/

Official Website: https://kolors.kuaishou.com/
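
Since Kolors keeps a conventional U-Net latent-diffusion backbone, inference looks much like other diffusion pipelines. A minimal sketch is below; it assumes the Hugging Face diffusers KolorsPipeline integration and the "Kwai-Kolors/Kolors-diffusers" checkpoint name, and if your diffusers version predates that integration, use the inference scripts from the GitHub repository instead.

```python
# Minimal sketch, assuming the diffusers KolorsPipeline integration and the
# "Kwai-Kolors/Kolors-diffusers" checkpoint name (an assumption; otherwise use
# the inference scripts shipped in the Kwai-Kolors/Kolors repository).
import torch
from diffusers import KolorsPipeline

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# The GLM-based text encoder is bilingual, so English and Chinese prompts both work.
image = pipe(
    prompt="A photorealistic portrait of a lighthouse keeper at dusk, 一盏温暖的灯",
    guidance_scale=5.0,
    num_inference_steps=50,
).images[0]
image.save("kolors_sample.png")
```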

8
32
submitted 1 week ago by [email protected] to c/[email protected]

Features:

  • A lot of performance improvements (see below in Performance section)
  • Stable Diffusion 3 support (#16030)
    • Recommended Euler sampler; DDIM and other timestamp samplers currently not supported
    • T5 text model is disabled by default, enable it in settings
  • New schedulers:
  • New sampler: DDIM CFG++ (#16035)

Minor:

  • Option to skip CFG on early steps (#15607)
  • Add --models-dir option (#15742)
  • Allow mobile users to open the context menu with a two-finger press (#15682)
  • Infotext: add Lora name as TI hashes for bundled Textual Inversion (#15679)
  • Check model's hash after downloading it to prevent corrupted downloads (#15602)
  • More extension tag filtering options (#15627)
  • When saving AVIF, use JPEG's quality setting (#15610)
  • Add filename pattern: [basename] (#15978)
  • Add option to enable clip skip for clip L on SDXL (#15992)
  • Option to prevent screen sleep during generation (#16001)
  • ToggleLivePreview button in image viewer (#16065)

Extensions and API:

  • Add process_before_every_sampling hook (#15984)
  • Return HTTP 400 instead of 404 on invalid sampler error (#16140)

Performance:

  • [Performance 1/6] use_checkpoint = False (#15803)
  • [Performance 2/6] Replace einops.rearrange with torch native ops (#15804)
  • [Performance 4/6] Precompute is_sdxl_inpaint flag (#15806)
  • [Performance 5/6] Prevent unnecessary extra networks bias backup (#15816)
  • [Performance 6/6] Add --precision half option to avoid casting during inference (#15820)
  • [Performance] LDM optimization patches (#15824)
  • [Performance] Keep sigmas on CPU (#15823)
  • Check for nans in unet only once, after all steps have been completed
  • Added option to run torch profiler for image generation

Bug Fixes:

  • Fix for grids without comprehensive infotexts (#15958)
  • feat: lora partial update precede full update (#15943)
  • Fix bug where file extension had an extra '.' under some circumstances (#15893)
  • Fix corrupt model initial load loop (#15600)
  • Allow old sampler names in API (#15656)
  • more old sampler scheduler compatibility (#15681)
  • Fix Hypertile xyz (#15831)
  • XYZ CSV skipinitialspace (#15832)
  • fix soft inpainting on mps and xpu, torch_utils.float64 (#15815)
  • fix extension update when not on main branch (#15797)
  • update pickle safe filenames
  • use relative path for webui-assets css (#15757)
  • When creating a virtual environment, upgrade pip in webui.bat/webui.sh (#15750)
  • Fix AttributeError (#15738)
  • use script_path for webui root in launch_utils (#15705)
  • fix extra batch mode P Transparency (#15664)
  • use gradio theme colors in css (#15680)
  • Fix dragging text within prompt input (#15657)
  • Add correct mimetype for .mjs files (#15654)
  • QOL Items - handle metadata issues more cleanly for SD models, Loras and embeddings (#15632)
  • replace wsl-open with wslpath and explorer.exe (#15968)
  • Fix SDXL Inpaint (#15976)
  • multi size grid (#15988)
  • fix Replace preview (#16118)
  • Possible fix of wrong scale in weight decomposition (#16151)
  • Ensure use of python from venv on Mac and Linux (#16116)
  • Prioritize python3.10 over python3 if both are available on Linux and Mac (with fallback) (#16092)
  • stopping generation in extras (#16085)
  • Fix SD2 loading (#16078, #16079)
  • fix infotext Lora hashes for hires fix different lora (#16062)
  • Fix sampler scheduler autocorrection warning (#16054)

Other:

  • fix changelog #15883 -> #15882 (#15907)
  • ReloadUI backgroundColor --background-fill-primary (#15864)
  • Use different torch versions for Intel and ARM Macs (#15851)
  • XYZ override rework (#15836)
  • scroll extensions table on overflow (#15830)
  • img2img batch upload method (#15817)
  • chore: sync v1.8.0 packages according to changelog (#15783)
  • Add AVIF MIME type support to mimetype definitions (#15739)
  • Update imageviewer.js (#15730)
  • no-referrer (#15641)
  • .gitignore trace.json (#15980)
  • Bump spandrel to 0.3.4 (#16144)
  • Defunct --max-batch-count (#16119)
  • docs: update bug_report.yml (#16102)
  • Maintaining Project Compatibility for Python 3.9 Users Without Upgrade Requirements. (#16088)
  • Update torch for ARM Macs to 2.3.1 (#16059)
  • remove deprecated setting dont_fix_second_order_samplers_schedule (#16061)
  • chore: fix typos (#16060)


9
21
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
10
7
submitted 1 week ago by [email protected] to c/[email protected]

Abstract

We study Neural Foley, the automatic generation of high-quality sound effects synchronized with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantically relevant and temporally synchronized) sounds. To overcome these limitations, we propose FoleyCrafter, a novel framework that leverages a pre-trained text-to-audio model to ensure high-quality audio generation. FoleyCrafter comprises two key components: the semantic adapter for semantic alignment and the temporal controller for precise audio-video synchronization. The semantic adapter utilizes parallel cross-attention layers to condition audio generation on video features, producing realistic sound effects that are semantically relevant to the visual content. Meanwhile, the temporal controller incorporates an onset detector and a timestamp-based adapter to achieve precise audio-video alignment. One notable advantage of FoleyCrafter is its compatibility with text prompts, enabling the use of text descriptions to achieve controllable and diverse video-to-audio generation according to user intents. We conduct extensive quantitative and qualitative experiments on standard benchmarks to verify the effectiveness of FoleyCrafter. Models and code are available on GitHub.

Paper: https://arxiv.org/abs/2407.01494

Code: https://github.com/open-mmlab/foleycrafter

Demo: https://huggingface.co/spaces/ymzhang319/FoleyCrafter

Project Page: https://foleycrafter.github.io/
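
The usage pattern the abstract describes is: video in, optional text prompt to steer the sound, audio out. The sketch below is purely illustrative; `generate_foley` is a placeholder, not FoleyCrafter's actual API (see the GitHub repository and Hugging Face demo above for the real inference code).

```python
# Illustrative sketch only: generate_foley is a placeholder, NOT FoleyCrafter's API.
# The stand-in writes one second of silence so the script runs end to end;
# the real model would synthesize sound aligned with the video's content and timing.
import numpy as np
import soundfile as sf

def generate_foley(video_path, prompt="", sample_rate=16000):
    """Placeholder: return a waveform semantically and temporally aligned with the video."""
    return np.zeros(sample_rate, dtype=np.float32)

# The text prompt is what enables "controllable and diverse video-to-audio generation".
waveform = generate_foley("skateboard_clip.mp4", prompt="wheels rolling on rough asphalt")
sf.write("skateboard_clip_foley.wav", waveform, samplerate=16000)
```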

11
5
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
12
5
submitted 1 week ago by [email protected] to c/[email protected]
13
4
submitted 2 weeks ago by [email protected] to c/[email protected]
14
9
submitted 2 weeks ago by [email protected] to c/[email protected]
15
8
submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]

Quoted from Reddit:

Hello r/StableDiffusion --

A sincere thanks to the overwhelming engagement and insightful discussions following our announcement yesterday of the Open Model Initiative. If you missed it, check it out here.

We know there are a lot of questions, and some healthy skepticism about the task ahead. We'll share more details as plans are formalized. We're taking things step by step, seeing who's committed to participating over the long haul, and charting the course forward.

That all said, with as much community and financial/compute support as is being offered, I have no doubt that we have the fuel needed to get where we all aim for this to take us. We just need to align and coordinate the work to execute on that vision.

We also wanted to officially announce and welcome some folks to the initiative, who will support with their expertise on model finetuning, datasets, and model training:

  • AstraliteHeart, founder of PurpleSmartAI and creator of the very popular PonyXL models
  • Some of the best model finetuners including Robbert "Zavy" van Keppel and Zovya
  • Simo Ryu, u/cloneofsimo, a well-known contributor to Open Source AI 
  • Austin, u/AutoMeta, Founder of Alignment Lab AI
  • Vladmandic & SD.Next
  • And over 100 other community volunteers, ML researchers, and creators who have submitted their request to support the project

Due to voiced community concern, we’ve discussed with LAION and agreed to remove them from formal participation with the initiative at their request. Based on conversations occurring within the community we’re confident that we’ll be able to effectively curate the datasets needed to support our work. 


Frequently Asked Questions (FAQs) for the Open Model Initiative

We’ve compiled a FAQ to address some of the questions that were coming up over the past 24 hours.

How will the initiative ensure the models are competitive with proprietary ones?

We are committed to developing models that are not only open but also competitive in terms of capability and performance. This includes leveraging cutting-edge technology, pooling resources and expertise from leading organizations, and continuous community feedback to improve the models. 

The community is passionate. We have many AI researchers who have reached out in the last 24 hours who believe in the mission, and who are willing and eager to make this a reality. In the past year, open-source innovation has driven the majority of interesting capabilities in this space.

We’ve got this.

What does ethical really mean? 

We recognize that there’s a healthy sense of skepticism any time words like “Safety” “Ethics” or “Responsibility” are used in relation to AI. 

With respect to the model that the OMI will aim to train, the intent is to provide a capable base model that is not pre-trained with the following capabilities:

  • Recognition of unconsented artist names, in such a way that their body of work is singularly referenceable in prompts
  • Generating the likeness of unconsented individuals
  • The production of AI Generated Child Sexual Abuse Material (CSAM).

There may be those in the community who chafe at the above restrictions being imposed on the model. It is our stance that these are capabilities that don’t belong in a base foundation model designed to serve everyone.

The model will be designed and optimized for fine-tuning, and individuals can make personal values decisions (as well as take the responsibility) for any training built into that foundation. We will also explore tooling that helps creators reference styles without the use of artist names.

Okay, but what exactly do the next 3 months look like? What are the steps to get from today to a usable/testable model?

We have 100+ volunteers we need to coordinate and organize into productive participants of the effort. While this will be a community effort, it will need some organizational hierarchy in order to operate effectively - With our core group growing, we will decide on a governance structure, as well as engage the various partners who have offered support for access to compute and infrastructure. 

We’ll make some decisions on architecture (Comfy is inclined to leverage a better designed SD3), and then begin curating datasets with community assistance.

What is the anticipated cost of developing these models, and how will the initiative manage funding? 

The cost of model development can vary, but it mostly boils down to the time of participants and compute/infrastructure. Each of the initial initiative members has a business model that supports actively pursuing open research, and in addition the OMI has already received verbal support from multiple compute providers for the initiative. We will formalize those into agreements once we better define the compute needs of the project.

This gives us confidence we can achieve what is needed with the supplemental support of the community volunteers who have offered to support data preparation, research, and development. 

Will the initiative create limitations on the models' abilities, especially concerning NSFW content? 

It is not our intent to make the model incapable of NSFW material. “Safety” as we’ve defined it above, is not restricting NSFW outputs. Our approach is to provide a model that is capable of understanding and generating a broad range of content. 

We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.

What license will the model and model weights have?

TBD, but we’ve mostly settled between an MIT or Apache 2 license.

What measures are in place to ensure transparency in the initiative’s operations?

We plan to regularly update the community on our progress, challenges, and changes through the official Discord channel. As we evolve, we’ll evaluate other communication channels.

Looking Forward

We don’t want to inundate this subreddit so we’ll make sure to only update here when there are milestone updates. In the meantime, you can join our Discord for more regular updates.

If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI. 

Thank you for your support and enthusiasm!

Sincerely, 

The Open Model Initiative Team

16
7
submitted 2 weeks ago by [email protected] to c/[email protected]
17
17
submitted 2 weeks ago by [email protected] to c/[email protected]
18
5
submitted 2 weeks ago by [email protected] to c/[email protected]
19
18
submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]

Quoted from Reddit:

Today, we’re excited to announce the launch of the Open Model Initiative, a new community-driven effort to promote the development and adoption of openly licensed AI models for image, video and audio generation.

We believe open source is the best way forward to ensure that AI benefits everyone. By teaming up, we can deliver high-quality, competitive models with open licenses that push AI creativity forward, are free to use, and meet the needs of the community.

Ensuring access to free, competitive open source models for all.

With this announcement, we are formally exploring all available avenues to ensure that the open-source community continues to make forward progress. By bringing together deep expertise in model training, inference, and community curation, we aim to develop open-source models of equal or greater quality to proprietary models and workflows, but free of restrictive licensing terms that limit the use of these models.

Without open tools, we risk having these powerful generative technologies concentrated in the hands of a small group of large corporations and their leaders.

From the beginning, we have believed that the right way to build these AI models is with open licenses. Open licenses allow creatives and businesses to build on each other's work, facilitate research, and create new products and services without restrictive licensing constraints.

Unfortunately, recent image and video models have been released under restrictive, non-commercial license agreements, which limit the ownership of novel intellectual property and offer compromised capabilities that are unresponsive to community needs. 

Given the complexity and costs associated with building and researching the development of new models, collaboration and unity are essential to ensuring access to competitive AI tools that remain open and accessible.

We are at a point where collaboration and unity are crucial to achieving the shared goals in the open source ecosystem. We aspire to build a community that supports the positive growth and accessibility of open source tools.

For the community, by the community

Together with the community, the Open Model Initiative aims to bring together developers, researchers, and organizations to collaborate on advancing open and permissively licensed AI model technologies.

The following organizations serve as the initial members:

  • Invoke, a Generative AI platform for Professional Studios
  • ComfyOrg, the team building ComfyUI
  • Civitai, the Generative AI hub for creators
  • LAION, one of the largest open source data networks for model training

To get started, we will focus on several key activities: 

  • Establishing a governance framework and working groups to coordinate collaborative community development.
  • Facilitating a survey to document feedback on what the open-source community wants to see in future model research and training.
  • Creating shared standards to improve future model interoperability and compatible metadata practices so that open-source tools are more compatible across the ecosystem.
  • Supporting model development that meets the following criteria:

  • True open source: Permissively licensed using an approved Open Source Initiative license, and developed with open and transparent principles
  • Capable: A competitive model built to provide the creative flexibility and extensibility needed by creatives
  • Ethical: Addressing major, substantiated complaints about unconsented references to artists and other individuals in the base model while recognizing training activities as fair use.

We also plan to host community events and roundtables to support the development of open source tools, and will share more in the coming weeks.

Join Us

We invite any developers, researchers, organizations, and enthusiasts to join us. 

If you’re interested in hearing updates, feel free to join our Discord channel

If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI. 

Sincerely,

Kent Keirsey
CEO & Founder, Invoke

comfyanonymous
Founder, Comfy Org

Justin Maier
CEO & Founder, Civitai

Christoph Schuhmann
Lead & Founder, LAION

20
11
Decartunizer (lemmy.dbzer0.com)
submitted 3 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]
21
6
submitted 3 weeks ago by [email protected] to c/[email protected]

Highlights for 2024-06-23

Following the zero-day SD3 release, here is a refresh 10 days later with 10+ improvements, including full prompt attention, support for compressed weights, and additional text-encoder quantization modes.

But there's more than SD3:

  • support for quantized T5 text encoder FP16/FP8/FP4/INT8 in all models that use T5: SD3, PixArt-Σ, etc.
  • support for PixArt-Sigma in small/medium/large variants
  • support for HunyuanDiT 1.1
  • additional NNCF weights compression support: SD3, PixArt, ControlNet, Lora
  • integration of MS Florence VLM/VQA Base and Large models
  • (finally) new release of Torch-DirectML
  • additional efficiencies for users with low VRAM GPUs
  • over 20 overall fixes
22
9
submitted 3 weeks ago by [email protected] to c/[email protected]
23
12
submitted 3 weeks ago by [email protected] to c/[email protected]
24
20
submitted 3 weeks ago by [email protected] to c/[email protected]
25
7
submitted 3 weeks ago by [email protected] to c/[email protected]

Stable Diffusion

4157 readers

Discuss matters related to our favourite AI Art generation technology

founded 1 year ago