ComfyUI speed-up notes (GitHub)

 

To test stable-fast and xformers, run ComfyUI with --disable-cuda-malloc.

Adjust the color of the dark and bright parts of the image.

fastblend for ComfyUI, and other nodes that I write for video2video.

Feb 3, 2024 · I tried to reinstall ComfyUI.

…py looks the easiest to set right before a load.

Use the sdxl branch of this repo to load SDXL models; the loaded model only works with the Flatten KSampler, and a standard ComfyUI checkpoint loader is required for other KSamplers. Acknowledgement.

Note that --force-fp16 will only work if you installed the latest pytorch nightly.

The same workflow was used for each test. Does it mean 8 GB of VRAM is too little in A1111?

Jan 23, 2024 · I'm experiencing slow network speeds when using ComfyUI, especially during operations that involve downloading data.

Then it seemed that --novram can work. But it worked before.

At least I am sure the DML device name doesn't work; here is what the ComfyUI log says if I leave it at: device = torch…

Run git checkout 214197a24d0, and then restart ComfyUI to check its performance.

Learned from Midjourney, manual tweaking is not needed, and users only need to focus on the prompts and images.

…40, which is what I normally get with SDXL.

I made a Chinese-language summary table of ComfyUI plugins and nodes; see the Tencent Docs project "ComfyUI 插件（模组）+ 节点（模块）汇总 【Zho】". 20230916: Google Colab recently banned Stable Diffusion on the free tier, so I built a free cloud deployment on Kaggle (30 hours of free compute per week); see the Kaggle ComfyUI cloud deployment project.

Fortunately the custom node fixed the problem quickly, but this is a classic dependency-management problem, and I wondered whether there has been any discussion of declaring custom nodes' Python dependencies to make them easier to handle and prevent such issues.

I've implemented a way to increase speed by a lot using parallelization of the non-GPU work.

smoothvideo (per-frame rendering / smooth video using each frame).

Convert the model using stable-fast (estimated speed-up: 2X). Train an LCM LoRA for the denoising UNet (estimated speed-up: 5X). Train a new model on a better dataset to improve result quality (optional, we'll see if there is any need for me to do it ;). Continuous research, always moving towards something better & faster 🚀

Oct 17, 2023 · I really love ComfyUI.

set_torch_threads(2) set_torch_interop_threads(2) would do a… (a minimal sketch of limiting torch's CPU threads follows below).

Use conda to create a new Python environment matching ComfyUI's version, then copy Include and libs into ComfyUI's bundled Python environment. However…

Aug 1, 2023 · Check out my other extension that is part of this family: crystools-save (lets you save name, author, and more on your workflows!). CPU & RAM are the only ones that work.

If there is a mask input, only the colors within the mask range will be adjusted.

Removing some steps could restore normal speed.

got prompt
Loads SAM model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\sams\sam_vit_b_01ec64…

…py goes through upwards of hundreds of… That's really counterintuitive.

Direct link to download.

Jan 8, 2024 · Getting a ~x1.4 speed-up after engine compile (RTX 4070), and a significant reload time upon LoRA or resolution changes (cudagraph off).

In both cases, if you want to use your NVIDIA GPU to speed up processing, you need to install the NVIDIA CUDA Toolkit.

Mmmh, I will wait for ComfyUI to get the proper update to unveil the "x2" boost.

This will allow you to access the Launcher and its workflow projects from a single port.
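A minimal sketch of the thread-limiting idea mentioned above, using the standard torch calls (the set_torch_threads / set_torch_interop_threads names in the note may be wrappers around these); the counts are illustrative, and the interop setting must be made before any inter-op parallel work starts:

    import torch

    # Limit CPU-side parallelism right before loading the model (illustrative values).
    torch.set_num_threads(2)          # intra-op thread pool
    torch.set_num_interop_threads(2)  # inter-op thread pool; must be set before first use

    # ... load the checkpoint / run the workflow afterwards ...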
For now it seems that NVIDIA foooocus(ed) (lol, yeah, pun intended) on A1111 for this extension.

The InsightFace model is antelopev2 (not the classic buffalo_l).

Though they can have the smallest parameter count with higher numerical results, they are not very memory efficient, and the processing speed is slow for a Transformer model.

I wanted raw, unassisted outputs.

Feature description: Hello, I am currently using this plugin for face swapping in videos, and the results are excellent.

If using higher or lower than 1, the speed is only around 1.56/s. Just also cumbersome to use.

Aug 4, 2023 · Many features are disabled on Auto1111, such as training.

You can get to rgthree-settings by right-clicking on the empty part of the graph and selecting rgthree-comfy > Settings (rgthree-comfy), or by clicking the rgthree-comfy settings in the ComfyUI settings dialog.

Then each image required less time for inference, depending on your inference device.

I do not know which of these is essential for the speed-up process.

See code below; using this in the cloud, but otherwise it all works fine.

Is that just how bad the LCM LoRA performs, even on base SDXL? Workflow used: Example3.mp4

I've tested a little bit and it seems to work fine.

Contribute to StartHua/ComfyUI_Seg_VITON development by creating an account on GitHub.

I've been using the program for maybe two weeks now.

If you set n_threads equal to 1, it will fall back to the old system.

Non-Git and non-Folder compliant nodes. This is not a complete list.

Aug 11, 2023 · Creating the original image takes 40 seconds, but the 2x upscaling takes around 5 minutes with a GTX 1660 6GB card.

Nov 9, 2023 · Make sure you update ComfyUI to the latest: update/update_comfyui…

This work is built on ComfyUI-AnimateDiff-Evolved, ComfyUI-VideoHelperSuite and ComfyUI-sampler-lcm-alternative, but focuses more on the acceleration of AnimateDiff text-to-video (t2v) generation.

I use JSON.

Aug 30, 2023 · I think CUDA graph compilation could help ComfyUI run faster and more efficiently (a rough capture-and-replay sketch is included below).

Might want to do a few torch performance tests to see how much it will actually slow down, not speed up.

I can get back to lower usage by copying the old \ComfyUI folder over the new one.

The import fails on load. There is no flag to turn on xformers…

Feb 10, 2024 · Forge backend = A1111's sampling + A1111's text encoder (clip, emphasis, prompt scheduling, etc.) + A1111's Stable Diffusion object ldm.…

I asked in the WAS-NS GitHub and the author answered me: "This is a problem with YAML file loading."

I don't know how; I tried uninstalling and reinstalling torch, it didn't help.

This should not affect SSD drives really.

InstantID requires insightface; you need to add it to your libraries together with onnxruntime and onnxruntime-gpu.

Why so slow? In ComfyUI the speed was approx 2-3 it/s for a 1024×1024 image.

Follow the guide on the Ollama website and install according to your operating system, or opt for usage through a Docker container.

(Note: settings are stored in an rgthree_config…)

Launch ComfyUI by running python main.py.
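To make the CUDA-graph idea above concrete, here is a minimal, hypothetical PyTorch sketch (not ComfyUI code) of capturing a static-shape denoising call and replaying it; the tiny conv layer stands in for the real UNet, and a CUDA device is assumed:

    import torch

    unet = torch.nn.Conv2d(4, 4, 3, padding=1).cuda()   # stand-in for the real model
    static_latent = torch.randn(1, 4, 64, 64, device="cuda")

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        _ = unet(static_latent)
    torch.cuda.current_stream().wait_stream(s)

    # Capture one iteration into a graph, then replay it with new data.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = unet(static_latent)

    static_latent.copy_(torch.randn_like(static_latent))  # new input, same shape
    graph.replay()  # re-launches the captured kernels with almost no launch overhead

The catch, as the notes suggest, is that capture only pays off when shapes and the graph stay static; LoRA or resolution changes force a re-capture.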
So I had an issue when loading . Still DON'T KNOW > if this is a one time thing or the final solution. bat if you are using the standalone. or if you use portable (run this in ComfyUI_windows_portable -folder): Jul 20, 2023 · However, at some point in the last two days, I noticed a drastic decrease in performance, as ComfyUI generated images in twice the time it normally would with same sampler, steps, cfg, etc. safetensor checkpoints from my NAS that used slower platter hard drives. You signed out in another tab or window. comfyui-模特换装(Model dress up). Contribute to kijai/ComfyUI-Marigold development by creating an account on GitHub. Each time I need to use a model, it requires a complete reload, which is quite time-consuming. I don't understand the part that need some "export default engine" part. There aren’t any releases here. 4x speed increase at a strength of 60% and 512x512. json in the rgthree-comfy directory. You signed in with another tab or window. Code. 0 When attempting to generate a picture that was 768x768 on novram and lowvram, CUDA experienced an out of memory err You signed in with another tab or window. Some utility nodes for ComfyUI. There's also TomeSD which is a different animal that removes effectively duplicate keywords from the model and their repo showed a drop from 3. The change caches the to_delete value by a given node's output at the prompt level (in execute) allowing recursive_output_delete_if_changed to skip nodes whose outputs have already been recursively evaluated. https://files. Feature: Disable Metadata Toggle. py --windows-standalone-build - 4 days ago · This repo pack GCoNet-plus as ComfyUI nodes, and make this fast SOTA model easier use for everyone. I tried to get InvokeAI's nodes to use the same Dec 8, 2023 · RikCost commented on Dec 8, 2023 •edited. Sep 13, 2023 · This change fixes #1502, increasing execution speed of re-executed workflows by a significant amount (scales exponentially as more nodes are added). It supports transparency and multi-image processing across all modules and nodes. Contribute to asagi4/comfyui-utility-nodes development by creating an account on GitHub. The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface. I started messing with the flags because I had trouble loading the refiner, however I was not able to turn on xformers after. Contribute to dezi-ai/ComfyUI-AnimateLCM development by creating an account on GitHub. comfyui client patch tool. It is exactly a year since ComfyUI has become a thing now and there are still new nodes that requires copying a specific file to the folder, and ComfyUI-Manager is still providing support for such things, even when the node is hosted on GitHub. " ComfyUI Extensions by Blibla is a robust suite of enhancements, designed to optimize your ComfyUI experience. Note: Remember to add your models, VAE, LoRAs etc. Aug 29, 2023 · Development. 6GB of vram use and 5. Feb 20, 2024 · Upgrade ComfyUI to the latest version! Download or git clone this repository into the ComfyUI/custom_nodes/ directory or use the Manager. One more concern come from the TensorRT deployment, where Transformer architecture is hard to be adapted (needless to say for a modified version of Transformer like GRL). Intel has done a good job at tuning their chips for torch. 2023-11-26. py) Jan 27, 2024 · With instantID under comfyui, I had random OOM's all the time, poor results since I can't get above 768*768. ComfyUI is more a refined visual development environment than a real UI. 
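Several of the notes above compare it/s and per-image times between commits and frontends. When doing such torch performance tests, GPU work is asynchronous, so a fair timing loop needs a warm-up and an explicit synchronize; a minimal, generic sketch (the step() callable is a placeholder for whatever you are measuring):

    import time
    import torch

    def benchmark(step, warmup=3, iters=10):
        """Time a GPU-bound callable fairly: warm up first, then synchronize around the loop."""
        for _ in range(warmup):
            step()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            step()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        return iters / elapsed  # iterations per second ("it/s")

    # Example (placeholders): it_per_s = benchmark(lambda: model(x))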
…exe -s ComfyUI\main.py --windows-standalone-build …

4 days ago · This repo packs GCoNet-plus as ComfyUI nodes, making this fast SOTA model easier for everyone to use.

I tried to get InvokeAI's nodes to use the same…

Sep 13, 2023 · This change fixes #1502, increasing execution speed of re-executed workflows by a significant amount (scales exponentially as more nodes are added); a generic memoization sketch of the idea is included below.

It supports transparency and multi-image processing across all modules and nodes.

Contribute to asagi4/comfyui-utility-nodes development by creating an account on GitHub.

The most powerful and modular stable diffusion GUI, API and backend with a graph/nodes interface.

I started messing with the flags because I had trouble loading the refiner; however, I was not able to turn on xformers afterwards.

Contribute to dezi-ai/ComfyUI-AnimateLCM development by creating an account on GitHub. comfyui client patch tool.

It is exactly a year since ComfyUI became a thing, and there are still new nodes that require copying a specific file into a folder, and ComfyUI-Manager is still providing support for such things, even when the node is hosted on GitHub.

ComfyUI Extensions by Blibla is a robust suite of enhancements, designed to optimize your ComfyUI experience.

Note: Remember to add your models, VAE, LoRAs etc.…

…6 GB of VRAM use and 5.…

Feb 20, 2024 · Upgrade ComfyUI to the latest version! Download or git clone this repository into the ComfyUI/custom_nodes/ directory, or use the Manager.

One more concern comes from TensorRT deployment, where the Transformer architecture is hard to adapt (needless to say for a modified version of Transformer like GRL).

Intel has done a good job at tuning their chips for torch.

Jan 27, 2024 · With InstantID under ComfyUI, I had random OOMs all the time, and poor results since I can't get above 768×768.

ComfyUI is more a refined visual development environment than a real UI.
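The #1502 note above is about avoiding repeated recursive work when a prompt is re-executed. Here is a generic, hypothetical sketch of that kind of memoization, not the actual ComfyUI execution code; the node and graph structures are simplified placeholders:

    # Hypothetical sketch: cache the result of a recursive per-node check so each
    # node is evaluated at most once per prompt, instead of once per path to it.
    def stale_outputs(graph, node_id, cache):
        if node_id in cache:
            return cache[node_id]
        node = graph[node_id]                      # placeholder node object
        stale = node.inputs_changed() or any(      # placeholder methods/attributes
            stale_outputs(graph, parent, cache) for parent in node.parents
        )
        cache[node_id] = stale
        return stale

    # One shared cache per prompt execution, e.g.:
    # cache = {}
    # for node_id in output_nodes:
    #     stale_outputs(graph, node_id, cache)

Without the cache, a densely connected graph of "pipe"/"context" nodes makes the recursion revisit the same ancestors over and over, which matches the reported exponential slowdown.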
Jul 28, 2023 · Why would two version of same program run at such different speeds? I looked at the how Automatic1111 starts up. Ollama Website; Ollama Docker Hub; Ollama GitHub You signed in with another tab or window. Running --cpu was used to upscale the image as my Quadro K620 only has 2Gb VRAM `c:\SD\ComfyUI>set CUDA_LAUNCH_BLOCKING=1 c:\SD\ComfyUI>git pull remote: Enumerating objects: 11, done. ComfyUI took 36. Sign up for free to join this conversation on GitHub . 18 TB free on the HDD. I made a little video from copy the file and ComfyUI with the"clear" Ram. 0 flows, but sdxl loads the checkpoints, take up about 19GB vram, then pushes to 24 GB vram upon running a prompt, once the prompt finishes, (or if I cancel it) it just sits at 24 GB until I close out the comfyui command prompt Either manager and install from git, or clone this repo to custom_nodes and run: pip install -r requirements. These defaults are conservative. it's still puts my comfyui in low -vram mode (controlnet model will cause it in this case) which half it's generation speed but at 25step 40~ second is still a win. Skipping some of them will significantly speed up TAESD previews at the cost of smaller preview image results. Me too, these guys rock, and i always welcome more optimizations and support specially if it's possible for SD 1. Could you guys add an option to change it to random it at the start of generation process? Nov 9, 2023 · Thank you, it works. The speed of image generation is about 10 s/it (10241024 batch size 1), refiner works faster up to 1+ s/it when refining at the same 10241024 resolution. If you have trouble extracting it, right click the file -> properties -> unblock. 5. DiffusionEngine (for SDXL) + Unet Patcher (include patches from A1111, comfyui, and Forge, customized for webui use cases) + memory management system almost same as Feb 3, 2024 · device_name(device_index)) # torch_directml. I've implemented a way to increase speed by alot using parallelization of the non-GPU works. - Pull requests · comfyanonymous/ComfyUI. The speed of inovation, internal performance, flexibility . ImportError: cannot import name 'processing' from 'modules' (xxxx\custom_nodes\ComfyUI_OOTDiffusion_CXH\preprocess\humanparsing\modules_init_. pt) checkpoints or ONNXRuntime (. Just a quick comparison of the speed and image quality of the two new fp8 options added today. . This means in practice, Gen2's Use Evolved Sampling node can be used without a model model, letting Context Options and Sample Settings be used without AnimateDiff. Pull requests 1. They would load around 24-32MB/s and take forever. Jan 9, 2024 · Is there a way to configure ComfyUI to maintain the mode I'm currently facing challenges with the repeated loading times of models in ComfyUI. Already have an account? Sign in to comment. 15-41-29. Seems not worth it currently unless running a simple static ComfyUI workflow. Notes - the ComfyUI node setup generates internally at 4096 x 4096 for a 1024 x 1024 output size. A torchscript bbox detector is compatiable with an onnx pose estimator and vice versa. Spent the whole week working on it. #2788 opened on Feb 13 by hku Loading. device('dml') Node: Load Checkpoint with FLATTEN model. I will give it a try ;) EDIT : got a bunch of errors at start. 85. Dec 15, 2023 · SparseCtrl is now available through ComfyUI-Advanced-ControlNet. I also noticed there is a big difference in speed when I changed CFG to 1. 
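On the TorchScript vs. ONNXRuntime point above, the two backends are loaded quite differently; a minimal, hypothetical sketch with placeholder file names and input shapes (torch.jit.load and onnxruntime.InferenceSession are the standard entry points):

    import numpy as np
    import torch
    import onnxruntime as ort

    # TorchScript (.pt): no extra dependency beyond torch itself.
    ts_model = torch.jit.load("dwpose_detector.pt")              # placeholder path
    ts_out = ts_model(torch.randn(1, 3, 256, 192))               # placeholder shape

    # ONNXRuntime (.onnx): extra library, but typically a bit faster;
    # falls back to CPU if the CUDA provider is unavailable.
    sess = ort.InferenceSession(
        "dwpose_detector.onnx",                                  # placeholder path
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    input_name = sess.get_inputs()[0].name
    onnx_out = sess.run(None, {input_name: np.random.randn(1, 3, 256, 192).astype(np.float32)})

Mixing backends per model is fine, which is why a TorchScript bbox detector can be paired with an ONNX pose estimator and vice versa.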
here is a working automatic1111 setting for low VRAM system: automatic additionnal args: --lowvram --no-half-vae --xformers --medvram-sdxl Follow the ComfyUI manual installation instructions for Windows and Linux. Test TensorRT and pytorch run ComfyUI with --disable-xformers. Install the ComfyUI dependencies. 忽略下面错误: from modules_ import processing, shared, images, devices, scripts. Please read the AnimateDiff repo README and Wiki for more information about how it works at its core. 0, 2. https://idkiro. tool guide. Jul 30, 2023 · All timings are an average of the second and third images generated to remove model load times. You can customize which monitor you want to view. fastblend node: 1. I found ComfyUI to be a extremely well developed tool but it lacks that "refinement" for actual comfortable use. This means to delete folder ComfyUI_windows_portable and againt install it withou instaling WAS node suite. Dec 19, 2023 · Follow the ComfyUI manual installation instructions for Windows and Linux. Nov 26, 2023 · The video attached is a comparison between a regular website scrolling vs ComfyUI on a scroll speed value of 3. In the attempt to fix things, I updated to today's Pytorch nightly* and the generation speed returned approximately the same I remembered. However, after reinstalling the manager plugin, I encounter the following issues: Comfyui windows portable, fully up to date 13900k, 32 GB ram windows 11 h2 4090 newest drivers works fine with 1. It seemed that he was working with the CPU instead of the GPU. Testing was done on an RTX2060 6GB with counterfeitXLv2 as my test model. 5 checkpoint with the FLATTEN optical flow model. Feb 8, 2023 · bazettfraga commented on Feb 8, 2023. 8 seconds. on the Desktop SSD. Randomize the seed after generating img feels abit weird to use and become annoying when i try to get the seed of the last image. Fooocus is an image generating software (based on Gradio ). This doesn't helps - ComfyUI still not working. disable_tiled_resources(True) lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enable. 3 days ago · Additionally, you can run Ollama using Docker. Simply download, extract with 7-Zip and run. Improved AnimateDiff integration for ComfyUI, as well as advanced sampling options dubbed Evolved Sampling usable outside of AnimateDiff. But still a great attempt. D:\ComfyUI_windows_portable>. But yeah, it works for single image generation, was able to generate 5 images in a row without crashing. Sep 29, 2023 · comfy/utils. that's all seriously great work. Nov 13, 2023 · The only way to fix it was to rollback ComfyUI but that rollback broke other custom nodes. py --force-fp16. It provides a range of features, including customizable render modes, dynamic node coloring, and versatile management tools. In the same case, just delete the node where the ipdapter is applied, and the whole process is completed in 40 seconds. Feb 20, 2024 · Hello, I would like to suggest implementing my paper: Differential Diffusion: Giving Each Pixel Its Strength. I reinstalled python and everything broke. Add group templating. Loads any given SD1. Aug 2, 2023 · I am using RTX3070Ti Laptop GPU with 8G of VRAM. 4GB to 0. and ComfyUI fully lacks support for it. util already has torch imported so no need to import. 2~x1. Dec 4, 2023 · BetaDoggoon Dec 4, 2023. torchscript. Overall, Gen1 is the simplest way to use basic AnimateDiff features, while Gen2 separates model loading and application from the Evolved Sampling features. 
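The TAESD note above points out that skipping parts of the preview decode trades preview size for speed. A related, simpler trick is to decode previews less often; a hypothetical callback-style sketch (decode_preview and the callback hook are placeholders, not ComfyUI's actual API):

    # Hypothetical sketch: only decode a preview every `interval` sampling steps.
    def make_throttled_preview(decode_preview, interval=5):
        """decode_preview(latent) is a placeholder for a TAESD-style cheap decoder."""
        def callback(step, latent):
            if step % interval == 0:
                return decode_preview(latent)   # cheap preview image
            return None                         # skip decoding on the other steps
        return callback

    # e.g. sampler_callback = make_throttled_preview(taesd.decode, interval=5)  # placeholder names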
ComfyUI Custom Node for AnimateLCM.

The paper allows a user to edit a picture using a change map that describes how much each region should change (a toy sketch of this idea appears at the end of these notes).

…io/sdxs/ This model can achieve a speed of 100 FPS on a single GPU.

Once the container is running, all you need to do is expose port 80 to the outside world.

About Allor: Allor is a high-performance ComfyUI plugin designed for image processing.

I have two accelerators, and I used to rely on the one with a fast internet connection for updating ComfyUI and plugins.

If you have another Stable Diffusion UI you might be able to reuse the dependencies.

It looks like recursive_output_delete_if_changed in execution.…

rebatch image, my openpose…

Change the line "lowvram_available=False" into "lowvram_available = True".

Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints.

KSampler VRAM usage went from ~7 to ~10 and VAE Decode from ~9 to ~13.

Both of them work, and it doesn't seem like there is much of a speed difference here either.

…to the corresponding Comfy folders, as discussed in the ComfyUI manual installation.

Sep 12, 2023 · I've noticed that ComfyUI queue execution hangs significantly when re-executing a previous prompt with many "pipe" or "context" nodes.

I would recommend setting throttle_secs to something relatively high (like 5-10), especially if you are generating batches at high resolution.

…pth (device: Prefer GPU) model_type EPS adm 0

Hi, my VRAM usage seems to have increased after updating.

Currently, PROXY_MODE=true only works with Docker, since NGINX is used within the container.

It comprises more than 90 nodes, each with numerous parameters for your needs. Allor is fully configurable, offering the option to disable any functionality that is not required.

If you're running the Launcher manually, you'll need to set up a reverse proxy…

Dec 6, 2023 · Follow the ComfyUI manual installation instructions for Windows and Linux.

There are other advanced settings that can only be…

Oct 4, 2023 · Please allow us to use GPUs in series for greater speed in image and animation generation when it comes to running multiple GPUs.

For the TensorRT first launch, it will take up to 10 minutes to build the engine; with the timing cache it will reduce to about 2–3 minutes, and with the engine cache to about 20–30 seconds for now.

I understand that many people in the AI image generation world have an NVIDIA GPU or use a cloud service such as Clipdrop.

There are two ways to speed up DWPose: using TorchScript checkpoints (.…

The node will not work if you attempt to change the model type (that is, don't try to load SD1.5 weights into an SDXL model).

…1) ComfyUI Revision: 1901 [56d9496b] | Released on '2024-01-12' | FETCH DATA from: https://raw.…
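As a rough illustration of the change-map idea mentioned above (a per-region edit strength), here is a toy, hypothetical sketch of blending an edited latent with the original according to a per-pixel strength map. It is a crude simplification of the blending intuition, not the paper's actual diffusion-time schedule, and all names and shapes are placeholders:

    import torch

    def apply_change_map(original, edited, change_map):
        """Crude per-pixel blend: 0 = keep the original, 1 = take the edit fully.

        original, edited: (B, C, H, W) latents/images; change_map: (B, 1, H, W) in [0, 1].
        """
        return original * (1.0 - change_map) + edited * change_map

    # Example with random tensors:
    orig = torch.randn(1, 4, 64, 64)
    edit = torch.randn(1, 4, 64, 64)
    strength = torch.rand(1, 1, 64, 64)   # soft mask of how much each region may change
    blended = apply_change_map(orig, edit, strength)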