Assuming you're fine with keeping the queue in postgres, I've used Procrastinate and it's great:
https://procrastinate.readthedocs.io/en/stable/index.html
Core is not Django-specific, but it has an optional integration. Sync and async, retries/cancellation/etc., very extensible, and IMO super clean architecture and well tested.
IIRC the codebase is about one-tenth the size of Celery's.
If you like Procrastinate, you might like my Chancy, which is also built on postgres but with the goal of including the most common bells and whistles.
Rate limiting, global uniqueness, timeouts, memory limits, mixing asyncio/processes/threads/sub-interpreters in the same worker, workflows, cron jobs, a dashboard, metrics, Django integrations, reprioritization, triggers, pruning, Windows support, queue tagging (ex: run this queue on all machines running Windows with a GPU, run this one on workers with py3.14 and this one on workers with py3.11), etc.
https://tkte.ch/chancy/ & https://github.com/tktech/chancy
The pending v0.26 includes stabilization of the HTTP API, dashboard improvements, performance improvements for workflows with thousands of steps, and django-tasks integration.
I also warmly recommend Procrastinate!
We moved all our Celery tasks to Procrastinate at work, across all our Django backends, almost two years ago, and it has been great.
Having tasks deferred in the same transaction as the business logic is something that helped us a lot to improve consistency and debuggability. Moreover, it's so nice to be able to inspect what's going on by just querying our database or looking at the Django admin.
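To make the consistency point concrete, here's a hypothetical sketch (using sqlite3 so it's self-contained; the table names are illustrative, not Procrastinate's actual schema) of why deferring in the same transaction matters: the job row and the business row commit or roll back together, so a worker can never pick up a job whose related data was never written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, task TEXT, args TEXT)")
conn.commit()

def sign_up(email):
    with conn:  # one transaction: both inserts commit, or neither does
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute(
            "INSERT INTO jobs (task, args) VALUES (?, ?)",
            ("send_welcome_email", str(cur.lastrowid)),
        )
```

If the business insert fails (say, a duplicate email), the job insert rolls back with it, and vice versa, which is exactly what you lose when the queue lives in a separate broker.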
For those wondering, procrastinate has no built-in alternative to django-celery-beat, but you can easily build your own in a day: no need for an extra dependency for this :)
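The "build your own in a day" claim is plausible; here's a hypothetical sketch of the core loop. The `Schedule` dataclass stands in for a Django model you'd define yourself, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Schedule:
    """Stand-in for a Django model row describing a periodic task."""
    task_name: str
    interval_seconds: float
    last_run: float = 0.0

def due_tasks(schedules, now):
    """Return names of tasks whose interval has elapsed, marking them as run."""
    due = []
    for s in schedules:
        if now - s.last_run >= s.interval_seconds:
            due.append(s.task_name)
            s.last_run = now
    return due
```

In production you'd run this in a management-command loop, load the rows with the ORM under `select_for_update()` so only one beat process fires each tick, and call the matching task's `defer()` for each due name.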
Celery is such garbage to run/maintain at any sort of scale. Very excited for this. RQ/Temporal also seem to solve this well.
Anyone here done the migration off of celery to another thing? Any wisdom?
A customer of mine has two projects. One running on their own hardware, Django + Celery. The other one running on AWS EC2, Django alone.
In the first one we use Celery to run some jobs that may last from a few seconds to some minutes. In the other one we create a new VM and make it run the job and we make it self destroy on job termination. The communication is over a shared database and SQS queues.
We have periodic problems with Celery: workers losing connection to RabbitMQ, Celery itself getting stuck, gevent issues maybe caused by C libraries, but we can't be sure (we use prefork for some workers, but not for everything).
We had no problems with EC2 VMs. By the way, we use VirtualBox to simulate EC2 locally: a Python class encapsulates the API to start the VMs and does it with boto3 in production and with VBoxManage in development.
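A hypothetical sketch of the adapter described above (class and method names are illustrative, not the customer's actual code): one interface, backed by boto3 in production and `VBoxManage` in development.

```python
import subprocess

class Ec2Runner:
    """Production backend: launches a real EC2 instance via boto3."""
    def start(self, image_id):
        import boto3  # imported lazily so dev machines don't need it
        ec2 = boto3.client("ec2")
        resp = ec2.run_instances(ImageId=image_id, MinCount=1, MaxCount=1)
        return resp["Instances"][0]["InstanceId"]

class VBoxRunner:
    """Development backend: starts a local VM registered in VirtualBox."""
    def start(self, image_id):
        subprocess.run(
            ["VBoxManage", "startvm", image_id, "--type", "headless"],
            check=True,
        )
        return image_id

def get_runner(environment):
    """Pick the backend from the deployment environment."""
    return Ec2Runner() if environment == "production" else VBoxRunner()
```

The rest of the code only ever calls `get_runner(env).start(...)`, so local runs never touch AWS.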
What I don't understand is: it's always Linux, amd64, and RabbitMQ, yet my other customer, using Rails and Sidekiq, has no problems, and they run many more jobs. There is something in the concurrency stack inside Celery that is too fragile.
I can share the sentiment: I had to work with Celery years ago, and the maintenance/footguns exceeded expectations. The codebase and docs are also a bit messy; it's a huge project used and contributed to by many, so it's understandable, I guess. Anyway: Argo if you are in K8s, something else if you aren't. And if you are a startup and need speed, just go with something like Procrastinate.
Migrated Celery to Argo Workflows. No wisdom, as it was straightforward. You lose a lot of startup speed though, so it's not a drop-in replacement and is only a good choice for long-running workflows. Celery was easier than Argo Workflows; Celery is really easy to get started with. I like Airflow the best, but it's closer to Argo Workflows in being oriented toward longer-lived workflows. I hope to try Hatchet soon. I've read Temporal is even harder to manage.
We switched from Celery to Temporal. Temporal is such a great piece of distributed-systems engineering.
What were the problems you had with Celery?
Really cool to see a batteries‑included option in Django for background jobs.
For folks who’ve used Celery/Procrastinate/Chancy: how does retry/ACK behavior feel in real projects? Any rough edges?
What about observability — dashboards, tracing, metrics — good enough out of the box, or did you bolt on extra stuff?
Also, any gotchas with type hints or decorator-style tasks when refactoring? I’ve seen those bite before.
And lastly, does swapping backends for tests actually feel seamless, or is that more of a “works in the demo” thing?
(I'm biased, I'm the author of Chancy)
One of the major complaints about Celery is observability. Database-backed options like Procrastinate and Chancy will never reach the peak throughput of Celery+RabbitMQ, but they're still sufficient to run millions upon millions of tasks per day, even on a $14/month VPS. The tradeoff is excellent insight into what's going on: all state lives in the database, so you can just query it. Both Procrastinate and Chancy come with Django integrations, so you can even query it with the ORM.
For Chancy in particular, retries are a (very trivial) plugin (that's enabled by default) - https://github.com/TkTech/chancy/blob/main/chancy/plugins/re.... You can swap it out and add whatever complex retry strategies you'd like.
Chancy also comes with a "good enough" metrics plugin and a dashboard. It's not suitable for an incredibly busy instance with tens of thousands of distinct types of jobs, but good enough for most projects. You can see the new UI and some example screenshots in the upcoming 0.26 release - https://github.com/TkTech/chancy/pull/58 (and that dashboard is for a production app running ~600k jobs a day on what's practically a toaster). The dashboard can be run standalone locally, pointed at any database as needed, run inside a worker process, or embedded inside any existing ASGI app.
Something about queuing systems that often gets me is that they can start to seem like the wrong abstraction as soon as one has tasks that enqueue additional tasks. Particularly when features start growing, and double particularly when modelling business processes.
This is because the code enqueuing the task needs to be aware of what happens next, which breaks separation of concerns. Why should the user sign-up code have to know that a report generation job now needs queuing?
Really, what starts to make more sense to me is to fire off events. Code can say "this thing just happened" and let other code decide if it wants to listen. Which then makes it an event stream rather than a queue, with consumer groups et al.
I made the (now unmaintained) project https://lightbus.org around this, and it did work really well for our use case. Hopefully someone has now created something better.
So I'd say this: before grabbing for a task queue, take a moment to think about what you're actually modelling. But be careful of the event streaming rabbit-hole!
They're not mutually exclusive. Nothing about "event driven" means async. I have an event driven modular monolith and all events are handled synchronously. It's up to the receiver to queue a task if it needs to, so context boundaries are not crossed.
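A minimal sketch of the synchronous, in-process dispatch described above (all names are illustrative): handlers run inline in the publisher's call, and each handler decides for itself whether to enqueue a background task, so the sign-up code never mentions reports.

```python
from collections import defaultdict

class EventBus:
    """Tiny synchronous event bus for a modular monolith."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Synchronous: handlers run in the caller's thread (and transaction).
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
queued = []  # stand-in for a real task queue

# The reporting module subscribes; sign-up code only publishes the event.
bus.subscribe(
    "user_signed_up",
    lambda p: queued.append(("generate_report", p["user_id"])),
)
bus.publish("user_signed_up", {"user_id": 42})
```

The receiver owns the decision to go async, so context boundaries stay intact even though dispatch itself is synchronous.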
That's interesting to see.
Does the API support progress reporting? ("30% done")
Of course one could build this manually in the worker implementation, but I'd love to have it reflected in the API somewhere. Celery also seems to be missing an API for that.
Does anyone see a reason why that's missing? I don't think it complicates the API much, and it seems such an obvious thing for longer-running background tasks.
To implement progress reporting, you would need to know upfront how long a task will take, no? Is it even possible to do that accurately?
Though I imagine you could have strategies to approximate it, for example keeping track of the past execution times of a given type of task in order to infer the progress of a currently running task of the same type.
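That approximation strategy could be sketched like this (function names and the 0.3 smoothing factor are illustrative): keep an exponential moving average of past durations per task type, then report elapsed time against the estimate.

```python
def update_estimate(previous, duration, alpha=0.3):
    """Blend a newly observed duration into the running estimate (EMA)."""
    return duration if previous is None else alpha * duration + (1 - alpha) * previous

def estimated_progress(elapsed, estimate):
    """Percent complete, capped below 100 since it's only a guess."""
    return min(99, int(100 * elapsed / estimate))
```

The cap matters: with a time-based estimate, a slow run would otherwise report 100% while still working.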
> To implement progress reporting, it means you are able to know the time a task would take to run upfront, no?
No. You just need to know the total number of steps and which step you are currently on.
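A sketch of that step-based approach (names are illustrative): the task itself counts steps and reports the percentage through a callback, which in practice would write to a DB column or cache key that the API exposes.

```python
def run_with_progress(steps, save_progress):
    """Run each step callable in order, reporting percent complete after each."""
    total = len(steps)
    for i, step in enumerate(steps, start=1):
        step()  # the actual work for one step
        save_progress(round(100 * i / total))
```

No duration estimate needed: four steps done out of eight is 50%, regardless of how long each step takes.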
I’ve been using the django-tasks library in production for about a year. The database backend and simple interface have been great. It definitely isn’t intended to replace all of celery, but for a simple task queue that doesn’t require additional infrastructure it works quite well.
This one? https://github.com/RealOrangeOne/django-tasks
That and the rq backend sound promising to me.
How is the typing support? We just had downtime because a change to a Celery task didn't trigger mypy to complain at all the call sites until runtime. Too many Python decorators are written with pretty weak typing support.
With regards to args and kwargs? None. Your callable is basically replaced with a Task instance that's not callable. You need to invoke its enqueue(*args, **kwargs) method, and yeah… that's of course not typed.
Static analysis will never be fully robust in Python. As a simple example, you can define a function that only exists at runtime, so even in principle it wouldn’t be possible to type check that statically, or even know what the call path of the functions is, without actually running the code in trace/profiler mode.
You probably want something like pydantic’s @validate_call decorator.
> you can define a function that only exists at runtime, so even in principle it wouldn’t be possible to type check that statically
Can you say more, maybe with an example, about a function which can't be typed? Are you talking about generating bytecode at runtime, defining functions with lambda expressions, or something else?
This is great! The previous recommendation was usually a lib called Celery that I wasn't able to get working. I don't remember the details, but it had high-friction points or compatibility barriers I wasn't able to overcome. This integration fits Django's batteries-included approach.
I've been handling this, so far, with separate standalone scripts that hook into Django's models and ORM. You have to use certain incantations in an explicit order at the top of the module to make this happen.
> scripts that hook into Django's models and ORM
Django has management commands for that [1].
When you use Django over time, you experience this pleasant surprise over and over when you need something, “oh, Django already has that”
[1] https://docs.djangoproject.com/en/5.2/howto/custom-managemen...
This is an exciting development. I imagine I'll continue using Celery in most cases but being able to transparently swap back-ends for testing, CI, etc. is very compelling.
I haven't looked into this in any detail but I wonder if the API or defaults will shave off some of the rough edges in Celery, like early ACKs and retries.
Really you want automatic transpilation to Go. A good Christmas project for someone.
Django, this is about 10 years too late. It's frustrating because we use all manner of hacks to work around this not being part of the built-in story.
The best time to plant a tree is twenty years ago. The second best time is now.