I have a little bit of hope that this one is going to be different with the lessons they got from Argo, but I'm not holding my breath.
Ps. when I first saw Argo, I thought this is it. The solution to all my problems.
And Argo and now Kargo are even more complex.
From a business standpoint, Akuity had their Series A in 2022 and raised $25M. They have yet to show up on anyone's radar, IMO. Maybe Kargo is their PMF but I wouldn't move my CI/CD over yet.
Imagine 10 apps deployed, all actively deployed, say, a few times a day.
You want to go back 10 days for App A. But in doing so you would revert the whole state, and all apps would be as they were 10 days ago.
The only way is to cherry-pick the particular commits and revert them.
No? I mean, how can git be useful for rolling single components back and forth?
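For what it's worth, the manual version of that cherry-pick/revert dance looks roughly like this (a sketch, assuming a monorepo where App A lives under a hypothetical `apps/app-a/` path; beware that a commit touching other apps' paths as well gets rolled back in full):

```shell
# Find the commits from the last 10 days that touched App A's path:
git log --oneline --since="10 days ago" -- apps/app-a/

# Revert exactly those commits (git rev-list emits them newest-first),
# leaving the other apps alone as long as none of these commits also
# touched other paths:
git revert --no-commit $(git rev-list --since="10 days ago" HEAD -- apps/app-a/)
git commit -m "Roll back app-a to its state of 10 days ago"
```

That works, but it's exactly the kind of manual toil people complain about.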
Then there’s the whole constant reconciliation of your version controlled env specification, and the actual env, and how you automatically resolve differences. With the most important principle being that the version controlled code/config is absolute truth and something needs to figure out how to bend the world to match.
But importantly in all of this, Git isn’t that important. Version control is important, infrastructure as code is important, but Git isn’t. Arguably Git isn’t a great tool for GitOps due to issues like the ones you mention. But the huge ecosystem around Git makes the pain worth it.
I would argue the “correct” solution to your problem is a tool that automatically creates the correct cherrypicks and reverts for you based on a request to rollback application X.
Treat git as a dumb version control system, and broadly ignore “good practice”, because a lot of those good practices are designed for software development, not infrastructure development. We need to develop new working practices, built on top of Git's fundamental components, rather than trying to rationalise existing working practices against the new problems that appear in GitOps.
The trap here is this only works for stateless infrastructure. If you do it with stateful resources, you'll lose all data. Your gitops tool will happily recreate EC2 instances, S3 buckets and RDS instances, all empty/initialized to whatever you defined.
For managed services like S3 and RDS, there are other GitOps tools like Crossplane.io which you can use for similar GitOps management. But the paradigm shift might also be that you add GitOps config to perform regular backups, and also add config to ensure that if it is being recreated, it restores from a backup.
But frankly, GitOps works best with stateless apps. Managing stateful apps is possible but you need to take care of state yourself.
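To make the stateful caveat concrete: Crossplane's managed resources at least let you guard against the delete-and-recreate-empty failure mode via `deletionPolicy`. A rough sketch (kind and fields follow the classic AWS provider; the bucket name is made up, check your provider's docs):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: s3.aws.crossplane.io/v1beta1
kind: Bucket
metadata:
  name: example-data-bucket
spec:
  deletionPolicy: Orphan          # keep the real bucket (and its data) if this resource is deleted
  forProvider:
    locationConstraint: us-east-1
  providerConfigRef:
    name: default
EOF
```

`Orphan` means Crossplane stops managing the bucket instead of destroying it, which is usually what you want for anything holding data.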
You version your apps, of course, and you publish some artifact that represents the release. Historically this has been a Helm chart, but with Flux we are seeing many people use OCI repositories for this now. They give many of the benefits of Helm repositories without most of the drawbacks, and the way Flux uses them, you retain traceability to the Git commit that started your release. Even Helm itself has adopted OCI repositories in its current version (just waiting for many chart publishers to catch up, but we are getting there!)
The app manages its own manifests in the app repo, and the app devs deploy from the main branch or a dev branch of their own app repo, but everyone else that uses the app will deploy from a release artifact. Those artifacts are tagged with semver numbers, so you can automatically move to the next version of the app as soon as it's published with a valid signature.
If your app devs are the only ones using the app, then this should not change anything: since they are building for production, it should be versioned and managed like any production concern. Whether it's for distribution or not, you still do releases.
It's not any more complicated than what you are already doing with `docker build` and `docker push` I assure you, it's nearly the same. And since those OCI manifest release tags all logically come from a git tag, there's traceability and it is still GitOps in every important sense of the word.
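As a sketch of the tracking side in Flux (repo URL and names are hypothetical; the `ref` fields are per the Flux source API), an OCIRepository can follow release tags by semver range:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: app-a
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/example/app-a-manifests
  ref:
    semver: ">=1.0.0"   # automatically pick up each new release tag
EOF
```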
Automation-as-policy directives state declaratively that an app is always on the latest published version at any given time; in Flux, a `spec.semver` with a wildcard accomplishes this very simply, with a one-line addition to your config.
When you need to roll back app A, you remove the automation (in Flux the easiest way is to replace the wildcard) and, in the cluster repo, pin your GitOps config for that one app to the particular version you want: the one version that doesn't have the issue. Then, once the issue is resolved, you remove the pin and put the automation back in place.
As an added benefit, you get a permanent history that shows when incident response began, how long the automation was disabled, and what version was pinned at that time, so you can calculate metrics like "MTTR" and "MTBF" that we love so much.
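The pin itself is a small change in the cluster repo: the same hypothetical OCIRepository as above, with the semver range swapped for a fixed tag (version number made up for illustration):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: app-a
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/example/app-a-manifests
  ref:
    tag: "1.4.2"   # pinned to the known-good version during the incident
EOF
```

Reverting that one commit in the cluster repo re-enables the automation, and the git history of the pin is your incident timeline.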
Devs tend to be opinionated on these projects because without that you end up with feature sprawl to the point that projects become unmaintainable.
On the other side, new projects need to focus on their specific segment and solve that problem well without worrying about corner cases.
Either way, any solution for a corner case needs to be implemented in a way that solves it broadly for any user.
cat <<EOF | kubectl apply -f -
I don't blame the author(s), but things are getting more specialized
and more terms are created (which usually map to an existing
term, which also maps to an existing term, which also maps to an existing
term, and so on).
I have thought about trying to create a "terminator" (hah) where you can
paste in something from a new product and it will map the terms
to existing terms, as a sort of translator or term machine.
We are thrilled to announce Kargo, a multi-stage application lifecycle orchestrator for continuously delivering and promoting changes through environments. Kargo, brought to you by the creators of the Argo Project, is a reimagining of CD pipelines for the cloud-native era, with first-class GitOps support, progressive delivery features, and is 100% open source.
application lifecycle: CI & CD
orchestrator: a thing that runs jobs with dependencies
continuously delivering: CD
promoting changes: if the tests work in the staging environment, allow someone to click a button that says "deploy to prod"
environments: dev, test, staging, production
CD pipelines: a bunch of continuous delivery jobs
cloud-native era: microservices/SaaS/PaaS/IaaS/IaC/II/containerization/webhooks/OIDC
first-class GitOps support: if you push a commit, a job is run
progressive delivery: deploy to 10% of users, if lots of errors, roll back the deploy
100% open source: our code is [currently] available but we will charge you out the ass to manage it for you and Enterprise features will be locked up once we write them
But I understand how it could look like the undecoded bytefall in the Matrix for those outside the know.
It's like arguing that CloudFormation locks you into AWS.
If you had Terraform-defined infrastructure for AWS you'll still need to define entirely different infrastructure for othercloud. Starting with Terraform is going to be marginally easier because maybe you intelligently organized your code using the tools Terraform offers (didn't you?), and didn't just materialize it all as a big pile of root modules (or worse, root modules instantiated by running with different var arguments--you didn't do that, did you?). Because with Terraform you can do some code reuse, etc. in a way you can't with CloudFormation.
But it's only going to be easier because you already know the language and because you have less code (maybe) to examine to inspire your new code base. All your resources will be different and will mostly behave differently as well, so it's not a matter of renaming but of rearchitecture anyway.
Well, the third thing that's going to be easier is that you'll understand the DIY aspects of actually running your Terraform, since it's not a service like CloudFormation. Wrangling states and so forth.
This way each environment is in its own directory which can have its own patches such as using a private load balancer instead of public for a staging environment or setting whatever environment variables that need to be different.
Then at the Argo CD level, Argo CD running in prod will look for an Application in the prod/ directory and Argo CD running in staging will look for an Application in the staging/ directory.
All in all you end up deploying the same artifact across environments and all of this is managed and exposed by Argo CD. For example you might want AutoSync enabled for staging so developers can auto-deploy their code but prod requires a tech lead or manager to push the sync button to deploy it.
The above works well in practice and isn't too complicated to pull off.
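A sketch of what one of those per-environment Applications might look like (repo URL, paths, and names are made up; the `syncPolicy` field follows the Argo CD Application spec):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-config
    targetRevision: main
    path: staging/my-app        # the prod Argo CD points at prod/my-app instead
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated: {}               # AutoSync for staging; omit it so prod requires a manual sync
EOF
```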
So in your example, suppose you changed something in your Kustomize `base/` directory. With _just_ Kustomize and Argo CD, those changes roll out to all environments all at once. With Kargo, you can progress the change and (re-)validate it environment-by-environment.
In our current setup it's not possible to roll out a new version of a Helm chart in dev, then in staging, and finally in prod (without causing an out-of-sync state).
The rest is managed via values-$env.yaml, which works perfectly fine.
In prod it wouldn't get rolled out until someone manually clicks the sync button for each app in that environment. But yes, in staging it would get AutoSync'd to all of the apps. I get what you're saying though.
It will be interesting to see how Kargo works across multiple clusters in the staging-vs-prod example, where both clusters each have their own Argo CD instance running but you want to view and promote an artifact across N clusters.
In any case, I'll give it a look when it's available to see how it can improve my current workflows. Overall I'm really happy I started using Argo CD ~2 years ago.
That's a great point and if that works for you, awesome! Note though that relies on the state of the prod Application to temporarily contradict the source of truth that is your repo. If you lose that prod Application and re-create it, prod will be in a state you hadn't yet blessed.
But thanks for pointing out the mismatch on the source of truth. Definitely something to consider.
With Kargo, it looks like it lets you define your preferred promotion process, and then lets you release each promotion with a single click. I think the part that is the most interesting to me though is that it writes the changes it is making directly to the git repo.
I didn't see an explanation of how Stages are different than jobs. Every single usage of Stage could have been replaced with job and the meaning would stay the same.
The data and DevOps marketing people really need to drop the buzzwordism.
Job doesn’t really encapsulate this at all. An alternative may be Metaenvironments.
For what it's worth, my colleagues and I have had great luck with Argo Workflows and wrote up a blog post about some of its advantages a few years ago: https://www.interline.io/blog/scaling-openstreetmap-data-wor...
Kargo has native support for Argo CD today, but one of its goals is to be agnostic, and it could probably work with Dagger or other tools in the future.
But it looks to allow you to define pipelines using your preferred imperative programming language -- which is awesome. I would totally use that for CI.
Kargo is really about divorcing CI and CD and creating something that's actually CD-focused and compatible with established GitOps principles.
You make per project makefile-like python pipelines that are executed by an installed runtime/docker container?
Then I will give it a try, I've been looking for a lightweight local ad-hoc jenkins substitute.
We disallow writing back to GitHub to avoid this issue, and manage stages through branches, combined with directories for overlays. Things can get out of sync, but comparing branches is easily automated.
In my experience it gets pretty hairy to build automated conditional promotion in a GitOps pipeline, especially if you don’t go all the way to continuous delivery and might have different versions at different stages between environments.
Can’t say I’m a fan of “we call it freight” though. Artefact is a perfectly fine word.
> Something like this is definitely needed in the GitOps space...always felt like something was missing between promoting things and rolling them out
> Interesting. Will share with my team and try it out.
> There is a webinar tomorrow with Kelsey Hightower! Here is the link if you want to join https:...
> Looks promising, I'm definitely going to take it for a spin!
Something tells me this isn't organically gaining traction...
Tighter integration with other Argo products? Why is this not simply a new component of Argo CI/CD? How much is "thought-leadership" a part of it? It is Kubernetes-adjacent, after all.
So far, even the answers in this thread leave me asking these questions even more strongly. If Argo CD is a "continuous deployment" tool, why isn't the priority making staged rollouts a first-class feature in Argo CD itself?
This is coming from a place of curiosity, not simply being a miser, I promise.
Want to go camping? You go on Amazon and order a tent. An Amazon truck _delivers_ it to you. Not the same as deployment. Deployment is when you pitch the tent.
Argo CD and Flux are great at Deployment. "Make this run in my cluster."
Neither of them addresses the other need -- delivery -- getting artifacts from point A to point B with adequate quality gates, etc. along the way.
Though, I guess maybe I just need Kargo and not the rest of Argo? Can someone confirm that? I'm here and bothering to comment because I do still see the value of "pipelines" and gating. Just, incredibly skeptical of new K8s-related products using k8sy buzzwords. And this announcement blogpost is dripping in them.
: https://discourse.nixos.org/t/nix-snapshotter-native-underst... (aka, just deploy nix paths as native containers and you get the power of the nix + nix-store)
I guess, gitops + gating/staging is compelling, but again, I don't see how this is a distinct product.