If you want a POSIX shell, you'll need at least 5K lines of parsing code; if you want bash compatibility, it's at least 10K lines. In bash itself it's closer to 20K lines of C.
There's really no way around that, and IMO the best answer is to use a different language -- which is ALSO hard, because many language runtimes don't support fork() or signals in the way that a shell needs.
(e.g. CPython is actually closer than say Go because it supports fork() and exec(), but even it has issues with signals, EINTR, etc.)
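To make the fork()/exec()/EINTR point concrete, here's a minimal sketch (my illustration, not code from the linked posts) of the fork/exec/wait cycle every shell runtime has to support. CPython exposes these syscalls directly, which is why it's a closer fit than runtimes that can't fork:

```python
import os

def run(argv):
    """Fork, exec argv in the child, and wait for it, like a shell would."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the command.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # "command not found", by shell convention
    # Parent: wait for the child, retrying if a signal interrupts the
    # syscall with EINTR. Modern CPython retries EINTR itself (PEP 475),
    # but a shell written in C has to handle this by hand everywhere.
    while True:
        try:
            _, status = os.waitpid(pid, 0)
            return os.waitstatus_to_exitcode(status)
        except InterruptedError:
            continue

print(run(["true"]))  # prints 0
```

This is only the happy path; a real shell also needs job control, process groups, and careful signal disposition in the child, which is where most runtimes fall short.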
I wrote a bunch of posts on how Oil does it:
How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html
posts tagged #parsing-shell: https://www.oilshell.org/blog/tags.html?tag=parsing-shell#pa...
Oil Is Being Implemented "Middle Out" - https://www.oilshell.org/blog/2022/03/middle-out.html
bash is actually one of the only shells that uses yacc, and its maintainer regards that as a mistake. yacc covers maybe 1/4 of the language; the rest is hand-written code intertwined with the generated parser. It's pretty messy.
and e.g. https://www.oilshell.org/blog/2016/10/13.html
Despite that, fish-shell still uses a traditional hand-written recursive descent parser. Here's the link if you want to see it: https://github.com/fish-shell/fish-shell/blob/master/src/ast...
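To make "hand-written recursive descent" concrete, here's a toy sketch in Python (my own illustration, not fish's actual code or grammar) for a tiny pipeline grammar: `pipeline := command ('|' command)*`, `command := WORD+`. Each grammar rule becomes one function that consumes tokens:

```python
class Parser:
    """Toy recursive descent parser for: pipeline := command ('|' command)*"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self):
        # Consume and return the next token.
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_pipeline(self):
        # pipeline := command ('|' command)*
        cmds = [self.parse_command()]
        while self.peek() == "|":
            self.eat()  # consume the '|'
            cmds.append(self.parse_command())
        return ("pipeline", cmds)

    def parse_command(self):
        # command := WORD+  (words until '|' or end of input)
        words = []
        while self.peek() not in (None, "|"):
            words.append(self.eat())
        if not words:
            raise SyntaxError("expected a command")
        return ("command", words)

print(Parser(["ls", "-l", "|", "wc", "-l"]).parse_pipeline())
# → ('pipeline', [('command', ['ls', '-l']), ('command', ['wc', '-l'])])
```

The appeal of this style is that the call structure mirrors the grammar and error messages are easy to control; the cost, for a real shell, is that the grammar is huge and context-sensitive, which is where the thousands of lines come from.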
Some people (myself included) enjoy "unsafe" languages like C. I'm not one of those people who argue that being careful is enough. For applications where security really matters, _please_ use something with a bit more verification (though even that doesn't disqualify C, see seL4).
Now take a single-player game, or a text editor. It's not a security risk if these crash, so do you need the safety? I'd argue it's unnecessary; I can't remember a time when I saw a program like this print 'segmentation fault'.
I encourage anyone to write new software in "unsafe" languages, so long as it's not a security risk.
Any program that operates on untrusted data can be a security vulnerability. If an attacker can make your text editor execute arbitrary code when you open a specially crafted file, that's a major security problem. Why would you create the risk of this sort of problem on purpose when we have adequate safe languages these days?
Instead of making people decide on a case-by-case basis when "security really matters", let's make all programs safe. I mean, isn't your suggestion that text editors have no security implications itself evidence that people will get it wrong when asked, "Does my program need security?"
I have started to wonder lately whether all this "implicit language security" (use Rust or Go so you don't have memory errors, overflows, etc.) isn't just a way to shift accountability to some other layer.
I do understand that lower-level languages require better programming skills, because you actually need to know what you are doing, unlike Python, which generally shields you from a lot of ugly things. But that's about it. You can do bad shit in Python/Java too. And that happens like A LOT.
So what are we actually protecting our services/software from? Reducing the attack surface, totally agreed. But then log4j...
Or are we moving to more "user-friendly" languages because they don't require such an amount of knowledge?
I don't know. I do see the value Rust and Go create, but if we follow good software practices in C, don't you think we could ship decent safe software there too? Or are all C programs inherently buggy by default?
Does this mean that eliminating memory errors is not worth it? I don’t think so. In C you have both memory errors and business logic bugs, so it takes more effort to get C code right than Rust code.
No, it's a way to eliminate THE MOST COMMON CLASS OF SECURITY BUG. Boom, gone, because you used safe Rust instead of C.
It doesn't mean you won't discover other bugs -- you will. But those bugs might have lurked undiscovered because you were too busy fighting buffer overflows and use-after-frees. Any given team of engineers has only so much time and energy; with fewer bug types to eliminate, the same team can get closer to bug-free.
This is why the My Little Pony character alter of your average 25-year-old trans furry plural system wearing uwu kawaii programming socks, working in Rust, can code circles around even the most jaded C grognard with decades of experience -- and will be writing an OS kernel or driver near you.
> I don't know. I do see the value Rust and Go create, but if we follow good software practices in C, don't you think we could ship decent safe software there too? Or are all C programs inherently buggy by default?
All C programs but the most trivial are inherently buggy by default. As I put it, C is unsafe at any speed. Theoretically, it should be possible to establish sound disciplines and best practices to ensure safe C code, but experience has taught us that C is so full of potholes and footguns that it is practically impossible to write safe C even for experienced developers.
I frankly haven't found an ecosystem in which I feel more comfortable than the one from C.
Yes, C has its vulnerabilities, but for my own projects I do in my own time, I will use any language I have fun with, even if it has huge problems.
Same when people post "I built X with Language Y" and someone comments "Why did you use Y? You should have used Z". What difference does it make? If you don't like it, don't use it!
Don't get me wrong, I'm all for constructive criticism but sometimes the comments do not come across as criticisms but as attacks.
Again, just my opinion.
“Some People Were Meant for C”
This reads like an article in Social Text.
> Meditating on this communicativity suddenly gave way to a realisation: C is designed for communicating with aliens!
The memes make themselves.
While Rust is definitely far more helpful than C when it comes to writing secure code, I'd argue that the following generally holds:
It's not possible for humans to write correct and secure code on an ongoing and consistent basis.
To my ears this echoes the understanding that humans aren't fully rational creatures; we just convince ourselves we are most of the time. So it would follow that we wouldn't be able to write secure (i.e. rational in its own context) code consistently.
"Safety. Yes, Rust is more safe. I don’t really care. In light of all of these problems, I’ll take my segfaults and buffer overflows."
"I understand that many people, particularly those already enamored with Rust, won’t agree with much of this article. But now you know why we are still writing C, and hopefully you’ll stop bloody bothering us about it."
The problem is that when you write a program in C for the public, this program's buffer overflows and segfaults aren't a problem only for you, but also for everyone around you. Security vulnerabilities are a serious problem. You can think of them as a form of software pollution: "Safety. Yes. Asbestos is unsafe. I don't really care. In light of all of these problems with fiberglass, I'll take my lung cancer and expensive structural remediation".
See what I mean? We all have an interest in secure software, and the aesthetic preferences expressed in the article to which you've linked have to take a back seat to ecosystem robustness and information security.
Unfortunately, this pro-C cowboy attitude is entrenched in this industry. It's going to take a lot of retirements to move us forward.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
When I first heard it, I smiled. Then I thought about it...