Wouldn’t it make more sense to define dark matter as all the stuff that is installed in a container but never activated (unless exploited)?
That seems to specifically exclude software installed by, say, language-specific package managers (Cargo, Rubygems, npm and derivatives) -- which on the whole seems pretty perverse. Dealing with those does indeed complicate SBOM maintenance -- but people use them anyway for very good reasons (which sometimes include getting more secure versions of the packaged code!), and having tools that work in the real world requires dealing with that complexity, not wishing it away.
Also because of what they write about containers: in a container, all files are tracked. That's the container.
"Dark matter" here is anything these tools can't see / notice vulnerabilities in.
Seems like a really useless metric for containers.
I can get it for OSes (some packages there do manage DB data, and even have an option to remove it when removing the package), but for a container it does seem a bit pointless.
(But also, even ignoring that, I believe the metric used by the article is number-of-files, not byte-size. A DB might be large in byte-size, but it is usually negligible in number-of-files, since it typically holds individual table chunk files of 1GB or larger.)
The goal is, presumably, to figure out when a given docker image was created in such a way that it burns in a vulnerable version of some library; so that the author can be alerted that they need to (update their Dockerfile and) rebuild their image.
"Dark matter", under this definition, is anything that gets injected during the build process of the image, that is not itself traceable to some other versioned package management system with vulnerable-version deprecation. Without such information, an automated agent like the one described in the article cannot then propagate deprecations from consumed package-versions to produced image-tags.
A good example of such "dark matter" would be a static binary built outside the Dockerfile using a CI system, where the CI then creates a docker image by running a Dockerfile that simply injects the expected prebuilt binary into an image with an ADD stanza. Does that binary contain vulnerable versions of embedded static libraries? Who knows?
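To make that concrete, here is a rough sketch (my own, not the article's tooling) of what an agent has to do just to notice such files on a Debian-based image: walk an unpacked rootfs and count every file that no dpkg package claims to own. The /mnt/image path is an arbitrary assumption. A CI-built static binary ADDed into the image would show up in that list, but the sketch still can't tell you anything about the library versions baked inside it.

#!/usr/bin/env python3
# Rough sketch: estimate "dark matter" in a Debian-based image rootfs by
# listing every regular file and checking whether the dpkg database claims
# ownership of it.  The rootfs location ("/mnt/image") is an illustrative
# assumption, not anything from the article.
import glob
import os

ROOTFS = "/mnt/image"  # an unpacked container filesystem (assumption)
DPKG_INFO = os.path.join(ROOTFS, "var/lib/dpkg/info")

# Collect every path that some installed package claims to own.
owned = set()
for list_file in glob.glob(os.path.join(DPKG_INFO, "*.list")):
    with open(list_file, encoding="utf-8", errors="replace") as f:
        for line in f:
            owned.add(line.strip())

# Walk the whole rootfs and count files dpkg knows nothing about.
dark, total = [], 0
for dirpath, _dirs, files in os.walk(ROOTFS):
    for name in files:
        total += 1
        # Path as it would appear inside the container, e.g. "/usr/bin/app".
        inside = "/" + os.path.relpath(os.path.join(dirpath, name), ROOTFS)
        if inside not in owned:
            dark.append(inside)

print(f"{len(dark)} of {total} files are not owned by any dpkg package")
for path in dark[:20]:
    print("  ", path)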
For Wordpress, most scanners will miss that PHP or Wordpress are even installed in the image. The scanners spit out lots of data, but it's only about what they can find, offering the illusion of completeness or transparency.
I also agree with your "wouldn't it make more sense" definition. From the perspective of a developer concerned about the security and robustness of their own deployment, "dark matter" would be anything that ends up in my container that I don't actually need to run the app in the container.
Eventually it will enable a Debian system to account for (in some way) every single file visible on the live system.
That appears to be a difficult definition.
I'm not sure what the software analogy is, except that there's no software dark matter, just people not familiar with their tools.
FileHasher (or whatever I called it) was basically a "poor man's antivirus utility" -- that is, it didn't scan memory, didn't check boot blocks, didn't scan system [E|EE]PROMS like BIOS, and it knew nothing about rootkits -- or how to detect them.
But what FileHasher did do was to take a point-in-time "metadata snapshot" -- of all of the files on my PC -- their path, their filename, their size, their date, and a custom 16 or 32 byte hash of their contents. This data was put into a single simple space or tab or comma delimited text file (a "poor man's database" <g>) which contained in its filename the date and time (as a string) when this file was generated.
The idea was, I'd run a completely fresh OS install. Then, as the absolute first thing I'd do after the OS install, I'd copy "FileHasher" onto my PC via USB drive, and run it to generate a metadata snapshot file of all of the system's files...
FileHasher could then be run at any subsequent time -- to generate an additional "point-in-time" metadata snapshot information file.
Once two such files were created from two points in time -- FileHasher could compare them -- and list ALL files that had been created, deleted, or modified -- since the initial or previous run.
The idea was that a virus, if it were to exist, would probably create/modify/delete at least one file -- and FileHasher in reporting mode (if used with diligence, say, before and after software installs, and at various other dates/times) would help a person with a keen eye in finding/identifying/fixing what the problem was, based on the list of created/deleted/modified files...
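For what it's worth, the same idea is easy to sketch today in a few dozen lines of Python; this is a hypothetical reconstruction (the function names and the SHA-256 hash are my choices, not whatever the original utility used):

#!/usr/bin/env python3
# Hypothetical reconstruction of the "FileHasher" idea described above:
# snapshot path/size/mtime/hash for every file under a root, write it to a
# timestamped tab-delimited text file, and diff two snapshots to list files
# created, deleted, or modified in between.
import hashlib
import os
import sys
import time

def snapshot(root, out_dir="."):
    """Write one point-in-time metadata snapshot and return its filename."""
    out_name = os.path.join(out_dir, time.strftime("filehash-%Y%m%d-%H%M%S.txt"))
    with open(out_name, "w", encoding="utf-8") as out:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                    digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
                except OSError:
                    continue  # unreadable file; skip it
                out.write(f"{path}\t{st.st_size}\t{int(st.st_mtime)}\t{digest}\n")
    return out_name

def load(snap_file):
    """Read a snapshot back into {path: (size, mtime, digest)}."""
    entries = {}
    with open(snap_file, encoding="utf-8") as f:
        for line in f:
            path, size, mtime, digest = line.rstrip("\n").split("\t")
            entries[path] = (size, mtime, digest)
    return entries

def compare(old_file, new_file):
    """Print files created, deleted, or modified between two snapshots."""
    old, new = load(old_file), load(new_file)
    for path in sorted(new.keys() - old.keys()):
        print("created :", path)
    for path in sorted(old.keys() - new.keys()):
        print("deleted :", path)
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            print("modified:", path)

if __name__ == "__main__":
    # e.g.  filehasher.py snap /        or   filehasher.py diff old.txt new.txt
    if sys.argv[1] == "snap":
        print(snapshot(sys.argv[2]))
    elif sys.argv[1] == "diff":
        compare(sys.argv[2], sys.argv[3])

Run "snap" once right after a fresh install and again later, then "diff" the two snapshot files to get the created/deleted/modified report described above.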
Tracking the Software Dark Matter in the various layers of container(ized) images -- sounds like a very similar (and good!) idea!
Will it solve every possible container security problem?
Probably not -- but it's a good step in the right direction!
(Was my "virus checker" perfect? No! But it was better than no virus checker! <g> ("A Little Bit Of Something" > "Nothing" -- you know, from Philosophy 101! <g>))