Java Apache Commons Text vulnerability (nist.gov)
112 points by daitangio on Oct 17, 2022 | 37 comments


It's just affecting the Apache Commons Text component.

The title ("CVE-2022-42889: Java Apache Commons vulerability") is overly broad and suggests that somehow the whole Apache Commons project, which consists of over 40 components, has a vulnerability


Agreed. I've seen/used Apache Commons on almost all big Java projects I've ever touched in my career. But never the Text component. So the heading makes it seem like a big deal, while it probably is not? Unless it's used a lot implicitly. But even then I'm not sure how many actually use the string interpolation functions.


It's one of the oldest sets of OSS libraries for Java that is still useful and actively maintained. Some of this stuff dates back to when it was still called Apache Jakarta in the nineties. Things like commons-io, commons-lang, etc. are very widely used in almost any Java code base. The only thing more common is the standard library.

To be clear, this is about a handful of utility functions in one of the libraries. Unless you are actively using those functions, you should be fine. And if you are, you'd be misguided to pass in unvalidated input straight from your APIs. That's almost always a bad idea with any library that implements functionality like this.

And of course it's something any security auditing would check on principle to see what falls over. I've been on multiple such projects where we found issues like this with other forms of string interpolation. It's a common junior mistake to not handle input properly and only do happy path testing.

If you are using these functions with e.g. configuration files or other strings under your control, it's not a problem, as that's pretty much the way this stuff is intended to be used. Disabling this by default errs on the side of better safe than sorry, which is a good thing of course. But if you need it, it's nice to not have to reinvent the wheel.

The other thing that's useful to know is that the maintainers of the Apache Commons libraries have been pretty good about maintaining API compatibility. Some of these libraries have very infrequent updates but it still happens. Usually, updating these libraries is very low risk. I've never really seen good reasons to not update to minor releases of any Apache Commons libraries. You get bug fixes, some new functions, and generally no breakage whatsoever. And in the rare case something needs fixing, it's probably for a good reason.


Commons itself has been around for a while, but it looks like Commons Text was first released in 2017:

https://commons.apache.org/proper/commons-text/changes-repor...

If I read the disclosure correctly, it's been vulnerable for four of the five years it's been publicly available.


> Unless you are actively using those functions, you should be fine.

Up to a point. As with all CVEs, the issue tends to be whether your dependencies are using it, or their dependencies, etc, etc.

    mvn dependency:tree -Dverbose
is your friend!
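
Once the tree is printed, the remaining question is whether the resolved commons-text version is in the affected range. Here is a minimal sketch of that check. It assumes the range given in the CVE description (1.5 through 1.9, fixed in 1.10.0) and tree lines of the form `groupId:artifactId:jar:version:scope`. The class and method names are hypothetical, not part of Maven or Commons Text.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Maven or Commons Text.
final class CommonsTextVersionCheck {

    // Assumption: tree lines look like "org.apache.commons:commons-text:jar:1.9:compile".
    private static final Pattern COORDINATE =
            Pattern.compile("org\\.apache\\.commons:commons-text:jar:([^:\\s]+)");

    // Extracts the commons-text version from one line of dependency:tree output, if present.
    static Optional<String> findVersion(String treeLine) {
        Matcher m = COORDINATE.matcher(treeLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Assumption: affected range is 1.5 (inclusive) up to 1.10.0 (exclusive).
    static boolean isAffected(String version) {
        int[] v = parse(version);
        return compare(v, new int[] {1, 5, 0}) >= 0
                && compare(v, new int[] {1, 10, 0}) < 0;
    }

    // Parses "major.minor[.patch]"; a qualifier such as "-SNAPSHOT" is dropped.
    static int[] parse(String version) {
        String[] parts = version.split("-", 2)[0].split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Lexicographic comparison of two equal-length version arrays.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String line = "[INFO] |  \\- org.apache.commons:commons-text:jar:1.9:compile";
        findVersion(line).ifPresent(v ->
                System.out.println("commons-text " + v + " affected: " + isAffected(v)));
    }
}
```

Narrowing the tree first with `-Dincludes=org.apache.commons:commons-text` should keep the output short enough to eyeball instead.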


Here is a list of packages that depend on commons-text [0]. Hadoop, Spark, and Apache Commons Configuration are some of the most popular projects to depend on commons-text. Not sure if they're using the vulnerable code paths, of course.

[0] https://mvnrepository.com/artifact/org.apache.commons/common...


Commons Configuration disabled the vulnerable parts by default a couple of months ago.


> The title ("CVE-2022-42889: Java Apache Commons vulerability") is overly broad and suggests that somehow the whole Apache Commons project, which consists of over 40 components, has a vulnerability.

Nobody familiar with Java would interpret it this way. Also it's probably correct. There are vulnerabilities lurking in all software of any size and functionality, and Apache Commons is no exception.


Apache Commons has 43 components:

https://commons.apache.org/

If the vulnerability is just in one component, it is important to be specific about which one it is.

Does anyone know what depends on Commons Text?

Edit: "Artifacts using Apache Commons Text" according to Maven:

https://mvnrepository.com/artifact/org.apache.commons/common...

There are some well-known projects that might be impacted: Hadoop, Spark, Velocity, Hive, Solr, and many more.


> Nobody familiar with Java would interpret it this way

When I read the title in the morning I had a short moment of panic because we make extensive use of, for example, Apache Commons Collections in our projects. Updating these libraries and delivering the fixes to customers would have resulted in quite some work.

Had the title been more accurate, it would probably have saved me from a mini heart attack in the morning.


This doc page [1] seems to have all the interesting API methods listed that could be exploitable. Looks like an attacker would need to be able to inject a malicious payload like `${script:xyz}` inside of a String template used for one of the functions that ultimately do String interpolate/replace/lookup actions. While not as trivial as the exploit path for Log4j, it seems conceivable that some applications have injection points here, especially if they perform multi-stage/recursive replace operations.

What's quite interesting is that the `env` Lookup is still enabled by default. If I understand correctly, this would imply that leaking environment variables is still possible even with the fix, if the attacker has an injection point and access to the return value of the vulnerable function.

[1] https://commons.apache.org/proper/commons-text/apidocs/org/a...
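
To make that injection path concrete, here is a toy interpolator with prefixed lookups. It is not the Commons Text implementation: `TinyInterpolator`, the `upper` and `danger` lookups, and the regex are all invented for illustration, and `danger` only returns a marker string where a real `script` lookup would run code. The sketch shows that attacker text is only evaluated when it ends up inside the template, and that leaving a lookup unregistered makes its payload inert.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of prefix-based lookups; NOT the Commons Text code.
final class TinyInterpolator {

    // Matches ${prefix:key}, e.g. ${upper:world}.
    private static final Pattern VAR = Pattern.compile("\\$\\{([a-zA-Z]+):([^}]*)\\}");

    private final Map<String, Function<String, String>> lookups;

    TinyInterpolator(Map<String, Function<String, String>> lookups) {
        this.lookups = lookups;
    }

    // Every lookup registered, including the hypothetical dangerous one.
    static TinyInterpolator withAllLookups() {
        Map<String, Function<String, String>> m = new HashMap<>();
        m.put("upper", s -> s.toUpperCase(Locale.ROOT));
        m.put("danger", s -> "<executed:" + s + ">"); // stand-in for code execution
        return new TinyInterpolator(m);
    }

    // Default-deny variant: the dangerous lookup is simply not registered.
    static TinyInterpolator withSafeLookups() {
        Map<String, Function<String, String>> m = new HashMap<>();
        m.put("upper", s -> s.toUpperCase(Locale.ROOT));
        return new TinyInterpolator(m);
    }

    // Replaces each ${prefix:key} whose prefix is registered; others stay untouched.
    String replace(String template) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Function<String, String> lookup = lookups.get(m.group(1));
            String value = lookup == null ? null : lookup.apply(m.group(2));
            if (value == null) {
                value = m.group(); // unknown prefix: keep the literal text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String userInput = "${danger:xyz}"; // attacker-controlled
        // Concatenating user input into the template is the injection point.
        System.out.println(withAllLookups().replace("Hello " + userInput));
        System.out.println(withSafeLookups().replace("Hello " + userInput));
    }
}
```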


> These lookups are:
> - "script" - execute expressions using the JVM script execution engine (javax.script)
> - "dns" - resolve dns records
> - "url" - load values from urls, including from remote servers
>
> Applications using the interpolation defaults in the affected versions may be vulnerable to remote code execution or unintentional contact with remote servers if untrusted configuration values are used.

So, if you let untrusted users somehow set configuration, then you're vulnerable.

Never a bad idea to remove an attack surface, but damn, you'd have to plan 3 - 4 sprints ahead to mess up badly enough to let third parties set config files. (And besides, as log4j2 showed us, if you want to allow that, LDAP is how the cool kids do it.)


I see no reason to assume that configuration would be the only (or dominant) source of unchecked interpolation inputs. Might be a common use case, but certainly not exclusive. The big issue is that many projects, perhaps deep in the transitive dependency stack, use the library, likely just for some trivial convenience methods missing from the standard lib String (and from the StringUtils of commons-lang/lang3), and once it's on the classpath it will get used by every intern who ever committed something to the code. Most dependents will likely not use interpolation at all, so the impact is much lower than in the log4j incident, but a lot of things will need to be checked and/or updated. Commons Text is much less ubiquitous than Commons Lang, but that might just put it in the perfect position for a sneaky transitive presence.

According to the sentence before the one about untrusted configurations, "untrusted" configuration includes the default: unless your configuration is explicitly hardened, any call to the string interpolator with unsanitized content is an open door. (The open question is whether calls to the string interpolator exist.)


> I see no reason to assume that configuration would be the only (or dominant) source of unchecked interpolation inputs.

Given that it was the maintainers who lodged the CVE, and it's their description that I was quoting, I'm reasonably happy it's a correct statement.

I'll be honest, I haven't dived into the code to verify that myself.

But if you have, please feel free to share your findings :)


There was an unexpected package update released a couple of days ago, the first in over 2 years https://twitter.com/Y4tacker/status/1580193254665920513


It seems to me that enabling string replacement with all lookups enabled by default would be a dangerous idea to begin with. Why would it be implemented that way?

Having a replacement that is based on arbitrary scripts (!) seems especially questionable to me, in my brain that is a niche use case and should be turned off by default.

Maybe we have to sharpen the awareness of the common developer to these kinds of dangerous practices, like we did with SQL injection attacks, where string concatenation to create your queries is generally frowned upon and is regarded as a bad practice industry-wide.


Off the cuff reaction: it's astonishing that this was considered a good idea.

I'm coming from the Explicit Is Better Than Implicit world, where things like Jinja2 keep templates doing templatey things, so perhaps I'm biased, but it seems incredible that anyone thought that allowing functionality like this to be accessible directly from string templating was a good idea.


In programming, every string must be turned into a Turing-complete machine. Then we invent a new framework that “just does string replacement” etc.

With so many eval() in the wild and viruses injecting random strings, I’m actually surprised no system became randomly sentient.


Pretty accurate indeed. Log4j was a huge eye-opener though with regard to just slapping dependencies into all codebases.


Having a look through the larger dependents, Apache Flume fixed exposure to the issue in flume-ng-node a bit over a month ago, on 9th August.

https://github.com/apache/flume/commit/60561ffc

https://issues.apache.org/jira/browse/FLUME-3433



This seems to be the GitHub pull request fixing this vulnerability: https://github.com/apache/commons-text/pull/341


Why is it called a vulnerability? It’s a feature.


It's an insecure feature, by default. Most people don't need it and it exposes you to serious security risk.

The fix is just to turn it off by default. If you need it, presumably you will have to ensure that untrusted or unsanitised user input can't reach it.
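
"Off by default, opt in if you need it" can be sketched as a small policy object. This is only an illustration of the design, not the actual Commons Text change: the `LookupDefaults` class and its enum are hypothetical, and the lookup names are just the four mentioned in this thread (`script`, `dns`, `url` from the CVE text, plus `env`, which is reported above as still enabled by default).

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical policy sketch; not the Commons Text API.
final class LookupDefaults {

    // Only the lookup names mentioned in the thread.
    enum Lookup { ENV, SCRIPT, DNS, URL }

    // Lookups the CVE description singles out as risky.
    private static final EnumSet<Lookup> RISKY =
            EnumSet.of(Lookup.SCRIPT, Lookup.DNS, Lookup.URL);

    // Default set: everything except the risky lookups.
    static Set<Lookup> defaults() {
        return EnumSet.complementOf(RISKY);
    }

    // Callers who really need a risky lookup must name it explicitly.
    static Set<Lookup> defaultsPlus(Lookup... optIn) {
        Set<Lookup> enabled = defaults();
        Collections.addAll(enabled, optIn);
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("default: " + defaults());
        System.out.println("opt-in:  " + defaultsPlus(Lookup.SCRIPT));
    }
}
```

The point of the design is that the risky capability has to appear by name at the call site, which makes it easy to find in a review or an audit.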


It might be true, but assigning a CVE just because someone thinks that some API is not secure enough in some random library looks very strange to me. I understand that it's similar to the log4shell vulnerability in that regard, but log4j was much more popular than this commons-text library. I'm a Java developer and I've used commons libraries many times, but I never heard about this particular library, so I'm curious where that particular API is used.


Remote code execution is about as serious as you can get. Any API that could result in that unintentionally should have a CVE assigned to it.

Vulnerabilities are not bugs. They are vulnerabilities. They often arise from bugs, but not always.


Previous post with same link: https://news.ycombinator.com/item?id=33211721



Haha nice!


This is very similar to the log4j vuln or not?

I had expected that all code like that would have been scrutinized immediately.


> This is very similar to the log4j vuln or not?

Yes, it appears similar. It involves string interpolation that leads to arbitrary code execution via crafted values. It's also similar in that Apache Commons components are widespread and deeply embedded in innumerable backend systems.

I wonder how widespread this particular Commons component is in the real world. I can't recall ever explicitly seeking it out as a direct dependency myself. However, it is probably a common transitive dependency. It shows up as being used by a few thousand other Java components on mvnrepository.com, although at first glance I didn't see things in the list of usages that made me panic.

The ones that stick out are Commons JPA, Apache ServiceMix, Apache Turbine and Struts 2.

Disclaimer: The above is not comprehensive. Just me clicking around a few minutes. Don't bet your career on it.


Similar, yes, but (I might be wrong here) the severity in log4j stems from the fact that interpolation was happening on a logged string, and most users did expect this to be passed through unmodified (i.e. param1 in log.info(param1)).

Here, the interpolation happens on a string that is expected to be a template (it's even documented that way), so users would usually be cautious where the template originates from. Recursive interpolation also needs to be enabled explicitly.
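
The data-versus-template distinction, and why a second pass matters, can be shown with a generic sketch. It uses a plain map of variables and makes no claim about how Commons Text or log4j behave internally; `TwoPassDemo` and the variable names are made up. After one pass a value containing `${...}` stays inert data, but feeding the result back in turns it into a template.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Generic illustration; not modelled on any particular library's internals.
final class TwoPassDemo {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // One pass: replace each ${name} with its value; unknown names are left as-is.
    static String expandOnce(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Re-runs the pass on its own output until nothing changes or the limit is hit.
    static String expandRepeatedly(String template, Map<String, String> vars, int maxPasses) {
        String current = template;
        for (int i = 0; i < maxPasses; i++) {
            String next = expandOnce(current, vars);
            if (next.equals(current)) {
                break;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("user", "${secret}"); // attacker-supplied value
        vars.put("secret", "hunter2"); // something the attacker should not see

        System.out.println(expandOnce("Hello ${user}", vars));          // value stays data
        System.out.println(expandRepeatedly("Hello ${user}", vars, 5)); // value becomes template
    }
}
```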


* vulnerability

(in title)


Fixed. Thanks!


And perhaps change to Apache Commons Text.


Ok, added now. Thanks!


^ yes, please and thank you



