Java Apache Commons Text vulnerability

jabiko · on Oct 17, 2022

It's just affecting the Apache Commons Text component.

The title ("CVE-2022-42889: Java Apache Commons vulerability") is overly broad and suggests that somehow the whole Apache Commons project, which consists of over 40 components, has a vulnerability

matsemann · on Oct 17, 2022

Agreed. I've seen/used Apache Commons on almost all big Java projects I've ever touched in my career. But never the Text component. So the heading makes it seem like a big deal, while it probably is not? Unless it's used a lot implicitly. But even then I'm not sure how many actually use the string interpolation functions.

jillesvangurp · on Oct 17, 2022

It's one of the oldest sets of OSS libraries for Java that is still useful and actively maintained. Some of this stuff dates back to when it was still called Apache Jakarta in the nineties. Things like commons-io, commons-lang, etc. are very widely used in almost any Java code base. The only thing more common is the standard library.

To be clear, this is about a handful of utility functions in one of the libraries. Unless you are actively using those functions, you should be fine. And if you are, you'd be misguided to pass in unvalidated input straight from your APIs. That's almost always a bad idea with any library that implements functionality like this.

And of course it's something any security auditing would check on principle to see what falls over. I've been on multiple such projects where we found issues like this with other forms of string interpolation. It's a common junior mistake to not handle input properly and only do happy path testing.

If you are using these functions with e.g. configuration files or other strings under your control, it's not a problem, as that's pretty much the way this stuff is intended to be used. Disabling this is erring on the side of better safe than sorry, which is a good thing of course. But if you need it, it's nice to not have to reinvent the wheel.

The other thing that's useful to know is that the maintainers of the Apache commons libraries have been pretty good about maintaining API compatibility. Some of these libraries have very infrequent updates but it still happens. Usually, updating these libraries is very low risk. I've never really seen good reasons to not update to minor releases of any apache commons libraries. You get bug fixes, some new functions, and generally no breakage whatsoever. And in the rare case something needs fixing, it's probably for a good reason.

blincoln · on Oct 17, 2022

Commons itself has been around for a while, but it looks like Commons Text was first released in 2017:

https://commons.apache.org/proper/commons-text/changes-repor...

If I read the disclosure correctly, it's been vulnerable for four of the five years it's been publicly available.

kitd · on Oct 17, 2022

> Unless you are actively using those functions, you should be fine.

Up to a point. As with all CVEs, the issue tends to be whether your dependencies are using it, or their dependencies, etc, etc.

    mvn dependency:tree -Dverbose

is your friend!
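
As a runtime complement to the Maven command above, here is a minimal sketch that scans the JVM classpath for commons-text jars in the affected range. It assumes the CVE's stated range (1.5 through 1.9, fixed in 1.10.0) and the usual `commons-text-<version>.jar` file naming; the class and method names are made up for illustration. It will not see shaded or fat-jar copies, so treat a clean result as a hint, not proof.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommonsTextVersionCheck {
    // Assumes the conventional Maven artifact file name, e.g. commons-text-1.9.jar
    private static final Pattern JAR =
            Pattern.compile("commons-text-(\\d+)\\.(\\d+)(?:\\.\\d+)?\\.jar$");

    /** Returns true if the file name looks like commons-text 1.5 through 1.9. */
    public static boolean isAffectedJar(String fileName) {
        Matcher m = JAR.matcher(fileName);
        if (!m.find()) {
            return false; // not a commons-text jar, or an unexpected naming scheme
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major == 1 && minor >= 5 && minor <= 9;
    }

    public static void main(String[] args) {
        // Walk every entry on the classpath this JVM was started with
        String classPath = System.getProperty("java.class.path", "");
        for (String entry : classPath.split(File.pathSeparator)) {
            if (isAffectedJar(entry)) {
                System.out.println("Possibly affected: " + entry);
            }
        }
    }
}
```

Finding the jar only tells you it is present. Whether anything actually calls the interpolation functions with untrusted input is a separate question.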

lolinder · on Oct 17, 2022

Here is a list of packages that depend on commons-text [0]. Hadoop, Spark, and Apache Commons Configuration are some of the most popular projects to depend on commons-text. Not sure if they're using the vulnerable code paths, of course.

[0] https://mvnrepository.com/artifact/org.apache.commons/common...

19870213 · on Oct 17, 2022

Commons Configuration disabled the vulnerable parts by default a couple of months ago.

gjvc · on Oct 17, 2022

> The title ("CVE-2022-42889: Java Apache Commons vulerability") is overly broad and suggests that somehow the whole Apache Commons project, which consists of over 40 components, has a vulnerability.

Nobody familiar with Java would interpret it this way. Also it's probably correct. There are vulnerabilities lurking in all software of any size and functionality, and Apache Commons is no exception.

thesuperbigfrog · on Oct 17, 2022

Apache Commons has 43 components:

https://commons.apache.org/

If the vulnerability is just in one component, it is important to be specific about which one it is.

Does anyone know what depends on Commons Text?

Edit: "Artifacts using Apache Commons Text" according to Maven:

https://mvnrepository.com/artifact/org.apache.commons/common...

There are some well-known projects that might be impacted: Hadoop, Spark, Velocity, Hive, Solr, and many more.

jabiko · on Oct 17, 2022

> Nobody familiar with Java would interpret it this way

When I read the title in the morning I had a short moment of panic because we make extensive use of for example Apache Commons Collections in our projects. Updating these libraries and delivering the fixes to customers would have resulted in quite some work.

If the title would have been more accurate it would probably have saved me from a mini heart-attack in the morning.

tiarafawn · on Oct 17, 2022

This doc page [1] seems to have all the interesting API methods listed that could be exploitable. Looks like an attacker would need to be able to inject a malicious payload like `${script:xyz}` inside of a String template used for one of the functions that ultimately do String interpolate/replace/lookup actions. While not as trivial as the exploit path for Log4j, it seems conceivable that some applications have injection points here, especially if they perform multi-stage/recursive replace operations.

What's quite interesting is that the `env` Lookup is still enabled by default. If I understand correctly, this would imply that leaking environment variables is still possible even with the fix, if the attacker has an injection point and access to the return value of the vulnerable function.

[1] https://commons.apache.org/proper/commons-text/apidocs/org/a...

EdwardDiego · on Oct 17, 2022

> These lookups are:
> - "script" - execute expressions using the JVM script execution engine (javax.script)
> - "dns" - resolve dns records
> - "url" - load values from urls, including from remote servers
>
> Applications using the interpolation defaults in the affected versions may be vulnerable to remote code execution or unintentional contact with remote servers if untrusted configuration values are used

So, if you let untrusted users somehow set configuration, then you're vulnerable.

Never a bad idea to remove an attack surface, but damn, you'd have to plan 3 - 4 sprints ahead to mess up bad enough to let third parties set config files. (And besides, as log4j2 showed us, if you want to allow that, LDAP is how the cool kids do it.)

usrusr · on Oct 17, 2022

I see no reason to assume that configuration would be the only (or dominant) source of unchecked interpolation inputs. Might be a common use case, but certainly not exclusive. The big issue is that many projects, perhaps deep in the transitive dependency stack, use the library, likely just for some trivial missing convenience methods on the standard lib String (and in the StringUtils of commons-lang/lang3), and once it's on the classpath it will get used by every intern who ever committed something to the code. Most dependents will likely not use interpolation at all so the impact is much lower than in the log4j incident, but a lot of things will need to be checked and/or updated. Commons Text is much less ubiquitous than Commons Lang, but that might just put it in the perfect position for a sneaky transitive presence.

According to the sentence before the one about untrusted configurations, "untrusted" configuration includes the default: unless your configuration is explicitly hardened, any call to the string interpolator with unsanitized content is an open door. (The open question is whether calls to the string interpolator exist.)

EdwardDiego · on Oct 18, 2022

> I see no reason to assume that configuration would be the only (or dominant) source of unchecked interpolation inputs.

Given that it was the maintainers who lodged the CVE, and it's their description that I was quoting, I'm reasonably happy it's a correct statement.

I'll be honest, I haven't dived into the code to verify that myself.

But if you have, please feel free to share your findings :)

Maxious · on Oct 17, 2022

There was an unexpected package update released a couple of days ago, the first in over 2 years https://twitter.com/Y4tacker/status/1580193254665920513

hjanssen · on Oct 17, 2022

It seems to me that enabling string replacement with all lookups enabled by default would be a dangerous idea to begin with. Why would it be implemented that way?

Having a replacement that is based on arbitrary scripts (!) seems especially questionable to me, in my brain that is a niche use case and should be turned off by default.

Maybe we have to sharpen the awareness of the common developer to these kinds of dangerous practices, like we did with SQL injection attacks, where string concatenation to create your queries is generally frowned upon and is regarded as bad practice industry-wide.

robertlagrant · on Oct 17, 2022

Off the cuff reaction: it's astonishing that this was considered a good idea.

I'm coming from the Explicit Is Better Than Implicit world, where things like Jinja2 keep templates doing templatey things, so perhaps I'm biased, but it seems incredible that anyone thought that allowing functionality like this to be accessible directly from string templating was a good idea.

eastbound · on Oct 17, 2022

In programming, every string must be turned into a Turing-complete machine. Then we invent a new framework that “just does string replacement” etc.

With so many eval() in the wild and viruses injecting random strings, I’m actually surprised no system became randomly sentient.

fHr · on Oct 17, 2022

Pretty accurate indeed. Log4j was a huge eye-opener though with regard to just slapping dependencies into all codebases.

mhio · on Oct 17, 2022

Having a look through the larger dependents, Apache Flume fixed exposure to the issue in flume-ng-node a bit over a month ago, on 9th August.

https://github.com/apache/flume/commit/60561ffc

https://issues.apache.org/jira/browse/FLUME-3433

teddyh · on Oct 17, 2022

https://www.cve.org/CVERecord?id=CVE-2022-42889

lukax · on Oct 17, 2022

This seems to be the GitHub pull request fixing this vulnerability: https://github.com/apache/commons-text/pull/341

vbezhenar · on Oct 17, 2022

Why is it called a vulnerability? It’s a feature.

MattPalmer1086 · on Oct 17, 2022

It's an insecure feature, by default. Most people don't need it and it exposes you to serious security risk.

The fix is just to turn it off by default. If you need it, presumably you will have to ensure that untrusted or unsanitised user input can't reach it.
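
To make "off by default" concrete, here is a toy sketch of a `${prefix:key}` interpolator where nothing resolves unless it has been explicitly allowed. This is not how Commons Text is implemented; the class name, the `allow` method, and the `upper` lookup are all invented for illustration. The only point is the default-deny shape.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy interpolator: lookups are denied unless explicitly allowed. */
public class AllowListInterpolator {
    // Matches ${prefix:key}; the key may itself contain colons
    private static final Pattern LOOKUP =
            Pattern.compile("\\$\\{([a-zA-Z]+):([^}]*)\\}");

    private final Map<String, Function<String, String>> lookups = new HashMap<>();

    /** Registers a lookup for a prefix; unregistered prefixes are never evaluated. */
    public AllowListInterpolator allow(String prefix, Function<String, String> lookup) {
        lookups.put(prefix, lookup);
        return this;
    }

    public String interpolate(String template) {
        Matcher m = LOOKUP.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Function<String, String> lookup = lookups.get(m.group(1));
            String value = (lookup == null) ? null : lookup.apply(m.group(2));
            // Unknown prefix or null result: leave the original text untouched
            String replacement = (value == null) ? m.group(0) : value;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        AllowListInterpolator interpolator = new AllowListInterpolator()
                .allow("upper", s -> s.toUpperCase(Locale.ROOT));
        // "script" was never allowed, so that placeholder passes through as plain text
        System.out.println(
                interpolator.interpolate("${upper:hello} ${script:javascript:1+1}"));
    }
}
```

With this shape, anyone who wants a risky lookup has to opt in by name, which is roughly the trade-off described above: safe for the many who never need it, a small extra step for the few who do.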

vbezhenar · on Oct 17, 2022

It might be true, but assigning a CVE just because someone thinks that some API is not secure enough in some random library looks very strange to me. I understand that it's similar to the log4shell vulnerability in that regard, but log4j was much more popular than this commons-text library. I'm a Java developer and I've used Commons libraries many times, but I'd never heard of this particular library, so I'm curious where that particular API is used.

MattPalmer1086 · on Oct 18, 2022

Remote code execution is about as serious as you can get. Any API that could result in that unintentionally should have a CVE assigned to it.

Vulnerabilities are not bugs. They are vulnerabilities. They often arise from bugs, but not always.

kerng · on Oct 17, 2022

Previous post with same link: https://news.ycombinator.com/item?id=33211721

richbell · on Oct 18, 2022

I'll do you one better. ;)

https://news.ycombinator.com/item?id=33204788

kerng · on Oct 20, 2022

Haha nice!

Bootvis · on Oct 17, 2022

This is very similar to the log4j vuln or not?

I had expected that all code like that would have been scrutinized immediately.

topspin · on Oct 17, 2022

> This is very similar to the log4j vuln or not?

Yes, it appears similar. It involves string interpolation that leads to arbitrary code execution via crafted values. It's also similar in that Apache Commons components are widespread and deeply embedded in innumerable backend systems.

I wonder how widespread this particular Commons component is in the real world. I can't recall ever explicitly seeking it out as a direct dependency myself. However, it is probably a common transitive dependency. It shows up as being used by a few thousand other Java components on mvnrepository.com, although at first glance I didn't see things in the list of usages that made me panic.

The ones that stick out are Commons JPA, Apache ServiceMix and Apache Turbine and Struts 2.

Disclaimer: The above is not comprehensive. Just me clicking around a few minutes. Don't bet your career on it.

groestl · on Oct 17, 2022

Similar, yes, but (I might be wrong here) the severity in log4j stems from the fact that interpolation was happening on a logged string, and most users did expect this to be passed through unmodified (i.e. param1 in log.info(param1)).

Here, the interpolation happens on a string that is expected to be a template (it's even documented that way), so users would usually be cautious where the template originates from. Recursive interpolation also needs to be enabled explicitly.
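
The template-versus-data distinction can be shown with plain `String.format` as a stand-in (this is an analogy using the standard library, not Commons Text itself; the class and method names are made up). Untrusted input is harmless as an argument and dangerous as part of the template:

```java
import java.util.MissingFormatArgumentException;

public class TemplateVersusData {
    /** Unsafe: user input becomes part of the template and gets interpreted. */
    public static String greetUnsafe(String userInput) {
        return String.format("Hello, " + userInput);
    }

    /** Safe: the template is fixed and user input is passed purely as data. */
    public static String greetSafe(String userInput) {
        return String.format("Hello, %s", userInput);
    }

    public static void main(String[] args) {
        String hostile = "%s%s"; // looks like data, but is valid template syntax

        // As data, the format specifiers are printed literally
        System.out.println(greetSafe(hostile));

        // As template, the formatter tries to resolve them and fails
        try {
            greetUnsafe(hostile);
        } catch (MissingFormatArgumentException e) {
            System.out.println("Input was interpreted as a template: " + e.getMessage());
        }
    }
}
```

With `String.format` the worst case here is an exception. With an interpolator that can run scripts or fetch URLs, the same mistake has much bigger consequences.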

emmelaich · on Oct 17, 2022

* vulnerability

(in title)

dang · on Oct 17, 2022

Fixed. Thanks!

matsemann · on Oct 17, 2022

And perhaps change to Apache Commons Text.

dang · on Oct 17, 2022

Ok, added now. Thanks!

exabrial · on Oct 17, 2022

^ yes, please and thank you