
(Not a C person here, but I'll ask anyway)

What should I use instead of these things if they're so dangerous?



Almost every one has an n version you should use, and their man pages will tell you that.

strcpy -> strncpy (or strlcpy on BSD)

strcat -> strncat (or strlcat on BSD)

sprintf -> snprintf (but still watch out for printf format attacks)

gets -> something else entirely. The man page says "programs should NEVER use gets()".
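
A minimal sketch of the snprintf point above. The helper name `format_greeting` is made up for illustration. It checks the return value to detect truncation, and it passes user-supplied text as an argument rather than as the format string:

```c
#include <stdio.h>

/* Hypothetical helper: formats "hello, <name>" into dst.
 * Returns 0 on success, -1 if the output did not fit or an error occurred. */
int format_greeting(char *dst, size_t dstsize, const char *name)
{
    /* name is data, so it goes through "%s". Passing it as the format
     * string itself would let "%n" or "%x" in the input be interpreted. */
    int n = snprintf(dst, dstsize, "hello, %s", name);

    /* snprintf returns the length it wanted to write (excluding the
     * terminator); a value >= dstsize means the result was truncated. */
    if (n < 0 || (size_t)n >= dstsize)
        return -1;
    return 0;
}
```

Whether to reject or accept a truncated result is a design choice; this sketch rejects it so the caller can't mistake a cut-off string for the real thing.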


They told me if I used the n version I'd be safe. They told me the risks were tiny. That all we had to do was practice safe counting.

They told me that... ˚sob˚ but they were wrong.


Hi, is there any resource on the web, or even a book, that documents all the risks? I once saw a talk you gave, and it was good! Are there any other such resources out there?


I like _The Art of Software Security Assessment_ for this.


Good for laughs. Thanks. Now the serious follow up: Ok, what else should someone do?

Programs (often) handle text. Apparently, that pretty much means you're fucked. So what is a reasonable way to write such programs?


Make damned sure you know what you're doing. That means making sure you have enough memory allocated to avoid overflows, and that any input is sanitized before putting it down. Meaning, if you're using a function that's expecting a null terminated string, make SURE it's null terminated before copying. Or that you know the exact length to pass into a length specified function.

The problem isn't necessarily the functions themselves, it's coders who make assumptions that don't pan out to be true.
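
To make "make SURE it's null terminated before copying" concrete, here is one possible sketch. The helper name `copy_checked` and its error convention are assumptions for illustration, not anything standard:

```c
#include <string.h>

/* Hypothetical helper: copies the string held in src (a buffer of srcsize
 * bytes that may or may not contain a terminator) into dst.
 * Returns 0 on success, -1 if src is unterminated or does not fit. */
int copy_checked(char *dst, size_t dstsize, const char *src, size_t srcsize)
{
    /* Look for the terminator without reading past the source buffer. */
    const char *end = memchr(src, '\0', srcsize);
    if (end == NULL)
        return -1;              /* not a string at all: refuse */

    size_t len = (size_t)(end - src);
    if (len >= dstsize)
        return -1;              /* would overflow dst: refuse, don't truncate */

    memcpy(dst, src, len + 1);  /* len bytes plus the terminator */
    return 0;
}
```

The point is that both assumptions (source is terminated, destination is big enough) are checked instead of trusted.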


Do all of your string work with these guys:

    struct string {
      char *str;        /* pointer to the bytes */
      unsigned length;  /* number of bytes in str */
    };


Let's assume your string struct is solid. Then does that mean you can safely use it with `printf, fprintf, sprintf` (e.g. printf("%s", string->str))? Or must you also write custom versions of those functions? How deep does this rabbit hole go?


You don't have to write custom versions of any of those functions; just use the char pointer in the struct instead of a bare char pointer. Keeping track of the length of your strings gives an easy way to provide the 'n' in all of those 'n' functions, and has other advantages besides. But the use of such a struct in and of itself, of course, provides no guarantees of safety. There is no such thing in C anyway :)


You'd probably want

    "%.*s"
or possibly even

    "%*.*s"
if you want it space padded. In principle you can bound your space usage and avoid an snprintf with such constructs; in practice, it's probably better to still use snprintf (if you're using standard-library string functions at all).
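
A rough sketch of how "%.*s" and snprintf fit together with a length-carrying struct like the one upthread. The helper name `format_bracketed` is hypothetical, and the struct is repeated here only so the snippet stands alone:

```c
#include <limits.h>
#include <stdio.h>

struct string {
    char *str;        /* need not be null-terminated */
    unsigned length;  /* number of valid bytes in str */
};

/* Hypothetical helper: formats "[<s>]" into dst, reading at most
 * s->length bytes from s->str. Returns 0 on success, -1 on failure. */
int format_bracketed(char *dst, size_t dstsize, const struct string *s)
{
    if (s->length > INT_MAX)
        return -1;   /* the precision argument of "%.*s" is an int */

    int n = snprintf(dst, dstsize, "[%.*s]", (int)s->length, s->str);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;   /* error or truncated output */
    return 0;
}
```

The precision bounds how much is read from the source, while snprintf's size argument bounds how much is written to the destination, which is why using both is the safer habit.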


If you're able, don't treat text as zero-terminated char arrays. Ideally, you'd have a well fleshed-out library for encoding-aware ropes.


Which libraries do you recommend?


Unfortunately, I don't know of one. My recent C work has dealt with text only in a very limited capacity (parsing and building packets in an ASCII format; for the latter, vectorized write buffers are a poor man's ropes).

Edited to add:

Apparently there is this: http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh...


I recommend bstring for length-prefixed strings in C:

http://bstring.sourceforge.net/


Bstring relies on undefined behavior for security. Don't use it if you care about security.


The alternative to gets() is fgets(). In fact, gets() was deprecated and then removed entirely in the C11 standard.
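
For what it's worth, a small sketch of reading a line with fgets() in place of gets(). The helper name `read_line` and its return convention are made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (bufsize bytes)
 * and strips the trailing newline.
 * Returns 1 if a complete line was read, 0 if no newline was seen (the
 * line was too long for buf, or the input ended without a newline),
 * and -1 on end-of-file, read error, or an unusably small buffer. */
int read_line(FILE *fp, char *buf, int bufsize)
{
    if (bufsize < 2 || fgets(buf, bufsize, fp) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");   /* index of newline, or of '\0' */
    if (buf[len] == '\n') {
        buf[len] = '\0';               /* complete line: drop the newline */
        return 1;
    }
    return 0;
}
```

Unlike gets(), fgets() is told how big the buffer is, so an overlong line gets cut off instead of running past the end; the caller still has to decide what to do with the leftover input.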

However, despite the n functions being generally safer, you're still propagating misinformation by touting them as secure alternatives.

The n functions were originally meant for manipulating strings stored in fixed-size arrays. If you don't know what you're doing and just blindly use strncpy() as a strcpy() replacement, you could end up silently truncating your strings.

OpenBSD's l functions, on the other hand, were specifically designed with security in mind.


Keep in mind that strncpy has its own problems, like not null-terminating the buffer if the source would overflow it.


I guess the whole point is don't rely on null terminated strings in your code.


This is why you should use strlcpy()!
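
strlcpy() isn't available in every libc, so here is a sketch of a stand-in with the same general contract. The name `copy_bounded` is hypothetical, and this is an illustration of the semantics, not the OpenBSD implementation:

```c
#include <string.h>

/* Hypothetical stand-in with strlcpy-like semantics. Copies at most
 * dstsize-1 bytes and always terminates dst when dstsize > 0.
 * Returns strlen(src), so a return value >= dstsize means the copy
 * was truncated. */
size_t copy_bounded(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);   /* like strlcpy, src must be terminated */

    if (dstsize > 0) {
        size_t n = srclen < dstsize ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Note that it still walks the whole source string, so it only helps with the destination side of the problem.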


The point of the n is that you specify the maximum length that it's allowed to copy. It's slightly harder to mess that up than with regular strcpy.


The n in strncpy describes to what size the destination buffer (not string) should be padded with '\0'.

This is useful for copying a string into a fixed size buffer that is sent over the network, to give an example. It's not what the programmer generally means when using strncpy.

Many programmers are surprised when they learn that strncpy() really writes 1M-strlen("abc") zeros in the 1M char array every time it's called...
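
The padding is easy to observe for yourself. A small sketch, with a made-up helper name `zeros_after_strncpy`, that pre-fills a buffer with 'x' and counts the '\0' bytes strncpy leaves behind:

```c
#include <string.h>

/* Hypothetical helper for illustration: fills buf with 'x', copies src
 * into it with strncpy, and reports how many '\0' bytes ended up in buf. */
size_t zeros_after_strncpy(char *buf, size_t bufsize, const char *src)
{
    memset(buf, 'x', bufsize);
    strncpy(buf, src, bufsize);

    size_t zeros = 0;
    for (size_t i = 0; i < bufsize; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

With an 8-byte buffer and "abc" as the source, every byte after the third comes back zeroed; with a source of 8 or more characters, the count is zero, which is the missing-terminator case mentioned elsewhere in the thread.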


"The n in strncpy describes to what size the destination buffer (not string) should be padded with '\0'."

This is false.

From the man page: "The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated."

And the following code prints foo:ar on my system.

    char buffer[10];
    strcpy(buffer, "foobar");
    strcpy(buffer, "foo");
    printf("%s:%s\n", buffer, buffer+4);

Edited to add: huh, scratch that. Obvious error in the above test :-P. Testing it with strncpy, as I had meant to, it seems the buffer is in fact padded, not just (possibly) terminated. Interesting, and very worth knowing if you are trying to move a probably small string into a large buffer under time pressure.


What is false? I'm not talking about the source string not being terminated when it's too long. That's obvious and there are plenty posts about it in this thread.

Not sure what you want to say with the example code. Maybe swap it for strncpy and strlcpy and see whether that matches your expectations?


Yeah, my bad, had meant to use strncpy. My contention had been that it only zeroes the first following character (if that). On closer reading of the man page, it is in fact clear.


exactly.



As somebody who did little other than C for just over a decade, my answer would be "FreePascal", which lets you do similar low level things, but has a few "safeties" built in. Objective C looks to be a little better with strings as well, but I'm not very familiar with it. (dabbled in iOS years ago)

The people who brought us the [in]famous "Why Pascal is not my favorite language" article would have done well to look at their own glass house.

OTOH, C does make a great portable assembler if you are using it to implement another language (which is exactly what we did at one of my jobs in the early 90s)

... Had to explain to my daughter this morning why I was laughing at the "abstinence" posters ...


I haven't used Pascal for closing in on 20 years, and don't miss it one bit.

For you younger people reading this who have never been exposed to Pascal, go dig up that article and scroll down to section 2.1 and just think about that for a minute. Ask yourself 'is this the kind of computing environment I want to work in?'


FYI: FreePascal has Dynamic Arrays that are automatically resized by the runtime, which addresses section 2.1.

2.4 - separate compilation was added in Turbo Pascal 4. (one could argue that the result is Modula, rather than Pascal - so be it)

2.2 - initialization of module data was added in Turbo Pascal 5. Yes, "static" data has to be at the module level instead of hidden within individual routines. Bug or feature? (let the jihad/crusade commence...)

As the commentary on http://c2.com/cgi/wiki?WhyPascalIsNotMyFavoriteProgrammingLa... points out, the critique was against the 1981 academic version of Pascal.


For C: strncpy(), strncat(), snprintf() (or the non-standard asprintf()), and fgets(), respectively. I believe pickle is a Python thing.


The sa[fn]e alternative to pickling your objects is to write your data into a well defined file format and read it back with stringent input checks.


It's worth noting that strncpy doesn't null-terminate the destination when the source is n bytes or longer, so strlcpy is preferable.


Python, Ruby, Java... ;-)


Serious question: Aren't the C-based problems simply hidden from the programmer there? That is, the problems still exist, but you can no longer address them, qua Python, Ruby, Java programmer. That seems worse, not better.


Depends on the implementation. CPython may have problems due to C bugs. Java is self-bootstrapping, though, and has no relation to C except for interfacing with external programs using C calling conventions. In this case the problems are not hidden; they truly do not exist, except for problems introduced by other programs you interface with, but not by your code.


You're quite wrong regarding Java. Much of its standard library is implemented in C and C++, and there have been frequent security vulnerabilities found in the language that are a result of buffer overflows or similar memory corruption bugs within the underlying C/C++ implementation.

There are many, but here's one such example from last year: http://osvdb.org/show/osvdb/94336



