Hi,
Is there any resource on the web, or even a book, that documents all the risks? I once saw a talk you gave, and it was good! Are there any other resources like that out there?
Make damned sure you know what you're doing. That means making sure you have enough memory allocated to avoid overflows, and that any input is sanitized before you copy it anywhere. For example, if you're using a function that expects a null-terminated string, make SURE it's null-terminated before copying. Or make sure you know the exact length to pass to a length-limited function.
The problem isn't necessarily the functions themselves, it's coders who make assumptions that don't pan out to be true.
Let's assume your string struct is solid. Does that mean you can safely use it with `printf`, `fprintf`, and `sprintf` (e.g. `printf("%s", string->value)`)? Or must you also write custom versions of those functions? How deep does this rabbit hole go?
You don't have to write custom versions of any of those functions; just use the char pointer in the struct instead of a bare char pointer. Keeping track of the length of your strings gives an easy way to provide the 'n' in all of those 'n' functions, and has other advantages besides. But the use of such a struct in and of itself, of course, provides no guarantees of safety. There is no such thing in C anyway :)
if you want it space padded. In principle you can bound your space usage and avoid an snprintf with such constructs; in practice, it's probably better to still use snprintf (if you're using standard-library string functions at all).
Unfortunately, I don't know of one. My recent C work has dealt with text only in a very limited capacity (parsing and building packets in an ASCII format; for the latter, vectorized write buffers are a poor man's ropes).
The alternative to gets() is fgets(). In fact, gets() isn't just deprecated: it was removed from the C11 standard entirely (it had already been deprecated in C99).
However, despite the n functions being generally safer, you're still propagating misinformation by touting them as secure alternatives.
The original use of the n functions was to manipulate strings in fixed-size arrays. If you don't know what you're doing and just blindly use strncpy() as a strcpy() replacement, you could end up with truncated strings, or with strings that aren't null-terminated at all.
OpenBSD's l functions (strlcpy() and strlcat()), on the other hand, were specifically designed with security in mind.
The n in strncpy describes to what size the destination buffer (not string) should be padded with '\0'.
This is useful, for example, for copying a string into a fixed-size buffer that is sent over the network. It's not what the programmer generally means when using strncpy.
Many programmers are surprised when they learn that strncpy() really writes 1M-strlen("abc") zeros in the 1M char array every time it's called...
"The n in strncpy describes to what size the destination buffer (not string) should be padded with '\0'."
This is false.
From the man page: "The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated."
And the following code prints foo:ar on my system.
Edited to add: huh, scratch that. Obvious error in above test :-P. Testing it with strncat like I had meant to, it seems it is in fact padded, not just (possibly) terminated. Interesting, and very worth knowing if you are trying to move a probably small string to a large buffer under time pressure.
What is false? I'm not talking about the source string not being terminated when it's too long. That's obvious, and there are plenty of posts about it in this thread.
I'm not sure what you're trying to show with the example code. Maybe swap it for strncpy and strlcpy and see whether that matches your expectations?
Yeah, my bad, had meant to use strncpy. My contention had been that it only zeroes the first following character (if that). On closer reading of the man page, it is in fact clear.
As somebody who did little other than C for just over a decade, my answer would be "FreePascal", which lets you do similar low-level things but has a few "safeties" built in. Objective-C looks to be a little better with strings as well, but I'm not very familiar with it (I dabbled in iOS years ago).
The people who brought us the [in]famous "Why Pascal is not my favorite language" article would have done well to look at their own glass house.
OTOH, C does make a great portable assembler if you are using it to implement another language (which is exactly what we did at one of my jobs in the early 90s)
... Had to explain to my daughter this morning why I was laughing at the "abstinence" posters ...
I haven't used Pascal in close to 20 years, and don't miss it one bit.
For you younger people reading this who have never been exposed to Pascal, go dig up that article, scroll down to section 2.1, and just think about it for a minute. Ask yourself: 'Is this the kind of computing environment I want to work in?'
FYI: FreePascal has Dynamic Arrays that are automatically resized by the runtime, which addresses section 2.1.
2.4 - separate compilation was added in Turbo Pascal 4. (one could argue that the result is Modula, rather than Pascal - so be it)
2.2 - initialization of module data was added in Turbo Pascal 5. Yes, "static" data has to be at the module level instead of hidden within individual routines. Bug or feature? (let the jihad/crusade commence...)
Serious question: aren't the C-based problems simply hidden from the programmer there? That is, the problems still exist, but you can no longer address them as a Python, Ruby, or Java programmer. That seems worse, not better.
It depends on the implementation. CPython may have problems due to C bugs. Java is self-bootstrapping, though, and has no relation to C except for interfacing with external programs using C calling conventions. In this case the problems are not hidden; they truly don't exist, except for problems introduced by other programs you interface with, not by your code.
You're quite wrong regarding Java. Much of its standard library is implemented in C and C++, and there have been frequent security vulnerabilities found in the language that are a result of buffer overflows or similar memory corruption bugs within the underlying C/C++ implementation.
What should I use instead of these things if they're so dangerous?