
Sortix libdeflate

libdeflate is a general-purpose data compression library that provides the deflate compression algorithm. Both the zlib container format and raw deflate streams are supported.

Sortix libdeflate is developed as part of the Sortix operating system by Jonas 'Sortie' Termansen (sortie@maxsi.org) and contributors.

libdeflate provides a well-designed API that is simple and safe to use. There are no hidden caveats with too-small data types leading to truncation and overflow. In addition to a powerful core API, there are many useful utility functions that make common tasks trivial. This library strives to use a robust coding style that aims to reduce bugs not only inside the library itself but, importantly, also in uses of the library.

This is the transitional edition of libdeflate that uses zlib.h as a backend.

A prototype has appeared in the git repository.

This is Sortix libdeflate. This library is licensed under the ISC license (see deflate.h) and depends on your system libz, which is probably under the zlib license.

Sortix libz is a modernized zlib fork that retains zlib.h compatibility but strives to have safer and cleaner internals.

zlib.h design issues

libdeflate exists because the zlib.h API is dangerous: it is hard to use correctly, especially for large input files. In particular, it has bugs that cannot be fixed without breaking compatibility, such as using unsigned int instead of size_t. This leads to potential truncation and overflow issues in application code that does not take care to pass very large inputs in smaller chunks. There are probably a lot of such 64-bit issues in application code.
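To make the truncation concrete, here is a minimal sketch in C. The helper names (truncated_length, clamp_chunk, count_chunks) are hypothetical and belong to neither zlib.h nor deflate.h, and the sketch assumes a 32-bit unsigned int, as on most current platforms. It shows how a 64-bit length silently wraps when stored in an unsigned int, and how a caller has to clamp each chunk instead.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical: what happens when a large length is stored directly in
   an unsigned int field. The conversion wraps modulo UINT_MAX + 1, so
   with a 32-bit unsigned int a length of 4 GiB + 5 becomes 5. */
static unsigned int truncated_length(uint64_t length)
{
	return (unsigned int) length;
}

/* Hypothetical: the largest piece of the remaining input that can be
   passed through an unsigned int field without wrapping. */
static unsigned int clamp_chunk(uint64_t remaining)
{
	return remaining > UINT_MAX ? UINT_MAX : (unsigned int) remaining;
}

/* Hypothetical: walk over a large input in clamped chunks and count how
   many calls to an unsigned int based interface would be needed. */
static uint64_t count_chunks(uint64_t total)
{
	uint64_t chunks = 0;
	while ( total > 0 )
	{
		unsigned int amount = clamp_chunk(total);
		assert(amount > 0); /* Each iteration makes progress. */
		total -= amount;
		chunks++;
	}
	return chunks;
}
```

Application code that skips the clamping step behaves like truncated_length and silently processes only part of its input.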

zlib.h also makes heavy use of needless custom typedefs for far segmented pointers, something we haven't had to deal with since the 16-bit days. There are const correctness issues in the API that are haphazardly fixed with z_const. The API avoids taking advantage of system headers like <stddef.h> and thus uses Z_NULL and unsigned int instead of NULL and size_t.

However, it does manage to use off_t, which would be great except for the transitional Linux Large File Support madness, which it tries to solve, poorly, by providing 32-bit and 64-bit editions of the off_t interfaces. The whole off_t situation is so fundamentally broken that libraries should not use off_t.

The zlib.h support for gzip is clunky to use correctly. The stdio-like interface for easy gzip processing contains many design issues that make it subtly different from stdio: for instance, it uses unsigned int and int instead of size_t, and its semantics differ subtly from what stdio users expect. Special wide char versions of the file opening interfaces have to be supplied for Windows. The printf interfaces look okay at first, but zlib is willing to fall back to the dangerous vsprintf call when the safe vsnprintf is lacking. Additionally, it does not properly handle strings larger than 8192 bytes, which leads to truncation issues. The gzip printf interfaces are thus both dangerous and broken. libdeflate does away with this whole class of issues by not doing any IO, and instead moves gzip container support to a potential libgzip library.
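For comparison, here is a minimal sketch of formatting into a fixed-size buffer in a way that neither overflows nor truncates silently. The function format_checked is a hypothetical helper and is part of neither zlib.h nor deflate.h. It relies only on the standard C99 vsnprintf, which never writes past the given size and returns the length the full output would have had.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format into buf without overflowing it and
   report whether the whole output fit. On truncation, buf still holds a
   nul-terminated prefix and the caller learns about it through the
   return value instead of losing data silently. */
static bool format_checked(char* buf, size_t size, const char* format, ...)
{
	va_list ap;
	va_start(ap, format);
	/* vsnprintf returns the length the full output would have had, or
	   a negative value if an output error occurred. */
	int needed = vsnprintf(buf, size, format, ap);
	va_end(ap);
	return 0 <= needed && (size_t) needed < size;
}
```

A caller that gets false back can allocate a larger buffer and retry, or fail cleanly, rather than writing a cut-off string to the output.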

Finally, I've examined a bunch of common programs' use of zlib.h and the result is scary. The unsigned int issue is generally not worked around when it needs to be. There are probably truncation and overflow issues. Library functions are often not even error checked. It would not be surprising if there are security issues here.

Migrating from zlib.h

Application code written for the zlib.h API can be adapted to equivalent code for the deflate.h API. Exercise caution: The zlib.h API uses suboptimal data types (such as unsigned int instead of size_t), and a lot of application code using zlib contains subtle overflow or 64-bit issues, or even fails to check for error conditions. Please review for any such defects while porting to libdeflate and think about security considerations. libdeflate is designed to avoid these issues, and you should do right by using libdeflate robustly.

Contact

libdeflate development is coordinated in #libz on freenode IRC.

Jonas 'Sortie' Termansen can be contacted at sortie@maxsi.org.

I am an operating systems developer and not a web developer. Appreciate the correctness and simplicity of this website.