User:Timothee Flutre/Notebook/Postdoc/2012/09/12

From OpenWetWare

(Difference between revisions)
(Autocreate 2012/09/12 Entry for User:Timothee_Flutre/Notebook/Postdoc)
(Entry title: first version (without code))
Line 6: Line 6:
| colspan="2"|
| colspan="2"|
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
-
==Entry title==
+
==Handling compressed files with gzip in C++==
-
* Insert content here...
+
 
 +
* It's increasingly common in biology to handle large amounts of data, and thus increasingly necessary to work with compressed files. The [http://www.gzip.org/ gzip] program offers a good balance between compression speed and compressed size. That's why high-level languages such as [http://docs.python.org/library/gzip.html Python] and [http://stat.ethz.ch/R-manual/R-patched/library/base/html/connections.html R] natively provide ways to handle files compressed with gzip. But what about C++?
 +
 
 +
* The [http://www.cs.unc.edu/Research/compgeom/gzstream/ gzstream] library is often mentioned as a good solution. However, it's unlikely to be already installed on your machine, or on the machines of people interested in your code. So you and they will have to install it (or you'll have to distribute it with your own package). Moreover, it doesn't support seeking and is unlikely to do so in the near future. Also, if you want your code to read files successfully whether they are compressed or not, you will have to check the file name extension (.gz) yourself and use ifstream for uncompressed files and igzstream otherwise: messy...
 +
 
 +
* Why not use [http://www.zlib.net/ zlib] directly? If you work on Linux it's already installed (it's used by the Linux kernel), and if you work on Mac OS it's likely to be there as well. (And it even works on Windows, but who cares?) More importantly, even if you're not a professional software developer, zlib happens to be pretty easy to use. Below is example code showing how I typically use it in my own C++ code.

Revision as of 17:47, 12 September 2012


Handling compressed files with gzip in C++

  • It's increasingly common in biology to handle large amounts of data, and thus increasingly necessary to work with compressed files. The gzip program offers a good balance between compression speed and compressed size. That's why high-level languages such as Python and R natively provide ways to handle files compressed with gzip. But what about C++?
  • The gzstream library is often mentioned as a good solution. However, it's unlikely to be already installed on your machine, or on the machines of people interested in your code. So you and they will have to install it (or you'll have to distribute it with your own package). Moreover, it doesn't support seeking and is unlikely to do so in the near future. Also, if you want your code to read files successfully whether they are compressed or not, you will have to check the file name extension (.gz) yourself and use ifstream for uncompressed files and igzstream otherwise: messy...
  • Why not use zlib directly? If you work on Linux it's already installed (it's used by the Linux kernel), and if you work on Mac OS it's likely to be there as well. (And it even works on Windows, but who cares?) More importantly, even if you're not a professional software developer, zlib happens to be pretty easy to use. Below is example code showing how I typically use it in my own C++ code.


