I needed a way to detect the delimiter character of a CSV file I was processing in Ruby.
To move massive amounts of data between organizations, we (at my current job) have found file transfer to be significantly more performant than web service calls, so we often share data with our clients through Secure FTP. A particular project I was working on required parsing CSV files provided by clients, and those files could arrive in different formats. I needed an automated way to tell the CSV parser whether to use commas, tabs or pipes without having to manually examine the files or set up some per-client configuration. That's why I wrote the csv_sniffer Ruby gem. The word "sniffer" is meant to indicate that the properties of the CSV file are determined heuristically. Python also has a csv.Sniffer class that does many of the same things my Ruby csv_sniffer does, so the terminology is already familiar to developers in another dynamic language.
The design goals of csv_sniffer were:
- To answer the question in the fastest, simplest and most resource-conscious way possible while maintaining as high an accuracy standard as more complex methods
- To be easy to use
csv_sniffer started out with delimiter detection, quote-enclosed value detection and quote character detection. I later added header detection by porting Python's header detection algorithm to Ruby.
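To give a feel for what heuristic delimiter detection can look like, here is a minimal sketch. It is not the code inside csv_sniffer: the `guess_delimiter` method, the `CANDIDATE_DELIMITERS` constant and the `sample_size` parameter are hypothetical names made up for this illustration. The idea is simply that a real delimiter tends to show up the same number of times on every sampled line.

```ruby
# Hypothetical illustration only; this is NOT csv_sniffer's implementation.
CANDIDATE_DELIMITERS = [",", "\t", "|"].freeze

# Guess the delimiter of CSV text by counting each candidate on the first
# few non-empty lines. Returns nil when no candidate looks consistent.
def guess_delimiter(text, sample_size: 10)
  lines = text.each_line.map(&:chomp).reject(&:empty?).first(sample_size)
  return nil if lines.empty?

  # Score each candidate: its per-line count if that count is non-zero and
  # identical on every sampled line, otherwise zero.
  scores = CANDIDATE_DELIMITERS.to_h do |delim|
    counts = lines.map { |line| line.count(delim) }
    consistent = counts.first.positive? && counts.uniq.size == 1
    [delim, consistent ? counts.first : 0]
  end

  delim, score = scores.max_by { |_, s| s }
  score.positive? ? delim : nil
end

guess_delimiter("id|name\n1|Ann\n") # => "|"
```

This sketch deliberately ignores quoted values that contain a delimiter character, which is one reason quote detection matters alongside delimiter detection.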
All the source code, tests and usage examples are available on GitHub under the MIT license, so it's free to use, modify and redistribute. The gem itself is published to rubygems.org and can be installed with:
gem install csv_sniffer