Unfit
3.1.1
Data fitting and optimization software
Reads in numeric data from a file.
#include <DataFileReader.hpp>
Public Member Functions
| DataFileReader () | |
| unsigned | ReadFile (std::string file_name, unsigned skip=0) |
| void | AddDelimiters (const std::string &new_delimiters) |
| unsigned | RetrieveColumn (std::size_t column_number, std::vector< T > &column, bool return_incomplete_columns=false) |
| std::vector< T > | RetrieveDataRowWiseAsVector () |
| void | SplitLine (const std::string &line, const std::string &delimiters, std::vector< std::string > &words) |
Public Attributes
| std::vector< std::vector< T > > | data |
| std::string | default_delimiters |
The main goal of this object is to read numeric data from a file and store it in a 2D array (a vector of vectors); see the documentation for the ReadFile method for details. The string-splitting method that is used internally (SplitLine) can also be called directly if that functionality is needed, and a single column of the data that has been read in can be extracted via the RetrieveColumn method.
The key design consideration was that the implementation should be self-contained; for example, it does not use Boost's split function. It is also a header-only implementation: including this header file in your code gives access to all of the functionality, and you do not have to link against any extra libraries. The struct is templated, so you can choose to read the data into a container of any numeric data type. Normal truncation rules apply.
Extensive unit testing has been performed on this implementation using the UnitTest++ testing framework. The tests and example files can be found in the "unittests" directory. gcov reports 100% line and function coverage, and valgrind reports no memory errors or leaks for the test suite.
Unfit::DataFileReader< T >::DataFileReader ()
Create a DataFileReader with no data and initialise the default element delimiters to tab, space and comma.
Intended use: DataFileReader data_file_reader;
void Unfit::DataFileReader< T >::AddDelimiters (const std::string &new_delimiters)
The default delimiters used to split each line in the input file are tab, space, and comma. If you want to add to this list, call this method and pass in a quoted string. Each character is treated as a separate delimiter.
Intended use: data_file_reader.AddDelimiters( ";:" );
| new_delimiters | the additional delimiters to be used |
unsigned Unfit::DataFileReader< T >::ReadFile (std::string file_name, unsigned skip = 0)
This method attempts to open the file whose name is passed in and reads the (numeric) data inside into a 2D array (a std::vector of std::vectors, called "data"). The method makes no attempt to search for the file, so the file name must include the appropriate path where needed.
The lines in the file are read in one at a time and each line is then stored in a vector of numbers, hence the storage is row-wise. Non-numeric data in the file is ignored, as are blank lines. This means, for example, that if your data file has one or more header rows, there is no need to remove them. The method will also read in lines of different lengths with no problem.
By default, the data in the file will be split on tabs, spaces and commas. You can add to this list via the AddDelimiters method. Checks are performed to make sure the data being read fits into the data type that has been specified, and the method returns a non-zero return code if this is not the case.
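The range check described above can be illustrated with a minimal sketch. It is not Unfit's implementation: the helper name CheckFits is hypothetical, and it is an assumption that the value is first parsed as a long double and then compared against the limits of the target type. The returned values mirror the documented ReadFile return codes 2 and 3.

```cpp
#include <limits>

// Hypothetical helper, not part of Unfit: reports whether a parsed value
// fits into the numeric type T.
//   0 = value fits
//   2 = value is larger than the maximum for T
//   3 = value is below the lowest value for T (this includes any negative
//       number when T is an unsigned type, because lowest() is then 0)
template <typename T>
unsigned CheckFits(long double value)
{
  if (value > static_cast<long double>(std::numeric_limits<T>::max())) {
    return 2;
  }
  if (value < static_cast<long double>(std::numeric_limits<T>::lowest())) {
    return 3;
  }
  return 0;
}
```

For example, CheckFits<unsigned>(-1.0L) yields 3, while CheckFits<double>(3.5L) yields 0.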
If you read a second file in with the same data reader, the original data stored by the reader will be lost and replaced with the new data.
Intended use: DataFileReader data_file_reader; data_file_reader.ReadFile( "file_name" );
| file_name | a string containing the file name and path information |
| skip | an optional parameter to skip the first "skip" lines |
Return codes:
0 = data was read successfully
1 = file could not be opened
2 = number larger than the maximum for the data type
3 = number below the most negative value for the data type, or a negative number written into an unsigned type
4 = file contains no data (numbers)
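To make the row-wise reading behaviour concrete, here is a minimal standalone sketch of the same idea. It is not the Unfit code: the function name ReadNumericRows is hypothetical, it reads from a std::istream rather than a file name, it is fixed to double, and it assumes that a token counts as numeric only if std::strtod consumes all of it. It skips the first "skip" lines, splits on tab, space and comma, ignores non-numeric tokens, and drops lines that contain no numbers.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not part of Unfit: read rows of numbers from a stream.
std::vector<std::vector<double>> ReadNumericRows(std::istream &input,
    unsigned skip = 0)
{
  std::vector<std::vector<double>> rows;
  std::string line;
  unsigned line_number = 0;
  while (std::getline(input, line)) {
    // Ignore the first "skip" lines of the input
    if (line_number++ < skip) continue;
    // Turn commas and tabs into spaces so that operator>> splits on all three
    for (char &c : line) {
      if (c == ',' || c == '\t') c = ' ';
    }
    std::istringstream tokens(line);
    std::string token;
    std::vector<double> row;
    while (tokens >> token) {
      char *end = nullptr;
      const double value = std::strtod(token.c_str(), &end);
      // Keep the token only if strtod consumed every character of it
      if (end != token.c_str() && *end == '\0') row.push_back(value);
    }
    // Lines without any numbers (blank lines, header rows) are not stored
    if (!row.empty()) rows.push_back(row);
  }
  return rows;
}
```

Note that std::strtod also accepts tokens such as "inf" and "nan", so a stricter filter may be needed if headers can contain those words.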
unsigned Unfit::DataFileReader< T >::RetrieveColumn (std::size_t column_number, std::vector< T > &column, bool return_incomplete_columns = false)
This method copies a column from the data that has been read in to a std::vector, which has to be the same type as the reader (int reader = int vector, double reader = double vector, etc). The first argument is the index of the desired column, starting from zero. If the rows that were read in had irregular lengths, some columns will only be partially populated. In this case the function flags this with a return code and, by default, does not return the column; set return_incomplete_columns to true to get the partial column anyway.
Intended use: unsigned rc = data_file_reader.RetrieveColumn( index, column );
| column_number | the index (from zero) of the desired column |
| column | a vector in which the requested column is placed |
| return_incomplete_columns | when false (default), an empty vector is returned if the column is incomplete; when true, the incomplete column is returned |
Return codes:
0 = success
1 = reader contains no data
2 = requested column does not exist
3 = requested column exists, but is not fully populated
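The column logic and its return codes can be sketched as a free function over a vector of vectors. This is an illustration only, not the Unfit implementation: the name ExtractColumnSketch is hypothetical, and it is an assumption that a column "exists" when at least one row is long enough, and that return code 3 is still reported when return_incomplete_columns is true.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not part of Unfit: copy one column out of row-wise data.
template <typename T>
unsigned ExtractColumnSketch(const std::vector<std::vector<T>> &data,
    std::size_t column_number, std::vector<T> &column,
    bool return_incomplete_columns = false)
{
  column.clear();
  if (data.empty()) return 1;  // no data at all
  bool complete = true;
  for (const auto &row : data) {
    if (column_number < row.size()) {
      column.push_back(row[column_number]);
    } else {
      complete = false;  // this row is too short to contribute
    }
  }
  if (column.empty()) return 2;  // no row reaches the requested column
  if (!complete) {
    // Assumption: the partial column is kept only when explicitly requested
    if (!return_incomplete_columns) column.clear();
    return 3;
  }
  return 0;
}
```

With rows {1, 2, 3}, {4, 5} and {6, 7, 8}, column 0 is complete (return code 0), whereas column 2 yields return code 3 and, if incomplete columns are requested, the values 3 and 8.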
std::vector< T > Unfit::DataFileReader< T >::RetrieveDataRowWiseAsVector ()
This method allows you to return all of the data as a single 1D vector. The result will contain all of the rows appended one after the other. For example, if you have:
[1 2 3]
[7 8 9]
[4 5 6]
then the resulting vector will be [1 2 3 7 8 9 4 5 6].
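The flattening described above amounts to appending each row to a single vector. A minimal sketch follows; the function name FlattenRowWise is hypothetical and this is not taken from the Unfit source.

```cpp
#include <vector>

// Hypothetical sketch, not part of Unfit: append all rows into one 1D vector.
template <typename T>
std::vector<T> FlattenRowWise(const std::vector<std::vector<T>> &data)
{
  std::vector<T> flat;
  for (const auto &row : data) {
    // Rows may have different lengths; each is appended in full
    flat.insert(flat.end(), row.begin(), row.end());
  }
  return flat;
}
```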
void Unfit::DataFileReader< T >::SplitLine (const std::string &line, const std::string &delimiters, std::vector< std::string > &words)
This method is used internally by the ReadFile method, so you do not need to call it at all if all you want to do is read data from a file. However, it can be very useful if you have a string that you want to split into separate elements based on certain delimiters. Just pass the string of interest and the delimiter(s) you want to split on, and this method will perform the split and give you back a vector of strings. It treats consecutive delimiters as one (e.g. three spaces in a row do not create empty elements) and discards any leading or trailing delimiters. When using multiple delimiters, each character is treated as a separate delimiter.
Intended use: data_file_reader.SplitLine( in_string, "\t ,", vector_of_words );
| line | the string to be split |
| delimiters | a list of the delimiters on which to split the line |
| words | a vector of strings, one for each word after the split |
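For readers who want to see how such a split can work without Boost, here is a minimal sketch built on std::string::find_first_of and find_first_not_of. It is not the Unfit implementation: the name SplitLineSketch is hypothetical, and it is an assumption that the output vector is cleared before the words are stored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not part of Unfit: split a line on any of the given
// delimiter characters, producing no empty words.
void SplitLineSketch(const std::string &line, const std::string &delimiters,
    std::vector<std::string> &words)
{
  words.clear();  // assumption: previous contents are discarded
  // Skip any leading delimiters to find the start of the first word
  std::size_t start = line.find_first_not_of(delimiters);
  while (start != std::string::npos) {
    // The word ends at the next delimiter, or at the end of the line
    const std::size_t end = line.find_first_of(delimiters, start);
    words.push_back(line.substr(start, end - start));
    // Skip the whole run of delimiters that follows the word
    start = line.find_first_not_of(delimiters, end);
  }
}
```

Calling SplitLineSketch( ",, 1.5\t2 ,,3 ", "\t ,", words ) leaves the three words "1.5", "2" and "3" in the vector.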
std::vector< std::vector< T > > Unfit::DataFileReader< T >::data
An array to store the data that is read in
std::string Unfit::DataFileReader< T >::default_delimiters
Each line will be split on these delimiters