Almost two years ago, a colleague of mine had a problem. At first it seemed to be a trivial thing. He had to check a data file: a fixed-size text file in which every line contained the information for one distinct client. The file had about 12 million lines, and every line was 210 characters long. After a few text editors failed to open it, with or without an error message, it turned out that just viewing a few hundred lines was not such a trivial task. At the time he was working as a Quality Assurance Specialist (or something like that), and on a daily basis he had to check data files coming in and going out. To “check” a file he had to import it into SQL Server and run queries on it. Needless to say, once you move a file into a database, it is not necessarily the same file.
Checking the structure and size of a text file sounds like a trivial task. But is it really?
- Let’s take the most trivial task of all: counting the number of lines in a text file. If the file has 100 lines, they can be counted manually. For 10,000 lines, a spreadsheet or some text editors will do the job to a certain extent. For 1 million lines, there is hardly any other solution available on the market today. The text file can be imported into SQL Server, but after that it is really not that file any more; it is a table in SQL Server. The conversion can easily turn one broken line in the text file into two lines in SQL Server;
- Or take checking that each and every line has the same number of characters (for fixed-size files) or the same number of fields (for delimited files). Even for a delimited text file with 100 lines and 10 fields per line, this can be a time-consuming and very annoying process. Some spreadsheets will open a comma-delimited text file very quickly, but they will also assume that lines with a different number of fields are meant to be that way. That is not a bug in the spreadsheet software; those programs are designed to behave like that. TextMaster is designed to check every line, determine its length or number of fields, and report how many lines fall under each line length or field count;
- Spot checking a small number of lines selected at random from a big file is often very handy, but a tool that does it is very hard to find. TextMaster can quickly and easily export randomly selected lines to another file, based on user criteria. If it is not the only software tool capable of doing this, then it belongs to a group of very few;
- From time to time, a file arrives with a name that carries little meaning, if any, and with an undefined structure. If the file was processed before and saved in a profile, its structure and purpose can be determined within a few seconds. In addition, processing instructions might be available.
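To make the first three checks concrete, here is a minimal Python sketch of the general idea: read the file once as a stream, so it never has to fit in memory. This is not how TextMaster works internally; the function names `line_length_report` and `sample_lines` are made up for this illustration. It assumes lines end in `\n` or `\r\n`, that lengths are measured in bytes (the same as characters only for single-byte encodings), and that fields are split on a plain delimiter with no quoting rules.

```python
import random
from collections import Counter


def line_length_report(path, delimiter=None):
    """Stream the file once and return (total_lines, counts).

    counts maps line length in bytes to the number of lines with that
    length, or, when a delimiter is given, field count to number of lines.
    """
    counts = Counter()
    total = 0
    with open(path, "rb") as fh:           # binary mode: no newline translation
        for raw in fh:                      # iterates line by line, split on b"\n"
            line = raw.rstrip(b"\r\n")      # drop the line terminator only
            total += 1
            if delimiter is None:
                counts[len(line)] += 1
            else:
                # Naive split: quoted delimiters are NOT handled (assumption).
                counts[line.count(delimiter) + 1] += 1
    return total, counts


def sample_lines(path, k, seed=None):
    """Pick up to k lines uniformly at random in one pass (reservoir sampling).

    Returns a list of (line_number, raw_line) pairs sorted by line number.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path, "rb") as fh:
        for i, raw in enumerate(fh):
            if i < k:
                reservoir.append((i + 1, raw))
            else:
                j = rng.randint(0, i)       # inclusive on both ends
                if j < k:
                    reservoir[j] = (i + 1, raw)
    return sorted(reservoir)
```

For a healthy fixed-size file like the one in the story, the report would contain a single entry for length 210; any other key points straight at the broken lines. The sampled lines keep their original line numbers, so a suspicious one can be traced back to its place in the big file.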
That was the beginning. There is a lot more, and a lot more to come. Sign up and we will keep you posted.
Saturday, August 4, 2007
TextMaster Beginning