Careful consideration needs to be given to how files are created and written when they are to be read by the File and FTP Adapters.
Hereinafter the term "the Adapter" means either the File or FTP Adapter. The abbreviation EA means External Application.
The objective of the Adapter (when reading files) is simple: identify the file(s) to be read, process them, (optionally) delete them and (optionally) archive them. [ I recommend always archiving the files to aid recovery in case of disaster. ]
And so, conceptually, the requirements are extremely simple. But, as is often the case, the simplest of scenarios may present problems.
When either of the Adapters is configured, the file(s) to be processed will be identified according to some pattern. Furthermore, the Adapter will be configured to poll the source location at a predefined interval.
At each polling interval, the source location is examined for any file(s) that fit the pattern-matching criterion and (optionally) are of at least a certain age. More on this last point later.
When considering what problem(s) might arise, one needs to think about what the Adapter does internally when polling. In the simplest case, a source directory listing is acquired and all files observed are examined to see if they fit all of the criteria specified - i.e. name and (perhaps) age. If all criteria are matched then the file(s) will be processed.
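To make the selection step concrete, here is a minimal sketch of that polling logic in Python. It is not the Adapter's actual implementation. The function name select_candidates, the glob-style pattern and the use of the last-modified time as the file's "age" are all assumptions made for illustration.

```python
import fnmatch
import os
import time


def select_candidates(source_dir, pattern, min_age_seconds=0.0, now=None):
    """Return the sorted names of files that match the pattern and are old enough.

    Hypothetical helper: "age" is taken to be the time since last modification.
    """
    now = time.time() if now is None else now
    selected = []
    for name in os.listdir(source_dir):
        path = os.path.join(source_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and other non-files
        if not fnmatch.fnmatch(name, pattern):
            continue  # name does not fit the configured pattern
        age = now - os.path.getmtime(path)
        if age >= min_age_seconds:
            selected.append(name)
    return sorted(selected)
```

Notice that nothing in this check can tell whether the writer has finished. A file that matches on name and age is selected regardless of its state.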
The Adapter has no way of knowing if writing of the source file has been completed. This is the problem. A file that is still being written by the EA may be selected by the Adapter as a candidate for further processing.
However, if the file has not been completely written, then it is possibly going to be malformed or at the very least missing data.
Now we have to think about how the EA writes the file(s) into the source location. How does that application work? How big are the files? And, most importantly, how long does it take the EA to write the files? For example, the EA may create the file but then during its own acquisition of data to populate that file, there may be delays of indeterminate time.
If you know the maximum time that can elapse between a file's creation and its full population, then the minimum file age should be configured to at least that value. And this is really the predicament.
Can you know without any doubt whatsoever that a file will have been completely written once it's 5 seconds old or 2 minutes old or however old it is?
I suggest that the only reasonable answer to this is "probably" and that's just not good enough for a robust implementation.
Therefore, configuring the Adapter for a file age is almost certainly a compromise.
This is a double-edged sword. Firstly, there is the uncertainty around how long the EA will take to write a file, which pushes the file age upwards. Secondly, the FMW developer will usually want to process fully formed files as soon as they are ready, which pushes it downwards. [ I recently worked with a client who had the unfortunate brief of providing near real-time processing with the FTP Adapter which was, quite frankly, an unrealistic proposition. ]
It is not uncommon for the EA (i.e. the process by which files are written into the source location) to be some piece of legacy software that everyone's afraid to touch or maybe not allowed to modify. If that is the case, then great care needs to be taken when considering the polling rate and the file age specification. Always allow generous margins beyond what you would hope to be reasonable. That's probably the best you can do. Unfortunately, this is likely to mean that files will not be processed as quickly as they otherwise might. High-rate polling (e.g. once per second) probably makes little sense if the file age is set to, say, 10 seconds, because no file can be picked up until it is at least 10 seconds old however often the Adapter looks.
However, if you do have control over the EA, then there is a better option.
I contend that the optimum solution is for the EA to create and write to a file in the source location using a name that will not match what any instance of the Adapter is expecting. In other words, it will be ignored.
Once the EA has finished writing, the file should be renamed to something that will be recognized by the Adapter.
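A minimal sketch of this write-then-rename approach on the EA side might look like the following. The function name publish_by_rename and the ".part" suffix are hypothetical. The sketch assumes the Adapter's pattern (for example *.csv) does not match names ending in ".part".

```python
import os


def publish_by_rename(source_dir, final_name, data, temp_suffix=".part"):
    """Write data under a temporary name, then rename it to its final name."""
    temp_path = os.path.join(source_dir, final_name + temp_suffix)
    final_path = os.path.join(source_dir, final_name)
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    # Both names are in the same directory, so this is a plain rename,
    # which is atomic on POSIX systems.
    os.replace(temp_path, final_path)
    return final_path
```

For example, publish_by_rename("/data/inbound", "orders.csv", payload) would leave only orders.csv visible under its final name, and only once it is complete. The directory path here is purely illustrative.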
If it is not practical for the EA to write what is effectively a temporary file into the source location, then there is another option but with limitations. The EA can write to some location that the Adapter knows nothing about. In this case, fairly obviously, the file can be created with its ultimate name. Once writing is complete, this file can be moved to its final location. But beware! The way the file is "moved" needs to be an atomic action. Whether it is atomic may depend on the underlying operating system and / or the physical locations of the temporary and ultimate source locations. Think about the traditional Unix mv command: within a single filesystem it is a simple rename, but across filesystems it becomes a copy followed by a delete, and a partially copied file is then visible in the source location while the copy is in progress.
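One way to guard against an accidental copy-and-delete is to check that the staging and source locations are on the same filesystem before moving. The sketch below is an illustration only. The function name move_atomically is hypothetical, and comparing st_dev values is an assumption that holds on typical POSIX systems but may not cover every network or layered filesystem.

```python
import os


def move_atomically(staged_path, source_dir):
    """Move a fully written file into source_dir using a rename only."""
    staging_dir = os.path.dirname(os.path.abspath(staged_path))
    # Different device ids mean different filesystems, where a "move"
    # would degrade to copy-then-delete and expose a partial file.
    if os.stat(staging_dir).st_dev != os.stat(source_dir).st_dev:
        raise OSError("staging and source locations are on different filesystems")
    final_path = os.path.join(source_dir, os.path.basename(staged_path))
    os.replace(staged_path, final_path)  # rename, never a copy
    return final_path
```

Refusing the move outright is a deliberate choice here: it is better for the EA to fail loudly than for the Adapter to pick up half a file.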
If either of these strategies can be employed then the polling rate of the Adapter can be set to whatever best represents business needs. The minimum age should be set to zero. This works because any file that matches the configured pattern is now guaranteed to be complete and ready for processing.
The Adapter has a facility to work with a so-called "trigger" file. This is where a file with a well-known name (not pattern matched) is created by the EA (or something working in association with it) in some well-known location. The mere existence of this file indicates to the Adapter that any and all files in the source location are ready for processing. This is most useful when, for example, source files may be constructed during a business working day for processing at some later time. At the appropriate time, the trigger file is created and files matching the criteria specified in the Adapter will be processed. It is essential when employing this strategy to ensure that there is no possibility that files that might otherwise match the selection criteria are created in the source location after the trigger file has been created.
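Conceptually, the trigger file acts as a gate in front of the normal pattern match. The following sketch illustrates the idea only. It is not how the Adapter is implemented, and the function name files_ready and its parameters are hypothetical.

```python
import fnmatch
import os


def files_ready(source_dir, pattern, trigger_path):
    """Return matching file names, but only once the trigger file exists."""
    if not os.path.exists(trigger_path):
        return []  # no trigger yet: nothing is considered ready
    return sorted(
        name
        for name in os.listdir(source_dir)
        if fnmatch.fnmatch(name, pattern)
        and os.path.isfile(os.path.join(source_dir, name))
    )
```

The gate is all-or-nothing, which is why anything written into the source location after the trigger appears would be treated as ready whether or not it is complete.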