In this article, we are going to learn how to retrieve the latest file from a given folder in Azure Data Lake Storage Gen2 based on its Last Modified date/time.
Introduction
Azure Data Factory is a powerful tool for managing and processing data in the cloud. One of the common tasks in data processing is retrieving the latest file from a set of folders. This can be a time-consuming and error-prone task, especially when dealing with large volumes of data.
Fortunately, Azure Data Factory provides a simple and effective way to automate this process using its built-in activities and connectors. In this article, we will explore how to retrieve the latest file in each folder using Azure Data Factory, and how to use this feature to streamline your data processing workflows.
Let’s get started.
Retrieve Latest Files
Create a pipeline variable named JSONArray of type String, as shown in the screenshot below. This variable will store information about all the files available in the given folder.

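If you prefer working with the pipeline JSON directly, the variables section would look roughly like the sketch below (only the JSONArray variable is needed at this point):

"variables": {
    "JSONArray": {
        "type": "String"
    }
}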
Create a dataset named ds_Folder that refers to the container from which you would like to get the latest file. If you have folders inside the container, you can specify the directory path/name in the Directory field. However, make sure that the File name field is left empty, as shown in the screenshot below.

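For reference, a trimmed sketch of the ds_Folder dataset location is shown below; the container name mycontainer and folder name input are hypothetical, and the linked service and format settings are omitted. Note that no fileName property is set.

"typeProperties": {
    "location": {
        "type": "AzureBlobFSLocation",
        "folderPath": "input",
        "fileSystem": "mycontainer"
    }
}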
Create a Get Metadata activity named GetAllFiles, select the ds_Folder dataset, then add a Field list entry and choose Child items, as shown below.

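Expressed as pipeline JSON, the GetAllFiles activity would look roughly like this (trimmed to the relevant properties):

{
    "name": "GetAllFiles",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "ds_Folder",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ]
    }
}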
In the above scenario, the Get Metadata activity retrieves all the child items of the folder specified in the dataset, as shown below.

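With hypothetical file names, the GetAllFiles output would look something like this:

{
    "childItems": [
        { "name": "sales_2023_01.csv", "type": "File" },
        { "name": "sales_2023_02.csv", "type": "File" },
        { "name": "sales_2023_03.csv", "type": "File" }
    ]
}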
Note that the above output contains only the file name and the type (File) for each item.
To iterate through all the files, create a new ForEach activity named IterateEachFile with the settings shown below.

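The key setting is the Items property, which should point to the child items returned by the GetAllFiles activity:

@activity('GetAllFiles').output.childItems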
If you would like to test the pipeline up to this point, you can’t yet, because the ForEach activity is empty. To test it, add a Wait activity inside the ForEach, run the pipeline, and review the output.
The next important step is to create another Dataset named ds_File with a Dataset Parameter named FileName as shown below.

Now, navigate to the Connection tab and reference the FileName parameter in the File name field, as shown below.

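A trimmed sketch of the ds_File dataset JSON is shown below (the container name mycontainer and folder name input are hypothetical, and the linked service and format settings are omitted); the important part is that the file name is bound to the FileName parameter.

{
    "name": "ds_File",
    "properties": {
        "parameters": {
            "FileName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                },
                "folderPath": "input",
                "fileSystem": "mycontainer"
            }
        }
    }
}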
It’s time to retrieve the LastModified property of each file. Navigate to the ForEach loop and perform the steps below, which are shown in the next screenshot (a JSON sketch of the resulting activity follows the list).
- Navigate inside the ForEach activity
- Create a new Get Metadata activity named Get Last Modified Date
- Go to the Settings tab
- Select ds_File dataset
- Assign the @item().name property to the FileName parameter
- In the Field list property, click New
- In the Argument dropdown, select Last Modified

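Put together, a trimmed JSON sketch of the Get Last Modified Date activity would look like this:

{
    "name": "Get Last Modified Date",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "ds_File",
            "type": "DatasetReference",
            "parameters": {
                "FileName": {
                    "value": "@item().name",
                    "type": "Expression"
                }
            }
        },
        "fieldList": [ "lastModified" ]
    }
}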
Now, when you run the pipeline, you should see the LastModified value for each file that exists in your container.
At this stage, we have the file names and the LastModified values separately, so we need to combine them into a single array. Create another pipeline variable of type Array named FileListWithModifiedDate, as shown below.

Navigate to the ForEach loop and create a new Append variable activity named FrameJsonArray, which appends an entry for each file to the FileListWithModifiedDate variable, as shown below.

The dynamic content of the value appended to the FileListWithModifiedDate array is given below.
@concat('{"FileName":"',item().name,'"',',"LastModified":"',activity('Get Last Modified Date').output.lastModified,'"}')
The above expression creates a key-value pair for each file in JSON format. You can review it in the output when you run the pipeline.

The above screenshot shows the output of one of the FrameJsonArray activity runs. The value is a key-value pair containing the FileName and the LastModified of a single file.
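For illustration, with one of the hypothetical files used earlier, a single appended value might look like this:

{"FileName":"sales_2023_02.csv","LastModified":"2023-05-14T09:21:35Z"}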
The next step is to combine the key-value pairs of all the files available in the FileListWithModifiedDate array into a single JSON string. We can achieve this using the join function.
After the ForEach activity, create a new Set variable activity named Set JSONArray Content with the settings shown below.

You can use the expression below, which wraps the appended items into a JSON array string.
@concat('[',join(variables('FileListWithModifiedDate'),','),']')
The output of the above expression would be similar to the following.

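With the same hypothetical files used earlier, the joined string would look like this:

[{"FileName":"sales_2023_01.csv","LastModified":"2023-04-30T18:02:11Z"},{"FileName":"sales_2023_02.csv","LastModified":"2023-05-14T09:21:35Z"},{"FileName":"sales_2023_03.csv","LastModified":"2023-05-10T07:45:00Z"}]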
In the above JSON, we have both the FileName and the LastModified value for each file. It’s time to sort the data based on the LastModified date. As of this writing, Azure Data Factory doesn’t support sorting, so we need to leverage another service (such as Azure Functions, Logic Apps, etc.) to sort the JSON data. In this example, we will use an Azure SQL Database stored procedure to sort the JSON content.
Create a new stored procedure that takes the JSON string as input, sorts the JSON array, and returns the items sorted by the LastModified date in descending order (most recent on top).
CREATE PROCEDURE SortJsonArray
    @JsonArray NVARCHAR(MAX)
AS
BEGIN
    DECLARE @OutputJson NVARCHAR(MAX)

    SELECT @OutputJson = (
        SELECT *
        FROM OPENJSON(@JsonArray)
        WITH (
            FileName NVARCHAR(MAX) '$.FileName',
            LastModified DATETIME '$.LastModified'
        )
        ORDER BY LastModified DESC
        FOR JSON PATH
    )

    SELECT @OutputJson AS OutputJson
END
Now, let’s add a Lookup activity to execute the above stored procedure, as shown below.

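As a sketch, the Lookup activity JSON would look roughly like the following; the activity name LookupSortedFiles and the dataset name ds_AzureSql (pointing to the Azure SQL Database) are hypothetical, and the JSONArray variable set in the previous step is passed to the stored procedure’s @JsonArray parameter.

{
    "name": "LookupSortedFiles",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderStoredProcedureName": "[dbo].[SortJsonArray]",
            "storedProcedureParameters": {
                "JsonArray": {
                    "type": "String",
                    "value": {
                        "value": "@variables('JSONArray')",
                        "type": "Expression"
                    }
                }
            }
        },
        "dataset": {
            "referenceName": "ds_AzureSql",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}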
Execute the pipeline now; at the end, it returns the files sorted by the LastModified date, as shown below.

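If a downstream activity only needs the name of the most recent file, an expression along these lines could extract it from the Lookup output (again assuming the Lookup activity is named LookupSortedFiles, as in the sketch above):

@json(activity('LookupSortedFiles').output.firstRow.OutputJson)[0].FileName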
That’s how you can retrieve the latest file uploaded to a folder of a given container.
Do you like this article? If you want more updates about this kind of article, you can join my Learning Groups.