Sunday, April 2, 2023

Dabbling with Trade Data from Customs: Changing Cumulative values to Monthly trade values using STATA

This blog is aimed at helping readers find monthly trade patterns of Nepal for a given month in a given year by analyzing customs cumulative data on Nepal's import and export. The data can be found on the customs data portal and should be downloaded for each month separately. Once downloaded, you should focus on the "Imports by Commodity Partner" sheet and analyze the following variables: HSCode, Description, Quantity, Unit, Country, Values, and Revenue.

Firstly, you will have to delete the first two columns of each file. You can do it without deleting manually as well by running a few other codes, but I decided to delete it manually as I incurred a few errors that took a lot of my time to solve, while trying to do it without running the code. The two columns at the top pointed by the arrow are the ones that you will have to get rid of: 

Secondly, given that you will have to pull multiple files all at once and append it later. It is essential to keep in mind that when pulling multiple files at once, it is crucial to store all files in the same folder to avoid any errors in running the command. Therefore, one should ensure that all files are stored in a specific folder.

Furthermore, it is recommended to name each file using a numerical format, such as naming the first month's file as "1" and the second month's file as "2", and so on. This naming convention helps in easily identifying the files and automating the process of appending the data later. The image below illustrates how the files can be saved in a folder using this naming convention. You can use other methods but naming the files in numeric order makes it easy and quick to atomate the process. 

Now, given that you set the directory and do some data cleaning manually as mentioned above, you should proceed on to run a loop that pulls all 12 files uploaded on the same folder taken from customs portal. Run the following command after that: 

forvalues i=1/12{
clear
import excel `i', sheet("4_Imports_By_Commodity_Partner") firstrow
gen Month=.
replace Month=`i' if `i'==`i'
save `i'.dta, replace 
}

The command mentioned in the previous section imports all 12 months' files at once and converts them to. dta format, saving them in the same folder. By specifying the sheet "Imports by Commodity Partner," the code filters out other unnecessary sheets and extracts only the relevant data required for monthly analysis. If readers want to analyse other parameters, they can refer to other sheets and modify the code accordingly.

Also, to identify the month in which an import transaction took place, the code generates a variable called "Month" and replaces the observations on this variable with the name of the file itself. For instance, if the file name is "1", then the Month variable in that particular file will have the value of the Month as "1." By generating this variable, you can easily identify the month in which each transaction took place. 

Now, we have to run a loop again that appends all 12 files. Append command lists all the in order, i.e, if you append file 1 and file 2, all the observations from file 2 will be kept below file 1. Use the following command for that:

forvalues i =1/5 {
append using `i'.dta 
}

Your data should look somewhat like this. Check the Month columns, you can see data from all Months. 

You might incur a minor error while running these commands which is that the last file, i.e 12, might run twice. In such scenario, you can drop the duplicate values by running the following command:

duplicates drop if Month==5

Now, usually, I change the variables names that look familiar for simplicity. This is all up to you. If you want to change it, then you can change, if not, don’t change. But if you change the variable names, make sure you use the same variable name to run the command later. 

drop if HSCode==""
rename Imports_Revenue Revenue
rename Imports_Value Value
ren PartnerCountries Country

Now, I will have to sort the variables. I am keeping the variables HSCode and Country in descending order and Month in ascending order for which I will have to run the command as mentioned below: 

gsort -HSCode -Country Month

I am doing this mostly because I want to deduct each month’s data with earlier month data which gives us the import value of each month in absolute number which is exactly what we are looking for. Run the command as mentioned below for that: 

        gen total=Value-Value[_n-1]
gen QTY=Quantity-Quantity[_n-1]
gen Revenue1=Revenue-Revenue[_n-1]

        replace total=Value if Month<=Month[_n-1]
replace QTY=Quantity if Month<=Month[_n-1]
replace Revenue1=Revenue if Month<=Month[_n-1]

//rearranging variable names for simplicity 
drop Quantity Value Revenue
rename total Value 
rename Revenue1 Revenue

Your data should look somewhat like this: 

Now, rename the months in numeric values with alphabet, i.e., the first month in a fiscal year is Shrawan which is, hence, you need to replace 1 with Shrawan. Use the command below, 

//generating a new variable and replacing the variable with respective months 
gen Month1=""
replace Month1="Shrawan" if Month==1
replace Month1="Bhadra" if Month==2
replace Month1="Ashoj" if Month==3
replace Month1="Kartik" if Month==4
replace Month1="Mangsir" if Month==5

Use the same code for other months as well by changing the Value of the Month accordingly. 

//rearranging variable names for simplicity 
drop x x2 x3 x4 Month
rename Month1 Month

Your data should look somewhat like this: 

While working with customs data, it is important to note that errors may occur. In some cases, the data may contain inconsistencies or discrepancies that can affect the accuracy of the analysis. For instance, in the case of the year 2077-78, it was found that the customs data itself had some errors. Specifically, the same goods were imported with different unit values, such as a bottle being imported in both KG and PCS units, leading to confusion in the analysis.

Furthermore, it is possible to encounter other errors while working with customs data, such as discrepancies in cumulative values between different months. For example, it may be observed that the cumulative value of the 4th month appears to be higher than that of the 5th month, which is mathematically impossible if the values are all positive. This makes the data inconsistent and requires careful consideration.

Therefore, when working with customs data, it is important to exercise caution and carefully analyze the data to identify any errors or discrepancies that may affect the accuracy of the analysis. By doing so, you can derive meaningful insights and draw accurate conclusions about Nepal's monthly trade patterns.

While as a data analyst or economist, we can do nothing about the error with respect to cumulative value, other than contacting the customs department and inquiring about it. While errors in customs data can pose challenges for data analysts and economists, there are still ways to work towards improving the accuracy of our analysis and gaining meaningful insights from the data. Hence, what we can do is, try to see which HSCode product were imported in different Unit and try to modify the values slightly wherever necessary. Run the command as mentioned below: 

duplicates list HSCode Country Month
egen x=group (Unit)
tab x
bys HSCode Country: egen x2=min(x)
bys HSCode Country: egen x3=max(x)
ins x*
gen x4=x2-x3
su x4
tab x4
tab HSCode if x4!=0
gsort -HSCode -Country Month
brow if x4!=  0

When you run the group (Unit) command, you should get something like this on your console window on STATA: 

The above-mentioned command will show you all such products sold in two different Unit. As shown in the picture below, the same product is sold in MTR and SQM. 


To address discrepancies in customs data where the same product is imported with different unit values, one approach is to check the values for each HSCode and replace the unit value with the unit value that has been imported with the highest value. For instance, if a product with HS Code 23556892 is imported in both Kg and PCS units, we can check how much of that product has been imported in each unit. If the imported value in Kg is higher than that in PCS, we can replace the PCS unit with Kg and drop any duplicate values.

For example, we have such issues with HSCode= 39269099 which is sold oin KG and PCS both. For that run a similar command as mentioned below as per the HSCode: 

tab Unit if HSCode=="39269099"   
total Value if HSCode=="39269099" & Unit=="Kg"
total Value if HSCode=="39269099" & Unit=="PCS"
replace Unit="KG" if x==9 & x2==4 & x3==9 & x4==-5 & HSCode=="39269099"

Here, the HSCode had its maximum imports in KG, hence we change all the unit with PCS to KG and add the value for the same products and drop duplicates. 

To drop duplicates and add values with respect to the HS Code and Country, use the following command: 

bys HSCode Country Month: egen total_1=sum(total)
bys HSCode Country Month: egen QTY_1=sum (QTY)
bys HSCode Country Month: egen Revenue_1=sum (Revenue1)
duplicates drop HSCode Country Month, force

However, it's worth noting that this approach may not be applicable to all cases and should be used with caution.

You can now save and export the data in excel or whichever format you want your data on. 

Cheers!!!



No comments:

Post a Comment