Ranjan Sapkota's Blog: Trade-based Data Cleaning through STATA (Nepal Exports--2020/21)

This blog is a result of something that I recently worked on while exploring Trade data of Nepal (exports) for the year 2017/18 (2074/75 BS) to 2020/21 (2077/78 BS). You can download the data from the data portal available at the Trade and Export Promotion Centre (TEPC) website. For folks who wish to use this data to see the HS Code in 8 digit or 6 digit along with total export amount and Country, this article might be of good use to those who are using STATA to clean the data. Hence, I thought I might as well put it in my blog.

Once you download the data from the trade portal, let’s say for the year 2077/78. Generally it will look somewhat like this:

Here, you can see that the data includes many information that might be of great use to you and some might be redundant as well. When you want the data only in terms of 8 digit or in 6 digit terms, you will have to get rid of a lot of variables like “SN” and many observations in the variable “HS Code”. For example rows with information related to “chapter” and “4 digit HS Code”. Also, you would want to create a separate column for country and shift the information related to country from the current column as you can see in the figure that the information related to “Chapter/Commodity/Country” are all kept in the same Column. It’s upto you if you want to drop the variable “Unit” and “Quantity” as per the need of your analysis.

I will try to put as clear a description as possible and would welcome any suggestion regarding the STATA coding part that can help me do the same code in a much smarter way, if I may say so.

Here it is:

Firstly, it would be better if you would set the directory. In my case I had the following directory:

cd "/Users/Desktop/TEPC"

Then, you will have to import the excel file into STATA.

import excel “destination of the file in your computer”

drop SN

For simplicity, I will change the variable name from “Chapter/Commodity/Country” as just “Commodity” given that we will only keep information related to commodity in this column and keep information related to country in the newly generated column.

gen Country=""

replace Country=Commodity if HSCode==""

replace Commodity="" if HSCode==""

Now, in order to only keep the variable “HS Code” with 8 digit, we will have to generate a new column that drops any other observation in HS Code with anything other than 8 digit which is to say that observation with information related to “Chapters” and “4 digit HS Code” will be dropped leaving just the observation with 8 digit.

gen HS_length=length(HSCode)

replace HSCode="" if HS_length != 8 (!= denotes “not equal to”)

After running the above mentioned command, the data will look something like this in the STATA .dta file,

Now, what we will do is, replace the observation with a blank space in the variable “HS Code” and “Commodity” with whatever information that we have above that blank space. For example, the first observation of Commodity variable is “Butter”. Here, we are going to replace “Butter” in the blank space below “Butter” till another observation with a some information appears which in this case is “Dairy spreads”. Use the same logic with the variable “HS Code”.

We will have to run a loop for this case. Run the command as mentioned below:

Set trace on

forvalues row=1/`=_N' {

replace Commodity =Commodity[_n-1] if HSCode==""

replace HSCode = HSCode[_n-1] if HSCode==""

replace Unit = Unit[_n-1] if Unit==""

}

set trace off

The data will look something like this after you run the above mentioned command;

Now, we need to get rid of total values of observation in order to keep the data with respect to the exports to country. We need to do this to make sure we don’t have double counting of total values as well. We will drop any observation of the Variable “HS_length” with observation “8” and any blank observation of the variable “Country”.

drop if HS_length==8

drop HS_length

drop if Country == ""

Now, we can have the data as per our need with total values of goods exported by Nepal to different countries. The data file should look something like this;

Again, It is up to you if you want to keep units and quantity. Help yourself.

If you want to analyse the data in 6 digit HS Code since 6 digit is used internationally as a common HS Code format.

For that, we need to replace the same 8 digit HS Code to 6 digit (first 6 numbers). Simultaneously, we need to drop duplicates as the total export value in amounts (Values Rs) will be added multiple times in the same row. Hence it is pertinent to avoid double counting.

replace (HSCode) = substr(HSCode,1,6)

bys HSCode: egen total_75=sum(ValuesRs)

duplicates drop HSCode, force

save "name of the file", replace.

***END***

Ranjan Sapkota's Blog

Friday, July 1, 2022

Trade-based Data Cleaning through STATA (Nepal Exports--2020/21)

2 comments:

Popular Posts