This blog is a result of something that I recently worked on while exploring Trade data of Nepal (exports) for the year 2017/18 (2074/75 BS) to 2020/21 (2077/78 BS). You can download the data from the data portal available at the Trade and Export Promotion Centre (TEPC) website. For folks who wish to use this data to see the HS Code in 8 digit or 6 digit along with total export amount and Country, this article might be of good use to those who are using STATA to clean the data. Hence, I thought I might as well put it in my blog.
Once you download the data from the trade portal, let’s say for the year 2077/78. Generally it will look somewhat like this:
I will try to put as clear a description as possible and would welcome any suggestion regarding the STATA coding part that can help me do the same code in a much smarter way, if I may say so.
Here it is:
Firstly, it would be better if you would set the directory. In my case I had the following directory:
cd "/Users/Desktop/TEPC"
Then, you will have to import the excel file into STATA.
import excel “destination of the file in your computer”
drop SN
For simplicity, I will change the variable name from “Chapter/Commodity/Country” as just “Commodity” given that we will only keep information related to commodity in this column and keep information related to country in the newly generated column.
gen Country=""
replace Country=Commodity if HSCode==""
replace Commodity="" if HSCode==""
Now, in order to only keep the variable “HS Code” with 8 digit, we will have to generate a new column that drops any other observation in HS Code with anything other than 8 digit which is to say that observation with information related to “Chapters” and “4 digit HS Code” will be dropped leaving just the observation with 8 digit.
gen HS_length=length(HSCode)
replace HSCode="" if HS_length != 8 (!= denotes “not equal to”)
After running the above mentioned command, the data will look something like this in the STATA .dta file,
Now, what we will do is, replace the observation with a blank space in the variable “HS Code” and “Commodity” with whatever information that we have above that blank space. For example, the first observation of Commodity variable is “Butter”. Here, we are going to replace “Butter” in the blank space below “Butter” till another observation with a some information appears which in this case is “Dairy spreads”. Use the same logic with the variable “HS Code”.
We will have to run a loop for this case. Run the command as mentioned below:
Set trace on
forvalues row=1/`=_N' {
replace Commodity =Commodity[_n-1] if HSCode==""
replace HSCode = HSCode[_n-1] if HSCode==""
replace Unit = Unit[_n-1] if Unit==""
}
set trace off
The data will look something like this after you run the above mentioned command;
Now, we need to get rid of total values of observation in order to keep the data with respect to the exports to country. We need to do this to make sure we don’t have double counting of total values as well. We will drop any observation of the Variable “HS_length” with observation “8” and any blank observation of the variable “Country”.
drop if HS_length==8
drop HS_length
drop if Country == ""
Now, we can have the data as per our need with total values of goods exported by Nepal to different countries. The data file should look something like this;
Again, It is up to you if you want to keep units and quantity. Help yourself.For that, we need to replace the same 8 digit HS Code to 6 digit (first 6 numbers). Simultaneously, we need to drop duplicates as the total export value in amounts (Values Rs) will be added multiple times in the same row. Hence it is pertinent to avoid double counting.
replace (HSCode) = substr(HSCode,1,6)
bys HSCode: egen total_75=sum(ValuesRs)
duplicates drop HSCode, force
save "name of the file", replace.
***END***




Nicely done!
ReplyDeleteThank you Kshitiz dai. Appreciate your help as well.
Delete