Getting started with NSSO unit level data


Data provided by the National Sample Survey Office (NSSO) is a rich source of information on many issues of interest to social scientists and research students. But I find that many students and faculty members struggle to get hold of the detailed (unit level) data that NSSO provides on CD. Most people rely on the reports produced by NSSO itself. The skill of handling NSSO unit level data has remained with a very few experts, who have not come out openly to share it with others who are keen to learn. When you google “NSSO unit level data” you will see many entries, but none of them deals with how to convert the raw unit level data provided by NSSO into a usable (spreadsheet) form. Therefore, my objective here is to provide a helping hand for working with NSSO data. The first step is to be confident that you can handle it.

You must know that the raw data files stored in the Workfiles folder contain nothing but a matrix of numbers. For example:

12345500

22143600

These numbers would not mean anything to you until I tell you that the first digit is the serial number of the person, the second and third digits from the left are the person's age, the next two digits are the person's weight, and the last three digits are the person's expenditure per month. Since there are two rows, we have information on two individuals (two observations and four variables, statistically speaking). Our objective now is to get the above raw numbers into the format given below (a spreadsheet, as we see in MS Excel or other statistical software):

person    age    weight    expenditure per month
1         23     45        500
2         21     43        600
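For readers who want to see the end result first, here is a minimal Stata sketch of that conversion. It assumes the two rows are written to a small text file called toy.dat in the current working directory; the file name and the variable names are made up for this illustration. It uses the infix command that is explained further down, and it is meant to be run from a do-file.

```stata
* Sketch only: toy.dat and the variable names are hypothetical.
* Write the two raw rows to a small text file.
file open fh using toy.dat, write text replace
file write fh "12345500" _n
file write fh "22143600" _n
file close fh

* Read each variable from its column positions, then show the result.
clear
infix person 1 age 2-3 weight 4-5 expenditure 6-8 using toy.dat
list
```

The list command should then show two observations with the four variables separated out, as in the table above.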

Well, this is very easy to do here because there are very few digits to deal with. But NSSO data is a huge matrix of numbers that we cannot take in with the naked eye in one go, so separating the columns manually, the way I did above, is simply not possible.

Let us first understand how to use Stata, a statistical software package, to extract data from any file that holds raw data in the form provided by the NSSO.

Suppose we have the following data file in .txt or .dat format (as given by the NSSO). This is how a data file containing the variables name, age, and weight for individuals would look as a fixed-width text file (fixed-width text files do not have a line for the variable names):

zakaria                       3078
lucy                          2754
zulki veryVeryverylonglastname2060

Note that the data are all packed together. The blanks between the names and ages of the first two people are called “filler”. They are there for two reasons: one, to leave room for people with longer names (like our third person), and two, to force each value of each variable to occupy the same columns. In this example, we would say that the variable “name” is in columns 1-30, “age” is in columns 31-32, and “weight” is in columns 33-34. Assuming the file is named “zakku.dat”, the Stata command to read it with infix would be:

infix str name 1-30 age 31-32 weight 33-34 using zakku.dat

Note that the variable “name” is a string (text) variable, while the other two variables are numeric. You can tell Stata that a variable is a string by placing “str” in front of its name. Otherwise Stata assumes that the variable is numeric. The infix command can read files with longer lines of data in an analogous way. Simply list as many variables and columns as you have.
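To try the command without any NSSO data, the following do-file sketch first recreates the three-person file and then reads it back. It assumes zakku.dat can be written to the current working directory; the file-writing lines are only there to make the example self-contained and are not part of the extraction itself.

```stata
* Sketch only: recreates the three-person file from the example above.
* _column(31) pads with blanks so that age always starts in column 31.
file open fh using zakku.dat, write text replace
file write fh "zakaria" _column(31) "3078" _n
file write fh "lucy" _column(31) "2754" _n
file write fh "zulki veryVeryverylonglastname" "2060" _n
file close fh

* Read the fixed-width file: name is a string, age and weight are numeric.
clear
infix str name 1-30 age 31-32 weight 33-34 using zakku.dat

describe    // name should be stored as a string, age and weight as numbers
list
```

Running describe after every infix is a cheap way to confirm that each variable got the type you intended before moving on to a much larger file.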

Now, I hope that the idea of extracting data from a raw data file into a spreadsheet is clear. Let us now turn back to the NSSO data.

No one can provide an exhaustive guide to NSSO data. My objective here is to get you started; after that you can learn by doing more and more on your own and discussing it with others.

Initially I will be working mostly with the NSSO 55th round consumption expenditure data, which contains information for the year 1999-2000. My objective here is not to give details about the data. I am here to explain how to get raw data into a usable (spreadsheet) form using the statistical software Stata. If you have a question, you can post it in the comments.

The first step is to know what is on the CD provided by NSSO.

The NSSO 55th round CD contains two folders:

Document and Workfiles

The Document folder has files on instructions and sampling procedures. The most important files in this folder are “Schedule.doc” (the questionnaire of the survey done by the NSSO) and the layout file named wrk01055.doc.

The Workfiles folder has the raw data files in “.dat” format (zipped). It is important to note that the file names are significant. From the information contained in the schedule file and the layout file we get an idea of which data file has to be used to extract the data you are looking for. For your ease I am attaching here [wrk01055]. Another file that is quite useful is the schedule file (the questionnaire of the survey done by NSSO). What we need now is the data file. I am unable to attach it here, so you will have to obtain the NSS 55th round data CD on your own.

Understanding layout file:

The layout file, named wrk01055.doc, basically tells you which data file is to be used for the variables you are looking for and where exactly to look for those variables within the given raw data file. The headings in the layout file also give us an idea of which part of the questionnaire is covered by the layout that follows each heading.

For example, one of the layouts in the layout file reads as follows:

item            length    byte-pos.    remarks
------------------------------------------------
work-file-id    2         1-2          "W1"
round-sch       3         3-5          "551"
sector          1         6            -
state region    3         7-9
stratum         2         10-11

Here the names of the variables appear under the heading “item”. For example, state region is a variable whose length in the data file is 3 bytes. The variable starts at the 7th column of the data file and ends at the 9th column.

So if you want to extract this variable, the Stata command would read as:

infix strgn 7-9 using "D:\nss55\All101.dat"

Note here that I have changed the name of the variable from ‘state region’ to ‘strgn’. This is because Stata does not accept spaces in a variable name. So my suggestion is to keep the name as short as possible, as long as it helps you remember what it is. The location (path) of the data file from which Stata has to read columns 7-9 (i.e., the 7th, 8th and 9th columns) must be given correctly. With the same command you can read any number of variables, e.g.,

infix str wrkid 1-2 rdsch 3-5 sector 6-6 strgn 7-9 stratum 10-11 using "D:\nss55\All101.dat"
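Put together as a do-file, the whole step from raw file to spreadsheet might look like the sketch below. The path is the one used above and must be changed to wherever the unzipped All101.dat sits on your machine. The output file names nss55_ids.dta and nss55_ids.csv are hypothetical, and the "in 1/100" restriction is my own suggestion for a quick trial run on the first 100 records. The export delimited command assumes Stata 13 or later; on older versions, outsheet using "nss55_ids.csv", comma replace plays the same role.

```stata
* Sketch only: adjust the path; output file names are hypothetical.
* The /// continuation works in a do-file, not in the command window.
clear
infix str wrkid 1-2 rdsch 3-5 sector 6 strgn 7-9 stratum 10-11 ///
    using "D:\nss55\All101.dat" in 1/100

* Check what was read before going any further.
describe
list in 1/5

* Keep a Stata copy and a copy that opens in a spreadsheet program.
save "nss55_ids.dta", replace
export delimited using "nss55_ids.csv", replace
```

Once the first few records look right against the layout file, drop "in 1/100" to read the full file.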

I hope there is now something for you to go ahead with in the NSSO data. Additions and queries are most welcome. Let us make NSSO data accessible to all students of the social sciences.

To cite this blog post:

Siddiqui, Md Zakaria (2009), “Getting started with NSS unit level data [web log post]” Retrieved from https://zakku78.wordpress.com/2009/02/19/nsso-unit-level-data/

In-Text Citation (Siddiqui, 2009)