Development of a user-friendly interface routine in C++ for arithmetic and statistical processing of census data

A routine was developed in C++ for the processing of social and environmental census data acquired by the Brazilian Institute of Geography and Statistics (IBGE). The routine employs a simple graphical environment. The data generated are presented in a tabular format, which facilitates a broad and objective view of the values, and provides a convenient means of querying the database. The source code used to develop the routine permits updates and changes, as required by the user. Statistical and mathematical analysis enables the generation of social and environmental indicators, together with quantitative and qualitative classification of the socio-environmental quality of the region analyzed. As an example, the routine was applied using census data for the city of Sorocaba (São Paulo State, Brazil), including conditions of household occupation, water supply, sanitation, level of education, income, and other factors. It is envisaged that the proposed analytical model will assist professionals from different fields of research and teaching to develop urban planning and management strategies.


Introduction
Free software programs are those developed by one or more persons who, in addition to making the software freely available, allow access to the source code for alterations by the user (CHRISTOPH, 2005). The philosophy of free software is based on four fundamental concepts: (1) Freedom to use the program for any purpose; (2) freedom to study how the program functions, and adapt it according to need (access to the source code is a prerequisite for this ability); (3) freedom to redistribute copies in order to improve access to the software by individuals and institutions; (4) freedom to improve the program, and release such improvements, so that the whole community can benefit, without any additional cost implications. These concepts seek to guarantee that the user can use, copy, study, and modify the software, with a view to maintaining the freedom of production (or improvement) and usage of the program (BACIC, 2003).
The C programming language was developed from two earlier languages: BCPL, created by Martin Richards, and B, written by Ken Thompson and influenced by BCPL (ALVES, 2002). C is a general-purpose language, widely employed in the development of software, operating systems, office applications, and games. A notable characteristic of C is that run-time checking is minimized: array indices are not bounds-checked, and a character value can be added to a number. This results in faster data processing (ALVES, 2002). The C++ language, as its name suggests, is a natural evolution of C, developed by Bjarne Stroustrup at Bell Laboratories (AT&T) between 1983 and 1985. A routine in C++ corresponds to a set of executions, functions, or calculations that generate a desired result. Such a routine, according to Alves (2002), is characterized by three properties: encapsulation, the combination of a data structure with the functions (also known as methods) that manipulate it; inheritance, the capacity to create new classes that inherit functions and data structures defined in other classes, with the ability to redefine or add elements; and polymorphism, which allows an element shared under one name by an entire hierarchy of classes to perform the function defined by the class of the object that invoked it.
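The three properties can be illustrated with a minimal sketch. It is not taken from the routine described in this paper: the class names `Indicator` and `WaterIndicator` and the function `describe` are hypothetical, and the code assumes a C++11 or later compiler.

```cpp
#include <string>

// Hypothetical classes, for illustration only.

// Encapsulation: the stored value and the methods that use it live together,
// and the value itself is hidden from outside code.
class Indicator {
public:
    explicit Indicator(double value) : value_(value) {}
    virtual ~Indicator() = default;

    double value() const { return value_; }

    // Polymorphism: derived classes answer this same call in their own way.
    virtual std::string label() const { return "generic indicator"; }

private:
    double value_;
};

// Inheritance: WaterIndicator reuses the data and methods of Indicator
// and redefines label().
class WaterIndicator : public Indicator {
public:
    using Indicator::Indicator;  // reuse the base-class constructor
    std::string label() const override { return "water supply"; }
};

// Accepts any Indicator; the call is dispatched to the object's own class.
std::string describe(const Indicator& indicator) {
    return indicator.label();
}
```

Passing a `WaterIndicator` to `describe` returns the derived label even though the parameter type is the base class, which is the behaviour the definition of polymorphism above refers to.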
Borland entered the market of C compilers in the mid-1980s, with the introduction of Turbo C. Turbo C++ was launched in 1990, with an improved environment and full compatibility with version 2.0 of C++ created by AT&T, and a few years later consolidation was achieved with the introduction of Borland C++ Builder (ALVES, 2002).
The aim of this study was to develop a routine in C++, using the integrated development platform Borland C++ Builder 6.0, for statistical treatment of census data in order to build socio-environmental indices. The developed routine should contribute to studies of socio-environmental conditions in cities, considering the conditions and levels of social inclusion at different scales.

Material and methods
This work utilized the integrated development platform Borland C++ Builder 6.0 for the development of routines in C++. Sociodemographic data were collected by the ESTATCART System for Retrieval of Georeferenced Information (IBGE, 2002), and considered different districts (sectors) of the municipality of Sorocaba (São Paulo State, Brazil).
According to Silva and Previdelli (2012), there is a need to evaluate latent variables, which are not directly measurable, such as socio-occupational condition, satisfaction, learning, and happiness. Because these variables cannot be measured directly, they are evaluated on a scale of values based on instruments such as tests or questionnaires. These instruments consist of items (specific questions, according to the model applied) that are associated with the variable of interest.
The C++ routine is able to process a collection of numerical data, and hence characterize each sector by establishment of a socio-environmental quality index (SQI) that considers a number of separate indices and their weightings. A simple interactive graphical interface makes the system accessible to non-specialist users.
Access to the database containing information relevant to the socio-environmental indicator is achieved by means of a spreadsheet displaying all the characteristics and conditions according to which a sector is analyzed for determination of its SQI. The spreadsheet can communicate with a variety of other software programs, since data is imported and exported in the form of "TXT" (text) files, and the identification of cells is achieved using simple tabulation. The program was developed to be able to load and process the data of up to ten thousand different sectors, with the actual number being easily altered in the program's source code. The routine was implemented to obtain the SQI by means of statistical treatment of variables that generated a Domestic Quality Index (DQI) and a Social Inclusion Index (SII). It was therefore necessary to first obtain these two indices, which required the implementation of sub-routines to perform the necessary modeling. The values and weightings of each of the variables comprising the DQI and SII were obtained according to the methodology proposed by Vedovato et al. (2011).
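The tab-delimited import described above can be sketched as follows. This is an illustration under stated assumptions rather than the routine's actual code: the names `kMaxSectors`, `splitTabs`, and `readTable` are hypothetical, one line of text is assumed to hold one sector, and a C++11 or later compiler is assumed.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Upper bound on the number of sectors, mirroring the limit of ten thousand
// mentioned in the text (hypothetical constant name).
const std::size_t kMaxSectors = 10000;

// Split one line of a tab-delimited text file into its cells.
// An empty cell at the very end of the line is dropped by std::getline.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> cells;
    std::string cell;
    std::istringstream in(line);
    while (std::getline(in, cell, '\t')) {
        cells.push_back(cell);
    }
    return cells;
}

// Read rows from any input stream (a file or a string), skipping blank
// lines and stopping once kMaxSectors rows have been stored.
std::vector<std::vector<std::string>> readTable(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (rows.size() < kMaxSectors && std::getline(in, line)) {
        if (!line.empty()) {
            rows.push_back(splitTabs(line));
        }
    }
    return rows;
}
```

Keeping the limit in a single named constant reflects the remark that the maximum number of sectors can be changed easily in the source code.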
The DQI was determined using the variables water supply (WAT), sewage treatment (SEW), and rubbish disposal (RUB). The following formula was used for the water supply variable:
$$\mathrm{WAT} = \mathrm{Good} \times 0.33 + \mathrm{Average} \times 0.66 + \mathrm{Poor} \times 1.00 \qquad (1)$$
Here, the conditions "good", "average", and "poor" consider the ratios obtained between the number of situations considered good, average, and poor, and the number of households in the sector. The condition considered "good" was the presence of water supplied from the public network, average conditions were those where the household possessed a well or spring, with the water piped to at least one room, and poor conditions were those that differed from the two preceding conditions. The SEW index for each sector was constructed as follows:
$$\mathrm{SEW} = \mathrm{Good} \times 0.33 + \mathrm{Average} \times 0.66 + \mathrm{Poor} \times 1.00 \qquad (2)$$
As in the case of WAT, the conditions were considered to be good, average, or poor. Good conditions were those where sewage facilities were provided by the public network, average conditions were those where domestic sewage was discharged into a septic tank, and poor conditions were those where neither of the two preceding criteria was satisfied (such as disposal into ditches or in the open).
As previously, the rubbish collection index used the formula:
$$\mathrm{RUB} = \mathrm{Good} \times 0.33 + \mathrm{Average} \times 0.66 + \mathrm{Poor} \times 1.00 \qquad (3)$$
In this case, good conditions were those where rubbish was removed by waste collection services, average conditions were where waste containers (skips) were provided by refuse services, and poor conditions were those that differed from the two preceding categories, or where no service was provided.
The weighting of each individual index in the DQI was established using the coefficient of variation of each index (WAT, SEW, and RUB). A relationship was established by which one or other index could exert a greater influence on the DQI, resulting from the variance in the measurement data (VEDOVATO et al., 2011). For a collection of sample data, the coefficient of variation (in percentage terms) is given by: According to Correa (2003), determination of the coefficient of variation first requires calculation of the standard deviation and mean of the set of data, as follows: Where, x represents each data point in the set of data, and n represents the total number of samples.
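The three statistics named above can be sketched in C++ as follows. Since the formulas themselves are not reproduced here, the code assumes the standard textbook definitions, and in particular the sample form of the standard deviation with an n - 1 divisor, which may differ from the form given by Correa (2003). The function names are hypothetical, and a C++11 or later compiler is assumed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean of the data set; returns 0 for an empty set.
double mean(const std::vector<double>& x) {
    if (x.empty()) {
        return 0.0;
    }
    double sum = 0.0;
    for (double v : x) {
        sum += v;
    }
    return sum / static_cast<double>(x.size());
}

// Standard deviation. ASSUMPTION: sample form with an (n - 1) divisor.
// Returns 0 when fewer than two values are available.
double sampleStdDev(const std::vector<double>& x) {
    if (x.size() < 2) {
        return 0.0;
    }
    const double m = mean(x);
    double sumSquares = 0.0;
    for (double v : x) {
        sumSquares += (v - m) * (v - m);
    }
    return std::sqrt(sumSquares / static_cast<double>(x.size() - 1));
}

// Coefficient of variation in percent: 100 * standard deviation / mean.
// Returns 0 when the mean is 0, to avoid dividing by zero.
double coefficientOfVariation(const std::vector<double>& x) {
    const double m = mean(x);
    if (m == 0.0) {
        return 0.0;
    }
    return 100.0 * sampleStdDev(x) / m;
}
```

For the values 1, 2, and 3, the mean is 2 and the sample standard deviation is 1, so the coefficient of variation is 50%.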
The weighting (Wt.) related to the water supply index could be expressed as follows: Similarly, the weighting related to the sewage treatment index was obtained by: Finally, the weighting for rubbish disposal was obtained by: After calculation of the values of the indices WAT, SEW, and RUB, and the corresponding weightings, the DQI was determined:

The condition "Sta" referred to the ratio between the number of situations considered to be stable, and the number of households in the sector. A stable condition existed where the property either belonged to the occupant responsible, or was in the process of being acquired by the occupant responsible. The condition "StaA" considered the ratio between the number of situations which were of average stability, and the number of households in the sector. This was the case of rented properties. Situations where the conditions of occupation showed low stability ("StaL") were those where the property was ceded by an employer or by any other agency. Finally, the unstable condition ("Uns") was determined as the ratio between situations considered unstable and the number of households in the sector. Unstable situations included collective or improvised housing, as well as any other situation not included in the previous categories.
The income indicator (INC) was classified qualitatively and quantitatively according to the average per capita income (MIPC), as shown in Table 1. Each MIPC value was allocated to the corresponding interval, and the numerical value associated with each interval was then used in the INC indicator for each sector.

The education indicator (EDU) was derived from the number of years of study undertaken by the head of each household. The predominant average education (PAE) in a sector was obtained from the weighted mean using the percentages of household heads in each education class, and the average points of the intervals, as follows:
PAE = [(less than 1 year of study) x 0.5/P] + [(1-4 years of study) x 2.5/P] + [(5-8 years of study) x 6.5/P] + [(9-13 years of study) x 11/P] + [(14 or more years of study) x 14/P]
where P represents the number of persons responsible for permanent private households.

The education indicator was classified qualitatively and quantitatively according to the PAE indicator, as shown in Table 2. As in the case of the income indicator, the PAE values were allocated to the corresponding intervals, and the numerical value associated with each interval was used in the EDU indicator for each sector.
Once the values of the domestic quality index (DQI) and the social inclusion index (SII) had been determined for each sector, the weightings of each index were derived statistically, and the contributions of each index to the final socio-environmental quality index (SQI) were calculated.
A subroutine was implemented, considering formulae 4, 5, and 6, in order to calculate the values of the coefficient of variation, standard deviation, and mean of the DQI and SII for each sector. The weightings for the contributions of the DQI and SII indices were obtained as follows: In this subroutine, the qualitative and quantitative classification criteria of the SQI were determined by weighting the DQI and SII values within predetermined intervals (Table 3).

Results and discussion
Figure 1 shows the form displaying the buttons for the import, export, and processing of the data, as well as the presentation layout of the initial data extracted from the IBGE system, required for calculation of the indices. The positioning of the buttons was strategically chosen to facilitate use of the software commands according to the needs of the user. The buttons are named in order to ensure easy understanding of their functions. Tables are used for data entry and retrieval, so that the data can be easily visualized during the operations of loading the database and obtaining final results. Communication between the developed routine and other popular software is facilitated by the use of a format common to both, namely the "TXT" (text) format. Data import and export employs files with the .txt extension.
Data presentation is achieved in two mutually linked ways. Firstly, the data to be investigated can be inserted either by completion of an entry data table, or by loading a dataset in "TXT" format (with standard tabulation). Secondly, the calculated values can be retrieved by consulting the database in the data entry table, with presentation of the values on a sector-by-sector basis achieved simply by inserting the number(s) corresponding to the sector(s) of interest.
Access to the processed DQI, SII, and SQI information for a given sector is provided in the form used to consult the database (Figure 2). Details of each index can be exhibited by clicking on the icons located in the upper left-hand corner of the form, as shown in Figure 2. These icons open new forms containing the detailed information. The source code developed to program the routines was written in C++, with relevant comments inserted between the command lines in order to facilitate access for users needing to modify or add software functions. The variables composing the source code were named so as to be readily identifiable according to their functions. For example, the variables describing maximum and minimum values are denoted "max" and "min", respectively.
Figure 3 illustrates a section of the source code, including comments to aid identification of the commands and the decision-making structure according to which the software was developed. The main routine and the subroutines were developed so that the processing was optimized and completed on a timescale of seconds. Achievement of a desired result does not require any intervention by the user in the processing of the data. With a simple command, the software is able to present all the results relevant to the indices of socio-environmental quality, and perform all the necessary statistical treatments.
The optimization of the source code was confirmed by the minimal amount of memory needed to maintain program execution, and by the fast processing speed for a large collection of data.
The socio-economic database obtained from the most recent IBGE demographic census (year 2000) for the municipality of Sorocaba was used to provide an example of the various functions of the programmed routines. The information concerning household occupation conditions, education, income, and other variables necessary for calculation of the socio-environmental index was inserted (in tabular format) in the data entry form (Figure 1). The data were processed, and the SQI results were generated for each sector.
For all sectors of the municipality of Sorocaba, the DQI, SII, and SQI values were generally indicative of a good level of socio-environmental development, with a few sectors showing average development. In the latter case, the poorer SQI indices were largely associated with low income, which was reflected in poor social inclusion indices. This could be clearly observed by the contribution of SII, shown in the form in which all the results and statistical data are presented.
A sector-by-sector analysis was performed to check communication between the entry table data and the presentation of the data following consultation using the number of a particular sector. The data presented, including the processed and calculated values, were exactly the same in both cases. Exact agreement was also obtained when the forms containing detailed information concerning the domestic quality and social inclusion indices were opened.
Finally, the data were imported and exported in standard tabulation text format, with spacing using simple tab stops ("TAB"), and it was confirmed that the files could be named and saved correctly. Figure 4 illustrates the graphical interface provided by the software for the analysis of different sectors, showing the statistical results for the indices DQI, SII, and SQI, obtained for the municipality of Sorocaba, São Paulo State. Visualization of the spatial distribution of the socio-environmental indicators generated by the software was achieved by transferring the spreadsheets containing the indicators to a digital mapping program, where the values were associated with polygons representing the different districts of Sorocaba (Figures 5-7).
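The tab-delimited export format mentioned above can be sketched with a short helper that writes one sector per line. The function name `formatRow` and the choice of columns are hypothetical, the numbers are written with the default stream formatting, and a C++11 or later compiler is assumed.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Build one line of a tab-delimited text file: the sector number followed
// by its index values, separated by single tab characters.
std::string formatRow(int sector, const std::vector<double>& values) {
    std::ostringstream out;
    out << sector;
    for (double v : values) {
        out << '\t' << v;
    }
    return out.str();
}
```

A row produced this way can be pasted into a spreadsheet or read by a mapping program, since both commonly accept tab-separated text.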
In most districts, especially in the central-southern sector of the city, the domestic quality index (DQI, Figure 5) was classified as excellent, indicating adequacy of the services providing water, sewage, and rubbish disposal. Several districts in the west and northwest sectors were classified as good, while in the north and northeast sectors there were a few districts where the DQI classification was average.
The classification of the districts of Sorocaba in terms of the social inclusion index (SII), which considers levels of income and education, together with household occupation conditions, is shown in Figure 6. Most districts showed average index values, with the highest concentration of districts with good and excellent ratings in the southern sector of the city, indicating that this sector was the most developed in social terms. There were several districts, mainly in the northern and eastern sectors, where the SII rating was poor.
The spatial distribution of the socio-environmental quality index is shown in Figure 7. Again, the districts rated as excellent were mainly located in the central-southern sector of the city, due to the influence of the factors related to household conditions and levels of social inclusion. In the more peripheral districts, socio-environmental conditions tended to be average, influenced mainly by the levels of income and education, as well as the household occupation conditions. Although some of these districts were classified as good or excellent according to the DQI, they were rated average or poor according to the SII, resulting in an overall SQI rating of average.
A file was produced containing important information concerning the use of the software. This "help" file, available on the toolbar, is designed to assist the user in understanding the functions available. The contents of the file provide guidance in following the correct sequence of steps in using the program, and show how to perform tasks including the import and export of files, generation of data, and database consultation. An example of a "help" file page is illustrated in Figure 8 (showing the procedure to follow for data import).

Conclusion
The graphical environment developed in the routine was efficient and concise in the treatment of data. The use of self-explanatory function and command buttons provides the user with assistance that is both intuitive and objective. The source code was produced in a form that ensures ease of access by the software user. Modifications can be made according to individual requirements, adding functions and algorithms, or simply studying the code in order to understand the way in which the model calculates the index values and weightings required for qualitative and quantitative generation of an overall socio-environmental quality index.
Finally, new routines could be developed in the future, for example enabling the generation of maps that use a coordinate system to describe the geographical locations of sectors of interest.