|
||||||||||||||||||||||||||||
|   | Internals - Progress/OpenEdge Index Rebuild
Behind
the Scenes - Index Rebuild
[Why Rebuild Indexes?] [Index Process] [Considerations/Not Using Sort] [Considerations/Using Sort] [Summary] [Coruption Reason] Most developers that have used Progress for any length of time will have had to perform an Index Rebuild at some point, but have you ever wondered what actually happens behind the scenes of the idxbuild command? Why should we rebuild indexes? There are a number of reasons why we would want to perform this activity, some are more obvious than others. Here is a couple of them. Probably the most common reason is We were told to do so by Technical Support. This is not Technical Support being awkward. If the database becomes corrupt then many issues can surface as a knock on effect. Some issues will be able to be fixed using less disruptive tools. From version 8 of Progress there are utilities within the database to identify and repair corruption within a specific index. These utilities will usually be a lot faster than performing a full index rebuild and can often resolve the problem [ See A Reason for Corruption below]. However if you are using a version of Progress before version 8 when the idxfix utility was introduced then you will have no choice but to run an index rebuild to resolve corruption within indexes. Alternatively (and it would depend on the problem) the system could be upgraded to a later more suitable version of Progress that contains the tools that are required. If many of the indexes have become corrupted then it may be easier and quicker to perform a full index rebuild. Another reason for using the index rebuild facility is if you perform a dump and reload of the database. If the data is reloaded into the database with the indexes disabled (e.g. When using the bulkloader) then the indexes will need to be built before the database can be used. The Index Process The re-index process is split into a number of phases:
In phase one, Progress scans through the whole of the database and marks all the index blocks that it can find as empty blocks. If only a selection of indices are being rebuilt then only the index blocks relating to those indices will be removed. On a large database this can take quite some time to complete. Phase 2 In phase two, Progress scans through the whole of the database from front to back and for every block it encounters that contains a record that is being indexed it will do one of two things: If the question Do you have enough disk space for sorting ? is answered with a positive response then Progress will store the information it needs for the index in a temporary file on the disk which it will use during Phase 3. If the response is negative then Progress adds the index information for the current record to each of the indexes that are assigned to the file creating the required indices. Phase 3 Phase 3 is only run if the sorting facility was selected. Each of the indices is split into one of eight groups. Each group is then processed, sorting the indices using the temporary files that were created in phase 2. Once the files have been sorted the data is added to the database in bulk. Considerations When NOT using the Sort Option When Progress is performing Phase 2 all the blocks in the database will be read in ascending order. Each block that is processed will contain records from one or more files within the database. The more records and indices the longer the block will take to process. During phase two Progress is creating the indices and adding the data to the database. So as well as reading from the database the system is having to put the updated index blocks back into the database as well. This all takes time. If the database that is being processed has just been loaded using the bulkloader then the data has been loaded a table at a time. The entries in each table will all have the same indices on them therefore the time taken to update the database will be reasonably consistent. A change will be noticed when each of the subsequent files are processed as each file will contain different indices. Calculating how much time is remaining for phase 2 is therefore very difficult. The main factors (excluding hardware) that will effect how long the process will take are:
Typically the rebuild will be taking place on a Multi-Volume database. So there will be empty blocks at the backend of the database ready for the index blocks to be stored into them. This makes this part of the process more efficient, as Progress is not having to request more disk space from the operating system and then format the blocks before it can use them. Whilst processing Phase 2, Progress reports how far it has got e.g. Processing Block 12345 out of 25038. The first block reference indicates the current block that is being processed and the second block reference is referring to the end of the database. Once Progress reaches the index blocks that it has already added to the database, the blocks will be processed exceedingly quickly, as there is no processing to be performed with the blocks. If the second block reference starts to increase, this means that the available empty blocks within the database have run out and Progress has had to expand the database to add the index records. This is also the case if the database was created as a single volume database, as the database will be expanding to hold all the new index blocks. Considerations When using the Sort Option As mentioned above, the sort option should always be used. By default Progress always stores the temporary files in the current directory; the exception being when the temporary directory has been changed using the -T facility. If there is not enough disk space available to perform the sort operation in the current directory, then once Progress has filled the space that is available, Progress will stop informing that it ran out of disk space. If this occurs then you must re-start the database after sufficient disk space has been made available. With the introduction of Version 8.1 of Progress a new feature was added to assist in this process. From Version 8.1 we can create a srt file which will indicate to Progress the location of multiple areas on the disks where free space exists. To create a srt file we simple create a file prefixed with the database name with the srt suffix and locate it in the same directory as the main database files (e.g. dbname.db). An example srt file would look something like:
If we indicate that more space is available than is physically available on the disk then Progress will still cause an error SYSTEM ERROR: Out of disk space for sort (5285). So it is important that we calculate the free disk space carefully taking into account how much disk space we need for the sort operation and the disk requirements of other users and processes that are using the same disks. If all available space was allocated for the sort and Progress still ran out of disk space then only two options are available to us: 1. Do not perform sorting (large increase in time) Summary So, for optimum performance use the sorting facility and locate as much of the temporary disk space on different disks to your database as is possible. There are also a few options that you can set with the index rebuild -TM, -TB, -B these are worth experimenting with, as they can provide better performance during the sorting phases. A Reason for Corruption It is possible for a single block of the database to become corrupted, this is nearly always due to hardware failure. Because this block is no longer a valid block the Progress database engine will complain and the problem will have to be resolved. The immediate course of action would be to re-format this corrupted block so that Progress sees the block as an empty block and can re-use the block at a later date. Unfortunately the contents of the block will have gone forever. Unless we restored the database (and the block within the restored database could be corrupted as well) we will not be able to determine what type of information Progress was storing within the block and therefore unable to say what information was lost. At some point in the future if no further action was taken a different error may occur similar to Record already exists with unique key. This would normally be due to the software that is running not creating records correctly. However in this instance the problem may be due to the lost block of information. This block may have contained the data records for a file which had a Unique index. The result of losing the block will be that the records were lost but the index elements were left behind. The next time the program tries to create the next record on the file, it is possible that a record will be created with the same unique reference as the one that already exists in the index.
When the next record is added with the cust-num field set to 3 Progress will complain that an index entry already exists. This will cause the Progress session to stop. This particular problem can be resolved by running the idxfix utility instead of performing a full index rebuild which would take much more time to run. The idxfix utility will remove the index entries relating to the records that no longer exist.
© Copyright Proxcom Limited 1999 - All Rights Reserved [Top] [Why Rebuild Indexes?] [Index Process] [Considerations/Not Using Sort] [Considerations/Using Sort] [Summary] [Coruption Reason] |
|||||||||||||||||||||||||||
|
27 Mill Field Road Cottingley West Yorkshire England BD16 1PY |