|
Home | Switchboard | Unix Administration | Red Hat | TCP/IP Networks | Neoliberalism | Toxic Managers |
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
Sorting Algorithms | Recommended Books | Recommended Links | Donald Knuth | |
Animations | Test suits | History | Humor | Etc |
|
Radix sort is classified by Knuth as "sorting by distribution" (See Vol 3, 5.2.5. Sorting by distribution. P.168). It is the most efficient sorting method for alphanumeric keys on modern computers provided that the keys are not too long. Floating number sorting is also possible, but you need to do some tricks here. Radix sort is stable, very fast and generally is an excellent algorithm on modern computers that usually have large memory.
|
The idea of radix soft is very similar to the idea of hashing algorithms. You compute the final position of the record for each key. If there is already a record(s) with this keys you place the record after them (overflow area). You do not compare the key with other keys at all. It is similar to the method by which people lookup up a person's number in a telephone book: first the first letter is checked, then the second, etc.
Knuth illustrated it on the example of sorting of a card deck:
These methods were used to sort punched cards for many years, long before electronic computers existed. The same approach can be adapted computer programming, and it is generally known as "bucket sorting", "radix sorting," or "digital sorting," because it is based on the digits on the keys,
Suppose we want to sort a 52-card deck of playing cards. We may define
A < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < J < Q < K
as an ordering of the face values, and for the suits we may define:
c(lubs) < d(iamonds) < h(earts) < s(pades).
One card is to precede another if either (i) its suit is less than the other suit, or (ii) its suit equals the other suit but its face value is less. (This is a particular case of lexographic ordering between ordered pairs of objects; see exersize 5-2). Thus
cA < c2 < ... < cK < dA <d2... <dK <hA ... <hK <sA...<sK
We could sort the cards by any of the methods already discussed. Card players often use a technique somewhat analogous to the idea behind radix exchange. First they divide the cards into four piles, according to suit, then they fiddle with each individual pile until everything is in order.
But there is a faster way to do the trick!
- First deal the cards face up into 13 piles, one for each face value.
- Then collect these piles by putting the aceson the bottom, the 2s face up on top of them, then the 3s, etc., finally putting the kings (face up) on the top.
- Turn the deck face down and deal again this time into four piles for the four suits. (Again you turn the cards face up as you deal dial.
- Putting the resulting piles together, with clubs on the bottom, then diamonds, hearts, and spades, you'' get the deck in perfect order.
In radix-sorting algorithms, all keys should be the same size or padded to the same size. The algorithm treats the keys as numbers represented as numbers of the base R (for example base 8 means that bytes are used as key constituents, base 16 means that halfwords are used, etc). It is most efficient for short keys for example classic sorting case of sorting 4 byte integers is perfect for radix sort. h
In order to efficiently program radix sort, the operation "extract the ith digit from a key" should be available. The designers of all modern languages, including C, C++, and Java recognized that direct manipulation of bits is often useful and provided corresponding operations.
Availability of byte and half-word operation in hardware can speed radix sort, but is not absolutely necessary.
There are two approaches to radix sorting.
MSD most-significant-digit radixsort methods examine the digits in the keys in a left-to-right order, working with the most significant digits first. MSD radix sorts partition the file according to the leading digits of the keys, then recursively apply the same method to the subfiles. You can think of quicksort as a method somewhat similar to radix sort with base 2.
LSS -- least-significant-digit ) radix sorts. The second class of radix-sorting methods examine the digits in the keys in a right-to-left order, working with the least significant digits first.
Postscript: Radix Sort Rocks (not examinable)
Linear-time sorts tend to get less attention than comparison-based sorts in most computer science classes and textbooks. Perhaps this is because the theory behind linear-time sorts isn't as interesting as for other algorithms. Nevertheless, the library sort routines for machines like Crays use radix sort, because it kicks major ass in the speed department.
Radix sort can be used not only with integers, but with almost any data that can be compared bitwise, like strings. The IEEE standard for floating-point numbers is designed to work with radix sort combined with a simple prepass and postpass (to flip the bits, except the sign bit, of each negative number).
Strings of different lengths can be sorted in time proportional to the total length of the strings. A first stage sorts the strings by their lengths. A second stage sorts the strings character by character (or several characters at a time), starting with the last character of the longest string and working backward to the first character of every string. We don't sort every string during every pass of the second stage; instead, a string is included in a pass only if it has a character in the appropriate place.
For instance, suppose we're sorting the strings CC, BA, CCAAA, BAACA, and BAABA. After we sort them by length, the next three passes sort only the last three strings by their last three characters, yielding CCAAA BAABA BAACA. The fifth pass is on the second character of each string, so we prepend the two-character strings to our list, yielding CC BA CCAAA BAABA BAACA. After sorting on the second and first characters, we end with
BA BAABA BAACA CC CCAAA.
Observe that BA precedes BAABA and CC precedes CCAAA because of the stability of the sort. That's why we put the two-character strings before the five- character strings when we began the fifth pass.
|
Switchboard | ||||
Latest | |||||
Past week | |||||
Past month |
CS 61B: Lecture 35 Monday, April 21, 2014 Today's reading: Goodrich & Tamassia, Sections 11.3.2. Counting Sort ------------- If the items we sort are naked keys, with no associated values, bucket sort can be simplified to become _counting_sort_. In counting sort, we use no queues at all; we need merely keep a count of how many copies of each key we have encountered. Suppose we sort 6 7 3 0 3 1 5 0 3 7: 0 1 2 3 4 5 6 7 ----------------------------------------------------------------- counts | 2 | 1 | 0 | 3 | 0 | 1 | 1 | 2 | ----------------------------------------------------------------- When we are finished counting, it is straightforward to reconstruct the sorted keys from the counts: 0 0 1 3 3 3 5 6 7 7. Counting Sort with Complete Items --------------------------------- Now let's go back to the case where we have complete items (key plus associated value). We can use a more elaborate version of counting sort. The trick is to use the counts to find the right index to move each item to. Let x be an input array of objects with keys (and perhaps other information). 0 1 2 3 4 5 6 7 8 9 ----------------------------------------------------------------------- x | . | . | . | . | . | . | . | . | . | . | ----|------|------|------|------|------|------|------|------|------|--- v v v v v v v v v v ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- | 6 | | 7 | | 3 | | 0 | | 3 | | 1 | | 5 | | 0 | | 3 | | 7 | ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- Begin by counting the keys in x. for (i = 0; i < x.length; i++) { counts[x[i].key]++; } Next, do a _scan_ of the "counts" array so that counts[i] contains the number of keys _less_than_ i. 0 1 2 3 4 5 6 7 ----------------------------------------------------------------- counts | 0 | 2 | 3 | 3 | 6 | 6 | 7 | 8 | ----------------------------------------------------------------- total = 0; for (j = 0; j < counts.length; j++) { c = counts[j]; counts[j] = total; total = total + c; } Let y be the output array, where we will put the sorted objects. counts[i] tells us the first index of y where we should put items with key i. Walk through the array x and copy each item to its final position in y. When you copy an item with key k, you must increment counts[k] to make sure that the next item with key k goes into the next slot. for (i = 0; i < x.length; i++) { y[counts[x[i].key]] = x[i]; counts[x[i].key]++; } --------------------- --------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 0 | 2 | 3 | 3 | 6 | 6 | 8 | 8 | ---------------|----- --------------------------------- v 6 --------------------- --------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 0 | 2 | 3 | 3 | 6 | 6 | 8 | 9 | ---------------|-|--- --------------------------------- v v 6 7 --------------------- --------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 0 | 2 | 3 | 4 | 6 | 6 | 8 | 9 | -------|-------|-|--- --------------------------------- v v v 3 6 7 --------------------- --------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 1 | 2 | 3 | 4 | 6 | 6 | 8 | 9 | -|-----|-------|-|--- --------------------------------- v v v v 0 3 6 7 --------------------- --------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 1 | 2 | 3 | 5 | 6 | 6 | 8 | 9 | -|-----|-|-----|-|--- --------------------------------- v v v v v 0 3 3 6 7 --------------------- --------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 1 | 3 | 3 | 5 | 6 | 6 | 8 | 9 | -|---|-|-|-----|-|--- --------------------------------- v v v v v v 0 1 3 3 6 7 ... --------------------- ---------------------------------- y |.|.|.|.|.|.|.|.|.|.| counts | 2 | 3 | 3 | 6 | 6 | 7 | 8 | 10 | -|-|-|-|-|-|-|-|-|-|- ---------------------------------- v v v v v v v v v v 0 0 1 3 3 3 5 6 7 7 Bucket sort and counting sort both take O(q + n) time. If q is in O(n), then they take O(n) time. If you're sorting an array, counting sort is slightly faster and takes less memory than bucket sort, though it's a little harder to understand. If you're sorting a linked list, bucket sort is more natural, because you've already got listnodes ready to put into the buckets. However, if q is not in O(n)--there are many more _possible_values_ for keys than keys--we need a more aggressive method to get linear-time performance. What do we do if q >> n? Radix Sort ---------- Suppose we want to sort 1,000 items in the range from 0 to 99,999,999. If we use bucket sort, we'll spend so much time initializing and concatenating empty queues we'll wish we'd used selection sort instead. Instead of providing 100 million buckets, let's provide q = 10 buckets and sort on the first digit only. (A number less than 10 million is said to have a first digit of zero.) We use bucket sort or counting sort, treating each item as if its key is the first digit of its true key. 0 1 2 3 4 5 6 7 8 9 ----------------------------------------------------------------------- | . | . | * | . | * | . | . | . | * | . | ----|------|-------------|-------------|------|------|-------------|--- v v v v v v v ------ ------ ------ ------ ------ ------ ------ | 342| |1390| |3950| |5384| |6395| |7394| |9362| |9583| |5849| |8883| |2356| |1200| |2039| |9193| ---|-- ------ ---|-- ------ ------ ---|-- ---|-- v v v v ------ ------ ------ ------ | 59| |3693| |7104| |9993| |2178| |7834| |2114| |3949| ------ ------ ------ ------ Once we've dealt all 1,000 items into ten queues, we could sort each queue recursively on the second digit; then sort the resulting queues on the third digit, and so on. Unfortunately, this tends to break the set of input items into smaller and smaller subsets, each of which will be sorted relatively inefficiently. Instead, we use a clever but counterintuitive idea: we'll keep all the numbers together in one big pile throughout the sort; but we'll sort on the _last_ digit first, then the next-to-last, and so on up to the most significant digit. The reason this idea works is because bucket sort and counting sort are stable. Hence, once we've sorted on the last digit, the numbers 55,555,552 and 55,555,558 will remain ever after in sorted order, because their other digits will be sorted stably. Consider an example with three-digit numbers: Sort on 1s: 771 721 822 955 405 5 925 825 777 28 829 Sort on 10s: 405 5 721 822 925 825 28 829 955 771 777 Sort on 100s: 5 28 405 721 771 777 822 825 829 925 955 After we sort on the middle digit, observe that the numbers are sorted by their last two digits. After we sort on the most significant digit, the numbers are completely sorted. Returning to our eight-digit example, we can do better than sorting on one decimal digit at a time. With 1,000 keys, sorting would likely be faster if we sort on two digits at a time (using a base, or _radix_, of q = 100) or even three (using a radix of q = 1,000). Furthermore, there's no need to use decimal digits at all; on computers, it's more natural to choose a power-of-two radix like q = 256. Base-256 digits are easier to extract from a key, because we can quickly pull out the eight bits that we need by using bit operators (which you'll study in detail in CS 61C). Note that q is both the number of buckets we're using to sort, and the radix of the digit we use as a sort key during one pass of bucket or counting sort. "Radix" is a synonym for the base of a number, hence the name "radix sort." How many passes must we perform? Each pass inspects log2 q bits of each key. If all the keys can be represented in b bits, the number of passes is ceiling(b / log2 q). So the running time of radix sort is in b O( (n + q) ceiling( ------ ) ). log q 2 How should we choose the number of queues q? Let's choose q to be in O(n), so each pass of bucket sort or counting sort takes O(n) time. However, we want q to be large enough to keep the number of passes small. Therefore, let's choose q to be approximately n. With this choice, the number of passes is in O(1 + b / log2 n), and radix sort takes b O(n + ----- n) time. log n For many kinds of keys we might sort (like ints), b is technically a constant, and radix sort takes linear time. Even if the key length b tends to grow logarithmically with n (a reasonable model in many applications), radix sort runs in time linear in the total number of bits in all the keys together. A practical, efficient choice is to make q equal to n rounded down to the next power of two. If we want to keep memory use low, however, we can make q equal to the square root of n, rounded to the nearest power of two. With this choice, the number of buckets is far smaller, but we only double the number of passes. Postscript: Radix Sort Rocks (not examinable) ----------------------------- Linear-time sorts tend to get less attention than comparison-based sorts in most computer science classes and textbooks. Perhaps this is because the theory behind linear-time sorts isn't as interesting as for other algorithms. Nevertheless, the library sort routines for machines like Crays use radix sort, because it kicks major ass in the speed department. Radix sort can be used not only with integers, but with almost any data that can be compared bitwise, like strings. The IEEE standard for floating-point numbers is designed to work with radix sort combined with a simple prepass and postpass (to flip the bits, except the sign bit, of each negative number). Strings of different lengths can be sorted in time proportional to the total length of the strings. A first stage sorts the strings by their lengths. A second stage sorts the strings character by character (or several characters at a time), starting with the last character of the longest string and working backward to the first character of every string. We don't sort every string during every pass of the second stage; instead, a string is included in a pass only if it has a character in the appropriate place. For instance, suppose we're sorting the strings CC, BA, CCAAA, BAACA, and BAABA. After we sort them by length, the next three passes sort only the last three strings by their last three characters, yielding CCAAA BAABA BAACA. The fifth pass is on the second character of each string, so we prepend the two-character strings to our list, yielding CC BA CCAAA BAABA BAACA. After sorting on the second and first characters, we end with BA BAABA BAACA CC CCAAA. Observe that BA precedes BAABA and CC precedes CCAAA because of the stability of the sort. That's why we put the two-character strings before the five- character strings when we began the fifth pass.
|
Switchboard | ||||
Latest | |||||
Past week | |||||
Past month |
Google matched content |
Radix sort - Wikipedia, the free encyclopedia
11.2 Counting Sort and Radix Sort
***** Radix Sort Revisited
***** An Efficient Radix Sort
***** A New Efficient Radix Sort - Andersson, Nilsson (ResearchIndex)
**** Radix Sort
**** Parallel Radix Sort
*** Radix sorting (http://www.math.grin.edu/~stone/courses/fundamentals/radix-sorting.html)
**** Radix Sort Tutorial
*** Data Structures and Algorithms Radix Sort
*** NIST: Radix sort[Robert McIvor is a retired research chemist. He has been programming the Macintosh since 1984, and has developed programs for teaching and parsing Loglan®. Loglan is a constructed speakable language, capable of expressing the full range of human experience, yet whose sound stream and grammar are completely unambiguous The written language and potentially the spoken also are machine parsible.1]Introduction
Prograph™ has been briefly described in MacTutor, March 1990. Now that a compiler to produce stand-alone applications is available, I decided to compare the performance with a C implementation. As a first experience, I attempted to convert the radix sort algorithm I described elsewhere2 to Prograph.
The sort time of the C version of the radix algorithm is linear in n, the number of records, in contrast to most currently used sorts, which are proportional to n log n. Unlike other algorithms, the time is also directly proportional to the length of the sort key; i.e. it takes as long to process 1000 10-byte keys, as to process 10000 1-byte keys. The previously published version sorted 20,000 bytes/sec. on an SE-30. An improved version sorts 88,235 bytes/sec. The C code for unbelievers is provided on this month's source disk, along with Prograph implementations.
Adapting Radix Sort to the Memory Hierarchy - Rahman, Raman (ResearchIndex)
Presentations
Implementations
Society
Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy
Quotes
War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes
Bulletin:
Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law
History:
Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history
Classic books:
The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite
Most popular humor pages:
Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor
The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...
|
You can use PayPal to to buy a cup of coffee for authors of this site |
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: March 12, 2019