Sorting
This web page was created to demonstrate a problem with the CSS nth-child pseudo-class that I discovered in another app.
Unfortunately, this implementation worked as anticipated.
I chose a random list of records in a table as my input data for the test.
But I first had to write a quick script to generate the list of numbers.
On a whim I created a list of 15,000 records.
The alternating background shading, created using the nth-child CSS pseudo-class, worked perfectly.
Clearly the problem is with my other app.
Well, so as not to waste the effort, I decided to write a quick routine to sort the list of numbers.
That's when the fun began.
The script timed out.
Wait a minute. What?
The script timed out.
Why, on earth, did that happen?
I reduced the number of records to 500 and ran the script again.
The script timed out.
I reduced it to 250 records and tried again.
The script timed out.
I rewrote the sort routine and added the input block in the upper left so I could change the number of records easily.
I started over with 15,000 records and ran the sort again.
The response was nearly immediate.
It successfully sorted the records in about 1 second.
I tried more records.
I added a zero and generated a data table with 150,000 records and ran the sort again.
The response was slow, but it completed the sort in a little over 8 seconds.
Further refining the script, I tried again with 150,000 records.
It sorted the list of 150,000 numbers in just less than 1.5 seconds.
I added another zero and tried to generate 1.5 million entries in the data table.
The Generate Data Table script timed out.
I guess that wasn't a surprise.
I tried 500,000 records.
Generate Data Table took about 2 seconds to generate the data table.
So I tried the sort again.
The script timed out.
I set the maximum number of records to 400,000 and tried again.
The sort completed in about 4 seconds.
Javascript in the browser had sorted an HTML table with 400,000 records in under 4 seconds!
That was something to behold.
Of course a compiled language like C++ could have done it in a few milliseconds.
But this was front-end Javascript, interpreted at run time, running on a user's personal computer.
I thought this would be a good example of different methods of sorting.
I created an interface which would test all three data sort methods and provide statistics for how long each took.
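The timing itself can be done with a few lines of Javascript. This is just a sketch of the idea; timeIt() and its arguments are illustrative names, not the page's actual code.

```javascript
// A hypothetical helper that times a function and reports milliseconds.
function timeIt(label, fn) {
    const start = performance.now();
    fn();
    const elapsed = performance.now() - start;
    console.log(label + ': ' + elapsed.toFixed(1) + ' ms');
    return elapsed;
}

// Example: time a numeric sort of 15,000 random values.
const data = Array.from({ length: 15000 }, () => Math.floor(Math.random() * 1000000));
timeIt('Array.prototype.sort', () => data.sort((a, b) => a - b));
```

performance.now() is available in every modern browser and gives sub-millisecond resolution, which is plenty for comparing sort methods.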
And then I went a step further.
How long would it take to remove the duplicates from the same table?
My first attempt was mightily slow.
But by my third attempt, like the sort, it was fast. In fact, it could find the unique numbers in a list in about a third of the time it took to sort the same list.
More on this below.
Up first are the 3 sort methods that I tried, from slowest to fastest.
I found the sort speed to be counterintuitive.
What I expected would be the fastest sort method was the slowest,
and the one I thought would be the slowest was the fastest.
This would be good to keep in mind when sorting large multidimensional tables using Javascript.
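The same idea carries over to multidimensional data held in plain arrays. A minimal sketch, assuming each row is an array of cells and the sample data is made up for illustration:

```javascript
// Sketch: sorting a two-dimensional array (rows of cells) by one column.
const table = [
    ['banana', 42],
    ['apple', 7],
    ['cherry', 19],
];

// Sort the rows in place by the numeric second column (index 1).
table.sort((rowA, rowB) => rowA[1] - rowB[1]);
// table is now apple(7), cherry(19), banana(42)
```

Because the comparator only looks at one column, the rest of each row travels along with it, which is exactly what you want when sorting table data.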
-
The first method requires the developer to create their own sort function.
document.getElementById('the-table').rows returns a live collection (technically an HTMLCollection) instead of a stdArray (Javascript's standard array).
Array.prototype.sort() exists in standard Javascript, but
nodelist.prototype.sort() does not.
var theTable = document.getElementById('the-table');
var rows = theTable.rows;
var done;
/*
 * This is likely the reason this sort method is so slow.
 * For a data table with 200 rows, the following do while loop
 * will process 200 times for each value out of place in the final
 * sorted order.
 * The built-in Array.prototype.sort() function is probably written
 * in a compiled language, making it much faster.
 */
do {
    done = true;
    /*
     * rowIndex == 0 is the table header 'Numbers'
     * a = rowIndex - 1
     * b = rowIndex
     * so start with rowIndex = 2 such that rowIndex - 1 = rows[1].
     */
    for (var rowIndex = 2; rowIndex < rows.length; rowIndex++) {
        let a = parseInt(rows[rowIndex - 1].cells[0].innerText, 10);
        let b = parseInt(rows[rowIndex].cells[0].innerText, 10);
        if (a === b) {
            continue;
        } else if (a > b) {
            let parent = rows[rowIndex].parentNode;
            let greaterRow = rows[rowIndex - 1];
            let lesserRow = rows[rowIndex];
            parent.insertBefore(lesserRow, greaterRow);
            done = false;
        } else {
            /*
             * if ( a < b ) the rows are already in order,
             * so there is nothing to do.
             */
            continue;
        }
    }
} while ( ! done );
-
The second method works essentially the same way, but instead of trying to sort the live nodelist directly, the collection returned by theTable.rows is first converted to a standard Javascript array (stdArray) before attempting to sort it. The final step reorders the existing table rows according to the sorted order.
I use the function nodelist_to_stdarray(). For its definition, see the file /table-test/js/class_common.js.
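I won't reproduce class_common.js here, but a minimal version of such a helper might look like the following sketch; the real definition may differ.

```javascript
// Hypothetical sketch of a nodelist_to_stdarray() helper; the real
// definition lives in /table-test/js/class_common.js.
function nodelist_to_stdarray(nodelist) {
    // Array.from() copies any array-like collection (including a live
    // NodeList or HTMLCollection) into a standard Javascript array.
    return Array.from(nodelist);
}
```

In older browsers without Array.from(), Array.prototype.slice.call(nodelist) does the same job.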
var theTable = document.getElementById('the-table');
var rows = nodelist_to_stdarray(theTable.rows);
let header_row = rows.splice(0, 1)[0];
rows.sort((a, b) => {
    a = parseInt(a.cells[0].innerText, 10);
    b = parseInt(b.cells[0].innerText, 10);
    return ( a === b ? 0 : ( a > b ? 1 : -1 ) );
});
rows.unshift(header_row);
rows.forEach( (row) => {
    theTable.appendChild(row);
});
-
The last method is truly fast when compared to the others.
It can easily sort 400,000 records without timing out.
The biggest problem that I ran into when playing with 400,000 records was that Javascript ran out of memory.
Javascript supposedly does its own memory management, but after crashing the browser hard enough that I had to restart my computer, I learned that it doesn't always do a great job of it.
To help prevent you from having to deal with that, I limited this script to 250,000 rows in the data table.
This method extracts the data from the table, sorts it, deletes the original table, and finally recreates it.
The first and second methods preserve the existing table and just sort the rows.
Common sense would seem to dictate that it would be faster not to have to recreate hundreds or thousands of records,
but that turned out to be wrong. It's actually faster this way.
var theTable = document.getElementById('the-table');
var rows = nodelist_to_stdarray(theTable.rows);
var new_rows = [];
var parent = theTable.parentNode;
var headerRow = rows.splice(0, 1)[0];
rows.forEach( (row) => {
    new_rows.push(parseInt(row.cells[0].innerText, 10));
});
theTable.remove();
new_rows.sort(function (a, b) {
    return ( a === b ? 0 : ( a > b ? 1 : -1 ) );
});
theTable = document.createElement('table');
theTable.id = 'the-table';
theTable.appendChild(headerRow);
new_rows.forEach( (rowValue) => {
    let tr = document.createElement('tr');
    let td = document.createElement('td');
    tr.appendChild(td);
    td.innerText = rowValue;
    theTable.appendChild(tr);
});
parent.appendChild(theTable);
This method is very fast.
Eliminating Duplicates
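One common way to find the unique values in a list of numbers is a Set. This is a sketch of the general idea, not necessarily the method this page uses.

```javascript
// Sketch: removing duplicates from an array of numbers with a Set.
// A Set keeps only distinct values, so spreading it back into an
// array yields the unique numbers in first-seen order.
const numbers = [5, 3, 5, 9, 3, 1];
const unique = [...new Set(numbers)];
// unique is [5, 3, 9, 1]
```

Because Set membership checks are effectively constant time, this makes a single pass over the data, which is consistent with deduplication finishing faster than a sort.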