Each entry in the database contains a language and a number of pages. I grouped all the entries by language and took the average number of pages for each language. But it also displays a major weakness: the languages don't all have the same number of entries. Some have thousands, others fewer than a hundred. I should have “normalized” the number of entries for each language and excluded languages that don't have enough entries.
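To make the fix concrete, here is a minimal sketch of how the averaging could exclude under-represented languages. It is not the code I actually ran: the function name `average_pages_by_language`, the `(language, pages)` tuple layout, and the `min_entries` threshold of 100 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def average_pages_by_language(entries, min_entries=100):
    """Average page count per language, skipping languages with too few entries.

    `entries` is assumed to be an iterable of (language, pages) tuples.
    `min_entries` is a hypothetical cutoff; pick one that suits the data.
    """
    pages_by_language = defaultdict(list)
    for language, pages in entries:
        pages_by_language[language].append(pages)

    # Keep only languages with enough entries for the average to be meaningful.
    return {
        language: sum(pages) / len(pages)
        for language, pages in pages_by_language.items()
        if len(pages) >= min_entries
    }
```

Calling it with a lower threshold, for example `average_pages_by_language(entries, min_entries=2)`, shows the effect on a small sample: any language with a single entry simply drops out of the result instead of skewing the comparison.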