Sept. 2010
Java Quiz #42 : A string too far
This code may seem harmless, but it can cause serious problems in certain circumstances.
Can you spot the problem?
import java.io.Console;
import java.util.HashMap;
import java.util.Map;

public class Quiz42 {

    public static void main(String[] args) {
        new Quiz42();
    }

    private static final int MAX_STRING_LENGTH = 6;
    private Map<Integer, String> db = new HashMap<Integer, String>();
    private int nextId = 0;

    public Quiz42() {
        // Introductory text
        Console console = System.console();
        if (console == null) {
            System.out.println("Don't cheat in Eclipse :)");
            System.exit(0);
        }
        console.printf(
            "FakeDB 1.0%n" +
            "'exit': exits FakeDB.%n" +
            "'db' : dumps the DB content.%n" +
            "Words are limited to %d characters.%n%n> ", MAX_STRING_LENGTH);

        // Read - Eval loop
        String s = null;
        while ((s = console.readLine()) != null) {
            if ("exit".equals(s)) {
                break;
            }
            if ("db".equals(s)) {
                console.printf("DB content : %s%n", db);
                continue;
            }
            // Check DB constraints client-side, then save the user's string.
            if (s.length() > MAX_STRING_LENGTH) {
                console.printf("String too long, please enter another one.%n");
                continue;
            }
            persistInDatabase(s.toUpperCase());
            console.printf("Saved '%s'.%n> ", s);
        }
    }

    private void persistInDatabase(String s) {
        // Simulate a database constraint
        if (s.length() > MAX_STRING_LENGTH) {
            throw new IllegalArgumentException("DATA TOO LARGE");
        }
        db.put(nextId++, s);
    }
}
Answer:
It might seem surprising, but the toUpperCase() and toLowerCase() methods can change a string's length. Case mapping follows Unicode rules, some of which are specific to a language.
For example, in German, the lowercase ß becomes two uppercase S (this particular mapping applies whatever the locale), so the word "straße" (street), which is 6 characters long, becomes "STRASSE", which is 7 characters long and breaks our database constraint.
Note that the String.CASE_INSENSITIVE_ORDER comparator, used by the compareToIgnoreCase() method, does not convert whole strings. It walks through both strings character by character and applies two transformations to each pair of characters (Character.toUpperCase(), then Character.toLowerCase()) before deciding that they differ. Because it works on single chars, it never expands ß into SS:

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {

    public int compare(String s1, String s2) {
        int n1 = s1.length(), n2 = s2.length();
        for (int i1 = 0, i2 = 0; i1 < n1 && i2 < n2; i1++, i2++) {
            char c1 = s1.charAt(i1);
            char c2 = s2.charAt(i2);
            if (c1 != c2) {
                // First attempt: compare the upper-cased characters
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    // Second attempt: compare the lower-cased characters
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        return c1 - c2;
                    }
                }
            }
        }
        // Common prefix is equal ignoring case: the shorter string comes first
        return n1 - n2;
    }
}
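To see both effects in isolation, here is a minimal sketch. The class name CaseLengthDemo and the helper upperCaseLength are made up for this illustration, and passing Locale.ROOT is my own choice to keep the result independent of the machine's default locale (the quiz code calls toUpperCase() without a locale).

```java
import java.util.Locale;

// Hypothetical demo class, not part of the quiz code.
final class CaseLengthDemo {

    // Length of the upper-cased form of s.
    // Locale.ROOT is an assumption made here to avoid depending on the default locale.
    static int upperCaseLength(String s) {
        return s.toUpperCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        String street = "stra\u00dfe"; // "straße", written with an escape to avoid encoding issues

        System.out.println(street.length());          // 6
        System.out.println(upperCaseLength(street));  // 7, since ß became SS

        // The comparator works char by char, so ß is never expanded:
        System.out.println(street.compareToIgnoreCase("STRASSE") == 0); // false
        System.out.println(street.equalsIgnoreCase("STRASSE"));         // false
    }
}
```

So a string that passes a length check before the conversion can fail the same check afterwards, and "ignore case" comparisons do not treat "straße" and "STRASSE" as equal.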
Surprising, isn't it?
Comments
Client-side check: you are checking the length of the initial String.
Database constraint: you are checking the length of the upper-cased String, which could be longer => fatal exception thrown.
Also, you are not specifying which Locale to use for toUpperCase().
For those who wonder how a String can get longer when upper-cased: the German Eszett ß becomes SS.
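On the Locale remark, a small sketch of what passing an explicit locale changes. The class LocaleCaseDemo and its upper helper are hypothetical names; Locale.ROOT and Locale.forLanguageTag("tr") are only sample choices to contrast a neutral locale with Turkish.

```java
import java.util.Locale;

// Hypothetical demo class, not part of the quiz code.
final class LocaleCaseDemo {

    // Upper-cases s with an explicit locale instead of the JVM default.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // With a Turkish locale, 'i' maps to the dotted capital I (U+0130).
        System.out.println(upper("title", turkish).equals("T\u0130TLE"));   // true
        System.out.println(upper("title", Locale.ROOT).equals("TITLE"));    // true

        // The ß -> SS expansion does not go away with a neutral locale.
        System.out.println(upper("stra\u00dfe", Locale.ROOT));              // STRASSE
    }
}
```

Specifying the locale makes the result predictable across machines, but it does not remove the length problem: the length still has to be checked on the converted string.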
The toUpperCase() method can change the length of the string, so the two tests that check the string's length can give inconsistent results.
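One possible way to remove that inconsistency is to run the client-side check on the exact value that will be stored. A minimal sketch, assuming a hypothetical helper class SafeLengthCheck (neither it nor its methods exist in the quiz code) and Locale.ROOT as the conversion locale:

```java
import java.util.Locale;

// Hypothetical helper, not part of the quiz code.
final class SafeLengthCheck {

    // Converts the input the same way it will be converted before storage.
    // Locale.ROOT is an assumption; the quiz uses the default locale.
    static String normalize(String input) {
        return input.toUpperCase(Locale.ROOT);
    }

    // True if the converted value (not the raw input) fits within maxLength.
    static boolean fitsAfterNormalization(String input, int maxLength) {
        return normalize(input).length() <= maxLength;
    }

    public static void main(String[] args) {
        System.out.println(fitsAfterNormalization("street", 6));       // true
        System.out.println(fitsAfterNormalization("stra\u00dfe", 6));  // false, "STRASSE" has 7 characters
    }
}
```

In the quiz, this would mean converting the string once, checking the converted value against MAX_STRING_LENGTH, and passing that same value to persistInDatabase().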
I arrived a bit after the challenge started, but I agree with the previous answers.
By the way, I just looked at the toUpperCase() code, and it's funny to see labels in it. I had forgotten these constructs are still possible in Java.