This reminds me of another HTML input type that's really poorly implemented across browsers. Namely, date and time input fields.
In that case, behaviour varies so much across browsers that the fields are almost unusable. Some (like Firefox) don't seem to support the calendar aspect, some have a terrible calendar UI, some fall back to the device calendar UI, and some let you select invalid dates despite greying them out. And that's before you get to the actual data, how it's formatted, or the lack of any events associated with the fields.
Honestly, it seems like today's browsers just can't seem to handle any of the newer input types in any reasonable way.
It's not only date/datetime/time fields that are broken. type=number also does not work consistently across different locales and different browsers. Some don't let you type a decimal comma.
Hi, I'm one of the engineers working on date/time fields [0] (you can see the design document for them here[1]) for Firefox.
We're on the way to support them and hopefully support them well.
Now, I also happen to be working on a refactor of our Intl code, so I should be able to help with the number field being inconsistent.
If you can file a bug and prepare a minimized testcase, I'd be happy to fix it! :)
Hey @zbraniecki, thanks for the work you're doing. I use type=number cautiously for self-built tools for my own use, but so far have avoided it in production code. Thanks to your effort maybe someday I'll be able to use this 'for real'!
Also with type=number fields: some support showing a leading "0" while others don't. Seems like a small detail, but it all adds up to death by a thousand cuts and custom implementations of everything.
I think this is because it's so difficult to support all the different options, like languages and different calendars ...but you're right: getting English + Gregorian consistently right would be a start!
I’ve got a version of Chrome here that, if you query with .value from JS the value of a datetime-local field, returns you a String in "MM/DD YYYY" format. Despite the user’s locale being de_DE.
I’ve also discovered one browser that only supports ISO8601, and another that only supports it if you remove the time section. Another supports it only if you have a time section but remove the time zone part at the end.
Surely you're not suggesting that the representation of the date that is returned by the browser should be locale-dependent? That would be a nightmare.
Well, it’d be less of a nightmare than returning it in a random locale!
At least when it’s locale-dependent I can use a look up table and parse it somehow, but getting it back in MM/DD YYYY?
I’d love to just get ISO8601 ideally, btw, but who knows if that will ever happen.
(Also, you should check out Microsoft Excel, its date type and CSV converter are both locale-dependent. Locale of the program opening the file, not the file itself. A file created in de_DE will be unusable in en_US)
So Date type, then, but if we're talking input fields, they're also meant to be used with forms submitted without all that JavaScript cruft on top. So if you have to convert the date input to text for sending (or direct retrieval in code), I'd say just stick to ISO. Why? Because it's a standard.
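For the "just stick to ISO" case, server-side handling can be quite small. A minimal sketch in Python, assuming the browser really does submit `YYYY-MM-DD` for type=date and `YYYY-MM-DDTHH:MM` for type=datetime-local (which, as far as I understand the HTML spec, is what it asks for); the function names are made up for illustration:

```python
from datetime import date, datetime


def parse_date_field(raw: str) -> date:
    """Parse a submitted type=date value such as "2016-09-14"."""
    # Raises ValueError for anything that is not an ISO 8601 calendar date.
    return date.fromisoformat(raw.strip())


def parse_datetime_local_field(raw: str) -> datetime:
    """Parse a submitted type=datetime-local value such as "2016-09-14T10:30"."""
    # The result is a naive datetime: the field carries no time zone.
    return datetime.fromisoformat(raw.strip())
```

Anything that arrives in another shape (like the "MM/DD YYYY" string mentioned above) fails with a `ValueError` instead of being guessed at.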
That could be a reasonable default, but you need to give the programmer control over the format, or else everyone will keep using JS-powered replacements.
User agents may transform the value for display
and editing; in particular, user agents should
convert punycode in the value to IDN in the
display and vice versa.
i.e. you're supposed to be able to type human-readable addresses, but they're converted to punycode for submission, because that's what the server will need in order to use the address. The regexp is used to validate the punycode, not what the user types.
"they're converted to punycode for submission, because that's what the server will need in order to use the address"
I disagree with that. Neither the client nor the server needs to know anything about punycode. The only time punycode is required is at the very last moment when it comes time to actually send an email to that address. Whether it's the email sending library or the mail server itself, it doesn't matter.
A user should be able to type "person@ü.example.com" into a form. I should receive "person@ü.example.com" on the server side. I should save "person@ü.example.com" to the database. I should be able to send "person@ü.example.com" back to the browser for display, and I should be able to pass "person@ü.example.com" as the To address to my mail sending library.
Punycode is an implementation detail that I shouldn't need to think about.
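To illustrate the "last moment" conversion: a rough sketch in Python of what a mail-sending layer might do just before handing the address to SMTP. `to_ascii_address` is a hypothetical helper, not part of any mail library, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so treat this as an illustration only:

```python
def to_ascii_address(address: str) -> str:
    """Return the address with its domain part IDNA-encoded (punycode)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # The stdlib "idna" codec encodes each label of the domain separately.
    ascii_domain = domain.encode("idna").decode("ascii")
    # The local part is left untouched; non-ASCII local parts are a
    # separate problem that this sketch does not address.
    return f"{local}@{ascii_domain}"


print(to_ascii_address("person@ü.example.com"))
# person@xn--tda.example.com
```

Everything upstream of this call (form, database, templates) can keep working with the Unicode form.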
I'd also agree that my code should be able to handle unicode, for a different reason: I never trust validation by a browser (old clients, bad clients, malicious clients)
It's much worse than that. This doesn't even cover the rules for domain names, which are mapped to a subset of Unicode. Read the horrors of "Unicode IDNA Compatibility Processing".[1] This is an incredibly complicated scheme to deal with homoglyphs - characters which look the same, but are not. There's filtering for specific characters that look the same. There's detection of mixed left-to-right and right-to-left characters. This is all to prevent attacks where the domain name in a link looks like some trusted site, but isn't.
The "." in a domain name is no longer always a period. It can be any of
U+002E ( . ) FULL STOP
U+FF0E ( ． ) FULLWIDTH FULL STOP
U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
Validating an email address is quite hard if done right.
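As a small illustration of just the dot problem, here is a sketch in Python that maps the four separators listed above to a plain full stop before splitting a domain into labels. It covers only that one step, not the rest of the IDNA compatibility processing, and `split_labels` is a made-up name:

```python
from typing import List

# The four code points listed above that may act as a label separator.
IDNA_DOTS = "\u002e\uff0e\u3002\uff61"
_DOT_TABLE = {ord(ch): "." for ch in IDNA_DOTS}


def split_labels(domain: str) -> List[str]:
    """Split a domain into labels, treating every listed dot as '.'."""
    return domain.translate(_DOT_TABLE).split(".")
```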
This is why we badly need an alternative to input's `type` attribute, as the type attribute encapsulates many different things:
1. Validation
2. Autofill hints
3. Native helper widgets (date calendar, number input spinner)
4. Mobile keyboard layouts
And, confusingly, while most `type`s are text inputs with differing values of the above, others are very, very different (radio, checkbox, select, file, etc). Still others use completely different tags (like <textarea>) and you even have to switch on `value` or `checked` for checkboxes.
Many people use `type="tel"` or `type="number"` just for the mobile layouts, and spend a long amount of time working around all the awful bugs in number inputs. Our own `<NumberInput>` React component works around multiple browser bugs and took weeks to get right. The incidental complexity even creates very hairy React bugs (https://github.com/facebook/react/issues/7253).
Even if you get around all the bizarre failures in differing implementations, you still have ridiculous spec bugs like the (intended) lack of selectionEnd/selectionStart on number inputs (https://www.w3.org/Bugs/Public/show_bug.cgi?id=24796) and the like.
I don't know if anyone is championing splitting these concerns into separate attributes, but the web really needs it.
The problem goes much deeper than just HTML fields - sometimes the punycode will be converted to Unicode and then get rejected further down the chain. For example, Google accepts punycode domains for custom email domains but will convert it to a Unicode domain name when receiving it. It handles most IDN domains fine, but it fails at domains with Emoji within it (my domain which fails is http://xn--p38h.ws).
The current standard for IDNs is IDNA2008, which does not allow emojis. There are some ccTLDs that don't abide by the IDNA standard, and I think you'll find different browsers will show the emoji and others won't. Permitted code points for gTLDs are listed here. http://www.iana.org/domains/idn-tables
So strictly speaking xn--p38h.ws is not an IDN, and Google is doing the right thing by not allowing it. This doesn't make your job any easier, of course.
If you really want to explore this rabbit hole, here is some good reading:
RFC 4290
RFC 5891
RFC 6912
RFC 7940
I don't understand why he quotes from w3.org. HTML5 is being developed by whatwg.org
"The WHATWG was founded by individuals of Apple, the Mozilla Foundation, and Opera Software in 2004, after a W3C workshop. Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML, lack of interest in HTML and apparent disregard for the needs of real-world authors. So, in response, these organisations set out with a mission to address these concerns and the Web Hypertext Application Technology Working Group was born."
Because we're now in a horrendous mess where the W3C has a fork[0] of the WHATWG spec[1], which semi-regularly takes selected patches from the WHATWG spec… but at times ends up with the spec in an inconsistent state. As it is, basically all implementers are working from the WHATWG spec.
No one's asking for bulletproof validation. What I want out of input[type="fancy"] elements is:
1. Better detection of the platform than I can do myself. That leads to a UI targeted at the platform (different default phone keyboards, etc).
2. Uniform UIs across websites. I hate having to learn a new calendar widget on every website, it's worse for regular users, not to mention any users that need accessibility features that are prooobably not included in most home-rolled (or even popular) widgets.
3. Basic validation. Just to help the user a bit. I'll validate it again on the back-end, and I might even be validating it myself on the front-end. But basic validation helps and, once again, that uniformity thing shows up again here since the user could be aware of how that widget reports errors on their platform already and could be looking for them.
Logic is also logic, so most texts (your comment excepted) shouldn't be allowed in markup.
Note that nobody actually wants to embed the code for validation within the document, but you want to be able to properly identify a form-elements attributes so as to allow clients to make a smart choice about validation, presentation, defaults etc.
JPEG decoding is also logic, but it's also pretty useful for client to know that the binary stream it's reading may best be handled with the jpeg library.
The niggling part that doesn't is where we decide where our level of abstraction is. It can be very convenient to be able to declare what the type is. If you don't do it in the markup, you're likely going to want this in some kind of library. And that library will need to be extensible or you've just shifted the problem, and now it's the library. So that part of it needs to be well thought out and portable. But in the end, I do think that the solution is more tractable in logic rather than markup.
Though maybe markup tries to do too much. Look at all the additions to HTML5. It's great to be able to express so much more in the markup. Yet there are still going to be times where you're going to come across a situation that doesn't fit neatly into the existing elements. So you're going to add some extra domain-specific meaning to the markup you're using. You can work around this with class attributes, but that's just it: it's a work around.
Enough rambling.
I think there's a corresponding issue with display and behavior and the intersection of markup, CSS, and JavaScript, boiling down to inadequate separation of concerns.
This is a reasonable concern, but far smaller IMHO than the main problem with email fields on web forms today: there is no way to verify that someone actually entered a working email address, because fear of spammers has meant all the plausible techniques for doing so get closed down.
We've occasionally had an unusual but syntactically correct address cause problems in the past because it wasn't processed properly, but we get problems with people who have accidentally mistyped their address when signing up and then can't log in to our systems all the time. If you have a system where the email address is the primary ID for a user's account and you're charging real money, this is not a trivial problem and it does not have a simple solution: requiring active confirmation at sign-up time does horrible things to conversion rates, but anything else is potentially vulnerable to security issues later.
Only the local part of an email address potentially being case sensitive causes us more headaches in this area today.
you could inform your users that email addresses are case sensitive much like passwords usually are or just normalize the local part along with the host and use that as the primary ID and have less issues in the future
servers that have case sensitive mailboxes are more likely to be used for throwaways or the user may control the server anyways so they could still respond to normalized local parts
as for verifying the email at registration you could check to see what the remote smtp server responds to when you issue the RCPT command to check if they consider the email valid
> you could inform your users that email addresses are case sensitive much like passwords usually are or just normalize the local part along with the host and use that as the primary ID and have less issues in the future
We could normalise, and as far as I'm aware none of the largest e-mail services allow distinctions based on case so most of the time it would be OK. It would still be a security risk, though.
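A minimal sketch of that normalisation in Python, assuming you accept the risk described above. `normalize_email` and its `fold_local` flag are illustrative names only:

```python
def normalize_email(address: str, fold_local: bool = True) -> str:
    """Return a canonical form of the address for use as an account ID."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Domain names are case-insensitive, so folding them is always safe.
    domain = domain.lower()
    # Folding the local part is a policy choice: it merges addresses that
    # a case-sensitive mail server would treat as different mailboxes.
    if fold_local:
        local = local.lower()
    return f"{local}@{domain}"
```

Keeping the address exactly as typed alongside the normalised ID leaves the option of sending mail to the original spelling.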
> as for verifying the email at registration you could check to see what the remote smtp server responds to when you issue the RCPT command to check if they consider the email valid
You can, but many servers including some of the major services will just return a false positive for any mailbox to prevent that technique being used to collect addresses to spam, and even those that don't may consider such requests when not followed with a real message to be a black mark on the sender.
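For reference, the probe being discussed looks roughly like this with Python's `smtplib`. It is a sketch only: `probe_mailbox` is a hypothetical helper, it expects an already-connected `smtplib.SMTP` object, and for the reasons given above a positive answer proves very little:

```python
import smtplib


def probe_mailbox(smtp, sender: str, recipient: str) -> bool:
    """Ask a connected SMTP server whether it would accept `recipient`.

    `smtp` is an smtplib.SMTP instance (or anything with the same methods).
    No message is sent; the transaction is reset after RCPT.
    """
    smtp.ehlo_or_helo_if_needed()
    code, _ = smtp.mail(sender)
    if code != 250:
        return False
    code, _ = smtp.rcpt(recipient)
    smtp.rset()  # abandon the transaction without sending anything
    # 250 = accepted, 251 = accepted and will be forwarded.
    return code in (250, 251)
```

Usage would be along the lines of `with smtplib.SMTP(mx_host, 25, timeout=10) as smtp: probe_mailbox(smtp, "check@example.org", address)`, where `mx_host` is a placeholder; looking up the MX host needs a DNS library that is not in the standard library.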
I only use type=text inputs and implement custom logic on top of it; the others are crap, implemented in a rush and very inconsistently across browsers.
What about type="number" ? There are clear benefits to using it when a user is entering a number and I'm not aware of any drawbacks? One of the benefits is the way some software keyboards display a number based keyboard instead of a qwerty one...
1. type=number only accepts the period as a decimal separator, and there are lots of locales using comma for that purpose. Customers often ask us to support commas, which requires type=text and manual validation.
2. One customer requested the ability to enter decimals in a field, but to keep up/down spinner in steps of whole numbers. You can't do that, the step attribute makes any values other than (min + N * step) invalid. I can see some sense in that, but I think the step attribute should only affect the increments/decrements done by the spinner, not completely reject certain values. (Though you can use step=any to mark all values valid and step by 1.0, which is good enough for common cases.)
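The step rule from point 2 can be written down in a few lines. A sketch in Python of the (min + N * step) check as described above, using `Decimal` to avoid floating-point surprises; `is_valid_step` is an illustrative name, and the defaults (min 0, step 1) are my assumption about how a bare number field behaves:

```python
from decimal import Decimal


def is_valid_step(value: str, minimum: str = "0", step: str = "1") -> bool:
    """Check whether value == minimum + N * step for some integer N."""
    if step == "any":
        return True  # step=any switches the constraint off
    offset = Decimal(value) - Decimal(minimum)
    return offset % Decimal(step) == 0
```

With the defaults, "2.5" is rejected, which is exactly the customer's complaint; with step="any" it passes.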
There is an awful UI problem on iOS and Android with type=number: they both allow invalid values to be entered, but the value given is just "" blank.
The user thinks they have typed in something valid, but JavaScript (or value from form submit) only gets to see "". E.g. use a thousands separator or paste in a trailing space and you will get an error "input must not be left blank".
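If you go the type=text route mentioned above, the manual validation might look something like this. A sketch in Python, assuming you want to accept either a comma or a period as the decimal separator, tolerate stray whitespace, and reject anything ambiguous (such as thousands separators) explicitly instead of silently seeing "". `parse_decimal` is a made-up helper:

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_decimal(raw: str) -> Optional[Decimal]:
    """Parse user-typed text into a Decimal, or return None if invalid."""
    text = raw.strip()  # a pasted trailing space should not be an error
    # More than one separator means we cannot tell decimal from thousands.
    if text.count(",") + text.count(".") > 1:
        return None
    try:
        value = Decimal(text.replace(",", "."))
    except InvalidOperation:
        return None
    # Decimal also parses "NaN" and "Infinity"; treat those as invalid.
    return value if value.is_finite() else None
```

Returning `None` lets the caller show "that isn't a number" while still having the original text to display back to the user.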
There are worse issues with the less common input types.
I may be remembering it wrong but, less than 2 years ago, there was some issue about Android not firing change events on those for some reason, and it was in a version of the browser that a lot of users were going to be using for a long time to come because they weren't getting OTA updates anymore. It may not be a real concern anymore, but it's junk like that that makes these things a pain. Unfortunately, there's no other good way to get the number pad, which I don't think is a real issue unless individual users will be filling out your forms a lot, like for a data entry application, in which case a custom number pad might be worth considering.
form/validation should be what HTML excels at, it should even be one of the few things it does. and we should use another technology to do other things we've been commandeering it to do.
Client-side validation can be very useful. At this point I think that the inconsistencies between implementations are the primary pain points, similar to what prompted the Web Standards Project.[0] Is there anything similar to the -{webkit,moz} in CSS for HTML? Or abstraction libraries comparable to jQuery?
It would also be very useful if the validation is extensible, as different projects require different validation rules.
I'd also want validation rules that are portable between the client and the server to reduce code duplication.
One combination I've been toying with is using clojure.spec[1] with ClojureScript/Clojure with varying levels of success. I imagine you can do similar things using server-side JavaScript.
> we should use another technology to do other things we've been commandeering it to do.
there should be a form language that works client/server and is just for classic web such as forms, something similar to meteor or volt. i'd even resort to using those in future or write a dsl on top that makes web development a breeze.
Because a Latin-only Internet disenfranchises billions of people who are comfortable with other writing systems?
IDNs will take years to be pervasive, just like Unicode before it, because it is a painful upgrade to something designed in a different era. It doesn't mean the endeavour isn't worthwhile.
I'm not sure what their keyboards have to do with it. If you were a native Arabic speaker, wouldn't you want to be able to see domain names in Arabic that are easy for you to read and type? If you're a company operating in an Arabic market and you want to put your domain name on the posters you're designing to display at bus stops, you probably would prefer an Arabic domain name so that your primarily Arabic-speaking audience are more likely to remember it instead of dealing with an alphabet which belongs to a completely different language they may not speak at all (or not very well).
I know English is increasingly pervasive, but there are still billions of people who don't speak it at all or speak it only very poorly (and it's not exactly very easy to learn). The internet is a global network, and so should allow people to communicate easily in their own languages. This current pain with IDN is just a legacy of the internet's origins in America and Western Europe.
Heck, the examples in the article are just using one letter found in German. You don't need to go very far from English to find these problems.
All of the above are fair points, but IDN makes clicking on
www.cítíbank.com
instead of
www.citibank.com
a very real problem and will make the web more dangerous and will encourage the further growth of "walled gardens" on the internet which I think most here agree is a bad thing.
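A very rough sketch in Python of how the accented look-alike above could be flagged: strip combining marks and compare against a name you trust. This only catches accents on Latin letters; it is not the Unicode confusables algorithm and will miss, for example, Cyrillic letters that resemble Latin ones. `strip_marks` and `looks_like` are hypothetical helpers:

```python
import unicodedata


def strip_marks(host: str) -> str:
    """Decompose characters and drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", host)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def looks_like(host: str, trusted: str) -> bool:
    """True if host differs from trusted but matches once accents are removed."""
    return host != trusted and strip_marks(host).lower() == trusted.lower()
```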