Friday, 6 March 2015

Request returning Unicode replacement character

Using the request module to load a webpage, I notice that for the UK pound symbol £ I sometimes get back the Unicode replacement character \uFFFD.


An example URL that I'm parsing is this Amazon UK page: http://ift.tt/1NqpRtC


I'm also using the iconv-lite module to decode using the charset returned in the response header:



request(urlEntry.url, function(err, response, html) {
    const contType = response.headers['content-type'];
    const charset = contType.substring(contType.indexOf('charset=') + 8, contType.length);

    const encBody = iconv.decode(html, charset);
    ...


But this doesn't seem to be helping. I've also tried decoding the response HTML as UTF-8.
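
One possible explanation, sketched here under the assumption that the page is served as ISO-8859-1 (the question does not confirm this): by default request converts the body to a UTF-8 string before the callback runs, so the html argument may already contain \uFFFD by the time iconv.decode sees it. The sample bytes below are made up for illustration and use only Node's built-in Buffer.

```javascript
// The pound sign is the single byte 0xA3 in ISO-8859-1 (Latin-1).
const latin1Bytes = Buffer.from([0xa3, 0x31, 0x30]); // "£10" encoded as Latin-1

// Decoding those bytes as UTF-8 fails: 0xA3 is not a valid UTF-8 lead byte,
// so the decoder substitutes U+FFFD for it.
const decodedAsUtf8 = latin1Bytes.toString('utf8');

// Decoding the same raw bytes with the matching charset works.
const decodedAsLatin1 = latin1Bytes.toString('latin1');

// The substitution is lossy: re-encoding the string gives the UTF-8 bytes of
// U+FFFD (ef bf bd), not the original 0xA3, so a later decode cannot undo it.
const roundTripped = Buffer.from(decodedAsUtf8, 'utf8');

console.log(JSON.stringify(decodedAsUtf8)); // contains U+FFFD followed by "10"
console.log(decodedAsLatin1);
console.log(roundTripped.toString('hex'));
```

If this is what is happening, decoding has to start from the raw bytes rather than from an already-decoded string.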


How can I avoid this Unicode replacement char?
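
A minimal sketch of one way to approach this, assuming the body can be obtained as a Buffer (request documents an `encoding: null` option for that purpose). The helper names `charsetFromContentType` and `decodeBody` are hypothetical, and the sketch only covers the two charsets that Buffer decodes natively. It also parses the header with a regular expression, because the substring approach returns unrelated text when the header has no charset parameter or has further parameters after it.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
// Returns a lower-cased label, or null when no charset is declared.
function charsetFromContentType(contentType) {
    const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType || '');
    return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw response bytes. Only the two charsets that
// Buffer handles natively are covered; anything else falls back to UTF-8.
function decodeBody(rawBody, contentType) {
    const charset = charsetFromContentType(contentType);
    const isLatin1 = charset === 'iso-8859-1' || charset === 'latin1';
    return rawBody.toString(isLatin1 ? 'latin1' : 'utf8');
}

// Made-up sample: "£5" as ISO-8859-1 bytes with a matching header value.
const rawBody = Buffer.from([0xa3, 0x35]);
console.log(decodeBody(rawBody, 'text/html; charset=ISO-8859-1'));
```

With request, this would mean passing `{ url: urlEntry.url, encoding: null }` so that the callback's third argument is a Buffer. For charsets beyond these two, `iconv.decode(rawBody, charset)` could take the place of the `toString` call.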

