Complete fix for garbled text in data fetched with httpClient (3)

Posted by look_w on 2019-4-20 11:04

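It turns out that the method first reads the charset returned in the site's response headers. If the headers declare one, the charset we pass in is not used.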
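The precedence described here can be sketched without HttpClient. The class and method names below are hypothetical and the header parsing is deliberately simplified; it only illustrates that a charset declared in `Content-Type` wins over the caller's default.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical illustration of the precedence, not HttpClient code.
class HeaderCharsetPrecedence {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=gb2312". Returns null when none is declared.
    static String declaredCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Header charset first; the caller's default is only a fallback.
    static Charset resolve(String contentType, String defaultCharset) {
        String declared = declaredCharset(contentType);
        return Charset.forName(declared != null ? declared : defaultCharset);
    }

    public static void main(String[] args) {
        // The header declares gb2312, so the GBK default is ignored.
        System.out.println(resolve("text/html; charset=gb2312", "GBK"));
        // No charset in the header, so the default is used.
        System.out.println(resolve("text/html", "GBK"));
    }
}
```

With this ordering, passing GBK as the default has no effect whenever the server sends any charset at all.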

However, sometimes the response headers declare gb2312 while the site actually uses gbk.
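A narrower workaround for this specific mismatch, which is not the approach this post ends up using, is to treat a declared gb2312 as GBK. This is only a sketch: `LenientCharset` and `effective` are hypothetical names, and it assumes that GBK decodes everything a genuine GB2312 page contains, since GBK extends GB2312.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of HttpClient.
class LenientCharset {

    // Returns the charset to decode with, given the name declared by the server.
    static Charset effective(String declared, Charset fallback) {
        if (declared == null || declared.trim().isEmpty()) {
            return fallback; // nothing declared: use the caller's choice
        }
        String name = declared.trim();
        if (name.equalsIgnoreCase("gb2312")) {
            // Assumption: a page labelled gb2312 may really be GBK.
            return Charset.forName("GBK");
        }
        return Charset.forName(name);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        System.out.println(effective("gb2312", gbk));
        System.out.println(effective("utf-8", gbk));
        System.out.println(effective(null, gbk));
    }
}
```

This keeps honouring correct headers such as utf-8, but it does not help when the header is wrong in some other way.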

So we need to rewrite the method above and comment out the part that reads the charset from the response headers.

The method I ended up using is as follows:

result = enCodetoString(httpEntity, encode); // extract the response string

// Imports this snippet needs (package names as in HttpCore 4.x)
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

import org.apache.http.HttpEntity;
import org.apache.http.ParseException;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.CharArrayBuffer;

   // Decodes the entity body with defaultCharset, ignoring any charset
   // declared in the response headers.
   public static String enCodetoString(
         final HttpEntity entity, final String defaultCharset) throws IOException, ParseException {
      return enCodetoStringDo(entity, defaultCharset != null ? Charset.forName(defaultCharset) : null);
   }

   public static String enCodetoStringDo(
         final HttpEntity entity, Charset defaultCharset) throws IOException, ParseException {
      if (entity == null) {
         throw new IllegalArgumentException("HTTP entity may not be null");
      }
      InputStream instream = entity.getContent();
      if (instream == null) {
         return null;
      }
      try {
         if (entity.getContentLength() > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("HTTP entity too large to be buffered in memory");
         }
         // Initial buffer size: the declared content length, or 4096 when it is unknown
         int i = (int) entity.getContentLength();
         if (i < 0) {
            i = 4096;
         }
         // The header charset lookup is commented out on purpose,
         // so the charset passed in by the caller always wins.
         Charset charset = null;
//       ContentType contentType = ContentType.get(entity);
//       if (contentType != null) {
//          charset = contentType.getCharset();
//       }
         if (charset == null) {
            charset = defaultCharset;
         }
         if (charset == null) {
            charset = HTTP.DEF_CONTENT_CHARSET;
         }
         Reader reader = new InputStreamReader(instream, charset);
         CharArrayBuffer buffer = new CharArrayBuffer(i);
         char[] tmp = new char[1024];
         int l;
         while ((l = reader.read(tmp)) != -1) {
            buffer.append(tmp, 0, l);
         }
         return buffer.toString();
      } finally {
         instream.close();
      }
   }
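The core of the method, reading a stream with a charset chosen by the caller, can be tried without HttpClient. The sketch below uses only the JDK; `ForcedCharsetDecode` is a hypothetical name, and a `ByteArrayInputStream` stands in for the response body, which is an assumption made so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical stand-alone version of the read loop above.
class ForcedCharsetDecode {

    // Reads the whole stream and decodes it with the charset the caller forces.
    static String decode(InputStream in, Charset forced) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, forced)) {
            char[] tmp = new char[1024];
            int n;
            while ((n = reader.read(tmp)) != -1) {
                out.append(tmp, 0, n);
            }
        }
        return out.toString();
    }

    // Convenience wrapper for in-memory bytes.
    static String decodeBytes(byte[] body, Charset forced) {
        try {
            return decode(new ByteArrayInputStream(body), forced);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // "\u4e2d\u6587\u580e" is 中文堎; escapes avoid source-encoding problems.
        byte[] body = "\u4e2d\u6587\u580e".getBytes(gbk);
        System.out.println(decodeBytes(body, gbk));
        System.out.println(decodeBytes(body, Charset.forName("GB2312")));
    }
}
```

Decoding the same bytes with GB2312 cannot give back the original text, because 堎 is not part of GB2312.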

There is also a way to check how a character is encoded and decoded:

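// Print the bytes of 堎 when encoded as gbk (needs java.util.Arrays and java.nio.charset.Charset)
System.out.println(Arrays.toString("堎".getBytes(Charset.forName("gbk"))));
// Decode the bytes {-120, -39} as gb2312 for comparison
System.out.println(new String(new byte[]{-120, -39}, Charset.forName("gb2312")));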
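The same check can be made without looking at raw bytes, by asking each charset whether it can encode the character at all. This is a small sketch with a hypothetical class name, `CharsetCoverage`; it relies on the JDK's `CharsetEncoder.canEncode`.

```java
import java.nio.charset.Charset;

// Hypothetical helper for checking whether a charset covers some text.
class CharsetCoverage {

    // True when every character of text can be encoded in the named charset.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String ch = "\u580e"; // 堎, written as an escape to avoid source-encoding problems
        System.out.println("GBK:    " + canEncode(ch, "GBK"));
        System.out.println("GB2312: " + canEncode(ch, "GB2312"));
    }
}
```

A character that GBK can encode but GB2312 cannot is exactly the kind that turns into garbage when a GBK page is decoded as gb2312.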
