A complete fix for garbled data fetched with HttpClient (3)
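I found that it first reads the charset returned in the site's response headers, and if one is present it uses that instead of the charset we pass in.
However, we sometimes run into sites whose headers declare gb2312 while the page is actually encoded in GBK.
So we need to rewrite the method above and comment out the part that reads the charset from the headers.
The method I ended up using is as follows: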
// Imports needed by the two methods below
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import org.apache.http.HttpEntity;
import org.apache.http.ParseException;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.CharArrayBuffer;

// Call site: read the response body as a string, using our own charset
result = enCodetoString(httpEntity, encode);

public static String enCodetoString(
        final HttpEntity entity, final String defaultCharset) throws IOException, ParseException {
    return enCodetoStringDo(entity, defaultCharset != null ? Charset.forName(defaultCharset) : null);
}

public static String enCodetoStringDo(
        final HttpEntity entity, Charset defaultCharset) throws IOException, ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    InputStream instream = entity.getContent();
    if (instream == null) {
        return null;
    }
    try {
        if (entity.getContentLength() > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("HTTP entity too large to be buffered in memory");
        }
        int i = (int) entity.getContentLength();
        if (i < 0) {
            i = 4096; // length unknown: start with a 4 KB buffer
        }
        Charset charset = null;
        // The header lookup is commented out on purpose, so the charset
        // declared by the server can never override the one we pass in:
        // ContentType contentType = ContentType.get(entity);
        // if (contentType != null) {
        //     charset = contentType.getCharset();
        // }
        if (charset == null) {
            charset = defaultCharset;
        }
        if (charset == null) {
            charset = HTTP.DEF_CONTENT_CHARSET; // ISO-8859-1
        }
        Reader reader = new InputStreamReader(instream, charset);
        CharArrayBuffer buffer = new CharArrayBuffer(i);
        char[] tmp = new char[1024];
        int l;
        while ((l = reader.read(tmp)) != -1) {
            buffer.append(tmp, 0, l);
        }
        return buffer.toString();
    } finally {
        instream.close();
    }
}
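As a less drastic alternative to ignoring the header charset entirely, one could keep reading it but widen a declared gb2312 to GBK, since GBK was designed as an extension of GB2312. The sketch below is not the method used in this post; `CharsetResolver` and `resolveCharset` are hypothetical names made up for illustration, and it uses only the JDK. The character U+580E is 堎, the one used in the check at the end of this post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetResolver {

    // Hypothetical helper: pick the charset used to decode a response body.
    // declared = charset name taken from the response headers (may be null)
    // fallback = charset to use when nothing usable was declared
    public static Charset resolveCharset(String declared, Charset fallback) {
        if (declared == null || declared.trim().isEmpty()) {
            return fallback;
        }
        String name = declared.trim();
        // Assumption: a page that declares gb2312 may really be GBK,
        // so decode it with the wider charset.
        if (name.equalsIgnoreCase("gb2312")) {
            return Charset.forName("GBK");
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException ex) {
            // Illegal or unsupported charset name: use the fallback.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulate a GBK-encoded body whose header claims gb2312.
        byte[] body = "\u580E".getBytes(Charset.forName("GBK"));
        Charset chosen = resolveCharset("gb2312", StandardCharsets.UTF_8);
        String text = new String(body, chosen);
        System.out.println(chosen.name() + " decoded correctly: " + text.equals("\u580E"));
    }
}
```

Whether this is safer than forcing a fixed charset depends on the sites being crawled; it only helps when the declared name is exactly gb2312.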
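Below is another snippet that can be used to check how a character is encoded and decoded:
// Print the bytes that GBK uses for the character
System.out.println(Arrays.toString("堎".getBytes(Charset.forName("gbk"))));
// Decode the bytes {-120, -39} as GB2312 to see what comes out
System.out.println(new String(new byte[]{-120, -39}, Charset.forName("gb2312")));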
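The same check can be turned into a reusable test instead of eyeballing printed output. The following is a minimal sketch using only the JDK; `CharsetCoverageCheck`, `canEncode` and `roundTrips` are hypothetical names made up for illustration. It asks each charset's encoder whether it can represent a string, and checks whether a byte sequence survives a decode and re-encode round trip. The lead byte 0x88 (-120) lies outside GB2312's double-byte range (0xA1 to 0xF7), which is why such bytes cannot be decoded as GB2312.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class CharsetCoverageCheck {

    // True if every character of text can be encoded in the named charset.
    public static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    // True if bytes decode and re-encode to the same bytes in the named
    // charset, i.e. nothing was replaced with a substitute character.
    public static boolean roundTrips(byte[] bytes, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        String decoded = new String(bytes, cs);
        return Arrays.equals(decoded.getBytes(cs), bytes);
    }

    public static void main(String[] args) {
        String ch = "\u580E"; // the character used in the check above
        byte[] gbkBytes = ch.getBytes(Charset.forName("GBK"));
        for (String name : new String[] {"GBK", "GB2312"}) {
            System.out.println(name
                    + " canEncode=" + canEncode(ch, name)
                    + " roundTrips=" + roundTrips(gbkBytes, name));
        }
    }
}
```

If a page's bytes round-trip under GBK but not under the charset the server declares, that is a strong hint the declared charset is wrong.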