Downloading HTML pages in Java: saving web page content as a local HTML file (1)
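In the earlier posts on scraping web pages with httpclient, we usually fetched the page's source code (the content) and stored it in a database.
For details, see these posts:
HttpGet and HttpPost in the HTTPClient module
Commonly used basic httpclient scraping classes
So what if, besides getting the page source, we also want to save the page locally as an .html file?
It is actually quite simple. Let's first look at the code that requests a page and retrieves its content: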
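import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.ParseException;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.entity.DeflateDecompressingEntity;
import org.apache.http.client.entity.GzipDecompressingEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.CharArrayBuffer;
import org.apache.http.util.EntityUtils;

public class HtmlDownloader { // enclosing class name is illustrative

    // Fetches urlString with a GET request and returns the body as a string,
    // or "" if the status code is not 200. "encode" is the charset used to
    // decode the body (it was an undefined variable in the original snippet).
    private static String getUrlContent(DefaultHttpClient httpClient,
            String urlString, Charset encode) throws IOException, ClientProtocolException {
        // Timeouts must be set before execute(), otherwise this request ignores them
        httpClient.getParams().setParameter(
                CoreConnectionPNames.CONNECTION_TIMEOUT, 10000); // connect timeout: 10 s
        httpClient.getParams().setParameter(
                CoreConnectionPNames.SO_TIMEOUT, 8000); // socket (data transfer) timeout: 8 s
        HttpGet httpGet = new HttpGet(urlString); // HttpGet is a subclass of HttpUriRequest
        HttpResponse httpGetResponse = httpClient.execute(httpGet);
        HttpEntity httpEntity = httpGetResponse.getEntity();
        if (httpGetResponse.getStatusLine().getStatusCode() != 200 || httpEntity == null) {
            EntityUtils.consume(httpEntity); // release the connection (null-safe)
            return "";
        }
        // Transparently decompress gzip/deflate response bodies
        if (httpEntity.getContentEncoding() != null) {
            String contentEncoding = httpEntity.getContentEncoding().getValue();
            if ("gzip".equalsIgnoreCase(contentEncoding)) {
                httpEntity = new GzipDecompressingEntity(httpEntity);
            } else if ("deflate".equalsIgnoreCase(contentEncoding)) {
                httpEntity = new DeflateDecompressingEntity(httpEntity);
            }
        }
        return enCodetoString(httpEntity, encode); // read the response body as a string
    }

    // Reads the whole entity into a String, decoding it with defaultCharset;
    // if that is null, falls back to HTTP.DEF_CONTENT_CHARSET (ISO-8859-1).
    public static String enCodetoString(final HttpEntity entity,
            Charset defaultCharset) throws IOException, ParseException {
        if (entity == null) {
            throw new IllegalArgumentException("HTTP entity may not be null");
        }
        InputStream instream = entity.getContent();
        if (instream == null) {
            return null;
        }
        try {
            if (entity.getContentLength() > Integer.MAX_VALUE) {
                throw new IllegalArgumentException(
                        "HTTP entity too large to be buffered in memory");
            }
            int capacity = (int) entity.getContentLength();
            if (capacity < 0) {
                capacity = 4096; // length unknown: start with a 4 KB buffer
            }
            // The charset is not read from the Content-Type header here (the
            // original left that check commented out), so the caller's charset is used.
            Charset charset = (defaultCharset != null) ? defaultCharset : HTTP.DEF_CONTENT_CHARSET;
            Reader reader = new InputStreamReader(instream, charset);
            CharArrayBuffer buffer = new CharArrayBuffer(capacity);
            char[] tmp = new char[1024];
            int l;
            while ((l = reader.read(tmp)) != -1) {
                buffer.append(tmp, 0, l);
            }
            return buffer.toString();
        } finally {
            instream.close();
        }
    }
}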
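In getUrlContent, HttpClient's GzipDecompressingEntity and DeflateDecompressingEntity take care of compressed responses before the body is decoded into a string. For readers who want to see what that step amounts to, here is a minimal JDK-only sketch of the same "decompress, then decode with a charset" logic. BodyDecoder and decodeBody are hypothetical names, and the sketch assumes "deflate" bodies are zlib-wrapped; real servers sometimes send raw deflate data, which this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class BodyDecoder { // hypothetical helper, JDK classes only

    // Wraps the raw body stream according to the Content-Encoding value,
    // then reads it fully as text using the given charset.
    public static String decodeBody(InputStream raw, String contentEncoding,
                                    Charset charset) throws IOException {
        InputStream in = raw;
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            in = new GZIPInputStream(raw);
        } else if ("deflate".equalsIgnoreCase(contentEncoding)) {
            in = new InflaterInputStream(raw); // assumes zlib-wrapped deflate data
        }
        StringBuilder sb = new StringBuilder(4096);
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] tmp = new char[1024];
            int n;
            while ((n = reader.read(tmp)) != -1) {
                sb.append(tmp, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed UTF-8 page body
        String page = "<html><body>\u4f60\u597d</body></html>";
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzOut = new GZIPOutputStream(bos)) {
            gzOut.write(page.getBytes(StandardCharsets.UTF_8));
        }
        String decoded = decodeBody(new ByteArrayInputStream(bos.toByteArray()),
                "gzip", StandardCharsets.UTF_8);
        System.out.println(decoded.equals(page)); // prints true
    }
}
```

Decoding with the wrong charset is the most common cause of garbled Chinese text in scraped pages, which is why the charset is an explicit parameter here rather than a hidden default.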
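Once we have the content, all that remains is to write it to a local file, preferably in the same charset the page declares, so that browsers render non-ASCII text correctly.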
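A minimal sketch of that last step, assuming Java 7+ (java.nio.file). HtmlSaver, saveHtml and the path pages/example.html are hypothetical names chosen for illustration, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlSaver { // hypothetical helper class

    // Writes the page source to a local .html file, creating parent
    // directories if needed. The charset should match the page's own
    // <meta charset> declaration so the saved file displays correctly.
    public static Path saveHtml(String content, String filePath, Charset charset)
            throws IOException {
        Path path = Paths.get(filePath);
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if it already exists
        }
        // Files.write creates the file, or truncates an existing one
        Files.write(path, content.getBytes(charset));
        return path;
    }

    public static void main(String[] args) throws IOException {
        String content = "<html><head><meta charset=\"UTF-8\"></head>"
                + "<body>hello</body></html>";
        Path saved = saveHtml(content, "pages/example.html", StandardCharsets.UTF_8);
        System.out.println("Saved to " + saved.toAbsolutePath());
    }
}
```

In practice, content would be the string returned by getUrlContent, and the same Charset used to decode the response would be passed to saveHtml, so the bytes on disk match the encoding the page declares.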