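Hitting a site too often can also get your IP blocked, leaving the site unreachable; this comes up a lot when crawling data.
When that happens we need an HTTP proxy, the same kind of setting you would make in IE or Firefox,
and HttpClient offers an equivalent setting.
Proxy IPs come in several forms: you can find a single working IP, run goagent, or buy a proxy IP pool.
So how is a proxy used in HttpClient? We will take goagent as the example.
Once goagent is running, requests only need to go through 127.0.0.1 on port 8087.
HttpClient issues requests in two ways, POST and GET; for details see:
HttpGet and HttpPost in the HTTPClient module
To route requests through the proxy, just add the following code: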
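Before pointing HttpClient at the local proxy, it can help to confirm that something is actually listening on that port. The following is a minimal JDK-only sketch; the class name `ProxyProbe` and the method `isListening` are hypothetical helpers, not part of goagent or HttpClient, and the check only proves that the TCP port accepts connections, not that the listener is a working proxy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ProxyProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:8087 is the goagent address given in the text
        boolean up = isListening("127.0.0.1", 8087, 1000);
        System.out.println(up
                ? "proxy port is open"
                : "nothing is listening on 127.0.0.1:8087");
    }
}
```

If the probe reports that nothing is listening, the proxy is probably not started yet, and requests sent through it would fail with a connection error.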
// Default scheme is http, so host and port are enough
HttpHost proxy = new HttpHost("127.0.0.1", 8087);
httpClient.getParams().setParameter(ConnRouteParams.DEFAULT_PROXY, proxy);
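For comparison, the same idea can be sketched with nothing but the JDK's `java.net.Proxy` and `HttpURLConnection`. This is not the HttpClient API used in this post; the class name `JdkProxyExample` and its helper methods are made up for illustration, and the address is the goagent one from above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

public class JdkProxyExample {

    /** Describes an HTTP proxy at the given host and port. */
    public static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    /** Prepares a connection that will go through the proxy; nothing is sent yet. */
    public static HttpURLConnection open(String url, Proxy proxy) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = httpProxy("127.0.0.1", 8087);
        HttpURLConnection conn = open("http://www.csdn.net", proxy);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        // The request is actually sent here, through the proxy
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

The shape is the same in both libraries: describe the proxy as host plus port, then hand it to the client before the request is executed.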
If the proxy requires a user name and password for authentication, also add:
httpClient.getCredentialsProvider().setCredentials(
new AuthScope(proxyHost, proxyPort),
new UsernamePasswordCredentials(userName, password));
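To make it clearer what such credentials turn into on the wire, here is a small sketch of the `Proxy-Authorization` value for the Basic scheme. It assumes the proxy uses Basic authentication; a proxy may ask for another scheme (Digest or NTLM, for instance), in which case the exchange looks different. The class name `ProxyBasicAuth` and the sample credentials `user` / `pass` are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ProxyBasicAuth {

    /** Builds the Basic scheme value: "Basic " + base64("userName:password"). */
    public static String headerValue(String userName, String password) {
        String pair = userName + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Placeholder credentials, not taken from the article
        System.out.println("Proxy-Authorization: " + headerValue("user", "pass"));
        // prints "Proxy-Authorization: Basic dXNlcjpwYXNz"
    }
}
```

Base64 is an encoding, not encryption, so Basic credentials sent to a proxy over plain HTTP are readable by anyone on the path.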
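Complete example:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.params.ConnRouteParams;
import org.apache.http.impl.client.DefaultHttpClient;

public class ProxyExample {

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        // Create the HttpClient instance (already configured to use the proxy)
        HttpClient client = getHttpClient();
        // Create the GET request
        HttpGet httpGet = new HttpGet("http://www.csdn.net");
        // Execute it and read the response body line by line
        try {
            HttpResponse response = client.execute(httpGet);
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Assumes a UTF-8 body; adjust if the server declares another charset
                BufferedReader br = new BufferedReader(
                        new InputStreamReader(entity.getContent(), StandardCharsets.UTF_8));
                try {
                    String str;
                    while ((str = br.readLine()) != null) {
                        sb.append(str.trim());
                    }
                } finally {
                    br.close();
                }
            }
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release the connections held by the client
            client.getConnectionManager().shutdown();
        }
        System.out.println(sb.toString());
    }

    // Build an HttpClient that sends requests through an authenticating proxy
    public static HttpClient getHttpClient() {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        String proxyHost = "proxycn2.huawei.com";
        int proxyPort = 8080;
        String userName = "china\\******";
        String password = "*******";
        httpClient.getCredentialsProvider().setCredentials(
                new AuthScope(proxyHost, proxyPort),
                new UsernamePasswordCredentials(userName, password));
        HttpHost proxy = new HttpHost(proxyHost, proxyPort);
        httpClient.getParams().setParameter(ConnRouteParams.DEFAULT_PROXY, proxy);
        return httpClient;
    }
}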
Required JARs: commons-logging-1.1.jar, httpclient-4.0-beta2.jar, httpcore-4.1-alpha1.jar, and commons-codec-1.4.jar.
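The post mentions buying a proxy IP pool as one option. One way a crawler might spread requests over such a pool is simple round-robin rotation, sketched below with JDK classes only. The class `ProxyPool`, its `next` method, and the host `proxy.example.com` are all hypothetical; a real pool would probably also need to drop proxies that stop responding.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxyPool {

    private final List<InetSocketAddress> proxies;
    private final AtomicInteger counter = new AtomicInteger();

    public ProxyPool(List<InetSocketAddress> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("proxy pool must not be empty");
        }
        // Copy so later changes to the caller's list do not affect the pool
        this.proxies = new ArrayList<>(proxies);
    }

    /** Returns the next proxy in round-robin order; safe to call from several threads. */
    public InetSocketAddress next() {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    public static void main(String[] args) {
        ProxyPool pool = new ProxyPool(Arrays.asList(
                InetSocketAddress.createUnresolved("127.0.0.1", 8087),
                InetSocketAddress.createUnresolved("proxy.example.com", 8080)));
        for (int i = 0; i < 3; i++) {
            InetSocketAddress p = pool.next();
            System.out.println(p.getHostString() + ":" + p.getPort());
        }
    }
}
```

Each address returned by `next()` could then be turned into an `HttpHost` from its host string and port and set as `ConnRouteParams.DEFAULT_PROXY` before the next request.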