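When scraping website data with Java's HttpClient, we sometimes run into sites that require a login.
How can we simulate that login?
If you are familiar with how HTTP works, you will know that all we really have to do is use HttpClient to send the account name and password as request parameters.
As for the field names a particular site uses for the account and password, you can use Fiddler to inspect the site's login request and read them off.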
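To make "sending the account name and password as parameters" concrete, here is a minimal sketch of the form body that ends up in the POST request, which is also what Fiddler would show for a captured login. It uses only the JDK. The class name `FormBodySketch` and the helper `encodeForm` are made up for illustration, and it assumes the site expects an ordinary `application/x-www-form-urlencoded` form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class FormBodySketch {

    // Joins the fields as name=value pairs separated by '&',
    // URL-encoding every name and value as UTF-8.
    static String encodeForm(Map<String, String> fields) {
        try {
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("userName", "admin");   // field names assumed; check them in Fiddler
        fields.put("password", "123456");
        System.out.println(encodeForm(fields));
    }
}
```

Run as is, this prints `userName=admin&password=123456`. If the body captured in Fiddler uses different field names, those are the names to use in the login code.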
The Java code is as follows. The userName and password field names used here must match the fields the target site actually transmits.
package test;

import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.EntityUtils;

/**
 * An example that demonstrates how HttpClient APIs can be used to perform
 * form-based logon.
 */
public class login {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient httpclient = new DefaultHttpClient();
        try {
            HttpPost httpost = new HttpPost("http://127.0.0.1:8080/newProject/login.action");

            // Form fields; the names must match what the target site expects
            List<NameValuePair> nvps = new ArrayList<NameValuePair>();
            nvps.add(new BasicNameValuePair("userName", "admin"));
            nvps.add(new BasicNameValuePair("password", "123456"));
            httpost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8));

            HttpResponse response = httpclient.execute(httpost);
            HttpEntity entity = response.getEntity();

            // Decode the whole body in one go. Converting a fixed-size byte
            // buffer chunk by chunk would print stale bytes on short reads
            // and could split multi-byte characters.
            if (entity != null) {
                System.out.println(EntityUtils.toString(entity, HTTP.UTF_8));
            }
            System.out.println("Login form get: " + response.getStatusLine());
            EntityUtils.consume(entity);

            System.out.println("Post logon cookies:");
            List<Cookie> cookies = httpclient.getCookieStore().getCookies();
            if (cookies.isEmpty()) {
                System.out.println("None");
            } else {
                for (int i = 0; i < cookies.size(); i++) {
                    System.out.println("- " + cookies.get(i).toString());
                }
            }
        } finally {
            // When HttpClient instance is no longer needed,
            // shut down the connection manager to ensure
            // immediate deallocation of all system resources
            httpclient.getConnectionManager().shutdown();
        }
    }
}
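Here we obtain the cookies set after login. Most sites check whether a user is logged in by validating a cookie.
If we get that cookie back, we are effectively logged in, and later requests sent through the same httpclient instance carry it automatically.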
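As a rough illustration of why holding the cookie amounts to being logged in, the sketch below replays the mechanism with the JDK's own `java.net.CookieManager`. This is a stand-in, not HttpClient's cookie store. The class name `CookieReplaySketch`, the helper `cookiesFor`, the cookie value `abc123`, and the follow-up URL `data.action` are all hypothetical. It stores a `Set-Cookie` header as if it came from the login response, then asks which `Cookie` header values would accompany a later request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class CookieReplaySketch {

    // Records one Set-Cookie value as if it came from loginUrl, then returns
    // the Cookie header values that would be sent along with nextUrl.
    static List<String> cookiesFor(String loginUrl, String setCookie, String nextUrl) {
        try {
            CookieManager manager = new CookieManager();

            // Pretend this header arrived in the login response.
            Map<String, List<String>> responseHeaders = new HashMap<String, List<String>>();
            responseHeaders.put("Set-Cookie", Collections.singletonList(setCookie));
            manager.put(URI.create(loginUrl), responseHeaders);

            // Ask which cookies match the next request's host and path.
            Map<String, List<String>> requestHeaders =
                    manager.get(URI.create(nextUrl), new HashMap<String, List<String>>());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> sent = cookiesFor(
                "http://127.0.0.1:8080/newProject/login.action",
                "JSESSIONID=abc123; Path=/newProject",           // made-up session cookie
                "http://127.0.0.1:8080/newProject/data.action"); // made-up protected page
        System.out.println("Cookie header values: " + sent);
    }
}
```

The server recognises the session from that header on every later request. Note that some sites also set a session cookie when a login fails, so a safer check is to request a page that requires login and confirm it does not send you back to the login form.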