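Java crawler code
❶ Looking for Java code for a web crawler
Is it for one specific Tieba forum, or for Tieba forums in general?
❷ Begging for Java web crawler code
I can't write one myself, but I know someone who certainly can: my classmate, QQ 820215725. This is not an ad.
❸ Java web crawler
The source code is as follows: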
package com.cellstrain.icell.util;

import java.io.*;
import java.net.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A simple crawler in Java: fetches one page and writes every URL found in it to a file.
 */
public class Robot {
    public static void main(String[] args) {
        // String regex = "http://[\\w+\\.?/?]+\\.[A-Za-z]+";
        String regex = "https://[\\w+\\.?/?]+\\.[A-Za-z]+"; // URL matching rule
        Pattern p = Pattern.compile(regex);
        try {
            // "URL" is a placeholder: replace it with the address to crawl (the original crawled a biology website)
            URL url = new URL("URL");
            URLConnection urlconn = url.openConnection();
            // try-with-resources closes both streams even if opening the connection fails,
            // which avoids a NullPointerException on close
            try (PrintWriter pw = new PrintWriter(new FileWriter("D:/SiteURL.txt"), true); // links go to SiteURL.txt on drive D
                 BufferedReader br = new BufferedReader(new InputStreamReader(
                         urlconn.getInputStream()))) {
                String buf;
                while ((buf = br.readLine()) != null) {
                    Matcher buf_m = p.matcher(buf);
                    while (buf_m.find()) {
                        pw.println(buf_m.group());
                    }
                }
            }
            System.out.println("Crawl succeeded ^_^");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
❹ Looking for the source code of a Java web crawler
package com.heaton.bot;

import com.heaton.bot.*;
import java.net.*;

/**
 * The SpiderWorker class performs the actual work of
 * spidering pages. It is implemented as a thread
 * that is created by the spider class.
 *
 * Copyright 2001-2003 by Jeff Heaton ( http://www.jeffheaton.com)
 *
 * @author Jeff Heaton
 * @version 1.2
 */
public class SpiderWorker extends Thread {

    /**
     * The URL that this spider worker
     * should be downloading.
     */
    protected String target;

    /**
     * The owner of this spider worker class,
     * should always be a Spider object.
     * This is the class that this spider
     * worker will send its data to.
     */
    protected Spider owner;

    /**
     * Indicates if the spider is busy or not.
     * true = busy
     * false = idle
     */
    protected boolean busy;

    /**
     * A descendant of the HTTP object that
     * this class should be using for HTTP
     * communication. This is usually the
     * HTTPSocket class.
     */
    protected HTTP http;

    /**
     * Constructs a spider worker object.
     *
     * @param owner The owner of this object, usually
     * a Spider object.
     * @param http
     */
    public SpiderWorker(Spider owner, HTTP http) {
        this.http = http;
        this.owner = owner;
    }

    /**
     * Returns true or false to indicate if
     * the spider is busy or idle.
     *
     * @return true = busy
     * false = idle
     */
    public boolean isBusy() {
Source: http://www.diybl.com/course/3_program/java/javajs/200797/69988.html
❺ Need the code and workflow for a web crawler written in Java, urgent
import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;
import javax.swing.*;
import javax.swing.table.*;

// A web crawler (note: "crawling" here means the same as fetching or capturing)
public class SearchCrawler extends JFrame {
    // Choices for the maximum number of URLs to crawl
    private static final String[] MAX_URLS = {"50", "100", "500", "1000"};
    // Cache of robot disallow lists
    private HashMap disallowListCache = new HashMap();
    // Search GUI controls
    private JTextField startTextField;
    private JComboBox maxComboBox;
    private JCheckBox limitCheckBox;
    private JTextField logTextField;
    private JTextField searchTextField;
    private JCheckBox caseCheckBox;
    private JButton searchButton;
    // Search status GUI controls
    private JLabel crawlingLabel2;
    private JLabel crawledLabel2;
    private JLabel toCrawlLabel2;
    private JProgressBar progressBar;
    private JLabel matchesLabel2;
    // Table listing the search matches
    private JTable table;
    // Flag for whether the crawler is currently crawling
    private boolean crawling;
    // Writer for the log file that records matches
    private PrintWriter logFileWriter;

    // Constructor for the search crawler
    public SearchCrawler() {
        // Set the application title
        setTitle("Search Crawler");
        // Set the window size
        setSize(600, 600);
        // Handle the window closing event
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                actionExit();
            }
        });
        // Set up the File menu
        JMenuBar menuBar = new JMenuBar();
        JMenu fileMenu = new JMenu("File");
        fileMenu.setMnemonic(KeyEvent.VK_F);
        JMenuItem fileExitMenuItem = new JMenuItem("Exit", KeyEvent.VK_X);
        fileExitMenuItem.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                actionExit();
            }
        });
        fileMenu.add(fileExitMenuItem);
        menuBar.add(fileMenu);
        setJMenuBar(menuBar);
❻ Offering 200 points for the source code of a Java web crawler
http://search.gougou.com/search?search=%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB&id=2
❼ Java source code implementing a web crawler
// Java crawler demo
import java.io.File;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DownMM {
    public static void main(String[] args) throws Exception {
        // out is the output directory; note that it must end with a backslash
        String out = "D:\\JSP\\pic\\java\\";
        try {
            File f = new File(out);
            if (!f.exists()) {
                f.mkdirs();
            }
        } catch (Exception e) {
            System.out.println("no");
        }
        String url = "http://www.mzitu.com/share/comment-page-";
        // Captures the src attribute of each <img> tag
        Pattern reg = Pattern.compile("<img src=\"(.*?)\"");
        for (int j = 0, i = 1; i <= 10; i++) {
            URL uu = new URL(url + i);
            URLConnection conn = uu.openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko");
            // The "\\A" delimiter makes the Scanner return the whole page as one token
            try (Scanner sc = new Scanner(conn.getInputStream())) {
                Matcher m = reg.matcher(sc.useDelimiter("\\A").next());
                while (m.find()) {
                    // Save each image under a random file name
                    Files.copy(new URL(m.group(1)).openStream(), Paths.get(out + UUID.randomUUID() + ".jpg"));
                    System.out.println("Downloaded: " + j++);
                }
            }
        }
    }
}
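❽ Java source code implementing a web crawler
Send me your email address. I've seen you asking for days now.
❾ How to parse web pages in a crawler with Java code
A crawler works by fetching the content of a web page and then parsing it. What varies is only how the page is fetched and how the content is parsed.
The simple route is to send a GET or POST request with httpclient, take the response, and pull out what you need with substring operations or regular expressions.
Alternatively, use a ready-made library such as Jsoup or crawler4j, which makes scraping more convenient.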
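To make the fetch-then-parse idea concrete, here is a minimal sketch of the regular-expression route. It assumes Java 11 or later and uses the JDK's built-in java.net.http.HttpClient rather than the httpclient library mentioned above. The class name SimplePageParser and the helper methods fetch and extractLinks are made up for this illustration, and the href pattern is deliberately simple; it is not a real HTML parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: fetch a page, then parse it with a regular expression.
class SimplePageParser {
    // Captures the value of double-quoted href attributes (assumption: good enough for a demo).
    private static final Pattern HREF =
            Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);

    /** Step 1: send a GET request and return the response body as text. */
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /** Step 2: parse the text and collect every href value in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // The page to crawl comes from the command line; nothing is fetched without it.
        if (args.length == 0) {
            System.out.println("usage: java SimplePageParser <url>");
            return;
        }
        for (String link : extractLinks(fetch(args[0]))) {
            System.out.println(link);
        }
    }
}
```

Keeping the parsing step separate from the fetching step means the regular expression can be tried on a plain string first. For anything beyond simple patterns, a library such as Jsoup is likely the more robust choice.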
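❿ Looking for the source code of a web crawler written in Java
I don't know what you want it for. There are many kinds of web crawlers, so you need to be more specific before I can write one for you. I have one I wrote myself for scraping qvod videos. Tell me which site you want to scrape and I'll write one for you.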