Java web crawler code

Published: 2022-03-06 06:47:54

❶ Looking for Java code for a web crawler

Is it for one specific Tieba forum, or for Tieba forums in general?

❷ Begging for Java web crawler code

I can't do it myself, but I know someone who definitely can: my classmate, QQ: 820215725. This is not an ad.

❸ Java web crawler

The source code is as follows:
package com.cellstrain.icell.util;

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A web crawler implemented in Java: it downloads one page and
 * saves every https link found on it to a file.
 */
public class Robot {
    public static void main(String[] args) {
        // URL matching rule; the commented-out line is the http:// variant
        // String regex = "http://[\\w+\\.?/?]+\\.[A-Za-z]+";
        String regex = "https://[\\w+\\.?/?]+\\.[A-Za-z]+";
        Pattern p = Pattern.compile(regex);
        try {
            // Placeholder for the address to crawl (the author crawled a biology website)
            URL url = new URL("URL");
            URLConnection urlconn = url.openConnection();
            // Save the crawled links to SiteURL.txt on drive D (auto-flush on println).
            // try-with-resources closes both streams even when an exception is thrown.
            try (PrintWriter pw = new PrintWriter(new FileWriter("D:/SiteURL.txt"), true);
                 BufferedReader br = new BufferedReader(
                         new InputStreamReader(urlconn.getInputStream()))) {
                String buf;
                while ((buf = br.readLine()) != null) {
                    Matcher bufM = p.matcher(buf);
                    while (bufM.find()) {
                        pw.println(bufM.group());
                    }
                }
            }
            System.out.println("Crawl succeeded ^_^");
        } catch (IOException e) {
            // Also covers MalformedURLException, which is a subclass of IOException
            e.printStackTrace();
        }
    }
}
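To check what the URL rule in the program above actually captures, the matching step can be pulled out of `main` into a method that works on an in-memory string, so no network access is needed. This is a sketch; the class `LinkMatcher` and the method `findHttpsLinks` are hypothetical names. Inside the character class `[\\w+\\.?/?]` the characters `+`, `.` and `?` are literals, so the rule matches a run of word characters, `+`, `.`, `?` and `/` after `https://` that ends in a dot followed by letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkMatcher {
    // Same URL rule as the Robot class (hypothetical helper class)
    private static final Pattern HTTPS_URL =
            Pattern.compile("https://[\\w+\\.?/?]+\\.[A-Za-z]+");

    /** Returns every substring of text that matches the https URL rule, in order. */
    public static List<String> findHttpsLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = HTTPS_URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://www.example.com/index.html\">a</a>"
                + " <a href=\"http://example.org/\">b</a>";
        // Only the https link matches; the http:// one is ignored
        System.out.println(findHttpsLinks(html));
    }
}
```

Running it prints `[https://www.example.com/index.html]`. Because the match must end in a dot plus letters, links without a file extension get cut back: `https://example.com/a?b=1` yields only `https://example.com`.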

❹ Looking for Java web crawler source code

package com.heaton.bot;

import com.heaton.bot.*;
import java.net.*;

/**
 * The SpiderWorker class performs the actual work of
 * spidering pages. It is implemented as a thread
 * that is created by the spider class.
 *
 * Copyright 2001-2003 by Jeff Heaton (http://www.jeffheaton.com)
 *
 * @author Jeff Heaton
 * @version 1.2
 */
public class SpiderWorker extends Thread {

  /**
   * The URL that this spider worker
   * should be downloading.
   */
  protected String target;

  /**
   * The owner of this spider worker class,
   * should always be a Spider object.
   * This is the class that this spider
   * worker will send its data to.
   */
  protected Spider owner;

  /**
   * Indicates if the spider is busy or not.
   * true = busy
   * false = idle
   */
  protected boolean busy;

  /**
   * A descendant of the HTTP object that
   * this class should be using for HTTP
   * communication. This is usually the
   * HTTPSocket class.
   */
  protected HTTP http;

  /**
   * Constructs a spider worker object.
   *
   * @param owner The owner of this object, usually
   * a Spider object.
   * @param http
   */
  public SpiderWorker(Spider owner, HTTP http) {
    this.http = http;
    this.owner = owner;
  }

  /**
   * Returns true or false to indicate if
   * the spider is busy or idle.
   *
   * @return true = busy
   * false = idle
   */
  public boolean isBusy() {

Source of the article: http://www.diybl.com/course/3_program/java/javajs/200797/69988.html
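SpiderWorker hand-rolls the worker-thread pattern: the Spider owner creates threads, each one downloads a target URL and reports back to the owner. On Java 5 and later the same split can be sketched with the JDK's `ExecutorService` instead of `Thread` subclasses. This is not Heaton's code; the class `WorkerPoolSketch`, the method `fetchAll` and the pluggable `fetcher` function (standing in for the actual HTTP download) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class WorkerPoolSketch {
    /**
     * Downloads every target on a fixed pool of worker threads and returns
     * the results in the same order as the targets.
     *
     * @param fetcher stands in for the HTTP download (hypothetical; plug in a real client)
     */
    public static List<String> fetchAll(List<String> targets,
                                        Function<String, String> fetcher,
                                        int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // The "owner" submits one task per target; idle threads pick up the next task
            List<Future<String>> futures = new ArrayList<>();
            for (String target : targets) {
                futures.add(pool.submit(() -> fetcher.apply(target)));
            }
            // get() waits for that task and rethrows its failure wrapped in an ExecutionException
            List<String> pages = new ArrayList<>();
            for (Future<String> future : futures) {
                pages.add(future.get());
            }
            return pages;
        } finally {
            pool.shutdown(); // worker threads exit once the queued tasks finish
        }
    }

    public static void main(String[] args) throws Exception {
        // A fake fetcher keeps the example offline
        List<String> pages = fetchAll(List.of("http://a.example/", "http://b.example/"),
                url -> "<html>" + url + "</html>", 2);
        pages.forEach(System.out::println);
    }
}
```

The pool size caps how many downloads run at once, which is the role the busy/idle bookkeeping plays in the hand-written version.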

❺ Urgent: need the code and the overall workflow for writing a web crawler in Java

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;
import javax.swing.*;
import javax.swing.table.*;

// A web crawler (note: "crawling" here means the same as fetching or scraping)
public class SearchCrawler extends JFrame{
// Choices for the maximum number of URLs to crawl
private static final String[] MAX_URLS={"50","100","500","1000"};

// Cache of robots.txt disallow lists (paths the crawler must not visit)
private HashMap disallowListCache=new HashMap();

// Search GUI controls
private JTextField startTextField;
private JComboBox maxComboBox;
private JCheckBox limitCheckBox;
private JTextField logTextField;
private JTextField searchTextField;
private JCheckBox caseCheckBox;
private JButton searchButton;

// Search-status GUI controls
private JLabel crawlingLabel2;
private JLabel crawledLabel2;
private JLabel toCrawlLabel2;
private JProgressBar progressBar;
private JLabel matchesLabel2;

// Table listing the search matches
private JTable table;

// Flag indicating whether the crawler is currently crawling
private boolean crawling;

// Writer for the log file of matches
private PrintWriter logFileWriter;

// Constructor for the web crawler
public SearchCrawler(){
// Set the application title bar
setTitle("Search Crawler");
// Set the window size
setSize(600,600);

// Handle the window-closing event
addWindowListener(new WindowAdapter(){
public void windowClosing(WindowEvent e){
actionExit();
}
});

// Set up the File menu
JMenuBar menuBar=new JMenuBar();
JMenu fileMenu=new JMenu("File");
fileMenu.setMnemonic(KeyEvent.VK_F);
JMenuItem fileExitMenuItem=new JMenuItem("Exit",KeyEvent.VK_X);
fileExitMenuItem.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent e){
actionExit();
}
});
fileMenu.add(fileExitMenuItem);
menuBar.add(fileMenu);
setJMenuBar(menuBar);
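The fields above outline the crawler's workflow: a queue of URLs still to crawl, a set of URLs already crawled, a cap on how many URLs to keep (MAX_URLS), and a per-host cache of robots.txt disallow rules (disallowListCache). The excerpt stops before that logic, so the following is only a sketch of the loop under stated assumptions: the class `CrawlLoopSketch` and its methods are hypothetical, `fetchLinks` stands in for downloading a page and extracting its absolute links, `fetchRobots` stands in for downloading a host's robots.txt, and only `Disallow` lines in `User-agent: *` groups are honored (a simplification of the robots.txt rules).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class CrawlLoopSketch {
    // Host -> disallowed path prefixes, playing the role of disallowListCache
    private final Map<String, List<String>> disallowListCache = new HashMap<>();
    private final Function<String, String> fetchRobots;      // host -> robots.txt text (hypothetical)
    private final Function<String, List<String>> fetchLinks; // page URL -> absolute links (hypothetical)

    public CrawlLoopSketch(Function<String, String> fetchRobots,
                           Function<String, List<String>> fetchLinks) {
        this.fetchRobots = fetchRobots;
        this.fetchLinks = fetchLinks;
    }

    /** Returns the Disallow prefixes in "User-agent: *" groups (simplified parsing). */
    public static List<String> parseDisallow(String robotsTxt) {
        List<String> rules = new ArrayList<>();
        boolean forAllAgents = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceAll("#.*", "").trim(); // strip comments
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("user-agent:")) {
                forAllAgents = line.substring("user-agent:".length()).trim().equals("*");
            } else if (forAllAgents && lower.startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) { // an empty Disallow allows everything
                    rules.add(path);
                }
            }
        }
        return rules;
    }

    /** Checks the URL's path against its host's rules, fetching robots.txt once per host. */
    boolean isAllowed(String url) {
        URI uri = URI.create(url);
        List<String> rules = disallowListCache.computeIfAbsent(
                uri.getHost(), host -> parseDisallow(fetchRobots.apply(host)));
        String path = uri.getPath() == null || uri.getPath().isEmpty() ? "/" : uri.getPath();
        return rules.stream().noneMatch(path::startsWith);
    }

    /** Breadth-first crawl from start, stopping after maxUrls pages; returns them in visit order. */
    public Set<String> crawl(String start, int maxUrls) {
        Set<String> crawled = new LinkedHashSet<>();
        Deque<String> toCrawl = new ArrayDeque<>();
        toCrawl.add(start);
        while (!toCrawl.isEmpty() && crawled.size() < maxUrls) {
            String url = toCrawl.poll();
            if (crawled.contains(url) || !isAllowed(url)) {
                continue; // already visited, or blocked by robots.txt
            }
            crawled.add(url);
            for (String link : fetchLinks.apply(url)) {
                if (!crawled.contains(link)) {
                    toCrawl.add(link);
                }
            }
        }
        return crawled;
    }
}
```

Using a FIFO queue makes the crawl breadth-first, so pages close to the start URL are visited before deeper ones, and `maxUrls` plays the role of the values offered in MAX_URLS.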

❻ Offering 200 points for Java web crawler source code

http://search.gougou.com/search?search=%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB&id=2

❼ Implementing a web crawler in Java source code

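// Java crawler demo

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DownMM {
    public static void main(String[] args) throws Exception {
        // out is the output directory; note that it must end with \
        String out = "D:\\JSP\\pic\\java\\";
        File f = new File(out);
        if (!f.exists() && !f.mkdirs()) {
            System.out.println("Could not create " + out);
            return;
        }

        String url = "http://www.mzitu.com/share/comment-page-";
        // Capture the src attribute of each <img src="..."> tag
        Pattern reg = Pattern.compile("<img src=\"(.*?)\"");
        int j = 0; // number of images downloaded so far
        for (int i = 1; i <= 10; i++) {
            URL uu = new URL(url + i);
            URLConnection conn = uu.openConnection();
            // Send a browser-like User-Agent; some sites block Java's default one
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko");
            String html;
            // The \A delimiter (start of input) makes next() return the whole page
            try (Scanner sc = new Scanner(conn.getInputStream())) {
                sc.useDelimiter("\\A");
                html = sc.hasNext() ? sc.next() : "";
            }
            Matcher m = reg.matcher(html);
            while (m.find()) {
                // new URL(context, spec) also resolves relative src paths against the page URL
                try (InputStream in = new URL(uu, m.group(1)).openStream()) {
                    Files.copy(in, Paths.get(out + UUID.randomUUID() + ".jpg"));
                }
                System.out.println("Downloaded: " + j++);
            }
        }
    }
}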
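The `<img src="(.*?)"` rule above only matches tags written exactly as `<img src="...">`, with `src` as the first attribute and double quotes. A somewhat more tolerant sketch, assuming the image tags are still simple enough for a regular expression (an HTML parser such as Jsoup is the sturdier choice), also accepts other attributes before `src`, single quotes and upper-case tags, and resolves relative paths against the page address. The class `ImgSrcExtractor` and its method `imageUrls` are hypothetical names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcExtractor {
    // <img ... src="..."> or src='...', any attribute order, case-insensitive;
    // \s before src keeps attributes such as data-src from matching
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the absolute URL of every <img> in html, resolved against pageUrl.
     * A src value that is not a valid URI (e.g. contains spaces) makes resolve
     * throw IllegalArgumentException.
     */
    public static List<String> imageUrls(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(base.resolve(m.group(1).trim()).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<IMG class=\"a\" src='/p/1.jpg'><img src=\"3.jpg\">";
        System.out.println(imageUrls(html, "http://example.com/page/2"));
    }
}
```

With the base `http://example.com/page/2`, `/p/1.jpg` resolves to `http://example.com/p/1.jpg` and `3.jpg` to `http://example.com/page/3.jpg`, while absolute `src` values pass through unchanged.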

❽ Implementing a web crawler in Java source code

Give me your email address. I've seen you asking about this for several days.

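❾ How do you implement web-page parsing for a crawler in Java?

A crawler works by fetching the content of a web page and then parsing it; crawlers differ mainly in which pages they fetch and how they parse the content.
You can simply use an HTTP client to send GET/POST requests, get the response, and then extract what you want with string slicing or regular expressions.
Or you can use ready-made libraries such as Jsoup or crawler4j, which make scraping information more convenient.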
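A minimal sketch of the fetch-then-parse approach described above, using the HTTP client built into the JDK (`java.net.http.HttpClient`, Java 11 and later) for the GET request and a regular expression for the parsing step. The class `TitleGrabber`, the choice to extract the page `<title>`, and the default address `https://example.com/` used when no argument is given are illustrative assumptions; for anything beyond simple patterns, an HTML parser such as Jsoup is more robust than regular expressions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleGrabber {
    // DOTALL lets the title span several lines; CASE_INSENSITIVE accepts <TITLE>
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Step 1: send a GET request and return the response body as a string. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Step 2: parse the page; here, pull out the trimmed text of the <title> element. */
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://example.com/"; // placeholder address
        System.out.println(extractTitle(fetch(url)).orElse("(no title)"));
    }
}
```

Keeping the network call and the parsing in separate methods means the parsing step can be tested on saved HTML without touching the network.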

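❿ Looking for source code for a web crawler written in Java

I don't know what you want it for. There are far too many kinds of web crawlers; give me more details and I can write one for you. I have a set I wrote myself for scraping QVOD videos. Tell me which site you want to scrape and I'll write one for you.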
