Read a file one line at a time in node.js?
Published: 2019-05-09


Translated from:

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject, but I'm missing some connections to make the whole thing fit together.

var Lazy = require("lazy");

new Lazy(process.stdin)
    .lines
    .forEach(function (line) {
        console.log(line.toString());
    });
process.stdin.resume();

The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN, as in this sample.

I tried:

fs.open('./VeryBigFile.csv', 'r', '0666', Process);

function Process(err, fd) {
    if (err) throw err;
    // DO lazy read
}

but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.

I don't think the other answer would work, as the file is much larger than the server I'm running it on has memory for.
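
For reference, Node's built-in readline module can stream a file line by line without holding the whole file in memory; a minimal sketch, with the question's file name as a placeholder:

var fs = require('fs');
var readline = require('readline');

// Stream the file rather than loading it all at once.
var rl = readline.createInterface({
    input: fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' })
});

rl.on('line', function (line) {
    console.log('line: ' + line);   // called once per line, without the line ending
});

rl.on('close', function () {
    console.log('EOF');
});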


#1

Reference:


#2

Edit:

Use a transform stream.


With a BufferedReader you can read lines.

// assumes the third-party "buffered-reader" package has been required,
// e.g. var BufferedReader = require("buffered-reader");
// (the exact export shape depends on the package version)
new BufferedReader("lorem ipsum", { encoding: "utf8" })
    .on("error", function (error) {
        console.log("error: " + error);
    })
    .on("line", function (line) {
        console.log("line: " + line);
    })
    .on("end", function () {
        console.log("EOF");
    })
    .read();

#3

There is a very nice module for reading a file line by line; it's called line-reader.

With it you simply write:

var lineReader = require('line-reader');

lineReader.eachLine('file.txt', function (line, last) {
    console.log(line);
    // do whatever you want with line...
    if (last) {
        // or check if it's the last one
    }
});

You can even iterate over the file with a "java-style" interface, if you need more control:

lineReader.open('file.txt', function (reader) {
    if (reader.hasNextLine()) {
        reader.nextLine(function (line) {
            console.log(line);
        });
    }
});

#4

You can always roll your own line reader. I haven't benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n'.

var last = "";

process.stdin.on('data', function (chunk) {
    var lines, i;
    lines = (last + chunk).split("\n");
    for (i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[i];
});

process.stdin.on('end', function () {
    console.log("line: " + last);
});

process.stdin.resume();

I came up with this when working on a quick log parsing script that needed to accumulate data during the parsing, and I felt that it would be nice to try doing this using js and node instead of perl or bash.

Anyway, I do feel that small nodejs scripts should be self-contained and not rely on third-party modules, so after reading all the answers to this question, each using various modules to handle line parsing, a 13-SLOC native nodejs solution might be of interest.
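
To apply the same splitting logic to a file, as the question asks, the stream can come from fs.createReadStream instead of stdin; a minimal sketch, with the file name as a placeholder:

var fs = require('fs');

var last = '';
var stream = fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' });

stream.on('data', function (chunk) {
    var lines = (last + chunk).split('\n');
    last = lines.pop();              // keep the trailing partial line for the next chunk
    lines.forEach(function (line) {
        console.log('line: ' + line);
    });
});

stream.on('end', function () {
    console.log('line: ' + last);    // the last line has no trailing '\n'
});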


#5

I wanted to tackle this same problem; basically what in Perl would be:

while (<>) {
    process_line($_);
}

My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:

  • The minimal synchronous code that could be reused in many projects.
  • No limits on file size or number of lines.
  • No limits on length of lines.
  • Able to handle full Unicode in UTF-8, including characters beyond the BMP.
  • Able to handle *nix and Windows line endings (old-style Mac not needed for me).
  • Line ending character(s) included in lines.
  • Able to handle the last line with or without end-of-line characters.
  • Not use any external libraries not included in the node.js distribution.

This is a project for me to get a feel for low-level scripting-type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.

After a surprising amount of effort and a couple of false starts, this is the code I came up with. It's pretty fast but less trivial than I would've expected:

var fs            = require('fs'),
    StringDecoder = require('string_decoder').StringDecoder,
    util          = require('util');

function lineByLine(fd) {
  var blob = '';
  var blobStart = 0;
  var blobEnd = 0;

  var decoder = new StringDecoder('utf8');

  var CHUNK_SIZE = 16384;
  var chunk = new Buffer(CHUNK_SIZE);

  var eolPos = -1;
  var lastChunk = false;

  var moreLines = true;
  var readMore = true;

  // each line
  while (moreLines) {
    readMore = true;

    // append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
    while (readMore) {

      // do we have a whole line? (with LF)
      eolPos = blob.indexOf('\n', blobStart);

      if (eolPos !== -1) {
        blobEnd = eolPos;
        readMore = false;

      // do we have the last line? (no LF)
      } else if (lastChunk) {
        blobEnd = blob.length;
        readMore = false;

      // otherwise read more
      } else {
        var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);

        lastChunk = bytesRead !== CHUNK_SIZE;

        blob += decoder.write(chunk.slice(0, bytesRead));
      }
    }

    if (blobStart < blob.length) {
      processLine(blob.substring(blobStart, blobEnd + 1));

      blobStart = blobEnd + 1;

      if (blobStart >= CHUNK_SIZE) {
        // blobStart is in characters, CHUNK_SIZE is in octets
        var freeable = blobStart / CHUNK_SIZE;

        // keep blob from growing indefinitely, not as deterministic as I'd like
        blob = blob.substring(CHUNK_SIZE);
        blobStart -= CHUNK_SIZE;
        blobEnd -= CHUNK_SIZE;
      }
    } else {
      moreLines = false;
    }
  }
}

It could probably be cleaned up further; it was the result of trial and error.
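
The snippet leaves out the driver; a minimal sketch of how it might be called, where processLine and the file path are placeholders rather than part of the original answer:

// Hypothetical driver: lineByLine() expects an already-open file descriptor
// and calls processLine() for each line (including its line ending).
function processLine(line) {
    console.log('line: ' + line);
}

var fd = fs.openSync('./VeryBigFile.csv', 'r');
lineByLine(fd);
fs.closeSync(fd);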


#6

I use this:

function emitLines(stream, re) {
    re = re || /\n/;       // default to LF (original had "re && /\n/", which discards the argument)
    var buffer = '';

    stream.on('data', stream_data);
    stream.on('end', stream_end);

    function stream_data(data) {
        buffer += data;
        flush();
    } //stream_data

    function stream_end() {
        if (buffer) stream.emit('line', buffer);   // original had a typo: stream.emmit
    } //stream_end

    function flush() {
        var match;
        // use the regex passed in (the original shadowed it with a local /\n/)
        while (match = re.exec(buffer)) {
            var index = match.index + match[0].length;
            stream.emit('line', buffer.substring(0, index));
            buffer = buffer.substring(index);
            re.lastIndex = 0;
        }
    } //flush
} //emitLines

Use this function on a stream and listen to the line events that it will emit.
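
For example, a minimal usage sketch (the file name is a placeholder; setting the encoding makes 'data' deliver strings instead of Buffers):

var fs = require('fs');

var stream = fs.createReadStream('file.txt', { encoding: 'utf8' });
emitLines(stream);

stream.on('line', function (line) {
    console.log('line: ' + line);   // each line includes its trailing '\n'
});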

gr-
