::::::::::::::
part-00000
::::::::::::::
"review1.doc"|"I do not care for the camera. "
"review10.doc"|"It was very reliable "
"review2.doc"|"The product was simply bad. "
"review3.doc"|"The user interface was simply atrocious. "
"review4.doc"|"The product interface is simply broken. "
...
::::::::::::::
part-00001
::::::::::::::
"review5.doc"|"The Windows client is simply crappy. "
"review6.doc"|"I liked the camera. It is a good product. "
"review7.doc"|"It is a phenomenal camera. "
"review8.doc"|"Just an awesome product. "
"review9.doc"|"I really liked the Camera. It is excellent. "
...
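Each record in the part files above is one line: a quoted file name, a `|` delimiter, and the quoted extracted text. The following is a minimal Python sketch of reading such lines with the standard `csv` module. The function name `parse_reviews` and the in-memory sample are illustrative only, and the sketch assumes that any quote inside a field is escaped by doubling, which the listing does not confirm.

```python
import csv
import io


def parse_reviews(lines):
    """Parse '"name"|"text"' records into a {name: text} dict (illustrative helper)."""
    reader = csv.reader(lines, delimiter="|", quotechar='"')
    records = {}
    for row in reader:
        # Banner lines (':::::'), part-file names and '...' elisions do not
        # split into exactly two fields, so they are skipped here.
        if len(row) != 2:
            continue
        name, text = row
        records[name] = text.strip()  # drop the trailing space seen in the listing
    return records


# Two records copied from the listing above, held in memory for the example.
sample = io.StringIO(
    '"review1.doc"|"I do not care for the camera. "\n'
    '"review10.doc"|"It was very reliable "\n'
)
reviews = parse_reviews(sample)
print(reviews["review1.doc"])
```

Passing an open file object for a part file instead of `sample` should work the same way, since `csv.reader` accepts any iterable of lines.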
import tika(*);

localRead(tikaRead("file:///home/biadmin/Tika/CameraReviews")) -> write(seq("/tmp/output"));
import tika(*);
import systemT;

read(tikaRead("/tmp/reviews"))
  -> transform { label: $.path, text: $.content }
  -> transform { label: $.label, sentiments:
       systemT::annotateDocument( $, ["EmotiveTone"],
         ["file:///home/biadmin/Tika/"],
         tokenizer="multilingual",
         outputViews=["EmotiveTone.AllClues"])};
[
  {
    "label": "review1.doc",
    "sentiments": {
      "EmotiveTone.AllClues": [
        {
          "clueType": "dislike",
          "match": "not care for"
        }
      ],
      "label": "review1.doc",
      "text": "I do not care for the camera. "
    }
  },
  {
    "label": "review10.doc",
    "sentiments": {
      "EmotiveTone.AllClues": [
        {
          "clueType": "positive",
          "match": "reliable"
        }
      ],
      "label": "review10.doc",
      "text": "It was very reliable "
    }
  },
  ...
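The annotated output above nests each document's clues under `sentiments` and then under the view name `EmotiveTone.AllClues`. As a rough sketch of how such output might be summarized downstream, the Python below counts `clueType` values per document. The helper name `tally_clues` is hypothetical, the sample holds only the two records shown in the listing (the rest is elided there), and it assumes the complete output is a valid JSON array.

```python
import json
from collections import Counter


def tally_clues(records, view="EmotiveTone.AllClues"):
    """Count clueType values per document label (illustrative helper)."""
    tallies = {}
    for record in records:
        # A document with no matches is assumed to have a missing or empty view.
        clues = record.get("sentiments", {}).get(view, [])
        tallies[record["label"]] = Counter(clue["clueType"] for clue in clues)
    return tallies


# The two records visible in the listing above; the remainder is elided there.
sample = json.loads("""
[
  {"label": "review1.doc",
   "sentiments": {
     "EmotiveTone.AllClues": [{"clueType": "dislike", "match": "not care for"}],
     "label": "review1.doc",
     "text": "I do not care for the camera. "}},
  {"label": "review10.doc",
   "sentiments": {
     "EmotiveTone.AllClues": [{"clueType": "positive", "match": "reliable"}],
     "label": "review10.doc",
     "text": "It was very reliable "}}
]
""")

tallies = tally_clues(sample)
for label, counts in tallies.items():
    print(label, dict(counts))
```

Keeping the view name as a parameter lets the same helper be pointed at a different output view if `outputViews` is changed in the Jaql script.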
hadoop archive -archiveName archive_name.har -p /path_to_input_files /path_to_output_directory
hadoop fs -lsr har:///path_to_output_directory/archive_name.har